Category: Story

The Spreadsheet Guy in “Spotlight”


Hong Kong Baptist University journalism students interview Pulitzer Prize winner Matt Carroll in Hong Kong. Read the story in The Young Reporter here:



Data Journalism Open Lecture

Time: Nov. 2, 2018 (Fri.), 11:00 a.m.-12:20 p.m.

Venue: CVA 506, HKBU

Speaker: Andy Shu

Language: English


Continue reading “Data Journalism Open Lecture”




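Since the Fifth Legislative Council, electronic voting records have been published on the LegCo website as structured XML data that the public can download. Using a web crawler, we collected the electronic voting records of members of the Sixth Legislative Council from November 10, 2016 to March 29, 2018, 27,426 vote records in total. On each motion, a member can have one of five outcomes: Yes, No, Abstain, Absent, or Present (that is, attending the meeting without casting a Yes, No or Abstain vote; see LegCo Encyclopedia). After converting these outcomes to numbers, we applied Principal Component Analysis (PCA) to find the direction of greatest divergence in these vote records, called the principal axis and denoted PA1. We then computed the projection of each member's voting record onto PA1, called the principal component and denoted PC1. PC1 captures members' positions relative to one another: the closer two members' scores, the more similar their voting tendencies.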

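Sorting the members by PC1 score from lowest to highest yields a data-driven “political spectrum” (see the figure below). At the center of this spectrum is LegCo President Andrew Leung Kwan-yuen, who never votes. The closer a member sits to Leung, the more moderate their voting style; the farther away, the more radical. Coloring the members on either side of Leung by political camp shows the pro-establishment and pan-democratic camps falling on opposite sides, in line with common knowledge.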
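The PCA step and the resulting spectrum can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the numeric encoding (Yes = 1, No = -1, and 0 for Abstain, Absent and Present) is hypothetical, because the article does not say how the outcomes were converted, and members A to E are invented toy data. The sign of a principal component is arbitrary, so which camp ends up on the left of the spectrum can flip.

```python
import numpy as np

# Hypothetical encoding of the five outcomes; the article does not state its scheme.
ENCODING = {"Yes": 1.0, "No": -1.0, "Abstain": 0.0, "Absent": 0.0, "Present": 0.0}


def pc1_scores(votes):
    """votes: one row per member, one outcome string per motion. Returns PC1 per member."""
    X = np.array([[ENCODING[v] for v in row] for row in votes])
    Xc = X - X.mean(axis=0)                 # center each motion (column)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    pa1 = Vt[0]                             # first principal axis (PA1)
    return Xc @ pa1                         # projection of each member onto PA1 (PC1)


def spectrum(names, scores):
    """Members ordered from lowest to highest PC1 score."""
    return [name for _, name in sorted(zip(scores, names))]


# Toy data: A and B vote together, C and D vote the opposite way, E never votes.
names = ["A", "B", "C", "D", "E"]
votes = [
    ["Yes", "Yes", "No"],
    ["Yes", "Yes", "No"],
    ["No", "No", "Yes"],
    ["No", "No", "Yes"],
    ["Present", "Present", "Present"],
]
scores = pc1_scores(votes)
print(spectrum(names, scores))  # E, the non-voter, sits in the middle
```

As in the article's chart, a member who never votes lands at the center of the spectrum, with the two voting blocs on opposite sides.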




Continue reading “Analysis of Voting Records of the Sixth Hong Kong Legislative Council (2016-2018)”

Syria’s toxic war on itself

Syria has been in a state of civil war for the past seven years, with different groups vying for control of the country. It has become an international battleground where various states and their proxy networks continuously clash. The war has so far taken the lives of more than 465,000 people and displaced more than 12 million, including some 6 million refugees dispersed around the world.

About the Datasets

Media documentation: the Syrian Archive dataset is an open-source platform that collects, curates, verifies and preserves visual documentation of human rights violations in Syria. It maintains an extensive video database of all known incidents in which civilians have been reported killed or injured since 2014. As of April 20, 2018, the database included 4,384 videos documented by journalists, citizen reporters and activists.

A recorded death list: the Violations Documentation Center in Syria, established in 2011, is one of the largest human rights organisations in the country, with staff members and contacts in all governorates and most cities inside Syria.

The complex nature of the war in Syria limits access to open databases, so the extracted data may miss some important information. To analyze the situation as precisely as possible, we fill some of these gaps by cross-checking against the other dataset.

On the morning of April 14, 2018, the US, Britain and France bombed three Syrian government sites, reportedly targeting chemical weapons facilities. Has Syria been suffering continuous internal turmoil severe enough to invite intervention by foreign players?

We drew a general picture of the attacks in Syria based on a dataset of 329 records from January 1, 2017 to April 20, 2018.

Part 1: General Picture

Living Hell

In the war-torn country, Aleppo, Damascus, Idlib, Hama and Daraa are the cities that both databases document as the locations of most violations, despite slight differences in how these locations rank.

The locations where media coverage reports the most violent incidents closely match the locations in the recorded death list.

Specifically, the Syrian Archive, which reflects media coverage, recorded the most violations in Aleppo (1,920), followed by Idlib (219), Hama (103), Damascus (97) and Homs (39).

The Violations Documentation Center in Syria, which records the registered death list, also ranked Aleppo (7,990) as the most violation-prone city in Syria, followed by Damascus (6,372), Idlib (4,434) and Deir Ezzor (2,904), a city absent from the media coverage database.
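Cross-checking the two databases amounts to comparing two ranked lists. The minimal Python sketch below uses only the city counts quoted above (the top entries listed in the text, not the full datasets) to show which cities appear in both rankings and how their positions differ.

```python
# City counts as quoted in the text (top entries only, not the full datasets)
ARCHIVE = {"Aleppo": 1920, "Idlib": 219, "Hama": 103, "Damascus": 97, "Homs": 39}
VDC = {"Aleppo": 7990, "Damascus": 6372, "Idlib": 4434, "Deir Ezzor": 2904}


def ranking(counts):
    """City names ordered from most to fewest recorded violations."""
    return sorted(counts, key=counts.get, reverse=True)


def compare(a, b):
    """Return {city: (rank in a, rank in b)} for shared cities, plus cities unique to each."""
    rank_a = {city: i + 1 for i, city in enumerate(ranking(a))}
    rank_b = {city: i + 1 for i, city in enumerate(ranking(b))}
    shared = {city: (rank_a[city], rank_b[city]) for city in rank_a if city in rank_b}
    return shared, set(a) - set(b), set(b) - set(a)


shared, only_archive, only_vdc = compare(ARCHIVE, VDC)
print(shared)    # {'Aleppo': (1, 1), 'Idlib': (2, 3), 'Damascus': (4, 2)}
print(only_vdc)  # {'Deir Ezzor'}
```

Aleppo tops both lists, Idlib and Damascus swap places, and Deir Ezzor appears only in the death-list ranking.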

In terms of the geographic distribution of violent incidents in Syria, Aleppo and Idlib rank near the top in both databases. They have been among the most contested regions, held by either rebels or jihadists, and are therefore where the Syrian regime and its allies have concentrated their firepower.

Continue reading “Syria’s toxic war on itself”

Flying in the sky, a report of air crash worldwide


1 in 2,560,000 in 2016 vs. ? in history

For the past 70 years, the airplane has been an important means of long-distance travel. According to the IATA annual report, the major accident rate in 2016 was 0.39 per million flights, equivalent to just one major accident in every 2.56 million flights. This reassuring figure was paid for in blood and sweat. Let us step back and count the successes and failures in the history of flight.
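The two figures in this paragraph are the same statistic stated two ways. Assuming IATA's 0.39 is a rate per million flights (the reading consistent with the 2.56 million figure), the conversion is a single division, sketched here in Python:

```python
def flights_per_accident(rate_per_million: float) -> float:
    """Convert an accident rate per million flights into 'one accident every N flights'."""
    return 1_000_000 / rate_per_million


n = flights_per_accident(0.39)
print(f"one major accident every {n:,.0f} flights")  # one major accident every 2,564,103 flights
```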

Data source

Data volume

  • 5534


  1. How many planes crashed each year, and is there a trend? How many people were on board, how many survived, and how many died?
  2. How are accidents distributed between military and passenger flights? Are there any insights?
  3. Which operators and aircraft types have the most crashes, and how are operators and aircraft types related?
  4. Find the airline routes with the most accidents and try to identify the reasons.
  5. Find any interesting trends or patterns that emerge as we analyze the dataset.
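Questions 1 and 2 can be answered with a few pandas group-bys. The sketch below assumes a crash table with hypothetical `Date`, `Operator`, `Aboard` and `Fatalities` columns, and further assumes that military flights can be recognized because their operator name contains "Military"; the real dataset's columns and labels may differ.

```python
import pandas as pd


def yearly_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Crashes, people aboard, fatalities and survivors per year (question 1)."""
    year = pd.to_datetime(df["Date"]).dt.year
    out = df.assign(Year=year).groupby("Year").agg(
        Crashes=("Date", "count"),
        Aboard=("Aboard", "sum"),
        Fatalities=("Fatalities", "sum"),
    )
    out["Survivors"] = out["Aboard"] - out["Fatalities"]
    return out


def military_share(df: pd.DataFrame) -> float:
    """Fraction of accidents whose operator mentions 'Military' (question 2).

    Assumption: military operators are labelled this way in the 'Operator' column.
    """
    is_military = df["Operator"].str.contains("Military", case=False, na=False)
    return float(is_military.mean())


# Tiny invented example, not real crash records
df = pd.DataFrame({
    "Date": ["1990-01-05", "1990-06-10", "2000-03-01"],
    "Operator": ["Military - U.S. Air Force", "Aeroflot", "Pan American World Airways"],
    "Aboard": [100, 50, 20],
    "Fatalities": [10, 50, 20],
})
print(yearly_summary(df))
print(military_share(df))
```

Plotting the `Crashes` column of the yearly summary gives the count-by-year chart discussed below.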

History of airplane accidents

Count of accidents by Year


From the chart, we can see that the total number of accidents rose steadily before the 1970s. After that, there were smaller peaks around 1990 and 2010, but the overall trend since 1990 has been gradually downward.

At the beginning of the 20th century, in 1903, the Wright brothers invented the airplane. In 1909, France held a major flying competition, which alarmed Britain and other European countries. Although the airplanes of the day still had many problems, militaries were eager to use them in war, and airplanes first appeared in combat in the Italo-Turkish War. Their power attracted the attention of other countries' militaries, driving rapid development of military aviation. During the First World War, which began in 1914, airplanes were mainly used for reconnaissance, transport and other supporting roles. By the Second World War, around 1940, airplanes were widely used in battle. In 1945, the International Air Transport Association (IATA) was founded in Havana, the capital of Cuba.

In 1978, US President Jimmy Carter signed the Airline Deregulation Act, a landmark in American aviation legislation. Afterwards, the founding and merging of airlines in the US domestic market, route selection, fare setting and even loss-making operations were largely freed from government control and intervention. The number of airplanes grew quickly, and with it the likelihood of crashes. Another reason, we believe, is that aviation technology at the time still had weaknesses that needed improvement. As the technology matured, the number of crashes declined, a trend that is obvious after 2000.

Continue reading “Flying in the sky, a report of air crash worldwide”

How China Blockbuster War Movies Capture China’s Nationalism

Summary: Chinese blockbuster war movies are becoming increasingly popular worldwide. A national cinema often takes on the responsibility of representing national identity to its citizens and even to the world. How do Chinese blockbuster war movies capture China’s nationalism, and what does the public think of them? We extracted film reviews of two movies, “Operation Red Sea” and “Wolf Warrior”, from Douban, a popular Chinese website for rating and reviewing films, books and music, to understand what audiences think and where Chinese blockbuster war movies are heading. Using Python, we made a word cloud of the most frequent words in 1,000 reviews of each film. We found that China’s new breed of patriotic hero on the big screen reflects the rise of a superpower, and that Chinese war movies are shaping the image of China.
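A word cloud is driven by word frequencies. The minimal Python sketch below shows only the counting step and is not the authors' code: it assumes each review has already been segmented into space-separated words (Chinese text needs a segmenter, for example jieba, first), and the stop-word list and sample reviews are hypothetical. The resulting counts could then be passed to a word-cloud library, such as the `wordcloud` package's `generate_from_frequencies`.

```python
from collections import Counter

# Hypothetical stop words: particles and generic terms that would dominate the cloud
STOPWORDS = {"的", "了", "是", "电影"}


def top_words(reviews, n=10, stopwords=STOPWORDS):
    """Most frequent words across pre-segmented reviews, skipping stop words and single characters."""
    counts = Counter()
    for review in reviews:
        counts.update(w for w in review.split() if w not in stopwords and len(w) > 1)
    return counts.most_common(n)


# Invented, pre-segmented sample reviews
reviews = ["爱国 电影 的 燃", "爱国 英雄 燃 的", "英雄 爱国"]
print(top_words(reviews, 2))  # [('爱国', 3), ('英雄', 2)]
```

Dropping single-character tokens is a common heuristic for Chinese text, since most of them are particles; it also discards some meaningful words, so the filter is a trade-off.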

Last month, movie ticket sales in mainland China hit 10.1 billion yuan ($1.6 billion), a world box-office record for monthly sales in a single market. No film stood out more than “Operation Red Sea”, which had grossed almost 2.5 billion yuan in just 13 days by the end of February.


 “Operation Red Sea” is based on the Chinese navy’s March 2015 evacuation of Yemen. Set amid militant unrest in a fictional Middle Eastern country, it tells the stories of a ship’s crew and an assault team as they rescue Chinese citizens and foreign refugees, resolving a potential nuclear crisis along the way.

Similarly, last summer’s blockbuster “Wolf Warrior 2” ($854 million) tells the story of a loose-cannon Chinese soldier in an unnamed African country. “Operation Red Sea” tones down the patriotism of last year’s smash hit “Wolf Warrior 2”, playing up gore over glorifying war.

How do Chinese blockbuster war movies capture China’s nationalism? What do audiences in China and around the world think of them? Can these films help make Chinese action movies more palatable overseas? We explore these questions using the data.

Continue reading “How China Blockbuster War Movies Capture China’s Nationalism”

National Congress: signs for a clearer sky in Beijing

Summary: In this article, we collected data on Beijing’s air quality over the past five years. We hypothesized that significant improvements in air quality were related to China’s two top annual sessions, the Chinese People’s Political Consultative Conference (CPPCC) and the National People’s Congress (NPC). Several other news events and current affairs also affected Beijing’s air quality, so we take a closer look at those periods as well to see whether pollutant levels dropped significantly.

Four years ago, Chinese Premier Li Keqiang told the National People’s Congress, with many more Chinese citizens watching live on state television, “We will resolutely declare war against pollution as we declared war against poverty.”

Since then, cities have cut concentrations of fine particles in the air by 32 percent on average. A few months before the premier’s speech, the country had released a national air quality action plan requiring all urban areas to reduce concentrations of fine particulate matter by at least 10 percent. As the capital, Beijing was required to cut pollution by 25 percent, and the city set aside a massive 120 billion yuan to meet this target.

All the statistics in this project come from Online China’s Air Quality, a public platform that monitors and analyzes air quality in 367 Chinese cities, covering AQI, PM2.5, PM10, SO2, NO2, O3, CO, temperature, humidity, wind scale, wind direction, satellite cloud images and more.

In this project, we focus mainly on Beijing’s air quality, including the AQI, air pollution category and main pollutants such as PM2.5, from December 2013 to March 2018.
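One simple way to test the hypothesis that air quality improves during the two sessions is to compare mean AQI inside the session windows with the rest of the period. The minimal pandas sketch below assumes a daily table indexed by date with an `AQI` column; the window dates in the example are hypothetical placeholders, not the actual session dates, and the AQI values are invented.

```python
import numpy as np
import pandas as pd


def session_vs_rest(daily: pd.DataFrame, windows, column="AQI"):
    """Mean of `column` on days inside any (start, end) window vs. all other days."""
    in_session = np.zeros(len(daily), dtype=bool)
    for start, end in windows:
        # Inclusive date range for each session window
        in_session |= (daily.index >= pd.Timestamp(start)) & (daily.index <= pd.Timestamp(end))
    return daily.loc[in_session, column].mean(), daily.loc[~in_session, column].mean()


# Invented daily AQI values and a hypothetical session window
idx = pd.date_range("2018-03-01", periods=6)
daily = pd.DataFrame({"AQI": [100, 100, 50, 50, 100, 100]}, index=idx)
print(session_vs_rest(daily, [("2018-03-03", "2018-03-04")]))
```

A lower mean inside the windows is consistent with the hypothesis but does not prove it; weather such as wind direction, which the platform also records, can move AQI just as much.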

Continue reading “National Congress: signs for a clearer sky in Beijing”

Li’s family business map and spring layout analysis

Summary: In this article, we build a graph of the major holding companies and main shareholders under Richard Li Tzar Kai, Victor Li Tzar Kuoi and Li Ka-shing to find out which sectors they invest in and to map the complex network of relationships among their subsidiaries. By analyzing the annual reports of Victor Li Tzar Kuoi’s and Richard Li Tzar Kai’s companies, we assess whether Li Ka-shing made the right decision when he announced his retirement.


Li Ka-shing, a Hong Kong billionaire who will turn 90 this summer, officially retired in May 2018. In January 2015, Forbes magazine ranked him Hong Kong’s richest man, with net assets of US$33.5 billion (about HK$260 billion). He remained Hong Kong’s richest man until SF Express founder Wang Wei replaced him in 2017.

He announced on March 16 that he would step down as chairman of CK Hutchison Holdings Ltd. and CK Asset Holdings Ltd., making way for his elder son, Victor Li Tzar Kuoi. Victor had helped Li Ka-shing run the CK companies for many years, while his younger son, Richard Li Tzar Kai, has mainly invested in telecommunications and media businesses outside the group.

Richard Li Tzar Kai

We collected the names of the major holding companies and their main shareholders from Who’s Who and HKEX (Hong Kong Exchanges and Clearing Limited). After establishing the relationships between them, we produced the graph below.

Business Map of Richard Li Tzar Kai
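A business map like this is a directed graph from shareholders to the companies they hold. The minimal pure-Python sketch below, which uses hypothetical placeholder names rather than the real companies, collects everything a person holds directly or through intermediate holding companies, which also exposes subsidiaries shared between two owners. Drawing such a graph with a force-directed ("spring") layout could be left to a library such as networkx and its `spring_layout` function.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """edges: (shareholder, company) pairs; returns shareholder -> set of companies held."""
    graph = defaultdict(set)
    for owner, company in edges:
        graph[owner].add(company)
    return graph


def holdings(graph, person):
    """All companies reachable from `person` through chains of shareholdings (breadth-first search)."""
    seen, queue = set(), deque([person])
    while queue:
        node = queue.popleft()
        for company in graph.get(node, ()):
            if company not in seen:
                seen.add(company)
                queue.append(company)
    return seen


# Hypothetical ownership edges, not real data
edges = [
    ("Person X", "HoldCo A"),
    ("HoldCo A", "Sub 1"),
    ("HoldCo A", "Sub 2"),
    ("Person Y", "Sub 2"),
]
g = build_graph(edges)
print(holdings(g, "Person X") & holdings(g, "Person Y"))  # subsidiaries both people reach
```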

Continue reading “Li’s family business map and spring layout analysis”

Who control the discourse power in 红楼梦?

Summary: Dream of the Red Chamber, also called The Story of the Stone, written by Cao Xueqin, is one of China’s Four Great Classical Novels. Long considered a masterpiece of Chinese literature, it is generally acknowledged as the pinnacle of Chinese fiction. We use a graph-analysis tool to examine the shortest paths, centralization, degree structure, clustering coefficients and cliques in the character network, to find out how closely the characters are connected and who might be the social queen or king of the story.

According to analyst Jiang Qi, there are 448 characters in “A Dream of Red Mansions”. We picked the 187 characters who are relatively more important than the others and used the graph-analysis tool to analyze them, producing the network below, which shows their relationships in Dream of the Red Chamber.

The relationship between the 187 characters in the book
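The measures named in the summary are standard network statistics. The minimal pure-Python sketch below computes three of them on an undirected character graph: degree centrality, the local clustering coefficient and shortest-path length. The four-node edge list is a hypothetical toy example, not data from the novel.

```python
from collections import deque
from itertools import combinations


def adjacency(edges):
    """Undirected adjacency sets from (character, character) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Share of the other characters each character is directly linked to."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def clustering(adj, node):
    """Fraction of pairs of a node's neighbours that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


def shortest_path_length(adj, source, target):
    """Number of hops between two characters (breadth-first search), or None if unreachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return None


# Hypothetical toy network: a triangle A-B-C plus D attached to A
adj = adjacency([("A", "B"), ("B", "C"), ("C", "A"), ("A", "D")])
print(degree_centrality(adj))
print(clustering(adj, "A"), shortest_path_length(adj, "D", "C"))
```

In this toy graph, A has the highest degree centrality but a low clustering coefficient, because it bridges D to a group that D is not otherwise part of. That is the kind of "social king or queen" profile the article looks for.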

Continue reading “Who control the discourse power in 红楼梦?”

Analyze tracking errors between ETF and stock market index in the last decade

Summary: In this article, set in the context of Hong Kong’s Mandatory Provident Fund (MPF) scheme, we find that some exchange-traded funds (ETFs) make up large holdings in a particular MPF fund. Calculating tracking errors over the past ten years, we find that 2008 stands out with the highest tracking error, coinciding with the global financial crisis. In other years, when stock markets were more stable, the ETFs performed relatively well. We also compare the returns of two ETFs with those of the indexes they track, and the results differ.

We used Python as a calculator, relying on pandas to read CSV files downloaded from Yahoo Finance and on its time-series tools to present the data as charts.
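The article does not spell out its tracking-error formula. One common definition, assumed here, is the standard deviation of the daily difference between the ETF's return and the index's return, annualized with about 252 trading days. The minimal pandas sketch below computes it per calendar year; the price series in the example are invented.

```python
import numpy as np
import pandas as pd


def yearly_tracking_error(etf_close: pd.Series, index_close: pd.Series, annualize=True) -> pd.Series:
    """Std. dev. of daily return differences (ETF minus index), one value per calendar year.

    Assumes both series are closing prices indexed by trading date.
    """
    prices = pd.concat({"etf": etf_close, "index": index_close}, axis=1).dropna()
    returns = prices.pct_change().dropna()
    diff = returns["etf"] - returns["index"]
    te = diff.groupby(diff.index.year).std()
    return te * np.sqrt(252) if annualize else te  # ~252 trading days per year


# Invented prices: an ETF that tracks its index perfectly has zero tracking error
idx = pd.bdate_range("2008-01-01", periods=6)
index_close = pd.Series([100.0, 101, 102, 103, 104, 105], index=idx)
etf_close = index_close * 0.5
print(yearly_tracking_error(etf_close, index_close))
```

With Yahoo Finance CSV downloads, each price series could be loaded with something like `pd.read_csv("etf.csv", index_col="Date", parse_dates=True)["Adj Close"]`, where the file name is a hypothetical placeholder.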


Hong Kong has a rapidly aging society. According to the population projections released by the Census and Statistics Department, about a third of Hong Kong’s population is expected to be aged 65 and above by the middle of the 21st century. There is growing concern over social security, especially for the elderly, since the burden on the future working population of supporting retirees may become unbearable.

Continue reading “Analyze tracking errors between ETF and stock market index in the last decade”