Category: Article

Analysis of Voting Records of the Sixth Hong Kong Legislative Council (2016-2018)

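The current Legislative Council (LegCo) term has been turbulent, with six members disqualified one after another (commonly called being "DQ'd"). We carried out this study ahead of the March 2018 by-election. Using the term's voting records, we reconstruct each member's "political portrait" for the public. From this data-driven spectrum and voting heat map, we can see whether members' words match their deeds or whether they are merely putting on a show. While the public debates the "record-low turnout" and the "failed Tian Ji horse-racing strategy", we should also remember this: who gets elected matters, but what they do once elected may deserve even more attention. The motions and voting records of Council meetings document members' conduct in detail.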

Spectrum of Members' Voting Tendencies

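Since the Fifth LegCo, electronic voting records have been published on the LegCo website as structured XML data that the public can download. We used a web crawler to collect the Sixth LegCo members' electronic voting records from 10 November 2016 to 29 March 2018, 27,426 vote records in total. For each motion, a member's vote has one of five possible outcomes: Yes, No, Abstain, Absent, or Present. "Present" means attending the meeting without casting a Yes, No or Abstain vote; see LegCo Primer Q&A. We converted these outcomes into numbers and applied Principal Component Analysis (PCA). PCA finds the direction of greatest disagreement (variance) in the vote records, i.e. the principal axis, denoted PA1. We then projected each member's voting record onto PA1 to obtain the principal component score, denoted PC1. PC1 captures members' positions relative to one another: the closer two members' scores, the more similar their voting tendencies.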
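The PCA step described above can be sketched roughly as follows, in Python with NumPy. This is a minimal sketch under stated assumptions. The numeric encoding (Yes = +1, No = -1, everything else 0), the function names `pc1_scores` and `spectrum`, and the toy data are all hypothetical. The article does not publish its code or its encoding. The sign of a principal component is arbitrary, so the resulting spectrum may come out mirrored.

```python
import numpy as np

# Hypothetical encoding: the article does not say how outcomes were turned into numbers.
ENCODING = {"Yes": 1.0, "No": -1.0, "Abstain": 0.0, "Absent": 0.0, "Present": 0.0}


def pc1_scores(votes):
    """Return each member's PC1 score: the projection of their record onto PA1."""
    members = list(votes)
    # Rows are members, columns are motions.
    X = np.array([[ENCODING[v] for v in votes[m]] for m in members])
    # Centre every motion column so that PCA measures differences between members.
    Xc = X - X.mean(axis=0)
    # The first right-singular vector of the centred matrix is the principal axis PA1.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    pa1 = vt[0]
    # PC1 is the projection of each centred record onto PA1 (its sign is arbitrary).
    return {m: float(score) for m, score in zip(members, Xc @ pa1)}


def spectrum(scores):
    """Order members from the smallest to the largest PC1 score."""
    return sorted(scores, key=scores.get)


# Toy data: four hypothetical members voting on five motions.
toy_votes = {
    "A": ["Yes", "Yes", "No", "Yes", "No"],
    "B": ["Yes", "Yes", "No", "Yes", "No"],
    "C": ["No", "No", "Yes", "No", "Yes"],
    "D": ["Present"] * 5,
}
scores = pc1_scores(toy_votes)
print(spectrum(scores))
```

In the toy data, A and B vote identically, so they receive the same score. The non-voting member D lands between the opposing blocs A and C.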

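Sorting members by PC1 score from smallest to largest yields a data-driven "political spectrum" (see the chart below). The LegCo President, Andrew Leung (梁君彥), never votes, and he sits in the middle of this spectrum. The closer a member is to Leung, the more moderate his or her voting style; the farther away, the more extreme. When members on either side of Leung are colored by political affiliation, the pro-establishment and pan-democratic camps fall on opposite sides, which matches common understanding.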

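Voting tendency spectrum of the Sixth LegCo (2016-2018), colored by pro-establishment and pan-democratic camp
(scaled proportionally to the same range as the Fifth LegCo spectrum)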

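In terms of relative distance, the pan-democratic camp as a whole sits closer to Andrew Leung, while the pro-establishment camp sits farther away. The chart suggests that, overall, the pro-establishment camp votes more extremely than the pan-democrats. In other words, the pro-establishment camp's pro-establishment stance is stronger than the pan-democrats' democratic stance.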

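Cheng Chung-tai (鄭松泰) and Lo Wai-kwok (盧偉國) sit at the two ends of the spectrum. Their voting styles therefore differ the most, and compared with all other members, theirs are the most extreme. According to our count, the two voted differently on 371 motions and the same way on only 37, a ratio of about ten to one.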
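The agree/disagree comparison above could be computed along these lines. This is a minimal sketch. Treating Yes, No and Abstain as expressed positions, and skipping motions where either member was Absent or merely Present, is an assumption; the article does not state its exact rule. The helper name `agreement_counts` is hypothetical. The final line only checks the "about ten times" ratio from the article's own figures.

```python
# Assumption: Yes, No and Abstain count as an expressed position;
# motions where either member was Absent or merely Present are skipped.
CAST = {"Yes", "No", "Abstain"}


def agreement_counts(votes_a, votes_b):
    """Count motions where two members voted the same way vs. differently."""
    agree = disagree = 0
    for a, b in zip(votes_a, votes_b):
        if a in CAST and b in CAST:
            if a == b:
                agree += 1
            else:
                disagree += 1
    return agree, disagree


# The article's figures for the two ends of the spectrum: 371 different vs. 37 same.
print(round(371 / 37, 2))  # 10.03
```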

Continue reading “Analysis of Voting Records of the Sixth Hong Kong Legislative Council (2016-2018)”

Flying in the Sky: A Report on Air Crashes Worldwide


1 in 2,560,000 flights in 2016 vs. ? in history

Over the past 70 years, the airplane has become an important means of long-distance travel. According to the IATA annual report, the major accident rate in 2016 was 0.39 per million flights. That is equivalent to only one major accident in every 2.56 million flights. This seemingly safe figure was built on countless blood and sweat. Let us step back and take stock of the successes and failures in the history of flight.
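The conversion from a rate per million flights to "one accident every N flights" is a single division. The short sketch below shows it for the 2016 figure quoted above. The function name is only illustrative.

```python
def flights_per_accident(rate_per_million):
    """Convert an accident rate per million flights into 'one accident every N flights'."""
    return 1_000_000 / rate_per_million


n = flights_per_accident(0.39)
print(f"about one major accident every {n / 1e6:.2f} million flights")
```

The result, about 2,564,000 flights, rounds to the 2.56 million in the IATA figure.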

Data source

Data volume

  • 5534

Questions

  1. How many planes crashed each year, and is there a trend? How many people were on board, how many survived, and how many died?
  2. How are accidents distributed between military and passenger flights? Any insights?
  3. Which operators and aircraft types have the most crashes? What is the relationship between operators and aircraft types?
  4. Find the routes with the most accidents and try to identify the reasons.
  5. Identify any interesting trends or patterns that emerge as we analyze the dataset.
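Question 1 boils down to a per-year aggregation. The sketch below shows one way to do it with the Python standard library. The `Accident` record, its field names, and the sample values are hypothetical. The actual column names of the 5,534-record dataset are not given here, and "fatalities" is assumed to count only people on board.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Accident:
    """Hypothetical record layout; the real dataset's columns may differ."""
    year: int
    aboard: int
    fatalities: int  # assumed to count only people on board


def yearly_summary(accidents):
    """Aggregate crashes, people aboard, deaths and survivors per year."""
    summary = defaultdict(lambda: {"crashes": 0, "aboard": 0, "fatalities": 0, "survivors": 0})
    for acc in accidents:
        row = summary[acc.year]
        row["crashes"] += 1
        row["aboard"] += acc.aboard
        row["fatalities"] += acc.fatalities
        row["survivors"] += acc.aboard - acc.fatalities
    # Return the years in chronological order so trends are easy to read.
    return dict(sorted(summary.items()))


sample = [Accident(1990, 100, 40), Accident(1990, 50, 50), Accident(1991, 10, 0)]
result = yearly_summary(sample)
print(result[1990])
```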

History of airplane accidents

Count of accidents by Year


From the chart, we can see that the total number of accidents rose steadily before the 1970s. After that, there are smaller peaks around 1990 and 2010, but the overall trend after 1990 is gradually downward.

In 1903, at the beginning of the 20th century, the Wright brothers invented the airplane. In 1909, France held a major flying competition, which alarmed England and other European countries. Although the aircraft of the time still had many problems, militaries were eager to use them in war. Airplanes thus first appeared in combat in the Italo-Turkish War. Their power drew the attention of other countries' armed forces, leading to rapid growth in the military aviation industry. During World War I, from 1914, airplanes were used mainly for reconnaissance, transport and other supporting roles. By World War II, around 1940, they were widely used in combat. In 1945, the International Air Transport Association (IATA) was founded in Havana, the capital of Cuba.

In 1978, US President Jimmy Carter signed the Airline Deregulation Act, a landmark in the history of American aviation legislation. Afterwards, the founding and merging of US domestic airlines, route selection, fare setting and even loss-making operations were largely freed from government control and intervention. The number of aircraft grew rapidly, and with it the likelihood of crashes. Another reason, we believe, is that aviation technology at the time still had weaknesses that needed to be addressed. As the technology matured, the number of crashes declined, a trend that became obvious after 2000.

Continue reading “Flying in the sky, a report of air crash worldwide”

Reviewing the Hong Kong Legislative Council, 2000-2017, Through Bill Scrutiny Times

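Introducing bills and making laws is a key function of LegCo. Once published in the Gazette, a bill must pass its First, Second and Third Readings before it becomes law. The time LegCo takes to pass a bill depends on how complex and how controversial the bill is. A bill's scrutiny time is its date of passage minus its date of publication in the Gazette.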
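The scrutiny-time definition above is a date subtraction. A minimal Python sketch follows. The function name and the example dates are hypothetical, not taken from any actual bill.

```python
from datetime import date


def scrutiny_days(gazetted, passed):
    """Scrutiny time in days: date of passage minus date of gazettal."""
    return (passed - gazetted).days


# Hypothetical bill: gazetted on 2 Mar 2016, passed on 15 Jul 2016.
print(scrutiny_days(date(2016, 3, 2), date(2016, 7, 15)))  # 135
```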

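From the Second LegCo to the present (2000-2017), LegCo handled an average of 40 bills per year, of which 37 were passed. Each bill that passed took 256 days of scrutiny on average.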

Are LegCo Members Also Racing Against Deadlines?

Average scrutiny time of bills passed each year, 2000-2017

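The chart above shows that from 2000 to 2017, the average scrutiny time of bills passed each year fluctuated in a four-year cycle. It reached lows in 2004, 2008, 2012 and 2016, meaning bills moved faster in those years. These are exactly the years in which one LegCo term gave way to the next.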
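The four-year pattern described above comes from averaging scrutiny times by year and looking for the dips. The sketch below shows one way to do that in Python. The function names and the (year, days) pairs are hypothetical illustration data, not the article's figures. Years at either end of the series are not tested as minima because they have only one neighbor.

```python
from collections import defaultdict
from statistics import mean


def mean_by_year(bills):
    """bills: iterable of (year_passed, scrutiny_days) pairs -> {year: mean days}."""
    groups = defaultdict(list)
    for year, days in bills:
        groups[year].append(days)
    return {year: mean(days) for year, days in sorted(groups.items())}


def local_minima(series):
    """Years whose mean is lower than both neighbouring years (endpoints are skipped)."""
    years = sorted(series)
    return [
        cur
        for prev, cur, nxt in zip(years, years[1:], years[2:])
        if series[cur] < series[prev] and series[cur] < series[nxt]
    ]


# Hypothetical (year, days) pairs, not the article's data.
bills = [
    (2010, 300), (2011, 280), (2012, 100), (2012, 200), (2013, 260),
    (2014, 290), (2015, 300), (2016, 120), (2017, 270),
]
averages = mean_by_year(bills)
lows = local_minima(averages)
print(lows)  # [2012, 2016]
print([b - a for a, b in zip(lows, lows[1:])])  # [4]
```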

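We consulted the LegCo Secretariat and learned that a bill still pending when a LegCo term ends lapses and cannot be carried over to the next term. To continue scrutinizing it, the bill must be republished in the Gazette and go through the three readings again. Bills pass more efficiently in term-transition years. Are LegCo members also racing against deadlines?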

Continue reading “Reviewing the Hong Kong Legislative Council, 2000-2017, Through Bill Scrutiny Times”

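LegCo Primer Q&A

How are LegCo members returned?

The Sixth LegCo (2016-2020) consists of 70 members. Thirty-five are returned by direct elections in geographical constituencies, and the other 35 by functional constituency elections.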

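Members returned by geographical constituencies

Thirty-five members are returned by five geographically delineated constituencies: Hong Kong Island, Kowloon West, Kowloon East, New Territories West and New Territories East. Each geographical constituency has between 5 and 9 seats, according to its population.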

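Members returned by functional constituencies

Thirty-five members are returned by 29 functional constituencies, each representing an important economic, social or professional sector of Hong Kong. The Labour and District Council (Second) functional constituencies are allocated 3 and 5 seats respectively. Each of the other 27 functional constituencies is allocated one seat.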
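The seat counts above can be checked with simple arithmetic: 3 + 5 + 27 × 1 = 35 functional constituency seats from 2 + 27 = 29 constituencies, plus 35 geographical seats, gives 70. The sketch below encodes only the figures stated in the text. The variable names are illustrative.

```python
# Seat allocation for functional constituencies (FCs) as described above.
MULTI_SEAT_FCS = {"Labour": 3, "District Council (Second)": 5}
SINGLE_SEAT_FCS = 27  # each of the remaining FCs returns one member
GEOGRAPHICAL_SEATS = 35

fc_count = len(MULTI_SEAT_FCS) + SINGLE_SEAT_FCS
fc_seats = sum(MULTI_SEAT_FCS.values()) + SINGLE_SEAT_FCS * 1
print(fc_count, fc_seats, fc_seats + GEOGRAPHICAL_SEATS)  # 29 35 70
```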

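How is the President of LegCo elected, and what role does the President play?

The President is elected by and from among the members of LegCo. The President presides over meetings, performs administrative and ceremonial functions, and ensures that Council meetings run smoothly.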

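What are the functions of LegCo?

During the legislative session, LegCo normally meets on Wednesdays at 11 a.m. in the Chamber of the Legislative Council Complex. There it performs its two functions of monitoring the government's work and enacting laws. The main business at Council meetings includes questions, statements, bills, motions with legislative effect, and motions without legislative effect.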

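What is the difference between a bill and a motion?

A "bill" is a proposal for new legislation or for amendments to existing legislation. Both the government and members may introduce bills for LegCo's consideration. A bill must go through the three readings in LegCo before it becomes law. Once a bill is introduced, members debate it and vote on it.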

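Most LegCo business is conducted by way of "motions". There are two types: motions with legislative effect and motions without legislative effect.

Motions with legislative effect

Public officers and members may move motions with legislative effect for LegCo's consideration. For example, an officer may move a motion to make or amend subsidiary legislation in order to seek LegCo's approval.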

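Motions without legislative effect

Members may move motions without legislative effect for debate at Council meetings. In these debates, members can express views on matters of public interest or urge the government to take certain actions. Officers may attend the debate and respond to members' views.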

Continue reading “LegCo Primer Q&A”

Repost: Decoding the Tweets of Trump and His Daughter (ODD-HK-17 Project)

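[Repost note] Since taking office, Trump has been a focus of attention for the media and scholars. This president who "governs by Twitter" is not only highly newsworthy but also comes with rich datasets. That makes him an ideal subject for data-driven political reporting. This article was written by two HKU students. Its first draft was produced at the Open Data Day Hong Kong 2017 hackathon, and it was edited and published by Initium Lab.

We are reposting it for two reasons. First, Open Data Day Hong Kong 2018 will be held at HKU on 3 March. Open-data activists, civic-tech enthusiasts, journalists, scholars and citizens from across Hong Kong will gather to launch projects and build prototypes within a single day. Some teams will keep developing their projects after the event, turning them into strong data applications or data stories. This article is a classic example, representative in every respect: topic selection, data collection, analysis, visualization and project execution. A hackathon lets participants from different backgrounds brainstorm and collaborate under pressure, so they can quickly find interesting topics and build prototypes. Turning a prototype into a finished piece, however, often takes several times as long as the hackathon itself and calls for professional skills. We hope this article shows communication students who are working hard to learn Python, R and JavaScript what is possible: those who go alone go fastest, but those who go together go farthest.

The second reason is that NBC recently released a dataset and a report on fake Russian accounts on Twitter. Trump's rise felt like a slap in the face to much of the elite; they panicked and kept blaming the media and social networks. Did Russia really interfere? How much influence did it have? Why did so many people support Trump without the polls detecting it? Was it random error or systematic error? These questions will keep resurfacing for a long time, and people are eager for any clue. Keeping an eye on Trump and on Twitter will always yield endless data and endless stories. This article is a typical piece of text analysis and visualization, done in R, with much to learn from.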

Original article begins


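Donald Trump, the newly inaugurated President of the United States, stands out among politicians for his extreme rhetoric. In 2016, Initium Lab published an analysis comparing how Trump and his campaign rival Hillary Clinton spoke in media interviews. It found that Trump tends to use simple sentence structures and is skilled at using second-person narration to win over his audience.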

Continue reading “Repost: Decoding the Tweets of Trump and His Daughter (ODD-HK-17 Project)”

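Repost: How Can the "Beauty" of Data Visualization Be Standardized?

Reposted from the Global Investigative Journalism Network (GIJN, 全球深度報導網)

Original Article (this repost was converted from Simplified to Traditional Chinese)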

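As media outlets go through digital transformation, they increasingly present stories as data visualizations. Yet many visualizations pursue only surface aesthetics and fail at the real job of data visualization. That job is to convey information clearly and effectively so that readers grasp the meaning behind the numbers more concretely. The quality of visual news varies widely and is inconsistent, yet few knowledgeable peers or readers offer feedback. This week's Data News introduces a standardized evaluation framework proposed by Stephen Few, founder of the Perceptual Edge website and an information visualization expert. It assesses whether a visualization achieves the effect it needs to.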

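Few argues that standardized criteria let people evaluate a work along the same dimensions, rather than describing vague impressions in empty language. He divides the effects of a visualization into two broad categories: informative and emotive. The first considers how much information the visualization conveys to readers. The second measures whether it produces an emotional response. These two categories contain seven more specific criteria. Informative covers five dimensions: usefulness, completeness, perceptibility, truthfulness and intuitiveness. Emotive has two aspects: aesthetics and engagement.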
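As a rough sketch of how the two-category, seven-criterion structure above could be recorded, here is a small Python data structure with a scoring helper. The rating scale and the simple per-category average are hypothetical choices made for illustration. They are not Few's published scoring procedure, and the example ratings are invented.

```python
# Few's seven criteria grouped into his two categories, as listed above.
RUBRIC = {
    "Informative": ["Usefulness", "Completeness", "Perceptibility", "Truthfulness", "Intuitiveness"],
    "Emotive": ["Aesthetics", "Engagement"],
}


def category_scores(ratings):
    """Average a reviewer's per-criterion ratings within each category.

    Hypothetical aggregation: the rating scale and the use of a simple mean
    are assumptions for illustration, not Few's published procedure.
    """
    missing = [c for crits in RUBRIC.values() for c in crits if c not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    return {cat: sum(ratings[c] for c in crits) / len(crits) for cat, crits in RUBRIC.items()}


# Invented ratings for one hypothetical visualization.
example = {c: 5 for crits in RUBRIC.values() for c in crits}
example.update({"Aesthetics": 9, "Engagement": 7})
print(category_scores(example))  # {'Informative': 5.0, 'Emotive': 8.0}
```

Keeping the scores separate for each category, rather than reducing them to a single number, preserves Few's point that a chart can be attractive yet uninformative, or the reverse.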

Continue reading “Repost: How Can the "Beauty" of Data Visualization Be Standardized?”

Data News of the Week | U.S. Gun Control Debate and the Media

More than four months have passed since the 2017 Las Vegas shooting, the deadliest mass shooting* committed by a single individual in US history. The United States has the highest per-capita rate of civilian gun ownership in the world and has long debated gun control, yet no consensus has been reached. Against this backdrop, the media play a major role in shaping public debate and policy-making on gun control. Below we discuss two reports, one by the BBC (America's gun culture in 10 charts) and one by The Telegraph (One mass shooting every day: Seven facts about gun violence in America). Together they illustrate the strengths and limitations of media coverage of gun control.

*Mass shooting: an incident in which an attacker kills three or more victims in an indiscriminate rampage. Before 2013, the threshold was four.

Continue reading “Data News of the Week | U.S. Gun Control Debate and the Media”

Data News of the Week | Trump Lies and His Job Promises

Donald Trump's administration has now been in office for more than a year. Trump has made many groundless claims and job promises, often offhandedly. Some news outlets keep a record of these claims and visualize how often he makes them. To critique the president's conduct on firmer ground, The New York Times compared how often Trump and former president Barack Obama told lies. We also dug deeper into one of the promises in his election platform, job creation and employment growth, drawing on a project from ProPublica. Below we discuss the visualizations, the use of language, and the effects of the two projects.

Continue reading “Data News of the Week | Trump Lies and His Job Promises”

[Repost] Reflections on the talk by Prof Ikhlaq Sidhu on Artificial Intelligence


Prof Ikhlaq Sidhu at the DataXHKBU workshop on 26 Jan (by Xinzhi Zhang)

[Note: This post was originally contributed as an entry to the Communicar Journal Blog on 28 Jan 2018. The author would like to repost it to the D&N Society.]

Reflections on the talk by Prof Ikhlaq Sidhu on Artificial Intelligence

“Will AI help the film directors to make better movies? – Yes! But will AI replace the film directors and become directors? – No!” – Prof Ikhlaq Sidhu.

Continue reading “[Repost] Reflections on the talk by Prof Ikhlaq Sidhu on Artificial Intelligence”