Month: November 2017

Data News of the Week | North Korea Tensions

“North Korea” and “Democratic People’s Republic of Korea (DPRK)” appear frequently in newspaper headlines. The country’s recent advances in missile technology and its nuclear tests threaten the world and create considerable geopolitical tension. This week, our editor would like to share some relevant data projects.

The “wholesale” packages

If you are too busy to study all the background information and catch up on the latest news, here are two must-read projects that will bring you up to date in 30 minutes.

☞ Immersive reporting from ESRI StoryMaps: a side-by-side comparison of the two Koreas from multiple angles [Link]

Continue reading “Data News of the Week | North Korea Tensions”

5 Signs to Indicate the Future of Hong Kong Media Is Bright

It’s almost the end of 2017, and perhaps it’s the right time to predict the future of Hong Kong media. Although many might think the industry is entering another “dark age”, things are not as bad as most people think. We observed some promising trends in the Hong Kong media landscape this year:

Continue reading “5 Signs to Indicate the Future of Hong Kong Media Is Bright”

Recap of Oct 2017 Data Journalism Bootcamp in HKBU

The two-day Data Journalism Boot Camp was successfully held at HKBU on Oct 26 and Oct 27. The event was sponsored by KAS, and the workshop sessions were led by two experienced trainers from DataLEADS. Another highlight of the event was a roundtable discussion chaired by Prof. Ying Chen, where professionals shared their practices, challenges and solutions in their newsrooms.

Data Bootcamp in Oct 2017

Continue reading “Recap of Oct 2017 Data Journalism Bootcamp in HKBU”

The Simplest Crawler with wget: A One-Line Command to Help Investigative Journalists

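Writing web scrapers has become an essential skill for data journalists. Online services such as ScrapingHub, Morph and ParseHub make it possible, to some extent, to scrape web pages without code, but in many cases you still need to write the scraping logic by hand. Writing a scraper has two parts: the first is crawling, the second is extracting. “Crawling” means starting from one web page, finding the links it contains, visiting them one by one, and repeating this process until you have collected the pages you need. This is similar to how people browse the web, a bit like following a vine to find the melon. “Extracting” is the process of pulling useful information out of web pages, converting “semi-structured” pages into “structured” data tables.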

This article introduces the simplest crawler, which needs only one command: wget -r
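To make the “crawl” and “extract” halves concrete, here is a minimal Python sketch using only the standard library. It is not how wget works internally and it is not code from the article. The three-page site in `FAKE_SITE`, the `example.test` URLs and the helper names `LinkParser` and `crawl` are all assumptions made up for illustration, so the example runs without any network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag and the text of the <title> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(start_url, fetch):
    """Breadth-first crawl: visit a page, queue its unseen links, repeat.

    `fetch` is any function mapping a URL to HTML text (or None if missing).
    """
    seen = {start_url}
    queue = deque([start_url])
    rows = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkParser()
        parser.feed(html)
        # "Extract": turn the semi-structured page into one structured row.
        rows.append({"url": url, "title": parser.title.strip()})
        # "Crawl": follow every link that has not been visited yet.
        for href in parser.links:
            absolute = urljoin(url, href)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return rows


# Hypothetical three-page site standing in for real HTTP requests.
FAKE_SITE = {
    "http://example.test/": '<title>Home</title><a href="/a.html">A</a><a href="/b.html">B</a>',
    "http://example.test/a.html": '<title>Page A</title><a href="/b.html">B</a>',
    "http://example.test/b.html": '<title>Page B</title><a href="/">Home</a>',
}

table = crawl("http://example.test/", FAKE_SITE.get)
for row in table:
    print(row)
```

In a real scraper, `fetch` would download the page over HTTP. The `seen` set is what stops the crawler from looping forever when pages link back to each other.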

Continue reading “The Simplest Crawler with wget: A One-Line Command to Help Investigative Journalists”

Using Tableau’s JOIN Feature to Filter Complete Data Segments

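Data journalism often involves handling large amounts of missing data. If the raw data is a two-dimensional table, that table will have many “holes” in it. We often want to filter out these holes and keep only complete rows and columns, so that we can carry out a complete analysis within a limited scope.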

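This article is a submission from our classmate Zoya. Its goal is to use a map to show the amount of municipal waste collected in each country. The raw data comes from the UN Municipal Waste Collection Dataset, whose year coverage is incomplete (missing data). To keep the comparison consistent, the project ultimately selected only the countries that have data for every year from 2002 to 2012 (11 years in total) before drawing the map. This tutorial shows two methods, both of which contain techniques worth borrowing. Method 1 combines the basic features of Excel, Open Refine and Tableau, and finally uses Tableau’s JOIN operation to filter out the missing data. Method 2 takes advantage of the particulars of this use case and completes the entire data flow inside Tableau, using the special calculated quantity “# of Records”.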


Figure: screenshot of the raw data
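The filtering rule itself, keeping only countries with a value in each of the 11 years, can also be sketched outside Tableau. The Python below is an illustration of that rule, not a reproduction of either method in the tutorial. The country names, the values and the helper `complete_countries` are invented for the example, and the real UN dataset may be laid out differently.

```python
from collections import defaultdict

YEARS = range(2002, 2013)  # 2002..2012 inclusive, 11 years

# Hypothetical long-format records: (country, year, value).
records = []
for year in YEARS:
    records.append(("Country A", year, 100.0 + year))  # complete series
    if year != 2005:
        records.append(("Country B", year, 50.0))      # 2005 is missing
records.append(("Country C", 2010, None))              # row exists, value is a hole


def complete_countries(records, years):
    """Return the countries that have a non-empty value for every year."""
    years_seen = defaultdict(set)
    for country, year, value in records:
        if value is not None and year in years:
            years_seen[country].add(year)
    required = set(years)
    return sorted(c for c, ys in years_seen.items() if ys == required)


keep = complete_countries(records, YEARS)
# Keeping only rows whose country is in `keep` plays the role of an inner join
# between the raw table and the list of complete countries.
filtered = [r for r in records if r[0] in keep]

print(keep)
print(len(filtered))
```

Counting the distinct years per country and comparing against the required set is the same idea as checking that a country has 11 records, with the added safeguard that empty values and out-of-range years are not counted.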

Continue reading “Using Tableau’s JOIN Feature to Filter Complete Data Segments”