Category: Tool

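Using Multiple Databases for Corporate Background Checks: A Method Breakdown of a Citizen Investigation

Original article: 我猜你们一定很想了解一下红黄蓝 (“I bet you all want to know more about RYB”)

As the child abuse scandal at Beijing's RYB (紅黃藍) kindergarten continued to escalate, a trending topic even appeared on Weibo's hot search list: “three colours are not allowed on the hot search list.” Influential accounts (“Big Vs”) on China's two major social media platforms, Weibo and WeChat Official Accounts, kept publishing articles on the incident from different angles. This time we look at 差評君, a tech-focused self-media outlet known for roasting technology companies and their products. It promptly published an investigation into the business background of the RYB kindergartens, and that WeChat Official Account article drew more than 100,000 reads.

Drawing on publicly available online information and relying mainly on skilled use of various search tools, the article is a successful example of data-driven investigative reporting. Here we “reverse engineer” it and walk through the databases and investigative techniques it used.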

Learn Spreadsheet to Mine Data and Jumpstart Your Data Journalism Career – A Sharing by Aimee Edmondson

Aimee Edmondson is now an Associate Professor at the Scripps School of Journalism, Ohio University. HKBU students were very lucky to have this knowledgeable and passionate speaker talk about data journalism this afternoon. Her 12 years in reporting, combined with the statistics and technology skills she acquired later, make a fine combination for a data journalist. In a world where people are fascinated by new technology and numerous boot camps are created by non-journalists, Aimee can be a role model for those “traditional journalists” who are moving in this direction.

Why does data matter? In Aimee’s words, you want to be a reporter, not a repeater. Data helps one verify what the source is saying and find out what is really happening. Pragmatically, more and more job descriptions require data analytics skills from investigative reporters. Beyond journalism, the skills developed through data journalism transfer well to corporate communication, public relations and advertising.

Picture: Job boards on IRE, from the slides

To start, one only needs to work on “small data”, with a spreadsheet.

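Network Packet Analysis: Retrieving the Raw Fusion Table Data Behind a Google Map

Maps drawn with Google Maps are common online. If you want to run a secondary analysis on the points of interest (POI) in such a map, you need the structured data behind it. If the map was built with Google Fusion Table, you can capture the network traffic to find the Fusion Table ID and then assemble the address of the original table. This post was contributed by student Lam Man Kit and is for technical exchange only. Data journalists who use this technique should pay attention to the copyright of the raw data, and local researchers should likewise observe fair use principles. This post uses FactWire's data report 「分析182個領展停車場月租收費 9成2貴過房委會同區 最大差距達1.18倍」 (an analysis of monthly parking fees at 182 Link REIT car parks, 92% of which were more expensive than Housing Authority car parks in the same district, with the largest gap reaching 1.18 times) as an example.

Figure: Finding the Fusion Table ID through network packet capture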
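
The "find the ID, then assemble the address" step can be sketched in Python. This is only an illustration under stated assumptions, not the method from the full post: the captured request URL, its `q` and `docid` parameters, and the `DataSource?docid=` address pattern are all assumptions that need checking against what the packet capture actually shows.

```python
import re
from urllib.parse import urlparse, parse_qs

# Assumed address pattern for a Fusion Table data page; verify before relying on it.
DATASOURCE_TEMPLATE = "https://www.google.com/fusiontables/DataSource?docid={table_id}"


def extract_table_id(captured_url):
    """Return the table ID found in a captured request URL, or None."""
    query = parse_qs(urlparse(captured_url).query)
    # Simplest case (assumed): the ID is passed directly as a "docid" parameter.
    if "docid" in query:
        return query["docid"][0]
    # Otherwise (assumed) look for an SQL-like "... from <ID>" clause in "q".
    for sql in query.get("q", []):
        match = re.search(r"\bfrom\s+([A-Za-z0-9_-]+)", sql, flags=re.IGNORECASE)
        if match:
            return match.group(1)
    return None


def build_datasource_url(table_id):
    """Assemble the (assumed) address of the original table from its ID."""
    return DATASOURCE_TEMPLATE.format(table_id=table_id)


# Hypothetical captured request: host, path and ID are made up for illustration.
captured = "https://example.com/maplayer?q=select+col2+from+1AbC_dEf-123&viz=MAP"
table_id = extract_table_id(captured)
print(table_id)
print(build_datasource_url(table_id))
```

In practice the request URL would be copied from the Network tab of the browser's developer tools rather than typed by hand.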

Embedding interactive rich media on WordPress

Source: Wiki Commons

There are a lot of “one-click” tools available online that help you create good visualisations and export them as an iframe for embedding into your site. Good use of these tools can present your content better to readers. Note that the free version of the WordPress hosted service does not allow embedding iframes, so users on that plan can only rely on shortcodes. For example, a shortcode can be used to embed interactive charts generated from Google Sheets. More shortcodes are available on the free version.

Data and News Society operates on a paid plan, so we installed the iframe plugin. This makes it possible to embed a wide range of third-party visualisations in your project. This tutorial is contributed by Jade Li and demonstrates how to embed interactive content from several common tools. The general workflow is to export the third-party project as an iframe, find the URL in the src="" attribute, and use [iframe src=""] to embed it into WordPress.
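
The workflow above (take the exported iframe code, pull out the src URL, wrap it in the shortcode) can be sketched in Python. This is a minimal illustration: the helper names and the example embed code are hypothetical, and only the `[iframe src=""]` form comes from the text.

```python
from html.parser import HTMLParser


class IframeSrcParser(HTMLParser):
    """Remember the src attribute of the first <iframe> tag encountered."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        if tag == "iframe" and self.src is None:
            self.src = dict(attrs).get("src")


def iframe_to_shortcode(embed_html):
    """Turn an exported <iframe ...> snippet into an [iframe src="..."] shortcode."""
    parser = IframeSrcParser()
    parser.feed(embed_html)
    if not parser.src:
        raise ValueError("no <iframe src=...> found in the embed code")
    return '[iframe src="{}"]'.format(parser.src)


# Hypothetical embed code, as a third-party tool might export it.
embed_code = (
    '<iframe width="600" height="400" '
    'src="https://example.com/chart/123" frameborder="0"></iframe>'
)
print(iframe_to_shortcode(embed_code))
# prints: [iframe src="https://example.com/chart/123"]
```

Other attributes such as width and height are dropped in this sketch; whether the plugin accepts them in the shortcode should be checked in its own documentation.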

Recap of Oct 2017 Data Journalism Bootcamp in HKBU

The 2-day Data Journalism Boot Camp was held at HKBU on Oct 26 and Oct 27. The event was sponsored by KAS, and the workshop sessions were led by two experienced trainers from DataLEADS. Another highlight was a roundtable discussion chaired by Prof. Ying Chen, where professionals shared their newsroom practices, challenges and solutions.

Data Bootcamp in Oct 2017

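The Simplest Crawler with wget: One Command to Help Investigative Reporters

Writing web crawlers has become an essential skill for data journalists. Online services such as ScrapingHub, Morph and ParseHub allow code-free web scraping to some extent, but very often you still need to write the crawler logic by hand. Writing a crawler has two parts: crawling and extracting. “Crawling” means starting from one web page, finding the links it contains, visiting them one by one, and repeating the process until you have collected the pages you need. This resembles how people browse the web, a bit like following the vine to find the melon. “Extracting” is the process of pulling useful information out of web pages, turning “semi-structured” pages into “structured” data tables.

This post introduces the simplest crawler, which takes just one command: wget -r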
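
To make the “crawling” half concrete, here is a minimal Python sketch of the loop described above: start from one page, collect its links, visit each one, and repeat. It is an illustration, not how wget is implemented. The `fetch` argument and the in-memory `FAKE_SITE` are assumptions introduced so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl that stays on start_url's host. Returns {url: html}."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)  # fetch returns the page source, or None on failure
        if html is None:
            continue
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href).split("#")[0]  # resolve and drop fragments
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# Hypothetical three-page site held in memory instead of being downloaded.
FAKE_SITE = {
    "http://example.com/": '<a href="/a.html">A</a> <a href="b.html">B</a>',
    "http://example.com/a.html": '<a href="/">home</a> <a href="http://other.example/x">ext</a>',
    "http://example.com/b.html": '<a href="/a.html#top">A again</a>',
}
pages = crawl("http://example.com/", FAKE_SITE.get)
print(sorted(pages))
```

For real pages, `fetch` could be replaced by a function that downloads the URL. The `seen` set is what stops the crawler from looping forever between pages that link to each other.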

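Using Tableau's JOIN Feature to Filter for Complete Data Segments

Data journalism often involves large amounts of missing data. If the raw data is a two-dimensional table, that table has many “holes”. We often want to filter these out and keep only complete rows and columns, so that a full analysis can be carried out within a defined scope.

This post was contributed by student Zoya. The goal is to map the amount of municipal waste collected in each country. The raw data comes from the UN Municipal Waste Collection Dataset, whose year coverage is incomplete (missing data). To keep the comparison consistent, the project selected only the countries that have data for every year from 2002 to 2012 (11 years in total) before drawing the map. This tutorial shows two methods, each with techniques worth borrowing. Method 1 combines basic features of Excel, Open Refine and Tableau, and finally uses Tableau's JOIN operation to filter out the missing data. Method 2 exploits the specifics of this use case and completes the whole data flow inside Tableau, using the special calculated quantity “# of Records”.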

Figure: Screenshot of the raw data
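
The filtering rule itself (keep only countries with a value for every year from 2002 to 2012) can be sketched in plain Python. This is not the Tableau workflow from the tutorial. It is similar in spirit to counting records per country, and the row layout and sample figures are made up for illustration.

```python
from collections import defaultdict


def countries_with_full_coverage(rows, first_year=2002, last_year=2012):
    """Return the countries that have a non-missing value for every year in range.

    rows is an iterable of (country, year, value); value is None when missing.
    """
    required = set(range(first_year, last_year + 1))
    years_seen = defaultdict(set)
    for country, year, value in rows:
        if value is not None and year in required:
            years_seen[country].add(year)
    # A country qualifies only if its set of covered years equals the full range.
    return sorted(c for c, years in years_seen.items() if years == required)


# Made-up sample data, not taken from the UN dataset.
rows = [("Country A", y, 100.0 + y) for y in range(2002, 2013)]
# Country B has no row at all for 2007.
rows += [("Country B", y, 50.0) for y in range(2002, 2013) if y != 2007]
# Country C has a row for 2010, but the value is missing.
rows += [("Country C", y, None if y == 2010 else 75.0) for y in range(2000, 2013)]

print(countries_with_full_coverage(rows))
# prints: ['Country A']
```

Both kinds of hole are handled: a year with no row at all (Country B) and a year whose row has an empty value (Country C).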

Map Visualisation for Panama Papers and Offshore Leaks in R

(This is a repost from initiumlab.com by Charlie Chen and Chao Tianyi. Click the link to read the original: Map Visualisation for Panama Papers and Offshore Leaks in R)

On May 9, 2016, the International Consortium of Investigative Journalists (“ICIJ” for short), a global network of journalists who collaborate on in-depth investigative stories, released the long-awaited offshore entities database behind the Panama Papers investigation.

So far, the offshore leaks database published by ICIJ includes at least 200,000 offshore entities from the Panama Papers and over 100,000 records from ICIJ’s previous investigations.

The offshore leaks database contains detailed postal addresses for all kinds of entities (offshore and non-offshore), officers and intermediaries. Based on these, we made colored maps showing the distribution of postal addresses of the people and companies involved in the offshore industry.

Small Tools, Big Impact: Easy Screenshots and Cropping

(This is a repost from initiumlab.com, click the link to read the original: 小工具有大作用-輕鬆截屏剪裁)

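Taking screenshots is a feature familiar to computer users everywhere. But using something often does not mean mastering it. We have compiled the advanced screenshot techniques we use day to day, which may open the door to a new world for you.

Skill 1: Full-page screenshots

Everyone knows how to take a screenshot, but what if you want to capture an entire web page that is larger than the screen? For example, the in-depth reports from Initium Media (端傳媒) are all excellent, but most run to several thousand characters. Can you quickly capture one as a screenshot to save it?

We can do this with Firefox.

Step 1: Open Firefox, open the target web page, and adjust the window to the proportions you want.

Step 2: Go to Tools -> Web Developer -> Toggle Tools

Step 3: In the panel that pops up, click the small gear icon, select “Take a fullpage screenshot”, then click the “camera” button.

Step 4: From then on, just open the console and click the camera. The file is downloaded to the Downloads folder.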

Time Saving One-Liners for Journalists

(This is a repost from initiumlab.com, click the link to read the original: Time Saving One-Liners for Journalists)

Journalists who do not code will, after reading this article, have mastered 15 one-line commands that can solve complex problems in seconds.

Initium Lab has collected various command-line tricks as we hack journalism with technology. Here is our editor’s choice so far, covering image processing, video processing, PDF manipulation, social network hacking and other useful one-liners. Just open your Terminal and follow the steps.
