Author: Pili Hu


Workshop: Forensics and Verification with Storyful on Social Media

This is a three-week training workshop conducted by journalists from Storyful, a social media information gathering and content production platform used by many journalists and marketers. The training takes place on the afternoons of April 11, 18 and 25 at CVA703. Digital forensics and verification are also relevant to a data journalist’s day-to-day job, especially as much data and information nowadays comes from social media. We will have the second class of the workshop tomorrow, Wednesday April 18, at 4:30pm in the CVA 703 lab. Please find below the announcement, edited from Robin Ewing’s message.

The trainers this week are Storyful journalists Rachel Blundy and Layla Mashkoor.

  • Rachel was a journalist with the SCMP before coming to Storyful.
  • Layla worked in communications and marketing.

They will focus on using forensic verification tools to investigate and verify social photos and videos, and on how you can debunk fake or misleading content. This will be a hands-on workshop and great training for those of you who want to work in media or communication.

Prerequisite: The trainers have asked that everyone have a Twitter account and a TweetDeck account. Please open free accounts before class if you don’t already have them.


Repost: A Fresh Presentation, Legislative Council Data at a Glance | 《獨家新聞解碼2》

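The Initium Media (端傳媒) data journalism piece 《20 萬條投票紀錄帶你解碼香港立法會》 ("Decoding Hong Kong's Legislative Council with 200,000 Voting Records") collected the voting records of the current Legislative Council members from 2012 to 2015 and analysed their voting stances and patterns. Combining text, charts, video and other techniques, it presented a massive amount of information to the public in a novel, concise and lively way. It was a fairly successful attempt at data journalism and is an important reference for the future development of the field. The report received an honourable mention for Excellence in Information Graphics in the 2016 Society of Publishers in Asia (SOPA) awards.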

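At the invitation of the book 《獨家新聞解碼2》, Pili Hu (胡辟礫) and 巢恬逸 wrote a retrospective on the production process. The article shows the intermediate steps of data analysis and visualisation, the decisions about which key charts to keep or drop, and the repeatedly emphasised use of "user research" in news *products*. Data and News Society has been authorised to repost the final version as edited for the book. Please download the PDF: 《獨家新聞解碼2》, pp. 195-203.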


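The original piece was produced in July 2015, when the previous Legislative Council term had not yet ended, so the data covered only 2012-2015. After that term ended in 2016, the authors applied the same method to all votes from 2012 to 2016 and used interactive charts to present the changes in voting tendencies. Interested readers can read further analyses of the previous Legislative Council's voting behaviour and motion content.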

Some Scraping Targets and Ideas

This is a casual post to dump some target sites for scraping, or just project ideas. These ideas were first shared in the COMM7780/JOUR7280 WeChat group. Although we have explored only some of these possibilities this semester, the list is good for future reference. We can bounce ideas around in the comments below and enrich this list.

Top Targets: Movie, Shopping and News

Let’s first look at what the students care about, based on their HW2 submissions:

Scraping Targets from HW2 submission – COMM7780/JOUR7280


Key Takes from Jessica Lo’s Sharing on ODD-HK 2018 about Government Data Portal

Open Data Day is an annual celebration of open data all over the world. In 2018, more than 400 cities organised hackathons simultaneously on Mar 3. According to one Hong Kong organiser, Bastien Douglas, most local organisers of ODD are government affiliates. In Hong Kong, communities like OSHK and ODHK lead the organisation every year. One highlight of ODD-HK-18 was the talk from Jessica Lo, the system manager at OGCIO responsible for the open data portal.



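[Repost note] Ever since he took office, Trump has been a focus of attention for the media and scholars. This president who "governs by Twitter" is not only highly newsworthy but also comes with rich datasets, which makes him a topic in political reporting that is very well suited to data-driven stories. This article comes from two HKU students. The first draft took shape at the Open Data Day Hong Kong 2017 hackathon, and the piece was edited and published by Initium Lab. There are two occasions for reposting it.

The first is that Open Data Day Hong Kong 2018 will be held at HKU on March 3, when open data activists, civic tech enthusiasts, journalists, scholars and citizens from across Hong Kong will gather, launch projects and build prototypes within a single day. Some project teams will continue development after the event and produce excellent data applications or data stories. This article is a classic case: it is highly representative in its topic selection, its data collection/analysis/visualisation, and its project execution. A hackathon lets participants from different backgrounds brainstorm and collaborate under high pressure, so they can find interesting topics and build prototypes very efficiently. Turning a prototype into a final piece, however, often takes several times as long as the hackathon itself and requires professional skills. We hope this article shows communication students who are working hard to learn Python, R and JavaScript one possibility: those who walk alone go fastest, those who walk together go farthest.

The second occasion is that NBC recently released a dataset and a report on Russian fake accounts on Twitter. Trump's rise felt like a slap in the face to many elites. They panicked and kept blaming the media and social networks. Did Russia actually meddle? How large was the effect? Why did the polls fail to detect so many Trump supporters? Was it random error or systematic error? These questions will keep resurfacing for a long time, and people are keen to follow every clue. By keeping an eye on Trump and on Twitter, one will never run out of data or stories. This article is a very typical example of text analysis and visualisation, done in R, and there is much to learn from it.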


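Newly elected US President Donald Trump stands out among politicians for his extreme remarks. In 2016, Initium Lab (端Lab) published an analysis of the different rhetorical styles of Trump and his campaign rival Hillary Clinton in media interviews. It found that Trump mostly used simple sentence structures and was adept at using second-person narration to resonate with his audience.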

Continue reading “Repost: Decoding the Tweets of Trump and His Daughter (ODD-HK-17 project)”

Lightning News from Public Data Sets

It is time to break down the broad concept of “data journalism”. When talking about the combination of data and news, we usually refer to two processes, sometimes conducted in an integrated manner. One process is to discover news points from datasets. The datasets can provide a lead for further investigation. The final product does not necessarily reflect the usage of data. It may look the same as normal news products mainly composed of interviews and photos. This is called “data mining” in the science domain. The other process is to present news points using data. This is where all kinds of charts and interactive/immersive presentations come in. This is called “data visualisation” in the science domain.

Let’s focus on the “data mining” part in this article, that is, discovering news from datasets, or more precisely, discovering a news lead from datasets. Developing the lead into a full news story may take much more effort, with a combination of traditional and modern methods. For easier discussion, we treat “news” in the general form: something the audience does not know before reading, a.k.a. something that “appears new”. It could be a status update on a current affair, or it could be “new knowledge” to the readers (which may well be “common knowledge” to experts, a point we don’t want to waste time debating).

As advocated by the “Road to Jan”: the most profound theory takes the simplest form. As a first step, we try not to use programming, or even sophisticated spreadsheet skills. With a bit of a “nose for news”, one can readily find some “news”; being computer literate is good enough. In this article, we will demo a few news points mined by our undergraduate students from the Hong Kong government data portal. It took around 20 minutes in the second class of a data journalism course. We start with a public dataset from the portal, check out the data tables and eyeball whether there is anything interesting. The process is so quick that we would like to give it a brand name: Lightning News. One can sharpen one's news sense and data sense by doing this as a daily exercise.


Data News of the Week | Visualising the Blockchain

The Rise and Rise of ICO in late 2017

If you still don’t know what “blockchain” or “bitcoin” is, the recent work from Max Galka will convince you that it is high time to do some self-study, or you will miss the birth of “another Internet”. The idea of an ICO, Initial Coin/Chain Offering, is analogous to an IPO. With the inception of “smart contract” capability, fundraising, a process of exchanging currency for certificates, can be done in a distributed manner. The “currency” in the chain world can be Ethereum, NEO, Bitcoin, … The “certificate” in the chain world is called a “token”, so the ICO process is also referred to as a “token sale”. Thanks to this convenience, ICOs have grown rapidly, with huge amounts of capital pouring into the field. Just check out this interactive/animated token sale history.

Screenshot: The ICO Market Cap to date (Nov 2017)
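
The currency-for-token exchange described above can be illustrated with a toy ledger. This is only a minimal sketch in plain Python under stated assumptions, not a real smart contract and not how any particular chain implements a token sale: the class name `ToySale`, the fixed exchange rate, the cap and the contributor names are all hypothetical, and no blockchain library is involved.

```python
from dataclasses import dataclass, field


@dataclass
class ToySale:
    """Toy model of a token sale: contributors pay currency, receive tokens."""
    rate: int      # tokens issued per unit of currency (assumed fixed)
    cap: int       # maximum number of tokens the sale may issue (assumed)
    raised: int = 0                               # total currency collected
    balances: dict = field(default_factory=dict)  # contributor -> tokens held

    def issued(self) -> int:
        """Total tokens issued so far."""
        return sum(self.balances.values())

    def contribute(self, who: str, amount: int) -> int:
        """Exchange `amount` of currency for tokens; return tokens issued."""
        if amount <= 0:
            raise ValueError("contribution must be positive")
        tokens = amount * self.rate
        if self.issued() + tokens > self.cap:
            raise ValueError("sale cap exceeded")
        self.raised += amount
        self.balances[who] = self.balances.get(who, 0) + tokens
        return tokens


# Hypothetical sale: 100 tokens per unit of currency, capped at 1,000 tokens.
sale = ToySale(rate=100, cap=1_000)
sale.contribute("alice", 3)
sale.contribute("bob", 5)
print(sale.raised, sale.balances)
```

In a real ICO the same bookkeeping would be enforced by contract code running on the chain rather than by a single program, which is what makes the fundraising "distributed".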


Workshop | Harvest water quality data for environmental investigation — DIY monitor with Arduino

Data is the key to environmental investigation and monitoring. However, it is very hard for ordinary citizens to get access to it. Take water quality as an example, which is closely associated with our daily life. When serious environmental disasters break out and the government discloses only limited information, the general public can hardly learn the truth in time. This motivates us to organise this workshop, which enables you to make DIY monitoring devices with open technology.


Hong Kong Data Journalism Bootcamp 2018

DNNers, Happy new year!

We are pleased to announce a three-day data journalism bootcamp at the end of January. This is an intensive training to get you on board this fascinating battleship in the new media ocean. You will spend a fruitful weekend with 60 students from all Hong Kong higher education institutions. The event adopts a “startup weekend” format and features hands-on experience. Friday evening will see an overview of data journalism and team formation. Teams can work at any time from Friday evening through Sunday afternoon to finish a data journalism project. Saturday consists of three structured workshops: data collection/preprocessing, descriptive statistics and data visualisation. On Sunday morning, industry practitioners and community contributors will share tips and pointers to further broaden participants' horizons. Most training sessions are optional, and attendees can pick up their preferred skills as needed.

Event overview:

  • Date/Time: Jan 26 (Fri) evening to Jan 28 (Sun) afternoon
  • Venue:
    • Day 1 (Fri): Cheng Yu Tung Building (100m from MTR University Station)
    • Day 2 (Sat)/ Day 3 (Sun): Learning Garden, G/F, University Library, CUHK
  • Audience: Students in Hong Kong higher education institutions
  • Cost: FREE (with HK$ 200 deposit)



From graphic designer to data visualisation specialist – A sharing from Will Su

Photo: Will Su on KANTAR Information Is Beautiful 2017

We have invited Will SU Jiahao, a winner of the Information Is Beautiful Awards 2017, to share his experience in data visualisation. As someone who entered the data visualisation industry with zero knowledge of either programming or statistics, he will talk about how he transitioned from being a traditional graphic designer to a data visualisation specialist within the span of one year. The process involved picking up programming skills and becoming comfortable with both front-end web development and back-end dev ops. He will also touch on the exciting process of visualising data, as well as some of the common questions and obstacles newcomers may face, how to overcome them, and how to progressively get acquainted with the workflow of a web-based data visualisation storytelling piece.

Event information:

  • Date: Jan 12 (Fri), 2018
  • Time: 11:30am – 12:10pm
  • Venue: Room 1024, Communications and Visual Arts Building, HKBU
