The two-day Data Journalism Boot Camp was held at HKBU on Oct 26 and 27. The event was sponsored by KAS, and the workshop sessions were led by two experienced trainers from DataLEADS. Another highlight was a roundtable discussion chaired by Prof. Ying Chen, where professionals shared the practices, challenges and solutions from their newsrooms.
The following notes were contributed by Bess, Sophie and Roy from the Data and News Society (DNN). We reorganised the crowd-sourced notes into the following sections: Tools, Principles, Practices, Cases, Data sources and Links.
- Table Capture (http://www.georgemike.com/tablecapture/): a Chrome extension that captures tables from web pages. Practise on this page: https://www.tbfacts.org/cases-of-tb/
- Tabula (http://tabula.technology/): desktop software with a web interface that extracts tables from PDFs. It only handles text-based PDFs; a scanned (image-based) PDF must first go through OCR. For practice, download the “Illicit Financial Flows from Developing Countries: 2004-2013” dataset from http://www.gfintegrity.org/
- Google Sheets has a built-in function, IMPORTHTML, that retrieves and parses tables from HTML pages. The imported table stays linked to the page, so when the table on the website changes, the table in your sheet changes accordingly. Try this (note the straight quotes): =IMPORTHTML("https://www.tbfacts.org/cases-of-tb/", "table", 1)
- Canva (https://www.canva.com/): a lightweight online tool for making charts, infographics and simple animations. The free version already offers numerous templates, icons, photos and chart types. Sanjit, the workshop trainer, uses Canva in his daily production and has created many eye-catching images for his social media accounts.
- Datawrapper (https://www.datawrapper.de/): intuitive online charting software. You can create an interactive chart in four steps: choose a source, choose a chart type, map the columns, and customise the chart. For practice, copy a dataset from http://www.gfintegrity.org/report/illicit-financial-flows-from-developing-countries-2004-2013/
Screenshot: Datawrapper website
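IMPORTHTML does the fetching and parsing for you inside Google Sheets. To illustrate roughly what its "table" query involves, here is a minimal Python sketch using only the standard library's html.parser. It is not how Google implements the function: the names TableParser and import_html_table are hypothetical, the sketch parses an HTML string rather than a URL, it mirrors only the 1-based table index, and it assumes well-formed markup (explicit closing tags, no nested tables, colspan/rowspan ignored).

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Hypothetical helper: collects every <table> as a list of rows of cell text.

    Assumes explicit </td>, </th> and </tr> tags and no nested tables.
    """

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; etc. are decoded
        self.tables = []    # one list of rows per <table>, in document order
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse runs of whitespace, as a spreadsheet cell would show the text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_html_table(html, index=1):
    """Return the index-th table (1-based, like IMPORTHTML) as a list of rows."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not 1 <= index <= len(parser.tables):
        raise IndexError(f"table {index} not found ({len(parser.tables)} tables)")
    return parser.tables[index - 1]


if __name__ == "__main__":
    # Placeholder markup with dummy values, not real data.
    sample = """
    <p>Intro text</p>
    <table>
      <tr><th>Region</th><th>Cases</th></tr>
      <tr><td>Region A</td><td>100</td></tr>
      <tr><td>Region B</td><td> 250 </td></tr>
    </table>
    """
    for row in import_html_table(sample, 1):
        print(row)
```

To try it on a live page, the HTML could first be downloaded with the standard library's urllib.request.urlopen(url).read().decode(); some sites may block scripted requests, which is one reason browser-based tools such as Table Capture remain handy.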
- In the past, the hard part was collecting information; today the challenge has shifted to making sense of data.
- Ying Chan emphasised the importance of data to journalism in contemporary newsrooms.
- The trainer shared what it takes to be a good journalist today:
  - Get story ideas:
    - Meet five new people every day
    - See stories all around you. The trainer described finding an important story idea in a classified ad in an Indian newspaper, which led to his story “Behind the Veil”.
    - Consult different data sources. For example, data on how much China invests in India probably can’t be found in China, but is easy to find in India.
    - Build your own database. (Editor’s note: build a database of databases, a.k.a. a “meta-database”, so that you know where to find the data you need.)
    - Keep an idea file: write down every possible story idea whenever one occurs to you.
  - Write clearly, make a headline, keep it simple, and remember that every word counts.
  - Use golden quotes: short, direct quotes change the pace of a story and add colour and character. They also help illustrate the data.
- Some tips for covering breaking news:
  - Monitor social media as the story breaks
  - Have a process and practise it often before breaking news happens. For example, whether or not to name a gunman or terrorist should already have a routine answer.
  - Be self-reliant (tech, food, Wi-Fi…)
  - Think about the DANGER! Fatal accidents can follow one after another.
  - Avoid the herd mentality
  - Look at the issue from multiple angles
The speakers at the roundtable chaired by Prof. Ying Chen each shared their own experience with data in real newsrooms. Here are some highlights.
- Daisy Lee from Citizen News (眾新聞) shared the projects “特首選舉2017” (2017 Chief Executive Election), “醫療資源分布” (Distribution of Medical Resources) and “堅離地圖” (Out-of-Touch Map). The key takeaway was the importance of leveraging external resources: data journalism in newsrooms has long been constrained by a lack of resources. Daisy came across the housing price map built by hobbyist developers and pulled together internal resources to turn it into a media-grade project.
- Patrick Boehler from The New York Times shared his experience of working with people from different backgrounds.
- Annie Zhang and Xu Xiaotong from Initium Media shared the “product mindset” widely adopted at Initium Media. In this young newsroom, the traditional role of an editor is now performed by a “product manager”. The product perspective is process-oriented rather than results-oriented: when managing the process, one needs to consider cost, timeliness, scale, objectives and communication channels as a whole. As good practice, the team encourages visual communication and small talk.
- John Siu from HK01 shared some survival tips for a 600-staff organisation. Large organisations usually centralise their IT and design teams for efficiency. That makes the coordination loop of content production longer than usual and requires journalists to teach themselves multiple skills. On data journalism projects, John shared some “news games”, a newer format of online journalism that delivers the message in an engaging way. Examples include the MTR subsidy calculator (HK01) and the “Housing Affordability Map” (Citizen News & Y714).
Screenshot: The Housing Price Affordability Map of Hong Kong
- Car park charges by the Link and the Housing Authority, FactWire News Agency
- ConnectedChina, Reuters
- China’s air pollution in 2014, SCMP
- Zhou Yongkang’s Power Network (“周永康的紅與黑”), Caixin
- How China’s Top Leaders Came Into Power, Bloomberg
- Tsukiji (築地), The Asahi Shimbun (朝日新聞)
Some common data sources to consider in reporting:
- Business licenses
- Campaign donations
- Government contracts
- Official annual reports
- Audit reports
- Parliament documents
- Police records
Tips on FOI (Freedom of Information) and RTI (Right to Information) requests:
- Do your research before filing an FOI request
- Figure out as precisely as possible what information you want (Editor’s note: and describe it as precisely as possible)
- Work out which federal agency or office is most likely to hold that information, and find out who its RTI officers are
- Make sure your questions are sensibly phrased and effective
- Try to use jargon or phraseology that the authority itself uses to refer to the activity or topic you are interested in
- Build a network with FOI activists and NGOs. (Editor’s note: https://accessinfo.hk/ is a community effort in Hong Kong to centralise all FOI requests and government department information.)
- Build contacts with FOI officials
- Exercise: file FOI requests every day or every week
- When many FOI requests are answered with datasets, you need to be able to manage the data
- Declaring yourself a journalist makes no difference; disclosing your identity is optional.
Contributor: Bess, Sophie, Roy
Editor: Pili Hu
Posted by: Bobo Wei