All sorts of ideas about programming are running helter-skelter through my mind as I step into the spacious, well-polished Spectrum studio on the 11th floor of an office building in Sheung Wan this bright Saturday morning. Having concentrated solely on liberal arts courses since secondary school, I have always considered coding something far removed from my daily life.
But today, Chico Xu, Ivy Wang and I, as student reporters, are going to take a glance into this sophisticated business, one we once thought irrelevant to our lives and the lives of many others, but which in fact touches them, and to a large extent.
The event we are attending is the Global Pandas Documentation Sprint, held simultaneously in cities around the world on March 10, 2018, aiming to improve this Python library’s documentation with clearer explanations and better examples, and trying to leave the library, at the end of the day, “in a perfect state,” as its official website puts it.
Open Data Day is an annual celebration of open data all over the world. In 2018, more than 400 cities organised hackathons simultaneously on March 3. According to one Hong Kong organiser, Bastien Douglas, most local organisers of ODD worldwide are government affiliates, whereas in Hong Kong communities like OSHK and ODHK lead the organisation every year. One highlight of ODD-HK-18 was the talk by Jessica Lo, the system manager at OGCIO responsible for the open data portal data.gov.hk.
Summary: We used an API (Application Programming Interface) to extract data from the USGS database, analysing the last 50 years of records to estimate the frequency of earthquakes in Southeast Asia. With the help of Python, the extracted data was exported to a CSV file and categorised by parameters such as country, magnitude and year.
An application programming interface (API) is commonly used to extract data from a remote web server. In layman’s terms, an API lets one program retrieve data or information from another. Several websites, such as Facebook, the USGS, Twitter and Reddit, offer web-based APIs for getting information or data.
To retrieve data, we send requests to the host web server holding the data we want, tweaking parameters such as the URL to connect to the server. Different websites have different request formats, which can easily be looked up on the host’s website.
In our module, we extract the data on earthquakes that hit Southeast Asia in the last 50 years from the USGS web server using its API.
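For the curious, here is a minimal sketch of what this extraction step could look like. It uses the publicly documented USGS FDSN event service; the bounding-box coordinates for Southeast Asia, the magnitude cutoff and the output column names are illustrative assumptions, not the exact parameters we used.

```python
import csv
import requests

# USGS FDSN event web service, documented at
# https://earthquake.usgs.gov/fdsnws/event/1/
URL = "https://earthquake.usgs.gov/fdsnws/event/1/query"

# Rough bounding box for Southeast Asia (illustrative values).
params = {
    "format": "geojson",
    "starttime": "1968-01-01",
    "endtime": "2018-01-01",
    "minlatitude": -11,
    "maxlatitude": 25,
    "minlongitude": 92,
    "maxlongitude": 141,
    "minmagnitude": 6,  # keep the response under the service's per-query cap
}

response = requests.get(URL, params=params)
response.raise_for_status()
features = response.json()["features"]

# Flatten the GeoJSON features into rows and export to CSV.
with open("earthquakes.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["time_ms", "magnitude", "place"])
    for quake in features:
        props = quake["properties"]
        writer.writerow([props["time"], props["mag"], props["place"]])
```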
Earthquakes are among the most frequent natural disasters on planet Earth. A sudden release of energy from the Earth’s lithosphere generates seismic waves, which lead to abrupt shaking of the Earth’s surface. This natural disaster has caused the deaths of millions of people around the world.
The strength of an earthquake is measured on the Richter magnitude scale, or simply “magnitude”, a logarithmic scale that roughly ranges from 1 to 10.
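Because the scale is logarithmic, adjacent whole numbers differ far more than the 1-to-10 range suggests. A quick back-of-the-envelope check in Python, using the standard relationships for the scale rather than anything specific to our data:

```python
# Richter magnitude is log10 of ground-motion amplitude, so the
# amplitude ratio between two magnitudes is 10 ** (m2 - m1).
def amplitude_ratio(m1, m2):
    return 10 ** (m2 - m1)

# Radiated energy grows roughly as 10 ** (1.5 * dM).
def energy_ratio(m1, m2):
    return 10 ** (1.5 * (m2 - m1))

print(amplitude_ratio(5, 7))  # 100.0  -> a M7 shakes ~100x harder than a M5
print(energy_ratio(5, 7))     # ~1000  -> and releases ~1,000x more energy
```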
Southeast Asian countries sit in one of the regions of the world most prone to earthquakes. To find the trend in the region, we extracted 50 years of data from the USGS via its API and converted the numbers into a CSV file with Python, for a comprehensive understanding of the earthquake situation in Southeast Asia.
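Continuing from the extraction sketch above (and assuming its column names), the categorisation step could look like this in pandas:

```python
import pandas as pd

# Load the CSV produced by the extraction sketch above.
df = pd.read_csv("earthquakes.csv")

# Derive the year from the epoch-millisecond timestamp, and a rough
# country label from the "place" text, which USGS formats like
# "120 km SSW of Banda Aceh, Indonesia".
df["year"] = pd.to_datetime(df["time_ms"], unit="ms").dt.year
df["country"] = df["place"].str.split(",").str[-1].str.strip()

# Counts by country and by year reveal the regional trend.
print(df.groupby("country").size().sort_values(ascending=False))
print(df.groupby("year").size())
```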
It is time to break down the broad concept of “data journalism”. When talking about the combination of data and news, we usually refer to two processes, sometimes conducted in an integrated manner. One process is to discover news points from datasets: the datasets provide leads for further investigation. The final product does not necessarily reflect the use of data; it may look the same as a normal news story composed mainly of interviews and photos. This is called “data mining” in the science domain. The other process is to present news points using data. This is where all kinds of charts and interactive or immersive presentations come in. This is called “data visualisation” in the science domain.
Let’s focus on the “data mining” part in this article: discovering news, or more precisely a news lead, from datasets. Developing the full news story may take much more effort, combining traditional and modern methods. For easier discussion, we treat “news” in its general form: something the audience does not know before reading, i.e. something that “appears new”. It could be a status update on a current affair, or “new knowledge” to the readers (which may well be common knowledge to experts, a point we don’t want to waste time debating).
As advocated by the “Road to Jan”: the most profound theory takes the simplest form. As a first step, we try not to use programming, or even sophisticated spreadsheet skills. One can readily find some “news” with a bit of “nose for news”; being computer literate is good enough. In this article, we demo a few news points mined by our undergraduate students from the Hong Kong government data portal: https://data.gov.hk . It took around 20 minutes in the second class of a data journalism course. We start with a public dataset from the portal, check out the data tables and eyeball them for anything interesting. The process is so quick that we would like to give it a brand name: Lightning News. One can sharpen one’s news sense and data sense by doing this as a daily exercise.
Civic Exchange (思匯政策研究所, “CE” hereafter, a think tank) would like to collaborate with HKBU JOUR on a series of investigative reports on the issue of open space in Hong Kong. Basically, CE will share several datasets and provide guidance on how to harvest other public data and how to interpret it, while the student teams conduct the reporting on this theme. Data journalism and data-driven storytelling are potential formats.
The Computer Science department and the Data & News Society invited Shan He, a Guangzhou-based citizen scientist and project director of the Chinese NGO Greenovation Hub, to hold a workshop on 18 January 2018 on harvesting water-quality data with simple chemical test kits and DIY water-monitoring devices for environmental investigation. A dozen students from computer science and journalism backgrounds, along with some interested citizens, attended and worked in groups.
It was news to me that anyone can acquire almost any data from the Hong Kong government, for legitimate reasons, under the Code on Access to Information.
This code is a response to the notion of FOI (Freedom of Information), which calls for citizens’ free access to government information, so that transparency in government can be ensured and citizens’ rights protected.
According to Wikipedia, by 2006 nearly 70 countries had passed such legislation. Among these laws are the USA’s FOIA (Freedom of Information Act) and, of course, Hong Kong’s Code on Access to Information.
Despite the code being in place, a practical question remains: will government officers fulfil their duty and actually reply to every single data request? I decided to give it a try on Accessinfo.hk.
Accessinfo.hk is a website positioned as a platform for citizens to post information requests to the authorities and receive feedback. It was initiated by a group of open data activists, including Guy Freeman, currently a data scientist at HK01. The website publishes every question and answer for everyone to see and, at the same time, monitors the process. Before the Alaveteli system ( http://alaveteli.org/ ) was localised to Hong Kong, its sister site WhatDoTheyKnow ( https://www.whatdotheyknow.com/ ) had already seen wide application in the United Kingdom.
Data is the key to environmental investigation and monitoring. However, it is very hard for ordinary citizens to access. Take water quality, which is closely associated with our daily life, as an example: when a serious environmental disaster breaks out and the government discloses only limited information, the general public can hardly learn the truth in time. This motivates us to organise a workshop that enables you to make DIY monitoring devices with open technology.
So far, the Offshore Leaks database published by the ICIJ includes at least 200,000 offshore entities from the Panama Papers, and over 100,000 records from the ICIJ’s previous investigations.
The Offshore Leaks database contains the detailed postal addresses of all kinds of entities, offshore or otherwise, as well as officers and intermediaries. Based on these, we made coloured maps revealing the distribution of the postal addresses of the people and companies involved in the offshore industry.
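As an illustration of the aggregate behind such a map, here is a minimal sketch. It assumes the address records from the ICIJ download are available locally as a CSV with a country-code column; the file name and column names are assumptions, not the exact schema.

```python
import pandas as pd

# Assumed file and column names for the ICIJ address records;
# adjust them to match the actual download.
addresses = pd.read_csv("nodes-addresses.csv", low_memory=False)

# One address record may carry several country codes separated by ";",
# so split and flatten before counting.
codes = (
    addresses["country_codes"]
    .dropna()
    .str.split(";")
    .explode()
)

# Addresses per country: the figure a coloured (choropleth) map encodes.
counts = codes.value_counts()
print(counts.head(10))
```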