• Home
  • Event
  • Article
    • Gallery
    • Opinion
    • Resources
  • About
    • Founding Members
    • Steering Committee
    • Executive Officers
  • Newsletter

The Data & News Society

~ news/numbers; stats/stories

The Data & News Society

Tag Archives: data collection

Some Scraping Targets and Ideas

10 Saturday Mar 2018

Posted by Pili Hu in Resources

≈ Leave a comment

Tags

COMM7780/JOUR7280, data collection, Scraping

This is a casual post to dump some target sites for scraping or just project ideas. Those messages were first sent through COMM7780/JOUR7280 WeChat group. Although we have only explored part of those possibilities this semester, the list is good for future reference. We can bounce off ideas in the comment below and enrich this list.

Top Targets: Movie, Shopping and News

Let’s first have a look at what the students care about from HW2 submission:

Scraping Targets from HW2 submission – COMM7780/JOUR7280

Continue reading →

Workshop recap: How Does HKBU Library Preserve Vintage Documents Using OCR?

04 Sunday Mar 2018

Posted by Erin Chan in Event, Tutorial

≈ Leave a comment

Tags

data collection, Digitalization, library, OCR, Scan

Technology has changed our way of researching and our reading habit after the Internet became the popular platform for the release of news and information. The documents and publications from the non-information era are still invaluable for us especially when it comes to referencing and history learning. Yet, these resources are black and white and read all over, which does not fit in today’s mode of information processing. To digitalise these old documents, four students from Baptist University (BU) learned about the technique and usage of software in Optical Character Recognition (OCR) workshop.

IMG_9357

Using OCR machine to preserve information on old documents.

Continue reading →

Data News of the Week | What can we, the 20-year-old, do to change the world?

23 Friday Feb 2018

Posted by jessiepyt in Resources, Tutorial

≈ 1 Comment

Tags

data, data analysis, data cases, data collection, Data Journalism, Datamining, DNW, Heatmap, Strava

Nathan Ruser, a 20-year-old Australian National University student who is majoring in international security with a keen interest in cartography, discovered a fitness app had revealed the locations of secret military sites in Syria and elsewhere. He posted on Twitter about this, did not expect much response.

But the news ricocheted across the internet. Security experts said the Strave app’s “heat map” could be used by hostile entities glean valuable intelligence. The Pentagon said it was reviewing the situation.

How he found the news?

 “Whoever thought that operational security could be wrecked by a Fitbit?” Mr. Ruser, said in an interview with New York Times from Thailand, where he is spending part of the Australian summer break.

When he looked over Syria on Strava’s map — which is based on location data from millions of users, including military personnel, who share their exercise activity — the area “lit up with those U.S. bases,” he said.

Before publicly sharing his findings over the weekend, he discussed them in a private chat group on Twitter, made up of people interested in intelligence and security issues. “I know about two-thirds of what I know about the world from the group chats,” he said.

Continue reading →

My First Application of FOI in Hong Kong

13 Saturday Jan 2018

Posted by graceli618 in Resources, Tutorial

≈ 1 Comment

Tags

data collection, GIJC17, open data, open government

It was new for me when I heard anyone can acquire almost any data from HK government for legitimate reasons under that Code on Access to Information.

This code is a response to the notion of “FOI” (For Our Information; Freedom Of Information), which calls for citizens’ free access to government information so that the transparency of government management can be ensured and citizen rights can be protected.

According to Wiki, In 2006, nearly 70 countries went through relative legislation. Among these laws are USA’s FOIA (Freedom of Information Act) and, of course, Hong Kong’s Code on Access to information.

Despite the code in place, a practical question remains. Will those government officers fulfill their duty and do give reply to every single data request? So I decided to give a try on Accessinfo.hk.

Accessinfo.hk is a website positioned as a platform for citizens to post their information requests to authorities and receive feedback. It was initiated by a group of Open Data activist, including Guy Freeman, who is currently data scientist in HK01. The website publishes every question and answer to everyone, and, at the same time, monitors the process. Before localizing the Alaveteli system ( http://alaveteli.org/ ) to Hong Kong, its sister site WhatDoTheyKnow ( https://www.whatdotheyknow.com/ ) had already seen wide application in the United Kingdom.

accessinfo

Screenshot: accessinfo.hk

Continue reading →

網絡數據包分析:從 Google Maps 獲取 Fusion Table 原始數據

26 Sunday Nov 2017

Posted by Pili Hu in Tool

≈ Leave a comment

Tags

data collection, Network Analysis, Scraping

網上經常見到使用 Google Maps 繪製的地圖,如果希望對地圖中的興趣點(Point of Interest,POI)進行二次分析,就需要得到繪製地圖背後的結構化數據。如果是使用 Google Fusion Table 繪製的地圖,可以通過網絡抓包找到 Fusion Table 的ID,進而拼接出原始地址。本文來自同學 Lam Man Kit 的投稿,僅做技術交流。數據記者在使用時,需要注意原始數據的版權。而本地的研究者也需要遵守公平使用原則。本文以 FactWire 的數據報道 「分析182個領展停車場月租收費 9成2貴過房委會同區 最大差距達1.18倍」 爲例。

圖:通過網絡抓包分析 Fusion Table 的 ID

Continue reading →

wget最簡爬蟲:一行命令助攻調查記者

06 Monday Nov 2017

Posted by Bobo Wei in Resources, Tool

≈ Leave a comment

Tags

crawler, 爬蟲, data collection, scraper, wget

書寫爬蟲已經成爲數據記者的必備技能。雖然有諸如ScrapingHub、Morph、ParseHub等在線服務,可以一定程度上實現無代碼抓取網頁,但很多時候,還是需要手動編寫爬蟲邏輯。爬蟲書寫分爲兩個部分,第一個是爬,第二個是取。「爬」即是從一個網頁出發,找到它所包含的鏈接,逐一訪問,不斷重複這個過程,最終收穫到需要的頁面。這個過程和人們瀏覽網頁是類似的,有種「順藤摸瓜」的意思。「取」則是從網頁中提取有效信息的過程,將「半結構化」的網頁,轉換爲「結構化」的數據表格。

本文介紹最簡單的爬蟲,只需要一行命令: wget -r

Continue reading →

Top Posts & Pages

  • Create Simple Filled Map (HK) in Tableau
  • Hong Kong’s grave marine plastic pollution long left untackled, Greenpeace statistics show
  • Financial Times Chinese Network editor-in-chief joins colloquium
  • Call for Entries! The 2nd University Data Journalism Competition
  • New towns fail to be self-contained as planned, government data shows

Recent Posts

  • A dossier of data journalism teaching strategies: Words from journalism educators worldwide
  • “中国数据可视化大赛”创作者专访:数据“几人行”
  • “Whoah, wait a minute, every reporter needs to be a data reporter”: Conversations with two generations of data journalists at the Los Angeles Times
  • Aaron Mendelson: Would numbers work with radio?
  • 首届“中国数据可视化大赛”启动-数据中的宏观和微观世界
  • Job Opportunity: Market Information Specialist at Unum Networks

Recent Comments

A quick video I made… on New towns fail to be self-cont…
Erin Chan on Create Simple Filled Map (HK)…
National Congress: s… on “Big Data” Tells Y…
Pili Hu on Data News of the Week | Gender…
Pili Hu on Key Takes from Jessica Lo…

Archives

  • August 2019
  • June 2019
  • April 2019
  • March 2019
  • February 2019
  • December 2018
  • November 2018
  • October 2018
  • September 2018
  • June 2018
  • May 2018
  • April 2018
  • March 2018
  • February 2018
  • January 2018
  • December 2017
  • November 2017
  • October 2017
  • September 2017
  • August 2017
  • May 2017
  • April 2017
  • March 2017
  • February 2017
  • January 2017
  • December 2016
  • November 2016
  • October 2016
  • September 2016
  • August 2016
  • July 2016
  • June 2016
  • May 2016
  • April 2016
  • March 2016
  • February 2016
  • January 2016
  • December 2015
  • November 2015

Meta

  • Register
  • Log in
  • Entries feed
  • Comments feed
  • WordPress.com

Recent Posts

  • A dossier of data journalism teaching strategies: Words from journalism educators worldwide
  • “中国数据可视化大赛”创作者专访:数据“几人行”
  • “Whoah, wait a minute, every reporter needs to be a data reporter”: Conversations with two generations of data journalists at the Los Angeles Times
  • Aaron Mendelson: Would numbers work with radio?
  • 首届“中国数据可视化大赛”启动-数据中的宏观和微观世界

Recent Comments

A quick video I made… on New towns fail to be self-cont…
Erin Chan on Create Simple Filled Map (HK)…
National Congress: s… on “Big Data” Tells Y…
Pili Hu on Data News of the Week | Gender…
Pili Hu on Key Takes from Jessica Lo…

Archives

  • August 2019
  • June 2019
  • April 2019
  • March 2019
  • February 2019
  • December 2018
  • November 2018
  • October 2018
  • September 2018
  • June 2018
  • May 2018
  • April 2018
  • March 2018
  • February 2018
  • January 2018
  • December 2017
  • November 2017
  • October 2017
  • September 2017
  • August 2017
  • May 2017
  • April 2017
  • March 2017
  • February 2017
  • January 2017
  • December 2016
  • November 2016
  • October 2016
  • September 2016
  • August 2016
  • July 2016
  • June 2016
  • May 2016
  • April 2016
  • March 2016
  • February 2016
  • January 2016
  • December 2015
  • November 2015

Categories

  • Announcement
  • Announcements
  • Article
  • Book
  • Colloquium
  • comment
  • Event
  • Field Trip
  • Gallery
  • general
  • news story
  • Open Lecture
  • Opinion
  • Resources
  • Tool
  • Tutorial
  • Uncategorized

Meta

  • Register
  • Log in
  • Entries feed
  • Comments feed
  • WordPress.com

Recent Posts

  • A dossier of data journalism teaching strategies: Words from journalism educators worldwide
  • “中国数据可视化大赛”创作者专访:数据“几人行”
  • “Whoah, wait a minute, every reporter needs to be a data reporter”: Conversations with two generations of data journalists at the Los Angeles Times
  • Aaron Mendelson: Would numbers work with radio?
  • 首届“中国数据可视化大赛”启动-数据中的宏观和微观世界

Recent Comments

A quick video I made… on New towns fail to be self-cont…
Erin Chan on Create Simple Filled Map (HK)…
National Congress: s… on “Big Data” Tells Y…
Pili Hu on Data News of the Week | Gender…
Pili Hu on Key Takes from Jessica Lo…

Archives

  • August 2019
  • June 2019
  • April 2019
  • March 2019
  • February 2019
  • December 2018
  • November 2018
  • October 2018
  • September 2018
  • June 2018
  • May 2018
  • April 2018
  • March 2018
  • February 2018
  • January 2018
  • December 2017
  • November 2017
  • October 2017
  • September 2017
  • August 2017
  • May 2017
  • April 2017
  • March 2017
  • February 2017
  • January 2017
  • December 2016
  • November 2016
  • October 2016
  • September 2016
  • August 2016
  • July 2016
  • June 2016
  • May 2016
  • April 2016
  • March 2016
  • February 2016
  • January 2016
  • December 2015
  • November 2015

Categories

  • Announcement
  • Announcements
  • Article
  • Book
  • Colloquium
  • comment
  • Event
  • Field Trip
  • Gallery
  • general
  • news story
  • Open Lecture
  • Opinion
  • Resources
  • Tool
  • Tutorial
  • Uncategorized

Meta

  • Register
  • Log in
  • Entries feed
  • Comments feed
  • WordPress.com

Blog at WordPress.com.

  • Follow Following
    • The Data & News Society
    • Already have a WordPress.com account? Log in now.
    • The Data & News Society
    • Customize
    • Follow Following
    • Sign up
    • Log in
    • Report this content
    • View site in Reader
    • Manage subscriptions
    • Collapse this bar
 

Loading Comments...