The Data & News Society

~ news/numbers; stats/stories

Category Archives: Tool

The simplest crawler with wget: a one-line command to help investigative journalists

06 Monday Nov 2017

Posted by Bobo Wei in Resources, Tool

Tags

crawler, 爬蟲, data collection, scraper, wget

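Writing crawlers has become an essential skill for data journalists. Online services such as ScrapingHub, Morph, and ParseHub make code-free web scraping possible to some extent, but in many cases you still need to write the crawling logic by hand. Writing a crawler has two parts: crawling and extracting. "Crawling" means starting from one web page, finding the links it contains, visiting them one by one, and repeating the process until you have collected the pages you need. This is similar to how people browse the web, a bit like following the vine to find the melon. "Extracting" is the process of pulling useful information out of those pages, converting "semi-structured" web pages into "structured" data tables.

This post introduces the simplest crawler, which takes only one command: wget -r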
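
To make the "crawl" half concrete, here is a minimal Python sketch of the loop described above: start from one page, collect its links, visit them, and repeat. It is not how wget is implemented; it only illustrates the idea behind recursive retrieval. The names `LinkParser`, `crawl`, `fetch`, and `max_pages` are assumptions made for this example, and the crawl is deliberately limited to the starting host.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl that stays on the start URL's host.

    fetch is any callable mapping a URL to its HTML text (hypothetical
    helper supplied by the caller). Returns {url: html} for visited pages.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if url in pages:
            continue  # already visited
        html = fetch(url)
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments
            link = urljoin(url, href).split("#")[0]
            if urlparse(link).netloc == host and link not in pages:
                queue.append(link)
    return pages
```

For real pages, `fetch` could be something like `lambda u: urllib.request.urlopen(u).read().decode("utf-8", "replace")`. The "extract" half, turning the saved pages into a table, is a separate step.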

Continue reading →

Using Tableau's JOIN feature to filter for complete data slices

05 Sunday Nov 2017

Posted by zoyazhao in Resources, Tool

Tags

data analysis, JOIN, Tableau, waste

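Data journalism often involves handling large amounts of missing data. If the raw data is a two-dimensional table, that table has many "holes". We often want to filter out these holes and keep only complete rows and columns, so that a full analysis can be carried out within a defined scope.

This post is a submission from a student, Zoya, whose goal was to map the amount of municipal waste collected in each country. The raw data comes from the UN Municipal Waste Collection Dataset, whose year coverage is incomplete (missing data). To keep comparisons consistent, the project kept only the countries that have data for every year from 2002 to 2012 (11 years in total) before drawing the map. This tutorial shows two methods, each with techniques worth borrowing. Method 1 combines basic features of Excel, Open Refine, and Tableau, and finally uses Tableau's JOIN operation to filter out the missing data. Method 2 takes advantage of the specifics of this use case and completes the whole data flow inside Tableau, using the special calculated measure "# of Records".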

Figure: Screenshot of the raw data
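
The filtering logic itself (count the records per country, keep the countries with all 11 years, then join back to the data) can be sketched outside Tableau. The Python below is only an illustration of that idea, not the tutorial's actual workflow; the field names `country`, `year`, and `value` are assumptions, not the column names of the UN dataset.

```python
from collections import Counter

YEARS = range(2002, 2013)  # 2002..2012 inclusive, 11 years


def complete_countries(rows):
    """Return the countries that have a non-missing value for every year."""
    seen = {
        (r["country"], r["year"])
        for r in rows
        if r["year"] in YEARS and r["value"] is not None
    }
    # Number of distinct years with data, per country
    counts = Counter(country for country, _ in seen)
    return {c for c, n in counts.items() if n == len(YEARS)}


def filter_complete(rows):
    """Keep only rows of complete countries, within the year range.

    This plays the role of an inner join between the data and the
    list of complete countries.
    """
    keep = complete_countries(rows)
    return [r for r in rows if r["country"] in keep and r["year"] in YEARS]
```

Counting rows per country is the same idea as the "# of Records" approach: a country is kept only when its count reaches 11.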

Continue reading →

eCharts Workshop

30 Wednesday Aug 2017

Posted by Bobo Wei in Event, Resources, Tool

Tags

data, Data Journalism, echart

(This is a repost from initiumlab.com; click the link to read the original: eCharts Workshop)

Time: 4:00 pm – 5:30 pm, Sep 06 (Tue), 2016

Location: Hong Kong

Address: Unit 1907, Prosperity Millennia Plaza, 663 King’s Road, Hong Kong

Speaker: Dorothy Zhang

Summary: ECharts is an easy-to-use, powerful charting and visualization solution that offers interactive, highly customizable charts for presenting data. In this workshop, our speaker will walk through some typical examples and present the key implications of applying ECharts to news reporting. Editors with no programming background will come away with a general idea of the principles and will be able to make a simple but customized chart on their own. The program code for those charts is also provided.

Key words: #eChart #DataVisualization #DataJournalism

Continue reading →

Map Visualisation for Panama Papers and Offshore Leaks in R

29 Tuesday Aug 2017

Posted by Bobo Wei in Resources, Tool

Tags

data, Data Vis, open data, R

(This is a repost from initiumlab.com by Charlie Chen and Chao Tianyi; click the link to read the original: Map Visualisation for Panama Papers and Offshore Leaks in R)

On May 9, 2016, the International Consortium of Investigative Journalists (ICIJ), a global network of journalists who collaborate on in-depth investigative stories, released the long-awaited database of offshore entities behind the Panama Papers investigation.

So far, the Offshore Leaks database published by ICIJ includes at least 200,000 offshore entities from the Panama Papers and over 100,000 records from ICIJ’s previous investigations.

The Offshore Leaks database contains detailed postal addresses for all kinds of entities, offshore or not, as well as for officers and intermediaries. Based on these addresses, we made colored maps showing the distribution of postal addresses of the people and companies involved in the offshore industry.
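
A colored map of this kind starts from a simple aggregation: how many addresses fall in each country. The original post works in R; the sketch below shows the same counting step in Python under stated assumptions. The column name `country_code` and the function `count_by_country` are hypothetical and may not match the actual layout of the ICIJ files.

```python
import csv
from collections import Counter


def count_by_country(csv_file, column="country_code"):
    """Count rows per country in an address table.

    csv_file is any open text file (or file-like object) with a header
    row; column is the assumed name of the country field.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        code = (row.get(column) or "").strip()
        if code:  # skip rows with no country recorded
            counts[code] += 1
    return counts
```

The resulting counts are what a mapping tool would bind to a color scale, one value per country.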

Continue reading →

Small tools, big impact: easy screenshots and cropping

29 Tuesday Aug 2017

Posted by Bobo Wei in Resources, Tool

Tags

Apps Lab about Tech

(This is a repost from initiumlab.com; click the link to read the original: "Small tools, big impact: easy screenshots and cropping")

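Taking screenshots is a feature familiar to computer users around the world. But using something often does not mean mastering it. We have compiled the advanced screenshot techniques we use day to day, which may open the door to a new world for you.

Skill 1: Full-page screenshots

Everyone knows how to take a screenshot, but what if you want to capture an entire web page that is larger than the screen? For example, Initium Media's in-depth reports are all excellent, but most of them run to several thousand characters. If you want to save one as a screenshot, can it be done quickly?

We can use Firefox to do this.

Step 1: Open Firefox, open the target web page, and adjust the window to the proportions you want.

Step 2: Go to Tools -> Web Developer -> Toggle Tools.

Step 3: In the panel that pops up, click the small gear icon, select "Take a fullpage screenshot", then click the "camera" button.

Step 4: From then on, just open the developer tools and click the camera. The file is downloaded to the Downloads folder.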

Continue reading →

Time Saving One-Liners for Journalists

29 Tuesday Aug 2017

Posted by Bobo Wei in Resources, Tool

Tags

data, Data Journalism

(This is a repost from initiumlab.com; click the link to read the original: Time Saving One-Liners for Journalists)

If you are a journalist who does not code, this article will teach you 15 one-line commands that can handle complex problems in seconds.

Initium Lab has collected various command-line tricks as we hack journalism with technology. Here is our editor’s choice so far, covering image processing, video processing, PDF manipulation, social network hacking, and other useful one-liners. Just open your Terminal and follow the steps.

Continue reading →

A quick route to relationship charts: Google Fusion Tables & Kumu

28 Monday Aug 2017

Posted by Bobo Wei in general, Resources, Tool

Tags

data, data analysis, Google Fusion Tables

(This is a repost from initiumlab.com by Chao Tianyi; click the link to read the original: "A quick route to relationship charts: Google Fusion Tables & Kumu")

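In the second half of 2015, the anti-corruption campaign of the Central Commission for Discipline Inspection swept through the oil industry, and several tycoons were detained and investigated by the central authorities. In the same period, Hong Kong businessman Xu Jinghua (徐京華) was taken away from a hotel in Beijing. Xu is a mysterious figure with multiple nationalities and aliases who has long been engaged in the oil trade between China and Africa. Initium Media produced a major investigative report revealing how Xu's transnational energy empire operates and how he is connected to Sinopec. During the investigation, reporters dug up a large amount of material on relationships between people, between companies, and between people and companies. At that point, drawing a relationship graph became essential: with it, reporters can dig deep into the information hidden in the network.

How can you produce such a graph quickly? This post offers two approaches: Google Fusion Tables and Kumu.

Whichever tool you use, you first need to organize the raw material from your research notes into structured information.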
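
As a rough illustration of what "structured" can mean here, the Python sketch below turns relationship notes into an edge list (one row per relationship) and a node list. Everything in it is made up for the example: the people and companies are placeholders, and the `From`, `To`, `Type` headers are an assumed layout that should be checked against whatever the chosen tool expects on import.

```python
import csv
import io

# Each note becomes one (source, target, relationship) triple.
# These names are placeholders, not real findings.
EDGES = [
    ("Person A", "Company X", "director of"),
    ("Person B", "Company X", "shareholder of"),
    ("Company X", "Company Y", "subsidiary of"),
]


def edges_to_csv(edges):
    """Serialize the edge list as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["From", "To", "Type"])
    writer.writerows(edges)
    return buf.getvalue()


def nodes_from_edges(edges):
    """Every distinct person or company that appears in any edge."""
    return sorted({name for a, b, _ in edges for name in (a, b)})
```

Keeping one relationship per row, rather than free-form notes, is what lets a graph tool draw people and companies as nodes and relationships as links.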

Continue reading →

Making charts all day and out of ideas? Let Google spark your imagination

28 Monday Aug 2017

Posted by Bobo Wei in general, Resources, Tool

Tags

Excel, Google Sheets, Ideation, Source

(This is a repost from initiumlab.com by Chao Tianyi; click the link to read the original: "Making charts all day and out of ideas? Let Google spark your imagination")

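A data analysis project often begins with exploring a large, unfamiliar data table. Sketching a few simple charts is a good way to spark ideas, but it is far from easy. Think of how many chart types there are: histograms, bar charts, pie charts, line charts, radar charts... And depending on which columns you choose to analyze, each chart type can be drawn in dozens of ways.

Google Sheets recently launched a new feature, Explore, which may be a good first step for exploring unfamiliar data. According to Google, Explore automatically generates as many charts as possible from the table's contents and also does some data analysis along the way, uncovering correlations and trends in the data [1].

Explore is simple to use: one click generates charts automatically, and from time to time it also adds findings that (it thinks) are interesting, such as listing the maximum and minimum values.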

Continue reading →

A summary of Google Sheets tips

28 Monday Aug 2017

Posted by Bobo Wei in general, Resources, Tool

Tags

Google Sheets

(This is a repost from initiumlab.com by Chao Tianyi; click the link to read the original: "A summary of Google Sheets tips")

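Like Excel, Google Sheets is a commonly used tool for data cleaning and analysis. The difference is that Google Sheets also supports online collaboration and real-time saving. When the volume of data is not especially large, or the analysis is not especially difficult, Google Sheets is arguably the handiest and most efficient tool. Below, we summarize 6 Google Sheets tips to help you become more proficient with it.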

Copy and Paste Special

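In a Google Sheets cell, data is not always stored as plain text or a number. Sometimes it is a formula, and sometimes it is a logical test. So when you copy and paste, remember to check what you are actually pasting. If all you need is the cell's value (that is, the number or text) rather than a formula, choose "Paste Special" - "Values Only" when pasting (or use Command+Shift+V). This keeps you from pasting a pile of garbage into the new sheet.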

Continue reading →

Essential for editors! 4 story-finding tools to help you dig up valuable information

25 Friday Aug 2017

Posted by Bobo Wei in Resources, Tool

Tags

Apps Lab about Tech, Data Journalism, Ideation, New Media, open data, Source

(This is a repost from initiumlab.com by Hu Pili and Ren Xinya; click the link to read the original: "Essential for editors! 4 story-finding tools to help you dig up valuable information")

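Editors often get headaches over choosing story topics. In a sea of information, how do you find the needle in the haystack, the most valuable piece of information? Here we recommend four must-have tools for finding stories:

1. Feedly

A classic feed subscription tool. If you want to improve the quality of your reading, passively receiving information on social networks is far from enough. Many media outlets, blogs, and websites provide RSS feeds (nowadays mostly Atom feeds), which you can aggregate with Feedly. Besides making it easy to subscribe, categorize, and track, Feedly also provides social popularity metrics that help you spot trending topics. Sharing to Twitter and saving to Evernote are also convenient. Currently, advanced features such as advanced search and keyword monitoring require payment, and the free version also limits the number of subscriptions.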
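
For readers curious about what a feed reader consumes, here is a minimal Python sketch that pulls the title, link, and update time out of an Atom feed. It is an illustration only and says nothing about how Feedly works internally; `parse_atom` is a name made up for this example, and the input is assumed to be a well-formed Atom document passed in as text.

```python
import xml.etree.ElementTree as ET

# XML namespace used by Atom feeds
ATOM = "{http://www.w3.org/2005/Atom}"


def parse_atom(xml_text):
    """Return a list of {title, updated, link} dicts, one per feed entry."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall(ATOM + "entry"):
        link = entry.find(ATOM + "link")
        entries.append({
            "title": entry.findtext(ATOM + "title", default=""),
            "updated": entry.findtext(ATOM + "updated", default=""),
            "link": link.get("href", "") if link is not None else "",
        })
    return entries
```

Running this over several feeds and sorting the entries by `updated` gives a bare-bones version of the aggregation step described above.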

Continue reading →
