Category: general

Data News of the Week | Paradise Papers

Do you still remember the massive Panama Papers leak in 2016? With the release of 13.4 million financial documents this November, the offshore island paradises are back in the global spotlight. The Paradise Papers cover the period from 1950 to 2016 and include more than 120,000 people and 25,000 offshore companies.

Tech-savvy readers can jump to the database directly. As before, the dataset is modelled as a graph: Officers, Intermediaries and Addresses are treated as nodes, and their relationships as links. Neo4j is a widely adopted graph database. Its web user interface, the Neo4j Browser, lets journalists visually expand and explore a graph. Its query language, Cypher, is a declarative, SQL-inspired language built around graph pattern matching, and Neo4j can complement it with full-text search. That flexibility, together with built-in graph algorithms such as shortest-path search, allows experienced journalists to study the underlying graph systematically. The download page on ICIJ includes snapshots of four Neo4j databases exported in CSV format.
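To make the node-and-link model concrete, here is a minimal Python sketch of loading such a graph from CSV and expanding it outwards from one node, much as one would click through a visual graph explorer. The CSV contents, column names (`node_id`, `label`, `source`, `target`, `rel_type`) and helper functions are hypothetical placeholders for illustration; they do not reproduce the actual ICIJ export format.

```python
import csv
import io
from collections import defaultdict, deque

# Hypothetical miniature CSV exports; the columns and rows are illustrative
# placeholders and do not reproduce the ICIJ files.
NODES_CSV = """node_id,label,name
1,Officer,Person A
2,Intermediary,Firm B
3,Address,Address C
4,Officer,Person D
"""

EDGES_CSV = """source,target,rel_type
1,2,client_of
2,3,registered_address
4,2,client_of
"""


def load_graph(nodes_text, edges_text):
    """Return (nodes, adjacency) built from two CSV strings."""
    nodes = {row["node_id"]: row for row in csv.DictReader(io.StringIO(nodes_text))}
    adjacency = defaultdict(set)
    for row in csv.DictReader(io.StringIO(edges_text)):
        # Links are treated as undirected, which suits free exploration.
        adjacency[row["source"]].add(row["target"])
        adjacency[row["target"]].add(row["source"])
    return nodes, adjacency


def expand(adjacency, start, max_hops):
    """Breadth-first expansion: map each reachable node id to its hop count."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if seen[current] == max_hops:
            continue  # do not walk past the hop limit
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen[neighbour] = seen[current] + 1
                queue.append(neighbour)
    return seen


nodes, adjacency = load_graph(NODES_CSV, EDGES_CSV)
reachable = expand(adjacency, "1", max_hops=2)
for node_id, hops in sorted(reachable.items()):
    print(hops, nodes[node_id]["label"], nodes[node_id]["name"])
```

A graph database answers the same kind of question with a single pattern-matching query; the sketch only shows what "nodes and links" means in data terms.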

Data News of the Week | Power in China

The 19th National Congress of the Communist Party of China closed this week. The new Politburo Standing Committee was presented to the media, putting Beijing at the centre of world attention. This DNW hand-picks recent data news related to power in China.

The 25-year political path to power in China [Link]

Bloomberg Politics made an unconventional data visualisation to show The Path to Power in China. From the following line chart, readers can easily tell that running a big region matters in China. The chart turns categorical position data into ordinal data by ranking each position by importance, measured as the number of people who entered the Standing Committee from that position.

[Figure: line chart from Bloomberg Politics, The Path to Power in China]
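The categorical-to-ordinal step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Bloomberg's actual method: the records are placeholders rather than real data, and `ordinal_ranks` is a hypothetical helper name.

```python
from collections import Counter

# Placeholder records (not real data): the position each person held
# before promotion.
previous_positions = [
    "Position A", "Position B", "Position A", "Position C",
    "Position A", "Position B",
]


def ordinal_ranks(categories):
    """Map each category to a rank, where 1 is the most frequent category."""
    counts = Counter(categories)
    # Sort by descending count, then by name so that ties are stable.
    ordered = sorted(counts, key=lambda c: (-counts[c], c))
    return {category: rank for rank, category in enumerate(ordered, start=1)}


ranks = ordinal_ranks(previous_positions)
print(ranks)
```

Once every position has a rank, it can be placed on an ordered axis, which is what makes a line chart over otherwise unordered categories readable.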

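A quick route to relationship charts: Google Fusion Tables & Kumu

(This is a repost from initiumlab.com by Chao Tianyi; click the link to read the original: A quick route to relationship charts: Google Fusion Tables & Kumu)

In the second half of 2015, the anti-corruption campaign led by the Central Commission for Discipline Inspection swept through the oil industry, and several magnates were detained and investigated by the central authorities. In the same period, Hong Kong businessman Xu Jinghua (徐京華) was taken away from a hotel in Beijing. Xu is a mysterious figure: he holds multiple nationalities and aliases and has long been engaged in the oil trade between China and Africa. Initium Media (端傳媒) produced a major investigative report revealing how Xu's transnational energy empire operated and how he was connected to Sinopec. During the investigation, reporters dug up a large amount of material on the relationships between people, between companies, and between people and companies. At that point, drawing a relationship chart became essential, because it lets reporters dig deeper into the information hidden in the network.

How can such a chart be produced quickly? This article offers two options: Google Fusion Tables and Kumu.

Whichever tool you use, the first step is to turn the raw material in your research notes into structured information.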
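As a rough illustration of what "structured information" can look like, the Python sketch below turns relationship notes into a node list and an edge table. The names, relationship types and the `from`/`to`/`type` column headers are placeholders chosen for this example; check the import format your charting tool actually expects before using them.

```python
import csv
import io

# Hypothetical notes as (subject, relationship, object) triples.
# All names are placeholders, not findings from the investigation.
notes = [
    ("Person A", "director_of", "Company X"),
    ("Person B", "shareholder_of", "Company X"),
    ("Company X", "subsidiary_of", "Company Y"),
]


def to_tables(triples):
    """Split triples into a sorted, de-duplicated node list and an edge list."""
    node_names = sorted({name for s, _, o in triples for name in (s, o)})
    edges = [{"from": s, "to": o, "type": rel} for s, rel, o in triples]
    return node_names, edges


def edges_to_csv(edges):
    """Serialise the edge list as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["from", "to", "type"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(edges)
    return buffer.getvalue()


node_names, edges = to_tables(notes)
edge_csv = edges_to_csv(edges)
print(node_names)
print(edge_csv)
```

Keeping one row per relationship is the key design choice: each person or company appears once in the node list, however many relationships it takes part in.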

raw-research-notes.png

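Staring at spreadsheets all day with no ideas? Let Google spark some

(This is a repost from initiumlab.com by Chao Tianyi; click the link to read the original: Staring at spreadsheets all day with no ideas? Let Google spark some)

A data analysis project often begins with feeling your way around a huge, unfamiliar table. Sketching a few simple charts is a good way to spark ideas, but it is far from easy: think how many chart types there are (histograms, bar charts, pie charts, line charts, radar charts and more), and depending on which columns you choose to analyse, each type can be drawn in dozens of ways.

Google Sheets recently launched a new feature, Explore, which may serve as a first step in exploring unfamiliar data. According to Google, Explore automatically generates as many charts as possible from the table's contents and also performs some data analysis along the way, uncovering relationships and trends in the data [1].

Explore is simple to use: a single click produces charts automatically, and from time to time it adds findings that it considers interesting, such as the maximum and minimum values.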

explore.gif

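A roundup of Google Sheets tips

(This is a repost from initiumlab.com by Chao Tianyi; click the link to read the original: A roundup of Google Sheets tips)

Like Excel, Google Sheets is a commonly used tool for data cleaning and analysis. The difference is that Google Sheets also supports online collaboration and real-time saving. When the data volume is not especially large, or the analysis not especially difficult, Google Sheets is among the handiest and most efficient tools available. Below we summarise 6 Google Sheets tips to help you master it.

Copy and Paste Special

In a Google Sheets cell, data is not always stored simply as text or a number; sometimes it is a formula, and sometimes a logical test. So when you copy and paste, remember to check what you are actually pasting. If all you need is the cell's value (a number or text) rather than a formula, choose "Paste Special" > "Values Only" when pasting (or use Command+Shift+V). This prevents a pile of garbled content from landing in the new sheet.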

paste-special.png

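Columbia data visualisation master Soma shares his secrets at Baptist University

(This is a repost from initiumlab.com by Chia ni Liu; click the link to read the original: Columbia data visualisation master Soma shares his secrets at Baptist University)

Data journalism is a vast discipline. From collecting, cleaning, correcting and analysing data to finding a good story, there are probably hundreds of skills and pieces of knowledge to learn. With limited time and energy, Jonathan Soma, the well-known expert at Columbia University, advises journalists at least to learn design well and to present data with appropriate techniques, so that readers can absorb the information in the shortest possible time. So what are the secrets of data visualisation?

Core ideas:

  1. The more data you try to present, the less information readers take in.
  2. Think from the reader's perspective and focus only on the information readers need to know.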

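Exchange and learning @ Taiwan

Recently, at the invitation of National Taiwan University and National Chengchi University, five student representatives from the data journalism programme of the Department of Journalism at Hong Kong Baptist University took their data journalism works to Taipei for a two-day exchange.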

IMG_6028.jpg

(Group photo of students from Hong Kong Baptist University and National Chengchi University.)

IMG_6245.jpg

(Group photo of students from Hong Kong Baptist University and National Taiwan University.)

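[Repost] After graduating from Columbia's data journalism programme, why did I end up choosing multimedia?

Reposted from the WeChat public account 刺猬公社, written by Jia Chenyan (贾宸琰). For the original, click: After graduating from Columbia's data journalism programme, why did I end up choosing multimedia?

Introduction

Li Qinling (李沁灵) graduated from the data journalism track at Columbia Journalism School and now works as a Multimedia Producer at IBT Media's New York headquarters, where she produced the multimedia piece 《Net Neutrality》, which criticises Trump's internet policy. She has also produced other multimedia works, including 《2016:The Year of Disruption》, a cover-story video for The Christian Science Monitor.

Why did I study data, yet end up as a multimedia producer rather than a programmer?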

Data News of the Week | Sharing Economy & Other projects

The Sharing Economy is a socio-economic ecosystem built around the sharing of human, physical and intellectual resources.

The Sharing Economy is still in its infancy, and this is only the beginning: realised in full, it is a new and alternative socio-economic system that embeds sharing and collaboration at its heart, across all aspects of social and economic life.

The leading businesses that are advancing the concept of the “sharing economy” are in many respects no longer insurgents and newcomers. The size and scale of Uber, Airbnb and several other firms now rival, or even surpass, those of some of the world’s largest businesses in transportation, hospitality, and other sectors. As the economic power of these technology-driven firms grows, there continue to be regulatory and policy skirmishes on every possible front, across cities and towns spanning the United States, Europe and beyond.

Here is one example in China:

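Data News of the Week | HK01's data visualisation analysis of the Chief Executive election

The Hong Kong Chief Executive election is 10 days away, and all of Hong Kong's major media outlets are following this hot topic closely. HK01 (《香港01》) goes further, using animation and data journalism to present the whole election process. Since we last shared HK01's "Chief Executive election" web page, the special page has added and revised content compared with three months ago, which also offers ideas that data journalism practitioners can draw on.

Site details: 香港特首選戰2017 (Hong Kong Chief Executive Election 2017)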

1.png

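Compared with three months ago, "Focus News" (焦點新聞) is still on the home page, while four items have been added: "Chief Executive race trends" (特首戰走勢), "Manifesto comparison" (政綱對比), "篤篤撐" and "Election Committee guide" (選委圖鑒). The original interactive mini-game "特首跑馬仔遊戲" has been replaced by "篤篤撐".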
