Loads of ideas about programming have been running helter-skelter through my mind by the time I step into the spacious, well-polished Spectrum studio on the 11th floor of an office building in Sheung Wan on this bright Saturday morning. As someone who has concentrated only on liberal arts courses since senior school, I have always considered coding something far removed from my daily life.
But today, Chico Xu, Ivy Wang and I, as student reporters, are going to take a glimpse into this sophisticated business, which we once thought was irrelevant to our lives and the lives of many, but which in fact matters to them, and to a large extent.
The event we are attending is the Global Pandas Documentation Sprint, a worldwide event held simultaneously at locations around the world on March 10, 2018. It aims to improve this Python library’s documentation with clearer explanations and better examples, and to leave it, at the end of the day, “in a perfect state,” as the official website puts it.
What is Pandas? And why should we, as average people, care about it?
It has been less than two weeks since we first learned about Pandas, an open source software library for data manipulation and analysis, built and widely used by people both inside and outside the coding world. Put in the language of average human beings like us, Pandas is basically a magical computer trick that one can use to process and analyze data.
What kind of data can it process? All kinds, as long as they come in a suitable format or source, such as CSV files, TSV files or SQL databases. And in what areas has it been used? Almost any, from stock analysis to the pricing of medical services to hotel ratings, wherever a huge amount of data has to be processed at high speed, such as operating on a whole column in 0.02 seconds or even less.
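For readers curious what this looks like in practice, here is a minimal sketch of loading a small CSV table and operating on a whole column at once. The hotel names, districts and ratings are made up for illustration and are not data from the sprint.

```python
import io

import pandas as pd

# Made-up hotel ratings as CSV text (illustrative only, not real data).
csv_text = """hotel,district,rating
Hotel A,Sheung Wan,4.2
Hotel B,Central,3.8
Hotel C,Sheung Wan,4.6
"""

# read_csv accepts a file path or any file-like object.
ratings = pd.read_csv(io.StringIO(csv_text))

# Operate on an entire column in one call: the average rating.
average_rating = ratings["rating"].mean()

# Group the rows by district, then average the ratings within each group.
by_district = ratings.groupby("district")["rating"].mean()

print(average_rating)
print(by_district)
```

The same two calls work unchanged on a table with millions of rows, which is where the speed mentioned above starts to matter.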
Knocking on the door printed with the word “Spectrum”, we stand outside the venue, curious whether someone like Thomas A. Anderson will answer. Instead, out comes a blond young man dressed much like your neighbor, in a gray hoodie, jeans and sneakers. He is Matthew Barkley, the business development manager of Spectrum. With him is Shivam Gaur, a software engineer at Droste, whose blue striped shirt makes him look nothing like a TV-style geek.
We realize that the sprint is a rather small-scale event when we sit down around the long wooden table, with about a dozen seats, in this SOHO office. It is around 11:30 a.m. when the event starts. Before moving into the actual coding, we are walked through several warm-up steps, which we find quite newcomer-friendly.
First, a brief video introduction to the standards and principles of the sprint is shown on the projector, with several high-profile figures in the field speaking in turn about the significance of the event. Among them is Wes McKinney, the creator of Pandas, who points out that a thousand contributors have devoted themselves to perfecting the documentation, and expresses his hope that the presence of “a much larger group of developers” will help make it better.
More importantly, it is not only developers from within this closed profession who are given credit at this event; greenhorns are also of great importance in promoting the idea of code sharing. As Mart van de Ven, the principal data scientist and director of Droste and Spectrum, tells us when he later joins remotely by video call, people who have known Pandas for only one or two weeks can also contribute to it.
“It’s about getting more people involved. It’s not about building the capacity on those who are already in. I think any communities should work that way,” Barkley told Ivy Wang in an interview after the sprint.
Why are average people also paying attention to this world-renowned geeks’ event? And why should they?
Because of “the spirit of sharing, mutual assistance, and collaboration”, as put by Xu, who took part in writing the documentation and learned from the more experienced participants “step by step to familiarize with data, functions and tool kits”.
Another participant, Shi Shuai, a 29-year-old software engineer from August Robotics, an Australia-based technology company, said the Pandas Documentation Sprint can help standardize Pandas, which in turn will make things more convenient for developers.
“Participants will get to know more up-to-date information about what others have been doing in the community, and learn about new ideas and troubleshooting methods in a few hours of their leisure time,” he added.
Open Source Movement in Hong Kong
This sharing spirit is not limited to the coding community. It is also reflected in many areas of Hong Kong society, in what is generally known as the “Open Source Movement”: a movement urging governments and other institutions to share data closely related to the public interest.
Questions about the right to access certain data are not frequently put forward, “but I occasionally get questions that stem from open data,” Barkley said, “(it is) something that could be more regulated.” He also talked about one of the motivations for holding the sprint: “holding events like this to get to know what kind of questions people would ask, and then initiate conversations around it, might push people to change how they make data available.”
“If it’s transparent then everyone knows what’s happening,” Gaur said in an interview with Ivy Wang.
He added that “businesses can also consume open data, like open source pollution data and weather data. Literally it can be consumed at the public level, at the corporation level, even at the government level. Everyone can benefit from it just by putting the data on the proper platform.”
For years, the movement has gone viral in many economies such as the United States, Britain and Dubai, and even in neighboring places such as Taiwan. But it is an oddity that Hong Kong, a city which has been crowned the world’s most competitive economy for two consecutive years since 2016 on account of business efficiency, has not yet fully embraced the open source concept.
It has been two hours since Gaur started helping us set up the working environment, walking around the office and solving problems for each attendee. Although it takes most participants a long time to catch up, the efforts made by the organizers motivate beginners like the three of us to keep learning about coding and to become as familiar with it as we can, so as to make some small contributions.
“I’m more familiar with some of the packages’ documentation, which allows me to re-examine the packages that I use every day from the user’s point of view,” Xu said, adding that the key to enhancing the users’ understanding of Pandas is to “allow users to imitate [and do some] hands-on operations”.
The sprint draws to a close at around 5 p.m., and everyone leaves with more knowledge about the Pandas documentation and the Open Source Movement in Hong Kong.
A Long Way to Go
According to an HK01 report in March 2017, the Hong Kong Open Data Portal, a system launched by the government in 2011 to make public facilities data available for free download and value-added reuse, contains a large amount of unusable data. The problem lies either in redundant datasets split up by language and time period, or in data published as JPG or PDF files that are hard to read and reprocess.
The Hong Kong government has put open data on its agenda by including it in the 2017 policy address and allocating more than HK$10 billion for technology innovation. But the government’s slow action makes citizens and advocates like Charles Mok, a legislator who represents the Information Technology functional constituency, wonder how much longer we will need to wait.
But for the three of us, who have approached the community, talked with and listened to its members, and shared our opinions with them, the key to promoting the Open Source Movement lies in the participation and sharing of every individual in Hong Kong society.
Resources and Tutorials of the Sprint
- Video introduction of the Global Pandas Documentation Sprint
- Examples of what we did and contributed in this event:
  * Added extendible dictionary to do the same for other generic numeric operations in module pandas.core.groupby.
- Pandas documentation
Reporters/ Eudora Wang, Chico Xu & Ivy Wang
Editor/ Chico Xu