Summary: Hong Kong is an Asian commercial hub that never sleeps. Numerous restaurants feed workaholics who work overtime and party animals who indulge in pounding music and alcohol at midnight. We search OpenRice, the most popular dining guide website in Hong Kong, and find that 942 restaurants are still open after 11:30 pm. We crawl the information of the 250 most popular of them in order to sketch an overall picture of Hong Kong’s midnight dining scene.
We try to figure out four points below:
- Where to hunt midnight food in Hong Kong?
- How much do you need to pay for a meal at midnight?
- What kinds of food are provided at midnight?
- What kinds of restaurants can you choose at midnight?
After that, we make a recommendation on midnight restaurants based on our analysis results.
How do we get data from OpenRice?
First, we search for restaurants that are still open after 11:30 pm.
The search suggests that 942 restaurants are in business after 11:30 pm, but only the 250 most popular ones are shown on the website.
Then, we crawl the location, the price range per person per meal, the food type, the restaurant style and the number of “likes” of each of the 250 restaurants.
Here is our code:
The first step is defining the main function that gets the names (shown as “title” in the code), locations, price ranges, restaurant styles (shown as “country”), food types (shown as “material”) and the number of “likes” of all the restaurants on one page.
Break down:
In the first step, we use “find_all” to find the “location” on one page.
We use similar methods to find the other targeted items on the page.
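A minimal sketch of this step, assuming BeautifulSoup (the library whose `find_all` method is mentioned above). The HTML snippet, the class names (`restaurant`, `title`, `location`, `price`) and the function name `parse_page` are hypothetical stand-ins, not OpenRice's actual markup or the group's actual code.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one results page; the real
# OpenRice class names are not given in this article.
SAMPLE_HTML = """
<ul>
  <li class="restaurant">
    <h2 class="title">Cafe A</h2>
    <span class="location">Mong Kok</span>
    <span class="price">$51-100</span>
  </li>
  <li class="restaurant">
    <h2 class="title">Bar B</h2>
    <span class="location">Central</span>
    <span class="price">$201-400</span>
  </li>
</ul>
"""


def parse_page(html):
    """Return one dict per restaurant found in the page's HTML."""
    soup = BeautifulSoup(html, "html.parser")
    restaurants = []
    # find_all returns every matching tag; each one is a restaurant entry.
    for item in soup.find_all("li", class_="restaurant"):
        restaurants.append({
            "title": item.find("h2", class_="title").get_text(),
            "location": item.find("span", class_="location").get_text(),
            "price": item.find("span", class_="price").get_text(),
        })
    return restaurants


page_rows = parse_page(SAMPLE_HTML)
print(page_rows)
```

The same pattern (one `find` per field inside each `find_all` result) would extend to the other fields, once the real tag and class names have been looked up in the page source.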
The second step is defining a function to clean the information we get.
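A minimal sketch of such a cleaning function, assuming the main problem is stray whitespace around the scraped text (the actual cleaning rules may differ). The name `clean_text` and the sample string are hypothetical.

```python
def clean_text(raw):
    """Collapse runs of whitespace (spaces, tabs, newlines) into single spaces."""
    # str.split() with no argument splits on any whitespace and drops
    # empty pieces, so joining with " " also trims both ends.
    return " ".join(raw.split())


messy = "\n   Tsim Sha   Tsui \r\n\t"
print(clean_text(messy))
```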
The third step is inputting the URL of the first page and calling the main function to get the clean content of the first page.
The fourth step is using a for loop to get the clean content of the next pages.
The last step is outputting the information we get.
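One common way to output scraped rows is a CSV file, which Excel can open directly. The sketch below uses Python's standard `csv` module; the file name, the `likes` column name and the sample row are assumptions for illustration, while `title`, `country` and `material` follow the field names mentioned in the first step.

```python
import csv
import os
import tempfile

FIELDS = ["title", "location", "price", "country", "material", "likes"]

# Made-up sample row in the shape the scraper is described as producing.
rows = [
    {"title": "Cafe A", "location": "Mong Kok", "price": "$51-100",
     "country": "Italian", "material": "dessert", "likes": 10},
]


def write_rows(path, rows):
    """Write the rows to a CSV file with a header line."""
    # utf-8-sig adds a byte-order mark so that Excel detects UTF-8 and
    # displays Chinese restaurant names correctly.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


# Written to the system temp directory so the sketch runs anywhere;
# any writable path works.
out_path = os.path.join(tempfile.gettempdir(), "midnight_restaurants.csv")
write_rows(out_path, rows)
print(out_path)
```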
Analysis of the outcome
After getting all the information we want, we use Excel to analyze it and Photoshop to visualize the outcome.
Where to hunt midnight food in Hong Kong?
By analyzing the locations, we find that midnight restaurants are concentrated in the CBD or around shopping centers. There are 53 restaurants in Tsim Sha Tsui, 44 in Causeway Bay and 44 in Mong Kok. Restaurants in these three areas total 141, more than half of our 250 sample restaurants. It is interesting to notice that Kwun Tong has more restaurants than Central, even though the latter is reputed for its rich nightlife in Lan Kwai Fong.
More broadly, the number of restaurants in Kowloon is twice that on Hong Kong Island. Only 21 restaurants are open in the New Territories, which occupy nearly seventy percent of Hong Kong’s land area, and not a single restaurant is open at midnight in the Outlying Islands, which consist of one peninsula and 236 islands. Visitors to Lamma Island and Cheung Chau may have difficulty feeding themselves at midnight.
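The three-district total and its share of the sample can be checked with a few lines of Python. The sketch below uses only the three district counts and the sample size of 250 given in the text; the variable names are illustrative.

```python
# District counts as reported in the text; 250 is the sample size.
top_districts = {"Tsim Sha Tsui": 53, "Causeway Bay": 44, "Mong Kok": 44}
SAMPLE_SIZE = 250

top_total = sum(top_districts.values())
top_share = top_total / SAMPLE_SIZE

print(top_total)
print(f"{top_share:.1%}")
```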
How much do you need to pay for a meal at midnight?
The vast majority of restaurants charge 101 to 400 HKD for a meal at midnight. If you want a cheaper meal, there are 19 options. Among them, the Italian dessert restaurant Cafe Paradise (click the restaurant’s name for detailed information) has 310 OpenRice likes, ranking No. 1.
The only one that charges less than 50 HKD on average is in Yau Ma Tei and features Hong Kong style rice soups such as Lobster Rice Soup and Pork Rib and Rice Soup (click the restaurant’s name for detailed information: 村爺爺龍蝦湯.泡飯.燉湯專家).
At the other end, 12 restaurants cost more than 400 HKD per meal. Among them, CHEF STAGE de Eddy Chu (click the restaurant’s name for detailed information) is the most popular; it serves Western-style food and alcohol and has 558 OpenRice likes. If you have an important appointment and want to try a relatively high-end restaurant at midnight, you can take it into consideration.
What kinds of restaurants are provided at midnight?
If you are a consumer who puts much emphasis on dining atmosphere, you have a wide range of options at midnight. Forty-nine Western-style restaurants are still in business after 11:30 pm, along with 45 Japanese restaurants, 27 Italian restaurants and 22 Korean restaurants.
More broadly, setting aside the 17 restaurants tagged as “International”, the remaining 233 restaurants can be divided into three major types: Euramerican, Asian and Chinese.
As we can see, Euramerican restaurants (101) make up the biggest part of our sample, and most of them are Italian restaurants. Asian restaurants (84) follow. Among them, Japanese restaurants are the most common, and it is relatively difficult for a midnight food hunter to find South or Southeast Asian food such as Indian or Vietnamese food.
Chinese restaurants make up the smallest part, and the majority of them are Hong Kong style or Guangdong restaurants that suit local tastes.
Only one Hunan restaurant and two Sichuan restaurants are still open at midnight. For people who love spicy food, it is really difficult to find a midnight meal that suits their tastes.
What types of food are provided at midnight?
In our sample, 223 restaurants provide food type information. Among them, 43 serve seafood, the most common food type in our sample, and 35 serve hot pot. Twenty restaurants serve sweet food such as dessert and Cantonese sugar water at night.
If you want to keep your figure and have a low-calorie meal at midnight, there are four restaurants serving vegetarian food, three of them with meatless menus and two of them serving salad. Among them, Deer Kitchen (click the restaurant’s name for detailed information) is the most popular; it serves Italian salad and charges 101 to 200 HKD per meal.
Restaurant Recommendations
Compared with the chili-head, the sweet tooth has a wide range of options, from Spanish dessert restaurants to French dessert restaurants, from those in Tsim Sha Tsui to those in Yuen Long. In case you are overwhelmed by the options, we list the three most popular dessert restaurants. Click their names to book seats if you want:
Dessert restaurants | Location | Price | Restaurant Style | Food type | The number of OpenRice likes |
Espuma | Tsim Sha Tsui | $101-200 | Spanish | dessert | 465 |
Espuma | Central | $101-200 | Spanish | dessert | 461 |
Cafe Paradise | Prince Edward | $51-100 | Italian | dessert | 310 |
For the same reason, we list the three most popular seafood restaurants and the three most popular hotpot restaurants:
Seafood restaurants | Location | Price | Restaurant Style | Food type | The number of OpenRice likes |
西餐大排檔 Tai Pai Tong Hong Kong | Causeway Bay | $201-400 | Western | seafood | 527 |
The Captain’s House | Tsim Sha Tsui | $201-400 | Western | seafood | 377 |
肚餓了串燒烤焗專門店 Hungry Bird | Tsim Sha Tsui | $101-200 | Hong Kong style | seafood | 372 |
Hotpot restaurants | Location | Price | Restaurant Style | Food type | The number of OpenRice likes |
八幡屋涮涮鍋 Yahataya Shabu Shabu | Mong Kok | $101-200 | Japanese | hotpot | 282 |
101手工涮涮鍋 101 Grill Bar + Hot Pot | Causeway Bay | $201-400 | Taiwan | hotpot | 273 |
酒鍋 The Drunken Pot | Tsim Sha Tsui | $201-400 | Guangdong | hotpot | 244 |
Useful Knowledge Points:
Using a for loop to handle pagination.
If you want to get information from multiple pages, you can define a main function that gets the content of one page and call it in a for loop. In the for loop, you build the URL of each page and then call the main function, so that you can scrape all the pages.
Why can we use a for loop to get the URLs of multiple pages? Because they change in a predictable manner. In our case, the URL of the first page is:
https://www.openrice.com/zh/hongkong/restaurants?Seat=2&BookingDate=2018-02-23&TimeSlot=23%3A30
The URL of the second page is:
The URL of the third page is:
… …
Have you noticed it? You can add “&page=2” at the end of the first page’s URL to get the second page’s URL. Once you have the second page’s URL, you can use a for loop to get the URLs of the remaining pages.
Note that page URLs change differently on different websites, so our pattern cannot be applied in general. You should first find out how the URLs of your targeted pages change.
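As a minimal sketch of this idea, the loop below builds the page URLs by appending a `page` parameter to the first page's URL, following the “&page=2” pattern described above. The helper name `build_page_urls` is hypothetical, and it is an assumption that later pages follow the same pattern as page 2; fetching the pages is left out.

```python
FIRST_PAGE_URL = (
    "https://www.openrice.com/zh/hongkong/restaurants"
    "?Seat=2&BookingDate=2018-02-23&TimeSlot=23%3A30"
)


def build_page_urls(first_url, n_pages):
    """Return the URLs of pages 1..n_pages, assuming an '&page=N' suffix."""
    urls = [first_url]  # page 1 carries no page parameter
    for page in range(2, n_pages + 1):
        urls.append(f"{first_url}&page={page}")
    return urls


for url in build_page_urls(FIRST_PAGE_URL, 3):
    print(url)
```

Each URL in the returned list can then be passed to the main function in turn.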
Disguise yourself to scrape anti-crawler websites.
Some websites refuse to be scraped by detecting our user agent. The user agent can be simply regarded as the name of our browser. We can disguise ourselves by modifying the “headers” parameter of “requests.get”.
If the information you want to scrape is not that confidential, the code we provide below is enough.
import requests
r = requests.get('put your url here', headers={'user-agent': 'Mozilla/5.0'})
Use the list method extend()
The difference between the list methods “append()” and “extend()” is as follows:
“append()” appends its argument as a single object at the end of the list.
x = [1, 2, 3]
x.append([4, 5])
print(x)
The code above gives you: [1, 2, 3, [4, 5]]
“extend()” extends the list by appending each element of the iterable.
x = [1, 2, 3]
x.extend([4, 5])
print(x)
The code above gives you: [1, 2, 3, 4, 5]
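In a scraper, `extend()` is handy when each page yields a list of rows and you want one flat list at the end. The sketch below uses made-up page results to show the difference; the variable names are illustrative.

```python
# Made-up results standing in for what two pages might return.
page_1 = ["Restaurant A", "Restaurant B"]
page_2 = ["Restaurant C"]

all_rows = []
for page_rows in (page_1, page_2):
    all_rows.extend(page_rows)  # adds each row, keeping the list flat

nested = []
for page_rows in (page_1, page_2):
    nested.append(page_rows)  # adds each page's list as a single element

print(all_rows)  # ['Restaurant A', 'Restaurant B', 'Restaurant C']
print(nested)    # [['Restaurant A', 'Restaurant B'], ['Restaurant C']]
```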
Code and data
Interested readers can download the code and data here: Group 1-Openrice
Notes from Lecturer
This work encountered and addressed two common challenges in building a scraper: 1) handling pagination; 2) faking the user-agent string. Websites usually adopt a certain pattern for their pages, e.g. using “&page=xxx” parameters. One can use string concatenation or string formatting to spell out the full URL to crawl. This is how we gradually start the “big data” journey. Computer programs are good at doing repeated jobs, non-stop. Once we find a way to crawl one page, we can use a loop to crawl multiple pages. Once we find a way to crawl all the pages under one search term, we can use a loop to crawl all search terms. Then the small data becomes big.
As to the second issue, there is endless discussion about anti-crawler and anti-anti-crawler techniques. Before we address the technical solution, one should first note the ethics of web crawling. It is much like taking pictures of non-interviewees. Whether you should do that depends on the public interest and the privacy policy of a jurisdiction. For our exercise, please reserve the data for educational and research purposes and observe the fair use principle. On the technical side, the most common problem is browser detection. One can tell the “requests” module to imitate a browser by changing the user-agent string.
The group put good effort into analysing the data using the techniques they had mastered before. They used a wide range of visualisations, from pictograph to map. This is a typical multi-dimensional dataset, and the group approached every dimension by 1-D analysis. This sets a good foundation for future classes. One can try to reproduce the same analysis using Python after learning pandas.
— Pili Hu (Mar 2, 2018)
Author/ZHONG Dai, CUI Simeng, TAN Yiqin
Editor/ Yucan Xu, Pili Hu