
Summary

Dream of the Red Chamber, also called The Story of the Stone, was composed by Cao Xueqin and is one of China’s four great classical novels. Long considered a masterpiece of Chinese literature, it is generally acknowledged to be the pinnacle of Chinese fiction. We use a graph analysis tool to examine the shortest paths, centrality, degree structure, clustering coefficients and cliques of the character network, to find out how closely the characters are connected and who might be the Social Queen or King in this story.

According to analyst Jiang Qi, there are 448 characters in “A Dream of Red Mansions”. We picked the 187 characters who are relatively more important than the others and used the graph analysis tool to analyze them. The resulting network below shows their relationships in Dream of the Red Chamber.


The relationship between the 187 characters in the book

How can 贾母 connect to a monk? – (Shortest Path)


The shortest path between 贾母 and 空空道人

As the ‘sp’ (shortest path) in the picture shows, ‘贾母’, the most powerful lady in this story, seldom leaves 贾府, so she seems to have no reason to know ‘空空道人’, a monk in the book. Yet the shortest-path analysis shows that she can reach the monk through only two characters: her grandson ‘贾宝玉’ and the old family friend ‘贾雨村’. Ironically, her most cherished grandchild ‘贾宝玉’ eventually follows in the footsteps of ‘空空道人’ and becomes a monk himself.
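The shortest-path idea can be sketched with a breadth-first search. This is a minimal illustration, not the authors' code: the edge list is a hypothetical toy graph, and only the chain 贾母, 贾宝玉, 贾雨村, 空空道人 is taken from the article.

```python
from collections import deque

# Hypothetical toy edge list (assumption); only the chain
# 贾母 - 贾宝玉 - 贾雨村 - 空空道人 comes from the article.
edges = [
    ("贾母", "贾宝玉"),
    ("贾母", "王熙凤"),
    ("贾宝玉", "王熙凤"),
    ("贾宝玉", "贾雨村"),
    ("贾雨村", "空空道人"),
]

def build_adjacency(edge_list):
    """Turn an undirected edge list into a dict of neighbour sets."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def shortest_path(adj, source, target):
    """Breadth-first search; returns one shortest path as a list, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the parents to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nb in sorted(adj[node]):
            if nb not in parent:
                parent[nb] = node
                queue.append(nb)
    return None

adj = build_adjacency(edges)
path = shortest_path(adj, "贾母", "空空道人")
print(" -> ".join(path))
```

In this toy graph the path has three edges, so two intermediate characters separate 贾母 from the monk.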

贾宝玉, Social King in this world – (Centrality)

Everything is about importance. The centrality charts measure different kinds of ‘importance’ for the different characters.


‘Degree centrality’ counts the direct connections of a node. The higher the degree, the more people are associated with the character. A character who has more connections with others is, in this sense, more important.
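As a rough sketch of the calculation, degree centrality can be computed as a node's number of neighbours divided by the number of other nodes. The normalisation and the toy edges below are assumptions made for illustration; the article does not state which variant the tool uses.

```python
# Hypothetical toy edges (assumption), not the article's data set.
edges = [
    ("贾宝玉", "林黛玉"),
    ("贾宝玉", "薛宝钗"),
    ("贾宝玉", "袭人"),
    ("贾宝玉", "王熙凤"),
    ("林黛玉", "薛宝钗"),
]

def degree_centrality(edge_list):
    """Degree of each node divided by (number of nodes - 1)."""
    neighbours = {}
    for u, v in edge_list:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    n = len(neighbours)
    return {node: len(nbs) / (n - 1) for node, nbs in neighbours.items()}

centrality = degree_centrality(edges)
for name, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(name, round(score, 2))
```

A score of 1.0 means the character is directly linked to everyone else in the toy graph.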


Closeness centrality of the main characters

‘Closeness’ is based on the distance from a node to every other node. In this setting, characters who can reach others in fewer steps may be more powerful.
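One common definition of closeness, assumed here since the article gives no formula, is the number of other nodes divided by the sum of shortest-path distances to them. The sketch below applies it to a hypothetical four-character chain and assumes the graph is connected.

```python
from collections import deque

# Hypothetical toy chain (assumption) built from the characters named above.
edges = [
    ("贾母", "贾宝玉"),
    ("贾宝玉", "贾雨村"),
    ("贾雨村", "空空道人"),
]

def build_adjacency(edge_list):
    """Turn an undirected edge list into a dict of neighbour sets."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def distances_from(adj, source):
    """Breadth-first search distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def closeness(adj, node):
    """(reachable nodes - 1) / sum of distances; assumes a connected graph."""
    dist = distances_from(adj, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

adj = build_adjacency(edges)
for name in adj:
    print(name, closeness(adj, name))
```

Characters in the middle of the chain score higher than those at its ends, because their average distance to everyone else is smaller.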


Betweenness of the main characters

As we can see, ‘贾宝玉’ has the most connections with others, and he is the character who can reach others most easily. With his high degree and high closeness, he is clearly the most suitable candidate to act as a bridge. ‘贾宝玉’ plays the key role of connecting everyone in the family, and he also links the inside of the family with the outside. As a result, his betweenness almost equals the sum of the betweenness values of the characters ranked second to fifth.
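Betweenness counts how many shortest paths between other pairs of characters pass through a node. The sketch below uses Brandes' algorithm for unweighted graphs, without normalisation, on a hypothetical toy graph in which 贾宝玉 links the family to outsiders. It illustrates the measure and is not the authors' implementation.

```python
from collections import deque

# Hypothetical toy graph (assumption): a family triangle plus an outside chain.
edges = [
    ("贾母", "贾宝玉"),
    ("王夫人", "贾宝玉"),
    ("贾母", "王夫人"),
    ("贾宝玉", "贾雨村"),
    ("贾雨村", "空空道人"),
]

def build_adjacency(edge_list):
    """Turn an undirected edge list into a dict of neighbour sets."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def betweenness(adj):
    """Unnormalised betweenness for an undirected, unweighted graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # nodes in order of discovery
        preds = {v: [] for v in adj}     # predecessors on shortest paths
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to s.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: value / 2 for v, value in score.items()}

adj = build_adjacency(edges)
scores = betweenness(adj)
for name, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(name, value)
```

Here 贾宝玉 lies on every shortest path between the two family members and the two outsiders, which is the bridging role described above.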

Read alongside the contents of the book, this result reflects his class, status and personality, and even his gender and age. In the era of the book, only a young nobleman with a desire for freedom could meet so many people. ‘贾宝玉’ also has a caring heart: he always shows care and respect for women, which was very rare at that time. That is why he became the center of the book and has been remembered for centuries.


Eigenvector of main characters

Another measure is ‘eigenvector’ centrality, which scores a character by how important his or her neighbors are, not only by how many neighbors there are. Surprisingly, the best-known leading characters ‘林黛玉’ and ‘薛宝钗’ never appear in the top five. ‘袭人’, however, although she is just a maid, has quite a high closeness, since she is in effect the concubine of ‘贾宝玉’ and represents the power of ‘贾母’ and ‘王夫人’.
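Eigenvector centrality gives a node a high score when its neighbours also have high scores. A minimal sketch using power iteration follows. The toy graph is hypothetical and the authors' tool may normalise differently, but it shows how a maid tied to powerful characters can outrank a heroine with a single link.

```python
import math

# Hypothetical toy graph (assumption): 袭人 is tied to three powerful
# characters, while 林黛玉 is tied only to 贾宝玉.
edges = [
    ("贾宝玉", "袭人"),
    ("贾宝玉", "贾母"),
    ("贾宝玉", "王夫人"),
    ("贾宝玉", "林黛玉"),
    ("袭人", "贾母"),
    ("袭人", "王夫人"),
    ("贾母", "王夫人"),
]

def build_adjacency(edge_list):
    """Turn an undirected edge list into a dict of neighbour sets."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def eigenvector_centrality(adj, max_iter=500, tol=1e-10):
    """Power iteration on (A + I), which has the same eigenvectors as A."""
    x = {v: 1.0 for v in adj}
    for _ in range(max_iter):
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(value * value for value in new.values()))
        new = {v: value / norm for v, value in new.items()}
        if max(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x

adj = build_adjacency(edges)
scores = eigenvector_centrality(adj)
for name, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(name, round(value, 3))
```

Adding the identity matrix only keeps the iteration from oscillating on some graphs; the ranking is unchanged.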

To conclude, by analyzing only how often the characters appear, we can speculate about who the social ‘queen’ or ‘king’ is. The inner relationships and emotions between people, however, cannot easily be read from these numbers.

The ugly truth in the story – (Structure of degree)


Structure of degree

Degree is the number of edges attached to a node; in this network it ranges from 1 to 124. ‘贾宝玉’, the main character of the book, has 124 edges, 36 more than the runner-up ‘王熙凤’. This pattern is consistent with preferential attachment: new nodes, or nodes with few connections, are more likely to connect to nodes that are already very popular. In other words, characters who are already famous, like ‘贾宝玉’, can reach more people than those who are not.
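Preferential attachment can be illustrated with a small simulation in which each new node links to one existing node, chosen with probability proportional to its current degree. This is a generic sketch with a fixed random seed, not a model fitted to the novel. The node count of 187 only mirrors the size of the character network.

```python
import random
from collections import Counter

def grow_network(n_nodes, seed=0):
    """Grow a network one node at a time with degree-proportional attachment."""
    rng = random.Random(seed)
    degree = {0: 1, 1: 1}     # start from a single edge between nodes 0 and 1
    endpoints = [0, 1]        # every node appears once per edge it touches
    for new in range(2, n_nodes):
        # Picking uniformly from the endpoint list is a degree-weighted choice.
        target = rng.choice(endpoints)
        degree[new] = 1
        degree[target] += 1
        endpoints.extend([new, target])
    return degree

degree = grow_network(187)
histogram = Counter(degree.values())
print("largest degree:", max(degree.values()))
print("nodes with degree 1:", histogram[1])
```

Runs of this kind typically end with a few heavily connected hubs and many nodes with a single link, which is the shape of distribution the paragraph above describes.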

It also tells an ugly truth: in real life it is easy to use your fame to earn more fame. You are easily connected to others if you are famous enough. For marginal people, some things may be as difficult as climbing up to heaven, while for famous people they are only a piece of cake.

What makes 贾宝玉 a passerby? – (Clustering coefficient)

Nodes tend to form tightly knit groups with a relatively high density of ties. The clustering coefficient measures how strongly a group of nodes clusters together.


The local clustering coefficient of a node shows how close its neighbors are to forming a complete graph. The clustering coefficient of ‘贾宝玉’ is only 0.096, much lower than that of other main characters such as ‘薛宝钗’ and ‘史湘云’, at 0.848 and 0.838 respectively.

This is because ‘贾宝玉’ is the main character of the book. As mentioned before, his node connects to almost every node in the graph, but the nodes he connects to may not connect to one another: someone who knows ‘贾宝玉’ may not know his other acquaintances. There are therefore few complete subgraphs around ‘贾宝玉’, which is why his clustering coefficient is so low.
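The local clustering coefficient is the share of a node's neighbour pairs that are themselves connected. A minimal sketch on a hypothetical toy graph follows; its values are illustrative and are not the 0.096, 0.848 and 0.838 reported above.

```python
from itertools import combinations

# Hypothetical toy graph (assumption): 贾宝玉 knows everyone, but only
# three of his acquaintances know one another.
edges = [
    ("贾宝玉", "薛宝钗"),
    ("贾宝玉", "史湘云"),
    ("贾宝玉", "林黛玉"),
    ("贾宝玉", "贾雨村"),
    ("贾宝玉", "空空道人"),
    ("薛宝钗", "史湘云"),
    ("薛宝钗", "林黛玉"),
    ("史湘云", "林黛玉"),
]

def build_adjacency(edge_list):
    """Turn an undirected edge list into a dict of neighbour sets."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def local_clustering(adj, node):
    """Connected neighbour pairs divided by all possible neighbour pairs."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

adj = build_adjacency(edges)
for name in ("贾宝玉", "薛宝钗", "贾雨村"):
    print(name, local_clustering(adj, name))
```

The hub has ten possible neighbour pairs but only three of them are linked, while every neighbour of 薛宝钗 knows every other, so the hub scores far lower.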

Find out the Ingroup inside the 贾府 – (Cliques)

A clique is a type of community: a subgraph in which every node is connected to every other node. People tend to connect and talk to others who are in the same clique.

There are 298 cliques among the 187 characters in the book. Take clique 200 as an example: the orange nodes in the graph are ‘贾宝玉’, ‘王熙凤’, ‘贾母’, ‘王夫人’, ‘琥珀’, ‘贾琏’, ‘彩霞’, ‘平儿’, ‘鸳鸯’.

This clique forms a basic traditional family circle: three generations with their own servants.
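Cliques can be enumerated with the Bron–Kerbosch algorithm, which lists every maximal clique, meaning a clique that cannot be extended by another node. The sketch below assumes that the count of 298 refers to maximal cliques, which the article does not state, and runs on a hypothetical toy graph instead of the real data.

```python
# Hypothetical toy graph (assumption): two family triangles joined by one tie.
edges = [
    ("贾母", "王夫人"),
    ("贾母", "贾宝玉"),
    ("王夫人", "贾宝玉"),
    ("王熙凤", "平儿"),
    ("王熙凤", "贾琏"),
    ("平儿", "贾琏"),
    ("贾母", "王熙凤"),
]

def build_adjacency(edge_list):
    """Turn an undirected edge list into a dict of neighbour sets."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def maximal_cliques(adj):
    """Basic Bron-Kerbosch enumeration without pivoting."""
    found = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            found.append(frozenset(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return found

adj = build_adjacency(edges)
cliques = maximal_cliques(adj)
for clique in sorted(cliques, key=len, reverse=True):
    print(sorted(clique))
```

The version without pivoting is short but can be slow on dense graphs; a real analysis of 187 characters would more likely rely on a graph library's clique routine.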




Code and data:

Interested readers can download the code and data here: Group 8 – 红楼梦

Notes from Lecturer

This article is a good reflection of the graph analysis techniques demoed in class. It would be better if the students went beyond the class scope and tried to find alternative analysis methods for those concepts. Say, there are other definitions/metrics for centrality. For the community detection part, the graph is hard to divide when there are several super connectors: 贾宝玉, say, may connect with almost everyone else in the graph, so the whole graph is knitted into one. One may try to remove such nodes first and conduct graph analysis on the remaining ones, where the community boundaries are more apparent.

Assuming the data and codes are correct, there are several good insights.

The contrast between 贾宝玉’s degree and local clustering coefficient brings up the insight that the most well-connected person may have a hard time fostering closer connections around him/her.

Compare two well-connected nodes, 贾母 and 王夫人. 贾母 has higher closeness centrality, meaning that she is closer to an average other character in the graph. However, 王夫人 has higher betweenness centrality, meaning that when other nodes want to get a referral, passing through 王夫人 is in general easier (in the mathematical sense, i.e. shorter SP). As a journalist, befriending 王夫人 is a good idea to help you reach potential interview targets. As a PR specialist, convincing 贾母 to talk can help spread your message more efficiently.

There are two major drawbacks. One drawback is the lack of an introduction to the data. In graph analysis, the way the graph is built weighs more than the models/algorithms used in computation. One must always make it clear how the graph is constructed: what are the nodes, what are the edges, and what are the weights on the edges? If the data is from a third party, try to track down the source and cite it properly. The second drawback is about writing. There are several typos like “path, centralization” (centrality) and “quiet higher closeness” (quite). A thorough proofreading is necessary before final publication. The objectivity of the writing can also be improved. For example, preferential attachment is a common phenomenon and it is too subjective to call it “ugly truth in real life”.

— Pili Hu (April 15, 2018)


Author / FENG Zhixu, GAO Xing, Niu Qizhen

Editor / Yucan Xu