How researchers around the world are making use of Weibo data

By Shulin Hu, Digital Humanities MA/MSc student, UCL

Zoufan posted her last words on Weibo on 18 March 2012. She was suffering from major depressive disorder and took her own life shortly afterwards. Weibo is a microblogging application, launched by Sina Corporation in 2009, through which users share, disseminate and find information via their social connections. In essence it is similar to Twitter, although it has a number of other useful capabilities. The app has more than 400 million users (compared with Twitter’s 300 million) and features that enable the study of emotional states and of responses to the topics being discussed or spread across the web.

Since Zoufan’s last post, other Weibo users have gradually found her account and continued to share their emotions and stories of depression in the comments, which now number more than one million. This caught the attention of Tingshao Zhu and his colleagues from the Chinese Institute of Psychology. Earlier this year, they began investigating the case and devising a strategy for using Weibo data to connect with patients and prevent further suicides.

They found that a significant number of patients with depressive disorders express suicidal thoughts by posting anonymously. The researchers used Python to scrape and analyze the comment text, and further discovered that people who experience suicidal ideation interact less with others and are more inward-looking. Specifically, the proportion of positive emotion words is less than 5%, while the proportion of negative words is more than 80%. They rarely express thoughts about “family” and “future”, but mention “death” and “freedom” frequently.
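
The study does not describe its exact pipeline. As a minimal sketch of the proportion calculation only, the Python below assumes the comments have already been segmented into words (Chinese text needs a word segmenter first) and that both proportions are measured over all emotion words. The two lexicons are tiny hypothetical stand-ins, written in English for readability.

```python
from collections import Counter

# Hypothetical toy lexicons (English stand-ins). A real study would apply a
# validated Chinese emotion dictionary to segmented text.
POSITIVE = {"hope", "happy", "love", "grateful"}
NEGATIVE = {"death", "pain", "tired", "alone", "hopeless"}


def emotion_proportions(tokens):
    """Return (positive_share, negative_share) among all emotion words in tokens."""
    counts = Counter(tokens)
    pos = sum(counts[w] for w in POSITIVE)
    neg = sum(counts[w] for w in NEGATIVE)
    total = pos + neg
    if total == 0:  # no emotion words at all
        return 0.0, 0.0
    return pos / total, neg / total


comments = [["tired", "alone", "death"], ["pain", "hope", "freedom"]]
tokens = [word for comment in comments for word in comment]
pos, neg = emotion_proportions(tokens)
print(f"{pos:.0%} positive, {neg:.0%} negative")  # prints "20% positive, 80% negative"
```

Words outside both lexicons (here "freedom") are simply ignored, which is why the two shares always add up to 100% whenever any emotion word is present.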

In a similar study of emotional states on Weibo, Sinan Yang and Jian Xu used BDP to visualise shifts in polarity over time and quickly locate the peaks and troughs in the general population’s emotional state. As with many other tools in this space, individuals can use BDP for free, but with limits on the number of API calls and on data storage (only 100 MB). Premium plans offer more storage and more diverse visualization options (details below). Other tools that researchers use to work with Weibo data include WeiboEvents by PKUVIS, Weiboreach, Gooseeker and WeiboStats.
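
BDP does this through its dashboard rather than through code. To show the underlying idea, here is a hedged Python sketch: it assumes each post already carries a polarity score between -1 and 1, averages the scores per day, and flags a day as a peak or trough when it is strictly higher or lower than both neighbouring days. The scoring and this simple neighbour rule are our assumptions, not BDP's method.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def daily_polarity(posts):
    """Average polarity per day; posts are (date, score in [-1, 1]) pairs."""
    by_day = defaultdict(list)
    for day, score in posts:
        by_day[day].append(score)
    return sorted((day, mean(scores)) for day, scores in by_day.items())


def peaks_and_troughs(series):
    """Days whose average is strictly above (peak) or below (trough) both neighbours."""
    peaks, troughs = [], []
    for (_, prev), (day, cur), (_, nxt) in zip(series, series[1:], series[2:]):
        if cur > prev and cur > nxt:
            peaks.append(day)
        elif cur < prev and cur < nxt:
            troughs.append(day)
    return peaks, troughs


posts = [
    (date(2014, 2, 1), 0.2), (date(2014, 2, 1), 0.4),
    (date(2014, 2, 2), 0.8),
    (date(2014, 2, 3), -0.5),
    (date(2014, 2, 4), 0.1),
]
series = daily_polarity(posts)
peaks, troughs = peaks_and_troughs(series)
print(peaks, troughs)  # peak on 2 Feb, trough on 3 Feb
```

On real, noisy data a smoothing step (for example a rolling average) before peak detection would usually be needed.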

WeiboEvents by PKUVIS, for example, is well suited to studying the spread of information and how accounts gain popularity. One researcher, Feng Xian, wanted to understand how an account for an inanimate object had become so popular, so he used PKUVIS to study @Yutu, the Weibo account of the rover that reached the Moon in December 2013 to explore its surface in search of natural resources.

Parasocial interaction

Yutu literally means “jade rabbit”, the pet rabbit of the Moon goddess in Chinese mythology. On Weibo, @Yutu has more than 730,000 followers. It continues to post updates and news of its discoveries, as well as cute cartoons about its own history and the universe in general that explain complex concepts visually.

In February 2014, having briefly gone quiet during the lunar night and recovered from some mechanical difficulties (which the real rover on the Moon was actually experiencing), it posted the messages “Hi, anybody there?” and “I'm the rabbit that has seen the most stars!” This post attracted more than 840,00 comments and 151,000 likes: a fascinating dataset for research.

Feng Xian collected the comments and reposts and extracted the characters relating to emotional expression, as well as the emojis. He found that 60% of users complimented the rover’s joyful ‘personality’ and 19% encouraged the rabbit/rover to keep going (as if it were a real person) while the rover itself was facing technical problems on the Moon. The depth of reposting also indicates that Weibo content penetrates its target audience well. The researcher looked at six layers of reposting: there were 2,231 direct reposts (40%) and 1,780 second-layer reposts (32%), and the next four layers had 735 (13%), 231 (4%), 111 (2%) and 490 (9%). This indicates that after the original reposts, users’ friends keep forwarding the post through their social circles, much like a virus spreading.
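
The percentages above can be checked directly from the reported counts. This short Python snippet (the function name is ours) recomputes each layer's share of the 5,578 reposts and shows that the first two layers together account for about 72% of them.

```python
def layer_shares(counts):
    """Return each layer's share of all reposts, rounded to whole percentages."""
    total = sum(counts)
    return [round(100 * c / total) for c in counts]


# Repost counts for the six layers reported in the study.
layers = [2231, 1780, 735, 231, 111, 490]
print(sum(layers))  # 5578
print(layer_shares(layers))  # [40, 32, 13, 4, 2, 9]
print(round(100 * sum(layers[:2]) / sum(layers)))  # 72
```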

The study also investigated the interaction model between the rover account and social media users, to find out how the account balances its personified mood with scientific knowledge about space exploration. Rather than imparting professional knowledge like an emotionless machine, the account established an equal relationship with its audience during virtual interactions, which helped motivate people to join the discussion. @Yutu also combines the new story of the space rover with the classic associations of the Moon in Chinese culture, refreshing that traditional meaning and helping its posts spread further.

Other uses of Sina Weibo data

In a different study, a group of researchers from Hong Kong used Sina Weibo data to analyze misinformation. They collected both Twitter and Weibo data to understand the level and spread of Ebola misinformation in 2013-2014. Because no API was available at the time, the researchers had to write their own script to scrape Weibo data. They found that only 2% of their sample contained misinformation about treatment options, compared with perhaps 50% or more reported by other studies of Ebola treatment misinformation in Guinea, Liberia and Nigeria during the same year.

Since 2014, when that study was conducted, Sina Weibo has released an official, free API for its raw data, and an English-language step-by-step guide to using it is available. A team of researchers from Wuhan University used the API to extract information about hot spots and movement across the city to help with urban planning. Although the API made data collection easier, they faced the challenge of having to request users’ permission to access this type of data.

There are many more studies and research teams using Sina Weibo data to analyze behaviors, trends and the spread of information through the network. What is more fascinating, though, and quite different from studies of Twitter and Facebook, is Sina Weibo’s emoji feature: emoji responses are easy to use, and together with ‘likes’ they help reveal how people tend to express emotion in response to events or posts. The Zoufan case at the beginning of this post is a particularly pertinent example.
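
As a rough sketch of how emoji responses might be tallied, the Python below assumes that Weibo's built-in emoticons appear in scraped post text as short bracketed codes such as [心]. That text format is an assumption worth checking against your own data, and Unicode emoji would need separate handling.

```python
import re
from collections import Counter

# Assumption: built-in emoticons appear in raw text as short bracketed codes, e.g. "[心]".
EMOTICON = re.compile(r"\[([^\[\]]{1,8})\]")


def emoticon_counts(texts):
    """Count bracketed emoticon codes across a collection of post or comment texts."""
    counts = Counter()
    for text in texts:
        counts.update(EMOTICON.findall(text))  # findall returns the text inside the brackets
    return counts


comments = ["加油[加油][心]", "晚安[月亮]", "好可爱[心]"]
counts = emoticon_counts(comments)
print(counts.most_common(1))  # [('心', 2)]
```

Plain words outside brackets (such as the first 加油 above) are not counted, so emoticon use can be separated from the wording of the comment itself.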

Ultimately, Tingshao Zhu and his colleagues from the Chinese Institute of Psychology wanted to prevent suicides; more than 300,000 people in China end their lives every year. So they built an algorithm, trained on manually tagged data from the responses to Zoufan’s posts, to recognize people at high risk of suicide among the countless updates on Weibo and to classify the severity of that risk automatically. The team aims to combine this algorithm with their training in psychology to identify people at high risk of suicide and reach out with the support they need. So far they have identified 4,222 users with depressive disorders and offered them further advice, which we hope will have a profound long-term influence on treating depression.
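
The source does not say which model the team trained. Purely as an illustration of learning from manually tagged posts, here is a multinomial naive Bayes classifier (a common baseline for short-text classification, not necessarily the team's choice) written with the Python standard library. The training posts and the "high"/"low" risk labels are hypothetical stand-ins; a real severity classifier would need more levels and far more data.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / n_docs)  # log prior of the class
            for w in tokens:
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical tagged posts (already segmented into words).
train_docs = [
    ["want", "death", "alone"],
    ["tired", "death", "freedom"],
    ["family", "dinner", "happy"],
    ["future", "plans", "happy"],
]
train_labels = ["high", "high", "low", "low"]
model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["death", "freedom"]))  # high
print(model.predict(["happy", "family"]))  # low
```

Working in log probabilities avoids numerical underflow when a post contains many words.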

About

Shulin Hu is a master’s student in the Digital Humanities programme at University College London and is currently working with the SAGE Ocean team. She graduated from Communication University of China, majoring in Journalism and Communication, and was an exchange student at Fu Jen University in Taiwan. She has interned at China Review News Agency in Hong Kong, Xinhua News Agency, Caixin Media and BMW Brilliance Automotive Ltd., working as a reporter, a live news director and a public relations intern.
You can contact Shulin by e-mail: shulin.hu@ucl.ac.uk


More information about Sina Weibo

Weibo is a microblogging application, launched by Sina Corporation on 14 August 2009, through which users share, disseminate and find information via their social connections. Generally, it is similar to Twitter. Sina Weibo is now one of the two largest social media platforms in China, with over 445 million monthly active users as of Q3 2018.

Other special functions:

Weilingdi (literally, micro fief): a location-based social networking service for mobile devices, similar to Foursquare

Tuding (literally, pin): a photo-sharing service, similar to Instagram.

Sina Lady Weibo: specializing in women's interests.

Weibo Data Center (a paid service): lets users obtain data on the hot topics they follow, check official data for company accounts, or monitor changes in their own account’s data

Data collection tools/applications for Weibo:

WeiboEvents by PKUVIS

Free software, available for both Windows and Mac. Developed by: A visualization research group at Peking University (official English website: http://vis.pku.edu.cn/)

Functions:

  • Monitor real-time data and collect raw data for deeper analysis.

  • Data visualization: Each node represents a microblog post and each connection represents a forwarding relationship, so the process by which a post propagates can be extracted. In the dashboard on the right, users can filter posts by keywords in their content, or display selected data and create customized reports.

  • Academic use: Researchers use the application to critically evaluate cases in journalism and communication, for example to locate the point at which public opinion reversed during a controversial news story, or to understand why heatedly discussed topics spread so rapidly by observing their propagation routes on social media.
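
WeiboEvents builds this node-and-edge picture for you. To illustrate the data model only (this is not PKUVIS’s code), here is a Python sketch that takes hypothetical (repost, source) pairs and counts the reposts at each depth of the propagation tree, the same "layers" measure used in the @Yutu study. The post IDs are made up.

```python
from collections import Counter, defaultdict, deque


def reposts_per_layer(root, edges):
    """Count reposts at each depth of a repost tree using breadth-first search.

    edges: (repost_id, source_id) pairs meaning repost_id forwarded source_id.
    Depth 1 = direct reposts of the original post.
    """
    children = defaultdict(list)
    for repost, source in edges:
        children[source].append(repost)
    layers = Counter()
    queue = deque([(root, 0)])
    seen = {root}
    while queue:
        post, depth = queue.popleft()
        for child in children[post]:
            if child not in seen:  # guard against duplicate edges
                seen.add(child)
                layers[depth + 1] += 1
                queue.append((child, depth + 1))
    return dict(sorted(layers.items()))


edges = [("a", "root"), ("b", "root"), ("c", "a"), ("d", "c"), ("e", "b")]
print(reposts_per_layer("root", edges))  # {1: 2, 2: 2, 3: 1}
```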

Weiboreach

A web-based tool, with a mobile app planned for the near future.

Developed by: A technology company, Zhiweidata Co. Ltd

Their business model includes both B2C and B2B.

On the one hand, they cooperate with media research centres at institutions such as Communication University of China and the Chinese Academy of Sciences, helping them with data-mining and media-analysis projects. Meanwhile, commercial companies such as Adidas China and Dell China buy their service to target advertisements more accurately and gain higher exposure.

On the other hand, individual researchers can also buy the service independently. The price varies with the complexity of the analysis. For example, when the number of reposts is less than 2,000, the service charges ¥0.02 per repost.
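
Only one price tier is given here, so the following is just a worked version of that rate: a hypothetical helper that returns the cost for a post with fewer than 2,000 reposts and refuses larger jobs, whose price is not stated. Python's Decimal type is used to avoid floating-point rounding in currency arithmetic.

```python
from decimal import Decimal

PRICE_PER_REPOST = Decimal("0.02")  # ¥ per repost, the quoted rate below 2,000 reposts


def weiboreach_quote(reposts):
    """Hypothetical helper: estimated cost in ¥ for a post with fewer than 2,000 reposts."""
    if not 0 <= reposts < 2000:
        raise ValueError("pricing for 2,000 or more reposts is not given")
    return reposts * PRICE_PER_REPOST


print(weiboreach_quote(1500))  # 30.00
```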

Functions:

  • Data analysis: based on exposure, the user’s overall rating (a weighted average of indicators such as user activity and follower count), emotional value (positive and negative emotions), and content analysis (propagation tracking).

  • User profile: generates user data by gender, region, activity level, etc.
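
Weiboreach does not publish how its overall rating is weighted. The weighted average itself is simple, and the sketch below assumes each indicator has already been normalised to the 0-1 range. The weights are made up purely for illustration.

```python
def user_rating(indicators, weights):
    """Weighted average of indicator scores, each assumed to be normalised to 0-1."""
    missing = set(weights) - set(indicators)
    if missing:
        raise ValueError(f"missing indicators: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(indicators[name] * w for name, w in weights.items()) / total_weight


# Hypothetical weights and scores, not Weiboreach's real formula.
WEIGHTS = {"activity": 0.4, "followers": 0.6}
score = user_rating({"activity": 0.5, "followers": 0.9}, WEIGHTS)
print(round(score, 2))  # 0.74
```

Dividing by the total weight means the weights do not have to sum to 1.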


BDP

Individual users can use BDP for free, but with limits on the number of API connections and on data storage (only 100 MB). A plan with 1 GB of storage and 8 API connections costs approximately £8 per month, while 10 GB of storage and 50 API connections, with a VIP service including technical consulting and more diverse visualization output, costs about £30 per month. More than 1,000 companies, including Nielsen and Alibaba, have partnered with BDP.

Developed by: Haizhi Network Technology Co. Ltd, a start-up founded in 2013.

Functions:

  • Helps researchers work with databases more easily.

  • No need to write Python code or SQL: drag and drop in the dashboard to carry out tasks that would otherwise require a data expert.

  • Multi-table association: drag and drop to join multiple tables, which is simpler and more convenient than VLOOKUP.

  • Append merge: quickly combines multiple tables with the same structure into a new table, eliminating the need to copy and paste (Ctrl + C / Ctrl + V).
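
BDP's join and append steps are the same operations as a SQL JOIN and UNION ALL (or pandas' merge and concat). As a rough, dependency-free illustration, not BDP's implementation, here they are on tables represented as lists of Python dicts. The column names and rows are made up.

```python
def join(left, right, key):
    """Inner join two tables (lists of dicts) on a shared key column.

    If both tables have a column with the same name, the right table's value wins.
    """
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l_row, **r_row} for l_row in left for r_row in index.get(l_row[key], [])]


def append(*tables):
    """Stack tables that share the same columns into one new list of rows."""
    rows = [row for table in tables for row in table]
    if rows and any(row.keys() != rows[0].keys() for row in rows):
        raise ValueError("all tables must have the same columns")
    return rows


posts = [{"user_id": 1, "text": "hello"}, {"user_id": 2, "text": "good night"}]
users = [{"user_id": 1, "city": "Wuhan"}]
print(join(posts, users, "user_id"))
# [{'user_id': 1, 'text': 'hello', 'city': 'Wuhan'}]

jan = [{"date": "2018-01-05", "reposts": 12}]
feb = [{"date": "2018-02-11", "reposts": 40}]
print(len(append(jan, feb)))  # 2
```

Because this is an inner join, the post by user 2 is dropped: there is no matching row in the users table.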

These applications allow researchers to collect and analyse data from the whole of Weibo retrospectively, regardless of when they start using them. More information about other tools follows:

Gooseeker

Extracts raw data from the whole Weibo database, covering user information, hashtags, posts, comments, likes, etc.

WeiboStats

An easy-to-use analytics dashboard for industry intelligence on Weibo. Developed by: KAWO Co. Ltd, a start-up founded in 2013. WeiboStats is designed for digital agencies, researchers and journalists who need reports and insights on Chinese social media. Users can track the performance of multiple Weibo accounts for free.

Existing Weibo datasets:

(mainly for academic use)