Social media data in research: a review of the current landscape

By Lily Davies, Digital Humanities MA/MSc student, UCL


Social media has brought about rapid change in society, from our social interactions and complaint systems to our elections and media outlets. It is increasingly used by individuals and organizations in both the public and private sectors. Over 30% of the world’s population is on social media, we spend most of our waking hours attached to our devices, and every minute in the US, 2.1M snaps are created and 1M people log in to Facebook. All this use generates a great amount of data.

We’ve looked at how social science researchers are using social media data, in order to identify the challenges they face. It’s a field with an abundance of tools, full of experts and, thankfully, some training and events to broaden the net. We’ve outlined some of the how and who below. It’s a fast-moving space, without a lot of funding opportunities so far. This could change as big tech players get involved: LinkedIn relaunched its Economic Graph Research program in 2017, and Social Science One’s first program gives academic researchers access to huge amounts of data generated by Facebook.

How is social media data being used in research?

Researchers are collecting and analyzing social media data with a variety of methods, including sentiment and network analysis. Popular tools have provided insight into healthcare campaigns, politics and the media. Netvizz (300+ citations) was used to investigate hate speech and covert discrimination on Facebook. NodeXL enabled analysis of Twitter spammers’ behavior, and DMI-TCAT (100+ citations) was used to explore trends in political communication by analyzing the social media campaigns of Trump and Clinton in the 2016 US presidential election.

We’ve found and reviewed close to 100 software tools used by researchers working with social media data. There are tools to scrape data from Twitter, YouTube, Weibo, Facebook, Instagram and Flickr, and to analyze or visualize it, often requiring little to no programming experience. Some services are free and some charge a monthly subscription. Papers that cite the tools are also included.

Personal favourites: DocNow provides a collection of tools which respond to social media’s role in reporting historically significant events, or ‘documenting the now’. Prioritizing ethical collection, use and preservation of social media data in academic research, their crowd-sourced catalog of Tweet ID datasets and ‘rehydration’ tool allow access to historical tweets, whilst ensuring they are used in line with the creator’s decisions.
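To make the ‘rehydration’ idea concrete, here is a minimal sketch of the general pattern, not DocNow’s actual implementation. A shared dataset contains only tweet IDs, and the full tweets are looked up again at the time of use, so anything the creator has since deleted simply does not come back. The `lookup` function, the `fake_lookup` stand-in, the sample IDs and the batch size are all assumptions made for illustration; in practice `lookup` would wrap whichever platform API or tool you have access to.

```python
from typing import Callable, Dict, Iterator, List


def chunked(ids: List[str], size: int) -> Iterator[List[str]]:
    """Yield successive batches of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def rehydrate(
    tweet_ids: List[str],
    lookup: Callable[[List[str]], Dict[str, str]],
    batch_size: int = 100,  # illustrative value, not a documented limit
) -> Dict[str, str]:
    """Fetch the current content for each ID, batch by batch.

    `lookup` is a hypothetical function that returns only the tweets
    that are still publicly available, keyed by ID.
    """
    hydrated: Dict[str, str] = {}
    for batch in chunked(tweet_ids, batch_size):
        hydrated.update(lookup(batch))
    return hydrated


# Stand-in for a real service: tweet "1002" has been deleted by its author.
STILL_AVAILABLE = {"1001": "first tweet", "1003": "third tweet"}


def fake_lookup(batch: List[str]) -> Dict[str, str]:
    return {i: STILL_AVAILABLE[i] for i in batch if i in STILL_AVAILABLE}


shared_ids = ["1001", "1002", "1003"]
result = rehydrate(shared_ids, fake_lookup, batch_size=2)
missing = [i for i in shared_ids if i not in result]
```

Here `missing` lists the IDs that could not be rehydrated, which is worth reporting alongside any analysis since the dataset shrinks over time.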

Over four million YouTube videos are watched each minute in the USA. Webometric Analyst is a key tool if you want insight into those users or videos. Features include building a network diagram of comments and replies or exporting comments and analyzing them with Mozdeh.
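For readers curious what a comment-and-reply network looks like as data, the sketch below builds one in plain Python. It is not how Webometric Analyst works internally; the comment records, the field names (`id`, `author`, `reply_to`) and the author names are invented for illustration. Each edge points from the person replying to the person being replied to, weighted by the number of replies.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical exported comments: each may reply to an earlier comment.
comments: List[Dict[str, Optional[str]]] = [
    {"id": "c1", "author": "ana", "reply_to": None},
    {"id": "c2", "author": "ben", "reply_to": "c1"},
    {"id": "c3", "author": "cat", "reply_to": "c1"},
    {"id": "c4", "author": "ana", "reply_to": "c2"},
]


def build_reply_network(
    comments: List[Dict[str, Optional[str]]],
) -> Dict[Tuple[str, str], int]:
    """Return weighted directed edges (replier, replied_to) -> reply count."""
    author_of = {c["id"]: c["author"] for c in comments}
    edges: Dict[Tuple[str, str], int] = defaultdict(int)
    for comment in comments:
        parent = comment["reply_to"]
        # Skip top-level comments and replies whose parent is not in the export.
        if parent is None or parent not in author_of:
            continue
        edges[(comment["author"], author_of[parent])] += 1
    return dict(edges)


def replies_received(edges: Dict[Tuple[str, str], int]) -> Dict[str, int]:
    """Weighted in-degree: how many replies each author attracted."""
    totals: Dict[str, int] = defaultdict(int)
    for (_, target), weight in edges.items():
        totals[target] += weight
    return dict(totals)


network = build_reply_network(comments)
attention = replies_received(network)
```

An edge list like `network` can be loaded into most network visualization tools, and `attention` gives a first rough sense of who is driving the conversation.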

Whilst it remains a prototype, MISNIS managed to bypass Twitter API restrictions and collect 80% of flowing Portuguese-language tweets. It can also retrieve tweets based on their topic, rather than hashtags or keywords. And it works for researchers without programming expertise too!

Filter the table to see which tool works for you.


Despite its potential for insight, social media data comes with challenges. Twitter’s 2015 API update restricts free access to only 1% of tweets published in the last seven days. Facebook have closed their doors further, preventing third-party apps from accessing user IDs for public pages.

Once you’ve got the data, it is often unstructured and noisy. Data quality can be difficult to assess since it is often user-generated, or combines self-reported and behavioral traces. Users’ online behavior doesn’t necessarily represent their offline beliefs, especially as organizations, public figures, bots and influencers populate social media and can bias datasets. Because new data arrives continuously (Twitter receives 500 million tweets in a day), a complete dataset is never available. What’s more, the ways we use social media and the popularity of platforms change rapidly.
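One simple way to see whether a handful of prolific accounts dominates a dataset is to measure each account’s share of posts. The sketch below is an illustration only, with invented sample posts and account names; capping posts per account is one possible response among many, not a recommended standard, and the cap value is arbitrary.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical (author, text) pairs; "newsbot" posts far more than anyone else.
posts: List[Tuple[str, str]] = [
    ("newsbot", "a"), ("newsbot", "b"), ("newsbot", "c"), ("newsbot", "d"),
    ("ana", "e"), ("ben", "f"),
]


def account_shares(posts: List[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of all posts contributed by each account, largest first."""
    counts = Counter(author for author, _ in posts)
    total = sum(counts.values())
    return {author: n / total for author, n in counts.most_common()}


def cap_per_account(
    posts: List[Tuple[str, str]], max_posts: int
) -> List[Tuple[str, str]]:
    """Keep at most `max_posts` posts per account, preserving original order."""
    seen: Counter = Counter()
    kept = []
    for author, text in posts:
        if seen[author] < max_posts:
            kept.append((author, text))
            seen[author] += 1
    return kept


shares = account_shares(posts)
capped = cap_per_account(posts, max_posts=2)
```

In this toy sample a single account contributes two thirds of the posts, so any aggregate statistic would mostly describe that account rather than the wider user base.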

If you’re working with social media data or any of these tools, we’d love to hear the good bits and the bad. Or if we’ve missed something, fill in the form and let us know.

And if you are just starting out with data collection, Jason Radford from Volunteer Science has written a fantastic intro to this space, with tips.

Who is working with social media data?

Until 2012, 38% of Twitter research came from the field of computer science, 21% from information science, and 14% from communications. Yet recent years have seen social media data research across a wide array of disciplines, with the arrival of journals like Big Data and Society (2014) and Social Media + Society (2015) showcasing its use in the social sciences. Labs and research groups in Europe and North America highlight its place in social science research too:

The Social Media Lab at Ryerson University works to further our understanding of the benefits and downsides of social media use. Recent projects have explored social media as a place for education outside the classroom and journalists’ use of social media to infer public opinion. Their social media research toolkit lists tools curated by researchers at the Lab.

Digital Methods Initiative is an internet studies research group based in Amsterdam. It devises tools and methods that make online platforms like Twitter and Facebook usable for social and political research. The initiative is behind popular tools Netvizz, Issue Crawler and DMI-TCAT.

Wasim Ahmed is a social media specialist based at the University of Northumbria. His research focuses on social media analytics and public health. He has written an overview of tools and is hosting a training day for Social Network Analysis and NodeXL in August.

Visual Social Media Lab at the University of Sheffield analyses social media images and online visual cultures. Researcher Ray Dranville used Pulsar Platform to gather data for his research into the intersections between iconography and social media images.

Social Data Lab at Cardiff University is ESRC funded and works to democratize access to big social data across sectors. They developed and continue to support COSMOS, which provides ethical access to social media data for social science researchers.

How can you stay on top of developments and research?

There is a growing number of events and conferences which present social media data research. As well as opportunities from Social Science One and LinkedIn’s Economic Graph Research Program, social media-specific tasks at the Semantic Evaluation Workshop 2019 show the increasing interest in these datasets. The CodaLab competitions HatEval and OffensEval challenge entrants to identify offensive posts and hate speech on Twitter.

Meet those who are pushing the boundaries in social media research at the 10th International Conference on Social Media & Society, taking place in Toronto in July this year. Keynote speaker Tarleton Gillespie will present Custodians of the Internet: Platforms, Content Moderation, and the Hidden Decisions That Shape Social Media. No doubt these hidden decisions will shape social media datasets and the validity of their use in research.

The International Network for Social Network Analysis conference (Sunbelt) is taking place in Montreal in June. Sunday 23rd June will see the Cyber & Social Media Networks Session, but papers using social media data will be presented across the five days.

Oslo hosts the International Conference on Data Mining and Social Media Analysis in June too. Papers range from Language Borrowing among Algerian University Students Using Online Facebook Conversations to Real Time Classification of Political Tendency of Twitter Users.

We will be at the International AAAI Conference on Web and Social Media (ICWSM), which is taking place in Munich, Germany, on June 11-14. It brings together diverse research with the common theme of online social media. You can get a handle on social media tools for your own research with SAGE Campus’ new Social Media Methods course. Come and see us at the stand to receive a 25% discount.


Lily Davies is a master’s student from the Digital Humanities programme at UCL and is working with the SAGE Ocean team. Her current academic research focuses on civic technology tackling homelessness in the UK.

Want to learn more about using social media data? Upskill yourself or a group of researchers with SAGE Campus’ Collecting Social Media Data online course.