Learning data science from a sociological background

By Maria Holcekova, PhD student in Sociological Research 

A sociological background is extremely valuable for a data scientist trying to answer research questions that matter for contemporary society. The work goes beyond programming skills or simply applying algorithms to data. It requires knowledge of the theoretical concepts behind the issue at hand, as well as an understanding of the suitability and limitations of the data and analysis techniques used. Restructuring the traditional methods of quantitative social research, and up-skilling both well-established and budding sociologists, will go a long way towards gaining new insights into existing research questions and developing new ones.


Like most quantitative social scientists, I started my career in a more traditional domain, survey research, but with the dawn of big data and advanced computing, I have been keeping up with the opportunities in data science. I am at the beginning of my career as a researcher in both statistics and data science, which allows me to gain and combine skills from both areas. I started off with an MSc in Survey Methods for Social Research at the University of Essex, where I gained a solid grounding in social statistics, both in theory and in practice. I have been building on these skills ever since and expanding into data science, primarily to satisfy my own curiosity but also to use the most effective research methods available. I have already applied data science methods such as machine learning and natural language processing to the analysis of survey data, and the growing availability of big data in the social sciences makes for very exciting times ahead.

For me, big data is a fairly new frontier that will take some time to exploit. This is primarily because the skills and analytical thinking it requires are somewhat different from those of traditional quantitative social research. Survey data are generally collected for a specific purpose, to study specific social phenomena, although there is still room for creative research questions. Big data, on the other hand, provide us with information before its purpose has necessarily been established. Some issues, such as sampling and data quality, will need to be resolved before big data can join the gold standard of quantitative social research methods, but the potential scale and precision with which we can study human behavior already exceed what was possible a decade ago.

More generally, data science allows us to use new techniques to answer longstanding questions, in pursuit of using social science to address emerging real-world problems and achieve positive social impact. Combined with the availability of big data, it could also open up new areas of social research, such as network analysis beyond household connections, longitudinal analyses without severe attrition problems, or digital behaviors that are unique to contemporary societies.

As for acquiring the skills needed to be a data scientist in sociology, I would say the best place to start is to look at what other people have been doing and build up a good understanding of the analytical methods:

Journal articles: Sociological Methods & Research, International Journal of Data Science and Analytics, American Journal of Epidemiology, SAGE Little Green Books.

Blogs and newsletters: AI Weekly, Data Machina, Reddit, SAGE Ocean, Data Science for Social Good.

Conferences: useR!, Women in Statistics and Data Science.

Courses: SAGE Campus, Mind Project, Datacamp.

I personally like to go outside my field to find new inspiration that I can apply to the social sciences; my particular go-to is biomedical research. To turn this knowledge into practical experience, it is essential to learn programming languages such as R and Python, both of which are increasingly used in academic research and sought after by industry.


Maria Holcekova is a PhD student in sociological research at the University of Essex, researching contemporary youth transitions from education into and within the labor market, with a particular focus on different forms of precarity.