By Daniela Duca, Product Manager, SAGE Ocean
Last week, a mix of PhD students, early-career researchers, and tenured academics met in Cologne to discuss their latest projects on bias and discrimination on social media, and on the algorithms underpinning many of the most pervasive services we use today.
Like at any other conference, we were handed our lanyards at registration, but unlike anything I’ve experienced before, these had an RFID tag in them: we had been invited to participate in a SocioPatterns study. This was a conference true to the core mission of computational social science. The organizers, GESIS, together with the CPT in Marseille and the ISI Foundation, were collecting data, and we were all data donors, ethically and voluntarily agreeing to share our interactions with fellow conference attendees and speakers.
Gender Bias Online: Social Media and Vision API
The symposium kicked off with a keynote from Christo Wilson, a professor of computer science at Northeastern University, who shared the results of an algorithm audit focused on gender bias. He and his team collected data from three recruitment search engines (Indeed, Monster, and CareerBuilder) on 35 job title searches in 20 cities across the US. He found no direct discrimination, which is to say there is no evidence that these search engines infer gender from the information a candidate shares, or rank candidates based on it. However, there is significant unfairness at the individual level, with men frequently appearing higher in search results than women.
Gender bias was a recurring theme throughout the two-day symposium. Elizaveta Sivak and Ivan Smirnov from HSE in Moscow found that parents are more likely to praise and share images of sons than daughters on social media (using openly available, unrestricted data from the Russian social network VKontakte). Leo Ferres and colleagues from the Chilean Universidad del Desarrollo used network data from Telefónica to examine differences between men and women in income levels and physical movement outdoors. With an estimated 68% of people living in cities by 2050, gender gaps in mobility will become a major problem. The study’s results are even more worrying: the lower a woman’s predicted income bracket, the less she moves, regardless of how close to public transport she lives.
Carsten Schwemmer from the University of Bamberg used the Google Vision API to identify any biases it may have when tagging images of male and female Members of Congress (MCs), and these turn out to be extensive. An image of a female MC is far more likely than one of a male MC to be tagged “hairstyle”, and is almost never tagged “public speaking”.
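A simple way to quantify this kind of tagging bias is to compare how often each label appears across images of each group. A minimal sketch of that comparison (the data, label names, and gender metadata below are made up for illustration, not drawn from the study’s dataset):

```python
from collections import Counter

def tag_rates(images):
    """images: list of (group, labels) pairs, e.g. labels returned by a vision API.

    Returns {group: {label: share of that group's images carrying the label}}.
    """
    counts, totals = {}, Counter()
    for group, labels in images:
        totals[group] += 1
        # use set() so a label repeated on one image counts once
        counts.setdefault(group, Counter()).update(set(labels))
    return {g: {lbl: n / totals[g] for lbl, n in c.items()}
            for g, c in counts.items()}

# hypothetical annotations for three images
imgs = [("F", ["hairstyle", "smile"]),
        ("F", ["hairstyle"]),
        ("M", ["public speaking"])]
rates = tag_rates(imgs)
```

Comparing `rates["F"]` against `rates["M"]` label by label then surfaces exactly the kind of gap the talk described.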
Political Engagement, Fake News and the Media
The themes of politics and society’s engagement with news and political views through the media and social platforms were at the core of many of the studies discussed and presented, both as papers and posters. Sandra González-Bailón spoke about news consumption patterns in the US, UK and Spain. Her mapping of these patterns shows how consumption in the US is fragmented (people are more likely to use a variety of sources), in the UK it is centralized (people are more likely to get news from one or two sources), and Spain sits somewhere in between. She aims to scale the study up to 23 more countries. Friedolin Merhout from Duke University used Google AdWords to study the relationship between anti-Muslim and pro-ISIS internet searches.
Irfan Chaudhry talked about racist and counter-racist discourse in comments on public Facebook pages in the Canadian context. The good news is that there is less overt racist discourse in Canada than expected. Reza Babaei proposed a new methodology for prioritizing claims for media fact-checkers, especially pertinent now that news propagates much faster through social media. On the spread of misinformation, Priya Kumar, on behalf of Melodie Yun-Ju Song and Anatoliy Gruzd from Ryerson University, talked about the overwhelming availability of anti-vaccine videos on YouTube: if you are already watching anti-vaccine videos, you are 69% more likely to be recommended further anti-vaccine videos than ones discussing the benefits of immunization.
In another YouTube study presented at the symposium, Max Mozes and his colleagues traced the sentiment paths of video blogs and measured differences attributable to the gender of the vlogger. Across 27,000 transcripts they found a few prominent clusters, almost like Kurt Vonnegut’s Shapes of Stories. The paper is available here.
Probably one of the most interesting talks, and at the same time one of the most disruptive for many of the researchers attending, was Jürgen Pfeffer’s short talk on the Twitter API. We all know that the Twitter API provides a sample of the data, either 1% or, if you’re lucky, 10%. Almost all academic work in this area relies on this vaguely described sampling method and presumes it is random. Jürgen and his colleagues were able to reverse engineer the API and show that the sampling is in fact time-based. He described the limitations this finding poses for Twitter studies, especially now that a variety of platforms allow you (or bots) to schedule tweets and ensure they don’t fall within the API’s time window.
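To make the finding concrete: tweet IDs are “Snowflake” IDs whose upper bits encode a millisecond timestamp, so a time-based sample amounts to selecting tweets created in a fixed slice of each second. A hedged sketch of that decoding, assuming the commonly reported 1% window of milliseconds 657–666 (the exact window is a reverse-engineered detail, not an official Twitter spec):

```python
TWITTER_EPOCH_MS = 1288834974657  # Twitter's Snowflake epoch

def tweet_timestamp_ms(tweet_id):
    # the upper 41 bits of a Snowflake ID are milliseconds since the epoch
    return (tweet_id >> 22) + TWITTER_EPOCH_MS

def in_sample_window(tweet_id, window=(657, 667)):
    """True if the tweet's creation millisecond falls in the half-open
    [657, 667) slice of its second -- the purported 1% sample window."""
    ms_within_second = tweet_timestamp_ms(tweet_id) % 1000
    return window[0] <= ms_within_second < window[1]
```

A scheduling bot that controls a tweet’s creation time to the millisecond could thus steer it into, or out of, the sample, which is exactly the manipulation risk the talk raised.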
Ethical AI and Algorithmic Discrimination
Sociologist Mona Sloane from New York University argued that ethics alone is not enough to counter bias and discrimination in AI. She urged us to look at intersectional inequality, definitions, epistemology, and how the social sciences can support this area, and to really think about what it means to be human. The closing keynote speaker on day one, data scientist Sara Hajian, touched on similar arguments and asserted that to make an algorithm fair we need to intervene at multiple points, anywhere from the data we feed the model to the algorithm’s output. (This is a great place to start.)
Emre Kıcıman from Microsoft Research closed the symposium with a continued discussion of the origins of algorithmic bias. He advised considering bias relative to the task and showed how even covert biases, embedded in our behavior, can be amplified by algorithms. Among current best practices, and echoing Sara’s recommendations, Emre noted that we need to understand the task, the stakeholders, and the potential harm of errors; the team’s composition; how the data was collected, its provenance and sampling method; and finally the models and their results.
SAGE sponsored the best poster awards, which went to:
Assessing the Bias of the New Facebook API (Justin Chun-Ting Ho from the University of Edinburgh, who managed to save some data before the Facebook restrictions came in)
Political Perceptions of Us vs. Them + Influence of Digital Media Usage (Laura Burbach, Martina Ziefle, and André Calero Valdez from RWTH Aachen University)
Ride with Me – Ethnic Discrimination, Social Markets + Sharing Economy (Jasper Tjaden, Carsten Schwemmer and Menusch Khadjavi)
As Product Manager, Daniela works on new products within SAGE Ocean, collaborating with startups to help them bring their tools to market. Before joining SAGE, she worked with student and researcher-led teams that developed new software tools and services, providing business planning and market development guidance and support. She designed and ran a 2-year programme offering innovation grants for researchers working with publishers on new software services to support the management of research data. She is also a visual artist, with experience in financial technology and has a PhD in innovation management.