Humans broke the internet, understanding them better might help fix it

By Timo Hannay

Here's a multiple-choice question: Is the internet (a) the most open, egalitarian and empowering means of communication ever devised, or (b) a dystopian nightmare populated by hucksters, trolls and miscellaneous abusers of human rights? The answer is, of course, (c) all of the above and much else besides. This stark contrast between the internet's light and dark sides has become a defining characteristic of the digital age, but is not an inevitable consequence of the mostly innocuous technologies on which it's built. Rather, it is the product of their bewilderingly diverse and eccentric user base – otherwise known as humanity.

We are complex, unpredictable beings, which perhaps explains why it has turned out to be so amazingly hard to realise the internet's potential benefits without also succumbing to its pitfalls. In human hands, the internet's principal strengths – its sheer versatility and pervasiveness – have resulted in a huge number of unintended and unanticipated consequences. Many of these are welcome: the endless feast of words, images and sound catering for almost every conceivable taste; the sense of mass connectedness that comes from exchanging thoughts during a public spectacle; the explosion of creativity that has breathed life into old forms (such as the essay or the radio broadcast) and given flight to new ones (such as at the intersections between novels and games, or between research papers and databases).

Any such cornucopia of wonders might be expected to create a few adverse consequences, just as the rich world is battling with obesity now that food is so plentiful. Too much information and entertainment is surely better than too little, just as obesity is far less of a problem than its opposite. Yet the downsides of the internet have been much more surprising and profound than that analogy suggests. This great leveler of opportunity has resulted in some of the most overweening monopolies the world has ever seen. This medium of unfettered self-expression has delivered oppressive surveillance and invasions of privacy of a kind that were previously the sole preserve of police states. This enabler of a better informed, more engaged citizenry has instead come to be seen as subverting democracy itself.

How has it come to this? In part through over-optimism and complacency, not least among technologists like me. But mostly – and ironically – through a lack of information. This may seem like a perverse claim: surely we're drowning in the stuff. In a sense we are, but like the shipwrecked sailor dying of thirst at sea, we lack the right kind of knowledge to save ourselves. Specifically, we lack insights about how the internet is changing us and the societies in which we live.

The reasons for this are not hard to fathom. Unlike the information explosions in fields such as medicine or earth science, which have taken place largely in the public domain, the richest data sets on human activities are overwhelmingly owned and controlled by commercial companies (along with certain, usually secretive, government agencies). This means that they have been mostly unavailable to the very domain experts who are best able to analyse and interpret what's going on – and ultimately to help the rest of us understand this too. It's as if the Hubble Space Telescope were owned by Boeing, or the Large Hadron Collider by General Electric.

This is why we should welcome a new initiative described by Gary King and Nate Persily and implemented by the US Social Science Research Council, Facebook and a variety of other collaborating organisations. In short, it provides a means for academics to conduct independent research using Facebook data, while at the same time keeping the source information confidential and subjecting all activities to legal and ethical oversight.

Admittedly this setup has a number of shortcomings. For a start, it limits access to established academics, with decisions made by senior members of that same community. Politicians confronted with an uncomfortable result are bound to dismiss the researchers behind it as a cloistered, privileged elite with little knowledge of the real world. And since experts have sunk in public esteem by almost as much as policymakers, such objections may well resonate.

Another, more justified criticism is that we still lack ways for the subjects of the data to provide informed consent for its use. This is made all the more complicated by the fact that divulging information about yourself often also reveals information about other people with whom you're associated. (This isn't unique to online data; it also applies to genetic information, for example.) Such a Gordian knot of complex competing interests won't be untangled anytime soon because it requires not just a suitable legal framework but also public acceptance, which can only be achieved with years of real-world experience and open debate.

Yet, when you compare it with the other options, this model seems a lot more attractive. Are we to ban anyone from ever analysing this kind of data? Surely not. Are we to limit such work to employees of the companies concerned, or perhaps release the data to all-comers? We've tried both of those approaches and they didn't work out so well. The onus on critics of the proposed model, then, is not to point out its shortcomings, which certainly exist, but to suggest better alternatives.

Some might even go so far as to argue that we should keep academics out altogether, and that the correct response to big tech's overreach is legally imposed regulation. More effective privacy and competition laws are certainly required, but not as an alternative to deeper insights derived from the data. Indeed, legislation ought to be informed by such research. In any case, one look at Mark Zuckerberg's recent testimony before the US Congress should be enough to convince anyone that politicians don't have all the answers – most of them don't even appear to have particularly good questions.

It's also important to appreciate that this isn't mainly about data leaks or even individual privacy. Those are valid topics of investigation and debate, but in the grand scheme of things they are no more than ripples on the ocean surface. We also need to understand how societies function (or don't), and how technological and other developments are changing them for good or ill. These are the great tides whose forces, while less conspicuous, will be far more powerful in shaping our future.

Crucially, the principles of this initiative seem very broadly applicable. While its first implementation has taken the form of an alliance between social scientists and Facebook, this isn't just about social media or even big tech. For decades all sorts of organisations – from banks and retailers to telecommunications companies and government agencies – have been gathering information about our incomes, spending habits, opinions, social activities, movements and much else besides. It is often said that big technology firms' data-harvesting activities are unprecedented, but this is really only true in terms of their brazenness. In a sense they may have done us an unintended favour by lifting the lid on a long-running covert trend that significantly predates them and continues to extend well beyond their organisations.

So let this be the first of many such initiatives. Your move, Google – also Microsoft, Amazon, Vodafone, HSBC, Walmart and governments everywhere.

Timo Hannay is the founder of SchoolDash and a non-executive director of SAGE Publishing. The views expressed here are his own.


SSRC launches Social Data Initiative & Facebook provides academics with access to data

Last week marked a milestone for social science and industry partnerships, with Facebook announcing an initiative to give scholars access to its data in order to help them assess social media’s impact on elections.

The move, which sees the tech giant partnering with the Social Science Research Council and seven major nonprofit foundations, has been largely welcomed by the research community as a positive step towards enabling academic research and establishing regulation. However, some ethicists have, understandably, expressed concern around privacy and consent issues.

At the same time as the Facebook announcement, the SSRC announced the formation of a new Social Data Initiative to “examine the problems, explore questions about the responsible use of social network data, and generate insights to inform solutions”. It is a welcome and much-needed response to the revelations around the misuse of Facebook data.

The Facebook initiative is the first to utilise A New Model for Industry-Academic Partnerships, devised by Gary King and Nate Persily as a way to make industry data available to social science researchers via an independent, transparent peer-review process.

Through this model, Facebook and the initiative’s funders will select an independent commission of trusted academics, which will be overseen by the SSRC. This commission will then receive access to Facebook data and use this to identify appropriate research questions within a general topic area (in this case the impact of social media on democracy). Once the questions have been agreed, the commission will announce an open grant competition, inviting independent academics to apply for funding and (privacy-preserving) data access. Proposals will go through a peer review process, conducted by a subcommittee of the commission. Researchers who are awarded funding and data access will then be free to publish their findings as they see fit, without Facebook’s pre-approval.

Funding will be provided by a range of politically diverse US-based foundations: the Alfred P. Sloan Foundation, the Charles Koch Foundation, the Democracy Fund, the John S. and James L. Knight Foundation, the Laura and John Arnold Foundation, the Omidyar Network, and the William and Flora Hewlett Foundation.

Beyond social media

King & Persily’s model offers an innovative approach to a major problem, for industry and academia alike. For technology companies, it provides an opportunity to engage with social scientists and embed informed social objectives within business decisions, ultimately improving governance and social outcomes.

For social scientists, it could be the catalyst for a new era of social research. Whilst there has been significant appetite within the social science community to engage with big data research for some time, gaining access to the right kind of data has proven a consistent barrier.

And it’s not just social media and technology companies that hold vast quantities of human data. Organisations of all kinds have been collecting and storing information about groups and individuals for years. From CCTV footage to credit card records, this data holds great potential to advance scientific discovery and improve our understanding of society. Of course, making this kind of data available carries significant risks and challenges, from the protection of individual privacy and proprietary company information to ensuring the independence of the scientific process. King and Persily’s proposal might not be a perfect solution, but it’s certainly a welcome step in the right direction.

Next steps

Beyond the Facebook partnership, the formation of the Social Data Initiative should be seen as an important milestone for social science research and industry-academic partnerships in general. But there are many more opportunities to improve support for social scientists working with big data.

In addition to programs like this, we believe that collaborations between social scientists and technologists on an individual level can also provide substantial opportunities for furthering social research. To this end, we are planning a number of events in 2018/19 to bring together people from academia, tech and policy to help form new collaborations and ensure a continued conversation around the future of social science. This includes working with Facebook and O’Reilly to run the second Social Science Foo Camp in 2019.

We also believe it is vital that social scientists are equipped with the necessary skills and tools to answer these new types of research questions. SAGE Ocean is developing a range of resources to support researchers and help the social science community make the most out of the opportunities big data holds.

Timing is everything

To quote Gary King: 

"One might reasonably wonder whether now is, in fact, the time to discuss a data sharing program between internet companies and academics... [but] now is precisely the time to have this conversation and to set up structures that protect users’ privacy while allowing independent academic analysis of social media data."

We couldn't agree more.

Have an opinion? Let us know your thoughts. We’re launching a new “Ask the experts” blog series to showcase your viewpoints on the latest news in social science and big data research.

Our first question is: “What impact do you think Facebook’s “data breach” will have on academic research and academic researchers?”

Send your one to two paragraph response to by the 23rd April for the opportunity to be featured.

Quanteda Studio, MiniVAN and Digital DNA Toolbox awarded Concept Grants

We are delighted to announce the first three winners of our Concept Grants program.

Each has been awarded $35k to develop their ideas and help more social scientists to work with big data. 

The winners are:


Quanteda Studio is designed to be a powerful, flexible, and user-friendly text analytic software tool that requires no programming experience to use and will run as a web application. “Quanteda” is short for the quantitative analysis of textual data, and this new application will be built on the power of the open-source quanteda R package for processing and analyzing text.

Text mining and text analytics have exploded in recent years. Technically able data scientists have a wealth of sophisticated tools for mining information from the troves of available textual data, in the form of computer programming languages and software libraries written for those environments. The downside of this sophistication, however, is that users with no programming experience in R, Python, or Java have no access to these tools.
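To give a sense of the programming step that such tools currently assume, here is a minimal Python sketch of one basic text-analysis task: counting how often each word appears in each document, a table often called a document-feature matrix. It is an illustration only and not the quanteda API (quanteda is an R package); the function names `tokenize` and `document_feature_matrix` and the sample documents are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def document_feature_matrix(docs):
    """Count word occurrences per document.

    `docs` maps a document name to its text. Returns the sorted list of
    features (distinct words) and, for each document, a row of counts
    aligned with that list.
    """
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    features = sorted(set().union(*counts.values()))
    matrix = {name: [c[f] for f in features] for name, c in counts.items()}
    return features, matrix


# Hypothetical sample documents, for illustration only.
docs = {
    "d1": "Big data, big questions",
    "d2": "Data about data",
}
features, matrix = document_feature_matrix(docs)
print(features)
for name, row in matrix.items():
    print(name, row)
```

Even a task this small requires writing and running code, which is the barrier a graphical interface is meant to remove.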

Quanteda Studio is being developed by Kenneth Benoit from the London School of Economics, the creator of quanteda and an expert in applications and methods of text analysis for the social sciences. This new tool will make the power of quanteda widely accessible, enabling social scientists to access and use the package’s text analytics capabilities through a graphical user interface that requires no programming.

Kenneth Benoit said:

“I’m delighted to have been awarded this seed grant to develop a prototype, and delighted to be working with such an experienced and innovative partner of academic research and publishing as Sage.”


MiniVAN will be an easy-to-use tool that will support non-specialist social scientists in the visual analysis of their networks and in the online publication of their results.

Networks are becoming increasingly popular in the social sciences as interfaces for exploratory data analysis. The "Visual Analysis of Networks" (VAN) allows academics to explore large relational datasets without having to deal with the full complexity of graph mathematics. A key barrier remains, however, for the adoption of this approach: current VAN tools are either too complicated or unable to handle the growing size of the datasets that are typical in the digital social sciences.

MiniVAN aims to solve this problem by providing a tool for the visual analysis of networks that is accessible to academics with little knowledge of mathematics or coding and yet able to scale up to output graphs containing hundreds of thousands of nodes.

MiniVAN is being developed by Tommaso Venturini, Jonathan Gray and Guillaume Pique from the Public Data Lab (PDL), a European network of researchers which seeks to facilitate research, democratic engagement and public debate around the future of the data society. SAGE Publishing partnered with the Institute for Policy Research at the University of Bath to support the establishment of the Public Data Lab in 2017.

The MiniVAN project will draw on the team’s previous open source projects, including Gephi, Sigmajs and Graphology, and will form part of this ecosystem of tools. In line with the Public Data Lab’s spirit of openness, the PDL is seeking to develop MiniVAN in collaboration with the digital social science community. If you have any ideas or needs for this tool, please get in touch via

Tommaso Venturini said:

“The Public Data Lab is honoured to receive the SAGE Ocean Concept Grant. Such funding provides a unique opportunity to extend our research on Visual Network Analysis and to deliver a tool that will help other social scientists to experiment with this technique.

The PDL believes in the active intervention of social scientists in the future of data society. Digital technologies are not just objects of study that we observe from the outside, but sociotechnical devices that we should investigate critically and repurpose creatively in order to facilitate social research and promote political participation.

SAGE's support will help us to open a discussion with scholars interested in network analysis and to develop an open-source tool that complies with their needs and wishes.”


The Digital DNA Toolbox will use bioinformatics techniques to provide researchers with a set of cutting-edge tools that can be used for many things, including assessing the veracity, trustworthiness, and reliability of content (and content producers) in online social networks and beyond.

Issues related to the diffusion of fake news, rumors and hoaxes, as well as the diffusion of malware and viruses in online social networks, have become so important as to transcend the virtual ecosystem and interfere with our businesses and societies. Currently, we are unable to deal with these issues effectively. However, recent advances in theoretical data science, as well as the development of big data systems capable of processing the huge volume of online social network data, give us an unprecedented opportunity to tackle these critical and multidisciplinary issues.

The Digital DNA Toolbox will provide a novel approach to modeling online user behavior by extracting and analyzing DNA-inspired sequences from users’ online actions. These well-known DNA analysis techniques can then be used to discriminate between legitimate and malicious accounts.
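As a rough illustration of the idea, the Python sketch below encodes each account's actions as a string of "bases" and compares two accounts by the longest run of actions they share. It rests on assumptions that are not from the announcement: the three-letter alphabet, the action names, and the choice of longest common substring as the similarity measure are all hypothetical, and this is not the toolbox's actual interface.

```python
from typing import Iterable

# Hypothetical alphabet: each type of online action maps to one "base".
ACTION_TO_BASE = {"tweet": "A", "reply": "C", "retweet": "T"}


def encode_actions(actions: Iterable[str]) -> str:
    """Turn a chronological list of action types into a DNA-like string."""
    return "".join(ACTION_TO_BASE[action] for action in actions)


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest contiguous run of bases shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the run that ended at the previous base of each string.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


# Two hypothetical accounts with partly identical behaviour.
account_1 = encode_actions(["tweet", "reply", "retweet", "retweet", "tweet", "reply"])
account_2 = encode_actions(["retweet", "reply", "retweet", "retweet", "tweet", "tweet"])
print(account_1, account_2, longest_common_substring(account_1, account_2))
```

Under these assumptions, accounts that share unusually long runs of identical behaviour could be flagged for closer inspection; where to set such a threshold is a separate question that this sketch does not address.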

DDNA is being developed by Stefano Cresci and Maurizio Tesconi from the Institute for Informatics and Telematics, Italian National Research Council.

Stefano Cresci said:

"We are very excited about the possibility to develop the digital DNA toolbox, thanks to the prestigious award of the SAGE Ocean Concept Grant. We firmly believe that current problems related to the assessment of credibility and reliability of content (and content producers) require a multidisciplinary approach. To this end, this funding will contribute to bridging the gap between big data and social scientists, empowering the latter with state-of-the-art algorithms and analysis techniques that would otherwise be confined within the computer science community. We look forward to working together with SAGE and other social scientists in order to deliver efficient, easy-to-use tools, and to make an impact on our society."

SAGE Ocean will be awarding Concept Grants again in 2019. To stay up to date with the latest news and ensure you receive the next call for applications, subscribe to the Big Data Newsletter.

SAGE Ocean Speaker Series: Kimberly A. Houser

We were extremely lucky to kick off the first edition of the SAGE Ocean Speaker Series last week with a talk from tech attorney and social media law professor, Kimberly A. Houser.

Kimberly presented her research into the use of big data by the IRS, sparking lively debate among the audience about the legality of such practices, surveillance and privacy, and machine learning and "black box" algorithms. There was also much discussion around the GDPR and its potential impact when it comes into force on the 25th May, another of Kimberly's current research topics.

It was a fantastic start to the series and we look forward to hearing from more social science, big data and technology experts over the coming months.

You can find out more about Kimberly's research into the IRS here. 


Ignorance and Interdisciplinary Work: Field Notes from the Social Science Foo Camp

By Tom Kecskemethy, Executive Director of the American Academy of Political and Social Science

The first-ever “Social Science FOO Camp” was held a couple of weeks ago at the headquarters of Facebook in Menlo Park, California. The weekend camp was a hard-to-describe “un-conference,” but for the uninitiated, picture about 250 academics, corporate data analysts, big data enthusiasts, science communicators, and science and policy wonks from the public and private sectors gathering to discuss and develop good ideas. To foster unfettered conversation and candidness, the ideas generated at the FOO are free to circulate after camp, but not the identities of those who advanced them.

As far as I can tell, the organizers’ theory of action begins and ends at something like this: good stuff is going to happen if you physically gather a bunch of people with different perspectives on data, science, and social progress, and turn them loose to talk about whatever they want. There have been FOOs since 2003 on a variety of technology- and future-related subjects, so the organizers – in this case, Facebook, O’Reilly Media and SAGE Publishing – are likely pretty confident in that theory of action (FOO, by the way, stands for “Friends of [Tim] O’Reilly,” referring to the CEO of O’Reilly Media, the progenitor of the concept). My experience at the FOO camp for the social sciences was quite positive, and an interesting kind of perspective on the work of the social science community settled on me over the course of the event. I’m still working to put my finger on the specifics, but the general idea goes as follows.

Foremost, it’s always good to be reminded of the limitations of your own perspective.  At the FOO camp, conversations ran the gamut from discussion of bias in the application of algorithms to social interventions, to figuring out how to mine ubiquitous administrative data while maintaining personal privacy, to improving the reproducibility of research, to sussing out the ways in which social theory can guide analytics in the digital age.

The work was productive and invigorating, and a principal reason for this (at least in my view) was precisely that the mix of people was so eclectic – no single person could possibly know even a majority of the others who were there. There was no such thing as a unified worldview that tied the participants together, or a shared epistemology that gave everyone a common point of reference. This diversity included many of the traditional markers, but also one often overlooked at more formal gatherings: having these conversations were very senior and seasoned “old heads,” mid-career scientists and scholars, and junior academics and analysts. All told, the group comprised an extraordinary wealth of knowledge – and perspectives – that no single participant could have.

Sitting through it, Arthur Lupia’s book about science and political communication, Uninformed, came to mind. The book starts by forcing readers to confront their own ignorance: “[w]hen it comes to political information, there’re two groups of people. One group understands that they are almost completely ignorant of almost every detail of almost every law and policy under which they live. The other group is delusional about how much they know. There is no third group.” 

Lupia isn’t excoriating readers about their own ignorance, but reminding educators and architects of knowledge to constantly take stock of how little we know, and to focus on the most useful questions and projects that can be applied to important problems at hand. Discipline-based knowledge is good to have, but understanding what to do with that knowledge – making knowledge both useful and used – is difficult work and inherently interdisciplinary. We social scientists, as it turns out, know surprisingly little about how to get it done.

In this day and age, the broader social science community is deeply interested in (read: “full of foreboding about”) things like the future of academic knowledge, integrity of government statistics, collection and use of non-government data and the extent to which the public / political apparatus values knowledge about individual and social behavior. Accordingly, it’s appropriate for those of us working at the intersection of social science and public affairs to spend time identifying, defining, and teasing out such problems and challenges, and some of the topics that surfaced for discussion at the FOO camp reflected the general anxiety.  Sessions included topics like “social media echo chambers,” “dealing with fake news,” “making universities more useful to democracy” and “can data-driven government policy actually happen?”

Yet FOO made it crystal clear to me that there are better things to do with one’s time than spend the majority of it defining what’s wrong with the world. There are plenty of opportunities for action, and the camp reinforced that the work of the academic disciplines should not end at explanation and “problematization.” To the extent that we are all concerned with the advancement of the human condition, as a community of science educators we must go beyond explaining challenges and work to develop accessible and usable solutions. This means applying the disciplines of the mind that we value so dearly to the challenges of science use.

At the very first session of the FOO camp, I found myself in a conversation with someone in private industry who was working with various colleagues to figure out ways to more productively connect people online. I was struck by the extent to which this person’s work lacked a profit motive, and was deeply thought-through, collaboratively handled, and solicitous of points of view from outside their company. I remember thinking that the social science community does itself a disservice by not engaging private partners in our work, and that the private sector suffers as well if it does not attend to the economic, cultural, and demographic challenges of the world, working in partnership with academic institutions to make its products better. Making public / private sector interplay more the rule than the exception – not cordoned off to events like a FOO camp – is a sociological challenge that we need to address systematically and intentionally. I’d like to believe that we are a creative and resilient community, with more than enough capacity to find our way.

SAGE Ocean Speaker Series #1 - Big Data & The IRS: The End of Privacy

We're excited to announce the launch of the SAGE Ocean Speaker Series!

We'll be hosting a series of free events throughout 2018 to explore the intersection of social science, big data and technology. 

Our first event will take place next week in London, featuring tech attorney and social media law professor, Kimberly A. Houser.

Kimberly will be talking about her research into the US tax authority’s use of big data, the resulting privacy issues and the potential violations of US federal law.

Call for Speakers!

If you're interested in speaking at one of our future events and engaging with the social science, publishing and tech communities in London, we'd love to hear from you.

Please send us a message and we'll get back to you with more information.

Bev Skeggs on social media siloing

"Basically 90 percent of Facebook profit is made from advertising – selling your data to advertising companies so that they can place an advert on your browser..." says Bev Skeggs in a new interview with Social Science Bites. Bev Skeggs joins the podcast to discuss new findings from her research into how social networks structure or restructure friendships.

What aspiring data scientists are looking for in hiring companies

"Positions in data science require a unique set of job skills that many professionals simply don’t possess. The level of programming knowledge, understanding of statistics and business sense make for a difficult position to fill. Because of this, many businesses find it difficult to hire appropriately for the position of data scientist." Kayla Matthews gives pointers on how companies looking for data scientists can stand out in this demanding market for data engineers.

Sandy Pentland on social physics

"Alex 'Sandy' Pentland tells interviewer Dave Edmonds about the origins of social physics in the days before widespread good data and solid statistical methods and explains how it blossomed both as a field and in Pentland’s own research. Full interview on Social Science Space"