Tools & Technology

Who’s disrupting transcription in academia?

Transcribing is a pain. Recent progress in speech recognition software has helped, but it is still a challenge. Furthermore, how can you be sure that your person-identifiable interview data is not going to be listened to and transcribed by someone who wasn't named on your consent forms? The bigger disruptor is the ability to annotate video and audio files.

The five pitfalls of document labeling - and how to avoid them

Whether you call it ‘content analysis’, ‘textual data labeling’, ‘hand-coding’, or ‘tagging’, a lot more researchers and data science teams are starting up annotation projects these days. Many want human judgment labeled onto text to train AI (via supervised machine learning approaches). Others have tried automated text analysis and found it wanting. Now they’re looking for ways to analyze text that are easier to interpret and explain.
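One routine check in any hand-coding project is inter-annotator agreement: whether two coders, given the same documents, assign the same labels more often than chance would predict. As a minimal sketch (the coders, labels, and documents here are invented for illustration), Cohen's kappa can be computed in plain Python:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters assigned the same label.
    po = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement, from each rater's marginal label distribution.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    pe = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (po - pe) / (1 - pe)

# Hypothetical example: two coders tag ten text snippets as 'pos' or 'neg'.
coder_1 = ['pos', 'pos', 'neg', 'pos', 'neg', 'neg', 'pos', 'neg', 'pos', 'pos']
coder_2 = ['pos', 'neg', 'neg', 'pos', 'neg', 'pos', 'pos', 'neg', 'pos', 'pos']
print(round(cohens_kappa(coder_1, coder_2), 3))  # 0.583
```

Raw percent agreement here is 80%, but kappa corrects for the agreement the coders would reach by chance given how often each uses each label, which is why it is the more common figure to report.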

SMaPP-Global: An interview with Josh Tucker and Pablo Barbera

In April this year a special collection examining social media and politics was published in SAGE Open. Guest edited by Joshua A. Tucker and Pablo Barberá, the articles grew out of a series of conferences held by NYU’s Social Media and Political Participation lab (SMaPP) and the NYU Global Institute for Advanced Study (GIAS) known as SMaPP-Global. Upon publication Joshua Tucker said ‘the collection of articles also shows the value of exposing researchers from a variety of disciplines with similar substantive interests to each other's work at regular intervals’. Interdisciplinary collaborative research projects are a cornerstone of what makes computational social science such an interesting field. We were intrigued, so we caught up with Josh and Pablo to hear more.

No more tradeoffs: The era of big data content analysis has come

For centuries, being a scientist has meant learning to live with limited data. People only share so much on a survey form. Experiments don’t account for all the conditions of real world situations. Field research and interviews can only be generalized so far. Network analyses don’t tell us everything we want to know about the ties among people. And text/content/document analysis methods allow us to dive deep into a small set of documents, or they give us a shallow understanding of a larger archive. Never both. So far, the truly great scientists have had to apply many of these approaches to help us better see the world through their kaleidoscope of imperfect lenses.

Instead of seeing criticisms of AI as a threat to innovation, can we see them as a strength?

At CogX, the Festival of AI and Emergent Technology, two icons appeared over and over across the King’s Cross location. The first was the logo for the festival itself, an icon of a brain with lobes made up of wires. The second was for the 2030 UN Sustainable Development Goals (SDGs), a partner of the festival. The SDG icon is a circle split into 17 differently colored segments, each representing one of the goals for 2030—aims like zero hunger and no poverty. The idea behind this partnership was to encourage participants of CogX—speakers, presenters, expo attendees—to think about how their products and innovations could be used to help achieve these SDGs.

2018 Concept Grant winners: An interview with MiniVan

Following the launch of the SAGE Ocean initiative in February 2018, the inaugural winners of the SAGE Concept Grant program were announced in March of the same year. As we build up to this year’s winner announcement we’ve caught up with the three winners from 2018 to see what they’ve been up to and how the seed funding has helped in the development of their tools.

In this post we chatted to MiniVan, a project of the Public Data Lab.

Social media data in research: a review of the current landscape

Social media has brought about rapid change in society, from our social interactions and complaint systems to our elections and media outlets. It is increasingly used by individuals and organizations in both the public and private sectors. Over 30% of the world’s population is on social media, and we spend much of our waking hours attached to our devices: every minute in the US, 2.1 million snaps are created and 1 million people log in to Facebook. With all this use comes a great amount of data.

2018 Concept Grant winners: An interview with Ken Benoit from Quanteda

We catch up with Ken Benoit, who developed Quanteda, a large R package originally designed for the quantitative analysis of textual data, from which the name is derived. In 2018, Quanteda received $35,000 of seed funding as inaugural winners of the SAGE Concept Grants program. We find out what challenges Ken faced and how the funding helped in the development of the package.

2018 SAGE Concept Grant winners: An interview with the Digital DNA Toolbox team

Following the launch of the SAGE Ocean initiative in February 2018, the inaugural winners of the SAGE Concept Grant program were announced in March of the same year. As we build up to this year’s winner announcement we’ve caught up with the three winners from 2018 to see what they’ve been up to and how the seed funding has helped in the development of their tools.

In this post, we spoke with the Digital DNA Toolbox (DDNA) winners, Stefano Cresci and Maurizio Tesconi, about their initial idea, the challenges they faced along the way, and the future of tools for social science research.

How researchers around the world are making use of Weibo data

Zoufan posted her last words on Weibo on 18 March 2012. She was suffering from major depressive disorder and, shortly after, died by suicide. Weibo is a microblogging platform launched by Sina Corporation in 2009, built on user relationships as a way to share, disseminate, and receive information. In essence, it is similar to Twitter, although it has a number of additional capabilities. The app has more than 400 million users (compared to Twitter’s 300 million) and features that enable the study of emotional states and responses to the topics being discussed or spread across the web.

Collecting social media data for research

Human social behavior has rapidly shifted to digital media services, whether Gmail for email, Skype for phone calls, Twitter and Facebook for micro-blogging, or WhatsApp and SMS for private messaging. This digitalization of social life offers researchers an unprecedented world of data with which to study human life and social systems. However, accessing this data has become increasingly difficult.

Matchmaking tools: Augmenting the relationship between research and industry

On a Friday evening in 1922, you could turn on the radio in Schenectady NY and hear Hermann Briggs talking about the latest research and discoveries around common diseases and illnesses. Radio, and later TV, were the most exciting and widest-reaching media platforms through which research knowledge could be shared with the public.

Today, researchers have access to a whole host of media (podcasts, YouTube channels, Ted Talks, etc.) to talk about their research and how it can be fun or useful for the public.

Artificial intelligence for entity resolution with Jeff Jonas

Jeff Jonas, the world’s foremost expert on entity resolution and the inventor of the original NORA (Non-Obvious Relationship Awareness) technology developed for Las Vegas casinos, brings entity resolution to life. This unique technology (an IBM spin-out) is a purpose-built, real-time AI for delivering human-quality entity resolution – determining “who is who” and “who is related to whom” – without training, tuning, or experts.

Watch the technology in action and see how it works.

Event roundup: Future or fad? VR in social science research

At the end of February we ran a most enthralling event. Three panelists, two hosts and about 20 attendees all put their headsets on from their labs, offices and homes to join a virtual classroom decorated with trees, a castle, a slightly scary tiger and a hippo, to talk about the future of VR in social science research.

Virtual reality headsets for testing and research

This blog post outlines what headsets you can use for our next event.

There are currently three types of hardware for accessing visually and aurally immersive experiences: headsets that connect to your PC, headgear that works with your mobile phone, and standalone devices. Besides varying in price, they also differ in their capabilities and are hence intended for different use cases.