Text Wash is awarded the 2019 SAGE Concept Grant

June 2019

SAGE’s Concept Grants program was developed as part of the SAGE Ocean initiative to fund innovative software solutions that help social science researchers work with big data and new technologies.

We’re delighted to announce that the 2019 SAGE Concept Grant has been awarded to Text Wash, a new software tool that anonymizes personally identifiable text data, making it accessible to social scientists without compromising its usability for research.

Text Wash is being developed by Dr Bennett Kleinberg, Maximilian Mozes and Dr Toby Davies from the Department of Security and Crime Science at University College London, UK. The Concept Grant will enable the team to get the tool off the ground and promote ethical and intelligent data sharing practices.

Data sharing is one of the main impediments to truly relevant computational social science research. Our aim is to unlock the potential of hard-to-access data—such as police reports or patient interviews—as a means of addressing important societal challenges.
— Bennett Kleinberg, Assistant Professor in Data Science at the Department of Security and Crime Science at UCL

When it comes to doing research with text data, many datasets are protected by ethics board restrictions (e.g. interviews, crowdsourced texts) and by wider data protection frameworks such as GDPR (e.g. police reports, patient files). As a result, such unique datasets are rarely shared, and research using text data often focuses on readily available data at the expense of data that could help answer more pressing research questions.

Where such data are shared, current approaches to anonymizing them render the texts unusable for follow-up research. Text Wash solves this problem by enabling the anonymization of text data without compromising its quality. It does this by using natural language processing and machine learning to identify and replace sensitive information while preserving the semantic and grammatical structures of the text. Importantly, what counts as personally identifiable information is determined in close collaboration with data protection officers from the government and the police.

We were particularly impressed by Text Wash and selected it as the winner based on the importance and prevalence of the challenge it addresses, and its potential for wide-ranging impact.
— Katie Metzler, Associate Vice President of Product Innovation at SAGE

Text Wash will be available as an R package and as standalone software for non-technical users.

Read the full press release here.


Update from the 2018 Concept Grant Winners


In 2018, SAGE Ocean awarded Concept Grants to support the development of three new research tools for social scientists. We recently caught up with the winners for an update on how the projects are progressing. Follow the links below to read the full interviews on our blog.

Quanteda Studio, LSE, UK

A powerful, flexible, and user-friendly text analytic software tool that will require no programming experience to use and will run as a web application.

MiniVAN, Public Data Lab, FR/UK

An easy-to-use tool that will support non-specialist social scientists in the visual analysis and in the online publication of networks. 

Digital DNA Toolbox (DDNA), IIT-CNR, IT

A toolbox that will use bioinformatics techniques to provide researchers with a set of cutting-edge tools for a wide range of tasks, including assessing the veracity, trustworthiness, and reliability of content (and content producers) in online social networks and beyond.


We will be awarding Concept Grants again in 2020. To stay up to date with the latest news and ensure you receive the next call for applications, subscribe to the Big Data Newsletter or follow us @SAGEOceanTweets.