Exploiting Social Media to Address Fundamental Human Rights Issues

Massimo Poesio♠, Ayman Alhelbawy♣,♠, Chris Fox♠, Udo Kruschwitz♠
♣ Minority Rights Group, London, UK
♠ University of Essex, Colchester, UK

Abstract

This invited talk gave an overview of some of our work on extracting meaningful knowledge from social media feeds to help address human rights issues. It highlighted the potential that the rise of ‘big data’ offers in this respect and looked at both sides of the coin regarding big data and human rights: how big data can help human rights work, but also the potential dangers that can originate from the ability to analyse massive amounts of data very quickly. The primary focus of our work is on applying natural language processing methods to turn large-scale unstructured and partially structured data streams into actionable knowledge.

Keywords: Human Rights, Social Media, NLP, Arabic NLP

1. Overview

Vast amounts of social media data are being generated every second. This represents a paradigm shift in publishing, from largely carefully edited data to user-generated content, which has rapidly changed the way people exchange and consume information as well as how they communicate. Managing such data streams comes with many challenges, as has been discussed extensively in the research literature. Nevertheless, it also offers new opportunities. One such opportunity is the potential to more easily detect and document human rights violations.

In fact, these developments have already resulted in changes to how human rights organisations work. The ‘investigator on the ground’ will not be completely replaced, but there are many new modes of identifying evidence of human rights violations. Social media such as Facebook, YouTube and Twitter are ideal platforms to push content to the world. Obviously, there is a big challenge in validating any such postings.

Progress in natural language processing (NLP) means that off-the-shelf tools can now be used to quickly assemble a processing pipeline that takes social media data and turns it into structured knowledge. We are primarily interested in this type of processing pipeline, but it needs to be seen as part of a bigger picture. Two research projects we are involved in illustrate the point.

2. Human Rights, Big Data and Technology

The Human Rights, Big Data and Technology (HRBDT) project¹ is an interdisciplinary research project funded by ESRC² and based mainly at the Human Rights Centre³ of the University of Essex, with partners that include the World Health Organisation⁴, the Harvard FXB Center for Health and Human Rights⁵ and the Geneva Academy for International Humanitarian Law and Human Rights⁶. A core activity of one of the four workstreams is to explore and apply the potential of natural language processing to the automated analysis of ‘big data’ and social media, and to develop new approaches to humanitarian and human rights work.

3. Knowledge Transfer Partnership

The second part of our keynote talk focussed on a practical application of NLP techniques to support human rights work in a collaboration between the University of Essex and Minority Rights Group International (MRG)⁷. This project is funded by InnovateUK⁸ through a Knowledge Transfer Partnership (KTP) project. The aim of this project is to provide support to civilian-led reporting of human rights violations, in the context of MRG’s involvement in the Ceasefire Centre, and in particular in the Ceasefire Iraq project⁹. This project complements the objectives of the more general HRBDT project, exploring the contributions of big data, and in particular social media, to the identification of human rights violations.

Specific objectives of the collaboration with MRG are, first of all, to develop a portal that will make it possible to collate reports of human rights violations sent by civilians using a variety of formats, from SMS to emails to social media. The portal¹⁰, currently undergoing beta-testing and soon to go live, will allow personnel of MRG and associated organizations to view and analyse reports of human rights violations sent by civilians.

Second, the project aims to develop tools to filter and analyse this type of information. The analysis techniques developed so far, and at the moment tested with tweets, include methods for detecting reports of human rights violations using machine learning-based text categorization. These methods classify text (e.g., tweets) according to a classification scheme which, in our case, includes categories such as human rights violation reports (for tweets such as “The army of Assad in Damascus committed a terrible massacre claiming the lives of dozens of children in their school”), reporting of general violence (as in “Four people injured as a result of a brawl in Darb Alarbaeen”), or reporting of an accident (e.g., “At least 24 dead in the sinking of boat for illegal immigrants off the coast of Istanbul”). As part of the project, a dataset of over 15,000 Arabic tweets was collected and annotated according to these categories, and used to train a classifier to recognize such categories in text (Alhelbawy et al., 2016). The objective is to apply classifiers of this type to filter the data collected through the portal and/or to gather additional evidence not directly sent to the portal.

4. Conclusions

The emergence of ‘big data’ in the form of social media is affecting all parts of life. This development brings many new challenges but also opportunities, such as the application of natural language processing techniques to detect and document human rights violations. NLP tools have matured to a level where they can easily be applied and are scalable and robust. This stream of work offers the additional benefit that it applies state-of-the-art technology to practical applications that will have a measurable impact on the quality of life of many people.

Acknowledgements

The Human Rights, Big Data and Technology project is funded by Economic and Social Research Council grant ES/M010236/1. We also acknowledge support from InnovateUK through a Knowledge Transfer Partnership (KTP) project between MRG and the University of Essex, partnership number 9488.

5. References

Alhelbawy, A., Poesio, M., and Kruschwitz, U. (2016). Towards a corpus of violence acts in Arabic social media. In Proceedings of LREC.

¹ http://www.hrbdt.ac.uk
² http://www.esrc.ac.uk
³ http://www.essex.ac.uk/hrc/
⁴ http://www.who.int/
⁵ https://fxb.harvard.edu
⁶ http://www.geneva-academy.ch
⁷ http://minorityrights.org
⁸ https://www.gov.uk/government/organisations/innovate-uk
⁹ http://minorityrights.org/what-we-do/ceasefire-project/
¹⁰ http://iraq.ceasefire.org/