

Social Minder: a Tool for Social Media Monitoring and its Use for Detecting COVID-19 Misinformation

Marcos Fernández-Pichel


David E. Losada

david.losada@usc.es

Juan C. Pichel

juancarlos.pichel@usc.es

Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela, Rúa

Keywords: Big Data, Real Time, Web Streams, Credibility

2022


In this work, we introduce Social Minder, a Big Data platform for Social Media monitoring that allows massive extraction of textual information and stands on a modular and scalable architecture for efficient real-time and batch processing. This demo presents a use case that provides users with credibility estimates for webpages linked in Social Media. Social Minder can serve multiple research and commercial purposes, but we use it here to identify COVID-19 related misinformation posted on Twitter.

https://citius.usc.es/equipo/investigadores-en-formacion/marcos-fernandez-pichel (M. Fernández-Pichel)

1. Introduction

Social Media (SM) has become one of the main sources of information for end-users [1]. However, processing SM data is a challenge, and doing it in real time is critical for many added-value applications. For example, according to Twitter [2], more than 500 million tweets are posted daily (around 5,787 tweets per second). Companies and researchers need tools able to digest this huge amount of information and present it in a convenient and understandable way.

However, SM can be a source of misinformation, which is especially damaging when it comes to health-related content. During the COVID-19 pandemic, for example, dubious and poor-quality information about the disease and its treatments was broadcast on SM [3, 4], sometimes resulting in situations of personal harm [5]. In this work, we present Social Minder, a Big Data platform for batch and real-time social media monitoring, which has been adapted to detect misinformation about COVID-19.

CEUR Workshop Proceedings

1.1. Related tools

SM analytical tools are often constrained to work with data provided by official APIs, as Batrinca and Treleaven showed in their thorough survey [6]. One of the advantages of Social Minder is that it allows massive extraction of tweets with its own crawler [7] and relies on a modular and scalable architecture that can efficiently ingest large amounts of textual data (see Section 2).

Existing tools for social media monitoring, such as Social Mention, provide a rigid set of functionalities (e.g., general statistics about queries). Social Minder differs from these because it includes a real-time credibility estimation module based on self-developed technology. This module was built in the context of previous experimental studies [8], which are freely available to the research community (footnote 2).

Although there are some existing initiatives for real-time credibility analysis on Twitter [9], to the best of our knowledge, our platform is the first to integrate this functionality into a complete monitoring system expandable to other web sources, not only Twitter.

Related to our use case, the study by Sharma and colleagues [10] also addressed COVID-19 misinformation on Twitter. However, the main difference lies in the way that misinformation is detected. These authors proposed a manual annotation technique, based on certain expressions and hashtags, while Social Minder incorporates an automatic algorithm (see Section 2 for more details).

2. Architecture

Social Minder was built on top of eXtream [11], a Big Data framework that lets advanced users design their own processing topologies. Social Minder is an evolution oriented to the end-user, providing a dashboard for non-expert users. Its system architecture consists of a fixed consumption topology that interconnects several containerised modules (see Figure 1). The functionality of each module is briefly explained below:

• A Twitter crawler [7] that injects text streams into the topology. For a given query, it first tries to recover all historical tweets, and then starts to consume tweets in real time.

• A sentiment analysis module based on VADER [12], a rule-based classification technology.

• A credibility estimation module that uses a self-developed classification technology, based on our experimental results [8]. It consists of a support vector machine trained on three credibility classification datasets [13, 14, 15]. Since the training data comes from the Web Search domain, only the web pages linked in the tweets are assessed for credibility. Tweets that do not contain any link are skipped.

• A timestamp-aggregator module that groups texts by different temporal granularities (hour, day, month) to perform the analysis.

• Four parallel computation modules that perform different statistical analysis tasks (count texts, extract keywords using TF-IDF techniques, compute aggregated sentiment and credibility) for all temporal granularities available.

• Two final modules that aggregate results and write them to permanent storage (a MongoDB database).

• A dashboard for non-expert end-users that shows statistics and graphs per query for different granularities.

Footnote 2: https://github.com/MarcosFP97/Health-Rel
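To make the data flow through the topology more concrete, the following minimal Python sketch illustrates two of the steps described above: skipping tweets without links before credibility estimation, and grouping texts by temporal granularity to count them. It is not the Social Minder implementation. The tweet dictionary layout, the function names, the bucket formats and the URL pattern are all assumptions made for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical bucket formats for the three granularities named in the text.
GRANULARITY_FORMATS = {
    "hour": "%Y-%m-%d %H:00",
    "day": "%Y-%m-%d",
    "month": "%Y-%m",
}

# Simplified URL pattern (an assumption; real tweets may need a stricter one).
URL_PATTERN = re.compile(r"https?://\S+")


def extract_links(text):
    """Return the URLs found in a tweet text (empty list if none)."""
    return URL_PATTERN.findall(text)


def credibility_candidates(tweets):
    """Keep only tweets with at least one link, paired with their links."""
    candidates = []
    for tweet in tweets:
        links = extract_links(tweet["text"])
        if links:  # tweets without links are skipped
            candidates.append((tweet, links))
    return candidates


def count_by_granularity(tweets, granularity):
    """Count tweets per time bucket for the chosen granularity."""
    fmt = GRANULARITY_FORMATS[granularity]
    counts = Counter(tweet["created_at"].strftime(fmt) for tweet in tweets)
    return dict(counts)


# Toy input with an assumed layout: a timestamp and the tweet text.
sample = [
    {"created_at": datetime(2020, 5, 1, 9, 30), "text": "New guidance https://example.org/a"},
    {"created_at": datetime(2020, 5, 1, 17, 5), "text": "Stay safe everyone"},
    {"created_at": datetime(2020, 5, 2, 8, 0), "text": "Read http://example.org/b and https://example.org/c"},
]

daily_counts = count_by_granularity(sample, "day")
candidates = credibility_candidates(sample)
```

In this sketch, the second sample tweet is counted in the daily statistics but never reaches the credibility step, mirroring the rule that only linked web pages are assessed.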

3. COVID-19 misinformation use case

Social Minder can serve multiple research or commercial purposes. For example, one can develop new SM applications by modifying the profiles of interest. This demo focuses on a use case of Social Minder oriented to monitoring misinformation posted on Twitter about COVID-19. We exemplify the tool with a dashboard associated with a sample of COVID-related tweets obtained in 2020.

Social Minder allows filtering the Twitter stream either by account or by a textual filter. We illustrate this by considering four cases: two reputed accounts (“@who_europe”, “@dhscgovuk”) and two filters (“coronavirus treatment”, “alternative medicine coronavirus”). One can expect the two accounts to publish more reliable content, while the tweets associated with the filters include more dubious information. The sample used to run this demonstration was extracted during the first lockdown period (May 2020) over the full Twitter stream.
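The two kinds of filter can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tweet layout, the function names, and the matching rule (case-insensitive, with every term of the filter required to appear in the text) are hypothetical and are not taken from the Social Minder crawler.

```python
def issued_by(tweet, account):
    """True if the tweet was posted by the given account (leading '@' optional)."""
    return tweet["author"].lower() == account.lstrip("@").lower()


def mentions(tweet, query):
    """True if every term of the query occurs in the tweet text.

    Assumption: terms are whitespace-separated and matched case-insensitively
    as substrings; an empty query matches nothing.
    """
    text = tweet["text"].lower()
    terms = query.lower().split()
    return bool(terms) and all(term in text for term in terms)


# Toy tweet with an assumed layout.
tweet = {
    "author": "who_europe",
    "text": "Updated advice on coronavirus treatment, thanks @dhscgovuk",
}

is_from_who = issued_by(tweet, "@who_europe")
matches_filter = mentions(tweet, "coronavirus treatment")
```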

The dashboard consists of an upper part with configurable elements, general statistics, keywords extracted from the tweets (computed using TF-IDF) and an initial graph that plots the number of tweets (see Figure 2). The user can choose to analyse tweets submitted by an account (“issued” in the interface) or “mentions” of an account or of a given keyword query. For this demo, we pre-configured some example queries, which the user can click on to obtain the corresponding results. The granularity of the analysis is also configurable (days, weeks, months). In this upper part, general count statistics and keywords give the user a first glimpse of the account or topic in social media.
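As an illustration of how TF-IDF can surface keywords, the sketch below ranks the terms of each document by term frequency multiplied by the logarithm of the inverse document frequency. It assumes that a "document" is the concatenated text of one query or time bucket, and it uses a deliberately simple tokenizer. Both choices are assumptions, and the exact weighting variant used in Social Minder is not specified here.

```python
import math
import re
from collections import Counter

# Simplified tokenizer (assumption): lowercase words, hashtags and mentions.
TOKEN_PATTERN = re.compile(r"[a-z0-9#@_]+")


def tokenize(text):
    return TOKEN_PATTERN.findall(text.lower())


def top_keywords(documents, k=3):
    """Return the k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    results = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        results.append([term for term, _ in ranked[:k]])
    return results


docs = [
    "covid vaccine vaccine",
    "covid lockdown",
    "covid masks masks",
]
keywords = top_keywords(docs, k=1)
```

A term that appears in every document, such as "covid" in this toy input, receives a score of zero and therefore never outranks the terms that characterise a single document.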

The bottom part consists of bar graphs that represent the evolution of the sentiment and the credibility of the posted content (see Figure 3). Using this tool, one can observe, for example, that @who_europe tends to publish more credible content (as estimated by the classifier) than the content associated with tweets that mention words like “coronavirus treatment”.
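The values behind such bar graphs can be thought of as the share of each predicted label per time bucket. The following sketch shows one way to compute those shares. The input layout (pairs of bucket and label) and the label names are hypothetical and serve only as an example.

```python
from collections import Counter, defaultdict


def label_shares(records):
    """Map each time bucket to the fraction of items carrying each label."""
    per_bucket = defaultdict(Counter)
    for bucket, label in records:
        per_bucket[bucket][label] += 1
    return {
        bucket: {label: n / sum(counts.values()) for label, n in counts.items()}
        for bucket, counts in per_bucket.items()
    }


# Toy predictions with assumed label names.
records = [
    ("2020-05", "credible"),
    ("2020-05", "credible"),
    ("2020-05", "highly uncredible"),
    ("2020-05", "credible"),
]
shares = label_shares(records)
```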

It might be surprising that some content from a reputed organisation such as the WHO is classified as “highly uncredible”. This may be due to false negatives in our predictive technology, which still has room for improvement. However, as mentioned above, the tool identifies general trends and is generally able to distinguish the relative quality of authoritative accounts versus more dubious content (e.g., “alternative medicine coronavirus”).

4. Conclusions

In this work, we presented an end-user oriented tool called Social Minder. It allows monitoring Twitter, but it could be extended to other web sources, and it provides different estimations (like sentiment or credibility) that can be useful for commercial or research purposes, like monitoring a company’s account or analysing misinformation trends. Some core components, such as the experiments (footnote 2) that inspired our credibility estimation technology and the Twitter crawler, are freely available to the community.

In this demo, we have shown one possible use case, but this technology could be adapted to monitor new dynamic text streams and new queries, or extended with new modules, to name just a few options.

Acknowledgements

The authors acknowledge the support obtained from: i) project RTI2018-093336-B-C21 (Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación & ERDF), ii) project PLEC2021-007662 (MCIN/AEI/10.13039/501100011033, Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación, Plan de Recuperación, Transformación y Resiliencia, Unión Europea-NextGenerationEU), and iii) Consellería de Educación, Universidade e Formación Profesional (accreditation 2019-2022 ED431G-2019/04, ED431C 2018/29) and the European Regional Development Fund, which acknowledges the CiTIUS-Research Center in Intelligent Technologies of the University of Santiago de Compostela as a Research Center of the Galician University System.

[1] Reuters Institute, University of Oxford, Reuters Digital News Report 2021, 2021 (accessed September 07, 2021). URL: https://reutersinstitute.politics.ox.ac.uk/digital-news-report/