<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Social Minder: a Tool for Social Media Monitoring and its Use for Detecting COVID-19 Misinformation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marcos Fernández-Pichel</string-name>
          <uri>https://citius.usc.es/equipo/investigadores-en-formacion/marcos-fernandez-pichel</uri>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David E. Losada</string-name>
          <email>david.losada@usc.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Juan C. Pichel</string-name>
          <email>juancarlos.pichel@usc.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Big Data</institution>
          ,
          <addr-line>Real Time, Web Streams, Credibility</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela</institution>
          ,
          <addr-line>Rúa</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>2621</volume>
      <abstract>
        <p>In this work, we introduce Social Minder, a Big Data platform for Social Media monitoring that allows massive extraction of textual information and builds on a modular and scalable architecture for efficient real-time and batch processing. This demo presents a use case that provides users with credibility estimates for web pages linked in Social Media. Social Minder can serve multiple research and commercial purposes, but we use it here to identify COVID-19-related misinformation posted on Twitter.</p>
      </abstract>
      <kwd-group>
        <kwd>Big Data</kwd>
        <kwd>Real Time</kwd>
        <kwd>Web Streams</kwd>
        <kwd>Credibility</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Social Media (SM) has become one of the main sources of information for end-users [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. However,
processing SM data is a challenge and doing it in real time is critical for many added-value
applications. For example, according to Twitter [2], the number of daily posted tweets is higher
than 500 million (around 5,787 tweets per second). Companies and researchers need tools able
to digest this huge amount of information and present it in a convenient and understandable
way.
      </p>
      <p>At the same time, SM can be a source of misinformation, which is especially damaging when it comes to
health-related content. During the COVID-19 pandemic, for example, dubious and poor-quality
information about the disease and its treatments was broadcast on SM [3, 4], sometimes
resulting in personal harm [5]. In this work, we present Social Minder<sup>1</sup>, a Big Data
platform for batch and real-time social media monitoring, which has been adapted to detect
misinformation about COVID-19.</p>
      <sec id="sec-2-1">
        <title>1.1. Related tools</title>
        <p>SM analytical tools are often constrained to work with data provided by official APIs, as Batrinca
and Treleaven showed in their thorough survey [6]. One advantage of Social Minder is
that it allows massive extraction of tweets with its own crawler [7] and relies on a modular
and scalable architecture that can efficiently ingest large amounts of textual data (see Section 2).</p>
        <p>Existing tools for social media monitoring, such as Social Mention, provide a rigid set of
functionalities (e.g., general statistics about queries). Social Minder differs from these because it
includes a real-time credibility estimation module based on self-developed technology. This module
was built in the context of previous experimental studies [8], which are freely available to the
research community<sup>2</sup>.</p>
        <p>Although there are some existing initiatives for real-time credibility analysis on Twitter [9],
to the best of our knowledge, our platform is the first to integrate this functionality into a
complete monitoring system that can be extended to other web sources, not only Twitter.</p>
        <p>Related to our use case, the study by Sharma and colleagues [10] also addressed COVID-19
misinformation on Twitter. The main difference lies in the way misinformation is detected.
These authors proposed a manual annotation technique, based on certain
expressions and hashtags, while Social Minder incorporates an automatic algorithm (see Section
2 for more details).</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>2. Architecture</title>
      <p>Social Minder was built on top of eXtream [11], a Big Data framework that lets
advanced users design their own processing topologies. Social Minder is an evolution oriented
to the end-user, providing a dashboard for non-expert users. Its system architecture consists of
a fixed consumption topology that interconnects several containerised modules (see Figure 1).
The functionality of each module is briefly explained below:</p>
      <list list-type="bullet">
        <list-item><p>A Twitter crawler [7] that injects text streams into the topology. For a given query, it
first tries to recover all historical tweets, and then starts consuming new tweets in real time.</p></list-item>
        <list-item><p>A sentiment analysis module based on VADER [12], a rule-based classification technology.</p></list-item>
        <list-item><p>A credibility estimation module that uses a self-developed classification technology, based
on our experimental results [8]. It consists of a support vector machine trained on three
credibility classification datasets [13, 14, 15]. Since the training data comes from the
Web Search domain, only the web pages linked in the tweets are assessed for credibility.
Tweets that do not contain any link are skipped.</p></list-item>
        <list-item><p>A timestamp-aggregator module that groups texts by different temporal granularities
(hour, day, month) to perform the analysis.</p></list-item>
        <list-item><p>Four parallel computation modules that perform different statistical analysis tasks (count
texts, extract keywords using TF-IDF techniques, compute aggregated sentiment and
credibility) for all temporal granularities available.</p></list-item>
        <list-item><p>Two final modules that aggregate results and write them to permanent storage (a
MongoDB<sup>3</sup> database).</p></list-item>
        <list-item><p>A dashboard for non-expert end-users that shows statistics and graphs per query for
different granularities.</p></list-item>
      </list>
      <p><sup>2</sup> https://github.com/MarcosFP97/Health-Rel</p>
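      <p>As a rough illustration of the timestamp-aggregator and statistical modules described above, the following Python sketch groups scored texts by hour, day or month and computes a count plus mean sentiment and credibility per bucket. It is not Social Minder's actual code: the record layout, the function names bucket_key and aggregate, and the convention of storing None as the credibility of tweets without links are assumptions made for this example.</p>
      <preformat>
```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# strftime patterns for the three granularities named in the text.
BUCKET_FORMATS = {"hour": "%Y-%m-%d %H:00", "day": "%Y-%m-%d", "month": "%Y-%m"}


def bucket_key(timestamp: datetime, granularity: str) -> str:
    """Map a timestamp to the label of its hour, day or month bucket."""
    return timestamp.strftime(BUCKET_FORMATS[granularity])


def aggregate(records, granularity):
    """Group records by bucket and compute count and mean scores.

    Each record is a hypothetical dict with keys "time" (datetime),
    "sentiment" (float) and "credibility" (float, or None when the
    tweet has no link and was therefore not assessed).
    """
    buckets = defaultdict(list)
    for record in records:
        buckets[bucket_key(record["time"], granularity)].append(record)

    summary = {}
    for key, group in sorted(buckets.items()):
        # Only tweets that were actually assessed enter the credibility mean.
        scored = [r["credibility"] for r in group if r["credibility"] is not None]
        summary[key] = {
            "count": len(group),
            "mean_sentiment": mean(r["sentiment"] for r in group),
            "mean_credibility": mean(scored) if scored else None,
        }
    return summary


# Made-up sample records, for illustration only.
records = [
    {"time": datetime(2020, 5, 1, 9, 15), "sentiment": 0.5, "credibility": 1.0},
    {"time": datetime(2020, 5, 1, 9, 45), "sentiment": -0.5, "credibility": None},
    {"time": datetime(2020, 5, 2, 18, 0), "sentiment": 0.25, "credibility": 0.5},
]
by_day = aggregate(records, "day")
by_month = aggregate(records, "month")
```
      </preformat>
      <p>With these three sample records, by_day holds two buckets (1 and 2 May 2020) and by_month a single bucket for May 2020. The tweet without a link still counts towards the totals but is left out of the credibility mean.</p>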
    </sec>
    <sec id="sec-4">
      <title>3. COVID-19 misinformation use case</title>
      <p>Social Minder can serve multiple research or commercial purposes. For example, one can
develop new SM applications by modifying the profiles of interest. This demo focuses on a use
case of Social Minder oriented to monitoring misinformation posted on Twitter about COVID-19.
We exemplify the tool with a dashboard associated with a sample of COVID-related tweets obtained
in 2020<sup>1</sup>.</p>
      <p>Social Minder allows filtering the Twitter stream either by account or by a textual filter. We
illustrate this by considering four cases: two reputed accounts (“@who_europe”, “@dhscgovuk”)
and two filters (“coronavirus treatment”, “alternative medicine coronavirus”). One can expect
the two accounts to publish more reliable content, while the tweets associated with the filters
are likely to include more dubious information. The sample used to run this demonstration was extracted
during the first lockdown period (May 2020) over the full Twitter stream.</p>
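      <p>The two filtering modes can be sketched in a few lines of Python. This is a minimal illustration rather than the crawler's real logic: the Tweet class, the function names, and the rule that a textual filter matches only when every query term appears in the tweet are all assumptions made for this example.</p>
      <preformat>
```python
import re
from dataclasses import dataclass


@dataclass
class Tweet:
    author: str  # account handle without the leading "@"
    text: str


def matches_account(tweet: Tweet, account: str) -> bool:
    """True when the tweet was issued by the given account (case-insensitive)."""
    return tweet.author.lower() == account.lstrip("@").lower()


def matches_text(tweet: Tweet, query: str) -> bool:
    """True when every query term occurs as a word in the tweet text.

    Requiring all terms is an assumption made for this sketch.
    """
    words = set(re.findall(r"\w+", tweet.text.lower()))
    return all(term in words for term in query.lower().split())


# Made-up tweets, for illustration only.
stream = [
    Tweet("who_europe", "New guidance on coronavirus testing"),
    Tweet("someone", "Is garlic a coronavirus treatment?"),
    Tweet("other", "Alternative medicine for coronavirus explained"),
]

from_who = [t for t in stream if matches_account(t, "@who_europe")]
treatment = [t for t in stream if matches_text(t, "coronavirus treatment")]
```
      </preformat>
      <p>Here from_who contains only the first tweet and treatment only the second, while the third tweet would match the filter “alternative medicine coronavirus”.</p>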
      <p>The dashboard consists of an upper part with configurable elements, general statistics,
keywords extracted from the tweets (computed using TF-IDF) and an initial graph that plots
the number of tweets (see Figure 2). The user can choose to analyse tweets submitted by an
account (“issued” in the interface) or “mentions” of an account or of a given keyword query. For
this demo, we pre-configured some example queries; the user can click on them to obtain
the corresponding results. The granularity of the analysis is also configurable (days, weeks,
months). In this upper part, general count statistics and keywords provide the user with a first
glimpse of the account or topic in social media.</p>
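      <p>To make the keyword extraction step more concrete, the sketch below ranks terms by their summed TF-IDF weight over a small set of tweets. It shows one common textbook formulation (raw term frequency multiplied by log(N / df)). The exact weighting, tokenisation and stop-word handling used by Social Minder are not specified here, so the function names and parameters are assumptions.</p>
      <preformat>
```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_keywords(documents, k=3):
    """Return the k terms with the highest summed TF-IDF weight.

    Uses raw term frequency and idf = log(N / df); smoothing and
    stop-word removal are deliberately omitted in this sketch.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = Counter()
    for tokens in tokenized:
        for term, tf in Counter(tokens).items():
            scores[term] += tf * math.log(n_docs / doc_freq[term])
    # Sort by score (descending), then alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Made-up tweets, for illustration only.
tweets = [
    "coronavirus vaccine trial starts",
    "coronavirus lockdown extended",
    "coronavirus vaccine vaccine news",
]
keywords = top_keywords(tweets, k=2)
```
      </preformat>
      <p>Because “coronavirus” appears in every sample tweet, its idf is zero and it never surfaces as a keyword, which is the behaviour one wants when the query term itself dominates the stream.</p>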
      <p>The bottom part consists of bar graphs that represent the evolution of the sentiment and the
credibility of the posted content (see Figure 3). Using this tool, one can observe, for example,
that @who_europe tends to publish more credible content (as estimated by the classifier) than
the content associated with tweets that mention words like “coronavirus treatment”.</p>
      <p>It might be surprising that some content from a reputed organisation such as the WHO is
classified as “highly uncredible”. This may be due to false negatives in our predictive technology,
which still has room for improvement. However, the tool identifies general
trends and, overall, is able to distinguish the relative quality of authoritative accounts versus
more dubious content (e.g., “alternative medicine coronavirus”).</p>
    </sec>
    <sec id="sec-5">
      <title>4. Conclusions</title>
      <p>In this work, we presented an end-user oriented tool called Social Minder. It currently monitors
Twitter but could be extended to other web sources, and it provides different estimates
(such as sentiment or credibility) that can be useful for commercial or research purposes, such as
monitoring a company’s account or analysing misinformation trends. Some core components, such
as the experiments<sup>2</sup> that inspired our credibility estimation technology and the Twitter crawler<sup>4</sup>,
are freely available to the community.</p>
      <p>In this demo, we have shown one possible use case, but this technology could be adapted to
monitor new dynamic text streams, handle new queries, or incorporate new modules, to name a few options.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>The authors thank the support obtained from: i) project RTI2018-093336-B-C21 (Ministerio de Ciencia e Innovación,
Agencia Estatal de Investigación &amp; ERDF), ii) project PLEC2021-007662 (MCIN/AEI/10.13039/501100011033,
Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación, Plan de Recuperación, Transformación y Resiliencia,
Unión Europea-Next GenerationEU), and iii) Consellería de Educación, Universidade e Formación Profesional
(accreditation 2019-2022 ED431G-2019/04, ED431C 2018/29) and the European Regional Development Fund, which
acknowledges the CiTIUS-Research Center in Intelligent Technologies of the University of Santiago de Compostela
as a Research Center of the Galician University System.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <collab>Reuters Institute, University of Oxford</collab>,
          <source>Reuters Digital News Report 2021</source>,
          <year>2021</year>
          (accessed September 07, 2021). URL: https://reutersinstitute.politics.ox.ac.uk/digital-news-report/
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>