<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>WaterMM: Water Quality in Social Multimedia Task at MediaEval 2021</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Stelios Andreadis</string-name>
          <email>andreadisst@iti.gr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ilias Gialampoukidis</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aristeidis Bozas</string-name>
          <email>arbozas@iti.gr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anastasia Moumtzidou</string-name>
          <email>moumtzid@iti.gr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberto Fiorin</string-name>
          <email>roberto.fiorin@distrettoalpiorientali.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesca Lombardo</string-name>
          <email>francesca.lombardo@distrettoalpiorientali.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anastasios Karakostas</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniele Norbiato</string-name>
          <email>daniele.norbiato@distrettoalpiorientali.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefanos Vrochidis</string-name>
          <email>stefanos@iti.gr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michele Ferri</string-name>
          <email>michele.ferri@distrettoalpiorientali.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ioannis Kompatsiaris</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Eastern Alps River Basin District</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Information Technologies Institute - Centre of Research and Technology Hellas</institution>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <fpage>13</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>This paper describes the “WaterMM: Water Quality in Social Multimedia” Task at MediaEval 2021. The overall aim of the task is to analyse the textual content of social media data that express real-world issues. The focus is specifically on water quality, safety and security, which are fundamental to sustaining life. Participants in this task are required to classify the social media posts of a bilingual dataset as relevant or not relevant to water-related problems, and they can optionally combine textual features with visual ones. Automatically predicting the relevance of posts could enhance the quality of crowd-sourced information, consequently supporting situational awareness in the water sector.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        With the rise of social media in the everyday life of people around
the world, a very broad range of topics is now discussed online.
The widespread availability of public social media posts has paved
the way for developing Artificial Intelligence solutions that
exploit crowd-sourced information. The scientific community has
particularly focused on emergency, disaster and crisis management
[
        <xref ref-type="bibr" rid="ref1 ref7">1, 7</xref>
        ], where the use of social media data can be highly beneficial for
detecting threats, monitoring situations, and enhancing response.
Although research focuses mostly on sudden crises,
i.e. natural or human-caused disasters that occur without warning,
another highly interesting domain is the creeping crisis, i.e. a threat
to life-sustaining systems that evolves over time and space and is
foreshadowed by precursor events [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. This type of crisis could
threaten, for example, water quality, safety and security.
      </p>
      <p>Among the various topics discussed on Twitter, it is anticipated
that users will also post tweets that refer to water quality. The
acquisition of posts containing citizen complaints on the condition
of drinking water (as an addition to traditional means, e.g. phone
calls) or news coverage about water-related issues could support
situational awareness in a water distribution network.</p>
      <p>However, within the post stream it is expected that a number
of posts containing water-quality-related keywords do not refer
to actual cases of polluted water. To minimize the incoming noise,
automatic prediction of a post’s relevance is required. Filtering out
irrelevant posts will improve the quality of the information that
interested organisations, such as water utilities or water protection
agencies, receive from social media. Estimating the relevance of a
tweet faces two further challenges. First, the textual information
of a tweet (i.e. Twitter message) may have a different relevance
to the examined topic than its visual information (i.e.
Twitter image). Second, the text of the tweets may be in multiple
languages, which requires independent processing and training.</p>
      <p>
        The potential contribution of relevance prediction to situational
awareness in the water sector has motivated the organisation of the
“WaterMM: Water Quality in Social Multimedia” Task at MediaEval
2021. As a continuation of the Multimedia Satellite Task (2017-2019)
[
        <xref ref-type="bibr" rid="ref3 ref4 ref5">3–5</xref>
        ] and the Flood-related Multimedia Task [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], the WaterMM
Task focuses exclusively on social media data and shifts the domain
of application from floods to water quality, safety and security. The
overall goal of the task is to tackle the aforementioned challenges
and use textual information (as well as visual information and
metadata) from a bilingual dataset of Twitter posts in order to
identify tweets that refer to concerns about water.
      </p>
    </sec>
    <sec id="sec-2">
      <title>TASK DESCRIPTION</title>
      <p>The WaterMM Task deals with the analysis of social media posts
from Twitter with regard to issues of water quality, safety and
security. The participants of this task are provided with a set of
Twitter post IDs in order to download the text, the attached image
(if it exists) and the metadata of tweets that have been selected
through a keyword-based search involving words/phrases about the
quality of drinking water (e.g. strange color, smell or taste, related
illnesses, etc.). Nevertheless, the occurrence of such phrases in a
tweet does not necessarily reflect a case of water contamination.</p>
      <p>The objective of this task is to build a binary classification system
that will be able to distinguish whether a post is relevant or not to
water-quality issues. An example of a relevant tweet is shown in
Fig. 1, while an irrelevant tweet in Fig. 2. Participants can tackle the
task using text features, image features, metadata, or a combination
of the above, and they are allowed to submit up to 5 runs:</p>
      <list list-type="bullet">
        <list-item>
          <p>Required run 1: automated using textual information only</p>
        </list-item>
        <list-item>
          <p>Optional run 2: automated using fused textual and visual information</p>
        </list-item>
        <list-item>
          <p>Optional run 3: automated using fused textual and visual information as well as other metadata</p>
        </list-item>
        <list-item>
          <p>General runs 4 &amp; 5: everything automated allowed, including using data from external sources</p>
        </list-item>
      </list>
    </sec>
    <sec id="sec-3">
      <title>DATASET DESCRIPTION</title>
      <p>The dataset of the task is a set of social media posts collected
from Twitter over one year, i.e. from May 2020 to April 2021,
by searching the tweet text for English and Italian keywords
about water quality (e.g. issues with drinking water, signs
of water pollution, illnesses related to water, etc.). The keywords
were proposed by the Eastern Alps River Basin District, which
is responsible for hydrogeological defense, which involves the
protection of water resources and aquatic environments, in the
Eastern Alps partition of North-East Italy. For brevity,
we present here only the most frequently matched keywords for
both languages in Table 1, while the complete list is provided to
participants along with the dataset in the task’s repository<sup>2</sup>. The
bilingual dataset is separated into two sets: the development-set, which
contains 8,000 posts, and the test-set, with 2,000 posts. To be
fully compliant with the Twitter Developer Policy, only the IDs of
the tweets are distributed to the participants. We therefore ensured
at the time of releasing the dataset that all tweets were still online.</p>
      <p>The ground truth of the dataset reflects the relevance of a tweet
(relevant / not relevant) and was produced through manual
annotation, again carried out by the
Eastern Alps River Basin District. Apart from their valuable
expertise in the domain, the annotators were also able to annotate tweets in their
native language, i.e. Italian. It should be noted that each tweet has
been annotated by a single person and not by multiple annotators.</p>
      <p>Initially, only the ground truth for the development-set is
released, since the ground truth for the test-set is used in the
evaluation stage and will be available only after the completion of
MediaEval 2021. Participants are provided with key-value pairs of
Tweet ID and ground truth label for the relevancy (0=not relevant/
1=relevant). In particular, 1,374 tweets (17.18%) of the
development-set are relevant and 6,626 (82.82%) are not relevant to water quality.
The training data are therefore quite imbalanced, and
participants should take this into account.</p>
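      <p>To make the imbalance concrete, the following minimal sketch recomputes
the class shares from a mapping of Tweet ID to label and derives
inverse-frequency class weights. The function names, the placeholder tweet IDs
and the weighting heuristic (number of samples divided by the number of classes
times the class count) are our assumptions for illustration; they are not part
of the task materials, and other strategies such as resampling are equally
possible.</p>
      <preformat>
```python
from collections import Counter


def class_shares(labels):
    """Return the fraction of items per label for a {tweet_id: label} mapping."""
    counts = Counter(labels.values())
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels.values())
    total = sum(counts.values())
    return {label: total / (len(counts) * count) for label, count in counts.items()}


# Placeholder IDs (hypothetical) reproducing the reported development-set
# counts: 1,374 relevant (1) and 6,626 not relevant (0) tweets.
dev_labels = {f"tweet_{i}": 1 for i in range(1374)}
dev_labels.update({f"tweet_{i}": 0 for i in range(1374, 8000)})

shares = class_shares(dev_labels)
weights = balanced_class_weights(dev_labels)
print(shares)
print(weights)
```
</preformat>
      <p>With these counts, the relevant class receives a weight several times
larger than the not-relevant class, which is one simple way to keep a
classifier from ignoring the minority class.</p>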
    </sec>
    <sec id="sec-4">
      <title>EVALUATION</title>
      <p>F1-Score is selected as the official metric for evaluating the binary
classification of tweets as relevant (1) and not relevant (0) on the
test set, since this measure is the harmonic mean of precision
and recall, taking both metrics into account. Participants are also
encouraged to carry out a failure analysis of their results in order
to gain insight into the mistakes that their classifiers make.</p>
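      <p>As an illustration of the metric, the sketch below computes the F1-Score
from true positives (TP), false positives (FP) and false negatives (FN), using
the identity F1 = 2PR / (P + R) = 2TP / (2TP + FP + FN). It assumes that
“relevant” (1) is the positive class and that F1 is reported as 0 when it is
undefined; both choices, as well as the function names and the toy labels, are
our assumptions and do not describe an official scoring tool.</p>
      <preformat>
```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives and false negatives."""
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
    return tp, fp, fn


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2PR / (P + R) = 2TP / (2TP + FP + FN); 0.0 when undefined."""
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Toy labels (not task data): TP=2, FP=1, FN=1.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 0, 1, 0]
toy_f1 = f1_score(y_true, y_pred)

# A classifier that always answers "not relevant", scored on data with the
# class balance reported for the development-set.
imbalanced_truth = [1] * 1374 + [0] * 6626
always_negative = [0] * 8000
majority_f1 = f1_score(imbalanced_truth, always_negative)
majority_accuracy = sum(
    t == p for t, p in zip(imbalanced_truth, always_negative)
) / len(imbalanced_truth)
print(toy_f1, majority_f1, majority_accuracy)
```
</preformat>
      <p>The second example shows why accuracy would be misleading here: always
predicting “not relevant” matches most labels on data with this class balance,
yet it finds no relevant tweet and therefore scores an F1 of zero. The FP and FN
counts returned by the helper are also a natural starting point for a failure
analysis.</p>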
    </sec>
    <sec id="sec-5">
      <title>ACKNOWLEDGMENTS</title>
      <p>This work has been supported by the EU’s Horizon 2020 research
and innovation programme under grant agreements H2020-832876
aqua3S, H2020-883484 PathoCERT, and H2020-101004157 WQeMS.</p>
      <fn-group>
        <fn id="fn2">
          <label>2</label>
          <p>https://github.com/multimediaeval/2021-WaterMM/blob/main/dataset/keywords.json</p>
        </fn>
      </fn-group>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>David E</given-names>
            <surname>Alexander</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Social media in disaster risk reduction and crisis management</article-title>
          .
          <source>Science and engineering ethics</source>
          <volume>20</volume>
          ,
          <issue>3</issue>
          (
          <year>2014</year>
          ),
          <fpage>717</fpage>
          -
          <lpage>733</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Stelios</given-names>
            <surname>Andreadis</surname>
          </string-name>
          , Ilias Gialampoukidis, Anastasios Karakostas, Stefanos Vrochidis, Ioannis Kompatsiaris, Roberto Fiorin, Daniele Norbiato, and
          <string-name>
            <given-names>Michele</given-names>
            <surname>Ferri</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>The flood-related multimedia task at MediaEval 2020</article-title>
          . In
          <source>Proceedings of the MediaEval 2020 Workshop</source>
          , Online.
          <fpage>14</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Benjamin</given-names>
            <surname>Bischke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Patrick</given-names>
            <surname>Helber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Zhengyu</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Damian</given-names>
            <surname>Borth</surname>
          </string-name>
          , and others.
          <year>2018</year>
          .
          <article-title>The Multimedia Satellite Task at MediaEval 2018: Emergency response for flooding events</article-title>
          . (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Benjamin</given-names>
            <surname>Bischke</surname>
          </string-name>
          , Patrick Helber, Simon Brugman, Erkan Basar,
          <string-name>
            <given-names>Zhengyu</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Martha</given-names>
            <surname>Larson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Konstantin</given-names>
            <surname>Pogorelov</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>The Multimedia Satellite Task at MediaEval 2019: Estimation of Flood Severity</article-title>
          . In
          <source>Proc. of the MediaEval 2019 Workshop</source>
          (Oct. 27-29, 2019). Sophia Antipolis, France.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Benjamin</given-names>
            <surname>Bischke</surname>
          </string-name>
          , Patrick Helber, Christian Schulze, Venkat Srinivasan, Andreas Dengel, and
          <string-name>
            <given-names>Damian</given-names>
            <surname>Borth</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>The Multimedia Satellite Task at MediaEval 2017</article-title>
          . In
          <source>MediaEval</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Arjen</given-names>
            <surname>Boin</surname>
          </string-name>
          , Magnus Ekengren, and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Rhinard</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Hiding in plain sight: Conceptualizing the creeping crisis</article-title>
          .
          <source>Risk, Hazards &amp; Crisis in Public Policy</source>
          <volume>11</volume>
          ,
          <issue>2</issue>
          (
          <year>2020</year>
          ),
          <fpage>116</fpage>
          -
          <lpage>138</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Yan</given-names>
            <surname>Jin</surname>
          </string-name>
          , Brooke Fisher Liu, and Lucinda L Austin.
          <year>2014</year>
          .
          <article-title>Examining the role of social media in effective crisis management: The effects of crisis origin, information form, and source on publics' crisis responses</article-title>
          .
          <source>Communication research</source>
          <volume>41</volume>
          ,
          <issue>1</issue>
          (
          <year>2014</year>
          ),
          <fpage>74</fpage>
          -
          <lpage>94</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>