NLP Techniques for Water Quality Analysis in Social Media Content

Muhammad Asif Ayub1, Khubaib Ahmad1, Kashif Ahmad2, Nasir Ahmad1, Ala Al-Fuqaha2
1 Department of Computer Systems Engineering, University of Engineering and Technology, Peshawar, Pakistan.
2 Division of Information and Computing Technology, College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar.
{khubaibtakkar,asifayub836}@gmail.com, {kahmad,aalfuqaha}@hbku.edu.qa, n.ahmad@uetpeshawar.edu.pk

ABSTRACT
This paper presents our contributions to the MediaEval 2021 task "WaterMM: Water Quality in Social Multimedia". The task aims at analyzing social media posts relevant to water quality, with a particular focus on aspects such as water color, smell, taste, and related illnesses. To this aim, a multimodal dataset containing both textual and visual information along with metadata is provided. Considering the quality and quantity of the available content, we mainly focus on the textual information by employing three different models, individually and jointly in a late-fusion manner. These models include (i) Bidirectional Encoder Representations from Transformers (BERT), (ii) XLM-RoBERTa, a multilingual variant of the Robustly Optimized BERT Pre-training Approach (RoBERTa), and (iii) a custom Long Short-Term Memory (LSTM) model, obtaining overall F1-scores of 0.794, 0.717, and 0.663 on the official test set, respectively. In the fusion scheme, all the models are treated equally, and no significant improvement in performance is observed over the best-performing individual model.

Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). MediaEval'21, December 13-15 2021, Online.
                                                                                     are based on the individual models namely BERT, XLM-RoBERTa,
1 INTRODUCTION
In recent years, social media has emerged as a valuable tool and platform to discuss and convey concerns over different challenges and daily-life issues [1]. The literature covers a diversified list of societal, environmental, and technological topics discussed in social media outlets, such as racism and hate speech [6], public health [7], natural disasters and rehabilitation [8], and technological conspiracies [4]. More recently, there have been debates in social networks on environmental issues, especially the quality of air and drinking water in different parts of the world. The discussions generally revolve around topics like strange color, smell, bad taste, and diseases caused by polluted water. This information could help in several ways. For instance, it can serve as valuable feedback for public authorities on the water distribution network. However, extracting information from such informal sources is very challenging. It is possible that social media posts containing water-quality-related keywords do not represent discussions on polluted water. In this regard, Machine Learning (ML) and Natural Language Processing (NLP) techniques could be employed to automatically analyze and filter out irrelevant posts. In order to explore the potential of ML and NLP techniques on this challenging problem, a task named "WaterMM: Water Quality in Social Multimedia" has been introduced in the MediaEval 2021 benchmark competition [2].
This paper provides a detailed description of the methods proposed by team CSE-Innoverts for the water quality analysis task at MediaEval 2021. The dataset provided for the task covers multimodal information including textual, visual, and metadata. However, images are available for very few posts, and the majority of the available images are not relevant. Thus, we mainly focus on the textual information by proposing four different solutions, as detailed in Section 2.
2 PROPOSED APPROACHES
In total, we submitted four different runs by employing three different Neural Network (NN) architectures, namely BERT [3], XLM-RoBERTa [5], and LSTM, individually and jointly in a late-fusion scheme. Run 1 is based on late fusion, where we jointly employ the models by aggregating the classification scores obtained with the individual models. Figure 1 provides the block diagram of the proposed methodology for Run 1. Run 2, Run 3, and Run 4 are based on the individual models, namely BERT, XLM-RoBERTa, and LSTM, respectively. The details of the individual model-based solutions are provided below; illustrative sketches of the transformer fine-tuning and of the LSTM baseline follow the list.
     • BERT-based Solution (Run 2): In this solution, we rely on a pre-trained BERT model, which is fine-tuned on the development set provided by the task organizers. Before fine-tuning the model, the necessary pre-processing is performed, using TensorFlow libraries, to bring the data into the form required for training. Since it is a binary classification task, we used a binary cross-entropy loss function with the Adaptive Moments (Adam) optimizer.
     • XLM-RoBERTa-based Solution (Run 3): In this approach, we rely on the multilingual pre-trained XLM-RoBERTa model, which is fine-tuned on the development set. As a first step, the input text is tokenized in the pre-processing phase. The pre-trained model is then fine-tuned on the pre-processed data using the Adam optimizer with a binary cross-entropy loss function.
     • LSTM-based Solution (Run 4): In this approach, we rely on a custom LSTM model composed of three layers: an input layer, an LSTM layer, and an output layer. We used this model as a baseline for our experiments. However, the model obtained encouraging results on the development set and was thus also utilized in the fusion scheme.
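To make the transformer-based runs (Run 2 and Run 3) more concrete, the following is a minimal fine-tuning sketch, assuming the Hugging Face transformers library with TensorFlow, a two-class cross-entropy loss (equivalent to binary cross-entropy for this task), and illustrative hyper-parameters and variable names that are our assumptions rather than the authors' exact configuration.

import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"   # e.g. "xlm-roberta-base" for the Run 3 setup

def build_dataset(texts, labels, tokenizer, batch_size=16):
    # Tokenize the (cleaned) post texts into fixed-length input tensors.
    enc = tokenizer(list(texts), truncation=True, padding="max_length",
                    max_length=128, return_tensors="tf")
    ds = tf.data.Dataset.from_tensor_slices((dict(enc), list(labels)))
    return ds.shuffle(1024).batch(batch_size)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = TFAutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Two-class cross-entropy with the Adam optimizer, as described above.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# train_texts / train_labels: relevant (1) vs. irrelevant (0) posts from the
# WaterMM development set (names assumed for illustration).
# model.fit(build_dataset(train_texts, train_labels, tokenizer), epochs=3)

Switching MODEL_NAME to an XLM-RoBERTa checkpoint yields the Run 3 configuration without further code changes, since the Auto classes resolve the appropriate tokenizer and model.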
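For the LSTM baseline (Run 4), the paper only specifies a three-layer structure (input, LSTM, and output layer); the sketch below is a minimal Keras realization under assumed vocabulary size and layer widths.

import tensorflow as tf

VOCAB_SIZE = 20000                 # assumed vocabulary size
EMBED_DIM, LSTM_UNITS = 128, 64    # assumed layer widths

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),   # input/embedding layer
    tf.keras.layers.LSTM(LSTM_UNITS),                    # LSTM layer
    tf.keras.layers.Dense(1, activation="sigmoid"),      # output layer (relevant or not)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=10)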

We also cleaned the data before feeding it into the models by removing URLs, account handles, emojis, and unnecessary punctuation. Moreover, in all the proposed solutions, we used an up-sampling technique to balance the dataset; a sketch of both steps is given below.
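Neither the exact regular expressions nor the up-sampling procedure are specified in the paper; the following is one plausible implementation of the cleaning and class-balancing steps, with the DataFrame column name "label" assumed for illustration.

import re
import pandas as pd
from sklearn.utils import resample

def clean_post(text: str) -> str:
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # URLs
    text = re.sub(r"@\w+", " ", text)                     # account handles
    text = re.sub(r"[^\w\s]", " ", text)                  # emojis and punctuation
    return re.sub(r"\s+", " ", text).strip().lower()

def upsample(df: pd.DataFrame, label_col: str = "label") -> pd.DataFrame:
    # Replicate minority-class samples until both classes have equal size.
    counts = df[label_col].value_counts()
    minority, majority = counts.idxmin(), counts.idxmax()
    minority_up = resample(df[df[label_col] == minority], replace=True,
                           n_samples=counts[majority], random_state=42)
    balanced = pd.concat([df[df[label_col] == majority], minority_up])
    return balanced.sample(frac=1, random_state=42).reset_index(drop=True)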
Figure 1: Block diagram of the proposed methodology. The input text is fed to the three models (BERT, XLM-RoBERTa, and LSTM), whose outputs are combined through late fusion to produce the predicted label.
3 RESULTS AND ANALYSIS
3.1 Evaluation Metric
For the evaluation of the proposed methods, we used four different metrics, namely (i) accuracy, (ii) micro precision, (iii) micro recall, and (iv) micro F1-score. Precision, recall, and F1-score are the official task metrics, while accuracy has been used as an additional metric for the evaluation of the methods on the development set. A sketch of how these scores can be computed is given below.
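As an illustration (not part of the official evaluation code), the micro-averaged scores and accuracy can be obtained with scikit-learn as follows.

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate(y_true, y_pred):
    # Micro-averaged precision, recall, and F1-score (official metrics), plus accuracy.
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="micro")
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy_score(y_true, y_pred)}

# Example: evaluate(dev_labels, model_predictions)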
3.2 Experimental Results on the Development Set
Table 1 provides the experimental results of our proposed solutions on the development set. To this aim, a separate validation set composed of 1,810 samples is used. Run 1 represents our fusion-based solution, while Run 2, Run 3, and Run 4 represent our solutions based on the individual models, namely BERT, XLM-RoBERTa, and LSTM, respectively. On the development set, overall better results are obtained with the BERT-based solution, which achieves an F1-score of 0.950 and an accuracy of 0.929. The lowest F1-score and accuracy are observed for XLM-RoBERTa.

Table 1: Evaluation of our proposed solutions on the development set in terms of precision, recall, F1-score, and accuracy.

Runs     Precision    Recall    F1-Score    Accuracy
Run 1    0.950        0.925     0.938       0.914
Run 2    0.949        0.950     0.950       0.929
Run 3    0.862        0.900     0.881       0.836
Run 4    0.885        0.947     0.915       0.885
3.3 Experimental Results on the Test Set
Table 2 provides the official results on the test set in terms of precision, recall, and F1-score. Among the individual model-based solutions, overall better results are obtained for BERT, while the lowest scores are observed for the LSTM-based solution. Interestingly, however, no significant improvement in performance is observed for the fusion-based solution over the best-performing individual model-based solution. One of the possible reasons could be the low-performing models, as all the models are treated equally by simply aggregating the obtained posterior probabilities. This limitation could be addressed by using merit-based fusion, where weights are assigned to the contributing models based on their performance; a sketch of such a scheme is given after Table 2.

Table 2: Evaluation of our proposed solutions on the test set in terms of micro precision, recall, and F1-score.

Runs     Precision    Recall    F1-Score
Run 1    0.732        0.866     0.794
Run 2    0.732        0.866     0.794
Run 3    0.606        0.877     0.717
Run 4    0.565        0.801     0.663
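The following sketch illustrates both the equal-weight late fusion used in Run 1 and the merit-based weighted variant suggested above; the weight values in the usage example are hypothetical (e.g., development-set F1-scores), not values used in the submitted runs.

import numpy as np

def late_fusion(probabilities, weights=None):
    """probabilities: list of arrays of shape (n_samples, n_classes), one per model."""
    probs = np.stack(probabilities)            # (n_models, n_samples, n_classes)
    if weights is None:
        fused = probs.mean(axis=0)             # all models treated equally (Run 1)
    else:
        w = np.asarray(weights, dtype=float)
        fused = np.tensordot(w / w.sum(), probs, axes=1)   # merit-based weighting
    return fused.argmax(axis=1)                # predicted labels

# Equal weights:       late_fusion([p_bert, p_xlmr, p_lstm])
# Merit-based weights: late_fusion([p_bert, p_xlmr, p_lstm], weights=[0.950, 0.881, 0.915])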
4 CONCLUSIONS AND FUTURE WORK
The quantity and quality of the images associated with the social media posts were not good enough to contribute to the task. Thus, we focused on the textual information only, employing several NN-based solutions: in total, four different solutions, comprising one fusion-based solution and three solutions based on the individual models. In the current implementation, we used a simple fusion mechanism that aggregates the posterior probabilities obtained with each individual model.
In the future, we aim to employ more sophisticated fusion schemes by assigning merit-based weights to the contributing models. We also aim to make use of the additional information available in the form of metadata in our future fusion-based solutions.

REFERENCES
[1] Kashif Ahmad, Konstantin Pogorelov, Michael Riegler, Nicola Conci, and Pal Halvorsen. 2019. Social media and satellites: Disaster event detection, linking and summarization. Multimedia Tools and Applications 78, 3 (2019), 2837-2875.
[2] Stelios Andreadis, Ilias Gialampoukidis, Aristeidis Bozas, Anastasia Moumtzidou, Roberto Fiorin, Francesca Lombardo, Anastasios Karakostas, Daniele Norbiato, Stefanos Vrochidis, Michele Ferri, and Ioannis Kompatsiaris. 2021. WaterMM: Water Quality in Social Multimedia Task at MediaEval 2021. In Proceedings of the MediaEval 2021 Workshop, Online.
[3] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
[4] Abdullah Hamid, Nasrullah Shiekh, Naina Said, Kashif Ahmad, Asma Gul, Laiq Hassan, and Ala Al-Fuqaha. 2020. Fake news detection in social media using graph neural networks and NLP techniques: A COVID-19 use-case. arXiv preprint arXiv:2012.07517 (2020).
[5] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019).
[6] Ariadna Matamoros-Fernández and Johan Farkas. 2021. Racism, Hate Speech, and Social Media: A Systematic Review and Critique. Television & New Media 22, 2 (2021), 205-224.
[7] Salman Bin Naeem, Rubina Bhatti, and Aqsa Khan. 2021. An exploration of how fake news is taking over social media and putting public health at risk. Health Information & Libraries Journal 38, 2 (2021), 143-149.
[8] Naina Said, Kashif Ahmad, Michael Riegler, Konstantin Pogorelov, Laiq Hassan, Nasir Ahmad, and Nicola Conci. 2019. Natural disasters detection in social media and satellite imagery: a survey. Multimedia Tools and Applications 78, 22 (2019), 31267-31302.