UAICS at CheckThat! 2021: Fake news detection
Ciprian G. Cusmuliuc, Matei A. Amarandei, Ioana Pelin, Vlad I. Cociorva and Adrian Iftene
“Alexandru Ioan Cuza” University, Faculty of Computer Science, Iasi, Romania


Abstract
Social media growth in recent years has reshaped human communication. Platforms such as Facebook and Twitter are now ever-present in our lives, influencing how we speak, think and act. The spread of fake news undermines this phenomenon, as it lowers trust in the content presented. One such example is the 2016 U.S. presidential election campaign, where fake news was a deciding factor in tipping the balance of power. It is therefore of critical importance to develop tools that detect and combat such destructive content. CLEF 2021 CheckThat! Task 3 addresses the problem of fake news, posing the challenge of developing systems that detect whether the main claim made in an article is true, partially true, false, or other. Our team participated in this task with 5 models, ranking 6th with an F1-macro of 0.44 obtained by a model based on Gradient Boosting; in this paper we present our methods, runs and results and discuss future work.

                   Keywords 1
                   Fake news detection, LSTM, Bi-LSTM, BERT, RoBERTa, Random Forest, Gradient
                   Boosting, Naïve Bayes, KNN.

1. Introduction

   Recent advances in computing, dating back to the beginning of the millennium, have drastically
changed human interaction: people no longer need to meet in person to stay in contact with friends;
furthermore, the COVID-19 pandemic has accelerated this shift by forcing everybody to communicate
via digital means for months at a time. The main facilitators of this shift are social media platforms,
which have seen massive usage spikes in the past decade, radically changing how we speak, read news,
watch videos and so on. This freedom, however, comes at a cost. Giving everybody almost unlimited
reach and a free hand to post whatever they please is a big advantage, but it is also very dangerous; the
classic example is the 2016 U.S. presidential election campaign, where a mixture of social profiling and
fake news led to surprising electoral results (this contrasts with the 2020 U.S. elections, where social
media platforms banned many political ads2). Considering the previous argument, it is clear that we
need automated methods that analyze posts and flag fake or misleading content.
   CLEF CheckThat! 2021 Task 3a [1] [2] [17] [18] has exactly the goal expressed in the previous
paragraph; the task being: “given the text of a news article, determine whether the main claim made in
the article is true, partially true, false, or other (e.g., claims in dispute) and also detect the topical domain
of the article”. In the competition we submitted 5 different models and overall ranked 6th.
   This paper describes the participation of team UAICS, from the Faculty of Computer Science,
“Alexandru Ioan Cuza” University of Iasi, in Task 3a at CLEF 2021. The remainder of this paper is
organized as follows: Section 2 details the models we developed and the submitted runs, then

1
 CLEF 2021 – Conference and Labs of the Evaluation Forum, September 21–24, 2021, Bucharest, Romania
EMAIL: gabriel.cusmuliuc@info.uaic.ro (Ciprian G. Cusmuliuc)
ORCID: 0000-0003-0758-3061 (Ciprian G. Cusmuliuc)
               ©️ 2021 Copyright for this paper by its authors.
               Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
               CEUR Workshop Proceedings (CEUR-WS.org)


2
    https://www.bbc.com/news/technology-54369303
Section 3 details the results we obtained and, finally, Section 4 concludes the paper and presents future
work.

2. Methods and runs

    In this section we detail the submitted models; 5 models were developed in search of the best one.
We relied on state-of-the-art methods such as LSTM, Bi-LSTM, BERT and RoBERTa, but also
experimented with methods based on more traditional techniques such as Gradient Boosting, Naïve
Bayes, KNN and Random Forest. In the following subsections we review the state of the art, analyze
the dataset and discuss our preprocessing and models.


    2.1.        State of the art

    Research interest in fake news classification has grown exponentially in just a few years.
Identification efforts have been very diverse, but they can all be grouped into 3 big categories, as [3]
outlines: creator and user analysis, social context analysis and news content analysis.
    Creator and user analysis focuses on extensive analysis of user accounts in order to identify
malicious behaviors. Malicious user accounts behave differently from authentic users; thus,
identification is possible. User categorization can be achieved using different techniques: user
profiling analysis [4][5], temporal and posting behavior analysis [6], credibility-related analysis [7],
and sentiment-related analysis [8]. Since user information was not available in the CheckThat!
dataset, these techniques could not be applied.
    Social context analysis studies how news disseminates in the social environment, meaning how
quickly and widely the data is shared/distributed and how users interact with each other, with 2 big
research areas: user network analysis (users with high interaction with the news creator can be used to
predict the truthfulness of the news) [9] and distribution pattern analysis (analysis of the information
spread in the network) [10]. Just like creator and user analysis, social context analysis is not feasible on
this task; moreover, this technique is not used often, as many approaches choose to analyze the news
itself.
    News content analysis, in contrast to creator and user analysis, does not focus on who posts but on
what they post. In [11] a multitude of neural networks in combination with GloVe embeddings is used
to predict the label of a news article; the best result was obtained with a Bi-LSTM (accuracy of 0.91),
but notable results were also obtained with a CNN (0.90) and a vanilla RNN (0.78). [12] takes a different
approach based on classical machine learning, employing Naïve Bayes, Gradient Boosting and Random
Forest to classify a series of 10,000 tweets collected in August 2012, concluding that Random Forest is
the best algorithm with an accuracy of 96%. Finally, [13] uses one of the most recent techniques, BERT
[14]: the input string is tokenized and padded, then fed to a pre-trained large cased BERT model for
classification, which yields an accuracy of 0.69 on a test dataset.
    Knowing what the best models are, as well as their limitations, we proceeded to train them and
compare the results.

    2.2.        Training and test dataset analysis

    The training and test datasets were provided by the organizers and examples can be seen in
Tables 1 and 2. The training dataset consisted of 945 labeled articles and the test dataset of 365
unlabeled articles. This small number of articles proved to be a disadvantage for the neural network
models, as we did not use any additional datasets.
Table 1
Training dataset example
    public_id:   c7ea6a6e
    text:        New evidence ties COVID-19 creation to research funded by Fauci?
    title:       Flooding of Coast, Caused by Global Warming, Has Already Begun.
    our rating:  False

Table 2
Test dataset example
    public_id:   58bea1db
    text:        Second patient cured of HIV, say doctors.
    title:       Lisa Page Squeals: DNC Server Was Not Hacked By Russia.

     In Figure 1 a dataset analysis is presented; taking the Task 3a batches, we plot them in order to gain
   some insight into the collection. The left side of the figure shows a word cloud of the most frequent
   words in the dataset, with the biggest topics being related to politics and COVID-19. The right part of
   the figure confirms this assumption, as it shows the most frequent words, such as “trump”, “covid19”
   and so on (the plots were produced with tokenized data).
     A problem that was identified early on, and which greatly impacts the results, is label imbalance.
   Figure 2 shows in different representations how many articles are available for each label;
   unfortunately, since False is the most common one, the algorithms will automatically be biased in that
   direction (0 - False, 1 - Other, 2 - Partially False, 3 - True).




Figure 1: Left – word cloud view of most frequent words in the dataset; Right – Bar Plot of most
frequent words in the dataset.




Figure 2: Label distribution in the dataset (0 - False, 1 - Other, 2 - Partially False, 3 - True).
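
   As a small illustration of the label analysis above, the snippet below computes and plots such a label
distribution with pandas and matplotlib; the file names and the "our rating" column name are assumptions
taken from Table 1, not the authors' actual plotting code.

import pandas as pd
import matplotlib.pyplot as plt

# Combine the two Task 3a training batches (file names are hypothetical).
df = pd.concat([pd.read_csv("batch1.csv"), pd.read_csv("batch2.csv")], ignore_index=True)

# Count articles per label and draw a bar plot similar to Figure 2.
df["our rating"].value_counts().plot(kind="bar", title="Label distribution")
plt.tight_layout()
plt.show()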
     2.3.      Models
        2.3.1. 3Layer Model

    The first model, and the one which proved to be the most performant, has been named the “3Layer
Model” because of its use of 3 different preprocessing methods and 3 different machine learning
algorithms.
    In the data preparation phase, a series of alterations were made to the dataset. The public_id field
was removed, the two training batches were combined and the title and text fields were merged;
punctuation signs were removed, as well as stop words, dashes and underscores; lastly, the text was
lowercased and lemmatized.
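    A minimal sketch of this preparation step is shown below, assuming pandas DataFrames for the two batches
and spaCy for stop-word removal and lemmatization (the exact libraries used for this step are not stated above,
so treat them as assumptions).

import re
import string
import pandas as pd
import spacy

nlp = spacy.load("en_core_web_sm")   # assumes the small English model is installed

def prepare(df: pd.DataFrame) -> pd.Series:
    # Merge title and text (public_id is simply not used) and lowercase everything.
    merged = (df["title"].fillna("") + " " + df["text"].fillna("")).str.lower()
    cleaned = []
    for raw in merged:
        # Drop dashes, underscores and punctuation signs.
        raw = re.sub(r"[-_]", " ", raw)
        raw = raw.translate(str.maketrans("", "", string.punctuation))
        # Lemmatize and remove stop words.
        doc = nlp(raw)
        cleaned.append(" ".join(tok.lemma_ for tok in doc if not tok.is_stop))
    return pd.Series(cleaned, name="clean_text")

# train = pd.concat([batch1, batch2], ignore_index=True)   # the two training batches (assumed frames)
# train["clean_text"] = prepare(train)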
    The feature extraction phase consisted of three approaches:
    • Clean text: the cleaned text represented as bigrams (contiguous sequences of n items, where n is 2); the
      training column is called clean_text;
    • POS tagging: applied to the text column using spaCy3 to obtain the part-of-speech form; the training column
      is called POS_text;
    • Semantic analysis: done using Stanford’s Empath tool4 [15] to categorize the words in the articles by their
      lexicon and approximate whether articles that are fake predominantly use a certain lexicon (this column was
      named semantics_text). An example can be seen in Appendix A.
   Besides the three aforementioned representations we created a fourth one by weighting them as follows:
clean_text: 0.5, POS_text: 0.15 and semantics_text: 0.35 (these values were determined experimentally).
   In order to feed the data to the ML algorithms, we applied TF-IDF to the columns mentioned earlier:
clean_text, POS_text and semantics_text.
    As for the models used, they consisted of Naïve Bayes, KNN, Random Forest and Gradient
Boosting. In the results section we discuss the hyperparameter tuning in relation to the results; in the
end, the most performant variant consisted of Gradient Boosting combined with the weighted
representation of clean text, POS tags and semantic tags.
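   As a minimal sketch of this best-performing combination (weighted TF-IDF features over the three text
columns fed to Gradient Boosting), the snippet below shows one possible implementation with scikit-learn
and SciPy; the weighting via scalar multiplication of the sparse TF-IDF blocks and the train_df/test_df
frames are our assumptions, not the authors' exact code.

from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import GradientBoostingClassifier

# Weights for the three representations, as reported above.
WEIGHTS = {"clean_text": 0.5, "POS_text": 0.15, "semantics_text": 0.35}

def build_features(frame, vectorizers=None):
    # One TF-IDF vectorizer per representation; they are fit on the training frame only.
    if vectorizers is None:
        vectorizers = {col: TfidfVectorizer().fit(frame[col]) for col in WEIGHTS}
    # Scale each sparse block by its weight and stack the blocks side by side.
    parts = [vectorizers[col].transform(frame[col]) * w for col, w in WEIGHTS.items()]
    return hstack(parts).tocsr(), vectorizers

# train_df / test_df: assumed pandas DataFrames with the clean_text, POS_text,
# semantics_text and "our rating" columns.
X_train, vecs = build_features(train_df)
clf = GradientBoostingClassifier(n_estimators=200)      # n_estimators as in Table 6
clf.fit(X_train, train_df["our rating"])

X_test, _ = build_features(test_df, vectorizers=vecs)
predictions = clf.predict(X_test)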

          2.3.2. BERT

   Another model we developed is based on BERT, which has yielded great results in many state-of-the-art
systems [13].
   Data preparation for this method consisted of shuffling the training articles, concatenating the
batches, merging the title and text columns and eliminating public_id (it was redundant for training).
Other operations consisted of punctuation removal, lemmatization, mandatory text padding and the
BERT-specific tokenization process.
   As for the model, we used bert-large-uncased (24 layers, 1024 hidden dimensions, 16 attention
heads, 336M parameters) from HuggingFace5 and began the fine-tuning process. A problem
immediately apparent was the size of the dataset, as BERT requires many training examples. We used
the AdamW optimizer (with the learning rate tuned as well as possible, 6e-6 yielding the best results),
3 epochs and a batch size of 3.
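   A minimal sketch of this fine-tuning setup with the HuggingFace Transformers API is given below; the
learning rate (6e-6), 3 epochs and batch size of 3 are taken from the description above, while the training
loop itself and the stock sequence-classification head are our assumptions (the authors' own classifier head
is shown in Appendix B).

import torch
from torch.optim import AdamW
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertTokenizerFast, BertForSequenceClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-large-uncased")
model = BertForSequenceClassification.from_pretrained("bert-large-uncased", num_labels=4)
optimizer = AdamW(model.parameters(), lr=6e-6)

# `texts` and `labels` are assumed: merged title+text strings and their 0-3 labels.
enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"], torch.tensor(labels))
loader = DataLoader(dataset, batch_size=3, shuffle=True)

model.train()
for epoch in range(3):                                   # 3 epochs, as in Table 7
    for input_ids, attention_mask, y in loader:
        optimizer.zero_grad()
        out = model(input_ids=input_ids, attention_mask=attention_mask, labels=y)
        out.loss.backward()
        optimizer.step()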
   Figure 3 presents the training and validation loss over the epochs; the training set contained 70% of
the data, 20% was used for testing and 10% for validation. Appendix B shows a snippet of the BERT
classifier.




3
          https://spacy.io/
4
          https://github.com/Ejhfast/empath-client
5
          https://huggingface.co/transformers/model_doc/bert.html
Figure 3: Training and Validation loss of BERT.

          2.3.3. RoBERTa

    Since RoBERTa [16] proves to be better than BERT in some scenarios, we were eager to use it and
compare the results. The pre-trained RoBERTa has also been taken from HuggingFace; we used the
‘roberta-base’ model6.
    The data processing is similar to BERT’s. The dataset was split as follows: 70% of the data for
training, 20% for testing and 10% for validation. The hyperparameters used are: a text sequence length
of 256 and batches of 32 elements. Code samples are available in Appendix C.
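    A minimal sketch of the corresponding tokenization and batching is shown below; the sequence length of
256 and the batch size of 32 are the hyperparameters stated above, while the `texts`/`labels` variables are
assumed (the authors' model head is shown in Appendix C).

import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import RobertaTokenizerFast

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")

# Encode the merged title+text strings to fixed-length sequences of 256 tokens.
enc = tokenizer(texts, max_length=256, padding="max_length", truncation=True, return_tensors="pt")
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"], torch.tensor(labels))
loader = DataLoader(dataset, batch_size=32, shuffle=True)   # batches of 32 elements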

          2.3.4. LSTM

   The fourth implemented model is an LSTM. Training and testing were done on an 80-20 split.
The data processing involves combining the title and text columns and then applying the
SnowballStemmer7 from NLTK8 to stem the text. The text was also tokenized using Keras’s
Tokenizer.
   For feature extraction we used Word2Vec, as it preserves the semantic meaning of words in
documents; the resulting embedding matrix was fed to the model.
   The model is built with TensorFlow and is a combination of the following layers:
    • Embedding layer;
    • Dropout layer with a dropout rate of 0.3;
    • LSTM layer with 100 units with a recurrent dropout (fraction of the units to drop for the linear transformation
      of the recurrent state) of 0.2 and a dropout of 0.2 (fraction of the units to drop for the linear transformation
      of the inputs);
    • Dense layer with 4 units (because we predict 4 labels) using the softmax activation function.
   The loss function used was sparse categorical cross-entropy with the Adam optimizer. The total number of
parameters of the model was 2,648,304. The optimal number of epochs found was 8 and the batch size 16. We used
callback functions such as ReduceLROnPlateau9, to reduce the learning rate if the accuracy does not improve, and
early stopping, to halt training if the model does not improve.
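   A minimal sketch of this architecture in Keras is shown below; the vocabulary size and the Word2Vec
embedding matrix are placeholders, while the layer sizes, dropout rates, loss, optimizer and callbacks follow
the description above (callback patience values are also placeholders).

import numpy as np
import tensorflow as tf

vocab_size, embedding_dim = 20000, 100                     # placeholders for the tokenizer vocabulary and Word2Vec size
embedding_matrix = np.zeros((vocab_size, embedding_dim))   # placeholder; built from the Word2Vec vectors in practice

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim,
                              embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix)),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.LSTM(100, dropout=0.2, recurrent_dropout=0.2),
    tf.keras.layers.Dense(4, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

callbacks = [
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_accuracy", patience=2),
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True),
]
# model.fit(X_train, y_train, validation_split=0.2, epochs=8, batch_size=16, callbacks=callbacks)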

          2.3.5. Bi-LSTM

  The fifth and final implemented model is an improvement effort over the previous LSTM network.
The dataset split was 90% training and 10% validation.


6
          https://huggingface.co/roberta-base
7
          https://www.nltk.org/_modules/nltk/stem/snowball.html
8
          https://www.nltk.org/
9
          https://keras.io/api/callbacks/reduce_lr_on_plateau/
   The title and text columns were merged into a single column, just like in all the other models. The
newly formed total column was then processed by removing every stop word and lemmatizing it using NLTK.
Finally, the sentences were converted to lowercase and had their extra whitespace removed.
   The text was tokenized using the Keras Tokenizer; the generated word index length was 27401. For
extracting the features we used GloVe embeddings (Global Vectors for Word Representation) with 100
dimensions. GloVe training is performed on aggregated global word-word co-occurrence statistics from a
corpus, and the resulting representations showcase interesting linear substructures of the word vector
space.
   For building the model we used TensorFlow. The model was built using the Bidirectional LSTM
architecture. We experimented with many combinations of layers, but the one that gave the best results
during the validation stage was the following (in order):
  • Embedding layer with the input dimension equaling the word index length (27401), the output dimension
    equaling the number of embedding dimensions (100) and the input length equaling the maximum sentence
    length from the training set.
  • Bidirectional LSTM layer with 64 units and return sequences set to true.
  • Bidirectional LSTM layer with 32 units.
  • Dropout layer with dropout rate equaling 0.25 to better handle the overfitting due to the small dataset.
  • Dense layer with 4 units (because it predicts 4 labels) and softmax.

  The loss function we used was sparse categorical cross-entropy with the Adam optimizer. The total number of
parameters of the model was 2,866,156. We experimented with many values for the number of epochs and batch size,
but the best performing configuration was 5 epochs with a batch size of 32.
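  A minimal sketch of this Bi-LSTM stack in Keras is shown below; the GloVe matrix is a placeholder, while
the word index length (27401), layer sizes, dropout rate, loss, optimizer, epochs and batch size follow the
description above.

import numpy as np
import tensorflow as tf

word_index_len, glove_dim = 27401, 100
glove_matrix = np.zeros((word_index_len, glove_dim))   # placeholder; filled from the GloVe vectors in practice

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(word_index_len, glove_dim,
                              embeddings_initializer=tf.keras.initializers.Constant(glove_matrix)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Dense(4, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
# model.fit(X_train, y_train, validation_split=0.1, epochs=5, batch_size=32)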

3. Results
   3.1. 3Layer Model
   In this section we discuss the results of the 3Layer model as well as the parameter tuning of the models.
Tables 3 to 6 present experiments with each of the 3 feature extraction methods (clean text, POS tags and
semantic tags) as well as the weighted combination of the three; what worked best in the end is the weighted
approach combined with Gradient Boosting, and this combination earned us 6th place with an F1-macro of
0.44.

Table 3
TF-IDF Vectorization on Cleaned Text
   Multinomial Naive Bayes (alpha = 0.0): Accuracy 0.57, Macro Average 0.48
   K-Nearest Neighbors (p = 2, n_neighbors = 29, leaf_size = 45): Accuracy 0.61, Macro Average 0.41
   Random Forest (n_estimators = 1000, max_features = 'sqrt', max_depth = 50, min_samples_split = 2, min_samples_leaf = 2): Accuracy 0.47, Macro Average 0.25
   Gradient Boosting (n_estimators = 200): Accuracy 0.57, Macro Average 0.43

Table 4
TF-IDF Vectorization on POS Tags
   Multinomial Naive Bayes (alpha = 0.0): Accuracy 0.48, Macro Average 0.23
   K-Nearest Neighbors (p = 2, n_neighbors = 29, leaf_size = 45): Accuracy 0.52, Macro Average 0.37
   Random Forest (n_estimators = 400, max_features = 'sqrt', max_depth = 30, min_samples_split = 10, min_samples_leaf = 2): Accuracy 0.54, Macro Average 0.35
   Gradient Boosting (n_estimators = 200): Accuracy 0.58, Macro Average 0.44

Table 5
TF-IDF Vectorization on Semantic Tags
   Multinomial Naive Bayes (alpha = 0.1): Accuracy 0.49, Macro Average 0.29
   K-Nearest Neighbors (p = 2, n_neighbors = 27, leaf_size = 12): Accuracy 0.35, Macro Average 0.24
   Random Forest (n_estimators = 200, max_features = 'sqrt', max_depth = 30, min_samples_split = 10, min_samples_leaf = 1): Accuracy 0.52, Macro Average 0.32
   Gradient Boosting (n_estimators = 200): Accuracy 0.52, Macro Average 0.42

Table 6
TF-IDF Vectorization on All Three Representations, using a sparse matrix form
   Multinomial Naive Bayes (alpha = 0.0): Accuracy 0.62, Macro Average 0.45
   K-Nearest Neighbors (p = 2, n_neighbors = 19, leaf_size = 6): Accuracy 0.51, Macro Average 0.34
   Random Forest (n_estimators = 1000, max_features = auto, max_depth = 30, min_samples_split = 10, min_samples_leaf = 2): Accuracy 0.57, Macro Average 0.39
   Gradient Boosting (n_estimators = 200): Accuracy 0.59, Macro Average 0.48
    3.2. BERT
       Table 7 highlights the performance of BERT; it is clear from this table that the best setup is with
3 epochs, yielding an F1 of 0.50 on the training dataset split.

Table 7
Validation accuracy of BERT on the training dataset split.
   Epoch 1: training loss 1.31, validation loss 1.26, validation accuracy 0.50, validation F1 0.50, training time 0:00:44, validation time 0:00:02
   Epoch 2: training loss 1.29, validation loss 1.25, validation accuracy 0.48, validation F1 0.48, training time 0:00:47, validation time 0:00:03
   Epoch 3: training loss 1.25, validation loss 1.24, validation accuracy 0.50, validation F1 0.50, training time 0:00:50, validation time 0:00:03


    3.3. RoBERTa

        The accuracy of RoBERTa differs considerably depending on the label; the F1-macro is 0.37. In
Table 9 we can see the confusion matrix of the model; unfortunately, the label imbalance has left the
system unable to predict the ‘other’ label, and it is only good at ‘false’ and ‘partially false’.

Table 8
Classification report for RoBERTa on training data
   False: Precision 0.65, Recall 0.85, F1 0.74, Support 97
   True: Precision 0.35, Recall 0.20, F1 0.26, Support 30
   Partially false: Precision 0.50, Recall 0.49, F1 0.49, Support 47
   Other: Precision 0.00, Recall 0.00, F1 0.00, Support 15
   Accuracy: 0.59 (Support 189)
   Macro avg: Precision 0.38, Recall 0.38, F1 0.37, Support 189
   Weighted avg: Precision 0.51, Recall 0.59, F1 0.54, Support 189

Table 9
Confusion matrix for RoBERTa on training data (rows: actual label, columns: predicted label).
                        Other      Partially false      True      False
   Other                  0               3               2         10
   Partially false        0              23               4         20
   True                   0              10               6         14
   False                  0              10               5         82



    3.4. LSTM

   The accuracy and loss measured for this model are 0.563157 and 1.405469, respectively.
Table 10
Confusion matrix for LSTM on training data.
              Partially          23                25               5               0
                false
                False            10                78               4               0
                  True              9              12               6               0
                 Other              3              14               1               0
                                Partially         False           True            Other
                                 False


    3.5. Bi-LSTM

The results were not the best, mainly due to the fact that the dataset was small; the F1-macro for this model
was measured at 0.33.
Table 11
Classification report for Bi-LSTM on training data.
   False: Precision 0.58, Recall 0.73, F1 0.64, Support 92
   True: Precision 0.43, Recall 0.11, F1 0.18, Support 27
   Partially false: Precision 0.46, Recall 0.58, F1 0.52, Support 53
   Other: Precision 0.00, Recall 0.00, F1 0.00, Support 18
   Macro avg: Precision 0.37, Recall 0.36, F1 0.33, Support 190
   Weighted avg: Precision 0.47, Recall 0.53, F1 0.48, Support 190

Table 12
Confusion matrix for Bi-LSTM on training data.
                False            67                 1              24               0
                  True             14               3              10               0
                Partially          21               1              31               0
                 false
                 Other             14               2               2               0
                                  False           True          Partially         Other
                                                                 False



    3.6. Results conclusions

   To conclude the results section: of our 5 models, the best approach was the 3Layer weighted
method, which officially obtained an F1-macro of 0.44. We were unable to calculate the other scores
against the gold labels, and the organizers did not provide a full ranking. Most of the results revolve
around a score of 0.5, which is in part related to the small size of the dataset and the fact that many of
our models relied on neural networks, which require large training sets.
4. Conclusions

   To conclude, in this paper we presented our participation in CLEF 2021 Task 3a; our best method
obtained an F1-macro of 0.44, ranking us 6th. We proposed multiple models based on different methods;
for future work we plan to increase the dataset as well as to create a system based on inference, so that
article content can be verified using different ontologies.

5. Acknowledgements

   Special thanks go to Smau Adrian-Constantin, Mosor Andre, Radu Rares-Aurelian, Gramescu
George-Rares and Filipescu Iustina-Andreea, without whom this work would not have been possible. This
work was supported by project REVERT (taRgeted thErapy for adVanced colorEctal canceR paTients),
Grant Agreement number: 848098, H2020-SC1-BHC-2018-2020/H2020-SC1-2019-Two-Stage-RTD.

6. References

   [1] Preslav Nakov, Giovanni Da San Martino, Tamer Elsayed, Alberto Barrón-Cedeño, Rubén
   Míguez, Shaden Shaar, Firoj Alam, Fatima Haouari, Maram Hasanain, Nikolay Babulkov, Alex
   Nikolov, Gautam Kishore Shahi, Julia Maria Struß and Thomas Mandl (2021). The CLEF-2021
   CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake
   News. In Advances in Information Retrieval - 43rd European Conference on IR Research, ECIR
   2021, Virtual Event, March 28 - April 1, 2021, Proceedings, Part II (pp. 639–649). Springer.
   [2] Shahi, G., Dirkson, A., & Majchrzak, T. (2021). An exploratory study of covid-19
   misinformation on twitter. Online Social Networks and Media, 22, 100104.
   [3] X. Zhang, and A.A. Ghorbani, “An overview of online fake news: Characterization, detection,
   and discussion.” In Information Processing & Management, vol. 57 (2), 2020, 102025, ISSN 0306-
   4573, https://doi.org/10.1016/j.ipm.2019.03.004.
   [4] Emilio Ferrara, Onur Varol, Clayton Davis, Filippo Menczer, and Alessandro Flammini. 2016.
   The rise of social bots. Commun. ACM 59, 7 (July 2016), 96–104.
   DOI:https://doi.org/10.1145/2818717
   [5] J. Zhao, N. Cao, Z. Wen, Y. Song, Y. Lin and C. Collins, "#FluxFlow: Visual Analysis of
   Anomalous Information Spreading on Social Media," in IEEE Transactions on Visualization and
   Computer Graphics, vol. 20, no. 12, pp. 1773-1782, 31 Dec. 2014, doi:
   10.1109/TVCG.2014.2346922.
   [6] Ghosh, Rumi, Tawan Surachawala, and Kristina Lerman. "Entropy-based classification of
   'retweeting' activity on Twitter." arXiv preprint arXiv:1106.0346 (2011).
   [7] Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, and Bobby
   Bhattacharjee. 2007. Measurement and analysis of online social networks. In Proceedings of the 7th
   ACM SIGCOMM conference on Internet measurement (IMC '07). Association for Computing
   Machinery, New York, NY, USA, 29–42. DOI:https://doi.org/10.1145/1298306.1298311
   [8] J. P. Dickerson, V. Kagan and V. S. Subrahmanian, "Using sentiment to detect bots on Twitter:
   Are humans more opinionated than bots?," 2014 IEEE/ACM International Conference on Advances
   in Social Networks Analysis and Mining (ASONAM 2014), 2014, pp. 620-627, doi:
   10.1109/ASONAM.2014.6921650.
   [9] Carlos Castillo, Marcelo Mendoza, and Barbara Poblete. 2011. Information credibility on
   twitter. In Proceedings of the 20th international conference on World wide web (WWW '11).
   Association      for    Computing        Machinery,     New    York,    NY,     USA,      675–684.
   DOI:https://doi.org/10.1145/1963405.1963500
   [10] Diakopoulos, N., Naaman, M., Kivran-Swaine, F. 2010. Diamonds in the rough: Social media
   visual analytics for journalistic inquiry. In VAST 10 - IEEE Conference on Visual Analytics Science
   and Technology 2010, Proceedings, art. no. 5652922, pp. 115-122. DOI:
   10.1109/VAST.2010.5652922
    [11] Pritika Bahad, Preeti Saxena, Raj Kamal, Fake News Detection using Bi-directional LSTM-
    Recurrent Neural Network, Procedia Computer Science, Volume 165, 2019, Pages 74-82, ISSN
    1877-0509, https://doi.org/10.1016/j.procs.2020.01.072.
    [12] C.G. Cusmuliuc, L.G. Coca, and A. Iftene, “Identifying Fake News on Twitter using Naive
    Bayes, SVM and Random Forest Distributed Algorithms.” In Proceedings of The 13th Edition of
    the International Conference on Linguistic Resources and Tools for Processing Romanian Language
    (ConsILR-2018), 2018, pp. 177-188.
    [13] Jwa, H., Oh, D., Park, K., Kang, J. M., & Lim, H. (2019). exBAKE: automatic fake news
    detection model based on bidirectional encoder representations from transformers (bert). Applied
    Sciences, 9(19), 4062.
    [14] Devlin, Jacob, et al. "Bert: Pre-training of deep bidirectional transformers for language
    understanding." arXiv preprint arXiv:1810.04805 (2018).
    [15] Fast, Ethan, Binbin Chen, and Michael S. Bernstein. "Empath: Understanding topic signals in
    large-scale text." Proceedings of the 2016 CHI conference on human factors in computing systems.
    2016.
    [16] Liu, Yinhan, et al. "Roberta: A robustly optimized bert pretraining approach." arXiv preprint
    arXiv:1907.11692 (2019).
    [17] Shahi, Gautam Kishore, Julia Maria Struß, and Thomas Mandl. "Overview of the CLEF-2021
    CheckThat! Lab Task 3 on Fake News Detection." Working Notes of CLEF 2021 - Conference and
    Labs of the Evaluation Forum.
    [18] P. Nakov, G. Da San Martino, T. Elsayed, A. Barrón-Cedeño, R. Míguez, S. Shaar, F. Alam,
    F. Haouari, M. Hasanain, N. Babulkov, A. Nikolov, G. K. Shahi, J. M. Struß, T. Mandl, S. Modha,
    M. Kutlu, Y. S. Kartal, Overview of the CLEF-2021 CheckThat! Lab on Detecting Check-Worthy
    Claims, Previously Fact-Checked Claims, and Fake News, in: Proceedings of the 12th International
    Conference of the CLEF Association: Information Access Evaluation Meets Multilinguality,
    Multimodality, and Visualization, CLEF '2021, Bucharest, Romania (online), 2021.

                                                           Appendix A
lexicon.analyze("he hit the other person", normalize=True)
# => {'help': 0.0, 'office': 0.0, 'violence': 0.2, 'dance': 0.0, 'money': 0.0, 'wedding': 0.0, 'valuable': 0.0,
'domestic_work': 0.0, 'sleep': 0.0, 'medical_emergency': 0.0, 'cold': 0.0, 'hate': 0.0, 'cheerfulness': 0.0,
'aggression': 0.0, 'occupation': 0.0, 'envy': 0.0, 'anticipation': 0.0, 'family': 0.0, 'crime': 0.0, 'attractive': 0.0,
'masculine': 0.0, 'prison': 0.0, 'health': 0.0, 'pride': 0.0, 'dispute': 0.0, 'nervousness': 0.0, 'government': 0.0,
'weakness': 0.0, 'horror': 0.0, 'swearing_terms': 0.0, 'leisure': 0.0, 'suffering': 0.0, 'royalty': 0.0, 'wealthy': 0.0,
'white_collar_job': 0.0, 'tourism': 0.0, 'furniture': 0.0, 'school': 0.0, 'magic': 0.0, 'beach': 0.0, 'journalism': 0.0,
'morning': 0.0, 'banking': 0.0, 'social_media': 0.0, 'exercise': 0.0, 'night': 0.0, 'kill': 0.0, 'art': 0.0, 'play': 0.0,
'computer': 0.0, 'college': 0.0, 'traveling': 0.0, 'stealing': 0.0, 'real_estate': 0.0, 'home': 0.0, 'divine': 0.0, 'sexual':
0.0, 'fear': 0.0, 'monster': 0.0, 'irritability': 0.0, 'superhero': 0.0, 'business': 0.0, 'driving': 0.0, 'pet': 0.0, 'childish':
0.0, 'cooking': 0.0, 'exasperation': 0.0, 'religion': 0.0, 'hipster': 0.0, 'internet': 0.0, 'surprise': 0.0, 'reading': 0.0,
'worship': 0.0, 'leader': 0.0, 'independence': 0.0, 'movement': 0.2, 'body': 0.0, 'noise': 0.0, 'eating': 0.0, 'medieval':
0.0, 'zest': 0.0, 'confusion': 0.0, 'water': 0.0, 'sports': 0.0, 'death': 0.0, 'healing': 0.0, 'legend': 0.0, 'heroic': 0.0,
'celebration': 0.0, 'restaurant': 0.0, 'ridicule': 0.0, 'programming': 0.0, 'dominant_heirarchical': 0.0, 'military': 0.0,
'neglect': 0.0, 'swimming': 0.0, 'exotic': 0.0, 'love': 0.0, 'hiking': 0.0, 'communication': 0.0, 'hearing': 0.0, 'order':
0.0, 'sympathy': 0.0, 'hygiene': 0.0, 'weather': 0.0, 'anonymity': 0.0, 'trust': 0.0, 'ancient': 0.0, 'deception': 0.0,
'fabric': 0.0, 'air_travel': 0.0, 'fight': 0.0, 'dominant_personality': 0.0, 'music': 0.0, 'vehicle': 0.0, 'politeness': 0.0,
'toy': 0.0, 'farming': 0.0, 'meeting': 0.0, 'war': 0.0, 'speaking': 0.0, 'listen': 0.0, 'urban': 0.0, 'shopping': 0.0,
'disgust': 0.0, 'fire': 0.0, 'tool': 0.0, 'phone': 0.0, 'gain': 0.0, 'sound': 0.0, 'injury': 0.0, 'sailing': 0.0, 'rage': 0.0,
'science': 0.0, 'work': 0.0, 'appearance': 0.0, 'optimism': 0.0, 'warmth': 0.0, 'youth': 0.0, 'sadness': 0.0, 'fun': 0.0,
'emotional': 0.0, 'joy': 0.0, 'affection': 0.0, 'fashion': 0.0, 'lust': 0.0, 'shame': 0.0, 'torment': 0.0, 'economics': 0.0,
'anger': 0.0, 'politics': 0.0, 'ship': 0.0, 'clothing': 0.0, 'car': 0.0, 'strength': 0.0, 'technology': 0.0, 'breaking': 0.0,
'shape_and_size': 0.0, 'power': 0.0, 'vacation': 0.0, 'animal': 0.0, 'ugliness': 0.0, 'party': 0.0, 'terrorism': 0.0,
'smell': 0.0, 'blue_collar_job': 0.0, 'poor': 0.0, 'plant': 0.0, 'pain': 0.2, 'beauty': 0.0, 'timidity': 0.0, 'philosophy':
0.0, 'negotiate': 0.0, 'negative_emotion': 0.0, 'cleaning': 0.0, 'messaging': 0.0, 'competing': 0.0, 'law': 0.0, 'friends':
0.0, 'payment': 0.0, 'achievement': 0.0, 'alcohol': 0.0, 'disappointment': 0.0, 'liquid': 0.0, 'feminine': 0.0, 'weapon':
0.0, 'children': 0.0, 'ocean': 0.0, 'giving': 0.0, 'contentment': 0.0, 'writing': 0.0, 'rural': 0.0, 'positive_emotion': 0.0,
'musical': 0.0}
                                                Appendix B


import torch.nn as nn
# Assumed imports: the published snippet omits them; the forward() call signature
# (output_all_encoded_layers) matches the older pytorch-pretrained-bert package.
from pytorch_pretrained_bert import BertModel


class BertClassifier(nn.Module):
    def __init__(self, dropout=0.1):
        super(BertClassifier, self).__init__()
        self.bert = BertModel.from_pretrained('bert-base-uncased')
        self.dropout = nn.Dropout(dropout)
        self.linear = nn.Linear(768, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, tokens, masks=None):
        _, pooled_output = self.bert(tokens, attention_mask=masks, output_all_encoded_layers=False)
        dropout_output = self.dropout(pooled_output)
        linear_output = self.linear(dropout_output)
        proba = self.sigmoid(linear_output)
        return proba
                                                 Appendix C

import torch
# Assumed imports: the published snippet omits them. `Model` is the authors' own base
# class and is not shown here.
from transformers import RobertaTokenizer, RobertaModel


class ROBERTA(torch.nn.Module, Model):
    def __init__(self, text, dropout_rate=0.4):
        super(ROBERTA, self).__init__()
        # Model.__init__(text)
        self.text = text
        self.tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
        self.roberta = RobertaModel.from_pretrained('roberta-base', return_dict=False, num_labels=4)
        self.d1 = torch.nn.Dropout(dropout_rate)
        self.l1 = torch.nn.Linear(768, 64)
        self.bn1 = torch.nn.LayerNorm(64)
        self.d2 = torch.nn.Dropout(dropout_rate)
        self.l2 = torch.nn.Linear(64, 4)