<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>UAICS at CheckThat! 2021: Fake news detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ciprian G. Cusmuliuc</string-name>
          <email>gabriel.cusmuliuc@info.uaic.ro</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matei A. Amarandei</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ioana Pelin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vlad I. Cociorva</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Adrian Iftene</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Boosting</institution>
          ,
          <addr-line>Naïve Bayes, KNN</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Fake news detection, LSTM</institution>
          ,
          <addr-line>Bi-LSTM, BERT, RoBERTa, Random Forest, Gradient</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Alexandru Ioan Cuza” University, Faculty of Computer Science</institution>
          ,
          <addr-line>Iasi</addr-line>
          ,
          <country country="RO">Romania</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Social media growth in recent years has facilitated an enhancement in human communication. Platforms such as Facebook and Twitter are now ever-present in our lives, influencing how we speak, think and act. The growth of fake news greatly impacts this phenomenon, as it lowers one's trust in the content presented. One such example is related to the 2016 U.S. presidential election campaign, where fake news was a deciding factor in tipping the balance of power. It is hence of critical importance to develop tools that detect and combat such destructive content. CLEF 2021 CheckThat! Task 3 addresses the problem of fake news, posing the challenge of developing systems that can detect whether the main claim made in an article is true, partially false, false, or other. Our team participated in this task with 5 models, ranking 6th with an F1-macro of 0.44 and a model based on Gradient Boosting; in this paper we present our methods, runs and results, and also discuss future work.</p>
      </abstract>
      <kwd-group>
        <kwd>Fake news detection</kwd>
        <kwd>LSTM</kwd>
        <kwd>Bi-LSTM</kwd>
        <kwd>BERT</kwd>
        <kwd>RoBERTa</kwd>
        <kwd>Random Forest</kwd>
        <kwd>Gradient Boosting</kwd>
        <kwd>Naïve Bayes</kwd>
        <kwd>KNN</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Copyright 2021 for this paper by its authors.</p>
      <p>Section 2 presents our methods and runs, Section 3 details the results we obtained, and finally Section 4 concludes this paper and presents future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methods and runs</title>
      <p>In this section we detail the submitted models; 5 models were developed in search of the best one. We
relied on state-of-the-art methods such as LSTM, Bi-LSTM, BERT and RoBERTa, but also experimented
with a few novel methods based on more traditional techniques such as Gradient Boosting, Naïve Bayes,
KNN and Random Forest. In the following sections we review state-of-the-art techniques, analyze the
dataset and discuss our models and preprocessing.</p>
    </sec>
    <sec id="sec-3">
      <title>2.1. State of the art</title>
      <p>
        Research interest in fake news classification has grown exponentially in just a few years.
Identification efforts have been very diverse, but they can all be summarized in 3 big categories, as [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
outlines: creator and user analysis, social context analysis and news content analysis.
      </p>
      <p>
        Creator and user analysis focuses on extensive analysis of user accounts in order to identify
malicious behaviors. Malicious user accounts behave differently from authentic users; thus,
identification is possible. Users can be categorized using different techniques: user
profiling analysis [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ][
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], temporal and posting behavior analysis [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], credibility-related analysis [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ],
and sentiment-related analysis [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Since user information was not available in the CheckThat!
dataset, these techniques could not be applied.
      </p>
      <p>
        Social context analysis studies how news disseminates in the social environment, meaning
how quickly and widely the data is shared/distributed and how users interact with each other, with 2 big
research areas: user network analysis (users with high interaction with the news creator can be used to
predict the truthfulness of the news) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and distribution pattern analysis (analysis of the information
spread in the network) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Just like creator and user analysis, social context analysis is not feasible for this
task, and this technique is not often used; many approaches choose instead to analyze the news
itself.
      </p>
      <p>
        News content analysis, in contrast to creator and user analysis, does not focus on who posts but
rather on what they post. In [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] the authors used a multitude of neural networks in combination with GloVe embeddings
to predict the label of a news article; the best result was with a Bi-LSTM (accuracy of 0.91), but notable
results were obtained with a CNN (0.90) and a vanilla RNN (0.78). [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] takes a different approach based
on machine learning, employing Naïve Bayes, Gradient Boosting and Random Forest to classify
a series of 10000 tweets collected in August 2012, concluding that Random Forest is the best algorithm
with an accuracy of 96%. Finally [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] uses the most novel techniques at this time, BERT [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]; they
start by tokenizing the input string, padding it, and feeding it to a pre-trained large cased
BERT model to perform the classification, which yields an accuracy of 0.69 on a test dataset.
      </p>
      <p>Knowing what the best models are, as well as their limitations, we proceeded to train them and
compare the results.</p>
    </sec>
    <sec id="sec-4">
      <title>2.2. Training and test dataset analysis</title>
      <p>The training and test datasets were provided by the organizers and examples can be seen in
Tables 1 and 2. The training dataset consisted of 945 labeled articles and the test dataset had 365
unlabeled articles. This small number of articles proved to be a disadvantage for the neural network
models, as we did not use any additional datasets.</p>
      <p>Figure 1 presents a dataset analysis: we plot the Task 3a batches in order to gain some insight
into the collection. The left side of the figure shows a word cloud of the most frequent words in the
dataset, with the biggest topics being related to politics and COVID-19. The right part of the figure
confirms the latter assumption, as there we can see the most frequent words, such as “trump”,
“covid19” and so on (the plots have been made with tokenized data).</p>
      <p>A problem that was identified early on, and which greatly impacts the results, is related to label
imbalance. Figure 2 shows in different representations how many articles are available with a certain
label; unfortunately, since False is the most common label, the algorithms will automatically be biased
in that direction (0-False, 1-Other, 2-Partially False, 3-True).</p>
    </sec>
    <sec id="sec-5">
      <title>2.3. Models</title>
      <p>2.3.1. 3Layer Model</p>
      <p>The first model, and the one which proved to be the most performant, has been named the “3Layer
Model” because of its use of 3 different preprocessing methods and 3 different machine learning
algorithms.</p>
      <p>In the data preparation phase, a series of alterations were made to the dataset. The public_id
field has been removed, the two training batches have been combined, and the title and text fields have
been merged; punctuation signs have been removed, as well as stop-words, dashes and underscores; lastly,
the text has been lowercased and lemmatized.</p>
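      <p>A minimal sketch of these data-preparation steps (the paper's exact code is not shown; in practice the stop-word list and lemmatization would come from NLTK or spaCy, so the toy stop-word set and helper name below are our own illustration):</p>
      <p>
```python
import re

# Illustrative subset; a real pipeline would use NLTK's or spaCy's stop-word list.
STOP_WORDS = {"the", "a", "an", "of", "and", "is"}

def prepare(title: str, text: str) -> str:
    """Merge title and text, lowercase, strip punctuation/dashes/underscores,
    and drop stop-words (lemmatization omitted in this sketch)."""
    combined = f"{title} {text}".lower()
    # Replace punctuation, dashes and underscores with spaces.
    combined = re.sub(r"[^\w\s]|_", " ", combined)
    tokens = [t for t in combined.split() if t not in STOP_WORDS]
    return " ".join(tokens)
```
      </p>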
      <p>The feature extraction phase consisted of three approaches:
• Clean text is represented as bigrams (contiguous sequences of 2 items); the training column is called
clean_text;
• POS tagging is applied to the text column using spaCy3 to obtain the POS form; the training column is called</p>
      <p>
        POS_text;
• Semantic Analysis is done using Stanford’s Empath Tool4 [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] to categorize the words in the articles by
their lexicon and approximate whether fake articles predominantly use a certain lexicon (this column
was named semantics_text). An example can be seen in Appendix A.
      </p>
      <p>Besides the three aforementioned techniques, we created a fourth by weighting them as follows: clean_text:
0.5, POS_text: 0.15 and semantics_text: 0.35 (these values were determined experimentally).</p>
      <p>To feed the data to the ML algorithms, we applied TF-IDF to the columns mentioned earlier:
clean_text, POS_text and semantics_text.</p>
      <p>As for the models used, they consisted of Naïve Bayes, KNN, Random Forest and Gradient
Boosting. In the results section we will discuss the hyperparameter tuning in relation to the results;
in the end the most performant variant consisted of Gradient Boosting combined with the weighted
representation of clean text, POS tagging and semantic analysis.</p>
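      <p>As a rough sketch of the weighted representation (the function names are ours, not the paper's code), each view can be TF-IDF-vectorized, scaled by its weight and stacked column-wise before fitting the Gradient Boosting classifier:</p>
      <p>
```python
import scipy.sparse as sp
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Weights reported in the paper for the three views.
WEIGHTS = {"clean_text": 0.5, "POS_text": 0.15, "semantics_text": 0.35}

def weighted_tfidf(views):
    """TF-IDF each text view, scale it by its weight, stack column-wise.
    (In practice the vectorizers would be fit on training data and reused.)"""
    blocks = [WEIGHTS[name] * TfidfVectorizer().fit_transform(docs)
              for name, docs in views.items()]
    return sp.hstack(blocks).tocsr()

def train(views, labels, n_estimators=200):
    """Fit Gradient Boosting on the weighted stacked representation."""
    X = weighted_tfidf(views)
    return GradientBoostingClassifier(n_estimators=n_estimators).fit(X, labels)
```
      </p>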
      <p>2.3.2. BERT</p>
      <p>
        Another model we developed is based on BERT, which has yielded great results in many state-of-the-art
systems [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>Data preparation for this method consisted of shuffling the training articles, concatenating the
batches, merging the title and text columns and eliminating public_id (it was not useful for training).
Other operations consisted of punctuation removal, lemmatization, mandatory text padding
and a special BERT tokenization process.</p>
      <p>As for the model, we used bert-large-uncased (24 layers, 1024 hidden dimensions, 16 attention
heads, 336M parameters) from HuggingFace5 and began the fine-tuning process. A problem
immediately apparent was the size of the dataset, as BERT requires many training examples. We used
the AdamW optimizer (fine-tuning the learning rate as well as possible, with 6e-6 yielding the best
results), 3 epochs and a batch size of 3.</p>
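      <p>The fine-tuning setup above can be sketched as a plain PyTorch training loop; the hyperparameters follow the text, but the loop itself is our own skeleton and a tiny stand-in classifier takes the place of bert-large-uncased (which HuggingFace's BertForSequenceClassification would normally provide):</p>
      <p>
```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.utils.data import DataLoader

# Hyperparameters reported in the text; 4 labels = the task's classes.
NUM_LABELS, EPOCHS, BATCH_SIZE, LEARNING_RATE = 4, 3, 3, 6e-6

def fine_tune(model, dataset):
    """One plain fine-tuning loop: AdamW, cross-entropy over the 4 labels."""
    loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)
    optimizer = AdamW(model.parameters(), lr=LEARNING_RATE)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(EPOCHS):
        for features, labels in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(features), labels)
            loss.backward()
            optimizer.step()
    return model
```
      </p>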
      <p>Figure 2 presents the training and validation loss over the epochs; the training set contained 70% of
the data, 20% was for testing and 10% for validation. Appendix B shows a snippet of the BERT classifier.</p>
      <p>2.3.3. RoBERTa</p>
      <p>
        Since RoBERTa [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] proves better than BERT in some scenarios, we were eager to use it and
compare the results. The pre-trained RoBERTa model has been taken from HuggingFace as well; we used
‘roberta-base’6.
      </p>
      <p>The data processing is similar to BERT's. The dataset has been split as follows: 70% of the data for
training, 20% for testing and 10% for validation. The hyperparameters used are a text sequence length of
256 and a batch size of 32. Code samples are available in Appendix C.</p>
      <p>2.3.4. LSTM</p>
      <p>The fourth implemented model is an LSTM. Training and testing have been done on an 80-20 split.
The data processing involves combining the title and text columns and then applying
SnowballStemmer7 from NLTK8 to stem the text. The text has also been tokenized using Keras’s
Tokenizer.</p>
      <p>Feature extraction uses Word2Vec, as it preserves the semantic meaning of words in documents;
the resulting embedding matrix was fed to the model.</p>
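      <p>A sketch of how the learned word vectors can be arranged into the embedding matrix fed to the model (the helper name is ours, and the `vectors` dict stands in for a trained gensim Word2Vec model's keyed vectors):</p>
      <p>
```python
import numpy as np

def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector of the word mapped to index i; row 0 is
    reserved for padding, and out-of-vocabulary words stay all-zero."""
    matrix = np.zeros((len(word_index) + 1, dim))
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None:
            matrix[idx] = vec
    return matrix
```
      </p>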
      <p>The model is built with Tensorflow and is a combination of the following layers:
• Embedding layer;
• Dropout layer with a dropout rate of 0.3;
• LSTM layer with 100 units with a recurrent dropout (fraction of the units to drop for the linear transformation
of the recurrent state) of 0.2 and a dropout of 0.2 (fraction of the units to drop for the linear transformation
of the inputs);
• Dense layer with 4 units (because we predict 4 labels) and using SoftMax activation function.</p>
      <p>The loss function used was sparse categorical cross-entropy with the Adam optimizer. The total number of
parameters of the model was 2,648,304. The optimum number of epochs found was 8, with a batch size of 16. We used
callback functions such as ReduceLROnPlateau9 to reduce the learning rate if the accuracy does not improve, and
early stopping to halt training if the model does not improve.</p>
      <p>2.3.5. Bi-LSTM</p>
      <p>The fifth and final implemented model is an improvement effort on the previous LSTM network.
The dataset split was: 90% training and 10% validation.</p>
      <p>The title and text columns were merged into a single column, as for all the other models. The newly
formed column was then processed by removing every stop word and lemmatizing it using NLTK.
Finally, the sentences were converted to lowercase and had extra whitespace removed.</p>
      <p>The text was tokenized using the Keras Tokenizer; the generated word index length was 27401. For
extracting the features, we used GloVe (Global Vectors for Word Representation) embeddings with 100
dimensions. GloVe training is performed on aggregated global word-word co-occurrence statistics from a
corpus, and the resulting representations showcase interesting linear substructures of the word vector
space.</p>
      <p>For building the model we used Tensorflow. The model was built using the Bidirectional LSTM
architecture. We experimented with many combinations of layers, but the one that gave the best results
during the validation stage was the following (in order):
• Embedding layer with the input dimension equaling the word index length (27401), output dimension
equaling the number of embedding dimensions (100) and the input length equaling the maximum sentence
length from the training set.
• Bidirectional LSTM layer with 64 units and return sequences set to true.
• Bidirectional LSTM layer with 32 units.
• Dropout layer with dropout rate equaling 0.25 to better handle the overfitting due to the small dataset.
• Dense layer with 4 units (because it predicts 4 labels) and softmax.</p>
      <p>The loss function we used was sparse categorical cross-entropy with the Adam optimizer. The total number of
parameters of the model was 2,866,156. We experimented with many values for the number of epochs and the batch
size, but the best performing setting was 5 epochs and a batch size of 32.</p>
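      <p>The layer stack above can be sketched with tf.keras (a sketch under the reported settings; the pre-trained GloVe weights would be passed to the Embedding layer via its initializer, which is omitted here):</p>
      <p>
```python
import tensorflow as tf

VOCAB_SIZE = 27401 + 1   # word index length reported in the text, plus padding row
EMBEDDING_DIM = 100      # GloVe dimensions used by the authors

def build_bilstm():
    """Bi-LSTM classifier over the 4 task labels, mirroring the listed layers."""
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
        tf.keras.layers.Dropout(0.25),
        tf.keras.layers.Dense(4, activation="softmax"),
    ])
    model.compile(loss="sparse_categorical_crossentropy",
                  optimizer="adam", metrics=["accuracy"])
    return model
```
      </p>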
    </sec>
    <sec id="sec-6">
      <title>3. Results</title>
      <p>3.1. 3Layer Model</p>
      <p>In this section we will discuss the results of the 3Layer model as well as parameter tuning on the models.
Throughout Tables 3 to 6 there have been experiments with each of the 3 feature extraction methods (clean text,
POS tagging and semantic tags) as well as a weighted approach of the three; what worked best in the end is the
weighted approach combined with Gradient Boosting, and this combination earned us 6th place with an F1-macro of
0.44.</p>
      <p>[Tables 3-6: hyperparameter settings tested (KNN: p = 2, n_neighbors = 29, leaf_size = 45; Random Forest:
n_estimators = 1000, max_features = 'sqrt', max_depth = 50, min_samples_split = 2, min_samples_leaf = 2;
Gradient Boosting: n_estimators = 200) and the resulting accuracies (0.57, 0.61, 0.47).]</p>
      <p>The accuracy and loss measured for this model are 0.563157 and 1.405469.</p>
      <p>3.6. Results conclusions</p>
      <p>To conclude the results section, we had 5 models; the best approach seems to be the 3Layer
weighted method, which officially has an F1-macro of 0.44. We were unable to calculate the other
scores with the gold labels and the organizers did not provide a ranking. Mostly the results seem to
revolve around a score of 0.5, which is in part related to the small size of the dataset and the fact
that many of our models relied on neural networks, which require large training sets.</p>
    </sec>
    <sec id="sec-7">
      <title>4. Conclusions</title>
      <p>To conclude, in this paper we presented our runs at the CLEF 2021 Task 3a; our best method had an
F1-macro of 0.44, ranking us 6th. We proposed multiple models based on different methods; for future
work we plan on enlarging the dataset as well as creating a system based on inference, so that article
content can be verified using different ontologies.</p>
    </sec>
    <sec id="sec-8">
      <title>5. Acknowledgements</title>
      <p>Special thanks go to: Smau Adrian-Constantin, Mosor Andre, Radu Rares-Aurelian, Gramescu
George-Rares, Filipescu Iustina-Andreea without whom this work would not have been possible. This
work was supported by project REVERT (taRgeted thErapy for adVanced colorEctal canceR paTients),
Grant Agreement number: 848098, H2020-SC1-BHC-2018-2020/H2020-SC1-2019-Two-Stage-RTD.</p>
    </sec>
    <sec id="sec-9">
      <title>6. References</title>
      <p>Appendix A
lexicon.analyze("he hit the other person", normalize=True)
# =&gt; {'help': 0.0, 'office': 0.0, 'violence': 0.2, 'dance': 0.0, 'money': 0.0, 'wedding': 0.0, 'valuable': 0.0,
'domestic_work': 0.0, 'sleep': 0.0, 'medical_emergency': 0.0, 'cold': 0.0, 'hate': 0.0, 'cheerfulness': 0.0,
'aggression': 0.0, 'occupation': 0.0, 'envy': 0.0, 'anticipation': 0.0, 'family': 0.0, 'crime': 0.0, 'attractive': 0.0,
'masculine': 0.0, 'prison': 0.0, 'health': 0.0, 'pride': 0.0, 'dispute': 0.0, 'nervousness': 0.0, 'government': 0.0,
'weakness': 0.0, 'horror': 0.0, 'swearing_terms': 0.0, 'leisure': 0.0, 'suffering': 0.0, 'royalty': 0.0, 'wealthy': 0.0,
'white_collar_job': 0.0, 'tourism': 0.0, 'furniture': 0.0, 'school': 0.0, 'magic': 0.0, 'beach': 0.0, 'journalism': 0.0,
'morning': 0.0, 'banking': 0.0, 'social_media': 0.0, 'exercise': 0.0, 'night': 0.0, 'kill': 0.0, 'art': 0.0, 'play': 0.0,
'computer': 0.0, 'college': 0.0, 'traveling': 0.0, 'stealing': 0.0, 'real_estate': 0.0, 'home': 0.0, 'divine': 0.0, 'sexual':
0.0, 'fear': 0.0, 'monster': 0.0, 'irritability': 0.0, 'superhero': 0.0, 'business': 0.0, 'driving': 0.0, 'pet': 0.0, 'childish':
0.0, 'cooking': 0.0, 'exasperation': 0.0, 'religion': 0.0, 'hipster': 0.0, 'internet': 0.0, 'surprise': 0.0, 'reading': 0.0,
'worship': 0.0, 'leader': 0.0, 'independence': 0.0, 'movement': 0.2, 'body': 0.0, 'noise': 0.0, 'eating': 0.0, 'medieval':
0.0, 'zest': 0.0, 'confusion': 0.0, 'water': 0.0, 'sports': 0.0, 'death': 0.0, 'healing': 0.0, 'legend': 0.0, 'heroic': 0.0,
'celebration': 0.0, 'restaurant': 0.0, 'ridicule': 0.0, 'programming': 0.0, 'dominant_heirarchical': 0.0, 'military': 0.0,
'neglect': 0.0, 'swimming': 0.0, 'exotic': 0.0, 'love': 0.0, 'hiking': 0.0, 'communication': 0.0, 'hearing': 0.0, 'order':
0.0, 'sympathy': 0.0, 'hygiene': 0.0, 'weather': 0.0, 'anonymity': 0.0, 'trust': 0.0, 'ancient': 0.0, 'deception': 0.0,
'fabric': 0.0, 'air_travel': 0.0, 'fight': 0.0, 'dominant_personality': 0.0, 'music': 0.0, 'vehicle': 0.0, 'politeness': 0.0,
'toy': 0.0, 'farming': 0.0, 'meeting': 0.0, 'war': 0.0, 'speaking': 0.0, 'listen': 0.0, 'urban': 0.0, 'shopping': 0.0,
'disgust': 0.0, 'fire': 0.0, 'tool': 0.0, 'phone': 0.0, 'gain': 0.0, 'sound': 0.0, 'injury': 0.0, 'sailing': 0.0, 'rage': 0.0,
'science': 0.0, 'work': 0.0, 'appearance': 0.0, 'optimism': 0.0, 'warmth': 0.0, 'youth': 0.0, 'sadness': 0.0, 'fun': 0.0,
'emotional': 0.0, 'joy': 0.0, 'affection': 0.0, 'fashion': 0.0, 'lust': 0.0, 'shame': 0.0, 'torment': 0.0, 'economics': 0.0,
'anger': 0.0, 'politics': 0.0, 'ship': 0.0, 'clothing': 0.0, 'car': 0.0, 'strength': 0.0, 'technology': 0.0, 'breaking': 0.0,
'shape_and_size': 0.0, 'power': 0.0, 'vacation': 0.0, 'animal': 0.0, 'ugliness': 0.0, 'party': 0.0, 'terrorism': 0.0,
'smell': 0.0, 'blue_collar_job': 0.0, 'poor': 0.0, 'plant': 0.0, 'pain': 0.2, 'beauty': 0.0, 'timidity': 0.0, 'philosophy':
0.0, 'negotiate': 0.0, 'negative_emotion': 0.0, 'cleaning': 0.0, 'messaging': 0.0, 'competing': 0.0, 'law': 0.0, 'friends':
0.0, 'payment': 0.0, 'achievement': 0.0, 'alcohol': 0.0, 'disappointment': 0.0, 'liquid': 0.0, 'feminine': 0.0, 'weapon':
0.0, 'children': 0.0, 'ocean': 0.0, 'giving': 0.0, 'contentment': 0.0, 'writing': 0.0, 'rural': 0.0, 'positive_emotion': 0.0,
'musical': 0.0}</p>
      <p>Appendix B</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] Preslav Nakov, Giovanni Da San Martino, Tamer Elsayed, Alberto Barrón-Cedeño, Rubén Míguez, Shaden Shaar, Firoj Alam, Fatima Haouari, Maram Hasanain, Nikolay Babulkov, Alex Nikolov, Gautam Kishore Shahi, Julia Maria Struss, and Thomas Mandl (
          <year>2021</year>
          ).
          <article-title>The CLEF-2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News</article-title>
          .
          <source>In Advances in Information Retrieval - 43rd European Conference on IR Research, ECIR 2021, Virtual Event, March 28 - April 1, 2021, Proceedings, Part II</source>
          (pp.
          <fpage>639</fpage>
          -
          <lpage>649</lpage>
          ). Springer.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Shahi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dirkson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Majchrzak</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>An exploratory study of covid-19 misinformation on twitter</article-title>
          .
          <source>Online Social Networks and Media</source>
          ,
          <volume>22</volume>
          ,
          <fpage>100104</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.A.</given-names>
            <surname>Ghorbani</surname>
          </string-name>
          , “
          <article-title>An overview of online fake news: Characterization, detection, and discussion</article-title>
          .”
          <source>In Information Processing &amp; Management</source>
          , vol.
          <volume>57</volume>
          (
          <issue>2</issue>
          ),
          <year>2020</year>
          , 102025, ISSN 0306-4573, https://doi.org/10.1016/j.ipm.2019.03.004.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Emilio</given-names>
            <surname>Ferrara</surname>
          </string-name>
          , Onur Varol, Clayton Davis,
          <string-name>
            <given-names>Filippo</given-names>
            <surname>Menczer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Alessandro</given-names>
            <surname>Flammini</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>The rise of social bots</article-title>
          .
          <source>Commun. ACM 59</source>
          ,
          <issue>7</issue>
          (
          <year>July 2016</year>
          ),
          <fpage>96</fpage>
          -
          <lpage>104</lpage>
          . DOI:https://doi.org/10.1145/2818717
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Collins</surname>
          </string-name>
          ,
          <article-title>"#FluxFlow: Visual Analysis of Anomalous Information Spreading on Social Media,"</article-title>
          <source>in IEEE Transactions on Visualization and Computer Graphics</source>
          , vol.
          <volume>20</volume>
          , no.
          <issue>12</issue>
          , pp.
          <fpage>1773</fpage>
          -
          <lpage>1782</lpage>
          , 31 Dec.
          <year>2014</year>
          , doi: 10.1109/TVCG.2014.2346922.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Ghosh</surname>
            , Rumi,
            <given-names>Tawan</given-names>
          </string-name>
          <string-name>
            <surname>Surachawala</surname>
            , and
            <given-names>Kristina</given-names>
          </string-name>
          <string-name>
            <surname>Lerman</surname>
          </string-name>
          .
          <article-title>"Entropy-based classification of'retweeting'activity on twitter</article-title>
          .
          <source>" arXiv preprint arXiv:1106.0346</source>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Alan</given-names>
            <surname>Mislove</surname>
          </string-name>
          , Massimiliano Marcon,
          <string-name>
            <given-names>Krishna P.</given-names>
            <surname>Gummadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Druschel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Bobby</given-names>
            <surname>Bhattacharjee</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Measurement and analysis of online social networks</article-title>
          .
          <source>In Proceedings of the 7th ACM SIGCOMM conference on Internet measurement (IMC '07)</source>
          .
          Association for Computing Machinery
          , New York, NY, USA,
          <fpage>29</fpage>
          -
          <lpage>42</lpage>
          . DOI:https://doi.org/10.1145/1298306.1298311
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Dickerson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kagan</surname>
          </string-name>
          and
          <string-name>
            <given-names>V. S.</given-names>
            <surname>Subrahmanian</surname>
          </string-name>
          ,
          <article-title>"Using sentiment to detect bots on Twitter: Are humans more opinionated than bots?,"</article-title>
          <source>2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM</source>
          <year>2014</year>
          ),
          <year>2014</year>
          , pp.
          <fpage>620</fpage>
          -
          <lpage>627</lpage>
          , doi: 10.1109/ASONAM.2014.6921650.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Carlos</given-names>
            <surname>Castillo</surname>
          </string-name>
          , Marcelo Mendoza, and
          <string-name>
            <given-names>Barbara</given-names>
            <surname>Poblete</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Information credibility on twitter</article-title>
          .
          <source>In Proceedings of the 20th international conference on World wide web (WWW '11)</source>
          .
          Association for Computing Machinery
          , New York, NY, USA,
          <fpage>675</fpage>
          -
          <lpage>684</lpage>
          . DOI:https://doi.org/10.1145/1963405.1963500
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Diakopoulos</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Naaman</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kivran-Swaine</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>Diamonds in the rough: Social media visual analytics for journalistic inquiry</article-title>
          .
          <source>In VAST 10 - IEEE Conference on Visual Analytics Science and Technology</source>
          <year>2010</year>
          , Proceedings, art. no.
          <issue>5652922</issue>
          , pp.
          <fpage>115</fpage>
          -
          <lpage>122</lpage>
          . DOI: 10.1109/VAST.2010.5652922
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Pritika</given-names>
            <surname>Bahad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Preeti</given-names>
            <surname>Saxena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Raj</given-names>
            <surname>Kamal</surname>
          </string-name>
          ,
          <article-title>Fake News Detection using Bi-directional LSTM-Recurrent Neural Network</article-title>
          ,
          <source>Procedia Computer Science</source>
          , Volume
          <volume>165</volume>
          ,
          <year>2019</year>
          , Pages
          <fpage>74</fpage>
          -
          <lpage>82</lpage>
          , ISSN 1877-0509, https://doi.org/10.1016/j.procs.2020.01.072.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C.G.</given-names>
            <surname>Cusmuliuc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.G.</given-names>
            <surname>Coca</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Iftene</surname>
          </string-name>
          , “
          <article-title>Identifying Fake News on Twitter using Naive Bayes, SVM and Random Forest Distributed Algorithms</article-title>
          .”
          <source>In Proceedings of The 13th Edition of the International Conference on Linguistic Resources and Tools for Processing Romanian Language (ConsILR-2018)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>177</fpage>
          -
          <lpage>188</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Jwa</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oh</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kang</surname>
            ,
            <given-names>J. M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Lim</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>exBAKE: automatic fake news detection model based on bidirectional encoder representations from transformers (bert)</article-title>
          .
          <source>Applied Sciences</source>
          ,
          <volume>9</volume>
          (
          <issue>19</issue>
          ),
          <fpage>4062</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>Jacob</given-names>
          </string-name>
          , et al.
          <article-title>"BERT: Pre-training of deep bidirectional transformers for language understanding."</article-title>
          arXiv preprint arXiv:1810.04805 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Fast</surname>
            ,
            <given-names>Ethan</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Binbin</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Michael S.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          .
          <article-title>"Empath: Understanding topic signals in large-scale text."</article-title>
          <source>Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems</source>
          .
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Yinhan</given-names>
          </string-name>
          , et al.
          <article-title>"RoBERTa: A robustly optimized BERT pretraining approach."</article-title>
          arXiv preprint arXiv:1907.11692 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Gautam Kishore</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Julia Maria</given-names>
            <surname>Struß</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Mandl</surname>
          </string-name>
          .
          <article-title>"Overview of the CLEF-2021 CheckThat! Lab Task 3 on Fake News Detection."</article-title>
          <source>Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Da San Martino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Elsayed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Míguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Haouari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hasanain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Babulkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struß</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kutlu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. S.</given-names>
            <surname>Kartal</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News</article-title>
          ,
          <source>in: Proceedings of the 12th International Conference of the CLEF Association: Information Access Evaluation Meets Multilinguality, Multimodality, and Visualization, CLEF 2021</source>
          , Bucharest, Romania (online),
          <year>2021</year>
          .
        </mixed-citation>
        <preformat>
class ROBERTA(torch.nn.Module, Model):
    def __init__(self, text, dropout_rate=0.4):
        super(ROBERTA, self).__init__()  # Model.__init__(text)
        self.text = text
        self.tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
        self.roberta = RobertaModel.from_pretrained("roberta-base", return_dict=False, num_labels=4)
        self.d1 = torch.nn.Dropout(dropout_rate)
        self.l1 = torch.nn.Linear(768, 64)
        self.bn1 = torch.nn.LayerNorm(64)
        self.d2 = torch.nn.Dropout(dropout_rate)
        self.l2 = torch.nn.Linear(64, 4)
        </preformat>
      </ref>
    </ref-list>
  </back>
</article>