Fake News Detection in Urdu Language using BERT

Snehaan Bhawal 1, Pradeep Kumar Roy 2
1 Kalinga Institute of Industrial Technology, Odisha, India
2 Indian Institute of Information Technology Surat, Gujarat, India

Abstract
With the growing popularity of social media, the amount of Fake News in circulation has increased, misleading public opinion. A Fake News detection system is therefore necessary to avoid such consequences. Most existing Fake News detection systems work with resource-rich languages such as English and Spanish, but very few can handle low-resource languages like Urdu. The current study focuses on detecting Fake News in the Urdu language using Machine Learning and Deep Learning techniques. The 'UrduFake' data, provided as part of a shared task at FIRE-2021, is used in this research. The experimental outcomes of the various models show that the Transfer Learning models performed better than the Machine Learning models, achieving weighted average F1-scores of 0.87 and 0.61 on the validation and test datasets, respectively.

Keywords
Fake News Detection, Urdu, Deep Learning

1. Introduction
There has been a steady rise in internet traffic throughout the world. Connectivity between people has increased with the popularity of social media [1]. Such platforms have now become the principal source of information for the general public. Due to their unrestricted nature, there is little to no oversight of the articles being posted. Although this promotes freedom of speech, it can also be misused to spread Fake News [2]. Most such platforms do not verify articles and promote them according to popularity, leading to the faster spread of unverified content. Rubin et al. [3] categorized such deceptive news into three broad groups: i) Serious Fabrication, ii) Hoaxes, and iii) Satire. There have been many cases where such Fake News was intentionally spread via social media platforms to mislead the general public [4][5].
This can be used to target people by discrediting them or by creating political unrest, undermining society's stability. Such articles are usually based on polarizing topics [6] and garner massive popularity on social media, which in turn promotes them to a wider audience. There is thus an urgent need to detect and stop such volatile articles at an early stage of circulation, by assessing the credibility of an article and determining whether or not it is trustworthy.

Forum for Information Retrieval Evaluation, December 13-17, 2021, India
snehaan@gmail.com (S. Bhawal); pkroynitp@gmail.com (P. K. Roy)
ORCID: 0000-0002-1072-5326 (S. Bhawal); 0000-0001-5513-2834 (P. K. Roy)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

However, most research on the detection of fake news has been done in resource-rich languages like English and Spanish [7]. Despite Urdu having more than 100 million speakers, it has seen very little development of such detection systems due to the absence of properly labelled data and the scarcity of resources for NLP tasks. The event organizers [8, 9] provided a benchmark dataset for Fake News detection in Urdu [10]. The current study utilizes this data to implement and compare different Machine Learning and Deep Learning models for Fake News Detection in the Urdu language.

The rest of the article is organized as follows: Section 2 discusses related work, while the task description and dataset distribution are explained in Section 3. Section 4 describes the preprocessing steps, followed by the explanation of the proposed methodology in Section 5. The experimental results are discussed in Section 6. Section 7 concludes this research with limitations and future scope.
2. Literature Review
Automating fake news detection has been a challenging task for a long time, particularly for low-resource languages. Researchers have created their own datasets [11][12] due to the lack of sufficient benchmarked data. Zhou and Zafarani [13] introduced four techniques for fake news detection based on (i) knowledge, (ii) content, (iii) propagation, and (iv) source of origin. Rubin et al. [14] developed a content-based model that picks up on satirical cues present in news articles, implementing an SVM-based algorithm with five features: (i) Absurdity, (ii) Humor, (iii) Grammar, (iv) Negative affect, and (v) Punctuation. They tested feature combinations on 360 news articles and were able to detect satirical or potentially misleading news with an F1-score of 0.87. Another study [15] follows the propagation-based approach, exploring the social context during news propagation on social media by examining the relationships between publishers, articles, and users.

Regarding Fake News detection in Urdu, very little research has been conducted. To the best of our knowledge, the dataset [10] provided by the organizers serves as the single properly available dataset for this task. For work on Fake News Detection in the Urdu language, we can refer to the previous iteration of the FIRE shared task. The study reported in [16] topped the leaderboard, using an ensemble of a RoBERTa model and a CNN model with word and character embeddings, respectively.

3. Task and Data description
Nowadays, social networking platforms are one of the primary channels used to spread Fake News. Most existing systems are built with non-Urdu language datasets; hence, news written in Urdu may not be detected by them.
The current study implements and compares different Machine Learning and Deep Learning models for Fake News Detection in the Urdu language for the UrduFake-2021 task (https://www.urdufake2021.cicling.org/home). Table 1 shows the category-wise distribution of the articles present in the training dataset across five different domains, namely Business, Health, Showbiz, Sports and Technology. Table 2 provides the distribution of the Real and Fake classes in the Train, Validation and Test data.

Table 1
Category-wise Article Distribution (Training data)
Category     Real   Fake
Business      150     80
Health        150    130
Showbiz       150    130
Sports        150     80
Technology    150    130
Total         750    550

Table 2
Label Distribution in the given data
Data Set     Real   Fake   Total
Train         600    438    1038
Validation    150    112     262
Test          200    100     300

4. Data Preprocessing
The dataset (https://www.urdufake2021.cicling.org/dataset) provided by the organizers is already processed as discussed by the authors of [10]. Additionally, we removed all numerals, URLs, email IDs and website links. Punctuation marks were replaced with spaces, and extra spaces were removed from each article.

5. Methodology
In this study, three different approaches were used:
i. Conventional Machine Learning models
ii. Neural Network models
iii. Transfer Learning models

5.1. Conventional Machine Learning based models
Under conventional ML-based models, we explored the use of 1-5 gram word TF-IDF features. The features were first extracted and then provided to different Machine Learning classifiers, namely Logistic Regression (LR), Naive Bayes (NB), Random Forest (RF), XGBoost (XGB) and Support Vector Machine (SVM). The detailed results of the classifier models are shown in Table 3 of the Results section.

Figure 1: Framework used to predict Fake News

5.2. Neural Network based models
In the Neural Network based models, we reused the previously extracted 1-5 gram TF-IDF features as the input to a simple Deep Neural Network (DNN).
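The cleaning and feature-extraction pipeline described above can be sketched as follows. This is a minimal illustration using scikit-learn with a toy English corpus standing in for the Urdu articles; the exact regular expressions and vectorizer settings are assumptions, not the authors' actual code:

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def clean_article(text: str) -> str:
    """Remove numerals, URLs and email IDs; replace punctuation with spaces."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs / website links
    text = re.sub(r"\S+@\S+", " ", text)                # email IDs
    text = re.sub(r"\d+", " ", text)                    # numerals
    text = re.sub(r"[^\w\s]", " ", text)                # punctuation -> spaces
    return re.sub(r"\s+", " ", text).strip()            # collapse extra spaces

# 1-5 gram word TF-IDF features fed to a Logistic Regression classifier,
# mirroring the LR setup of Section 5.1 (toy data for brevity).
docs = ["breaking news shocking miracle cure",
        "team wins the championship match",
        "secret plot exposed by anonymous source",
        "official report on the new budget"]
labels = [1, 0, 1, 0]  # 1 = Fake, 0 = Real

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 5)), LogisticRegression())
model.fit([clean_article(d) for d in docs], labels)
pred = model.predict([clean_article("shocking secret miracle exposed")])
```

The same pipeline object can then be swapped to NB, RF, XGB or SVM estimators without changing the feature extraction step.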
The DNN consists of three fully connected layers of 512, 256 and 128 neurons, followed by a single output neuron. A single output neuron was chosen because of the binary nature of the classification problem. The ReLU activation function is used in the hidden layers and the sigmoid activation function at the output layer. Adam and binary cross-entropy were chosen as the optimizer and loss function for all Neural Network based models.

This was followed by a Convolutional Neural Network (CNN) based approach. The CNN model consisted of one Conv1D layer followed by a Global Max Pooling layer and a dropout layer. This was then connected to a feed-forward network with two hidden layers of 128 and 64 neurons, respectively. As the input, we used an embedding layer of 100 dimensions with the input length set to 512, resulting in an input of dimension (512, 100). The convolutional layer had 64 filters with a kernel size of 3.

As the final Neural Network based model, a Bidirectional Long Short-Term Memory (Bi-LSTM) model was chosen. It consists of 256 memory units followed by Global Max Pooling and Batch Normalization layers. An embedding layer of 50 dimensions was taken as the input layer, with the padding length fixed at 512, followed by dense layers of 20 and 10 neurons in the first and second layers, respectively. The output layer was the same as in the other models: a single neuron with a sigmoid activation function.

After successive hyperparameter tuning, we found that the best results for the Neural Network models were achieved by setting the maximum sequence length to 512; further increases led to a decrease in F1-scores and an increase in training time. The learning rate was set to 0.00001, and the optimizer was Adam. The code for the current study can be found in the GitHub repository (https://github.com/Sbhawal/NEWUrduFake-FIRE-2021-CODES.git).
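To make the layer sizes concrete, the forward pass of the DNN described above (512, 256 and 128 hidden units with ReLU, one sigmoid output) can be sketched in plain NumPy with random weights. This illustrates only the shapes and activations, not the trained Keras model; the feature dimension is an assumed stand-in for the TF-IDF vocabulary size:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hidden layers of 512, 256 and 128 neurons, then a single output neuron.
n_features = 1000  # stand-in for the TF-IDF vocabulary size (assumption)
sizes = [n_features, 512, 256, 128, 1]
weights = [rng.normal(0, 0.05, (a, b)) for a, b in zip(sizes, sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]

def forward(x):
    """ReLU on the hidden layers, sigmoid on the single output neuron."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = relu(x @ W + b)
    return sigmoid(x @ weights[-1] + biases[-1])

batch = rng.random((4, n_features))  # 4 dummy TF-IDF feature vectors
probs = forward(batch)               # shape (4, 1), probabilities in (0, 1)
```

In training, the sigmoid output would be thresholded at 0.5 to yield the Real/Fake label, with binary cross-entropy as the loss.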
5.3. Transfer Learning based models
We implemented BERT (Bidirectional Encoder Representations from Transformers) models to exploit transfer learning capabilities. For these models, no further preprocessing was done. A limitation of such BERT-based models is that they cannot accommodate all the tokens in each article, as their maximum sequence length is 512. Still, this issue was accepted, since increasing the sequence length in the Neural Network models had already shown diminishing returns.

Two different variants of BERT models were studied:
i. BERT (multilingual)
ii. MuRIL

The BERT [17] multilingual model was trained on 102 languages with masked language modelling. Here, the pooled output of the pre-trained model was fed to a dropout layer and finally to the output neuron. The last model we used is MuRIL [18] (Multilingual Representations for Indian Languages), a BERT model trained on a large corpus of 17 Indian languages, including Urdu, collected from Wikipedia and the Dakshina dataset [19]. This model is also trained on translated and transliterated data in addition to the monolingual corpus, which gives it an advantage in processing code-mixed text.

6. Results
This section presents the experimental results of all the models mentioned in Section 5. The results were obtained on the validation data, with the models trained on the training samples shown in Table 2, and are presented as precision, recall, and weighted F1-score. A model is considered best if it reports the highest weighted average of precision, recall, and F1-score among all models. The value in bold represents the highest value achieved for a particular dataset.
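The 512-token limit mentioned above means longer articles must be shortened before they reach the encoder. A hypothetical helper illustrating the usual head-truncation strategy (what a BERT tokenizer does with truncation enabled, shown here on a plain word list rather than real WordPiece tokens):

```python
def truncate_for_bert(tokens, max_len=512):
    """Reserve two positions for the [CLS] and [SEP] special tokens and
    keep only the first max_len - 2 content tokens (head truncation)."""
    body = tokens[: max_len - 2]
    return ["[CLS]"] + body + ["[SEP]"]

article = [f"w{i}" for i in range(800)]  # dummy 800-token article
seq = truncate_for_bert(article)
# the sequence is capped at 512; tokens after position 510 are discarded
```

Everything past the first 510 content tokens is thus invisible to the classifier, which is the limitation the paper accepts for its BERT and MuRIL models.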
Table 3
Results of Conventional Machine Learning Models
Model   Class          Precision   Recall   F1-score
LR      Real           0.72        0.93     0.81
        Fake           0.84        0.52     0.64
        Weighted Avg   0.77        0.75     0.74
RF      Real           0.64        0.84     0.73
        Fake           0.64        0.38     0.47
        Weighted Avg   0.64        0.64     0.62
NB      Real           0.69        0.92     0.72
        Fake           0.81        0.46     0.58
        Weighted Avg   0.74        0.72     0.70
XGB     Real           0.70        0.89     0.78
        Fake           0.76        0.49     0.60
        Weighted Avg   0.73        0.72     0.70
SVM     Real           0.58        1.00     0.74
        Fake           1.00        0.04     0.09
        Weighted Avg   0.76        0.59     0.46

Table 4
Results of Neural Network based models
Model     Class          Precision   Recall   F1-score
DNN       Real           0.72        0.97     0.82
          Fake           0.92        0.49     0.64
          Weighted Avg   0.80        0.76     0.75
DNN+Emb   Real           0.73        0.91     0.81
          Fake           0.82        0.55     0.66
          Weighted Avg   0.77        0.76     0.75
CNN       Real           0.74        0.85     0.79
          Fake           0.75        0.61     0.67
          Weighted Avg   0.74        0.74     0.74
Bi-LSTM   Real           0.72        0.83     0.77
          Fake           0.71        0.57     0.63
          Weighted Avg   0.72        0.72     0.71

Observing the outcomes of the conventional ML-based models shown in Table 3, the LR classifier performed best, with weighted precision, recall and F1-score of 0.77, 0.75 and 0.74, respectively. The outcome of the LR model is close to that of the simple Deep Neural Network model, which achieved weighted precision, recall and F1-score of 0.80, 0.76 and 0.75, respectively, as shown in Table 4. In general, however, the performance of the traditional ML models in Table 3 is lower than that of the Neural Network models. These comparative outcomes confirm that Neural Network based models are the better choice for developing an automated Urdu fake news detection system. Finally, we experimented with the Transfer Learning based models, BERT and MuRIL. The outcomes of these models are shown in Table 5.
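The weighted averages in these tables combine the per-class scores in proportion to class support. For example, the weighted F1 of the LR row in Table 3 can be reproduced from its per-class F1-scores and the validation supports of Table 2 (150 Real, 112 Fake):

```python
def weighted_avg(scores, supports):
    """Support-weighted average of per-class scores."""
    total = sum(supports)
    return sum(s * n for s, n in zip(scores, supports)) / total

# Per-class F1 for the LR model (Table 3) and validation supports (Table 2).
f1_real, f1_fake = 0.81, 0.64
support_real, support_fake = 150, 112

w_f1 = weighted_avg([f1_real, f1_fake], [support_real, support_fake])
# round(w_f1, 2) == 0.74, matching the "Weighted Avg" row for LR
```

Because the Real class has more validation samples, the weighted average sits closer to the Real-class score than a plain (macro) mean would.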
Table 5
Results of Transfer Learning based models
Model   Class          Precision   Recall   F1-score
BERT    Real           0.87        0.89     0.88
        Fake           0.84        0.82     0.83
        Weighted Avg   0.86        0.86     0.86
MuRIL   Real           0.86        0.92     0.89
        Fake           0.88        0.80     0.84
        Weighted Avg   0.87        0.87     0.87

The MuRIL model performed best, with weighted precision, recall and F1-score values of 0.87, 0.87 and 0.87, respectively, beating the multilingual BERT model, which achieved a weighted F1-score of 0.86 on the validation data.

7. Conclusion
Fake news on social media platforms is a major issue today. This research proposed a Transfer Learning based framework for Urdu fake news detection. Several traditional ML models and Neural Network based models were evaluated to achieve the best prediction performance. We found that MuRIL, a transfer learning model, outperforms the traditional Machine Learning and other Neural Network based models on the Fake News detection task. The transfer learning based MuRIL model achieved an accuracy of 0.743 and a macro F1-score of 0.610 on the test dataset. The developed model was trained on an Urdu dataset; hence, fake news posted in other languages may not be detected by it. Due to the use of BERT-based models, we limited the sequence length to 512, which could be improved by using an ensemble of DNN and BERT models; this will be explored in future work.

References
[1] P. K. Roy, S. Chahar, Fake profile detection on social networking websites: A comprehensive review, IEEE Transactions on Artificial Intelligence 1 (2020) 271–285. doi:10.1109/TAI.2021.3064901.
[2] K. Shu, A. Sliva, S. Wang, J. Tang, H. Liu, Fake news detection on social media: A data mining perspective, ACM SIGKDD Explorations Newsletter 19 (2017) 22–36.
[3] V. L. Rubin, Y. Chen, N. K. Conroy, Deception detection for news: three types of fakes, Proceedings of the Association for Information Science and Technology 52 (2015) 1–4.
[4] H. Allcott, M.
Gentzkow, Social media and fake news in the 2016 election, Journal of Economic Perspectives 31 (2017) 211–236.
[5] C. Shao, G. L. Ciampaglia, O. Varol, K.-C. Yang, A. Flammini, F. Menczer, The spread of low-credibility content by social bots, Nature Communications 9 (2018) 1–9.
[6] B. Ghanem, P. Rosso, F. Rangel, An emotional analysis of false information in social media and news articles, ACM Transactions on Internet Technology (TOIT) 20 (2020) 1–18.
[7] M. Amjad, G. Sidorov, A. Zhila, Data augmentation using machine translation for fake news detection in the Urdu language, in: Proceedings of the 12th Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France, 2020, pp. 2537–2542. URL: https://aclanthology.org/2020.lrec-1.309.
[8] M. Amjad, G. Sidorov, A. Zhila, A. F. Gelbukh, P. Rosso, Overview of the shared task on fake news detection in Urdu at FIRE 2020, in: FIRE (Working Notes), 2020, pp. 434–446.
[9] M. Amjad, G. Sidorov, A. Zhila, A. Gelbukh, P. Rosso, UrduFake@FIRE2020: Shared track on fake news identification in Urdu, in: Forum for Information Retrieval Evaluation, 2020, pp. 37–40.
[10] M. Amjad, G. Sidorov, A. Zhila, H. Gomez-Adorno, I. Voronkov, A. Gelbukh, Bend the truth: A benchmark dataset for fake news detection in Urdu and its evaluation, Journal of Intelligent & Fuzzy Systems 39 (2020) 2457–2469. doi:10.3233/JIFS-179905.
[11] V. Pérez-Rosas, B. Kleinberg, A. Lefevre, R. Mihalcea, Automatic detection of fake news, arXiv preprint arXiv:1708.07104 (2017).
[12] W. Y. Wang, "Liar, liar pants on fire": A new benchmark dataset for fake news detection, arXiv preprint arXiv:1705.00648 (2017).
[13] X. Zhou, R. Zafarani, A survey of fake news: Fundamental theories, detection methods, and opportunities, ACM Computing Surveys (CSUR) 53 (2020) 1–40.
[14] V. Rubin, N. Conroy, Y. Chen, S. Cornwell, Fake news or truth?
using satirical cues to detect potentially misleading news, in: Proceedings of the Second Workshop on Computational Approaches to Deception Detection, Association for Computational Linguistics, San Diego, California, 2016, pp. 7–17. URL: https://aclanthology.org/W16-0802. doi:10.18653/v1/W16-0802.
[15] K. Shu, S. Wang, H. Liu, Beyond news contents: The role of social context for fake news detection, in: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, 2019, pp. 312–320.
[16] N. Lin, S. Fu, S. Jiang, Fake news detection in the Urdu language using CharCNN-RoBERTa, in: FIRE (Working Notes), 2020.
[17] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[18] S. Khanuja, D. Bansal, S. Mehtani, S. Khosla, A. Dey, B. Gopalan, D. K. Margam, P. Aggarwal, R. T. Nagipogu, S. Dave, S. Gupta, S. C. B. Gali, V. Subramanian, P. Talukdar, MuRIL: Multilingual representations for Indian languages, 2021. arXiv:2103.10730.
[19] B. Roark, L. Wolf-Sonkin, C. Kirov, S. J. Mielke, C. Johny, I. Demirşahin, K. Hall, Processing South Asian languages written in the Latin script: the Dakshina dataset, in: Proceedings of The 12th Language Resources and Evaluation Conference (LREC), 2020, pp. 2413–2423. URL: https://www.aclweb.org/anthology/2020.lrec-1.294.