Overview of the Track on Author Profiling and Deception Detection in Arabic*

Francisco Rangel¹, Paolo Rosso¹, Anis Charfi², Wajdi Zaghouani³, Bilal Ghanem¹, and Javier Sánchez-Junquera¹

¹ PRHLT Research Center, Universitat Politècnica de València, Spain
² Carnegie Mellon University, Qatar
³ Hamad Bin Khalifa University, Qatar

Abstract. This overview presents the Author Profiling and Deception Detection in Arabic (APDA) shared task at PAN@FIRE 2019. This year's task had two main aims: i) to profile the age, gender and language variety of a Twitter user; ii) to determine whether an Arabic text is deceptive or not in two different genres: Twitter and news headlines. For this purpose we have created three corpora in Arabic. Altogether, the approaches of 13 participants are evaluated.

Keywords: author profiling · deception detection · Arabic · Twitter · FIRE.

1 Introduction

The PAN⁴ lab is a series of scientific events and shared tasks on digital text forensics. This year at FIRE⁵ we have organised the Author Profiling and Deception Detection in Arabic (APDA)⁶ shared task. In this paper, we describe the resources that we have created and made available to the research community⁷, present the obtained results and highlight the main achievements. The Author Profiling and Deception Detection in Arabic shared task consists of two tasks, which we describe next.

1.1 Task 1. Author Profiling in Arabic Tweets

Author profiling distinguishes between classes of authors by studying how language is shared by people. This helps in identifying profiling aspects such as age, gender, and language variety, among others. The focus of this task is to identify the age, gender, and language variety of Arabic Twitter users.

* Copyright 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). FIRE 2019, 12-15 December 2019, Kolkata, India.
⁴ http://pan.webis.de/
⁵ http://fire.irsi.res.in/fire/2019
⁶ https://www.autoritas.net/APDA/
⁷ Following a methodology that complies with the EU General Data Protection Regulation [24].

1.2 Task 2. Deception Detection in Arabic Texts

A message can be considered deceptive when it is intentionally written to sound authentic. The focus of this task is on deception detection in Arabic in two different genres: Twitter and news headlines.

The remainder of this paper is organised as follows. Section 2 covers the state of the art, Section 3 describes the corpora and the evaluation measures, and Section 4 presents the approaches submitted by the participants. Sections 5 and 6 discuss the results and draw conclusions, respectively.

2 Related Work

In this section we briefly review the related work on author profiling (age, gender and language variety identification) and deception detection in Arabic.

2.1 Author Profiling

Research on age and gender identification in Arabic is scarce. The authors of [14] collect 8,028 emails from 1,030 native speakers of Egyptian Arabic. They propose 518 features and test several machine learning algorithms, reporting accuracies of 72.10% and 81.15% for gender and age identification, respectively. The authors of [4] approach gender identification in articles from well-known Arabic newsletters written in Modern Standard Arabic. With a combination of bag-of-words, sentiments and emotions, they report an accuracy of 86.4%. Subsequently, the authors of [3] extend their work by experimenting with different machine learning algorithms, data subsets and feature selection methods, reporting accuracies of up to 94%. The authors of [6] manually annotate tweets in Jordanian dialects with gender information. They show how the name of the author of the tweet can significantly improve the performance.
They also experiment with other stylistic features such as the number of words per tweet or the average word length, achieving a best result of 99.50%.

The increasing interest in Arabic variety identification is supported by the eighteen and six teams that participated, respectively, in the Arabic subtask of the third DSL track [18] and in the Arabic Dialect Identification (ADI) shared task [41], as well as by the twenty teams that participated in the Arabic subtask of the Author Profiling shared task [27] at PAN 2017. However, as the authors of [29] highlight, there is still a lack of resources and research for this language. Some of the few existing works are the following. The authors of [38] use a smoothed word unigram model and report accuracies of 87.2%, 83.3% and 87.9% for the Levantine, Gulf and Egyptian varieties, respectively. The authors of [32] achieve 98% accuracy discriminating among Egyptian, Iraqi, Gulf, Maghreb, Levantine, and Sudanese varieties with n-grams. The authors of [12] combine content- and style-based features to obtain 85.5% accuracy discriminating between Egyptian and Modern Standard Arabic.

2.2 Deception Detection

Although deception detection research in Arabic is still very limited [29], there are some new initiatives focusing on this language. One example is the fact-checking shared task⁸ at CLEF⁹ on the automatic identification and verification of claims in political debates [21]. Nevertheless, that shared task translated the contents from English to Arabic. Since the claims correspond to US politics, they are not representative of the idiosyncrasy of Arabs. In this sense, the CheckThat! shared task¹⁰ on Automatic Identification and Verification of Claims [16] organised at CLEF 2019 includes a subtask only in Arabic. The authors of [2] collect a corpus in Arabic of 600 tweets and 179 news articles.
They automatically annotate the credibility by measuring the cosine similarity between the tweets and the news articles. The authors of [7] criticise the automatic generation of the annotation and instead collect and manually annotate two corpora from Twitter and blogs. Regarding Twitter, they retrieve over 36 million tweets about four topics: i) the forces of the Syrian government; ii) the Syrian revolution; iii) Syrian problems and concerns related to the Syrian revolution; and iv) the election of the Lebanese president. The annotation process is carried out by five annotators. The obtained inter-annotator agreement (Fleiss’ kappa of 0.43) is moderate according to [37]. The authors also propose a method to approach the credibility analysis of Twitter contents. The Credibility Analysis of Arabic Content on Twitter (CAT) [11] relies mainly on features obtained from the user who tweeted the content to be analysed. For example, the authors retrieve the user’s timeline and extract features such as the number of retweets, the user’s activity, or the user’s expertise in the topic being discussed. They compare their approach with several baselines and show a significant improvement. In the framework of the Arabic Author Profiling for Cyber-Security (ARAP)¹¹ project, our LDSE approach [26] (0.797 F-measure) outperforms the CAT method (0.701 F-measure) on the Credibility corpus [25].

3 Evaluation Framework

The purpose of this section is to introduce the technical background. We outline the construction of the corpora and introduce the performance measures.

3.1 Corpora

We have created the following corpora: the ARAP-Tweet corpus for author profiling, and the Qatar Twitter and Qatar News corpora for deception detection. We briefly describe them below.
⁸ http://alt.qcri.org/clef2018-factcheck
⁹ http://clef2018.clef-initiative.eu/
¹⁰ https://sites.google.com/view/clef2019-checkthat/home?authuser=0
¹¹ http://arap.qatar.cmu.edu

ARAP-Tweet. This corpus was developed at Carnegie Mellon University in Qatar [39] with the aim of providing a fine-grained annotated corpus in Arabic. It contains 15 dialectal varieties corresponding to 22 countries of the Arab League. For each variety, a total of 198 authors (150 for training, 48 for test) were annotated with age and gender, maintaining balance for both variables. The following groups were considered for the age annotation: Under 25, Between 25 and 34, and Above 35. For each author, more than 2,000 tweets were retrieved from her/his timeline. The included varieties are: Algeria, Egypt, Iraq, Kuwait, Lebanon-Syria, Libya, Morocco, Oman, Palestine-Jordan, Qatar, Saudi Arabia, Sudan, Tunisia, United Arab Emirates and Yemen. More information about this corpus is available in [40].

The Qatar Twitter corpus. In the context of the ARAP project, we created the Qatar Twitter corpus by retrieving, during 2017, and annotating¹² tweets referring to the Qatar Blockade and the Qatar World Cup. Statistics about this corpus are shown in Table 1. The number of tweets for the Blockade topic is completely balanced between the credible and non-credible classes. For the World Cup topic the corpus is almost balanced, with a slightly smaller number of credible tweets (48% / 52%).

The Qatar News corpus. We also created the Qatar News corpus by retrieving and annotating short contents such as headlines and/or excerpts from well-known Arabic newsletters. Statistics on this second corpus can be seen in Table 1. The number of documents is almost balanced, with a slightly smaller number of credible news items (47% / 53%).

Table 1. Distribution of credible and non-credible tweets per topic in the Qatar Twitter and Qatar News corpora.
Corpus         Topic      Credible  Non-Credible  Total
Qatar Twitter  Blockade        115           115    230
               World Cup       262           281    543
               Total           377           396    773
Qatar News                     889           999  1,888

3.2 Performance Measures

In this section we describe the performance measures used for evaluating the systems in the different tasks.

¹² For both the Qatar Twitter and Qatar News corpora, the annotators were 20 students at the Hamad Bin Khalifa University, representing various Arab countries. The inter-annotator agreement was about 80%.

Author Profiling. Since the data is completely balanced, the performance is evaluated by accuracy, following what has been done in the author profiling tasks at PAN@CLEF. For each subtask (age, gender, language variety), we calculate individual accuracies. Systems are ranked by joint accuracy, that is, the proportion of authors for whom age, gender and language variety are all correctly identified.

Deception Detection. As in this case the data is slightly imbalanced, we measure the performance with the macro-averaged F-measure.

4 Overview of the Submitted Approaches

Nineteen teams participated in the shared task and fifteen of them submitted a notebook paper¹³. We analyse their approaches from three perspectives: preprocessing, features to represent the authors' texts, and classification approaches.

4.1 Preprocessing

The authors of [17, 9, 13, 30, 23, 20, 15] removed stop words commonly defined for Arabic, and one of the teams (Blat) also removed the words in its own list of the most frequent words in the vocabulary. Some teams removed punctuation signs [22, 15], special characters [13, 23, 36], numbers [13, 33, 20], or Twitter-related items such as emojis, user mentions, URLs or hashtags [36, 33, 23]. Tokenisation was applied by the authors of [5]. The authors of [20] lowercased the texts, the authors of [22, 20] treated character flooding, and the authors of [33, 23] removed non-Arabic words. Finally, the authors of [42] applied data augmentation.
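The preprocessing steps listed above can be sketched as follows. This is a minimal illustration rather than any participant's actual pipeline: the function name preprocess, the regular expressions, the Unicode range used to detect Arabic characters, and the tiny stop-word list are all assumptions made for the example.

```python
import re

# Hypothetical, deliberately tiny stop-word list (assumption for illustration);
# the participating teams used fuller lists of common Arabic stop words.
STOP_WORDS = {"في", "من", "على", "عن", "إلى"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")  # URLs
MENTION_RE = re.compile(r"@\w+")                # user mentions
HASHTAG_RE = re.compile(r"#\w+")                # hashtags
# Everything except Arabic letters/diacritics (U+0621-U+0652) and whitespace:
# this drops digits, punctuation, emojis and non-Arabic words in one pass.
NON_ARABIC_RE = re.compile(r"[^\u0621-\u0652\s]")
# Character flooding: the same character repeated three or more times.
FLOOD_RE = re.compile(r"(.)\1{2,}")


def preprocess(tweet):
    """Return the list of cleaned tokens of a tweet (illustrative only)."""
    text = URL_RE.sub(" ", tweet)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = NON_ARABIC_RE.sub(" ", text)
    text = FLOOD_RE.sub(r"\1", text)  # collapse a flooded run to one character
    tokens = text.split()             # simple whitespace tokenisation
    return [t for t in tokens if t not in STOP_WORDS]
```

Removing URLs, mentions and hashtags before the non-Arabic filter keeps hashtag words written in Arabic from leaking into the token list. Whether flooded characters should be collapsed to one or two repetitions, and whether hashtags should be kept as content words, are design choices that the overview does not specify.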

4.2 Features

Most of the systems [9, 10, 5, 13, 22, 36, 33] relied on n-grams, some of them in their simplest representation: bag-of-words [17, 30, 20, 15]. The team MagdalenaYVino combined word n-grams with emoticon n-grams, and the authors of [17] combined bag-of-words with lists of the most discriminant words per class. Some teams approached the task with stylistic features such as the occurrence of emoticons/emojis [17, 15], hashtags [15], tweet length [17], the number of mentions [17, 15], or the use of function words [33]. The authors of [5] combined content-based features (word and character n-grams, stems, lemmas, Parts-of-Speech) with style-based features (URLs, hashtags, mentions, character flooding, the average tweet length, the use of punctuation marks). Finally, the authors of [23] used word embeddings, and the authors of [10] trained theirs with FastText.

¹³ Although some of them were rejected due to their low quality.

4.3 Classification Approaches

The most used classifier was Support Vector Machines [17, 9, 10, 5, 13, 30, 22, 23], followed by Multinomial Naive Bayes [33, 20, 15]. The authors of [36] used Logistic Regression, while the team MagdalenaYVino addressed the task with Random Forest. Finally, only two teams approached the task with deep learning: the authors of [42] used BERT pre-trained on Wikipedia and the authors of [35] used LSTM.

5 Evaluation and Discussion of the Results

Although we recommended participating in both tasks, author profiling and deception detection, some participants approached only one problem. In what follows, we present the results of each task separately.

5.1 Author Profiling

Thirteen teams participated in the Author Profiling task, submitting a total of 28 runs. Participants used different kinds of approaches: from classical ones based on n-grams and Support Vector Machines, to novel representations such as BERT.
The best overall result (45.56% joint accuracy) was achieved by DBMS-KU [33] with combinations of word n-grams, character n-grams, and function words to train Support Vector Machines. The best result for gender identification (81.94%) was obtained by MagdalenaYVino, with a combination of word and emoticon 2-grams and 3-grams. For age identification, the best result was achieved by Yutong [36] (62.50%) with a Logistic Regression classifier trained with a combination of word unigrams and character 2- to 5-grams. Finally, with regard to language variety identification, the best result (97.78%) was also achieved by DBMS-KU.

Table 2. Author Profiling: Statistics on the accuracy per task.

Measure              Gender    Age       Variety    Joint
Min                  0.5111    0.2222    0.2444     0.0597
Q1                   0.6496    0.5368    0.8858     0.3104
Median               0.7667    0.5486    0.9354     0.3756
Mean                 0.7181    0.5282    0.8705     0.3425
SDev                 0.1034    0.0917    0.1843     0.1153
Q3                   0.7843    0.5771    0.9694     0.4174
Max                  0.8194    0.6250    0.9778     0.4556
Skewness             -0.9919   -2.2173   -2.4694    -1.4303
Kurtosis             2.4632    7.2901    8.0241     3.9081
Normality (p-value)  3.01e-06  1.75e-08  1.681e-11  1.156e-05

It can be observed in Table 2 and Figures 1 and 2 that the highest results were obtained for language variety identification, with most of the results very close to 100%, although with three outliers: two runs sent by Allaith (0.2444 and 0.3458), who did not send any description of their system, and the LSTM-based approach by Suman [35] (0.5514, see Table 3).

Fig. 1. Distribution of results for the author profiling task.

In these figures we can also observe that the lowest dispersion occurs with age identification, where most of the systems obtained very similar results. In this case, there are four outliers: the two systems of Suman (0.2222 and 0.2750) based on LSTM, and the two systems of Allaith (0.4069 and 0.4222). For gender identification, results are more dispersed, but there are no outliers.

Fig. 2.
Density of the results for the author profiling tasks.

Table 3. Author profiling: Overall ranking in terms of accuracy.

Ranking  Team                 Gender  Age     Variety  Joint
      1  DBMS-KU.2            0.7944  0.5861  0.9722   0.4556
      2  Nayel.1              0.8153  0.5708  0.9750   0.4486
      3  Nayel.3              0.8014  0.5792  0.9708   0.4486
      4  DBMS-KU.3            0.7833  0.5819  0.9778   0.4444
      5  DBMS-KU.1            0.7778  0.5792  0.9736   0.4347
      6  KCE DAlab.sub1       0.7667  0.5722  0.9583   0.4222
      7  Nayel.2              0.7667  0.5764  0.9597   0.4194
      8  MagdalenaYVino.1     0.8194  0.5653  0.9069   0.4167
      9  KCE DAlab.sub2       0.7458  0.5708  0.9694   0.4125
     10  Chiyuzhang.maj2      0.8167  0.5472  0.9375   0.4097
     11  Chiyuzhang.4         0.8167  0.5472  0.9264   0.4097
     12  Blat.1               0.7875  0.5653  0.8722   0.3986
     13  Chiyuzhang.2         0.7708  0.5472  0.9333   0.3875
     14  Karabasz.1           0.7833  0.5403  0.9083   0.3819
     15  KCE DAlab.sub3       0.7444  0.5028  0.9583   0.3694
     16  Alrifai.1            0.7708  0.5375  0.8903   0.3639
     17  Alrifai.2            0.7681  0.5347  0.8917   0.3611
     18  Kosmajac.1           0.7000  0.5417  0.9542   0.3583
     19  Alrifai.3            0.7667  0.5139  0.8681   0.3431
     20  SSN NLP.1            0.7653  0.5500  0.8083   0.3403
     21  Yutong.2             0.5111  0.6250  0.9694   0.3125
     22  Karabasz.2           0.6111  0.5403  0.9083   0.3042
     23  Yutong.3             0.5111  0.6000  0.9694   0.2944
     24  Yutong.1             0.5111  0.5875  0.9694   0.2917
     25  Allaith.1            0.5806  0.4069  0.3458   0.1208
     26  Suman.LSTM Features  0.6625  0.2222  0.8028   0.1083
     27  Suman.LSTM           0.5764  0.2750  0.5514   0.0722
     28  Allaith.2            0.5806  0.4222  0.2444   0.0597

5.2 Deception Detection

Thirteen teams participated in the Deception Detection task, submitting a total of 25 runs. Participants mostly relied on classical approaches based on n-grams and Support Vector Machines. No novel approaches based on deep learning were used, apart from some word embedding-based representations. The best overall result (0.8003 Macro F-measure) was achieved by Nayel [22] with n-grams weighted with TF/IDF and Support Vector Machines.
The best result on the Qatar News corpus (0.7542 Macro F-measure) was also obtained by Nayel, while the best result on the Qatar Twitter corpus (0.8541 Macro F-measure) was obtained by KCE Dalab [10], who approached the task with a combination of word and character n-grams and FastText embeddings to train a Support Vector Machine.

In Table 5 and Figures 3 and 4 we can observe that the highest results were obtained on the Twitter corpus, with similar dispersion on both genres. It should be highlighted that the distribution of results on the News corpus is more negatively skewed, that is, concentrated towards the higher scores, with the median higher than the mean and most systems close to the best performing ones.

Table 4. Deception detection: Overall ranking in terms of macro F-measure.

Ranking  Team/Run                       News    Twitter  Average
      1  nayel.3                        0.7542  0.8464   0.8003
      2  nayel.1                        0.7417  0.8463   0.7940
      3  KCE Dalab.sub1                 0.7232  0.8541   0.7887
      4  KCE Dalab.sub2                 0.7331  0.8293   0.7812
      5  DBMS-KU.2                      0.7352  0.8125   0.7739
      6  nayel.2                        0.7133  0.8337   0.7735
      7  Allaith.2                      0.7106  0.8289   0.7698
      8  Allaith.1                      0.7274  0.7950   0.7612
      9  SSN NLP.1                      0.7108  0.8087   0.7598
     10  DBMS-KU.1                      0.7188  0.7877   0.7533
     11  DBMS-KU.3                      0.7188  0.7877   0.7533
     12  Actimel.tfidf svm              0.7235  0.7717   0.7476
     13  RickyTonCar.1                  0.6754  0.7748   0.7251
     14  Cabrejas.2                     0.6651  0.7699   0.7175
     15  Actimel.tree SVC               0.7043  0.7288   0.7166
     16  Eros.1                         0.6277  0.7924   0.7101
     17  Blat.1                         0.6675  0.7355   0.7015
     18  Cabrejas.1                     0.6566  0.7443   0.7005
     19  Actimel.trigram arab dict SVM  0.6572  0.7383   0.6978
     20  RickyTonCar.2                  0.6912  0.7008   0.6960
     21  Sinuhe.SVM                     0.6261  0.7627   0.6944
     22  Eros.2                         0.6277  0.7339   0.6808
     23  KCE Dalab.sub3                 0.6613  0.6791   0.6702
     24  Sinuhe.kNN                     0.5640  0.6716   0.6178
     25  Bravo.1                        0.5827  0.6477   0.6152

Table 5. Deception detection: Statistics on the F-measure per task.
Measure              News     Twitter  Average
Min                  0.5640   0.6477   0.6152
Q1                   0.6572   0.7355   0.6978
Median               0.7043   0.7748   0.7251
Mean                 0.6847   0.7713   0.7280
SDev                 0.0502   0.0572   0.0505
Q3                   0.7232   0.8125   0.7698
Max                  0.7542   0.8541   0.8003
Skewness             -0.7928  -0.4649  -0.5946
Kurtosis             2.8250   2.4024   2.7434
Normality (p-value)  0.0501   0.5339   0.2214

Fig. 3. Distribution of results for the deception detection task.

Fig. 4. Density of the results for the deception detection tasks on the different corpora.

6 Conclusions

In this paper we have presented the results of the Author Profiling and Deception Detection in Arabic (APDA) shared task hosted at FIRE 2019. The task had two main aims: i) to profile the age, gender and language variety of a Twitter user; ii) to determine whether an Arabic text is deceptive or not, in two different genres: Twitter and news headlines.

The participants used different features to address the task, mainly: i) n-grams; ii) stylistic features; and iii) embeddings. With respect to machine learning algorithms, the most used one was Support Vector Machines. Nevertheless, a couple of participants approached the author profiling task with deep learning techniques, using BERT and LSTM respectively. According to the results, traditional approaches obtained better performance than deep learning ones. The best performing team in the author profiling task [33] used combinations of word and character n-grams with function words to train Support Vector Machines, while the best performing team in the deception detection task [22] used n-grams weighted with TF/IDF and Support Vector Machines.

Acknowledgments

This publication was made possible by NPRP 9-175-1-033 from the Qatar National Research Fund (a member of Qatar Foundation). The findings achieved herein are solely the responsibility of the authors.
The work of Paolo Rosso was also partially funded by Generalitat Valenciana under grant PROMETEO/2019/121.

References

1. A. Akbar-Maulana-Siagian, M. Aritsugi. DBMS-KU Approach for Author Profiling and Deception Detection in Arabic. In: Mehta P., Rosso P., Majumder P., Mitra M. (Eds.) Working Notes of the Forum for Information Retrieval Evaluation (FIRE 2019). CEUR Workshop Proceedings. CEUR-WS.org, Kolkata, India, December 12-15, 2019
2. R.M.B. Al-Eidan, H.S. Al-Khalifa, A.S. Al-Salman. Measuring the Credibility of Arabic Text Content in Twitter. In: 2010 Fifth International Conference on Digital Information Management (ICDIM), 2010
3. K. Alsmearat, M. Al-Ayyoub, R. Al-Shalabi. An Extensive Study of the Bag-of-words Approach for Gender Identification of Arabic Articles. In: 11th International Conference on Computer Systems and Applications (AICCSA), pages 601–608. IEEE/ACS, 2014
4. K. Alsmearat, M. Shehab, M. Al-Ayyoub, R. Al-Shalabi, G. Kanaan. Emotion Analysis of Arabic Articles and its Impact on Identifying the Author's Gender. In: 12th International Conference on Computer Systems and Applications (AICCSA), IEEE/ACS, 2015
5. K. Alrifai, G. Rebdawi, N. Ghneim. Arabic Tweeps Traits Prediction AT2P. In: Mehta P., Rosso P., Majumder P., Mitra M. (Eds.) Working Notes of the Forum for Information Retrieval Evaluation (FIRE 2019). CEUR Workshop Proceedings. CEUR-WS.org, Kolkata, India, December 12-15, 2019
6. E. Al Sukhni, Q. Alequr. Investigating the Use of Machine Learning Algorithms in Detecting Gender of the Arabic Tweet Author. In: International Journal of Advanced Computer Science & Applications, 1(7):319–328, 2016
7. A. Al Zaatari, R. El Ballouli, S. Elbassuoni, W. El-Hajj, H.M. Hajj, K.B. Shaban, N. Habash, E. Yahya. Arabic Corpora for Credibility Analysis. In: Language Resources and Evaluation Conference (LREC), 2016
8. L. Cagnina, P. Rosso.
Detecting Deceptive Opinions: Intra and Cross-Domain Classification Using an Efficient Representation. In: International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 25, Suppl. 2, pp. 151–174, World Scientific, 2017
9. J. Cabrejas, J.V. Mart, A. Pajares, V. Sanchis. Deception Detection in Arabic Texts Using N-grams Text Mining. In: Mehta P., Rosso P., Majumder P., Mitra M. (Eds.) Working Notes of the Forum for Information Retrieval Evaluation (FIRE 2019). CEUR Workshop Proceedings. CEUR-WS.org, Kolkata, India, December 12-15, 2019
10. S. Devi, S. Kannimuthu, G. Ravikumar, A. Kumar. KCE DALab-APDAFIRE2019: Author Profiling and Deception Detection in Arabic using Weighted Embedding. In: Mehta P., Rosso P., Majumder P., Mitra M. (Eds.) Working Notes of the Forum for Information Retrieval Evaluation (FIRE 2019). CEUR Workshop Proceedings. CEUR-WS.org, Kolkata, India, December 12-15, 2019
11. R. El Ballouli, W. El-Hajj, A. Ghandour, S. Elbassuoni, H. Hajj, K. Shaban. CAT: Credibility Analysis of Arabic Content on Twitter. In: Proc. of the Third Arabic Natural Language Processing Workshop, 2017
12. H. Elfardy, M.T. Diab. Sentence Level Dialect Identification in Arabic. In: Association for Computational Linguistics (ACL), pp. 456–461 (2013)
13. F. Eros-Blázquez-del-Rio, M. Conde-Rodríguez, J.M. Escalante. Detection of Deceptions in Twitter and News Headlines Written in Arabic. In: Mehta P., Rosso P., Majumder P., Mitra M. (Eds.) Working Notes of the Forum for Information Retrieval Evaluation (FIRE 2019). CEUR Workshop Proceedings. CEUR-WS.org, Kolkata, India, December 12-15, 2019
14. D. Estival, T. Gaustad, B. Hutchinson, S. Bao-Pham, W. Radford. Author Profiling for English and Arabic eMails, 2008
15. F.J. Fernández-Bravo Peñuela. Deception Detection in Arabic Tweets and News. In: Mehta P., Rosso P., Majumder P., Mitra M. (Eds.) Working Notes of the Forum for Information Retrieval Evaluation (FIRE 2019). CEUR Workshop Proceedings.
CEUR-WS.org, Kolkata, India, December 12-15, 2019
16. M. Hasanain, R. Suwaileh, T. Elsayed, A. Barrón-Cedeño, P. Nakov. Overview of the CLEF-2019 CheckThat! Lab on Automatic Identification and Verification of Claims. Task 2: Evidence and Factuality. In: CEUR Workshop Proceedings, CEUR-WS.org, 2019
17. I. Karabasz, P. Cellini, G. Galiana. Predicting Author Characteristics of Arabic Tweets through Author Profiling. In: Mehta P., Rosso P., Majumder P., Mitra M. (Eds.) Working Notes of the Forum for Information Retrieval Evaluation (FIRE 2019). CEUR Workshop Proceedings. CEUR-WS.org, Kolkata, India, December 12-15, 2019
18. S. Malmasi, M. Zampieri, N. Ljubešić, P. Nakov, A. Ali, J. Tiedemann. Discriminating between Similar Languages and Arabic Dialect Identification: A Report on the Third DSL Shared Task. In: Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3), pp. 1–14 (2016)
19. T. Mikolov, I. Sutskever, K. Chen, G. Corrado, J. Dean. Distributed Representations of Words and Phrases and their Compositionality. In: Advances in Neural Information Processing Systems, 2013
20. A. Moreno, R. Navarro, C. Ruiz. UPV at Author Profiling and Deception Detection in Arabic: Task 2. Deception Detection in Arabic Texts. In: Mehta P., Rosso P., Majumder P., Mitra M. (Eds.) Working Notes of the Forum for Information Retrieval Evaluation (FIRE 2019). CEUR Workshop Proceedings. CEUR-WS.org, Kolkata, India, December 12-15, 2019
21. P. Nakov, A. Barrón-Cedeño, T. Elsayed, R. Suwaileh, L. Màrquez, W. Zaghouani, P. Atanasova, S. Kyuchukov, G. Da San Martino. Overview of the CLEF-2018 CheckThat! Lab on Automatic Identification and Verification of Political Claims. In: International Conference of the Cross-Language Evaluation Forum for European Languages, 2018
22. H.A. Nayel. NAYEL@APDA: Machine Learning Approach for Author Profiling and Deception Detection in Arabic Texts.
In: Mehta P., Rosso P., Majumder P., Mitra M. (Eds.) Working Notes of the Forum for Information Retrieval Evaluation (FIRE 2019). CEUR Workshop Proceedings. CEUR-WS.org, Kolkata, India, December 12-15, 2019
23. A. Ranganathan, H. Ananthakrishnan, D. Thenmozhi, C. Aravindan. Arabic Author Profiling and Deception Detection using Traditional Learning Methodologies with Word Embedding. In: Mehta P., Rosso P., Majumder P., Mitra M. (Eds.) Working Notes of the Forum for Information Retrieval Evaluation (FIRE 2019). CEUR Workshop Proceedings. CEUR-WS.org, Kolkata, India, December 12-15, 2019
24. F. Rangel, P. Rosso. On the Implications of the General Data Protection Regulation on the Organisation of Evaluation Tasks. In: Language and Law = Linguagem e Direito, vol. 5 (2), pp. 95–117, 2019
25. F. Rangel, P. Rosso, A. Charfi, W. Zaghouani. Detecting Deceptive Tweets in Arabic for Cyber-Security. In: Proc. of the 17th IEEE Int. Conf. on Intelligence and Security Informatics (ISI), 2019
26. F. Rangel, P. Rosso, M. Franco. A Low Dimensionality Representation for Language Variety Identification. In: Proceedings of the 17th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing16), Springer-Verlag, LNCS(9624), pp. 156-169, 2018
27. F. Rangel, P. Rosso, M. Potthast, B. Stein. Overview of the 5th Author Profiling Task at PAN 2017: Gender and Language Variety Identification in Twitter. Working Notes Papers of the CLEF 2017 Evaluation Labs, Editors: Linda Cappellato and Nicola Ferro and Lorraine Goeuriot and Thomas Mandl, pp. 1613–0073, CLEF and CEUR-WS.org (2017)
28. H. Rheingold. Smart Mobs: the Next Social Revolution. Basic books, 2007
29. P. Rosso, F. Rangel, I. Hernández-Farías, L. Cagnina, W. Zaghouani, A. Charfi. A Survey on Author Profiling, Deception, and Irony Detection for the Arabic Language. Language and Linguistics Compass, vol. 12 (4), pp. e12275, Wiley Online Library (2018a)
30. F.I. Ruedas-Diaz, S.
Martínez-Rodríguez, S. Muñoz-Lorenzo, V. Cristiny-Sá-de-Aráujo, C. Muñoz-Carrasco. Deception Detection in Arabic Texts. In: Mehta P., Rosso P., Majumder P., Mitra M. (Eds.) Working Notes of the Forum for Information Retrieval Evaluation (FIRE 2019). CEUR Workshop Proceedings. CEUR-WS.org, Kolkata, India, December 12-15, 2019
31. C. Russell, B. Miller. Profile of a Terrorist. Studies in Conflict & Terrorism, vol. 1 (1), pp. 17–34, Taylor & Francis, 1977
32. F. Sadat, F. Kazemi, A. Farzindar. Automatic Identification of Arabic Language Varieties and Dialects in Social Media. In: Proceedings of SocialNLP, pp. 22 (2014)
33. M. Siagian, A. H. Akbar, M. Aritsugi. DBMS-KU Approach for Author Profiling and Deception Detection in Arabic. In: Mehta P., Rosso P., Majumder P., Mitra M. (Eds.) Working Notes of the Forum for Information Retrieval Evaluation (FIRE 2019). CEUR Workshop Proceedings. CEUR-WS.org, Kolkata, India, December 12-15, 2019
34. A.B. Soliman, K. Eisa, S.R. El-Beltagy. AraVec: A set of Arabic Word Embedding Models for use in Arabic NLP. In: 3rd International Conference on Arabic Computational Linguistics (ACLing), 2017
35. C. Suman, P. Kumar, S. Saha, P. Bhattacharyya. Gender, Age and Dialect Recognition Using Tweets in a Deep Learning Framework - Notebook for FIRE 2019. In: Mehta P., Rosso P., Majumder P., Mitra M. (Eds.) Working Notes of the Forum for Information Retrieval Evaluation (FIRE 2019). CEUR Workshop Proceedings. CEUR-WS.org, Kolkata, India, December 12-15, 2019
36. Y. Sun, H. Ning, K. Chen, L. Kong, Y. Yang, J. Wang, H. Qi. Author Profiling in Arabic Tweets: An Approach based on Multi-Classification with Word and Character Features. In: Mehta P., Rosso P., Majumder P., Mitra M. (Eds.) Working Notes of the Forum for Information Retrieval Evaluation (FIRE 2019). CEUR Workshop Proceedings. CEUR-WS.org, Kolkata, India, December 12-15, 2019
37. A.J. Viera, J.M. Garrett.
Understanding Interobserver Agreement: the Kappa Statistic. Fam med journal, 37(5), pp. 360-363, 2005
38. O.F. Zaidan, C. Callison-Burch. Arabic Dialect Identification. In: Computational Linguistics, vol. 40 (1), pp. 171–202, MIT Press (2014)
39. W. Zaghouani, A. Charfi. ArapTweet: A Large MultiDialect Twitter Corpus for Gender, Age and Language Variety Identification. In: Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC), Miyazaki, Japan, 2018
40. W. Zaghouani, A. Charfi. Guidelines and Annotation Framework for Arabic Author Profiling. In: Proceedings of the 3rd Workshop on Open-Source Arabic Corpora and Processing Tools, 11th International Conference on Language Resources and Evaluation (LREC), Miyazaki, Japan, 2018
41. M. Zampieri, S. Malmasi, N. Ljubešić, P. Nakov, A. Ali, J. Tiedemann, Y. Scherrer, N. Aepli. Findings of the Vardial Evaluation Campaign 2017. In: Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects, pp. 1–15 (2017)
42. C. Zhang, M. Abdul-Mageed. BERT-Based Arabic Social Media Author Profiling. In: Mehta P., Rosso P., Majumder P., Mitra M. (Eds.) Working Notes of the Forum for Information Retrieval Evaluation (FIRE 2019). CEUR Workshop Proceedings. CEUR-WS.org, Kolkata, India, December 12-15, 2019