=Paper=
{{Paper
|id=Vol-2956/paper34
|storemode=property
|title=Understanding Misinformation: Perspectives on Emerging Issues
|pdfUrl=https://ceur-ws.org/Vol-2956/paper34.pdf
|volume=Vol-2956
|authors=Astrid Krickl
|dblpUrl=https://dblp.org/rec/conf/ruleml/Krickl21
}}
==Understanding Misinformation: Perspectives on Emerging Issues==
Astrid Krickl
Vienna University of Economics and Business, Vienna, Austria
astrid.krickl@wu.ac.at

Abstract. The ever-increasing volume of information disseminated by technological advances leads to new challenges with respect to misinformation and fake news. Misinformation can affect people in their daily lives and the choices they make, be it medical, financial, or in upcoming elections. Technological solutions to identify, verify, and manage misinformation aim to combat its harmful effects. In this research proposal, we explore the use of machine learning algorithms in combination with symbolic approaches to identify misinformation in German news texts, in order to help users spot deceptive articles. Based on linguistic aspects of the text, news stories are classified as misleading or truthful. In particular, we plan to compare our approach to existing symbolic and sub-symbolic approaches for misinformation detection.

Keywords: Misinformation · Fake news · Machine learning · German text

1 Introduction

Misinformation is defined as any incorrect content (text, images, or videos) that can be found in news, articles, posts, or tweets, among others [19, 32]. This content can be intentionally misleading, so that readers are convinced to believe something that is not true (e.g., during the Cold War, the Soviet KGB circulated the myth that AIDS was the result of secret US military research) [19, 31], or misleading without any intention (e.g., users posting photos of swans and dolphins in Venice's canals on social media without realizing the reports were false) [6, 32]. Although the implications of misinformation vary, it is clear that electronic dissemination has hastened its spread [13]. For instance, social networking sites make it simple for people to share information and misinformation with others, but they also facilitate manual source checking [13].
When it comes to tackling misinformation, modern technological solutions include identification [10, 19, 31], verification [10, 32], and governance [13, 32]. Researchers also examine the effects of misinformation from a social [15, 24] and political [2, 7] perspective. Readers are subjected to recurring information floods, such as the infodemic of misinformation about the coronavirus, including unconfirmed claims that drinking alcohol and smoking may protect against the virus, which is linked to a self-reported increase in consumption in China [15]. Furthermore, there is fear of elections being manipulated for the benefit of a small group of individuals or corporations [2, 7].

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Many machine learning approaches for misinformation detection in English, such as rhetorical structure theory (RST) [22], linguistic inquiry and word count (LIWC) [20], and hierarchical attention neural networks (HAN) [16], have been proven effective. Meanwhile, support vector machines (SVM) and convolutional neural networks (CNN) have been used successfully to detect misinformation in the German language [29]. Given the constraints of identifying misinformation solely based on its style, we aim to improve sub-symbolic machine learning approaches by combining them with symbolic approaches. Symbolic techniques, such as knowledge graphs, give logic to a system through their graph structure, which consists of relations between objects. A few studies have demonstrated that combining symbolic and sub-symbolic techniques is effective for detecting misinformation in English [3]. We intend to fill this gap by enhancing effective techniques and offering approaches that aim to uncover difficult-to-identify German misinformation.
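The combination described above can be illustrated with a minimal sketch: a style-based (sub-symbolic) score over the text is backed by a symbolic lookup of claim triples against a set of known-false facts. All names, cue words, thresholds, and triples below are hypothetical toy stand-ins, not part of the proposal; a real system would replace the style score with a trained classifier (e.g. SVM or CNN) and the triple set with a curated knowledge graph.

```python
# Toy sensationalism cues (hypothetical stand-in for learned style features).
STYLE_MARKERS = {"unglaublich", "schockierend", "sensationell"}

# Toy symbolic knowledge: claim triples known to be false
# (hypothetical stand-in for a curated knowledge graph).
KNOWN_FALSE = {("alkohol", "schuetzt_vor", "coronavirus")}

def style_score(tokens):
    """Sub-symbolic proxy: fraction of tokens that are sensationalist cues."""
    hits = sum(1 for t in tokens if t.lower() in STYLE_MARKERS)
    return hits / max(len(tokens), 1)

def symbolic_flag(claim_triple):
    """Symbolic check: does the claim match a triple known to be false?"""
    return claim_triple in KNOWN_FALSE

def classify(tokens, claim_triple, threshold=0.1):
    """Label a text 'misleading' if either signal fires, else 'truthful'."""
    if symbolic_flag(claim_triple) or style_score(tokens) > threshold:
        return "misleading"
    return "truthful"
```

The point of the sketch is the division of labor: the symbolic lookup catches known false claims regardless of writing style, while the style score catches stylistically suspicious text about claims the knowledge base has never seen.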
Our work is guided by the following overarching research question: What are the difficulties in identifying German misinformation using symbolic and sub-symbolic approaches, and how can detection techniques be evaluated? This overarching research question can be subdivided into the following three sub-questions: In what ways could linguistically based machine learning algorithms and logic be utilized to detect German misinformation in mainstream news articles? How can German misinformation datasets be enhanced in order to further optimize machine learning algorithms? Which attributes are necessary to benchmark existing misinformation detection techniques?

These research questions translate into the following concrete tasks: the development of a misinformation identification algorithm, training it on two German datasets, and lastly comparing it to existing symbolic and sub-symbolic identification techniques.

The remainder of the research proposal is structured as follows: in Section 2 we provide an overview of the state of the art in the area of misinformation. In Section 3, we present the anticipated methodology for this study. In Section 4 we discuss the resulting tasks of the methodology. Finally, we summarize our findings in the conclusion in Section 5.

2 State of the art

While much information is shared via traditional media, social network sites simplify the sharing of news significantly, especially for non-journalists [24]. With the vast amount of news comes a certain percentage of incorrect information, which is referred to as misinformation or fake news. Zhou and Zafarani [32] describe concepts related to intentionally misleading misinformation, such as deceptive news, disinformation, cherry-picking, and click-baits. False news, misinformation, and rumors, on the other hand, are not always deceptive, according to the authors. They also mention satire as an example of fake news for entertainment. Islam et al.
[10] provide a differentiation of misinformation, distinguishing: (i) disinformation, which is incorrect information spread with the intention to mislead, such as fake news; and (ii) misinformation, which is also false information but spread without intent, under which they place false information, rumors, and spam. The authors also state that receivers of misinformation are mostly unaware of the intentions of an article's authors. In addition, they argue that the spread of false information can harm society, business markets, and health care systems, among others.

Although modern technology facilitates the spread of misinformation, technical approaches for the identification, verification, and governance of misinformation can address these problems and provide solutions. The rapid spread of misinformation, compared to the relatively slow spread of clarifications, poses issues [13]. To address this concern, correction sites¹ in various languages and for different topics try to respond quickly to identified misinformation and correct it. Unfortunately, many people do not use them, as fact checking is a very time-consuming job [29]. In order to raise awareness about false information and fact checking, technical solutions can aid social media providers as well as users, for example through integrated fact checks or warnings when content is suspicious [21]. Platforms such as Twitter and Facebook, for example, attempt to filter misinformation once it has been detected and rank clarifications higher in their search results and recommendation engines. For this purpose Facebook uses machine learning, user feedback and behaviour, and third-party fact-checking organisations [5]. Twitter, on the other hand, uses labels and warnings to provide more context for users on certain topics [21]. Approaches for identification, verification, and governance of misinformation must become clearer and faster in order to counter misinformation.
We need technological support to combat the sheer amount of misinformation. Because misinformation spreads quickly and suspicious content is hard to detect, users are no longer able to cope with the load of false and true information, so the use of continually provided and updated techniques is crucial [6]. Many techniques are available for the identification of misinformation, using different machine learning methods such as RST [22], LIWC [20], and HAN [16]; however, they all have advantages and disadvantages and are sometimes only suitable for specific use cases [3, 32].

Manual and automatic approaches are used in the body of research that focuses primarily on German misinformation. Content analysis to extract topics for subsequent (automated) processing, such as sentiment analysis [14] or statistical analysis [25], is an example of a manual procedure. Readability analysis [27] and content analysis [4, 29] are two automatic ways of dealing with German misinformation. Content analysis is also used to identify misinformation: for example, Englmeier [4] proposes a text analysis approach for recognizing German and Spanish misinformation by combining Named Entity Recognition (NER) and Bag of Words (BoW) to extract semantic markers that lead to the specific meaning of a sentence. This strategy is still under development and no evaluation has been conducted yet, and the author implies that various adjustments and human interventions are required to adapt the technique to different topics. Vogel and Jiang [29] employ two supervised machine learning algorithms, SVM and CNN, to detect German misinformation, an approach that has been shown to be effective. Machine learning approaches applied to the German language in general include content analysis [23, 28], opinion mining [18], and categorization [30].

¹ https://www.politifact.com/; https://www.mimikama.at/; https://www.factcheck.org/; https://correctiv.org/faktencheck/
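The NER-plus-BoW idea can be sketched as follows: extract candidate named entities from a sentence and count cue words ("semantic markers") that shift the meaning of a claim. The capitalization heuristic and the marker list below are invented toy stand-ins, not Englmeier's actual components; in German every noun is capitalized, so a real system would need a proper NER model.

```python
from collections import Counter

# Hypothetical semantic markers hinting at unverified or hidden claims.
MARKERS = {"angeblich", "behauptet", "geheim"}

def extract_features(sentence):
    """Return (candidate entities, marker counts) for one sentence.

    Entities: capitalized tokens after the first position, a crude
    stand-in for a real NER component (it over-selects in German,
    where all nouns are capitalized).
    Markers: bag-of-words counts of the cue words defined above."""
    tokens = sentence.split()
    entities = [t.strip(".,") for t in tokens[1:] if t[:1].isupper()]
    bow = Counter(t.strip(".,").lower() for t in tokens)
    markers = {m: bow[m] for m in MARKERS if bow[m]}
    return entities, markers
```

A downstream classifier could then combine which entities co-occur with which markers, rather than treating the text as an unordered bag of all words.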
Although most studies focus on misinformation in English, many of them with promising results, we aim to apply these techniques to German misinformation and test their effectiveness. Since supervised machine learning algorithms only analyze linguistic and style-based characteristics, we intend to address this limitation by using a combination of symbolic and sub-symbolic approaches. We aim to improve a solely style-based approach with logic and reasoning by incorporating a symbolic system, such as a knowledge graph, since it encodes information about objects and their relationships. This combination was proven effective by Denaux and Gomez-Perez [3] for misinformation in the English language; however, it has not yet been evaluated for German misinformation.

3 Methodology

In order to approach this research, which involves implementations and comparisons of symbolic and sub-symbolic approaches for the identification of misinformation, as well as a theoretical overview of state-of-the-art technologies and strategies, we use the design science methodology [9]. The design science approach is used for research projects in the information systems field; it focuses on the outcome, namely artifacts, as well as on the process of development and evaluation, with the goal of improving the artifact's functional performance. The goal of this research is to create a number of artifacts that will contribute to the technical aspects of misinformation detection. We employ the design science approach of Hevner [9], which specifies three cycles: the relevance cycle, the rigor cycle, and the design cycle. The contextual environment of the research is defined in the relevance cycle. In the rigor cycle, the state of the art is examined to extract scientific foundations, experience, and expertise, in order to add this knowledge to the research project. The central design cycle connects to both cycles and includes the core activity of creating and evaluating the artifacts.
In the relevance cycle of this study, we will identify opportunities and challenges for involved stakeholders, structures, and technologies to determine the context, requirements, and scope in which the anticipated artifact would be successful. We are currently working on the rigor cycle, where we examine existing literature and approaches regarding misinformation, using a combination of a semi-systematic approach and an integrative approach, as proposed by Snyder [26]. At the beginning, we identified state-of-the-art domains, trends, and gaps in the misinformation literature in order to categorize and abstract them into clusters, each with its included topics; since there are connections between topics that enrich our understanding of the clusters, we extracted this knowledge as well. The overarching goal is to provide a holistic overview and abstract the topics into three clusters: social, political, and technical. In the core of the study, the design cycle, we will design and evaluate artifacts that aid in the problem of misinformation detection. Based on the results of the rigor cycle, we intend to fill a gap in the area of misinformation detection by creating a technological contribution that improves on current misinformation identification approaches in the German language, and to evaluate it using a benchmark on the performance values of the algorithms. We will start by exploring misinformation in the German language using news articles from mainstream media outlets. As a result, this study will focus on longer pieces of text in High German; small texts such as micro-blogs and commentary will be considered in future iterations. Given that there are already text analysis approaches for High German [23, 28], we will be able to extract specific language features of German and customize the artifact to apply these features to misinformation.
This method could be extended to various dialects in the future. The design cycle will be separated into three tasks, each of which results in an artifact, as described in the research plan in Section 4.

4 Research plan

To approach this research, we devised the research plan outlined below. The design cycle is divided into three tasks: identification implementation, dataset evaluation, and algorithm benchmark, each of which results in an artifact.

Identification implementation. Since most linguistically based algorithms are trained for the English language, we aim to improve the accuracy of detecting German misinformation in news articles and texts. To train on German-language misinformation, we use supervised classification machine learning algorithms that work with natural language, such as RST [22], LIWC [20], HAN [16], and Long Short-Term Memory (LSTM) [1]. We also include transformer models, which use contextual word embeddings, such as Bidirectional Encoder Representations from Transformers (BERT) [12]. For the rule-based identification of misinformation by its linguistic features, two German datasets (GermanFakeNC², Fake News Dataset German³) are used to train the machine learning algorithms. We extend this sub-symbolic approach by combining it with symbolic techniques, such as knowledge graphs, or by including knowledge from datasets in other languages [8], which adds logic and multi-lingual capability to misinformation detection.

² https://zenodo.org/record/3375714
³ https://www.kaggle.com/astoeckl/fake-news-dataset-german

Dataset evaluation. In order to further optimize the trained algorithms (RST, LIWC, HAN, LSTM, BERT), we annotate articles of German media outlets claim by claim, classifying them as factually false claims or truthful information. Then we evaluate the algorithms' misclassifications by comparing the results of misinformation identification for the same story from different mainstream news outlets.
By comparing the same story across many news sources, we aim to recognize sensationalist news sites, as well as which features are used (or may be used) to present fake news content on certain websites. This knowledge and these rules can then be used to improve existing linguistic-feature machine learning algorithms.

Algorithm benchmark. We compare the algorithm optimized for the identification of German misinformation to other linguistic (LIWC, HAN) and non-linguistic (recurrent neural network, RNN [11]) algorithms. For the evaluation of the combination of symbolic and sub-symbolic approaches, we conduct a benchmark using the aforementioned German datasets. In order to assess the performance of each classification algorithm, we measure the number of true positives, true negatives, false positives (information that is true but classified as misinformation), and false negatives (information that is false but classified as true), compare these values, and interpret the outcome of the benchmark. Because errors in the identification of misinformation can have severe consequences [17], we intend to dig deeper into the output error analysis in order to provide explanations for why a piece of information is found to be incorrect, and thereby build an interpretable system.

5 Conclusion

We presented our anticipated research in this proposal, which focuses on the technical side of misinformation, specifically detecting German false information using a combination of linguistics-based machine learning algorithms and symbolic techniques, such as knowledge graphs. We summarized related work, which encompasses both a general and a technical viewpoint on the issue of misinformation. There we recognized a need for work with machine learning algorithms on automatic misinformation detection in the German language, as many approaches have proven successful for English text but have not been assessed for German.
We described our strategy for identifying German misinformation using a combination of sub-symbolic and symbolic approaches, and we intend to compare it to other algorithms. With this research, we aim to contribute to solving the problem of misinformation identification. The present main focus is a survey of the state of the art, which has been ongoing for several months and which we plan to complete soon. We have already begun collecting data on German misinformation in mainstream media; as a following step, we will implement the identification algorithm and train and test it on the acquired datasets.

References

1. A. Abedalla, A. Al-Sadi, and M. Abdullah. A closer look at fake news detection: A deep learning perspective. In Proceedings of the 2019 3rd International Conference on Advances in Artificial Intelligence, pages 24–28, 2019.
2. H. Allcott and M. Gentzkow. Social media and fake news in the 2016 election. Journal of Economic Perspectives, 31(2):211–36, 2017.
3. R. Denaux and J. M. Gomez-Perez. Linked credibility reviews for explainable misinformation detection. In The Semantic Web – ISWC 2020, pages 147–163, Cham, 2020. Springer International Publishing.
4. K. Englmeier. Named entities and their role in creating context information. Procedia Computer Science, 176:2069–2076, 2020.
5. Facebook for Media. Working to Stop Misinformation and False News. URL https://www.facebook.com/formedia/blog/working-to-stop-misinformation-and-false-news.
6. M. Fernandez and H. Alani. Online misinformation: Challenges and future directions. In Companion Proceedings of The Web Conference 2018, pages 595–602, 2018.
7. A. Guess, J. Nagler, and J. Tucker. Less than you think: Prevalence and predictors of fake news dissemination on Facebook. Science Advances, 5(1):eaau4586, 2019.
8. G. Guibon, L. Ermakova, H. Seffih, A. Firsov, and G. Le Noé-Bienvenu. Multilingual fake news detection with satire.
In CICLing: International Conference on Computational Linguistics and Intelligent Text Processing, 2019.
9. A. R. Hevner. A three cycle view of design science research. Scandinavian Journal of Information Systems, 19(2):4, 2007.
10. M. R. Islam, S. Liu, X. Wang, and G. Xu. Deep learning for misinformation detection on online social networks: a survey and new perspectives. Social Network Analysis and Mining, 10(1):1–20, 2020.
11. S. S. Jadhav and S. D. Thepade. Fake news identification and classification using DSSM and improved recurrent neural network classifier. Applied Artificial Intelligence, 33(12):1058–1068, 2019.
12. H. Jwa, D. Oh, K. Park, J. M. Kang, and H. Lim. exBAKE: Automatic fake news detection model based on Bidirectional Encoder Representations from Transformers (BERT). Applied Sciences, 9(19):4062, 2019.
13. V. Koulolias, G. M. Jonathan, M. Fernandez, and D. Sotirchos. Combating misinformation: An ecosystem in co-creation. OECD Publishing, 2018.
14. E. Kušen and M. Strembeck. Politics, sentiments, and misinformation: An analysis of the Twitter discussion on the 2016 Austrian presidential elections. Online Social Networks and Media, 5:37–50, 2018.
15. T. T. Luk, S. Zhao, X. Weng, J. Y.-H. Wong, Y. S. Wu, S. Y. Ho, T. H. Lam, and M. P. Wang. Exposure to health misinformation about COVID-19 and increased tobacco and alcohol use: A population-based survey in Hong Kong. Tobacco Control, 2020.
16. R. Mishra and V. Setty. SADHAN: Hierarchical attention networks to learn latent aspect embeddings for fake news detection. In Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval, ICTIR '19, pages 197–204. Association for Computing Machinery, 2019.
17. S. Mohseni, E. Ragan, and X. Hu. Open issues in combating fake news: Interpretability as an opportunity. arXiv preprint arXiv:1904.03016, 2019.
18. S. Momtazi. Fine-grained German sentiment analysis on social media. In LREC, pages 1215–1220. Citeseer, 2012.
19. S. B.
Parikh and P. K. Atrey. Media-rich fake news detection: A survey. In 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), pages 436–441. IEEE, 2018.
20. V. Pérez-Rosas, B. Kleinberg, A. Lefevre, and R. Mihalcea. Automatic detection of fake news. arXiv preprint arXiv:1708.07104, 2017.
21. Y. Roth and N. Pickles. Updating our approach to misleading information, May 2020. URL https://blog.twitter.com/en_us/topics/product/2020/updating-our-approach-to-misleading-information.html.
22. V. L. Rubin and T. Lukoianova. Truth and deception at the rhetorical structure level. Journal of the Association for Information Science and Technology, 66(5):905–917, 2015.
23. M. Scharkow. Thematic content analysis using supervised machine learning: An empirical evaluation using German online news. Quality & Quantity, 47(2):761–773, 2013.
24. D. A. Scheufele and N. M. Krause. Science audiences, misinformation, and fake news. Proceedings of the National Academy of Sciences, 116(16):7662–7669, 2019.
25. M. Shahrezaye, M. Meckel, L. Steinacker, and V. Suter. COVID-19's (mis)information ecosystem on Twitter: How partisanship boosts the spread of conspiracy narratives on German-speaking Twitter. CoRR, abs/2009.12905, 2020.
26. H. Snyder. Literature review as a research methodology: An overview and guidelines. Journal of Business Research, 104:333–339, 2019.
27. J. L. Spiegel, B. G. Weiss, I. Stoycheva, M. Canis, and F. Ihler. [Assessment of German-language information on sudden sensorineural hearing loss in the internet]. Laryngo-Rhino-Otologie, 2021.
28. T. Spinde, F. Hamborg, and B. Gipp. An integrated approach to detect media bias in German news articles. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, pages 505–506, 2020.
29. I. Vogel and P. Jiang. Fake news detection with the new German dataset "GermanFakeNC". In International Conference on Theory and Practice of Digital Libraries, pages 288–295. Springer, 2019.
30. B.
Waltl, G. Bonczek, E. Scepankova, and F. Matthes. Semantic types of legal norms in German laws: classification and analysis using local linear explanations. Artificial Intelligence and Law, 27(1):43–71, 2019.
31. R. Zellers, A. Holtzman, H. Rashkin, Y. Bisk, A. Farhadi, F. Roesner, and Y. Choi. Defending against neural fake news. arXiv preprint arXiv:1905.12616, 2019.
32. X. Zhou and R. Zafarani. A survey of fake news: Fundamental theories, detection methods, and opportunities. ACM Computing Surveys (CSUR), 53(5):1–40, 2020.