Understanding Misinformation: Perspectives on
              Emerging Issues

                                       Astrid Krickl

            Vienna University of Economics and Business, Vienna, Austria
                              astrid.krickl@wu.ac.at



        Abstract. The ever-increasing volume of information disseminated
        through technological advances leads to new challenges with respect to
        misinformation and fake news. Misinformation can affect people in their
        daily lives and the choices they make, be they medical, financial, or
        electoral. Technological solutions to identify, verify, and manage
        misinformation aim to combat its harmful effects. In this research
        proposal, we explore the use of machine learning algorithms in
        combination with symbolic approaches to identify misinformation in
        German news texts and thereby help users spot deceptive articles. Based
        on linguistic aspects of the text, news stories are classified as
        misleading or truthful. In particular, we plan to compare our approach
        to existing symbolic and sub-symbolic approaches for misinformation
        detection.

        Keywords: Misinformation · Fake news · Machine learning · German text.


1     Introduction


    Misinformation is defined as any incorrect content (text, images, or videos)
that can be found in news, articles, posts, or tweets, among others [19, 32]. This
content can be intentionally misleading, so that readers are convinced to believe
something that is not true (e.g., during the Cold War, the Soviet KGB circu-
lated the myth that AIDS was the result of secret US military research) [19, 31],
or misleading without any intention (e.g., users posting photos of swans and
dolphins in Venice’s canals on social media without realizing the reports were false) [6, 32].
Although the implications of misinformation vary, it is clear that electronic
dissemination has accelerated its spread [13]. For instance, social networking
sites make it simple for people to share information and misinformation with
others, but they also facilitate manual source checking [13].
    When it comes to tackling misinformation technically, modern technological
solutions include identification [10, 19, 31], verification [10, 32], and gov-
ernance [13, 32]. Researchers also examine the effects of misinformation from a
    Copyright © 2021 for this paper by its authors. Use permitted under Creative Com-
    mons License Attribution 4.0 International (CC BY 4.0).
social [15, 24] and political [2, 7] perspective. Readers are subjected to recurring
floods of information, such as the infodemic of misinformation surrounding the
coronavirus: unconfirmed claims that drinking alcohol and smoking may protect
against the virus, for instance, have been linked to a self-reported increase in
consumption in China [15]. Furthermore, there is fear of elections being
manipulated for the benefit of a small group of individuals or corporations [2, 7].
    Many machine learning approaches for detecting misinformation in English, such as
rhetorical structure theory (RST) [22], linguistic inquiry and word count (LIWC)
[20], and hierarchical attention neural networks (HAN) [16], have proven
effective. Meanwhile, support vector machines (SVM) and convolutional neural
networks (CNN) have been used successfully to detect misinformation in German
[29]. We aim to improve sub-symbolic machine learning approaches
by combining symbolic approaches, given the constraints of identifying misinfor-
mation solely based on its style. Symbolic techniques, such as knowledge graphs,
give logic to a system through their graph structure, which consists of relations
between objects. A few studies have demonstrated that combining symbolic and
sub-symbolic techniques is effective for detecting misinformation in English [3].
We intend to fill this gap by enhancing effective techniques and offering ap-
proaches that aim to uncover difficult-to-identify German misinformation. Our
work is guided by the following overarching research question:
     What are the difficulties in identifying German misinformation using
     symbolic and sub-symbolic approaches, and how can detection techniques
     be evaluated?
    This overarching research question can be subdivided into the following three
sub-questions: In what ways could linguistically based machine learning algo-
rithms and logic be utilized to detect German misinformation in news articles in
the mainstream media? How can German misinformation datasets be enhanced
in order to further optimize machine learning algorithms? Which attributes are
necessary to benchmark existing misinformation techniques?
    The aforementioned research questions translate into the following concrete
tasks: the development of a misinformation identification algorithm, training
it on two German datasets, and lastly comparing it to existing symbolic and
sub-symbolic identification techniques.
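As a concrete illustration of the style-based classification these tasks build on, the sketch below trains a tiny multinomial Naive Bayes classifier over bag-of-words features. It is a minimal stand-in rather than the planned algorithm: the whitespace tokenizer, the example German phrases, and the labels are all invented for illustration, and the actual work would train RST-, LIWC-, or HAN-based models on the two German datasets.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would use a
    # German-aware tokenizer and lemmatizer.
    return text.lower().split()

class NaiveBayesClassifier:
    """Multinomial Naive Bayes over bag-of-words features."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            for token in tokenize(text):
                self.word_counts[label][token] += 1
                self.vocab.add(token)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.label_counts:
            # Log prior plus Laplace-smoothed log likelihoods.
            score = math.log(self.label_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for token in tokenize(text):
                score += math.log((self.word_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Trained on two invented one-line examples, `NaiveBayesClassifier().fit(...)` would label a new phrase by whichever class gives it the higher smoothed log probability.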
    The remainder of this research proposal is structured as follows: in Section 2
we provide an overview of the state of the art in the area of misinformation.
Then, in Section 3, we present the anticipated methodology we will apply
for this study. In Section 4 we discuss the resulting tasks of the methodology.
Finally, we summarize our findings in a conclusion in Section 5.


2    State of the art
While much information is shared via traditional media, social network sites sim-
plify the sharing of news significantly, especially for non-journalists [24]. With the
vast amount of news comes a certain percentage of incorrect information, which
is referred to as misinformation or fake news. Zhou and Zafarani [32] describe
concepts related to intentionally misleading misinformation, such as deceptive
news, disinformation, cherry picking, and click-baits. False news, misinforma-
tion, and rumors, on the other hand, are not always deceptive, according to the
authors. They also mention satire as an example of fake news for entertainment.
Islam et al. [10] provide a differentiation of misinformation, including: (i) disin-
formation, which is incorrect information spread with the intention of misleading,
such as fake news; and (ii) misinformation, which is also false information but
spread without such intention, under which they group false information, rumors,
and spam. The authors also state that receivers of misinformation are mostly un-
aware of the intention of an article’s authors. In addition, they argue that the
spread of false information can harm society, business markets, and health care
systems, among others.
    Although modern technology facilitates the spread of misinformation, techni-
cal approaches for identification, verification and governance of misinformation
can address problems and provide solutions. The rapid spread of misinformation
in comparison to the relatively slow spread of clarifications poses issues [13].
To address this concern, correction sites1 in various languages and for different
topics try to respond quickly to identified misinformation and correct it. Unfor-
tunately, many people do not use them, as fact checking is a very time-consuming
job [29]. To raise awareness about false information and fact checking, technical
solutions can aid social media providers as well as users, for example through
integrated fact checks or warnings when content is suspicious [21]. Platforms such
as Twitter and Facebook attempt to filter misinformation once it has been
detected and to rank clarifications higher in their search results and rec-
ommendation engines. For this purpose, Facebook uses machine learning, user
feedback and behaviour, and third-party fact checking organisations [5]. Twitter,
on the other hand, uses labels and warnings to provide users with more context
on certain topics [21]. Approaches for the identification, verification, and
governance of misinformation must become clearer and faster in order to counter misinformation.
We need technological support to combat the volume of misinformation. Given its
fast spread and hard-to-detect suspicious content, users can no longer cope
with the load of false and true information on their own, so the use of continually
provided and updated techniques is crucial [6]. Many techniques are available for
the identification of misinformation, using different machine learning techniques
such as RST [22], LIWC [20], and HAN [16]; however, each has advantages
and disadvantages, and some are only suitable for specific use cases [3, 32].
    Manual and automatic approaches are used in the body of research that
focuses primarily on German misinformation. Manual procedures include content
analysis to extract topics for subsequent (automated) processing, such as
sentiment analysis [14] or statistical analysis [25]. Readability analysis [27]
and content analysis [4, 29] are two automatic ways of dealing with German
misinformation. Content analysis is also used to identify misinformation;
for example, Englmeier [4] proposes a text analysis approach for recognizing
1
    https://www.politifact.com/; https://www.mimikama.at/;
    https://www.factcheck.org/; https://correctiv.org/faktencheck/
German and Spanish misinformation by combining Named Entity Recognition
(NER) and Bag of Words (BoW) to extract semantic markers that lead to the
specific meaning of a sentence. This strategy is still under development and has
not yet been evaluated; the author indicates that various adjustments and
human interventions are required to adapt the technique to different topics. Vo-
gel and Jiang [29] employ two supervised machine learning algorithms, SVM and
CNN, to detect German misinformation, both of which proved effective.
Machine learning approaches applied to the German language in general include
content analysis [23, 28], opinion mining [18], and categorization [30].
    Although most studies focus on misinformation in English, many show
promising results, and we aim to apply these techniques to German misinformation
and test their effectiveness. Since supervised machine learning algorithms
analyze only linguistic and style-based characteristics, we intend to address this
limitation by using a combination of symbolic and sub-symbolic approaches. We
aim to augment a purely style-based approach with logic and reasoning by
incorporating a symbolic system, such as a knowledge graph, which encodes
information about objects and their relationships. Denaux and Gomez-Perez [3]
showed this combination to be effective for misinformation in English; however,
it has not yet been evaluated for German misinformation.
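To make the symbolic side concrete, the toy sketch below treats a knowledge graph as a set of (subject, predicate, object) triples and checks an extracted claim against it. The entities, the German relation names, and the assumption that each predicate has a single correct object per subject are illustrative simplifications of our own, not the approach of Denaux and Gomez-Perez [3].

```python
# Toy knowledge graph as a set of (subject, predicate, object) triples.
# Entities and relations are illustrative placeholders, not drawn from
# any real knowledge base.
knowledge_graph = {
    ("Wien", "hauptstadt_von", "Österreich"),
    ("Donau", "fliesst_durch", "Wien"),
}

def check_claim(triple, kg):
    """Return 'supported', 'contradicted', or 'unknown' for a claim triple.

    A claim is contradicted if the graph holds a different object for the
    same subject and (assumed single-valued) predicate; otherwise it is
    simply unknown to the graph.
    """
    if triple in kg:
        return "supported"
    subj, pred, _ = triple
    if any(s == subj and p == pred for s, p, _ in kg):
        return "contradicted"
    return "unknown"
```

A style-based classifier could then weigh a "contradicted" verdict from this component against its own linguistic evidence, which is one way logic enters an otherwise purely sub-symbolic pipeline.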


3   Methodology

In order to approach this research, which involves implementations and com-
parisons of symbolic and sub-symbolic approaches for the identification of mis-
information, as well as a theoretical overview of state-of-the-art technologies
and strategies, we use the design science methodology [9]. The design science
approach is used for research projects in the information systems field, and it
focuses on the outcome, namely artifacts, as well as the process of development
and evaluation, with the goal of improving the artifact’s functional performance.
The goal of this research is to create a number of artifacts that will contribute
to the technical aspects of misinformation detection.
    We employ the design science approach of Hevner [9], which specifies three
cycles: the relevance cycle, the rigor cycle, and the design cycle. The contextual
environment of the research will be defined in the relevance cycle. For the rigor
cycle, the state of the art will be examined to extract scientific foundations,
experience, and expertise in order to add this knowledge to the research project.
The central design cycle connects to both other cycles and includes the core activity
of creating and evaluating the artifacts.
    In the relevance cycle, we will identify opportunities and challenges for the
stakeholders, structures, and technologies involved, in order to determine the
context, requirements, and scope in which the anticipated artifact can succeed.
We are currently working on the rigor cycle, where we examine existing
literature and approaches regarding misinformation, using a combination of a
semi-systematic approach and an integrative approach, as proposed by Snyder
[26]. At the beginning, we identified state-of-the-art domains, trends, and gaps
in the misinformation literature in order to categorize and abstract them into
clusters, each with its own topics; since connections between topics enrich our
understanding of the clusters, we extracted this knowledge as well. The
overarching goal is to provide a holistic overview and abstract topics
in three clusters: social, political, and technical. In the core of the study, the de-
sign cycle, we will design and evaluate artifacts. The goal of the design cycle is
to produce artifacts that aid in the problem of misinformation detection. Based
on the results of the rigor cycle, we intend to fill a gap in the area of misinforma-
tion detection by creating a technological contribution that improves on current
misinformation identification approaches in the German language, and to evaluate
it by benchmarking the algorithms’ performance.
    We will start by exploring misinformation in the German language using news
articles from mainstream media outlets. As a result, this study will focus on longer
pieces of text in High German; short texts such as micro-blogs and commentary
will be considered in future iterations. Given that there are already text analysis
approaches for High German [23, 28], we will be able to extract specific language
features utilized in the German language and customize the artifact to apply
these features on misinformation. This method could be extended to various
dialects in the future. This cycle will be separated into three tasks, each of
which results in an artifact, as described in the research plan in Section 4.


4     Research plan
To approach this research, we devised the research plan outlined below. The
design cycle is divided into three tasks: identification implementation, dataset
evaluation, and algorithm benchmark, each of which results in an artifact.

Identification implementation. Since most linguistically based algorithms are
   trained for the English language, we aim to improve the accuracy of detect-
   ing German misinformation in news articles and texts. To train on German
   language misinformation, we use supervised classification machine learn-
   ing algorithms that work with natural language, such as RST [22], LIWC
   [20], HAN [16], and Long Short-Term Memory (LSTM) [1]. We also in-
   clude transformer models, which use contextual word embeddings, such as
   Bidirectional Encoder Representations from Transformers (BERT) [12]. For
   the rule-based identification of misinformation by its linguistic features, two
   German datasets (GermanFakeNC2 , Fake News Dataset German3 ) are used
   to train the machine learning algorithms. We extend this sub-symbolic ap-
   proach by combining it with symbolic techniques, such as knowledge graphs,
   or including knowledge from datasets in other languages [8], which adds logic
   and multi-lingual capability to misinformation detection.
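A rough sketch of the kind of LIWC-style linguistic feature extraction such a rule-based component might rely on is shown below. The German word categories are hypothetical stand-ins invented for this example; real LIWC dictionaries are far larger and licensed separately.

```python
import re

# Hypothetical LIWC-style categories for German; these tiny word lists
# are illustrative assumptions, not real LIWC dictionaries.
CATEGORIES = {
    "negation": {"nicht", "kein", "keine", "niemals"},
    "certainty": {"immer", "definitiv", "zweifellos"},
    "emotion": {"schockierend", "unglaublich", "skandal"},
}

def linguistic_features(text):
    """Count category hits and simple style cues in a text."""
    tokens = re.findall(r"\w+", text.lower())
    features = {name: sum(t in words for t in tokens)
                for name, words in CATEGORIES.items()}
    # Surface style cues often used in deception detection.
    features["exclamations"] = text.count("!")
    features["avg_word_len"] = (sum(len(t) for t in tokens) / len(tokens)
                                if tokens else 0.0)
    return features
```

The resulting feature dictionary would be fed to a downstream classifier such as the SVM or CNN models of Vogel and Jiang [29].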
Dataset evaluation. In order to further optimize the trained algorithms (RST,
   LIWC, HAN, LSTM, BERT), we annotate articles of German media outlets
2
    https://zenodo.org/record/3375714
3
    https://www.kaggle.com/astoeckl/fake-news-dataset-german
   claim by claim, classifying them as factually false claims or truthful infor-
   mation. Then we evaluate the algorithms’ misclassifications by comparing
   the results of misinformation identification of the same story from different
   mainstream news outlets. By comparing the same story across many news
   sources, we aim to recognize sensational news sites, as well as which features
   are used (or may be used) to cover fake news content on certain websites.
   This knowledge and rules can be further used to improve existing linguistic
   feature machine learning algorithms.
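One way the claim-by-claim annotation and the cross-outlet misclassification comparison could be organized is sketched below; the record fields, labels, and grouping logic are our own illustrative assumptions rather than a fixed design.

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class AnnotatedClaim:
    story_id: str   # identifier linking versions of the same story
    outlet: str     # media outlet that published the claim
    text: str       # the claim itself
    label: str      # human annotation: "false" or "truthful"
    predicted: str  # classifier output for the same claim

def misclassification_by_outlet(claims):
    """Group classifier errors by outlet to spot systematic patterns."""
    errors = defaultdict(list)
    for claim in claims:
        if claim.predicted != claim.label:
            errors[claim.outlet].append(claim)
    return dict(errors)
```

Grouping errors by outlet (and, via `story_id`, across versions of the same story) is what would let us surface sensational sites and the features they share.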
Algorithm benchmark. We compare the algorithm, which is optimized for iden-
   tification of German misinformation to other linguistic (LIWC, HAN) and
   non-linguistic (recurrent neural network RNN [11]) algorithms. For the eval-
   uation of the combination of symbolic and sub-symbolic approaches, we con-
   duct a benchmark using the aforementioned German datasets. In order to
   assess the performance of each classification algorithm, we measure the num-
   ber of true positives, true negatives, false positives (information that is true,
   but classified as misinformation), and false negatives (information that is
   false, but classified as true), compare these values and interpret the outcome
   of the benchmark. We intend to dig deeper into error analysis of the output
   in order to explain why a piece of information is identified as incorrect,
   with the goal of building an interpretable system, since errors in the
   identification of misinformation can have severe consequences [17].
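The benchmark measures described above translate directly into standard metrics, as the small sketch below shows; it simply derives precision, recall, F1, and accuracy from the four confusion counts.

```python
def benchmark_metrics(tp, tn, fp, fn):
    """Derive standard classification metrics from confusion counts.

    fp: truthful items flagged as misinformation;
    fn: misinformation that slipped through as truthful.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}
```

Comparing these values across the symbolic, sub-symbolic, and combined systems is the core of the planned benchmark; recall matters most when false negatives (undetected misinformation) carry the severest consequences.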


5   Conclusion
We presented our anticipated research in this proposal, which focuses on the
technical side of misinformation, specifically detecting German false information
using a combination of linguistic-based machine learning algorithms and sym-
bolic techniques, such as knowledge graphs. We summarized related work,
covering both a general and a technical viewpoint on the issue of misinforma-
tion. In doing so, we recognized a need for work on automatic misinformation
detection in the German language with machine learning algorithms, as many
approaches have proven successful for English text but have not been assessed
for German. We described our strategy for identifying German misinformation
using a combination of sub-symbolic and symbolic approaches, and
we intend to compare it to other algorithms. With this research, we aim to con-
tribute to the problem of misinformation identification. Our present main focus is
a survey of the state of the art, which has been ongoing for several months and
which we plan to complete soon. We have already begun collecting data on German
misinformation in mainstream media, and as a next step we will implement
the identification algorithm and train and test it on the acquired datasets.


References
 1. A. Abedalla, A. Al-Sadi, and M. Abdullah. A closer look at fake news
    detection: A deep learning perspective. In Proceedings of the 2019 3rd In-
    ternational Conference on Advances in Artificial Intelligence, pages 24–28,
    2019.
 2. H. Allcott and M. Gentzkow. Social media and fake news in the 2016 election.
    Journal of economic perspectives, 31(2):211–36, 2017.
 3. R. Denaux and J. M. Gomez-Perez. Linked credibility reviews for explainable
    misinformation detection. In The Semantic Web – ISWC 2020, pages 147–
    163, Cham, 2020. Springer International Publishing.
 4. K. Englmeier. Named entities and their role in creating context information.
    Procedia Computer Science, 176:2069–2076, 2020.
 5. Facebook for Media.              Working to Stop Misinformation and
    False News.            URL https://www.facebook.com/formedia/blog/
    working-to-stop-misinformation-and-false-news.
 6. M. Fernandez and H. Alani. Online misinformation: Challenges and future
    directions. In Companion Proceedings of the The Web Conference 2018,
    pages 595–602, 2018.
 7. A. Guess, J. Nagler, and J. Tucker. Less than you think: Prevalence and
    predictors of fake news dissemination on facebook. Science Advances, 5(1):
    eaau4586, 2019.
 8. G. Guibon, L. Ermakova, H. Seffih, A. Firsov, and G. Le Noé-Bienvenu.
    Multilingual fake news detection with satire. In CICLing: International
    Conference on Computational Linguistics and Intelligent Text Processing,
    2019.
 9. A. R. Hevner. A three cycle view of design science research. Scandinavian
    Journal of Information Systems, 19(2):4, 2007.
10. M. R. Islam, S. Liu, X. Wang, and G. Xu. Deep learning for misinformation
    detection on online social networks: a survey and new perspectives. Social
    Network Analysis and Mining, 10(1):1–20, 2020.
11. S. S. Jadhav and S. D. Thepade. Fake news identification and classifica-
    tion using dssm and improved recurrent neural network classifier. Applied
    Artificial Intelligence, 33(12):1058–1068, 2019.
12. H. Jwa, D. Oh, K. Park, J. M. Kang, and H. Lim. exbake: Automatic fake
    news detection model based on bidirectional encoder representations from
    transformers (bert). Applied Sciences, 9(19):4062, 2019.
13. V. Koulolias, G. M. Jonathan, M. Fernandez, and D. Sotirchos. Combating
    misinformation: An ecosystem in co-creation. OECD Publishing, 2018.
14. E. Kušen and M. Strembeck. Politics, sentiments, and misinformation: An
    analysis of the twitter discussion on the 2016 austrian presidential elections.
    Online Social Networks and Media, 5:37–50, 2018.
15. T. T. Luk, S. Zhao, X. Weng, J. Y.-H. Wong, Y. S. Wu, S. Y. Ho, T. H.
    Lam, and M. P. Wang. Exposure to health misinformation about covid-19
    and increased tobacco and alcohol use: A population-based survey in hong
    kong. Tobacco Control, 2020.
16. R. Mishra and V. Setty. Sadhan: Hierarchical attention networks to learn
    latent aspect embeddings for fake news detection. In Proceedings of the 2019
    ACM SIGIR International Conference on Theory of Information Retrieval,
    ICTIR ’19, page 197–204. Association for Computing Machinery, 2019.
17. S. Mohseni, E. Ragan, and X. Hu. Open issues in combating fake news:
    Interpretability as an opportunity. arXiv preprint arXiv:1904.03016, 2019.
18. S. Momtazi. Fine-grained german sentiment analysis on social media. In
    LREC, pages 1215–1220. Citeseer, 2012.
19. S. B. Parikh and P. K. Atrey. Media-rich fake news detection: A survey. In
    2018 IEEE Conference on Multimedia Information Processing and Retrieval
    (MIPR), pages 436–441. IEEE, 2018.
20. V. Pérez-Rosas, B. Kleinberg, A. Lefevre, and R. Mihalcea. Automatic de-
    tection of fake news. arXiv preprint arXiv:1708.07104, 2017.
21. Y. Roth and N. Pickles. Updating our approach to misleading information,
    may 2020. URL https://blog.twitter.com/en_us/topics/product/
    2020/updating-our-approach-to-misleading-information.html.
22. V. L. Rubin and T. Lukoianova. Truth and deception at the rhetorical struc-
    ture level. Journal of the Association for Information Science and Technol-
    ogy, 66(5):905–917, 2015.
23. M. Scharkow. Thematic content analysis using supervised machine learning:
    An empirical evaluation using german online news. Quality & Quantity, 47
    (2):761–773, 2013.
24. D. A. Scheufele and N. M. Krause. Science audiences, misinformation, and
    fake news. Proceedings of the National Academy of Sciences, 116(16):7662–
    7669, 2019.
25. M. Shahrezaye, M. Meckel, L. Steinacker, and V. Suter. Covid-19’s
    (mis)information ecosystem on twitter: How partisanship boosts the spread
    of conspiracy narratives on german speaking twitter. CoRR, abs/2009.12905,
    2020.
26. H. Snyder. Literature review as a research methodology: An overview and
    guidelines. Journal of Business Research, 104:333–339, 2019.
27. J. L. Spiegel, B. G. Weiss, I. Stoycheva, M. Canis, and F. Ihler. [assessment
    of german-language information on sudden sensorineural hearing loss in the
    internet]. Laryngo- rhino- otologie, 2021.
28. T. Spinde, F. Hamborg, and B. Gipp. An integrated approach to detect
    media bias in german news articles. In Proceedings of the ACM/IEEE Joint
    Conference on Digital Libraries in 2020, pages 505–506, 2020.
29. I. Vogel and P. Jiang. Fake news detection with the new german dataset
    “germanfakenc”. In International Conference on Theory and Practice of
    Digital Libraries, pages 288–295. Springer, 2019.
30. B. Waltl, G. Bonczek, E. Scepankova, and F. Matthes. Semantic types of
    legal norms in german laws: classification and analysis using local linear
    explanations. Artificial Intelligence and Law, 27(1):43–71, 2019.
31. R. Zellers, A. Holtzman, H. Rashkin, Y. Bisk, A. Farhadi, F. Roes-
    ner, and Y. Choi. Defending against neural fake news. arXiv preprint
    arXiv:1905.12616, 2019.
32. X. Zhou and R. Zafarani. A survey of fake news: Fundamental theories,
    detection methods, and opportunities. ACM Computing Surveys (CSUR),
    53(5):1–40, 2020.