=Paper=
{{Paper
|id=Vol-2041/paper4
|storemode=property
|title=Towards an Understanding of Fake News
|pdfUrl=https://ceur-ws.org/Vol-2041/paper4.pdf
|volume=Vol-2041
|authors=Ozlem Ozgobek,Jon Atle Gulla
}}
==Towards an Understanding of Fake News==
Özlem Özgöbek, Jon Atle Gulla

Email: {ozlem.ozgobek, jon.atle.gulla}@ntnu.no

Department of Computer Science, NTNU, Trondheim, Norway

'''Abstract.''' Fake news articles are intentionally fabricated to be deceptive and can be proven to be false. Fake news and the spread of misinformation are important concepts that may have serious real-world consequences. The concept has existed for many years. However, technological advances have greatly changed how quickly misinformation diffuses and how people consume and produce news. As a result, detecting fake news quickly and correctly has become a challenge. Today most fact checking is done by professional journalists, but research on the automatic detection of fake news is growing rapidly. Linguistic and machine learning techniques are the most frequently used approaches for automatic detection. In this paper, we analyze these techniques in three main groups: content based methods, user based methods and network based methods. We also give a short introduction to the concepts and present some preliminary research results towards an understanding of fake news.

The term fake news started to appear frequently in the media only recently. Researchers have studied deception, hoaxes, clickbait and the credibility of news articles for some time. The frequent use of the term fake news, however, called for a new definition and more specific work on detecting it.

Even though the term fake news is quite new, some newspapers have always tried to attract readers through exaggerated headlines and articles containing misinformation [7] [1]. In the internet era, every individual has the opportunity to publish and be seen by many others. It is therefore not surprising that the generation of fake news has increased.
One of the main reasons for generating fake news is economic gain, which can come from attracting more clicks or from producing paid fake content [5]. Fake news can also cause sudden changes in the stock market, which the parties who published the fake articles can easily convert into economic gain [2]. Another common reason is to deceive users and/or create political bias among them in order to win more supporters. At the same time, it is important to decide where to draw the line between fake news and the expression of opinions.

Copyright held by the author(s). NOBIDS 2017

There are many reasons why fake news has gained so much attention in the last couple of years. The 2016 US elections in particular played an important role in drawing public attention to fake news. The fact that fake news causes real-world problems [4] [6] is another reason for the increased public reaction.

Social media is an important way of getting news, especially for the younger generation [17]. News on social media has two aspects:
* traditional news shared on social media;
* social media as a source of news, through users who share events happening near them.

Traditional media houses sometimes use the second aspect to generate news articles. Both aspects, intentionally or unintentionally, can cause fake news to spread even further.

According to some studies, people are not very good at distinguishing real news from fake news. [18] suggests that humans can distinguish only 70% of fake news. Another study reports that 75% of people classified fake news as accurate [26]. [23] suggests that people tend to classify news articles they disagree with as fake news. This confusion and bias among readers creates a demand for verified news from trusted sources. Today fake news detection is mostly done manually by professional journalists.
Fact-checking web pages are available in many languages all around the world [3]. At the same time, research on the automatic detection of fake news is increasing rapidly. Collaboration between researchers and journalists is becoming increasingly important for developing automatic fake news detection systems.

In this paper we give a brief overview of the state of the art in the automatic detection of fake news. The paper is structured as follows:
* Section 1 defines fake news and discusses its different aspects.
* Section 2 introduces the diffusion mechanisms of fake news.
* Section 3 discusses commonly used techniques for automatic fake news detection.
* Section 4 summarizes important research findings.
* Section 5 gives the conclusions.

===1 Defining Fake News===

Automatic detection of fake news faces many challenges. One of the most basic is defining fake news and setting the right scope for detecting it. The scope of fake news usually differs between research works on fake news detection.

For example, some researchers see clickbait as implicated in the fast spread of fake news [11]. Clickbait is content specifically designed to attract more attention from readers. Other researchers do not take this aspect into account. Similarly, some researchers classify satirical news as a type of fake news [25], while others do not [29]. There is a similar debate about rumors, conspiracy theories, hyperpartisan news and hoaxes.

A more general term covering fake news and misinformation detection is deception detection. In [24] the authors define three main categories of deceptive news:
* intentionally fabricated news;
* large-scale hoaxes;
* humorous news taken seriously.

On the other hand, [29] adopts a narrower definition of fake news. It includes only news articles that are intentionally and verifiably false [8].
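The narrow definition adopted in [29] works like a labeling rule with two conditions that must both hold. The Python sketch below encodes that rule for dataset annotation. The `Annotation` class and its field names are hypothetical. The values chosen for each example case are illustrative assumptions, not claims about every item in that category.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical annotation fields for one article (illustrative only)."""
    intended_to_deceive: bool  # fabricated on purpose to mislead readers
    verifiably_false: bool     # can be proven false, e.g. by a fact checker


def is_fake_news(a: Annotation) -> bool:
    """Narrow definition: an article counts only if BOTH conditions hold."""
    return a.intended_to_deceive and a.verifiably_false


# Illustrative attribute values for each case (assumptions, not data).
examples = {
    "fabricated story": Annotation(intended_to_deceive=True, verifiably_false=True),
    "satire": Annotation(intended_to_deceive=False, verifiably_false=True),
    "honest reporting error": Annotation(intended_to_deceive=False, verifiably_false=True),
    "unverifiable rumor": Annotation(intended_to_deceive=False, verifiably_false=False),
}
for name, ann in examples.items():
    print(f"{name}: {is_fake_news(ann)}")
```

Under this rule only the fabricated story is labeled fake news. A broader scope, such as the deceptive-news categories of [24], would need a different rule.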
This narrow definition of fake news does not include satire, hoaxes, rumors, conspiracy theories or unintentionally created misinformation. For automatic fake news detection research, keeping the definition as narrow as possible has advantages: it keeps the focus on the core elements and eliminates ambiguity. But we should also keep in mind that there are more complicated challenges, and it is important to:

* Draw the line between fake news and the expression of opinions.
* Distinguish humour, satire, fauxtire and irony: satirical and humorous news articles are sometimes classified as fake news because they do not describe real events and can mislead readers. Detecting humour within a true news story is also challenging, and humour can act as a false cue for automatic fake news detection.
* Consider fake items in a true story: a news article can contain fake elements. These elements do not make the whole story fake, but they can still deceive readers. They are usually harder to detect than a wholly fake article. They also make it hard to decide whether the article is fake, or how much of it is fake.

In this paper we adopt the following definition of fake news: ''fake news articles are intentionally fabricated to be deceptive and can be proven to be false.''

===2 The Diffusion of Fake News===

Fake news can diffuse through different media, but social media clearly has the largest share in its spread. Social media is highly dynamic, and every individual is also a publisher on it. This makes it a suitable medium for the diffusion of any information. The value of social media as a news source cannot be ignored. On social media, however, fake, inflammatory, emotional and one-sided news spreads faster than many serious, real news articles [21]. Understanding the spread mechanisms of fake news helps us prevent it from spreading further, and can even help detect it.
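The paper does not commit to a particular diffusion model. The independent cascade model is a common modelling choice, used here purely as an assumption to make "spread mechanism" concrete. In this model, each directed edge (u, v) carries a probability that u passes an item on to v.

For tiny graphs, the probability that each node eventually receives the item can be computed exactly. The method is to enumerate which edges are "live", using the equivalent live-edge view of the model. The graph and its probabilities below are invented for illustration.

```python
from itertools import product


def activation_probabilities(edges, seeds):
    """Exact probability that each node receives the item under the
    independent cascade model, computed via the equivalent live-edge view.

    edges: dict mapping a directed edge (u, v) to the probability that
           u passes the item on to v (values here are hypothetical).
    seeds: nodes that publish the item at the start.
    Enumerates all 2**len(edges) live/blocked patterns, so it is only
    meant for tiny illustrative graphs.
    """
    edge_list = list(edges)
    nodes = {n for edge in edge_list for n in edge} | set(seeds)
    prob = {n: 0.0 for n in nodes}
    for pattern in product((False, True), repeat=len(edge_list)):
        weight = 1.0                  # probability of this live/blocked pattern
        adj = {n: [] for n in nodes}  # adjacency over live edges only
        for (u, v), live in zip(edge_list, pattern):
            p = edges[(u, v)]
            weight *= p if live else 1.0 - p
            if live:
                adj[u].append(v)
        # Every node reachable from a seed over live edges gets the item.
        reached, stack = set(seeds), list(seeds)
        while stack:
            for v in adj[stack.pop()]:
                if v not in reached:
                    reached.add(v)
                    stack.append(v)
        for n in reached:
            prob[n] += weight
    return prob


# Hypothetical graph: a can reach c directly or via b.
edges = {("a", "b"): 0.5, ("b", "c"): 0.5, ("a", "c"): 0.2}
probs = activation_probabilities(edges, seeds=["a"])
print({n: round(p, 3) for n, p in sorted(probs.items())})
# → {'a': 1.0, 'b': 0.5, 'c': 0.4}
```

Node c can be reached directly or through b. Its probability is therefore 1 − 0.8 × (1 − 0.5 × 0.5) = 0.4, higher than either path alone. Real networks need sampling-based estimates instead of full enumeration.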
Social media is valuable as a quick, first-hand source of news events [32], with increased event coverage [19]. However, social media is very noisy and lacks validation mechanisms, so extracting true news from it is highly challenging. Still, various news web pages commonly use social media as a news source. It has also become a prioritized source of news, especially for the younger generation [17].

In addition, strong personal biases affect how news is perceived. According to [23], people tend to classify news articles that they disagree with as fake news. These biases inevitably affect what people share on social media, and thus the spread of fake news. All of these aspects put social media in a very important position for the diffusion of fake news.

According to [13], fake images presented as news on Twitter spread mainly through retweets (86%), and only a few through original tweets. Social bots are another important factor for understanding the spread mechanisms of fake news. In [28] the authors suggest that bots play an important role in spreading misinformation on social networks. Bots are mostly active in the early phases of spreading. They target influential users who are more likely to spread the false information further.

===3 Commonly Used Detection Techniques===

As attention to fake news increases, the amount of recent research on the topic is also growing fast. In this section we summarize the commonly used techniques for automatic fake news detection.

====3.1 Content based methods====

Content based methods use content cues to detect fake news. A content cue is any content-related signal that can help detect fake news. Since most content cues are textual, we find it useful to classify them as textual and non-textual cues. A more detailed classification of cues for clickbait detection is proposed in [11].
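As a minimal sketch of the textual/non-textual split, the function below turns a news item into a small dictionary of cues.
* Textual cues: the share of upper-case letters in the headline, and its exclamation and question marks. These are in the spirit of the capitalization and punctuation features used in [14].
* Non-textual cues: the number of images and links. These follow the kinds of elements listed in [11].

The `NewsItem` fields and the cue names are hypothetical choices for illustration. They are not features taken from either paper.

```python
from dataclasses import dataclass, field


@dataclass
class NewsItem:
    """Hypothetical container for one online news article."""
    headline: str
    body: str
    image_urls: list = field(default_factory=list)
    link_urls: list = field(default_factory=list)


def extract_cues(item: NewsItem) -> dict:
    """Return a flat dict of illustrative textual and non-textual cues."""
    letters = [c for c in item.headline if c.isalpha()]
    upper_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    textual = {
        "headline_upper_ratio": round(upper_ratio, 3),
        "headline_exclamations": item.headline.count("!"),
        "headline_questions": item.headline.count("?"),
        "body_word_count": len(item.body.split()),
    }
    non_textual = {
        "num_images": len(item.image_urls),
        "num_links": len(item.link_urls),
    }
    # Prefix keys so a downstream classifier can tell the two groups apart.
    cues = {f"text.{k}": v for k, v in textual.items()}
    cues.update({f"meta.{k}": v for k, v in non_textual.items()})
    return cues


item = NewsItem(
    headline="SHOCKING: You Won't Believe This!!",
    body="Officials deny the report.",
    image_urls=["img1.jpg"],
)
print(extract_cues(item))
```

Cue dictionaries like this would then be fed, together with richer linguistic features, to the machine learning classifiers discussed below.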
Content based methods analyze any kind of content available in a news item, such as text, images, video or sound. They combine this analysis with various machine learning techniques.

'''Textual content.''' Much of the existing research looks for textual cues to detect fake news. This is also how professional journalists approach manual fake news detection, since understanding and analyzing the text for fake elements is the most natural approach. Lexical, semantic, syntactic and pragmatic levels of analysis are usually used to detect textual cues. Writing style analysis is one of the most commonly used techniques for detecting misinformation in news [20], [21].

In addition to writing style, some research addresses language use. In [15] the authors suggest a method that analyzes stylistic, complexity and psychological features using NLP and sentiment analysis. [14] uses three kinds of features:
* linguistic (n-gram) features;
* credibility features (capitalization, punctuation, etc.);
* semantic features (DBpedia and embeddings).

Another study on clickbait detection proposes a linguistic approach with different classifiers (e.g. SVM, random forests) and reaches an accuracy of 93% [10]. Clickbait detection is not directly related to fake news, but clickbait can give us a clue about fake news. Of course, this does not mean that every clickbait headline refers to a fake news article.

Some research does not analyze the full text of news articles. In [31] the authors suggest a method that analyzes only the headlines. Fake news detection on social media can be more challenging because the text is noisy and the original source of the news is hard to reach. In [16] the authors propose a method for detecting fake news by exploiting conflicting information on Twitter. They suggest that opposing viewpoints can help detect fake news.

The Fake News Challenge (http://fakenewschallenge.org/) encourages the use of machine learning and other artificial intelligence techniques for fake news detection. The first stage of the challenge was about stance detection in news. In [22] the authors applied several neural network architectures and two novel architectural variations to the stance detection task of the Fake News Challenge.

'''Non-textual content.''' Online news articles usually include more than just text: images, URLs, sound or video files often accompany the news text. The metadata of these elements, web traffic information and image captions can also be classified as non-textual cues, since they are not a direct part of the news article [11]. Analyzing such cues can give interesting results in fake news detection. Image analysis is one of the most common kinds of non-textual content analysis. In [12] the authors proposed an automatic image verification method for online news articles with an accuracy of almost 73%.

====3.2 User based methods====

User based fake news detection methods rely on the idea that user behaviour can give clues about misinformation. Any kind of user interaction analysis belongs to this category. User comments can be treated as textual cues when they are not attributed to a particular user. Analyzing comments in relation to particular users, however, is a user based method. In [30] the authors detect hoax posts on Facebook by analyzing the users who liked them. Some research shows that people are unreliable at identifying fake news [23]. Nevertheless, [27] used crowdsourcing to identify fake news.

====3.3 Network based methods====

Network based fake news detection methods include web traffic analysis and web metadata analysis [11], as well as user network analysis. In [13] the authors model how some tweets go viral on Twitter by analyzing the social network graph of users.
In [9] the authors model a social network as a directed weighted graph and calculate the probability of nodes transmitting information to each other, in order to counter the spread of misinformation. They also discuss the source identification problem on a social network, which concerns identifying the nodes of a network that have been infected by misinformation. Tracking news items back to their original source can also give strong clues about the reliability of the news.

====3.4 Hybrid methods====

Two or more of the methods above can be combined to achieve more accurate fake news detection. One example is [26], where the authors consider three generally agreed upon characteristics of fake news: the text, the response it receives and its source. They propose an architecture called CSI (Capture, Score, Integrate), which works in three steps:
* a recurrent neural network captures a representation of the news article;
* user behaviour over time is used to score users;
* the two outputs are integrated, and the result is used for classification.

===4 Research Findings===

Work on automatic fake news detection is still limited. However, some interesting preliminary results may help shape fake news detection systems in the future:

* The textual characteristics of fake news articles share many similarities with satire news, while fake and real news are substantially different. [15]
* News articles with a hyperpartisan world view were successfully distinguished from more balanced news, but fake news detection via style analysis alone is difficult. [21]
* A linguistic approach with different classifiers (e.g. SVM, random forests) detected clickbait in online news with an accuracy of 93%. [10]
* The diffusion pattern of information can be useful for detecting hoaxes. [30]
* Opposing viewpoints can help detect fake news, e.g. by analyzing conflicting information on Twitter. [16]
* Users are quite biased when it comes to identifying fake news themselves. The perception of what is fake and what is not can differ greatly from person to person, and fake news related keywords are often used to express disagreement. [23]

===5 Conclusions===

In this paper we gave a short overview of the state of the art towards an understanding of fake news. Fake news is an important concept that may have serious real-world consequences. The scope of fake news differs between works, e.g. in whether satire or rumors count as fake news. For every scope, however, automatic detection of misinformation faces challenges.

Understanding the diffusion mechanisms of fake news is an important step towards understanding and preventing the spread of misinformation. The importance of social media in the spread of fake news should not be underestimated. A deeper understanding of the psychology of how people respond to fake news could help develop tools for detecting and preventing misinformation.

The existing methods for automatic fake news detection are mostly based on linguistic and machine learning techniques. In addition, image analysis and crowdsourcing methods have been applied. As the term fake news becomes more popular, research on automatic detection is also increasing rapidly. The manual fact checking done by professional journalists gives researchers an opportunity to understand the nature of misinformation. It also helps them work more efficiently towards the automatic detection of fake news.

===References===

1. Before the internet, irresponsible journalism was blamed for a war and a presidential assassination. https://timeline.com/yellow-journalism-media-history-8a29e4462ac. Accessed: 2017-10-15.

2. Can 'fake news' impact the stock market? https://www.forbes.com/sites/kenrapoza/2017/02/26/can-fake-news-impact-the-stock-market/. Accessed: 2017-10-15.

3. Global fact-checking sites. https://reporterslab.org/fact-checking/. Accessed: 2017-10-15.

4. The real consequences of fake news. https://theconversation.com/the-real-consequences-of-fake-news-81179. Accessed: 2017-10-15.

5. This is how Facebook's fake-news writers make money. https://www.washingtonpost.com/news/the-intersect/wp/2016/11/18/this-is-how-the-internets-fake-news-writers-make-money/. Accessed: 2017-10-15.

6. Washington gunman motivated by fake news 'pizzagate' conspiracy. https://www.theguardian.com/us-news/2016/dec/05/gunman-detained-at-comet-pizza-restaurant-was-self-investigating-fake-news-reports. Accessed: 2017-10-15.

7. Yellow journalism: The fake news of the 19th century. http://publicdomainreview.org/collections/yellow-journalism-the-fake-news-of-the-19th-century/. Accessed: 2017-10-15.

8. H. Allcott and M. Gentzkow. Social media and fake news in the 2016 election. Technical report, National Bureau of Economic Research, 2017.

9. M. Amoruso, D. Anello, V. Auletta, and D. Ferraioli. Contrasting the spread of misinformation in online social networks. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, pages 1323-1331. International Foundation for Autonomous Agents and Multiagent Systems, 2017.

10. A. Chakraborty, B. Paranjape, S. Kakarla, and N. Ganguly. Stop clickbait: Detecting and preventing clickbaits in online news media. In Advances in Social Networks Analysis and Mining (ASONAM), 2016 IEEE/ACM International Conference on, pages 9-16. IEEE, 2016.

11. Y. Chen, N. J. Conroy, and V. L. Rubin. Misleading online content: Recognizing clickbait as false news. In Proceedings of the 2015 ACM on Workshop on Multimodal Deception Detection, pages 15-19. ACM, 2015.

12. S. Elkasrawi, A. Dengel, A. Abdelsamad, and S. S. Bukhari. What you see is what you get? Automatic image verification for online news content. In Document Analysis Systems (DAS), 2016 12th IAPR Workshop on, pages 114-119. IEEE, 2016.

13. A. Gupta, H. Lamba, P. Kumaraguru, and A. Joshi. Faking Sandy: Characterizing and identifying fake images on Twitter during Hurricane Sandy. In Proceedings of the 22nd International Conference on World Wide Web, pages 729-736. ACM, 2013.

14. M. Hardalov, I. Koychev, and P. Nakov. In search of credible news. In International Conference on Artificial Intelligence: Methodology, Systems, and Applications, pages 172-180. Springer, 2016.

15. B. D. Horne and S. Adali. This just in: Fake news packs a lot in title, uses simpler, repetitive content in text body, more similar to satire than real news. arXiv preprint arXiv:1703.09398, 2017.

16. Z. Jin, J. Cao, Y. Zhang, and J. Luo. News verification by exploiting conflicting social viewpoints in microblogs. In AAAI, pages 2972-2978, 2016.

17. R. Marchi. With Facebook, blogs, and fake news, teens reject journalistic objectivity. Journal of Communication Inquiry, 36(3):246-262, 2012.

18. V. Pérez-Rosas, B. Kleinberg, A. Lefevre, and R. Mihalcea. Automatic detection of fake news. arXiv preprint arXiv:1708.07104, 2017.

19. S. Petrovic, M. Osborne, R. McCreadie, C. Macdonald, I. Ounis, and L. Shrimpton. Can Twitter replace newswire for breaking news? In ICWSM, 2013.

20. K. Popat, S. Mukherjee, J. Strötgen, and G. Weikum. Credibility assessment of textual claims on the web. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pages 2173-2178. ACM, 2016.

21. M. Potthast, J. Kiesel, K. Reinartz, J. Bevendorff, and B. Stein. A stylometric inquiry into hyperpartisan and fake news. arXiv preprint arXiv:1702.05638, 2017.

22. N. Rakholia and S. Bhargava. Is it true? Deep learning for stance detection in news.

23. M. H. Ribeiro, P. H. Calais, V. A. Almeida, and W. Meira Jr. "Everything I disagree with is #fakenews": Correlating political polarization and spread of misinformation. arXiv preprint arXiv:1706.05924, 2017.

24. V. L. Rubin, Y. Chen, and N. J. Conroy. Deception detection for news: Three types of fakes. Proceedings of the Association for Information Science and Technology, 52(1):1-4, 2015.

25. V. L. Rubin, N. J. Conroy, Y. Chen, and S. Cornwell. Fake news or truth? Using satirical cues to detect potentially misleading news. In Proceedings of NAACL-HLT, pages 7-17, 2016.

26. N. Ruchansky, S. Seo, and Y. Liu. CSI: A hybrid deep model for fake news. arXiv preprint arXiv:1703.06959, 2017.

27. R. J. Sethi. Crowdsourcing the verification of fake news and alternative facts. 2017.

28. C. Shao, G. L. Ciampaglia, O. Varol, A. Flammini, and F. Menczer. The spread of fake news by social bots. arXiv preprint arXiv:1707.07592, 2017.

29. K. Shu, A. Sliva, S. Wang, J. Tang, and H. Liu. Fake news detection on social media: A data mining perspective. ACM SIGKDD Explorations Newsletter, 19(1):22-36, 2017.

30. E. Tacchini, G. Ballarin, M. L. Della Vedova, S. Moret, and L. de Alfaro. Some like it hoax: Automated fake news detection in social networks. arXiv preprint arXiv:1704.07506, 2017.

31. W. Wei and X. Wan. Learning to identify ambiguous and misleading news headlines. arXiv preprint arXiv:1705.06031, 2017.

32. H. M. Wold, L. Vikre, J. A. Gulla, Ö. Özgöbek, and X. Su. Twitter topic modeling for breaking news detection. In WEBIST (2), pages 211-218, 2016.