=Paper=
{{Paper
|id=Vol-2482/paper42
|storemode=property
|title=Characterizing the Public Perception of WhatsApp through the Lens of Media
|pdfUrl=https://ceur-ws.org/Vol-2482/paper42.pdf
|volume=Vol-2482
|authors=Josemar Alves Caetano,Gabriel Magno,Evandro Cunha,Wagner Meira Jr.,Humberto T. Marques-Neto,Virgilio Almeida
|dblpUrl=https://dblp.org/rec/conf/cikm/CaetanoMCMMA18
}}
==Characterizing the Public Perception of WhatsApp through the Lens of Media==
Josemar Alves Caetano<sup>1</sup>, Gabriel Magno<sup>1</sup>, Evandro Cunha<sup>1,2</sup>, Wagner Meira Jr.<sup>1</sup>, Humberto T. Marques-Neto<sup>3</sup>, Virgilio Almeida<sup>1,4</sup>

{josemarcaetano, magno, evandrocunha, meira}@dcc.ufmg.br, humberto@pucminas.br, virgilio@dcc.ufmg.br

* <sup>1</sup> Dept. of Computer Science, Universidade Federal de Minas Gerais (UFMG), Brazil
* <sup>2</sup> Leiden University Centre for Linguistics (LUCL), The Netherlands
* <sup>3</sup> Dept. of Computer Science, Pontifícia Universidade Católica de Minas Gerais (PUC Minas), Brazil
* <sup>4</sup> Berkman Klein Center for Internet & Society, Harvard University, USA

===Abstract===
WhatsApp is, as of 2018, a significant component of the global information and communication infrastructure, especially in developing countries. However, probably due to its strong end-to-end encryption, WhatsApp became an attractive place for the dissemination of misinformation, extremism and other forms of undesirable behavior. In this paper, we investigate the public perception of WhatsApp through the lens of media. We analyze two large datasets of news and show the kind of content that is being associated with WhatsApp in different regions of the world and over time. Our analyses include the examination of named entities, general vocabulary and topics addressed in news articles that mention WhatsApp, as well as the polarity of these texts. Among other results, we demonstrate that the vocabulary and topics around the term “whatsapp” in the media have been changing over the years and in 2018 concentrate on matters related to misinformation, politics and criminal scams. More generally, our findings are useful to understand the impact that tools like WhatsApp have on contemporary society and how they are seen by the communities themselves.

Copyright © CIKM 2018 for the individual papers by the papers' authors. Copyright © CIKM 2018 for the volume as a collection by its editors. This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0).

===1 Introduction===
The messaging service WhatsApp is, as of 2018, one of the most rapidly growing components of the global information and communication infrastructure, with 1.5 billion users who send around 60 billion messages per day [Con18]. This tool combines one-to-one, one-to-many and group communication by offering private chats, broadcasts and public group chats, through which users are able to send text and media (audio, image and video), as well as files in various formats.

According to data published by Statista [Sta18], more than half of the population of Saudi Arabia, Malaysia, Germany, Brazil, Mexico and Turkey were active WhatsApp users in 2017. Also, the Reuters Institute Digital News Report 2018 [NFK+18] shows a rise in the use of messaging applications, including WhatsApp, as sources of news in several parts of the world. This report indicates that WhatsApp use for news has almost tripled since 2014 and that it has surpassed Twitter as a communication system in many countries. One of the alleged reasons for this is that users are looking for more private and secure spaces to communicate. In addition, WhatsApp turned out to be an important platform for political propaganda and election campaigns, having held a central role in elections in Brazil, India [Goe18], Kenya, Malaysia, Mexico and Zimbabwe, for instance. WhatsApp has also been frequently associated with the spread of misinformation and disinformation [Wat18].

Despite its prominence, continued growth and opacity, there has been an insufficient number of studies exploring the various aspects of WhatsApp and similar mobile messaging applications [GWCG18]. Since WhatsApp provides end-to-end encrypted communication, it is a great challenge to conduct large-scale analyses of the behavior of its users. In this work, we take a different approach: instead of looking inside the system, we focus on the public perception of WhatsApp from outside sources. The goals of this paper are:

* to characterize how media in different countries interpret the role of WhatsApp in society;
* to analyze the evolution of the perception of WhatsApp over time, from its creation until its massive popularization;
* to comprehend how sensitive topics, such as politics, crime and extremism, are related to WhatsApp in different regions of the world and in distinct periods of time.

To achieve these goals, we explore different techniques: analysis of Web search behavior, co-occurring named entities and vocabulary, co-occurrence networks, topics addressed and textual polarity. According to our understanding, each of these methods is able to provide additional information about the perception of WhatsApp in the news articles investigated. As a whole, our results indicate that the media has significantly changed its perception and portrayal of WhatsApp: while in the period before 2013 the focus of the news was on WhatsApp features, in the following years the tool started to be more associated with social issues, including the dissemination of misinformation.

This paper is organized as follows: in Section 2, we review a selection of works on WhatsApp and, more generally, on the use of textual datasets to understand social phenomena; in Section 3, we describe our methodology of data collection and the overall characterization of the datasets used in this investigation; next, in Section 4, we characterize the vocabulary, analyze the topics addressed and evaluate the polarity of the news articles contained in our datasets; finally, in Section 5, we conclude the paper and present future directions of work.

===2 Related Work===
====On the use of textual datasets to understand social phenomena====
Analyzing how a term is used over time and in a geographic location helps in understanding how cultural values, societal issues and customs are perceived by society and expressed through language [Cam13, Mat53]. Culturomics, for example, is a concept proposed by [MSA+11] referring to a method for the study of human behavior and cultural trends through quantitative analyses of texts, using sources like large collections of digitized books. Several studies explore this method to investigate topics such as the dynamics of birth and death of words [PTHS12], semantic change [GB11], emotions in literary texts [ALGB13] and characteristics of modern societies [Rot14]. Some works propose a complementary approach to culturomics by using historical news data [Lee11], analyzing European news media [FTA+10] or the writing style and gender bias of particular topics in large corpora of news articles [FALW+13]. Other works concentrate on specific events in history, such as the Fukushima nuclear disaster [LWSVC14], by using large datasets of media reports to understand aspects such as how the media polarity towards a topic changes over time.

Employing methods similar to the ones presented here, [CMC+18] investigate the perception and the conceptualization of the term “fake news” in the media, showing that contextual changes around this expression might be observed after the United States presidential election of 2016. However, to the best of our knowledge, this is the first work that uses these methods to examine in detail how the term “whatsapp” is being reported by news media in different parts of the world, allowing us to analyze how important topics, such as misinformation, manipulation and extremism, might be associated with WhatsApp by societies.

====On WhatsApp====
Despite the increasing use of WhatsApp in the world, few quantitative and large-scale studies about this instant messaging application are currently available. [GT18] propose a data collection methodology for this application and perform a statistical exploration to indicate how data from WhatsApp public groups can be collected and analyzed. Also, [MGB17] collect WhatsApp messages to monitor critical events during Ghana’s 2016 presidential election, and [CdO13] analyze differences between WhatsApp and the SMS messaging system using a large-scale survey. [FCSD15] investigate Facebook and WhatsApp traces collected from a European nationwide mobile network and characterize the usage of both applications. The work of [SHS+16] surveys users to investigate the usage of WhatsApp groups and, more specifically, its implications for mobile network traffic, while [RSS+18] collect personal information and messages from one hundred WhatsApp users with the aim of understanding their usage patterns.

All of these works investigate a limited part of WhatsApp, therefore offering a restricted understanding of how this application is used. Here, in contrast, we study this tool using large datasets of external data provided by news articles containing the term “whatsapp” in different regions of the world and covering the whole WhatsApp history, thus shedding light not exactly on its usage, but on how it is viewed from outside sources.

===3 Data Collection===
We use two large datasets of news articles in this study. The first one is a collection of texts from the Corpus of News on the Web (NOW Corpus), which contains articles from online newspapers and magazines written in English in 20 different countries from 2010 to the present time [Dav13]. This corpus is available for download and online exploration (https://corpus.byu.edu/now/) and, according to its author, it was, at the moment of our data collection, the largest corpus available in full-text format. On 31 May 2018, we gathered all the news articles containing the 33,185 occurrences of the term “whatsapp” in the NOW Corpus. These news articles cover every year in the corpus (from 2010 to 2018) and comprise all 20 countries represented. These countries were then grouped into six regions based on their geographic locations (Africa, British Isles, Indian subcontinent, Oceania, Southeast Asia and the Americas).

Our second dataset includes articles collected from Brazilian online newspapers and magazines, all written in Portuguese, also containing the term “whatsapp”. We searched for articles starting from 2010, but did not find any from 2010 and 2011 containing the term “whatsapp”, so our second dataset contains news from 2012 to 2018. To build this dataset, we used the tool Selenium (https://www.seleniumhq.org/) to automate Web searches with the term “whatsapp” in the following ten major Brazilian news websites: Exame, Folha de S. Paulo, Gazeta do Povo, G1, O Estado de S. Paulo, R7, Terra, Universo Online (UOL), Valor Econômico and Veja. The total number of occurrences of “whatsapp” extracted from these websites on 31 May 2018 is 4,047. Finally, we used the Python library newspaper (https://pypi.org/project/newspaper/) to collect the full texts of these news articles.

In Sections 4.2 to 4.6, we analyze the news texts from the two previously described datasets. Table 1 shows the number of news articles containing the term “whatsapp” in our two datasets, according to the geographical origin of the corresponding news media and the year of publication of the news article.

In addition to these datasets, we also collected data from Google Trends (https://trends.google.com/trends/), an online tool that indicates the frequency of particular terms in the total volume of searches in the Google Search engine. This tool also indicates the most common associated terms and the countries from which the highest volume of searches originates. It is also possible to filter these results for given periods. For our investigations, we collected data from searches made between 2010 and 2018, and use this information in Section 4.1.

Table 1: (a) Number of news articles containing the term “whatsapp” in our NOW Corpus dataset according to the geographical origin of the corresponding news media; (b) Number of news articles containing the term “whatsapp” in both NOW Corpus and Brazilian news articles datasets according to the year of publication.

{| class="wikitable"
|+ (a) Geographical origin of news articles in our NOW Corpus dataset
! Region !! Country !! Occurrences
|-
| The Americas || United States || 1,244
|-
| The Americas || Canada || 507
|-
| The Americas || Jamaica || 151
|-
| The Americas || Total || 5.73% / 1,902
|-
| Southeast Asia || Singapore || 2,889
|-
| Southeast Asia || Malaysia || 2,578
|-
| Southeast Asia || Philippines || 253
|-
| Southeast Asia || Hong Kong || 124
|-
| Southeast Asia || Total || 17.61% / 5,844
|-
| British Isles || Great Britain || 2,251
|-
| British Isles || Ireland || 2,152
|-
| British Isles || Total || 13.27% / 4,403
|-
| Africa || South Africa || 5,274
|-
| Africa || Nigeria || 1,607
|-
| Africa || Kenya || 1,585
|-
| Africa || Ghana || 754
|-
| Africa || Tanzania || 3
|-
| Africa || Total || 27.79% / 9,223
|-
| Oceania || Australia || 895
|-
| Oceania || New Zealand || 306
|-
| Oceania || Total || 3.62% / 1,201
|-
| Indian subcontinent || India || 8,991
|-
| Indian subcontinent || Pakistan || 1,353
|-
| Indian subcontinent || Sri Lanka || 186
|-
| Indian subcontinent || Bangladesh || 82
|-
| Indian subcontinent || Total || 31.98% / 10,612
|}

{| class="wikitable"
|+ (b) Year of publication of news articles in both NOW Corpus and Brazilian news articles datasets
! Year !! 2010 !! 2011 !! 2012 !! 2013 !! 2014 !! 2015 !! 2016 !! 2017 !! 2018 !! Total
|-
| Occurrences in NOW Corpus || 4 || 41 || 145 || 393 || 1,101 || 1,642 || 7,266 || 11,677 || 14,636 || 33,185
|-
| Occurrences in Brazilian news articles || 0 || 0 || 4 || 91 || 427 || 785 || 904 || 888 || 948 || 4,047
|}

===4 Analyses and Results===
In this section, we discuss the outcomes of different analyses aimed at understanding the perception of WhatsApp in the media. Each characterization is introduced by a description of how it may contribute to accomplishing our goals, followed by the methodology employed and, finally, by a presentation and discussion of the results found.

====4.1 Web search behavior====
Before analyzing the public perception of WhatsApp through the lens of news articles from different regions of the world, we investigate whether it is possible to observe a change in the Web search behavior regarding the term “whatsapp” through time. We use data collected from Google Trends to perform this analysis.

Our results show that, unsurprisingly, the number of queries on the Google Search engine for the term “whatsapp” has been growing constantly since the release of this tool for Android devices in 2010, as indicated in Figure 1. Also, Table 2 lists the five most frequent search terms employed by users who also searched for “whatsapp” from 2010 to 2018. Here, we notice a shift in the related terms through the years: in the first two years, most of the words are concerned with the download of the app (“download”, “descargar”) and device compatibility (“blackberry”, “iphone”, “nokia”); then, from 2012 onwards, queries for “whatsapp” start to be linked to different topics, especially features of the tool (“status unavailable”, “whatsapp encryption”, “video status download”), but also content shared in WhatsApp (“imagens para whatsapp”, “el negro del whatsapp”).

====4.2 Co-occurring named entities====
In natural language processing, named entity recognition is the task of extracting mentions of named entities (that is, definite noun phrases referring to individuals, organizations, dates or locations) in a text [BLK09]. Here, we extract the most mentioned named entities in our NOW Corpus dataset for each region and year of publication of the articles in order to understand who the main actors related to the tool WhatsApp are, according to the media. In this paper, the co-occurrence is computed on a document level, so we consider all the entities that are mentioned in our news articles as co-occurring with the key term “whatsapp”.

To perform the named entity recognition, we use the Natural Language Toolkit (NLTK, http://www.nltk.org/) classifier trained to recognize named entities. Since this tool does not support texts in Portuguese, we do not include the dataset containing the Brazilian news articles in this analysis.

Table 3 lists the ten most mentioned entities in each region considered in this investigation. Overall, we observe that the most mentioned entities accompanying the term “whatsapp” are usually other social media companies (“Facebook”, “Twitter”), countries (“US”, “India”), cities (“Dublin”, “Delhi”) and demonyms (“African”, “Australian”). When we analyze the continuation of the lists (not displayed here due to space constraints), we also find that US-American individuals like Mark Zuckerberg and Donald Trump are frequently mentioned across the globe. However, local entities are also mentioned in their respective regions: among the entities not displayed in the table, the most mentioned persons or organized groups in each region are Mark Zuckerberg

[Figure 1, y-axis label: Normalized Volume of Searches]

Table 3: Most mentioned named entities in each region (the entity “whatsapp” is excluded from the lists)

{| class="wikitable"
! Region !! Entities
|-
| The Americas || Facebook, Google, US, Twitter, Instagram, Apple, Android, American, Europe, China
|-
| Southeast Asia || Facebook, Malaysia, India, Singapore, US, Malaysian, Google, Indian, China, Chinese
|-
| British Isles || Facebook, Ireland, US, London, Irish, Google, British, Android, Dublin, Twitter
|}
<