Visualization of Dataflows: a Case Study of COVID-19 Rumors
Mikhail Ulizko 1,2, Evheniy Tretyakov 1,2, Rufina Tukumbetova 1,2, Alexey Artamonov 1,2 and
Mikhail Esaulov 2
1 Plekhanov Russian University of Economics, Stremyannyy Pereulok, 36, Moscow, 115093, Russia
2 National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), Kashira Hwy, 31,
Moscow, 115409, Russia

                Abstract
                One of the most significant and rapidly developing areas in the field of data analysis is
                information flow management. Within this area, targeted and stochastic dissemination
                patterns are studied. Solving such problems is relevant due to the global growth in the
                amount of information and its availability to a wide range of users. The paper presents a study
                of the dissemination of information messages in open networks, using COVID-19 as an example.
                The study was conducted with the use of visual analytics. Information messages from the largest
                world and Russian information services, social networks and instant messengers were used as
                sources of information. Given the large amount of information on the topic, the authors
                propose a pattern of wave-like dissemination of information, illustrated by topic
                clusters linking COVID-19 with hydroxychloroquine and 5G. The developed methods
                can be scaled up to analyze information events on various topics.

                Keywords
                graph analysis, geospatial analysis, Web technology, COVID-19, data scraping,
                misinformation

1. Introduction
    In the digital world, Internet traffic is growing every year [1]. According to various projections, by
the end of 2021 Internet traffic will exceed 3 zettabytes (ZB) [2, 3]. At the same time, communication
in society is also moving into the virtual world [4, 5]. In particular, users increasingly rely on
Internet media, social networks and instant messengers to obtain information about world events [6].
These sources of information react promptly to world events. The exchange of information via the Internet
has significantly reduced the delivery time of information from the moment of the event to the
moment it is received by the consumer. However, due to the large amount of data and its
heterogeneity, such an environment has become fertile ground for the dissemination of false
information, which in some cases is generated in larger volumes than true information. False information
can form a false point of view, which can lead to the destabilization of society [7].
    The paper examines the following sources of information: online media, Twitter and Telegram
channels with the following characteristics:
        regularity. It means that a flow of information messages is sent regularly.
        impartiality (excluding Twitter), i.e. without value judgments and distortion of reality.
        citation (including self-citation). It means that sources can rely not only on their own materials,
    but also refer to other sources.
    According to recent bibliometric studies, more than 14,000 publications about the coronavirus were
indexed in WoS and Scopus in 2020, a sharp growth in the publication rate (1600% compared to the
previous 5 years) [8, 9]. Their authors performed a bibliographic analysis of scientific papers from 2000
to 2020 using existing visualization tools (for instance, VOSviewer [10, 11]), noted the most significant
terms in this field and emphasized international collaboration as a key to dealing with COVID-19. The
results show three main clusters of contributing papers: the USA, China and European countries. Meanwhile,
Belli et al. [8] highlight the importance of open science: “Open science is the best method because it is an
approach based on collaborative work, openness, and transparency in all stages including not only
publication, but also data collection, peer review, and assessment.”

GraphiCon 2021: 31st International Conference on Computer Graphics and Vision, September 27-30, 2021, Nizhny Novgorod, Russia
EMAIL: mulizko@kaf65.ru (M. Ulizko); etreyakov@kaf65.ru (E. Tretyakov); rrtukumbetova@kaf65.ru (R. Tukumbetova);
aartamonov@kaf65.ru (A. Artamonov); mnesaulov@mephi.ru (M. Esaulov)
ORCID: 0000-0003-2608-8330 (M. Ulizko); 0000-0002-1051-8562 (E. Tretyakov); 0000-0002-1976-1390 (R. Tukumbetova);
0000-0002-9140-5526 (A. Artamonov); 0000-0002-3062-8005 (M. Esaulov)
             ©️ 2021 Copyright for this paper by its authors.
             Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
             CEUR Workshop Proceedings (CEUR-WS.org)

    Despite the existence of a wide variety of software for monitoring the coronavirus, a number of authors
have proposed new visualization tools [12, 13, 14, 15]. Martínez Beltrán et al. [12] describe a
“web-based, user-friendly dashboard for interactive plotting” that can analyze both coronavirus
data and related data, such as maximum temperature, grocery and pharmacy, etc., mostly in Spain. As
Marcílio-Jr et al. [13] write: “Visualization-based strategies for monitoring the dissemination of
diseases account for the fact that graphical representations can enhance the ability to identify data
patterns and tendencies”. Accordingly, the authors propose a tool for drawing plots, pie charts and,
especially, maps to monitor the evolution of the dissemination of the coronavirus in São Paulo state.
Other geospatial techniques are proposed by Mast et al. [14]. Their analysis for the USA reveals the
distribution of vaccination along two dimensions: over time and over place (cities, states). Finally,
Chintala et al. developed software for workflow-based analysis worldwide [15]. Their Python
application provides dynamic maps of cumulative daily confirmed COVID-19 cases for different
countries.
    All the papers above deal with coronavirus data rather than myths and rumors. In fact, there are
far fewer publications that consider the dissemination of true or false information on COVID-19 around
the world via the Internet or mass media. For instance, one study on the topic uses statistical
analysis to separate myths from facts [16]. The authors present several graphs based on a 13-item
questionnaire completed by 125 participants. Pang et al. describe how governmental social media was used
during the COVID-19 pandemic [17]. The authors carry out word frequency and content analyses of Macao
governmental social media on Facebook and reveal that it can be useful for controlling rumor dissemination.
Song et al. (2021) show different types of rumors and how they can be corrected [18]. They
analyze data from Sina Weibo, the most popular microblogging site in China. Finally, several
papers address the analysis of misinformation to some extent [19, 20, 21],
emphasizing methods of text analysis rather than visualization tools.
    It should be noted that researchers rarely use visual tools for analysis in their work, except when
they are looking at data within their own country. This paper tries to combine the approaches of the
articles mentioned above and apply them to the main rumors around the world.
    This paper presents methods of data processing and data visualization that rely on
web technologies for building graphs and plotting data on a globe. The methods are examined
and applied using rumors about COVID-19 as an example. The described methods can be applied to
analyze any data with a similar structure.

2. Methodology
    With advances in information technology, researchers in various fields have begun to pay attention to
the analysis of streaming data, that is, data generated continuously from various sources.
As in the case of static data, the processing of streaming data can be represented as shown in Figure
1. In this paper, visualization is used both for performing the analysis and for showing the results.


Figure 1: The process of collecting, processing and analyzing data
   The paper considers the following data sources: the world's leading media (CNN, The New York
Times, The Verge, etc.), instant messengers (Telegram) and social networks (Twitter). The object of
the study is rumors about COVID-19, that is, fake information messages about the disease. The data
analysis process includes two parts:
       dissemination of rumors and their contradictions between information sources;
       consideration of facts of various degrees of credibility between countries.

2.1. Dissemination of rumors and their contradictions between information sources
    Collecting data about rumors related to coronavirus disease comes down to the aggregation of
information messages from the mentioned media, instant messengers and social networks using agent
technologies [22]. The information message model is based on the collected data. The model consists
of the following fields:
        Id;
        Source URL;
        Title of the information message;
        Text of the information message;
        Links to other information sources;
        Date (and time) of posting.
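
    The fields listed above can be written down as a small record type. The following is only a minimal sketch in Python: the class name, attribute names and example values are illustrative assumptions, not the authors' actual schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple


@dataclass(frozen=True)
class InformationMessage:
    """One collected information message (all names here are hypothetical)."""
    id: int
    source_url: str
    title: str
    text: str
    posted_at: datetime
    # URLs of other information sources that the message refers to
    links: Tuple[str, ...] = ()


# Made-up example record; the URLs are placeholders.
msg = InformationMessage(
    id=1,
    source_url="https://example.org/news/1",
    title="Example title",
    text="Example text",
    posted_at=datetime(2020, 5, 1, 12, 0),
    links=("https://example.com/post/42",),
)
```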
    The main feature of this representation is that the objects of the model are connected to each other
through the "Links" field. To display such relationships, it is appropriate to use graph representations
of data [23].
    Thus, the data is represented in the form of a dynamic weighted directed graph, whose nodes
are information sources and whose edges are information messages in which one source refers to another.
Since information messages differ from each other by the time of posting, the graph is dynamic, that is,
its state changes over time as new nodes and edges appear (e.g. the graph is rebuilt at those
moments when an information source posts a rumor). The weighting of the graph is applied to both
edges and nodes: the weight of a node corresponds to the number of links to it, and the weight of an
edge corresponds to the number of links between its initial and final nodes (all information messages
are considered to be equal).
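
    The weighting rules described above can be illustrated with a short sketch. It assumes that each message has already been reduced to a (referring source, referenced source, posting time) triple; the source names and dates are made up, and the function name `build_graph` is hypothetical.

```python
from collections import Counter
from datetime import datetime

# Each tuple is (referring source, referenced source, posting time); hypothetical data.
messages = [
    ("source_a", "twitter", datetime(2020, 4, 1)),
    ("source_b", "twitter", datetime(2020, 4, 3)),
    ("source_a", "twitter", datetime(2020, 5, 2)),
    ("source_b", "source_a", datetime(2020, 5, 9)),
]


def build_graph(messages, until):
    """Return (node_weights, edge_weights) for messages posted up to `until`."""
    edge_weights = Counter()
    node_weights = Counter()
    for src, dst, posted_at in messages:
        if posted_at > until:
            continue  # the graph is dynamic: later messages are not part of this state
        edge_weights[(src, dst)] += 1  # number of links from src to dst
        node_weights[dst] += 1         # number of links pointing to dst
        node_weights[src] += 0         # make sure the referring source exists as a node
    return node_weights, edge_weights


# Two states of the same dynamic graph.
nodes_april, edges_april = build_graph(messages, datetime(2020, 4, 30))
nodes_all, edges_all = build_graph(messages, datetime(2020, 12, 31))
```

    Rebuilding the counters for a chosen cut-off time is the simplest way to obtain a graph state; an incremental update would be preferable for large streams.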
    The graph has two types of edges, distinguished by color, since in the final samples an
information message can be either a fake message or a contradiction of a fake message. To determine
the type of an edge, the text of the information message is analyzed: it is compared with a thesaurus of
words that convey contradiction in order to calculate the value of a criterion. If the value of the
criterion exceeds a certain threshold, the message is recognized as a contradiction.
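
    The paper does not specify the thesaurus, the criterion or the threshold. As a purely illustrative sketch, one possible criterion is the share of tokens that occur in the thesaurus; the word list, the threshold value and the function names below are all assumptions.

```python
import re

# Hypothetical thesaurus and threshold; the paper lists neither.
CONTRADICTION_TERMS = {"false", "fake", "debunked", "hoax", "myth", "misleading", "refuted"}
THRESHOLD = 0.05


def contradiction_score(text):
    """Share of tokens in `text` that occur in the thesaurus (assumed criterion)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in CONTRADICTION_TERMS)
    return hits / len(tokens)


def edge_type(text):
    """Label a message as a contradiction when the score exceeds the threshold."""
    return "contradiction" if contradiction_score(text) > THRESHOLD else "rumor"


refutation = "Experts say the claim that 5G spreads the virus is false and a debunked hoax"
rumor = "New towers spread the virus across the city"
```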

2.2. Dissemination of facts of varying degrees of credibility between countries
    The CoronaVirusFacts/DatosCoronaVirus Alliance Database from the Poynter information resource
was used as a data source [24]. The database includes verified facts from over 70 countries and articles
published in at least 40 languages. Since the database records are assessments of information messages
stored on different web pages, the entire database was collected by scraping. Only records with one
of the following credibility levels (out of 50 levels in total) were selected:
        False;
        Misleading;
        Mostly false;
        No evidence;
        Partially false.
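
    The selection step can be sketched as a simple filter over scraped records. The record layout (a dictionary with a `rating` key) and the normalisation of labels are assumptions made for illustration; only the five level names come from the list above.

```python
# The five credibility levels kept for analysis, in lower case.
SELECTED_LEVELS = {"false", "misleading", "mostly false", "no evidence", "partially false"}


def normalize_level(label):
    """Lower-case a label and collapse repeated whitespace."""
    return " ".join(label.strip().lower().split())


def select_records(records):
    """Keep only records whose (hypothetical) 'rating' field is a selected level."""
    return [r for r in records if normalize_level(r["rating"]) in SELECTED_LEVELS]


# Made-up records for illustration.
records = [
    {"title": "A", "rating": "FALSE"},
    {"title": "B", "rating": "Mostly  False "},
    {"title": "C", "rating": "True"},
]
selected = select_records(records)
```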
    After collecting the data, preliminary data processing was carried out: unification of the names of
the countries to which a record belongs, extraction of the date, and collection of the link to the primary
source of information. To visualize the dissemination of rumors between countries, a model is built, which
consists of the following fields (Figure 2):
       category/title of rumor;
       a description of information message;
       a country in which the rumor was spread;
       a source of a message (media, Facebook, etc.);
       a degree of credibility.
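
    A possible shape of the preprocessing and of the resulting record is sketched below. The alias table, the date format, the key names of the raw record and the extra `published` attribute are assumptions; the paper does not describe how countries were unified or how dates were stored.

```python
from dataclasses import dataclass
from datetime import date, datetime

# Hypothetical alias table used to unify country names.
COUNTRY_ALIASES = {
    "usa": "United States",
    "u.s.": "United States",
    "united states of america": "United States",
    "uk": "United Kingdom",
}


def unify_country(raw):
    """Map known aliases to one canonical name; leave other names unchanged."""
    name = raw.strip()
    return COUNTRY_ALIASES.get(name.lower(), name)


@dataclass
class RumorRecord:
    title: str
    description: str
    country: str
    source: str
    credibility: str
    published: date  # extracted during preprocessing (assumed attribute)


def make_record(raw):
    """Build a record from a scraped dictionary (key names are assumptions)."""
    return RumorRecord(
        title=raw["title"].strip(),
        description=raw["description"].strip(),
        country=unify_country(raw["country"]),
        source=raw["source"].strip(),
        credibility=raw["credibility"].strip().lower(),
        published=datetime.strptime(raw["date"], "%Y-%m-%d").date(),
    )


record = make_record({
    "title": "5G",
    "description": "5G towers spread the virus",
    "country": " USA ",
    "source": "Facebook",
    "credibility": "False",
    "date": "2020-04-08",
})
```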


Figure 2: An example of the record

3. Visualization and data analysis
3.1. Dissemination of rumors and their contradictions between information sources
   A graph is customarily built with tools such as Gephi, igraph, LargeVis, etc., or with
additional libraries for programming languages. For dynamic and interactive work with graphs, 3D
Force-Directed Graph [25] was chosen, a component for visualizing graphs that uses
ThreeJS/WebGL for rendering and force-directed graph drawing algorithms to lay out a graph.
   To analyze information sources, the data is classified into separate rumors. The paper considers
information messages on the topics '5G' and 'hydroxychloroquine'. The graph with '5G' rumors is
represented below (Figure 3).
   The graph was built according to the following rules:
         nodes of the graph are information sources;
         if two nodes are connected by an edge, the two sources are
   linked by an information message in which one information source refers to another
   (henceforth such nodes will be called adjacent or connected);
         the graph is directed. The direction of an edge is shown by the motion of a ball along the edge
   from the source of the information message to the source it refers to;
         the size of a node is proportional to the number of posts associated with this information source
   (both as a source and as a link);
         the graph is laid out using the force-directed graph drawing algorithm;
         an edge belongs to one of two types: without contradiction (green edges) or with
   refutation (red edges);
         the nodes are color-coded: when no active node is selected, nodes are colored according to the
   color bar (Figure 4); when an active node is selected, it and all nodes adjacent to it become orange.
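
   The rules above can be turned into the input of a force-directed graph component with a few lines of code. The sketch below, written in Python for brevity, produces a JSON structure with `nodes` and `links` arrays. To the best of our knowledge 3D Force-Directed Graph reads `id` and `val` on nodes and `source`, `target` and `color` on links by default, but these accessor names are configurable and should be checked against the library documentation; the message triples are made up.

```python
import json


def to_graph_data(messages):
    """messages: iterable of (source, target, is_contradiction) triples."""
    posts = {}
    links = []
    for source, target, is_contradiction in messages:
        # A node grows with every post it takes part in, as a source or as a link.
        posts[source] = posts.get(source, 0) + 1
        posts[target] = posts.get(target, 0) + 1
        links.append({
            "source": source,
            "target": target,
            "color": "red" if is_contradiction else "green",
        })
    nodes = [{"id": name, "val": count} for name, count in sorted(posts.items())]
    return {"nodes": nodes, "links": links}


# Hypothetical messages: two rumors and one refutation.
graph_data = to_graph_data([
    ("source_a", "twitter", False),
    ("source_b", "twitter", True),
    ("source_b", "source_a", False),
])
payload = json.dumps(graph_data)  # string that a web front end could load
```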
Figure 3: Graph web interface


Figure 4: Color bar
    The interface of the constructed web application makes it possible to analyze rumors on a selected
topic by selecting individual nodes, choosing a time interval and examining the information messages
between two nodes connected by an edge.
    This approach allows us to identify the primary sources of the dissemination of false information
and the main distribution nodes. For instance, the graph for the chloroquine rumor shows that “Global
Banking & Finance Review” has released only rumors with refutations (Figure 5). Figure 6, in turn,
illustrates the analytical possibilities of the graph. According to the graph (Figure 5a),
“MarketBeat” was the first resource to publish information about a connection between 5G and
the coronavirus, but over time its contribution to the total share of messages decreased. Analysis of
the data using the graph also revealed that information sources most often refer to Twitter (as evidenced
by the maximum total input flow and the location of the node in the center of the graph), followed by
“Verizon Media”. The graph also shows that the most active sources of information messages are
“Yahoo! Finance”, “The Conversation”, etc. For instance, in the time interval considered, “The
Conversation” released most of its messages in May and then reduced the intensity.
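
    The three quantities used in this analysis (total input flow, first publisher and activity over time) can also be computed directly from the message list. The sketch below uses made-up sources and dates; it is an illustration of the idea, not the authors' implementation.

```python
from collections import Counter
from datetime import date

# (referring source, referenced source, posting date); hypothetical data.
messages = [
    ("source_a", "twitter", date(2020, 5, 3)),
    ("source_a", "twitter", date(2020, 5, 17)),
    ("source_a", "source_b", date(2020, 6, 1)),
    ("source_c", "source_b", date(2020, 4, 20)),
    ("source_c", "twitter", date(2020, 6, 11)),
]

# Total input flow: how often each source is referred to.
incoming = Counter(dst for _, dst, _ in messages)

# Activity of each referring source per calendar month.
monthly = Counter((src, posted.strftime("%Y-%m")) for src, _, posted in messages)

# Source of the earliest message on the topic.
first_source = min(messages, key=lambda m: m[2])[0]
```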


Figure 5: Rumor "chloroquine"
Figure 6: Graph analysis

3.2. Dissemination of facts of varying degrees of credibility between countries
   To analyze the dissemination of facts of various degrees of credibility between countries, a natural
approach is to plot them on a globe. The authors used the Globe.GL web component [26], which
enables applying data visualization layers to a three-dimensional globe. The final form of the web
application is shown in Figure 7.
    The globe has an intuitive interface: the fewer information messages are related to a country, the
darker the country looks and the closer it lies to the surface of the globe; conversely, the more
information messages are related to a country, the redder it is and the higher it is raised above the
globe. The web interface enables analyzing data by creating a query, which may include:
        a word or phrase;
        a type of rumors (7 main rumors and all the rest);
        a country to which a rumor belongs;
        a source of an information message;
        a degree of credibility of a rumor;
        time range.
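
    The query and the elevation encoding described above can be sketched as follows. The record keys, the function names and the `max_altitude` scale are assumptions made for illustration; the resulting per-country values would then be passed to whatever elevation and color settings the globe component exposes.

```python
from collections import Counter
from datetime import date


def matches(record, phrase=None, category=None, country=None, source=None,
            credibility=None, start=None, end=None):
    """Return True if the record satisfies every criterion that was given."""
    if phrase is not None and phrase.lower() not in record["description"].lower():
        return False
    for key, wanted in (("category", category), ("country", country),
                        ("source", source), ("credibility", credibility)):
        if wanted is not None and record[key] != wanted:
            return False
    if start is not None and record["date"] < start:
        return False
    if end is not None and record["date"] > end:
        return False
    return True


def country_heights(records, max_altitude=0.3):
    """Scale message counts per country to an elevation in [0, max_altitude]."""
    counts = Counter(r["country"] for r in records)
    peak = max(counts.values(), default=1)
    return {country: max_altitude * n / peak for country, n in counts.items()}


# Made-up records for illustration.
records = [
    {"description": "5G towers spread the virus", "category": "5G", "country": "India",
     "source": "Facebook", "credibility": "false", "date": date(2020, 4, 2)},
    {"description": "Hydroxychloroquine cures COVID-19", "category": "hydroxychloroquine",
     "country": "India", "source": "WhatsApp", "credibility": "misleading",
     "date": date(2020, 3, 25)},
    {"description": "5G causes coronavirus symptoms", "category": "5G",
     "country": "United States", "source": "Twitter", "credibility": "false",
     "date": date(2020, 4, 10)},
]
hits = [r for r in records if matches(r, phrase="5g", start=date(2020, 4, 1))]
heights = country_heights(records)
```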
    The search query tool together with the user-friendly interface enables detailed
analysis. The analysis revealed that most of the rumors about the coronavirus came from India and the
United States, with the information then spreading to neighboring countries.
    This approach allows us to identify hidden relationships between countries. For example, when
analyzing rumors connected with the United States, it can be seen that there are similar rumors in Spain
(35 common rumors), France (20 common rumors), Canada (12 common rumors) and Ukraine (9
common rumors). This suggests that there are close connections between these countries.
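
    Counting common rumors amounts to intersecting the sets of rumors recorded for two countries. The sketch below assumes that equivalent rumors already share an identifier (for example a normalised title); the identifiers and country data are made up and do not reproduce the figures reported above.

```python
from collections import defaultdict


def common_rumors(records, reference):
    """Count rumors that each country shares with the `reference` country.

    records: iterable of (country, rumor_id) pairs.
    """
    by_country = defaultdict(set)
    for country, rumor_id in records:
        by_country[country].add(rumor_id)
    ref = by_country.get(reference, set())
    shared = {c: len(rumors & ref) for c, rumors in by_country.items() if c != reference}
    # Most shared rumors first; ties are broken alphabetically.
    return sorted(((c, n) for c, n in shared.items() if n > 0),
                  key=lambda item: (-item[1], item[0]))


# Hypothetical (country, rumor identifier) pairs.
records = [
    ("United States", "r1"), ("United States", "r2"), ("United States", "r3"),
    ("Spain", "r1"), ("Spain", "r2"), ("Spain", "r9"),
    ("France", "r3"),
    ("Brazil", "r8"),
]
ranking = common_rumors(records, "United States")
```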


Figure 7: Globe web interface

4. Conclusion
    Even though various objects, from physical installations to social networks and Internet media, can
be a source of streaming data, the analysis of such sources can be carried out using similar methods and
tools. The paper considers the task of visualizing the dissemination of rumors about coronavirus disease
in online media and between countries. The key feature of this study is the statistical analysis of a
dynamic system: with the help of visualization, the spread of information is shown explicitly and the
intensity of information dissemination can be assessed both locally within a country and around the world.
    A dynamic graph was built to analyze the dissemination of rumors about 5G and hydroxychloroquine
in the world media. The graph enables identifying the most "important" nodes (information sources that
produce many information messages), following the dissemination of rumors over time and
visually determining the cluster structure of objects.
    To analyze the global situation regarding rumors about coronavirus disease, a globe with a search
engine was built, which demonstrates the spread of rumors between countries. With the help of this
application, the countries in which rumors appeared most often and their influence on the rest of the
world have been identified.
    According to the authors, the obtained data model together with the described tools may serve as a
good basis for analyzing streaming data of various kinds. Further research will be devoted to improving
the accuracy of the model by means of natural language processing and to expanding the possibilities for
statistical analysis.

5. References
[1] Cisco.Com, Cisco Annual Internet Report (2018–2023) White Paper, 2018. URL:
     https://www.cisco.com/c/en/us/solutions/collateral/executive-perspectives/annual-internet-report/white-paper-c11-741490.html.
[2] Cisco.Com, VNI Complete Forecast Highlights, 2017. URL:
     https://www.cisco.com/c/dam/m/en_us/solutions/service-provider/vni-forecast-highlights/pdf/Global_2022_Forecast_Highlights.pdf.
[3] The World Bank, World Development Report 2021, 2021. URL:
     https://wdr2021.worldbank.org/stories/crossing-borders.
[4] W.-L. Shiau, Y.K. Dwivedi, H.S. Yang, Co-citation and cluster analyses of extant literature on
     social networks, International Journal of Information Management 37(5) (2017) 390-399. doi:
     10.1016/j.ijinfomgt.2017.04.007.
[5] W.-L. Shiau, Y.K. Dwivedi, H.-H. Lai, Examining the core knowledge on facebook, International
     Journal of Information Management 43 (2018) 52-63. doi: 10.1016/j.ijinfomgt.2018.06.006
[6] S. Athey, M. Mobius, J. Pal, The Impact of Aggregators on Internet News Consumption, Stanford
     University Graduate School of Business Research Paper No. 17-8, 2017.
[7] A. Gruzd, M. De Domenico, P.L. Sacco, S. Briand, Studying the COVID-19 infodemic at scale,
     Big Data and Society 8(1). doi: 10.1177/20539517211021115.
[8] S. Belli, R. Mugnaini, J. Baltà, E. Abadal, Coronavirus mapping in scientific publications: When
     science advances rapidly and collectively, is access to this knowledge open to society?
     Scientometrics 124(3) (2020) 2661-2685.
[9] J.K. Pal, Visualizing the knowledge outburst in global research on COVID-19. Scientometrics 126
     (2021) 4173–4193.
[10] A.Kh. Khakimova, O.V. Zolotarev, M.A. Berberova, Coronavirus infection study: Bibliometric
     analysis of publications on COVID-19 using PubMed and Dimensions databases, Scientific
     Visualization 12(5) 112-129. doi: 10.26583/SV.12.5.10.
[11] VOSViewer.Com, VOSViewer, 2021. URL: https://www.vosviewer.com.
[12] E.T. Martínez Beltrán, M. Quiles Pérez, Pastor-Galindo et al. COnVIDa: COVID-19
     multidisciplinary data collection and dashboard. Journal of Biomedical Informatics 117 (2021)
     103760.
[13] W.E. Marcílio-Jr, D.M. Eler, R.E. Garcia, R.C.M. Correia, R.M.B. Rodrigues, Visual analytics
     of COVID-19 dissemination in São Paulo state, Brazil, Journal of Biomedical Informatics 117
     (2021) 103753.
[14] T.C. Mast, D. Heyman, E. Dasbach, et al., Planning for monitoring the introduction and
     effectiveness of new vaccines using real-word data and geospatial visualization: An example using
     rotavirus vaccines with potential application to SARS-CoV-2, Vaccine: X 7 (2021) 100084.
[15] S. Chintala, R. Dutta, D. Tadmor, COVID-19 spatiotemporal research with workflow-based data
     analysis, Infection, Genetics and Evolution 88 (2021) 104701.
[16] K. Konar, N. Kabli, A statistical analysis on Covid-2019 to distinguish between myths and facts
     with data visualization. IOP Conference Series: Materials Science and Engineering 1022(1) (2021)
     012043.
[17] P.C.-I. Pang, Q. Cai, W. Jiang, K.S. Chan, Engagement of government social media on facebook
     during the COVID-19 pandemic in Macao. International Journal of Environmental Research and
     Public Health 18(7) (2021) 3508.
[18] Y. Song, K.H. Kwon, Y. Lu, Y. Fan, B. Li, The “Parallel Pandemic” in the Context of China:
     The Spread of Rumors and Rumor-Corrections During COVID-19 in Chinese Social Media,
     American Behavioral Scientist (2021).
[19] G.K. Shahi, D. Nandini, FakeCovid--A multilingual cross-domain fact check news dataset for
     COVID-19, arXiv preprint arXiv:2006.11343. doi: 10.36190/2020.14.
[20] P. Patwa, S. Sharma, S. Pykl, et al., Fighting an Infodemic: COVID-19 Fake News Dataset,
     Communications in Computer and Information Science 1402, 21-29.
     doi: 10.1007/978-3-030-73696-5_3.
[21] P. Mookdarsanit, L. Mookdarsanit, The covid-19 fake news detection in thai social texts, Bulletin
     of Electrical Engineering and Informatics 10(2), 988-998. doi: 10.11591/eei.v10i2.2745.
[22] M. Ulizko, L. Pronicheva, A. Artamonov, R. Tukumbetova, E. Tretyakov, Complex Objects
     Identification and Analysis Mechanisms, Advances in Intelligent Systems and Computing 1310,
     517-526. doi: 10.1007/978-3-030-65596-9_63.
[23] M.S. Ulizko, E.V. Antonov, A.A. Artamonov, R.R. Tukumbetova, Visualization of graph-based
     representations for analyzing related multidimensional objects, Scientific Visualization 12(4)
     133-142. doi: 10.26583/sv.12.4.12.
[24] Poynter.Org, The CoronaVirusFacts/DatosCoronaVirus Alliance Database, 2020. URL:
     https://www.poynter.org/ifcn-covid-19-misinformation.
[25] Github.Com, 3D Force-Directed Graph, 2020. URL:
     https://github.com/vasturiano/3d-force-graph.
[26] Github.Com, Globe.GL, 2020. URL: https://github.com/vasturiano/globe.gl.