=Paper= {{Paper |id=Vol-2829/paper4 |storemode=property |title=Using the Profile of Publishers to Predict Barriers across News Articles |pdfUrl=https://ceur-ws.org/Vol-2829/paper4.pdf |volume=Vol-2829 |authors=Abdul Sittar,Dunja Mladenić |dblpUrl=https://dblp.org/rec/conf/www/SittarM21 }} ==Using the Profile of Publishers to Predict Barriers across News Articles== https://ceur-ws.org/Vol-2829/paper4.pdf
        Using the profile of publishers to predict
              barriers across news articles

Abdul Sittar¹,² [0000−0003−0280−9594] and Dunja Mladenić¹,² [0000−0002−0360−6505]
                         ¹ Jožef Stefan Institute, Slovenia,
             ² Jožef Stefan International Postgraduate School, Slovenia,
                                        Jamova cesta 39
                         {abdul.sittar, dunja.mladenic}@ijs.si



        Abstract. Detection of news propagation barriers, whether economic,
        cultural, political, time-zone, or geographical, is still an open research
        issue. We present an approach to barrier detection in news spreading
        that utilizes Wikipedia-concepts and metadata associated with each
        barrier. Solving this problem can not only convey information about the
        coverage of an event but can also show whether an event has been
        able to cross a specific barrier. Experimental results on the IPoNews
        dataset (a dataset for information spreading over the news) reveal that
        simple classification models are able to detect barriers with high
        accuracy. We believe that our approach can provide useful insights
        that pave the way for the future development of a system for predicting
        information spreading barriers over the news.

        Keywords: news propagation · news spreading barriers · cultural barrier ·
        economic barrier · geographical barrier · political barrier · time zone
        barrier · classification methods


1     Introduction
The phenomenon of event-centric news spreading due to globalization has been
exposed internationally [8]. International events capture attention from all
corners of the world. News agencies play their part in drawing our attention to
some events and not to others. The varying living styles, cultures, economic
conditions, time zones, and geographical juxtaposition of countries play a
significant role in the process of publishing news related to different events
[3,6,13,19–21]. For example, publishing about sports events may depend on
culture, news about epidemic events may reach neighboring countries first due to
geographic proximity, and news on a luxury product may be more relevant for
economically strong countries due to the demand of wealthy people. We represent
this differentiation by means of different barriers. These barriers include but
are not limited to 1) Economic Barrier, 2) Cultural Barrier, 3) Political
Barrier, 4) Geographical Barrier, and 5) Time Zone Barrier. Detection of the
crossing of these barriers does
    Copyright © 2021 for this paper by its authors. Use permitted under Creative
    Commons License Attribution 4.0 International (CC BY 4.0).

not only tell us the area that the coverage of an event has reached, but it also
shows us the relation between events and locations, as countries differ in
culture, economic conditions, geographical placement on the globe, political
point of view, and time zone. The definitions of news crossing these barriers
are as follows:
Cultural Barrier. If we identify the coverage of specific event-centric news by
publishers that are surrounded by different cultures, then we can say that the
news related to the event crossed cultural barriers.
Political Barrier. If news about a specific event is disseminated from publishers
having different political alignment, we can say that the news related to that
event crossed the political barrier.
Geographical Barrier. We say that some news related to a specific event
overpasses geographical barriers if that event gets attention by publishers of
countries located in different geographical regions.
Time Zone Barrier. We can claim that event-centric news has crossed the
time zone barrier if it has been published by publishers located in different time
zones.
Economic Barrier. It can be asserted that a piece of event-centric news has
crossed economic barriers if it is published in countries having different economic
conditions.
    In this paper, we propose a methodology for the detection of different
barriers during information propagation in the form of news. The methodology
utilizes data (IPoNews) [18] related to three contrasting events (earthquake,
Global Warming, and FIFA World Cup) in different domains (natural disasters,
climate changes, and sports) in 5 different languages: English, Slovene,
Portuguese, German, and Spanish.


1.1     Contributions

Following are the main scientific contributions of this paper:

    – A novel methodology for barrier detection in news spreading.
    – Experimental comparison of several simple classification models that can
      serve as a baseline.


1.2     Problem Statement

Observing the spreading of news on a particular event over time, we want to
predict whether a barrier (cultural, political, geographical, time zone, economi-
cal) is likely to hamper information while information propagates over the news
(binary classification).


2      RELATED WORK

Event-centric news encounters multiple barriers, specifically when the news
concerns international or national events. According to news flow theories,
multiple determinants impact international news spreading. The economic
power of a country is one of the factors that influence news spreading. Moreover,
economic variations have a different influence for different events (e.g.
protests, conflicts, disasters) [15]. The magnitude of economic interactivity
between countries can also impact the news flow [21]. Economic growth or income
level indicates the economic condition of a country. Multiple organizations
generate prosperity and welfare indices on a yearly basis. Among them, “The
Legatum Prosperity Index” and the “Human Development Index” are popular¹,².
Geographical representation of entities and events has been utilized extensively
in the past to detect local, global, and critical events [3, 13, 19, 20].
Countries that are geographically close are said to share culture and language
to a certain extent, which can further unfold interesting facts about shared
tendencies in information spreading [15, 16].

    News agencies tend to follow the national context in which journalists
operate. One related example is the SARS epidemic study, which found that
cross-national contextual values such as political and economic situations
impact news selection [5]. Fake news is produced based on many factors, among
which political influence is paramount [11]. A great amount of work on fake
news dwells on different strategies, and a few studies have considered political
alignment to have a compelling effect on news spreading [4, 12]. The study in
[12] found it to be a major strategy by which news agencies control the news and
change it accordingly, owing to the involvement of journalists and political
actors. Countries that share a common culture are expected to have a heavier
news flow between them, reporting on similar events [21]. Many quantitative
studies have identified demographic, psychological, socio-cultural, source,
system, and content-related aspects [1]. Many models have tried to explain
cultural differences between societies. Hofstede’s national culture dimensions
(HNCD) have been widely used and cited in different disciplines [7, 9].

    News classification for different kinds of problems is a long-studied topic,
and the features used for classification vary depending on the problem. [17]
used news content and user profiles to classify whether news is fake or
not. [2] calculated the TF-IDF and Word2Vec scores of the most frequent words
and used them as features to classify articles into one of five categories
(state, economy, entertainment, international, and sports). Similarly, [14]
performed part-of-speech (POS) tagging at the sentence level, used the tags as
features, and built supervised learning classifiers to classify news articles
based on their location. Most classifiers are trained with popular supervised
learning methods such as Random Forest, Support Vector Machine (SVM), Naive
Bayes, k-Nearest Neighbour (kNN), and Decision Tree. In this work, we used the
profile of each barrier for each news publisher (see Section 3.5) and the 300
most frequent Wikipedia-concepts from the dataset that appeared in the news
articles related to three contrasting events (earthquake, Global Warming, and
FIFA World Cup). We also
¹ http://hdr.undp.org/en/content/human-development-index-hdi
² https://www.prosperity.com/

compared the results of popular classifiers such as SVM, Random Forest, Deci-
sion Tree, Naive Bayes, and kNN (see Section 5.4).


3       DATA DESCRIPTION

3.1        Dataset

We utilized the dataset “A dataset for information spreading over the news
(IPoNews)”, which consists of pairs of news articles that were labeled based on
the level of their similarity, as described in [18]. This dataset was collected
from Event Registry, a platform that identifies events by collecting related
articles written in different languages from tens of thousands of news sources
[10]. The similarity score among cross-lingual news articles was calculated
using concept-based similarity employing the Wikifier service³. [18] describes
the criteria under which information is considered to be propagated. Statistics
of the dataset are shown in Table 1.


                                        Table 1. Statistics about dataset
    Dataset  Domain            Event type      Articles per Language     Total Articles
                                               Eng  Spa  Ger  Slv  Por
    1        Sports            FIFA World Cup  983  762  711   10  216   2682
    2        Natural Disaster  Earthquake      941  999  937   19  251   3147
    3        Climate Changes   Global Warming  996  298  545    8   97   1944




    The dataset contains a list of pairs of news articles, each annotated with one
of the labels “Information-Propagated”, “Unsure”, or “Information-Not-Propagated”
(see Table 2). The label is determined by comparing the cosine similarity score
of the two articles in the pair with predefined thresholds (≥ 0.7 for
Information-Propagated, < 0.4 for Information-Not-Propagated, otherwise Unsure).
We restructured the original dataset to include only examples labeled as
spreading information. In this way, we have pairs of news articles where we
observe information spreading from one to the other. Furthermore, for each
example, instead of keeping the pair of articles, we kept only the article that
was published earlier. In this way, each example contains an article that spreads
information.
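
The thresholding rule described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `label_pair` and the constant names are hypothetical, while the thresholds (0.7 and 0.4) and the label strings are taken from the text.

```python
# Hypothetical helper illustrating the labeling rule described in the text.
PROPAGATED_MIN = 0.7      # similarity >= 0.7 -> Information-Propagated
NOT_PROPAGATED_MAX = 0.4  # similarity <  0.4 -> Information-Not-Propagated


def label_pair(similarity: float) -> str:
    """Map the cosine similarity of an article pair to one of the three labels."""
    if similarity >= PROPAGATED_MIN:
        return "Information-Propagated"
    if similarity < NOT_PROPAGATED_MAX:
        return "Information-Not-Propagated"
    return "Unsure"


# Weights taken from three of the example rows of Table 2.
for weight in (0.627, 1.0, 0.313):
    print(weight, label_pair(weight))
```

Only pairs for which such a rule yields Information-Propagated are retained in the restructured dataset.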

                                         Table 2. Articles with metadata
    from       to         weight Class                     from-publisher to-publisher from-pub-uri            to-pub-uri
    Por44      Por43      0.627 Unsure                     ClicRBS         SAPO 24      jornald.clicrbs.com.br 24.sapo.pt
    English881 English880 1      Information-Propagated    Sky News        247 Wall St. news.sky.com           247wallst.com
    English258 English329 0.313 Information-Not-Propagated Sify            4-traders    sify.com               4-traders.com
    English793 English787 0.238 Information-Not-Propagated Bioengineer.org 7NEWS Sydney scienmag.com           7news.com.au
    German237 German236 0.979 Information-Propagated       watson          watson       aargauerzeitung.ch     aargauerzeitung.ch




³ http://wikifier.org/info.html, https://github.com/abdulsittar/IPoNews

3.2    Statistics after restructuring the data

The original dataset described in Section 3.1 contains pairs of articles along
with the information on whether there was propagation of information related to a
specific event or not. We used only examples labeled as propagating information⁴.
Based on the available metadata for articles, we ignored articles that do not
have metadata information in our database (see Section 3.4). Table 3 shows the
statistics for each barrier after filtering the original dataset.


                              Table 3. Statistics about barrier
    Dataset Domain          Event type                  Articles for each barrier
                                            Time-Zone Cultural Political Geographical Economical
       1    Sports           FIFA World Cup 724       699      143       726          634
       2    Natural Disaster Earthquake     1102      1113     227       1113         1010
       3    Climate Changes Global Warming 586        445      108       487          463




3.3    Wikipedia Concepts as Features

Since our dataset already indicates (see Section 3.1) whether information in news
is spreading from one article to another based on Wikipedia-concepts, we utilized
the most frequent (top 300) Wikipedia-concepts as features. Figure 1 portrays
these Wikipedia-concepts for all three events in the form of word clouds.
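
One way to turn the most frequent concepts into features is sketched below. The paper does not state how the concepts are encoded, so the binary presence/absence encoding, the per-article concept lists, and the function names `top_concepts` and `to_feature_vector` are assumptions made for illustration; only the idea of keeping the top-k (here k = 300 by default) concepts comes from the text.

```python
from collections import Counter


def top_concepts(articles, k=300):
    """Return the k most frequent concepts over all articles.

    `articles` is a list of concept lists, one per article (assumed layout).
    Each concept is counted at most once per article.
    """
    counts = Counter()
    for concepts in articles:
        counts.update(set(concepts))
    return [concept for concept, _ in counts.most_common(k)]


def to_feature_vector(concepts, vocabulary):
    """Binary indicator vector: 1 if the article mentions the concept, else 0
    (assumed encoding, not confirmed by the paper)."""
    present = set(concepts)
    return [1 if concept in present else 0 for concept in vocabulary]


# Toy data; the concept lists are made up for illustration.
articles = [
    ["Earthquake", "Tsunami", "Japan"],
    ["Earthquake", "Japan"],
    ["FIFA World Cup", "Earthquake"],
]
vocabulary = top_concepts(articles, k=2)
print(vocabulary)
print(to_feature_vector(articles[2], vocabulary))
```

With k = 2 the toy vocabulary keeps the two concepts that occur in the most articles; the real setting would use k = 300.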




Fig. 1. Word clouds of most frequent words related to earthquake, FIFA
World Cup and Global Warming events respectively.




3.4    Barriers Knowledge

Barriers knowledge refers to a database that contains metadata about each
barrier. Figure 3 shows the schema of the database and Table 4 presents the
barriers along with their characteristics. Each barrier depends on one main
piece of information: the country in which the headquarters of the news
publisher is located. Since the utilized data
⁴ https://doi.org/10.5281/zenodo.3950064

set already contains the headquarters of publishers, we fetched the country
associated with each headquarters. For the economic barrier, we fetched the
economic profile of each country using “The Legatum Prosperity Index”⁵. Cultural
differences among different regions were collected using Hofstede’s national
culture dimensions (HNCD). For the time zone and geographical barriers, we stored
the general UTC-offset, latitude, and longitude. For the political barrier, we
use the political alignment of the newspaper/magazine, which we determined from
the infobox on its Wikipedia page. For instance, for the Austrian newspaper
“Der Standard” we find social liberalism as political alignment (see Figure 2),
for the British newspaper “Daily Mail” we find right-wing as political alignment,
and for the German magazine “Stern” there is no information on political
alignment in its Wikipedia infobox, so we label its political alignment as
unknown.




Fig. 2. Wikipedia infoboxes of three different newspapers/magazines
with political alignment



⁵ https://www.prosperity.com/

3.5   Features for Individual Barrier

We represented each barrier with a specific profile containing a list of features.
Table 4 depicts the list of features for each barrier. The economic and cultural
barriers are represented by vectors of 11 and 6 features, respectively, whereas
the geographical, time zone, and political barriers contain only 1 or 2 features:
latitude and longitude, UTC-offset, and political alignment.




                       Fig. 3. Database Schema for Barriers




3.6   Dataset Annotation

We queried the metadata information for each article and generated a CSV file
for each barrier. We annotated each article based on that meta information to be
used for model training and classification. For the economic and cultural
barriers, we calculated the cosine similarity between the vectors of economic
values and between the vectors of cultural values, respectively. A score greater
than the threshold value of 0.9 is labeled as FALSE, otherwise TRUE. We chose
this strict threshold on the assumption that even a small gap between two
countries concerning cultural or economic values indicates a barrier. For
geographical barriers, we compared the latitude and longitude of the country of
each publisher. If the country name or the latitude/longitude is the same, we
annotated the example with FALSE, otherwise TRUE. Lastly, for

                          Table 4. Features of each barrier

Barrier       Features
Economic      Rank, Safety-Security, Personal-Freedom, Governance, Social-Capital,
              Investment-Environment, Enterprise-Conditions, Market-Infrastructure,
              Economic-Quality, Living-Conditions, Health, Education,
              Natural-Environment
Cultural      Power-Distance, Uncertainty-Avoidance-By-Individuals,
              Individualistic-Cultures, Masculinity-Femininity,
              Long-Term-Orientation, Indulgence-Restraint
Geographical  Latitude, Longitude
Time Zone     UTC-offset
Political     Political-Alignment




the time-zone and political barriers, we followed the same process as for the
geographical barrier: if the political alignment or UTC-offset is the same for a
pair, then it is annotated with FALSE, otherwise TRUE. Figure 4 depicts the class
distribution for each barrier. We can notice an unbalanced class distribution
with the majority of the examples being FALSE. This is especially true for the
Cultural and Political barriers, with 91 percent of examples being FALSE. Thus,
in our evaluation we rely more on the F1 measure than on classification accuracy.
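
The annotation rules of this section can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, all example values are made up, and the handling of all-zero vectors is an assumption. The 0.9 threshold and the TRUE/FALSE convention follow the text.

```python
import math

SIMILARITY_THRESHOLD = 0.9  # threshold stated in the text


def cosine_similarity(u, v):
    """Cosine similarity of two equally long numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: an all-zero profile is treated as dissimilar
    return dot / (norm_u * norm_v)


def vector_barrier(profile_a, profile_b, threshold=SIMILARITY_THRESHOLD):
    """Economic / cultural barrier: FALSE if similarity exceeds the threshold, else TRUE."""
    return not cosine_similarity(profile_a, profile_b) > threshold


def categorical_barrier(value_a, value_b):
    """Geographical / time-zone / political barrier: TRUE when the values differ."""
    return value_a != value_b


# Made-up profiles and metadata for two publishers' countries.
print(vector_barrier([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))       # parallel vectors
print(vector_barrier([1.0, 0.0], [0.0, 1.0]))                 # orthogonal vectors
print(categorical_barrier(1, 1))                              # same UTC-offset
print(categorical_barrier("right-wing", "social liberalism")) # different alignment
```

For the geographical barrier, `categorical_barrier` could be called with (latitude, longitude) tuples or with country names, since the text treats either being the same as the absence of a barrier.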




                  Fig. 4. Class Distribution for Each Barrier

4     MATERIALS AND METHODS
4.1   Problem Modeling
For each barrier, we have a list of news articles where each article is associated
with 300 Wikipedia-concepts and features related to that barrier. The task is to
predict the status S of each barrier B.

                                    S = f(C, B)
Here f is the learning function for barrier detection, C denotes the
Wikipedia-concepts related to an article, and B is the list of features related
to a specific barrier (see Table 4).

4.2   Methodology
We utilized the IPoNews dataset [18] and built a database on top of it that
includes barrier knowledge. Figure 5 explains the overall process of model
construction, from news articles to results generation. We created a list of
instances using the most frequent Wikipedia-concepts based on news articles and
joined them with the barrier knowledge. After performing the annotation (see
Section 3.6), we trained popular classification models and generated the results
on test data (see Section 5.4).




                       Fig. 5. Steps for Model Construction




5     EXPERIMENTAL EVALUATION
5.1   Baselines
We used the following methods as baselines for all our models.
 – Uniform: Generates predictions uniformly at random.
 – Stratified: Generates predictions by respecting the training set’s class dis-
   tribution.
 – Most Frequent: Always predicts the most frequent label in the training
   set.
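
The three baselines can be illustrated with a dependency-free sketch. The function names and the toy labels are hypothetical. Scikit-Learn, which the paper uses for its classifiers, offers comparable strategies in its `DummyClassifier` ("uniform", "stratified", "most_frequent"); whether the authors used that class is not stated.

```python
import random
from collections import Counter


def uniform_baseline(train_labels, n, rng):
    """Pick each prediction uniformly at random among the observed classes."""
    classes = sorted(set(train_labels))
    return [rng.choice(classes) for _ in range(n)]


def stratified_baseline(train_labels, n, rng):
    """Sample predictions from the training labels, which respects
    the class distribution of the training set."""
    return [rng.choice(train_labels) for _ in range(n)]


def most_frequent_baseline(train_labels, n):
    """Always predict the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return [majority] * n


train = [False] * 9 + [True]   # toy, imbalanced training labels
rng = random.Random(0)         # fixed seed so repeated runs agree
print(most_frequent_baseline(train, 3))
print(uniform_baseline(train, 5, rng))
print(stratified_baseline(train, 5, rng))
```

On imbalanced data the Most Frequent baseline already scores high, which is why it is the strongest of the three reference points.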

5.2     Classification Methods

We trained popular classification models for each barrier, namely SVM, kNN,
Decision Tree, Random Forest, and Naive Bayes, using Scikit-Learn. We applied
stratified 10-fold cross-validation to split the dataset for training and testing.
For Random Forest, kNN, and Decision Tree, we varied the number of estimators,
the value of k, and the maximum number of leaves, respectively, and chose the
setting with the best score on the test data. The implementation of this
methodology for barrier detection can be found on GitHub⁶.
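
The idea behind stratified splitting is sketched below in plain Python. The paper relies on Scikit-Learn for this step; the sketch only illustrates the principle, uses hypothetical function names and toy labels, and omits shuffling for brevity.

```python
from collections import defaultdict


def stratified_folds(labels, n_folds=10):
    """Assign each example index to a fold so that every fold keeps
    roughly the same class proportions as the full dataset."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        # Deal the examples of each class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


def train_test_splits(labels, n_folds=10):
    """Yield (train_indices, test_indices), using each fold once as the test set."""
    folds = stratified_folds(labels, n_folds)
    for held_out in range(n_folds):
        test = folds[held_out]
        train = [i for k, fold in enumerate(folds) if k != held_out for i in fold]
        yield train, test


labels = [False] * 90 + [True] * 10   # toy labels with a 9:1 imbalance
for train, test in train_test_splits(labels):
    positives = sum(labels[i] for i in test)
    print(len(train), len(test), positives)
```

With these toy labels every test fold contains ten examples, one of which is positive, so the minority class is represented in each evaluation round.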


5.3     Evaluation Metric

Due to the imbalance in the class distribution for all barriers, we used
micro-averaged precision and recall to evaluate our models⁷.

 – Micro-Precision: Micro-precision sums the contributions of all classes
   before computing precision, which answers the following question: What
   proportion of positive predictions was correct? It is defined as:

   \[ \text{Micro-Precision} = \frac{TruePositive_{sum}}{TruePositive_{sum} + FalsePositive_{sum}} \]

 – Micro-Recall: Micro-recall sums the contributions of all classes before
   computing recall, which answers the following question: What proportion of
   actual positives was predicted correctly? It is defined as:

   \[ \text{Micro-Recall} = \frac{TruePositive_{sum}}{TruePositive_{sum} + FalseNegative_{sum}} \]
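
The two definitions can be checked with a small, dependency-free sketch. The function name and the toy labels are hypothetical; the computation follows the formulas above by summing true positives, false positives, and false negatives over all classes.

```python
def micro_precision_recall(y_true, y_pred):
    """Sum TP, FP and FN over all classes, then compute precision and recall."""
    classes = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for cls in classes:
        for truth, pred in zip(y_true, y_pred):
            if pred == cls and truth == cls:
                tp += 1
            elif pred == cls and truth != cls:
                fp += 1
            elif pred != cls and truth == cls:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [False, False, False, True, True]   # toy labels
y_pred = [False, False, True, True, False]
print(micro_precision_recall(y_true, y_pred))  # (0.6, 0.6)
```

When every example receives exactly one label, each wrong prediction is counted once as a false positive (for the predicted class) and once as a false negative (for the true class), so the two summed counts coincide and the micro-averaged precision and recall are equal.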

5.4     Results and Analysis

Table 5 shows the results of all the classifiers for each barrier along with the
baselines. Analysis of the experimental results shows that, overall, the machine
learning models outperform the three baselines, although not every model does so
on every barrier (for example, Naive Bayes on the Economic barrier). For the
trained models on all the barriers, we can notice that Micro-Recall is equal to
Micro-Precision. The best performing baseline is “Most Frequent”, with Micro-F1
for the economic, cultural, geographical, time zone, and political barriers equal
to 0.70, 0.90, 0.58, 0.70, and 0.90 respectively. The best performing models
across the barriers are Decision Tree, Random Forest, and kNN. Looking at
Micro-F1, we can see that on the Economic and Cultural barriers kNN achieved the
best performance of 0.75 and 0.95 respectively. On Geographical barriers, kNN and
Decision Tree performed the best, achieving 0.81. On Time-Zone, the best
performing classifier is Random Forest with Micro-F1
⁶ https://github.com/cleopatra-itn/BarrierDetection-Classification
⁷ https://peltarion.com/knowledge-center/documentation/evaluation-view/classification-loss-metrics/micro-recall

0.83. On Political barriers, SVM, kNN, and Random Forest achieve the best
Micro-F1 score of 0.97.
    In terms of classification accuracy, Random Forest outperforms the baselines
as well as the other four classifiers on the Economic, Cultural, and Time Zone
barriers, while on the Geographical barrier SVM achieves the highest accuracy
(0.81, against 0.79 for Random Forest). Random Forest performs better than
Decision Tree but takes more time. Naive Bayes achieves slightly lower
classification accuracy than Decision Tree on the first four barriers. On the
Political barrier, Naive Bayes achieves the best classification accuracy (0.98)
but a lower Micro-F1 (0.66).


6    CONCLUSIONS AND FUTURE WORK
It is highly important to detect barriers while information propagates,
specifically through the news. For journalists, marketers, and social scientists,
knowing which barrier appears most frequently for which type of event can help
to solve business and marketing problems. In this regard, we proposed a simple
methodology. Though its results are good enough for three types of events, we
would like to extend both the features and the events. We used only
Wikipedia-concepts and meta information to detect barriers. In the future, we
would like to use the DMoz categories provided by Event Registry [10] and a
transformation of the text of news articles as features for barrier detection.
Currently, geographical and time zone barriers are calculated in a binary way,
as either the same or different. In the future, we would like to introduce the
distance between countries and between time zones as labels instead of the
currently used binary labeling.


7    ACKNOWLEDGMENTS
The research described in this paper was supported by the Slovenian research
agency under the project J2-1736 Causalify and co-financed by the Republic
of Slovenia and the European Union’s Horizon 2020 research and innovation
program under the Marie Sklodowska-Curie grant agreement No 812997.



                Table 5. Classifiers’ comparison with baselines

          Barrier        Model         CA Mic-Pre Mic-Rec Mic-F1
          Economic       Uniform       0.50 0.50  0.49    0.49
                         Stratified    0.58 0.59  0.57    0.59
                         Most Frequent 0.70 0.70  0.70    0.70
                         SVM           0.66 0.69  0.69    0.69
                         kNN           0.70 0.75  0.75    0.75
                         Decision Tree 0.69 0.73  0.73    0.73
                         Random Forest 0.74 0.74  0.74    0.74
                         Naive Bayes   0.61 0.63  0.63    0.63


          Cultural       Uniform       0.50 0.50    0.49     0.50
                         Stratified    0.83 0.83    0.83     0.83
                         Most Frequent 0.90 0.90    0.90     0.90
                         SVM           0.84 0.93    0.93     0.93
                         kNN           0.55 0.95    0.95     0.95
                         Decision Tree 0.90 0.94    0.94     0.94
                         Random Forest 0.93 0.93    0.93     0.93
                         Naive Bayes   0.83 0.51    0.51     0.51


          Geographical Uniform       0.49 0.50      0.50     0.50
                       Stratified    0.50 0.51      0.51     0.51
                       Most Frequent 0.58 0.58      0.58     0.58
                       SVM           0.81 0.76      0.76     0.76
                       kNN           0.79 0.81      0.81     0.81
                       Decision Tree 0.78 0.81      0.81     0.81
                       Random Forest 0.79 0.79      0.79     0.79
                       Naive Bayes   0.76 0.79      0.79     0.79


          Time Zone      Uniform       0.49 0.49    0.49     0.49
                         Stratified    0.59 0.58    0.58     0.58
                         Most Frequent 0.70 0.70    0.70     0.70
                         SVM           0.78 0.77    0.77     0.77
                         kNN           0.70 0.78    0.78     0.78
                         Decision Tree 0.80 0.81    0.81     0.81
                         Random Forest 0.83 0.83    0.83     0.83
                         Naive Bayes   0.72 0.64    0.64     0.64


          Political      Uniform       0.51 0.52    0.50     0.50
                         Stratified    0.84 0.83    0.81     0.82
                         Most Frequent 0.90 0.90    0.90     0.90
                         SVM           0.79 0.97    0.97     0.97
                         kNN           0.62 0.97    0.97     0.97
                         Decision Tree 0.79 0.91    0.91     0.91
                         Random Forest 0.97 0.97    0.97     0.97
                         Naive Bayes   0.98 0.66    0.66     0.66

References
 1. Al-Samarraie, H., Eldenfria, A., Dawoud, H.: The impact of personality traits on
    users’ information-seeking behavior. Information Processing & Management 53(1),
    237–247 (2017)
 2. Alam, M.T., Islam, M.M.: Bard: Bangla article classification using a new compre-
    hensive dataset. In: 2018 International Conference on Bangla Speech and Language
    Processing (ICBSLP). pp. 1–5. IEEE (2018)
 3. Andrews, S., Gibson, H., Domdouzis, K., Akhgar, B.: Creating corroborated cri-
    sis reports from social media data through formal concept analysis. Journal of
    Intelligent Information Systems 47(2), 287–312 (2016)
 4. Bakshy, E., Messing, S., Adamic, L.A.: Exposure to ideologically diverse news and
    opinion on facebook. Science 348(6239), 1130–1132 (2015)
 5. Camaj, L.: Media framing through stages of a political discourse: International
    news agencies’ coverage of kosovo’s status negotiations. International Communica-
    tion Gazette 72(7), 635–653 (2010)
 6. Dagon, D., Zou, C.C., Lee, W.: Modeling botnet propagation using time zones. In:
    NDSS. vol. 6, pp. 2–13 (2006)
 7. He, M., Lee, J.: Social culture and innovation diffusion: a theoretically founded
    agent-based model. Journal of Evolutionary Economics pp. 1–41 (2020)
 8. Hong, X., Yu, Z., Tang, M., Xian, Y.: Cross-lingual event-centered news clustering
    based on elements semantic correlations of different news. Multimedia Tools and
    Applications 76(23), 25129–25143 (2017)
 9. Khosrowjerdi, M., Sundqvist, A., Byström, K.: Cultural patterns of information
    source use: A global study of 47 countries. Journal of the Association for Informa-
    tion Science and Technology 71(6), 711–724 (2020)
10. Leban, G., Fortuna, B., Brank, J., Grobelnik, M.: Event registry: learning about
    world events from news. In: Proceedings of the 23rd International Conference on
    World Wide Web. pp. 107–110 (2014)
11. Martens, B., Aguiar, L., Gomez-Herrera, E., Mueller-Langer, F.: The digital trans-
    formation of news media and the rise of disinformation and fake news (2018)
12. Maurer, P., Beiler, M.: Networking and political alignment as strategies to con-
    trol the news: Interaction between journalists and politicians. Journalism Studies
    19(14), 2024–2041 (2018)
13. Quezada, M., Peña-Araya, V., Poblete, B.: Location-aware model for news events
    in social media. In: Proceedings of the 38th International ACM SIGIR Conference
    on Research and Development in Information Retrieval. pp. 935–938 (2015)
14. Rao, V., Sachdev, J.: A machine learning approach to classify news articles based
    on location. In: 2017 International Conference on Intelligent Sustainable Systems
    (ICISS). pp. 863–867. IEEE (2017)
15. Segev, E.: Visible and invisible countries: News flow theory revised. Journalism
    16(3), 412–428 (2015)
16. Segev, E., Hills, T.: When news and memory come apart: A cross-national com-
    parison of countries’ mentions. International Communication Gazette 76(1), 67–85
    (2014)
17. Shu, K., Zhou, X., Wang, S., Zafarani, R., Liu, H.: The role of user profiles for fake
    news detection. In: Proceedings of the 2019 IEEE/ACM international conference
    on advances in social networks analysis and mining. pp. 436–439 (2019)
18. Sittar, A., Mladenić, D., Erjavec, T.: A dataset for information spreading over the
    news. In: Proc. of Slovenian KDD Conf. on Data Mining and Data Warehouses
    (SiKDD) (2020)
19. Watanabe, K., Ochi, M., Okabe, M., Onai, R.: Jasmine: a real-time local-event
    detection system based on geolocation information propagated to microblogs. In:
    Proceedings of the 20th ACM international conference on Information and knowl-
    edge management. pp. 2541–2544 (2011)
20. Wei, H., Sankaranarayanan, J., Samet, H.: Enhancing local live tweet stream to
    detect news. GeoInformatica pp. 1–31 (2020)
21. Wu, H.D.: A brave new world for international news? exploring the determinants of
    the coverage of foreign news on us websites. International Communication Gazette
    69(6), 539–551 (2007)