Eve2Sign: Creating Signed Networks of News Events

                      Roshni Chakraborty                                  Srishti Bhandari
                  Aalborg University, Denmark                           NIT Durgapur, India
                       roshnic@cs.aau.dk                           sb.16u10463@btech.nitdgp.ac.in

                     Nilotpal Chakraborty                                   Ritwika Das
                  Aalborg University, Denmark                           NIT Durgapur, India
                      nilotpalc@cs.aau.dk                          rd.16u10468@btech.nitdgp.ac.in


                                                         Abstract

                       Studying news events and user opinions towards them has several
                       applications, like detecting the popularity of the news, understanding
                       user preferences and stance towards particular news, determining the
                       evolution of the news story, etc. One of the major ways to understand
                       a news event is through the main characters (targets) of the event and
                       the relationships among those targets with respect to the news event.
                       Therefore, in this paper, we propose an approach to create a signed
                       network of the news event to visualize the targets and their
                       interrelationships. Our experimental evaluations on 3 news events
                       indicate that the proposed approach can detect a large number of
                       relevant targets related to any news event without supervision and,
                       further, create a signed network effectively.

1    Introduction
With the growth of Twitter as a social media platform [WVG+ 16], a wide range of users come to this platform
to discuss and share their opinions on various events [KLPM10]. The impact of an event varies across users
[SAMA17]; for example, the reformulation of an immigration policy by a government might directly impact the
people seeking refuge and indirectly affect the economy of industries. Therefore, a proper understanding of the
opinions related to news events is very important and finds applications such as identifying the popularity of a
news article [CEHPS14, Cas13], identifying the stance of a user [LPRR18] towards a particular news item,
summarizing users' opinions [ZSAG12, OGZ+ 17], etc. However, there are inherent challenges in identifying the
user opinions relevant to a news event and further utilizing these opinions for other applications, such as the
vast vocabulary gap between the news event and the tweets, the short length of tweets, and the usage of informal
vocabulary and abbreviations [SAC+ 18, MCHV18]. Moreover, a user might not explicitly mention the name of the
news event in the tweet while expressing an opinion related to it, and understanding the user opinions related
to a news event requires considering both the positive and negative interactions of the users towards the news
event [DWT18]. Thus, identifying and understanding users' opinions from Twitter towards a news event would be
eased if prior knowledge of the news event 1 [SAMA17], the possible targets related to the news event 2
[MKS+ 16] and the polarity of the targets towards the news event are known.

Copyright © by the paper's authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
In: R. Campos, A. Jorge, A. Jatowt, S. Bhatia (eds.): Proceedings of the Text2Story’20 Workshop, Lisbon, Portugal, 14-April-2020,
published at http://ceur-ws.org
   1 A news event is a collection of one or many news articles related to the same real-life event
   2 A target may be a person, an organization, a government policy, a movement, a product related to the news event


   A signed network representation of the news event, where the nodes represent the news event and the possible
targets related to it and the edges represent the polarity relationship among the targets with respect to the
news event, can serve as a prior knowledge base of the news event. This can further aid several applications,
like identifying the stance [MSK17, Moh16] and bias of a user [DWT18] from their tweet irrespective of the tweet
text and the absence of the event name in the tweet, generating fair summaries of news events by considering both
negative and positive opinions of the readers towards the news event [DSB+ 19, JA18], identifying the major
aspects of the news event [VCLDD17, KWHR16], understanding the evolution of a news story [AKT18], etc. However,
the creation of a signed network of a news event requires the resolution of multiple sub-problems, like
identifying the targets of the news event and the polarity relationship of the targets with the news event.
Although a few existing research works discuss the significance of creating signed networks from text [HAJR12,
AKR13, SCM16], these works identify the polarity relationships between pre-specified targets [HAJR12] or identify
targets through degree and eigenvector centrality scores [AKR13]. However, identifying targets only through
degree or eigenvector centrality scores might neither cover the huge list of possible targets related to the
news event nor identify the targets which comprise multiple words. Further, the content of news event-related
information differs significantly from that of literary text and online discussions. Therefore, it remains a
challenge to build an approach to create a signed network of a news event that can inherently identify the
possible targets of a news event, irrespective of the type of news event and the number of targets, and
determine the polarity relationships among the identified targets related to the news event.
   In this paper, we propose a novel approach that utilizes the news articles of a news event to create a signed
network of the event where the nodes represent the entity names or phrases, and the edges represent the polarity
relationship between a pair of nodes with respect to the news event. Effectively, the proposed approach intends
to resolve two sub-problems, i.e., identifying the targets and determining the polarity relationship between the
targets towards the news event. Existing research works on target identification from text can be segregated
into supervised and unsupervised techniques based on their proposed methodology [HN14]. Although the existing
supervised approaches might provide better performance than the existing unsupervised approaches, the
requirement of human-annotated information related to a news event makes them challenging to use [KMKB13] for
news events. Existing unsupervised approaches can be segregated into graph-based and embedding based approaches.
The graph-based approaches either utilize the sub-group structure [TMV16] through k-truss, k-core, k-clique or
community substructures, or rely on centrality based techniques, like TextRank [MT04] and position based page
rank [FC17]. However, our preliminary analysis indicates that the significance of a target of a news event is
independent of its position in the news article, and further, a target might comprise multiple words. We
observed that centrality based measures fail to identify targets comprising multiple words and that sub-group
based measures generate extensive noise. Therefore, in this paper, we propose a page rank score (PRS) based edge
traversal approach to identify the relevant targets related to a news event. Our experimental evaluation
indicates that the proposed approach can identify targets of a news event irrespective of the number of words in
the target and the type of news event. After identification of the targets of the news event, the proposed
approach relies on existing natural language processing tools to identify the polarity among the detected
targets from the news articles. The identified targets, along with their detected polarity towards each other
and the news event, are used to create a signed network of the news event.
   The rest of the paper is organized as follows: we present a formal definition of the problem along with a brief
description of the proposed approach in Section 2. We discuss the experiments and observations in Section 3
and finally draw our conclusions in Section 4.


                                 Figure 1: Overview of the Proposed Approach


2     Problem Definition
In this section, we provide a brief discussion of the problem followed by the details of the proposed approach.
Given a news event N, let $A = \{a_1, a_2, \cdots\}$ be the set of news articles related to N. We intend to
create a signed network representation, $S = (V, E)$, where V represents the targets, $T = \{T_1, T_2, \cdots\}$,
related to N and E represents the polarity relationship among T, i.e., $E_{ij} \in \{-1, +1\}$ indicates the
polarity relationship between $T_i$ and $T_j$. An overview of the proposed approach is given in figure 1, and we
discuss the proposed approach in detail next.

2.1     Proposed Approach
The proposed approach first identifies T related to N and thereafter determines the polarity relationship
between each pair of targets, say $T_i$ and $T_j$, from A. We discuss each of these steps next.

2.1.1     Identification of Targets
In this section, we discuss the proposed approach to identify T related to N. A target can comprise a single
word or multiple words. We first discuss the procedure to identify single-word targets, followed by multiple-word
targets. Given N, we first create a document, D, by combining the news articles, A, related to N. Thereafter,
we create a graph G, where the nodes, Q, represent the words, w, from D after pre-processing (the pre-processing
details are provided in section 3.1) and the edges, R, represent the normalized co-occurrence score,
$R(w_i, w_j)$, between a pair of words, $w_i$ and $w_j$, calculated from D by:

$$R(w_i, w_j) = \frac{\mathrm{coSc}(w_i, w_j)}{\max_{w_r, w_s \in w} \mathrm{coSc}(w_r, w_s)}$$

where $\mathrm{coSc}(w_i, w_j)$ represents the co-occurrence score of $w_i$ and $w_j$ in D. On G, we calculate
the page rank [PBMW99] centrality 3 score of each node, say $w_k$. We select a node into the set of targets, T,
if the PRS of $w_k$ reaches the threshold, i.e., $\mathrm{pgR}(w_k) \geq th_w$. Based on two news events, the
threshold was decided by a group of 2 manual annotators such that a minimum number of relevant targets related
to N is excluded from T and a minimum number of irrelevant targets is included in T. We checked the
effectiveness of the threshold on three different news events and found it relevant irrespective of the type of
news event. Although the number of relevant targets excluded and the number of irrelevant targets included vary
across news events, we found that the maximum number of irrelevant targets included was less than 4% of the
total relevant targets and the minimum number of relevant targets excluded was less than 9% of the total
relevant targets for an event.
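
   The single-word step described above can be sketched as follows. This is a minimal illustration and not the
authors' implementation: it assumes that coSc is a sentence-level co-occurrence count, that the page rank is
computed on the weighted undirected graph with a damping factor of 0.85, and that the threshold value 0.25 and
the toy sentences are purely hypothetical. All function names are introduced here for illustration only.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_scores(sentences):
    """Count, for every unordered word pair, the sentences containing both words.

    Assumption: coSc is a sentence-level co-occurrence count.
    """
    counts = Counter()
    for tokens in sentences:
        for wi, wj in combinations(sorted(set(tokens)), 2):
            counts[(wi, wj)] += 1
    return counts


def normalize(counts):
    """R(wi, wj) = coSc(wi, wj) / max over all pairs of coSc."""
    top = max(counts.values())
    return {pair: c / top for pair, c in counts.items()}


def pagerank(weights, damping=0.85, iterations=100):
    """Weighted page rank on an undirected graph via power iteration."""
    neighbours = {}
    for (wi, wj), r in weights.items():
        neighbours.setdefault(wi, {})[wj] = r
        neighbours.setdefault(wj, {})[wi] = r
    n = len(neighbours)
    score = {w: 1.0 / n for w in neighbours}
    # Total edge weight attached to each node, used to split its score.
    strength = {w: sum(nb.values()) for w, nb in neighbours.items()}
    for _ in range(iterations):
        score = {
            w: (1 - damping) / n
            + damping * sum(score[u] * r / strength[u] for u, r in nb.items())
            for w, nb in neighbours.items()
        }
    return score


def select_targets(score, th_w):
    """Keep every word whose page rank score reaches the threshold th_w."""
    return {w for w, s in score.items() if s >= th_w}


# Hypothetical toy input: three pre-processed sentences.
sentences = [
    ["government", "banknote", "ban"],
    ["government", "banknote"],
    ["banknote", "queue"],
]
R = normalize(cooccurrence_scores(sentences))
pgr = pagerank(R)
targets = select_targets(pgr, th_w=0.25)  # 0.25 is an illustrative threshold
```

   In this toy graph the word with the largest total edge weight receives the highest score, which is the
behaviour the threshold relies on; the actual threshold in the paper was chosen by annotators and is not
reproduced here.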
   However, as previously discussed, a target might contain multiple words. Hence, we follow a path based PRS
to identify targets of multiple words. For a word, $w_i$, we represent the score of its neighbour as
$\mathrm{NgScore}(w_i, w_k)$, where $w_k$ is a neighbour of $w_i$, i.e., $w_k \in \mathrm{Ng}(w_i)$, computed as
the weighted summation of the PRS of $w_i$, the PRS of $w_k$ and the normalized co-occurrence score of $w_i$ and
$w_k$, $R(w_i, w_k)$. Therefore, $\mathrm{NgScore}(w_i, w_k)$ can be formally defined as,

$$\mathrm{NgScore}(w_i, w_k) = \alpha \left[ \mathrm{pgR}(w_i) + \mathrm{pgR}(w_k) + R(w_i, w_k) \right]$$

   We have considered $\alpha$ as 0.33 to provide equal weightage to the page rank scores of $w_i$ and $w_k$ and
the normalized co-occurrence score of $w_i$ and $w_k$. Therefore, a phrase which comprises $w_i$ followed by
$w_k$ is added to T if $\mathrm{NgScore}(w_i, w_k) \geq th_p$. We further extend NgScore to more than two words
in a phrase by following a path of length 2, 3 and 4 starting from $w_i$ to identify targets of length 3 to 5
respectively. However, our experimental evaluations indicate that very few phrases of more than 4 words have a
score greater than $th_p$. Therefore, we consider targets that comprise 1 to 4 words in this paper. In table 1,
we provide a brief overview of the graphs for each of the 3 events (details of the dataset of the news events
are discussed in section 3.1) considered in this paper.
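
   As a small illustration of the two-word case, the sketch below evaluates NgScore for adjacent word pairs and
keeps the phrases that reach the threshold. It is only a sketch under stated assumptions: the page rank scores,
the co-occurrence scores and the threshold value 0.4 are hypothetical numbers, R is assumed to be keyed by
ordered pairs (first word, following word), and the extension to longer paths is not shown because its exact
form is not specified in the text.

```python
def ng_score(pgr, R, wi, wk, alpha=0.33):
    """NgScore(wi, wk) = alpha * [pgR(wi) + pgR(wk) + R(wi, wk)]."""
    return alpha * (pgr[wi] + pgr[wk] + R[(wi, wk)])


def two_word_targets(pgr, R, th_p, alpha=0.33):
    """Return the phrases 'wi wk' whose NgScore reaches the threshold th_p."""
    return {
        f"{wi} {wk}"
        for (wi, wk) in R
        if ng_score(pgr, R, wi, wk, alpha) >= th_p
    }


# Hypothetical page rank and normalized co-occurrence scores.
pgr = {"prime": 0.30, "minister": 0.25, "queue": 0.05}
R = {("prime", "minister"): 1.0, ("minister", "queue"): 0.1}

score = ng_score(pgr, R, "prime", "minister")  # 0.33 * (0.30 + 0.25 + 1.0)
phrases = two_word_targets(pgr, R, th_p=0.4)   # 0.4 is an illustrative threshold
```

   With these numbers the pair (prime, minister) scores about 0.51 and is kept, while (minister, queue) scores
about 0.13 and is dropped.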

2.1.2     Identification of Polarity Relationships
After identification of T related to N , we identify the polarity relationships among the targets from D. We
follow Fernandez et al. [FGÁLJM+ 16] to identify the polarity relationship by identifying the sentiment from the
    3 https://en.wikipedia.org/wiki/PageRank


        Table 1: Overview of the event graphs related to E1 , E2 and E3 , from which targets were detected.

                                           Dataset           E1        E2        E3
                                           Nodes             3000      2351      1825
                                           Edges             10673     14520     7831
                                           Average Degree    7.12      12.35     8.58

constituent sentences, S, of D. We first briefly discuss the approach proposed by Fernandez et al. [FGÁLJM+ 16],
followed by the details of our adopted methodology based on the same work [FGÁLJM+ 16]. Fernandez et al.
[FGÁLJM+ 16] first identify the dependency tree of a sentence and calculate the sentiment of the sentence based
on the sentiment of the dependency sub-trees. The sentiment of each sub-tree is calculated through the sentiment
score of the constituent lexicons along with the score due to the presence of intensifiers, modifiers, and
negation in the sub-tree. The SO-CAL dictionary [TBT+ 11] is used to calculate the sentiment score of a lexicon.
   For our approach, we first identify the subject, $s_t$, and object, $o_t$, of s (for each $s \in S$) and
thereafter check whether both $s_t$ and $o_t$ exist in T, say as $T_i$ and $T_j$. We identify the polarity
between $T_i$ and $T_j$ as the sentiment score calculated by following Fernandez et al. [FGÁLJM+ 16] on s. An
iterative application of this procedure for each $s \in S$ can effectively identify a fraction of the existing
links (E) between the targets (T). Therefore, after this step, the polarity between $T_i$ and $T_j$ is
identified as $+1$, $-1$ or $0$. However, we observed that a fraction of the edges across the targets could not
be identified by this procedure. We believe the absence of polarity information for all possible pairs of
targets in S and the inherent complexity of the sentences are the two major limitations due to which we cannot
ensure identification of the polarity among all the targets in T. In order to handle this issue, existing
characteristics of signed networks, like structural balance theory and status theory, or different properties of
signed networks could be used to predict the sign of a missing link [LHK10, DMT18] between the targets, which we
consider as one of the future directions of this work. On identification of all the possible polarity
relationships among targets, we create a signed network of the event, $S = (V, E)$, where V comprises T and E
represents the identified polarity relationship between the nodes of V as $\{+1, -1\}$.
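
   The assembly of the signed network from sentence-level results can be sketched as below. This is an
illustrative sketch, not the authors' code: it assumes that a dependency parser and the sentiment method of
[FGÁLJM+ 16] have already produced one (subject, object, sentiment score) triple per sentence, that scores for
the same pair are summed across sentences (the aggregation rule is not stated in the text), and that pairs with
a resulting polarity of 0 are left out of the network. The target names and scores are hypothetical.

```python
def sign(score):
    """Map a sentiment score to +1, -1 or 0."""
    return (score > 0) - (score < 0)


def build_signed_network(triples, targets):
    """Build the signed edge set E from (subject, object, score) triples.

    Only sentences whose subject and object are both known targets
    contribute; scores for the same pair are summed (an assumption).
    """
    totals = {}
    for subj, obj, score in triples:
        if subj in targets and obj in targets and subj != obj:
            pair = tuple(sorted((subj, obj)))  # undirected edge key
            totals[pair] = totals.get(pair, 0.0) + score
    edges = {}
    for pair, total in totals.items():
        polarity = sign(total)
        if polarity != 0:  # keep only +1 / -1 edges
            edges[pair] = polarity
    return edges


# Hypothetical targets and sentence-level sentiment scores.
targets = {"t1", "t2", "t3"}
triples = [
    ("t1", "t2", 1.5),
    ("t3", "t2", -2.0),
    ("t3", "t2", 0.5),
    ("t1", "other", -1.0),  # skipped: "other" is not a target
    ("t1", "t3", 0.0),      # dropped: polarity 0
]
E = build_signed_network(triples, targets)
```

   Pairs that never appear together as subject and object of a sentence simply have no entry in E, which
corresponds to the missing links discussed above.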

3      Experiments and Discussion
In this section, we provide details of the datasets used and the pre-processing techniques followed in the
experimental evaluation of the proposed approach, and discuss the results obtained.

3.1     Datasets and Preprocessing
For our experimental evaluation, we selected 3 news events that belonged to USA 4 , Europe 5 and India 6
respectively and occurred during 2016 to 2017, from the list of events provided by Wikipedia corresponding
to each country or continent. We manually selected USA, Europe and India for our experiments to ensure
the proposed approach is not biased by the location of the news event. For each of these news events, we
crawled the news articles from Google News Search API 7 by using Python Newspaper API 8 . We discuss the
news events along with the news article datasets in detail next.

    1. 2016 Indian Banknote Demonetisation, N1 : In 2016, the Prime Minister of India announced that the
       existing Rs 500 and Rs 1,000 banknotes would no longer be allowed to be used and claimed that this action
       would reduce the usage of counterfeit notes. This decision caused considerable debate in India 9 . We
       crawled 86 news articles related to this event.

    2. Catalan Independence, N2 : The Parliament of Catalonia passed a resolution to declare the independence of
       Catalonia from Spain, which led to considerable debate and discussion among the international community
       and in Catalonia 10 . We crawled 47 news articles related to this event.
    4 https://en.wikipedia.org/wiki/2017_in_the_US
    5 https://en.wikipedia.org/wiki/2017
    6 https://en.wikipedia.org/wiki/2016_in_India
    7 https://news.google.com/?hl=en-IN&gl=IN&ceid=IN:en
    8 https://newspaper.readthedocs.io/en/latest/
    9 https://en.wikipedia.org/wiki/2016_Indian_banknote_demonetisation
    10 https://en.wikipedia.org/wiki/Catalan_independence_movement


    3. Allegations against Harvey Weinstein, N3 : Around 80 women accused Harvey Weinstein of sexual abuse
       and assault, which led to severe discussions and debates 11 . We crawled 34 news articles related to this
       event.

   Our dataset comprises news events from different geographical locations and with different media coverage
(the number of news articles related to an event ranges from 34 to 86) to show that the proposed approach is
not biased by the news event. We discuss the pre-processing details next.


                 (a) E1                               (b) E2                                (c) E3

        Figure 2: Word cloud representing a subset of the targets generated by the proposed approach.
  Pre-processing Details. We provide a brief overview of the pre-processing steps below.

 1. We combine A related to N into a single document, D.
 2. We use SpaCy [HM17], which provides an exhaustive list of around 325 stop-words (more than the NLTK
    library 12 ), to remove the stop words from D. We also remove the punctuation marks from D.
 3. We segregate D into its constituent sentences, S. We use SpaCy [HM17] and the neuralcoref 13 library for
    co-reference resolution for each sentence, i.e., $s \in S$. Further, we manually create a list to map the
    same words with different spellings to a single one, e.g., the words demonetization and demonetisation were
    both changed to the word demonetization.
 4. We use NLTK POS Tagger 14 to identify the nouns, verbs, adjectives and prepositions of D. Further, if a
    word has been tagged as both noun and adjective, then the word is assigned the tag which has the
    higher number of occurrences. We only consider these words while creating G from D related to N.
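
   Two of the steps above, the spelling normalization in step 3 and the tag resolution in step 4, can be
sketched in plain Python. This is a hedged illustration only: the tagged tokens are assumed to come from an
external POS tagger, the tag names are hypothetical, the rule is applied to any pair of competing tags rather
than only noun versus adjective, and ties (which the text does not discuss) are resolved in favour of the tag
seen first.

```python
from collections import Counter, defaultdict

# Manually curated variant list; this single entry is the example from the text.
SPELLING_VARIANTS = {"demonetisation": "demonetization"}


def normalize_spelling(tokens, variants=SPELLING_VARIANTS):
    """Replace each known spelling variant by its canonical form."""
    return [variants.get(token, token) for token in tokens]


def majority_tags(tagged_tokens):
    """Assign every word the POS tag it received most often in D."""
    counts = defaultdict(Counter)
    for word, tag in tagged_tokens:
        counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


# Hypothetical tagger output for a few tokens.
tagged = [("ban", "NOUN"), ("ban", "VERB"), ("ban", "NOUN"), ("new", "ADJ")]
tags = majority_tags(tagged)
tokens = normalize_spelling(["demonetisation", "policy"])
```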

3.2   Experiments and Results
In this section, we describe our experiment details, followed by our observations. We compare the T identified
by the proposed approach and by the existing research works related to N (here, we consider N1 , N2 and N3 ).
As baselines, we consider graph based approaches which identify targets from text, like TextRank, Clique and
Community sub-structure based approaches. To evaluate the performance of the proposed approach and the
baselines in identifying targets related to N, we measure the number of detected targets (Dtar ) and the fraction
of irrelevant targets (Firr ) detected by the proposed approach and the baselines related to N1 , N2 and N3 .
   The relevance of a target was decided by a group of 3 manual annotators who were given the set
of targets, Dtar , detected by the proposed approach related to an event, N1 , N2 and N3 . The manual annotators
had no previous knowledge of the news events and were instructed to mark a target as relevant or irrelevant based
on the knowledge of the news event they gained from A related to N. A target was marked relevant or irrelevant
based on the majority of the annotators. We repeated this for the targets detected by each of the baselines for
each of the events, i.e., N1 , N2 and N3 , respectively. On comparing the results of the proposed approach with
the baselines, as shown in table 2, we observe the proposed approach generates the maximum number of relevant
targets followed by the baseline which utilizes Clique sub-structure. Although TextRank generates more targets,
 11 https://en.wikipedia.org/wiki/Harvey_Weinstein
 12 https://www.nltk.org
 13 https://pypi.org/project/neuralcoref/
 14 https://nlp.stanford.edu/software/tagger.shtml


it produces maximum noise and is only effective in generating single word targets. Further, we observe that
the Clique and Community based approaches fail to indicate the order of the words in the phrase, thus requiring
severe manual intervention to identify the relevant targets which comprise multiple words. Therefore, as
highlighted in table 2, the proposed approach can identify relevant targets as well as a large number of
targets compared with the existing works on target identification. In figure 2, we
provide a word cloud representation of a subset of the targets identified by the proposed approach related to E1 ,
E2 and E3 respectively.
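
   The two evaluation measures can be computed from the annotators' votes as sketched below. This is a minimal
illustration with hypothetical votes, assuming that each annotator gives a boolean label (True for relevant)
and that a target counts as irrelevant when the majority labels it so; the function names are introduced here
only for the example.

```python
def majority_relevant(votes):
    """Return True if more than half of the annotators marked the target relevant."""
    return sum(votes) > len(votes) / 2


def evaluate(annotations):
    """Return (D_tar, F_irr) for a mapping target -> list of annotator votes.

    D_tar is the number of detected targets and F_irr the fraction of
    them judged irrelevant by majority vote.
    """
    d_tar = len(annotations)
    irrelevant = sum(1 for votes in annotations.values()
                     if not majority_relevant(votes))
    return d_tar, irrelevant / d_tar


# Hypothetical votes of 3 annotators for 4 detected targets.
annotations = {
    "target a": [True, True, True],
    "target b": [True, False, True],
    "target c": [False, False, True],
    "target d": [True, True, False],
}
d_tar, f_irr = evaluate(annotations)
```

   Here one of the four detected targets is rejected by two annotators, so the fraction of irrelevant targets
is 0.25.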
        Table 2: Comparison of the proposed approach with the existing ones in terms of Firr and Dtar .

                        Dataset    Proposed        TextRank        Clique          Community
                                   Dtar   Firr     Dtar   Firr     Dtar   Firr     Dtar   Firr
                        E1         81     0.10     109    0.60     107    0.43     123    0.53
                        E2         345    0.22     300    0.56     212    0.26     280    0.39
                        E3         108    0.11     124    0.39     136    0.24     128    0.37

   After detecting the targets, the proposed approach identifies the polarity relationship between a pair of
targets following the approach discussed in section 2.1.2. We found the approach can effectively identify 0.63,
0.68, and 0.78 of the links for E1 , E2 and E3 respectively. On further analysis, we found that the fraction of
edges which could not be resolved was either due to the inability of the proposed approach to identify the
polarity relationship from a sentence or due to a target never co-occurring with any other target in any of the
sentences. We further observed that the inability of the proposed approach to identify the polarity relationship
from a sentence was due to the presence of domain information and complex syntactic structure in the sentences.
We believe that utilizing the structural balance theory of signed networks, along with the inherent features of
signed networks that can effectively predict the sign of the missing links [LHK10, DMT18], can help identify the
polarity relationship between a pair of targets which was not resolved by the approach discussed in section
2.1.2. In figure 3, we provide the signed network representation of E1 as created by the proposed approach,
which comprises a subset of the detected targets (as the total number of detected targets is quite large), and
the edges represent the polarity relationship between the targets towards the event. On visualizing the signed
network for E2 , we found a similar distribution of targets on either side of the polarity towards the event.
However, the signed network for E3 indicates that a larger number of targets is negatively connected to the
event than positively connected to it.


Figure 3: Signed network representing some of the targets along with their positive (or, negative) relation with
the event E1 .


3.2.1   Analyzing Failures
Although the proposed approach effectively creates a signed network irrespective of the type of news event,
we investigate more closely certain cases where our proposed approach fails. In the paper, the threshold for
identifying relevant targets is decided manually and is a fixed value irrespective of the event, which
might affect the effectiveness in identifying targets. Although we have tested the effectiveness of the threshold on


3 news events, we believe we require an exhaustive analysis on more news events to ensure its effectiveness and
applicability. Furthermore, the current version of the paper does not rigorously explore the determination of
the polarity relationship between the targets. We believe that including signed network properties to predict
the sign of a link, as one of the immediate future directions of this paper, can address the limitations of the
polarity relationship detection by the proposed approach.

4   Conclusions
In this work, we propose an approach to create the signed network representation of a news event, where the
signed network comprises the possible targets as nodes and the polarity relationship among the targets as
edges. We propose a page rank score based edge traversal approach to identify the targets of a news event
and rely on existing natural language processing tools to identify the polarity relationship between the
identified targets from the articles of a news event. Our experimental evaluation on 3 events indicates that the
proposed approach can detect a large number of relevant targets irrespective of the type of event with no
supervision and, further, creates a signed network effectively. As one of our future directions, we believe the
inclusion of information related to the semantic roles of the words from the news articles and structural
information from the created graph can aid in identifying more targets and, further, reduce the noise in the
identified targets of the current proposed approach. We also intend to incorporate the attributes related to the
signed network to identify the sign of the missing links.

References
[AKR13]         Apoorv Agarwal, Anup Kotalwar, and Owen Rambow. Automatic extraction of social networks
                from literary text: A case study on alice in wonderland. In Proceedings of the Sixth International
                Joint Conference on Natural Language Processing, pages 1202–1208, 2013.

[AKT18]         Omar Alonso, Vasileios Kandylas, and Serge-Eric Tremblay. How it happened: Discovering and
                archiving the evolution of a story using social signals. In Proceedings of the 18th ACM/IEEE on
                Joint Conference on Digital Libraries, pages 193–202, 2018.

[Cas13]         Carlos Castillo. Traffic prediction and discovery of news via news crowds. In Proceedings of the
                22nd International Conference on World Wide Web, pages 853–854. ACM, 2013.

[CEHPS14]       Carlos Castillo, Mohammed El-Haddad, Jürgen Pfeffer, and Matt Stempeck. Characterizing the
                life cycle of online news stories using social media reactions. In Proceedings of the 17th ACM
                conference on Computer supported cooperative work & social computing, pages 211–223. ACM,
                2014.

[DMT18]         Tyler Derr, Yao Ma, and Jiliang Tang. Signed graph convolutional networks. In 2018 IEEE
                International Conference on Data Mining (ICDM), pages 929–934. IEEE, 2018.

[DSB+ 19]       Abhisek Dash, Anurag Shandilya, Arindam Biswas, Kripabandhu Ghosh, Saptarshi Ghosh, and
                Abhijnan Chakraborty. Summarizing user-generated textual content: Motivation and methods
                for fairness in algorithmic summaries. Proceedings of the ACM on Human-Computer Interaction,
                3(CSCW):1–28, 2019.

[DWT18]         Tyler Derr, Zhiwei Wang, and Jiliang Tang. Opinions power opinions: Joint link and interac-
                tion polarity predictions in signed networks. In 2018 IEEE/ACM International Conference on
                Advances in Social Networks Analysis and Mining (ASONAM), pages 363–366. IEEE, 2018.

[FC17]          Corina Florescu and Cornelia Caragea. Positionrank: An unsupervised approach to keyphrase ex-
                traction from scholarly documents. In Proceedings of the 55th Annual Meeting of the Association
                for Computational Linguistics (Volume 1: Long Papers), pages 1105–1115, 2017.

[FGÁLJM+ 16] Milagros Fernández-Gavilanes, Tamara Álvarez-López, Jonathan Juncal-Martínez, Enrique
              Costa-Montenegro, and Francisco Javier González-Castaño. Unsupervised method for sentiment
              analysis in online texts. Expert Systems with Applications, 58:57–75, 2016.


[HAJR12]    Ahmed Hassan, Amjad Abu-Jbara, and Dragomir Radev. Extracting signed social networks from
            text. In Workshop Proceedings of TextGraphs-7 on Graph-based Methods for Natural Language
            Processing, pages 6–14. Association for Computational Linguistics, 2012.
[HM17]      Matthew Honnibal and Ines Montani. spacy 2: Natural language understanding with bloom
            embeddings, convolutional neural networks and incremental parsing. To appear, 7(1), 2017.
[HN14]      Kazi Saidul Hasan and Vincent Ng. Automatic keyphrase extraction: A survey of the state of the
            art. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics
            (Volume 1: Long Papers), pages 1262–1273, 2014.
[JA18]      Myungha Jang and James Allan. Explaining controversy on social media via stance summa-
            rization. In The 41st International ACM SIGIR Conference on Research & Development in
            Information Retrieval, pages 1221–1224, 2018.
[KLPM10]    Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. What is twitter, a social network
            or a news media? In Proceedings of the 19th International Conference on World Wide Web, pages
            591–600. ACM, 2010.
[KMKB13]    Su Nam Kim, Olena Medelyan, Min-Yen Kan, and Timothy Baldwin. Automatic keyphrase
            extraction from scientific articles. Language resources and evaluation, 47(3):723–742, 2013.
[KWHR16]    Yaser Keneshloo, Shuguang Wang, Eui-Hong Han, and Naren Ramakrishnan. Predicting the
            popularity of news articles. In Proceedings of the 2016 SIAM International Conference on Data
            Mining, pages 441–449. SIAM, 2016.
[LHK10]     Jure Leskovec, Daniel Huttenlocher, and Jon Kleinberg. Signed networks in social media. In
            Proceedings of the SIGCHI conference on human factors in computing systems, pages 1361–1370.
            ACM, 2010.
[LPRR18]    Mirko Lai, Viviana Patti, Giancarlo Ruffo, and Paolo Rosso. Stance evolution and twitter
            interactions in an italian political debate. In International Conference on Applications of Natural
            Language to Information Systems, pages 15–27. Springer, 2018.
[MCHV18]    Béatrice Mazoyer, Julia Cagé, Céline Hudelot, and Marie-Luce Viaud. Real-time collection of
            reliable and representative tweets datasets related to news events. In BroDyn@ ECIR, pages
            23–34, 2018.
[MKS+ 16]   Saif Mohammad, Svetlana Kiritchenko, Parinaz Sobhani, Xiaodan Zhu, and Colin Cherry.
            Semeval-2016 task 6: Detecting stance in tweets. In Proceedings of the 10th International Work-
            shop on Semantic Evaluation (SemEval-2016), pages 31–41, 2016.
[Moh16]     Saif M Mohammad. Sentiment analysis: Detecting valence, emotions, and other affectual states
            from text. In Emotion measurement, pages 201–237. Elsevier, 2016.
[MSK17]     Saif M Mohammad, Parinaz Sobhani, and Svetlana Kiritchenko. Stance and sentiment in tweets.
            ACM Transactions on Internet Technology (TOIT), 17(3):26, 2017.
[MT04]      Rada Mihalcea and Paul Tarau. Textrank: Bringing order into text. In Proceedings of the 2004
            conference on empirical methods in natural language processing, pages 404–411, 2004.
[OGZ+ 17]   Yi Ouyang, Bin Guo, Jiafan Zhang, Zhiwen Yu, and Xingshe Zhou. Sentistory: multi-grained
            sentiment analysis and event summarization with crowdsourced social media data. Personal and
            Ubiquitous Computing, 21(1):97–111, 2017.
[PBMW99]    Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation
            ranking: Bringing order to the web. Technical report, Stanford InfoLab, 1999.
[SAC+ 18]   Axel Suarez, Dyaa Albakour, David Corney, Miguel Martinez, and José Esquivel. A data collec-
            tion for evaluating the retrieval of related tweets to news articles. In European Conference on
            Information Retrieval, pages 780–786. Springer, 2018.


[SAMA17]    Vinay Setty, Abhijit Anand, Arunav Mishra, and Avishek Anand. Modeling event importance
            for ranking daily news events. In Proceedings of the Tenth ACM International Conference on
            Web Search and Data Mining, pages 231–240. ACM, 2017.

[SCM16]     Shashank Srivastava, Snigdha Chaturvedi, and Tom Mitchell. Inferring interpersonal relations
            in narrative summaries. In Thirtieth AAAI Conference on Artificial Intelligence, 2016.
[TBT+ 11]   Maite Taboada, Julian Brooke, Milan Tofiloski, Kimberly Voll, and Manfred Stede. Lexicon-
            based methods for sentiment analysis. Computational linguistics, 37(2):267–307, 2011.

[TMV16]     Antoine Tixier, Fragkiskos Malliaros, and Michalis Vazirgiannis. A graph degeneracy-based
            approach to keyword extraction. In Proceedings of the 2016 conference on empirical methods in
            natural language processing, pages 1860–1870, 2016.
[VCLDD17]   Steven Van Canneyt, Philip Leroux, Bart Dhoedt, and Thomas Demeester. Modeling and pre-
            dicting the popularity of online news based on temporal and content-related features. Multimedia
            Tools and Applications, pages 1–28, 2017.
[WVG+ 16]   Henning M Wold, Linn Vikre, Jon Atle Gulla, Özlem Özgöbek, and Xiaomeng Su. Twitter topic
            modeling for breaking news detection. In WEBIST (2), pages 211–218, 2016.
[ZSAG12]    Arkaitz Zubiaga, Damiano Spina, Enrique Amigó, and Julio Gonzalo. Towards real-time sum-
            marization of scheduled events from twitter streams. In Proceedings of the 23rd ACM conference
            on Hypertext and social media, pages 319–320. ACM, 2012.

