Using Uncertain Graphs to Automatically Generate Event Flows from News Stories

Laura Christiansen (Center for Web Intelligence, Chicago, Illinois, lchris10@cdm.depaul.edu)
Bamshad Mobasher (Center for Web Intelligence, Chicago, Illinois, mobasher@cs.depaul.edu)
Robin Burke (Center for Web Intelligence, Chicago, Illinois, rburke@cs.depaul.edu)

ABSTRACT
Capturing the branching flow of events described in text aids a host of tasks, from summarization to narrative generation to classification and prediction of events at points along the flow. In this paper, we present a framework for the automatic generation of an uncertain, temporally directed event graph from online sources such as news stories or social media posts. The vertices are generated using Natural Language Processing techniques on the source documents, and the probabilities associated with edges, indicating the degree of certainty that those connections exist, are derived from shared entities among events. Graph edges are directed based on temporal information on events. Furthermore, we apply uncertain graph clustering in order to reduce noise and focus on higher-level event flows. Preliminary results indicate the uncertain event graph produces a coherent navigation through events described in a corpus.

Reference Format:
Laura Christiansen, Bamshad Mobasher, and Robin Burke. 2017. Using Uncertain Graphs to Automatically Generate Event Flows from News Stories. In Proceedings of Workshop on Social Media World Sensors at ACM Hypertext 2017 (SIDEWAYS, HT'17), 6 pages, CEUR-WS.org.

SIDEWAYS, HT'17, July 4, 2017, Prague, Czech Republic. Copyright held by the author(s).

1 INTRODUCTION
Extracting a narrative progression from text opens the door for a host of useful applications. Representations of the key stories can be simplified or expanded upon to aid comprehension. Examining the dynamics of the narrative events can reveal emergent information and points of change that may be useful not only in understanding the story but in predicting future dynamics. One can observe how paths differ when looking at different domains, such as news sources versus social media, providing insight into how both represent events.

Understanding the flow of information over time is valuable. Intuitively, we understand that flow is not flat. One event may branch out to connect to events later in time. Likewise, many events may feed into a single event. An extracted timeline for a narrative will capture the temporal ordering but lose information on the connections between events. On the other hand, evidence of a connection between events may be incorrect. Inferring connections can lead to differing levels of certainty in the likelihood of those connections. This lends itself to probabilistic or uncertain graphs, where edges have probabilities of their existence. An uncertain graph is not discrete but is rather a template to generate all "possible worlds": all discrete graphs that can be drawn from the edge probabilities.

We propose a framework for automatically extracting an uncertain event graph, with edges directed by the temporal flow of events, from online sources such as news stories or social media posts. The first stage of this process involves event extraction using natural language processing (NLP) techniques. Part-of-speech (POS) tagging and semantic role labeling (SRL) allow us to extract predicates, which we treat as events. Entity detection labels which named entities are involved in those events, while temporal expression extraction defines the temporal ordering between events. Next, we generate the edges and their probabilities based on Bayesian combination of evidence. As basic evidence, we use the extracted entities shared between events and the proximity of event references within the text. We focus on simple, text-based evidence of events and their connections, but more complex information derived from metadata can be utilized. Different domains offer different possible sensors for detecting and tracking events. This process generates the vertices and edges of the uncertain graph, and those edges can then be directed based on the temporal ordering information discovered in the first stage.

Once the full event graph is generated, we use uncertain graph clustering to reduce noise and discover higher-level abstractions, with clusters indicating closely related events describing a larger meta-topic within the graph. We use pKwikCluster, a clustering algorithm for uncertain graphs, to identify likely clusters. As a precursor to a larger user study evaluation, we observe the flow and connections within the graph to evaluate its coherency and correctness. Our preliminary results on a dataset consisting of news articles indicate this is a viable approach to automatically capturing and depicting a branching flow of events.

2 RELATED WORK
Linking and tracking events is a research problem that has been addressed from a number of angles. Extending their previous work in event extraction, Rospocher et al. [6] propose an approach for automatically generating knowledge graphs based on the discovered events. In this knowledge graph, edges are predicates and nodes are entities, as opposed to combining both within an event construct. Moving in the opposite direction, Althoff et al. [2] generate timelines from knowledge graphs. The generated timelines are personalized and provide a temporal ordering but not branching connections. Shahaf et al. [8] developed an algorithm for generating zoomable, intersecting timelines of key terms to summarize news.
These timelines are constructed in relation to each other, and the terms that make up the nodes are annotated with news stories; the approach is only intended for a high-level event representation.

Event detection and extraction has been approached in a number of ways. Work in [7] discovers a specific type of event, earthquake occurrences, from microblogs. User chatter on earthquakes is classified and filtered to act as a sensor for determining when and where an earthquake strikes. Also working with microblogs, [1] clusters tweets based on keywords and locations to detect new events. Keywords, combined with the time and place they were posted, form a rough event reference. In [10], events are automatically pulled from streaming news data; news relations are extracted, then clustered to find different representations of the same event, before a model is trained to extract news relations based on that co-reference information. This attempts to overcome issues of different linguistic descriptions of the same event. We only address this issue indirectly, through co-reference resolution, but are similarly interested in basic relations in the form of verb predicates.

When constructing networks, there may be doubts regarding the accuracy of connections between nodes due to the techniques used to construct those connections. Link prediction may be erroneous, or sensors may have detected noise. Uncertain graphs tackle this problem by assigning probabilities of existence to edges; an uncertain graph is the template with which to generate a set of possible discrete graphs based on those probabilities. For example, Zhao et al. [11] use uncertain graphs to detect protein complex structures. In their graphs edges are interactions between structures, but there is noise in the data on when they interact. Parchas et al. [5] propose a method to generate the best discrete approximation from an uncertain graph. We avoided generating discrete graphs from our uncertain graph, relying instead on algorithms that approximate calculations over the set of discrete graphs. Clustering algorithms are extended to uncertain graphs by Kollios et al. [4], and we use their adaptation of pKwikCluster and their definition of estimated edit distance over an uncertain graph to aggregate our event vertices. Bonchi et al. [3] examine how to perform core decomposition in an uncertain graph context, an approach we did not use but which may be useful in creating higher-level representations of an uncertain event graph.

3 UNCERTAIN EVENT GRAPHS
In this section, we describe our method for constructing a temporally directed, uncertain event graph. First, we extract the necessary information from text using a variety of NLP techniques to construct the event vertices. Once events are defined, we proceed to the definition of edges, their probabilities, and their direction. Finally, we aggregate events within clusters to provide a higher-level representation of the event graph structure.

3.1 Event extraction
The first stage is to discover the events described in the data. At a basic level, this process includes the identification of an action that occurs and the entities involved. To this end, we define our initial event references in terms of predicates. Predicates define actions within a sentence and serve as an anchor point for additional details involving the subjects and objects. To extract this information from text, we run POS tagging and dependency parsing. This identifies which parts of speech the terms in a sentence have, as well as the sentence structure; with SRL, the roles entities play within a predicate can be further identified. Take the sentence "John bought a car in Boston"; using dependency parsing and SRL, we can identify "bought" as the predicate verb, "John" as the subject, "a car" as the object, and "in Boston" as the location. We consider predicates to be references to events rather than events themselves; this distinction is important, as the same event may have multiple references. To examine the most basic references and avoid redundancy, we focus on the smallest predicates: predicates containing other predicates were pruned.

Co-reference resolution, another established NLP task, enables discovery of multiple representations of the same event within the same document. Two predicates co-referencing each other indicate the same event is discussed. Our definition of an event can now encompass multiple predicates based on co-references. To expand our example, if another sentence read "He bought it last Friday", co-reference resolution can tell us whether this instance of "bought" refers to the same event as the predicate verb in the first sentence. Similarly, it can tell us whether "he" refers to "John". This helps better define the entities involved in an event; in our event reference we can substitute the more informative proper noun for the pronoun.

This substitution is further enhanced by Named Entity Recognition (NER). NER identifies and classifies named entities as people, locations, or organizations. To continue our example, NER would identify "John" as a person and "Boston" as a location. Combining this with co-reference resolution, we can find all co-references to a named entity and include the entity information in those events.

Finally, the temporal relationships between events can be ascertained through temporal expression extraction. In some cases, this finds the fixed time interval described in the text. In others, it is relative: the exact date of event E1 might not be known, but we know it took place before event E2. Either way, this places parts of the text in time. Another sentence might tell us "Afterwards, John bought coffee"; we can label the coffee purchase event as occurring after the car buying. By knowing this, we know the temporal flow of events.

We use the English language version of the Newsreader pipeline (http://www.newsreader-project.eu/) to perform these NLP tasks on our dataset. Described in [9], the pipeline is a series of NLP modules intended primarily for news text. We are not using the output of the entire pipeline; instead, our focus is POS tagging, dependency parsing and SRL, co-reference resolution, NER, and temporal relations. Each event has at least one predicate representation and includes information on the roles within that predicate as well as any named entities involved. If an event contains no entities, it is removed. This describes our event extraction from within a single text source.

The same events may also be referenced between documents, which is not identified by the techniques described above. To tackle this, we first look at the date ranges of the events. Events whose known time intervals overlap are candidates to be combined. We also include events without an explicit time interval but whose document publication dates are within a day of each other. This extension makes sense in the context of news articles but should be omitted or replaced for other datasets. Candidates for merging then have their term sets compared via Jaccard similarity, defined in equation 1. These term sets are pruned to exclude conjunctions, articles, and punctuation. Any candidates with a Jaccard similarity greater than threshold α are combined.

J(A, B) = |A ∩ B| / |A ∪ B|   (1)

3.2 Edge generation
The vertices in the uncertain graph are the events we have just described. The next stage is to generate the edges in the graph. For this preliminary work, we examine basic relationships indicating two events are connected. As we are constructing an uncertain graph, this requires computing the probability that a link between two events exists given the evidence at hand. Entity similarity and document co-occurrence proximity between events are the types of evidence we use in the proposed approach. The first measure examines whether the same entities are involved in two events; we posit two events sharing entities are more likely to be related than those that don't. The second measure, intra-document proximity, operates on the assumption that an author is not jumping from tangent to tangent within their writing: the closer the descriptions of two events are within the text of a document, the more related we can assume those events are.

Given those assumptions, we define the probability of a link existing between two events given their entities and intra-document proximity. Let L represent whether two events are linked, En the shared entities between events, and D the intra-document proximity between events. Assuming both sources of evidence are conditionally independent given L, we calculate the probability of a link existing with equation 2, using Bayes' rule. For P(L), we assume an ignorant prior of 0.5.

P(L|En, D) = P(L)P(En, D|L) / P(En, D) = P(L)P(En|L)P(D|L) / P(En, D)   (2)

where

P(En, D) = P(L)P(En|L)P(D|L) + P(¬L)P(En|¬L)P(D|¬L)

P(En|L) is defined here as the average Jaccard similarity between the predicates of two events. This is an average because, either through co-referencing predicates or the combination of events between documents, an event can have more than one predicate representation. In equation 3, m and n represent the number of predicates describing events E1 and E2 respectively, while En_i and En_j refer to the entity sets for predicates p_i and p_j. J(En_i, En_j) is the Jaccard similarity between those entity sets, as defined in equation 1.

P(En|L) = (1/(m × n)) Σ_{i=1..m} Σ_{j=1..n} J(En_i, En_j)   (3)

P(D|L) = 1 − (1/(m × n)) Σ_{i=1..m} Σ_{j=1..n} dist(p_i, p_j)/t_d   (4)

Additive smoothing of 0.1 is applied to equations 3 and 4 so that event pairs with a probability of 0 in one are not immediately removed from the graph. For each pair of events, the value of P(L|En, D) is calculated. In the uncertain graph, this represents the probability that the two events are linked and, subsequently, that an edge exists between them. Additional evidence can easily be incorporated using Bayes' rule, extending equation 2.

Taking into account the temporal information extracted during the NLP stage, we can direct the uncertain graph edges, either by comparing explicit time intervals or through relative temporal relations. Edges for which we have no temporal information are omitted from the graph; otherwise, each edge is directed from the earlier to the later event. If both events occur simultaneously, the edge is bidirectional. Events without any possible edges are pruned from the graph. We now have a temporally directed uncertain graph of events.

3.3 Event abstraction
For ease of quickly interpreting a large event graph, some degree of aggregation and abstraction is useful. It provides a simpler representation and further information on how the events are related to one another. To accomplish this aggregation, we turn to graph-based clustering. This is complicated, as the event graph is not discrete but rather an uncertain graph that can be used to generate a large set of discrete graphs.

In [4], clustering methods were extended to apply to uncertain graphs. We borrow their pKwikCluster, an adaptation of kwikCluster, to find event clusters within the graph. The pKwikCluster algorithm is simple: pick an available vertex at random as a new cluster, add to that cluster all available neighbors with an edge probability greater than 0.5, then mark all vertices in the new cluster as unavailable. Repeat these steps until every vertex is part of a cluster. This algorithm is run multiple times, and the results of each run are compared to determine which produced the best clustering.

The goodness of a clustering result can be evaluated, in part, by the edit distance. For the non-probabilistic kwikCluster, this is the edit distance between the graph and the cluster graph: assuming all edges between clusters are omitted, it indicates how many changes need to be made to the base graph's structure to accommodate that result. For an uncertain graph, the expected edit distance is given in equation 5, where E_Q is the edge set of the cluster graph Q. The clustering run that minimizes this metric is selected.

D(G, Q) = Σ_{{u,v} ∈ E_Q} (1 − P_uv) + Σ_{{u,v} ∉ E_Q} P_uv   (5)
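The clustering and selection procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the uncertain graph is represented as a dictionary mapping unordered vertex pairs to edge probabilities, and the number of runs and seed are arbitrary choices.

```python
import random

def pkwik_cluster(vertices, edge_prob, threshold=0.5, rng=random):
    """One run of pKwikCluster: pick a random available pivot and cluster
    it with all available neighbors whose edge probability exceeds threshold."""
    available = set(vertices)
    clusters = []
    while available:
        pivot = rng.choice(sorted(available))
        cluster = {pivot}
        for v in available - {pivot}:
            if edge_prob.get(frozenset((pivot, v)), 0.0) > threshold:
                cluster.add(v)
        available -= cluster
        clusters.append(cluster)
    return clusters

def expected_edit_distance(clusters, vertices, edge_prob):
    """Expected edit distance (equation 5) between the uncertain graph and
    the cluster graph Q: an intra-cluster pair missing its edge costs
    1 - P_uv, an inter-cluster pair with an edge costs P_uv."""
    label = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    verts = sorted(vertices)
    cost = 0.0
    for i, u in enumerate(verts):
        for v in verts[i + 1:]:
            p = edge_prob.get(frozenset((u, v)), 0.0)
            if label[u] == label[v]:
                cost += 1.0 - p  # pair inside a cluster: edge should exist
            else:
                cost += p        # pair across clusters: edge should not exist
    return cost

def best_clustering(vertices, edge_prob, runs=20, seed=0):
    """Run pKwikCluster several times and keep the run minimizing equation 5."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(runs):
        clusters = pkwik_cluster(vertices, edge_prob, rng=rng)
        cost = expected_edit_distance(clusters, vertices, edge_prob)
        if cost < best_cost:
            best, best_cost = clusters, cost
    return best, best_cost
```

For example, with strong edges (a, b) = 0.9 and (c, d) = 0.8 and a weak edge (a, c) = 0.1, every run clusters {a, b} and {c, d}, with expected edit distance 0.1 + 0.2 + 0.1 = 0.4.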