Building a Knowledge Graph of Events and Consequences Using Wikidata

Oktie Hassanzadeh [0000-0001-5307-9857]
IBM Research
hassanzadeh@us.ibm.com

Abstract. In this short paper, we present our preliminary results on building a Knowledge Graph (KG) of events and consequences with application to event forecasting and analysis. A base KG is first constructed using existing concepts and relations in Wikidata. Using an automated unsupervised knowledge extraction pipeline, causal knowledge is extracted from Wikipedia articles to augment the base KG. We show examples from the base and the augmented KG, and discuss a few challenges in building a high-quality KG. We also discuss a few potential directions that the Wikidata community can work on to improve the representation of event-related knowledge in Wikidata.

1 Introduction

While prior work has considered knowledge-driven forecasting of future events [9, 10], curating large collections of causes and effects [4, 8], and event-based knowledge graphs (e.g., GDELT GKG [7]), there are no rich structured sources of knowledge around major societal events that can be queried directly to reason about the potential consequences of ongoing events. In this paper, we report on our initial results on curating such a source of knowledge from Wikidata and Wikipedia.

Wikipedia is a rich source of knowledge about major events and their consequences. Major newsworthy events often result in many additions and new pages describing various aspects of the events in detail. In particular, there are often descriptions of causes and effects of events, either explicitly in text, or implicitly in statements, sections, or descriptions of timelines of events. An effective representation of this knowledge in the form of a rich knowledge graph can enable a deep analysis of past events and their consequences.
This can in turn be used as a mechanism for predicting the potential consequences of ongoing events by mapping them to similar past events in the knowledge graph.

Wikidata [12] aims at representing the rich knowledge available in Wikipedia in structured form. As shown in Figure 1, there are existing causal relations such as has cause and has effect between many event-related concepts. However, there are many explicit and implicit causal relations described in Wikipedia articles that are missing from Wikidata. In what follows, we first show how we can turn the existing event-related concepts and causal relations in Wikidata into a base knowledge graph of events and consequences geared towards future event prediction and analysis. We then describe an unsupervised knowledge extraction pipeline that uses the textual descriptions of events in Wikipedia articles to augment the base knowledge graph. Using a few examples, we discuss the strengths and weaknesses of the approach. Finally, we discuss a few directions for future work.

Fig. 1. Examples of Event-Related Causal Knowledge in Wikidata and Wikipedia (the extracted causal relation shown is from https://en.wikipedia.org/wiki/COVID-19_pandemic)

Distribution Statement “A” (Approved for Public Release, Distribution Unlimited). Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2 Knowledge Graph of Events and Consequences

We first construct a base knowledge graph of events and consequences from existing concepts and links in Wikidata. Since our goal is analyzing major newsworthy events and their consequences, we only include in the base knowledge graph those event types for which at least one instance has an existing link to a Wikinews article.
This way, we ensure that out of the thousands of subclasses of occurrence (Q1190554) and their instances, we only include events that are likely to receive news coverage. We then query for all the existing causal relations in Wikidata using properties such as has effect (P1542), contributing factor of (P1537), immediate cause of (P1536) and their inverse properties. We then group the event types that are linked directly or through their instances. The result is a collection of event objects that are event types (classes in Wikidata), each associated with a list of consequences which are also event types. Each consequence for an event has a list of examples, each with a cause event instance and an effect event instance. Events and consequences are also annotated with a set of base scores derived from simple frequency analysis, e.g., the number of example pairs of instances, the number of triples for the event type and its instances, and the number of Wikipedia pages linked to instances of the type. Together, these give a collection of event types and their consequences, with examples for each consequence and scores that can be used for ranking the potential consequences of a given event.

Fig. 2. Example Events and Consequences from the Base KG

Our current version of the base KG contains 50 source events (classes), 427 consequences, and 563 examples (instances). This output is the result of running 2,762 SPARQL queries to retrieve all the concepts and relations as well as their included properties and statistics. Figure 2 shows a few event types linked to coup d’état (Q45382), instances (examples) for one of the consequences, and their JSON representations.
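The grouping step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation behind the reported numbers: the query string is a simplified, hypothetical example that covers only the three properties named in the text (it is not executed here, and it is not one of the 2,762 queries), the field names `event_type`, `examples`, and `example_count` are invented for this sketch and are not the JSON schema shown in Figure 2, and the input rows are toy placeholders rather than real Wikidata items. Only the simplest base score, the number of example pairs, is computed.

```python
from collections import defaultdict

# Illustrative query text only; it is not executed in this sketch.
# It asks for cause/effect instance pairs and their classes, using the three
# properties named in the text (inverse properties are omitted for brevity).
CAUSAL_PAIRS_QUERY = """
SELECT ?cause ?causeType ?effect ?effectType WHERE {
  VALUES ?p { wdt:P1542 wdt:P1537 wdt:P1536 }
  ?cause ?p ?effect .
  ?cause wdt:P31 ?causeType .
  ?effect wdt:P31 ?effectType .
}
"""


def group_consequences(rows):
    """Group instance-level causal pairs by (cause type, effect type).

    rows: iterable of (cause_instance, cause_type, effect_instance, effect_type).
    Returns {cause_type: [consequence, ...]}, sorted by example count.
    """
    examples = defaultdict(lambda: defaultdict(list))
    for cause, cause_type, effect, effect_type in rows:
        pair = {"cause": cause, "effect": effect}
        if pair not in examples[cause_type][effect_type]:  # skip duplicate rows
            examples[cause_type][effect_type].append(pair)

    kg = {}
    for cause_type, by_effect in examples.items():
        consequences = [
            {"event_type": effect_type, "examples": pairs, "example_count": len(pairs)}
            for effect_type, pairs in by_effect.items()
        ]
        # Rank by the simplest base score: number of example pairs (ties by name).
        consequences.sort(key=lambda c: (-c["example_count"], c["event_type"]))
        kg[cause_type] = consequences
    return kg


# Toy rows with placeholder instance names (not real Wikidata items).
rows = [
    ("coup_A", "coup d'état", "conflict_A", "conflict"),
    ("coup_B", "coup d'état", "conflict_B", "conflict"),
    ("coup_A", "coup d'état", "murder_A", "political murder"),
]
kg = group_consequences(rows)
for consequence in kg["coup d'état"]:
    print(consequence["event_type"], consequence["example_count"])
```

In this toy run, the consequence with more example pairs is listed first. The other base scores mentioned in the text (triple counts and linked Wikipedia pages) would need additional queries and could be added as further fields on each consequence record.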
This Wikidata-based representation of events and consequences not only enables retrieval of potential consequences for a given type of event, but also enables a deeper analysis of potential consequences using the rich structured knowledge around the Wikidata concepts. As a simple example, one can group potential consequences by geographic locations associated with the cause and effect events.

3 Causal Knowledge Extraction

As mentioned earlier, there are many causal relations expressed explicitly or implicitly in Wikipedia articles that cannot be found on Wikidata. We use an automated unsupervised causal knowledge extraction pipeline to augment the base KG using natural language understanding. The pipeline, shown in Figure 3, relies on pre-trained neural Question Answering (QA) and Entity Linking (EL) models. It consists of the following steps: a) a collection of causal questions is generated using a set of templates, such as “What could X cause?” or “What was a major consequence of X?”, where X is a label of an event type or instance; b) a pre-trained neural QA model is used to find the answers in the Wikipedia articles associated with the target event; and c) the answers are linked to Wikidata using pre-trained neural entity linking models based on BLINK [13].

Fig. 3. Causal Knowledge Extraction Pipeline

At the time of this writing, we have applied the causal knowledge extraction pipeline only to the opening paragraphs of a collection of Wikipedia articles that describe instances of events that can be found in the base KG. Figure 4 shows examples of the extracted causes and consequences for the same coup d’état (Q45382) event used in the base KG example in Figure 2. Out of the six extracted consequences, one was also in the base KG (conflict (Q180684)), one is a superclass of an event in the base KG (murder (Q132821), which is a superclass of political murder (Q1139665)), and the other four extractions were not in the base KG. The figure also shows an example for the discovered consequence bomb attack (Q891854). The examples in the output KG from this pipeline also include a list of mentions, which are answers from the QA model. Each mention comes with an answer score, the confidence score of the QA model, and a linking score, the confidence score of linking the mention text to the Wikidata entity.

Fig. 4. Example Augmentations by the Causal Knowledge Extraction Pipeline

As the examples show, the pipeline is capable of extracting some interesting causal relations that could not be found on Wikidata. We have found the overall quality of the output to be high even without any effort on tuning the models and parameters. One major quality issue in the current output is the wrong direction of the edges, which is also evident in Figure 4. This is mainly a result of the question answering model returning a cause instead of an effect and vice versa. A potential solution is applying a custom classifier on top of the output, possibly by building on the outcome of our prior work on binary causal question answering [3] and using Natural Language Inference (NLI) for causal relation classification [1].

4 Lessons Learned & Future Work

Our current results show a number of challenges, some of which could be addressed by the Wikidata community:

– One simple but classic problem we are facing in using event-related Wikidata entities is the inconsistency in instance of (P31) statements.
  One example, as shown in Figure 4, is the event death of Eduardo Frei Montalva (Q5247432), which at the time of this writing is an instance of murder (Q132821), death (Q4), and certain aspects of a person’s life (Q20127274), whereas the right class consistent with the base KG would be political murder (Q1139665) (which is a subclass of murder (Q132821)).
– Some causal relations expressed in text cannot be represented using the existing entities and relations on Wikidata. For example, the Wikipedia article in the example in Figure 1 states that the pandemic has caused “temporary decreases in emissions of pollutants and greenhouse gases”. There are currently no events or event types representing a decrease or a change in pollutants and greenhouse gases. One potential solution could be a “has effect on” relation that could link the pandemic concept to, e.g., carbon dioxide emissions (Q3588927), along with attributes that could state whether the effect is temporary and whether it is a decrease or an increase.
– Another direction that could have a significant effect on the community would be a tighter integration between Wikinews and Wikidata. For example, the authors of Wikinews articles can be encouraged to create related Wikidata items and specify causes of the events being described. On the Wikidata side, better representation of event classes and event-related concepts, along with better alignment with Wikinews categories and sitelinks, can provide the community with improved retrieval and news analysis capabilities.

We are currently working on improving our causal knowledge extraction pipeline in several ways, and performing a thorough evaluation of the quality of the extracted knowledge. A major challenge in using state-of-the-art causal relation extraction solutions [14] and benchmarks [5] is their focus on commonsense reasoning as the end application.
One direction we are pursuing is publicly releasing our base KG along with a linked corpus of text from Wikipedia, which can be used as a benchmark for causal relation extraction and generic knowledge base completion solutions (e.g., IntKB [6]). We also plan to investigate the application of the knowledge graph in event forecasting [2] and enterprise risk management [11].

5 Acknowledgements

This research is based upon work supported in part by U.S. DARPA KAIROS Program No. FA8750-19-C-0206. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of DARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein.

References

1. Bhandari, M., Feblowitz, M., Hassanzadeh, O., Srinivas, K., Sohrabi, S.: Unsupervised causal knowledge extraction from text using natural language inference (student abstract). In: AAAI (2021)
2. Hassanzadeh, O.: Predicting the future with Wikidata and Wikipedia. In: Proceedings of the ISWC 2021 Posters & Demonstrations Tracks co-located with 20th International Semantic Web Conference (ISWC) (2021)
3. Hassanzadeh, O., Bhattacharjya, D., Feblowitz, M., Srinivas, K., Perrone, M., Sohrabi, S., Katz, M.: Answering binary causal questions through large-scale text mining: An evaluation using cause-effect pairs from human experts. In: IJCAI (2019)
4. Heindorf, S., Scholten, Y., Wachsmuth, H., Ngomo, A.C.N., Potthast, M.: CauseNet: Towards a causality graph extracted from the web. In: CIKM. ACM (2020)
5. Hosseini, P., Broniatowski, D.A., Diab, M.: Predicting directionality in causal relations in text. arXiv preprint arXiv:2103.13606 (2021)
6. Kratzwald, B., Kunpeng, G., Feuerriegel, S., Diefenbach, D.: IntKB: A verifiable interactive framework for knowledge base completion.
In: Proceedings of the 28th International Conference on Computational Linguistics (COLING) (2020)
7. Leetaru, K., Schrodt, P.A.: GDELT: Global data on events, location, and tone, 1979–2012. In: ISA Annual Convention (2013)
8. Luo, Z., Sha, Y., Zhu, K.Q., Hwang, S., Wang, Z.: Commonsense causal reasoning between short texts. In: Proceedings of the Fifteenth International Conference on Principles of Knowledge Representation and Reasoning (KR) (2016)
9. Radinsky, K., Davidovich, S., Markovitch, S.: Learning causality for news events prediction. In: WWW (2012)
10. Radinsky, K., Horvitz, E.: Mining the web to predict future events. In: WSDM (2013)
11. Sohrabi, S., Katz, M., Hassanzadeh, O., Udrea, O., Feblowitz, M.D.: IBM scenario planning advisor: Plan recognition as AI planning in practice. In: IJCAI (2018)
12. Vrandecic, D., Krötzsch, M.: Wikidata: a free collaborative knowledgebase. Commun. ACM 57(10), 78–85 (2014)
13. Wu, L., Petroni, F., Josifoski, M., Riedel, S., Zettlemoyer, L.: Zero-shot entity linking with dense entity retrieval. In: EMNLP (2020)
14. Yang, J., Han, S.C., Poon, J.: A survey on extraction of causal relations from natural language text. CoRR abs/2101.06426 (2021)