
Oktie Hassanzadeh

Predicting the Future with Wikidata and Wikipedia

IBM Research

In this demonstration, we present a prototype knowledge-based event forecasting system powered by Wikidata and Wikipedia. The system relies on existing event-related concepts and relations in Wikidata to build a base knowledge graph of events and consequences. It then uses a combination of state-of-the-art knowledge extraction methods to augment the base knowledge graph using natural language descriptions of events and their consequences that can be found in Wikipedia articles. Using a number of use case scenarios, we show how the resulting knowledge graph can be used as a part of a human-in-the-loop explainable solution for event forecasting and analysis.

Introduction

Wikipedia is a rich source of knowledge about major events and their consequences. Major newsworthy events often result in many additions and new pages describing various aspects of the events in detail. In particular, there are often descriptions of causes and effects of events, either explicitly in text, or implicitly in statements, sections, or descriptions of timelines of events. Figure 1 shows a few examples of such sources of causal knowledge around COVID-19 related events. An effective representation of this knowledge in the form of a rich knowledge graph can enable a deep analysis of past events and their consequences. This can in turn be used as a mechanism for predicting the potential consequences of ongoing events by mapping them to similar past events in the knowledge graph.

Wikidata [10] aims at representing the rich knowledge available in Wikipedia in structured form. As shown in Figure 1, there are existing causal relations such as has cause and has effect between many event-related concepts. We will show in our demonstration how these existing links can be used for an analysis of potential effects of a given type of event. More importantly, we show how we can turn the existing links into a base knowledge graph of events and consequences, and then use the textual descriptions of events in Wikipedia articles to augment the base knowledge graph. In what follows, we describe the architecture of our prototype forecasting solution. We then present a brief sketch of our demonstration plan.

[Figure 1: examples of sources of causal knowledge around COVID-19 related events (from: https://en.wikipedia.org/wiki/COVID-19_pandemic)]

Knowledge Graph of Events and Consequences

A base knowledge graph is curated from existing concepts and links in Wikidata. Since our goal in this work is analysis of major newsworthy events, we only include in the base knowledge graph those event types for which at least one instance has an existing link to a Wikinews article. This way, we ensure that out of the thousands of subclasses of type occurrence (Q1190554) and their instances, we only include events that are likely to receive news coverage. We then query for all the existing causal relations in Wikidata using properties such as has effect (P1542), contributing factor of (P1537), immediate cause of (P1536) and their inverse properties. We then group the event types that are linked directly or through their instances. Each link between event types is also annotated with a set of base scores derived from simple frequency analysis, e.g., the number of example pairs of instances, the number of triples for the event type and its instances, and the number of Wikipedia pages linked to instances of the type. The result is a collection of event types and their consequences, along with examples for each cause-effect pair and scores that can be used for ranking potential consequences of a given event.
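The grouping and frequency-based scoring described above can be illustrated with a minimal Python sketch. It is not the prototype's implementation: the function names, the in-memory triple format, and the instance identifiers (`quake_A`, `tsunami_A`, and so on) are hypothetical, and has cause (P828) is used as one example of an inverse property because the text does not list the inverse properties by ID. Only one of the base scores, the number of example instance pairs, is computed.

```python
from collections import defaultdict

# Properties named in the text; each points from a cause to its effect.
CAUSE_TO_EFFECT = {"P1542", "P1537", "P1536"}
# "has cause" (P828) points from an effect back to its cause. It stands in
# here for the inverse properties (an assumption of this sketch).
EFFECT_TO_CAUSE = {"P828"}


def causal_pairs(triples):
    """Normalize (subject, property, object) triples to (cause, effect) pairs."""
    pairs = set()
    for subj, prop, obj in triples:
        if prop in CAUSE_TO_EFFECT:
            pairs.add((subj, obj))
        elif prop in EFFECT_TO_CAUSE:
            pairs.add((obj, subj))
    return pairs


def type_level_links(triples, instance_of):
    """Group causal pairs by event type, keeping instance pairs as examples."""
    links = defaultdict(set)
    for cause, effect in causal_pairs(triples):
        # An item with no known type is treated as an event type itself,
        # which covers event types that are linked directly.
        for cause_type in instance_of.get(cause, {cause}):
            for effect_type in instance_of.get(effect, {effect}):
                examples = links[(cause_type, effect_type)]
                if cause in instance_of and effect in instance_of:
                    examples.add((cause, effect))
    return links


def example_pair_counts(links):
    """Base score: number of example instance pairs per type-level link."""
    return {pair: len(examples) for pair, examples in links.items()}


# Hypothetical toy data: two earthquake instances, two tsunami instances.
instance_of = {
    "quake_A": {"Q7944"}, "quake_B": {"Q7944"},
    "tsunami_A": {"Q8070"}, "tsunami_B": {"Q8070"},
}
triples = [
    ("quake_A", "P1542", "tsunami_A"),  # instance-level "has effect"
    ("tsunami_B", "P828", "quake_B"),   # instance-level "has cause"
    ("Q7944", "P1542", "Q8070"),        # direct type-level link
]
counts = example_pair_counts(type_level_links(triples, instance_of))
```

In this toy input, the direct type-level link and the two instance-level links all collapse into a single earthquake-to-tsunami link whose example count can later serve as a ranking signal.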

[Figure: architecture diagram. Panel labels: Knowledge Graph of Events & Consequences (with examples); Causal Knowledge Extraction Pipelines, comprising Unsupervised Causal Knowledge Extraction (Neural QA Models; Pattern Matching + Neural NLI Models), Distantly Supervised Models for Causal Knowledge Extraction (Neural Relation Extraction Models), and Temporal Event Analysis (Temporal Event Models; Timeline Extraction); Dashboard (News Analysis, Profile, Effects Analysis, Event Analysis, Cause-Effect Analysis)]

Causal Knowledge Extraction Pipelines

The base knowledge graph is augmented with causal knowledge extracted from Wikipedia articles using a number of causal knowledge extraction pipelines. This augmentation can be in the form of a) finding new consequences for a given event of interest, b) finding new example cause-effect pairs of instances for a pair of event types, and c) calculating scores reflecting the likelihood or significance of a causal relation between two events. Given the variety of ways that causal knowledge can be captured in Wikipedia documents, as depicted in Figure 1, we need a number of different knowledge extraction approaches. In this demonstration, we show examples of three different kinds of pipelines we have implemented in our prototype:

Unsupervised Causal Knowledge Extraction: 1) An approach relying on pattern matching and neural Natural Language Inference (NLI) models. Briefly, the approach we show is an adaptation of the approach of Bhandari et al. [2], which is a fully unsupervised pipeline with a high precision of nearly 80% in manual evaluations. We link the output phrases to Wikidata concepts using BLINK [11], keeping only high-confidence links. 2) An approach relying on neural Question Answering (QA) models that a) generates questions using a set of templates, such as "What could X cause?" or "What was a major consequence of X?" where X is a label of an event type or instance, b) uses pre-trained neural QA models and Wikipedia articles associated with the target event to retrieve an answer for the generated questions, and c) performs entity linking to link the answer to Wikidata.
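The three steps of the QA-based approach can be sketched as follows. This is only an illustration under stated assumptions: the two templates are the ones quoted above, while `answer_fn` (a QA model wrapper returning an answer string and a confidence), `link_fn` (an entity linker returning a Wikidata ID or `None`), and the `min_score` threshold are hypothetical placeholders rather than the components used in the prototype.

```python
# Templates quoted in the text; {x} is the label of an event type or instance.
TEMPLATES = (
    "What could {x} cause?",
    "What was a major consequence of {x}?",
)


def generate_questions(label, templates=TEMPLATES):
    """Step a): instantiate each question template with the event label."""
    return [template.format(x=label) for template in templates]


def extract_consequences(label, passages, answer_fn, link_fn, min_score=0.5):
    """Steps b) and c): answer each question over each passage, then link.

    answer_fn(question, passage) -> (answer_text, confidence)  [hypothetical]
    link_fn(answer_text) -> Wikidata ID or None                [hypothetical]
    Returns a dict mapping each linked ID to its best confidence.
    """
    results = {}
    for question in generate_questions(label):
        for passage in passages:
            answer, score = answer_fn(question, passage)
            if not answer or score < min_score:
                continue  # drop empty or low-confidence answers
            entity_id = link_fn(answer)
            if entity_id is not None:
                results[entity_id] = max(score, results.get(entity_id, 0.0))
    return results
```

Passing the QA model and the entity linker in as plain callables keeps the sketch independent of any particular library; any pre-trained QA model and linker with matching behavior could be plugged in.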

Supervised Models for Causal Knowledge Extraction: We use neural models [1, 4] trained on existing annotated data such as the BECauSE Corpus 2.0 [5] for extraction of cause-effect phrases from a corpus of event-related Wikipedia articles, and perform entity linking [6] on the phrases to map them to Wikipedia and then Wikidata concepts in the base knowledge graph. We are also exploring distantly supervised models [7] by constructing a training set through finding passages containing labels of pairs of events in the base knowledge graph, and using a neural model of relation extraction [9] to extract new causal relations.

Temporal Event Analysis: This pipeline first extracts event timelines from timeline sections and pages (examples shown in Figure 1), then maps the extracted sequences to Wikidata events, and then applies existing and novel temporal event models [3] to the sequences of events. These models facilitate more complex analysis of potential temporal and causal relations between event types, and provide likelihood scores that better support the ranking of potential consequences for a given event and context.
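The training set construction for the distantly supervised models mentioned above can be sketched as a simple label co-occurrence search. The function names and the output record format are hypothetical, and whole-word, case-insensitive matching is an assumption of this sketch; the text only states that passages containing the labels of both events of a pair are collected.

```python
import re


def contains_label(text, label):
    """Case-insensitive whole-word match of a label inside a passage."""
    pattern = r"\b" + re.escape(label) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def distant_examples(passages, labeled_pairs):
    """Collect passages mentioning both labels of a known cause-effect pair.

    labeled_pairs: iterable of (cause_label, effect_label) taken from links
    in the base knowledge graph. Each match becomes a (noisy) positive
    training example for a relation extraction model.
    """
    examples = []
    for passage in passages:
        for cause, effect in labeled_pairs:
            if contains_label(passage, cause) and contains_label(passage, effect):
                examples.append({"text": passage, "cause": cause, "effect": effect})
    return examples
```

As with distant supervision in general, co-occurrence does not guarantee that a passage actually expresses the causal relation, so the resulting examples are noisy by construction.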

Dashboard

The user dashboard exposes a number of API functions that use the knowledge graph to assist the user with event analysis and forecasting. The APIs allow the user to 1) retrieve the latest news events and their context, 2) retrieve a list of potential consequences for a given event/context, along with explanation in the form of example similar past events and consequences, and 3) rank and re-rank the consequences based on different criteria. The user can optionally define a profile that will be used for ranking the consequences based on how interesting or surprising the consequence could be for the user.
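As a rough illustration of ranking and profile-based re-ranking, the sketch below orders candidate consequences by one chosen score and lets an optional profile scale that score per consequence. Everything here is hypothetical: the candidate record format, the score name `example_pairs`, and the idea of a profile as a mapping from a consequence ID to a multiplicative weight are assumptions, not the dashboard's actual API.

```python
def rank_consequences(candidates, criterion="example_pairs", profile=None):
    """Sort candidate consequences by one score, optionally profile-weighted.

    candidates: list of dicts such as
        {"effect": "Q8070", "scores": {"example_pairs": 3}}
    profile: optional dict mapping an effect ID to a weight; effects that
        are missing from the profile keep a neutral weight of 1.0.
    """
    weights = profile or {}

    def weighted_score(candidate):
        base = candidate["scores"].get(criterion, 0.0)
        return base * weights.get(candidate["effect"], 1.0)

    return sorted(candidates, key=weighted_score, reverse=True)
```

Calling the function again with a different `criterion` or `profile` corresponds to re-ranking the same list under different criteria.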

Demonstration Plan

We plan to use a number of use cases involving different recent or ongoing events, and show the ranked list of consequences according to the base knowledge graph as well as different versions of the knowledge graph based on the extraction method used for knowledge augmentation. For this initial prototype demonstration, our primary focus will be on showing the quality and coverage of different versions of the knowledge graph, and how simple major consequences of different types of events are present or missing in different versions. We will use examples of different types of events, including: 1) a "protest" event, e.g., recent protests in Myanmar, highlighting some of the high-ranked extracted causes and consequences from past protest events, which include a coup d'état; 2) a "disease outbreak" event as the cause while excluding COVID-19 related articles from our source, showing how some actual consequences of the COVID-19 outbreak show up in the ranked results of different pipelines; 3) a hypothetical natural disaster event with context-specific forecasts, e.g., similar to the work of Radinsky et al. [8], showing how an earthquake (Q7944) at a location near an ocean would result in a forecast of a tsunami (Q8070). We will also highlight a number of challenging examples and wrong forecasts and discuss a number of directions for future work that could turn this simple prototype into a powerful and reliable AI assistant for analysts.

Acknowledgements

This research is based upon work supported in part by U.S. DARPA KAIROS Program No. FA8750-19-C-0206. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of DARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein.

References

1. Awasthy, P., Ni, J., Barker, K., Florian, R.: IBM MNLP IE at CASE 2021 task 1: Multigranular and multilingual event detection on protest news. In: Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021). pp. 138–146 (2021)

2. Bhandari, M., Feblowitz, M., Hassanzadeh, O., Srinivas, K., Sohrabi, S.: Unsupervised causal knowledge extraction from text using natural language inference (student abstract). In: Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021. pp. 15759–15760 (2021)

3. Bhattacharjya, D., Gao, T., Subramanian, D.: Order-dependent event models for agent interactions. In: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020. pp. 1977–1983 (2020)

4. Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., Grave, E., Ott, M., Zettlemoyer, L., Stoyanov, V.: Unsupervised cross-lingual representation learning at scale. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL). pp. 8440–8451 (2020)

5. Dunietz, J., Levin, L.S., Carbonell, J.G.: The BECauSE corpus 2.0: Annotating causality and overlapping relations. In: Proceedings of the 11th Linguistic Annotation Workshop, LAW@EACL. pp. 95–104 (2017)

6. Li, B.Z., Min, S., Iyer, S., Mehdad, Y., Yih, W.: Efficient one-pass end-to-end entity linking for questions. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020. pp. 6433–6441 (2020)

7. Mintz, M., Bills, S., Snow, R., Jurafsky, D.: Distant supervision for relation extraction without labeled data. In: ACL 2009, Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the AFNLP. pp. 1003–1011 (2009)

8. Radinsky, K., Davidovich, S., Markovitch, S.: Learning to predict from textual data. J. Artif. Intell. Res. 45, 641–684 (2012)

9. Soares, L.B., FitzGerald, N., Ling, J., Kwiatkowski, T.: Matching the blanks: Distributional similarity for relation learning. In: Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019. pp. 2895–2905 (2019)

10. Vrandečić, D., Krötzsch, M.: Wikidata: a free collaborative knowledgebase. Commun. ACM 57(10), 78–85 (2014)

11. Wu, L., Petroni, F., Josifoski, M., Riedel, S., Zettlemoyer, L.: Scalable zero-shot entity linking with dense entity retrieval. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020. pp. 6397–6407 (2020)