=Paper=
{{Paper
|id=Vol-2980/paper322
|storemode=property
|title=Predicting the Future with Wikidata and Wikipedia
|pdfUrl=https://ceur-ws.org/Vol-2980/paper322.pdf
|volume=Vol-2980
|authors=Oktie Hassanzadeh
|dblpUrl=https://dblp.org/rec/conf/semweb/Hassanzadeh21
}}
==Predicting the Future with Wikidata and Wikipedia==
Oktie Hassanzadeh[0000−0001−5307−9857]
IBM Research
hassanzadeh@us.ibm.com
Abstract. In this demonstration, we present a prototype knowledge-based event forecasting system powered by Wikidata and Wikipedia. The system relies on existing event-related concepts and relations in Wikidata to build a base knowledge graph of events and consequences. It then uses a combination of state-of-the-art knowledge extraction methods to augment the base knowledge graph using natural language descriptions of events and their consequences that can be found in Wikipedia articles. Using a number of use case scenarios, we show how the resulting knowledge graph can be used as a part of a human-in-the-loop explainable solution for event forecasting and analysis.
1 Introduction
Wikipedia is a rich source of knowledge about major events and their consequences. Major newsworthy events often result in many additions and new pages describing various aspects of the events in detail. In particular, there are often descriptions of causes and effects of events, either explicitly in text, or implicitly in statements, sections, or descriptions of timelines of events. Figure 1 shows a few examples of such sources of causal knowledge around COVID-19 related events. An effective representation of this knowledge in the form of a rich knowledge graph can enable a deep analysis of past events and their consequences. This can in turn be used as a mechanism for predicting the potential consequences of ongoing events by mapping them to similar past events in the knowledge graph.
Wikidata [10] aims at representing the rich knowledge available in Wikipedia in structured form. As shown in Figure 1, there are existing causal relations such as has cause and has effect between many event-related concepts. We will show in our demonstration how these existing links can be used for an analysis of potential effects of a given type of event. More importantly, we show how we can turn the existing links into a base knowledge graph of events and consequences, and then use the textual descriptions of events in Wikipedia articles to augment the base knowledge graph. In what follows, we describe the architecture of our prototype forecasting solution. We then present a brief sketch of our demonstration plan.
Distribution Statement “A” (Approved for Public Release, Distribution Unlimited).
Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
(from: https://en.wikipedia.org/wiki/COVID-19_pandemic)
Fig. 1. Examples of Sources of Causal Knowledge in Wikipedia & Wikidata
2 Event Forecasting System
Figure 2 shows the overall architecture of our system. At the core of the system is a knowledge graph of events and consequences that is curated from existing event-related concepts and relations in Wikidata. The knowledge graph is then augmented with causal knowledge extracted from Wikipedia articles. The user interacts with the system through a dashboard that allows performing various analysis tasks over ongoing events and their potential consequences, primarily through matching to similar events and event types in the knowledge graph.
Knowledge Graph of Events and Consequences. A base knowledge graph is curated from existing concepts and links in Wikidata. Since our goal in this work is analysis of major newsworthy events, we only include in the base knowledge graph those event types for which at least one instance has an existing link to a Wikinews article. This way, we ensure that out of the thousands of subclasses of type occurrence (Q1190554) and their instances, we only include events that are likely to receive news coverage. We then query for all the existing causal relations in Wikidata using properties such as has effect (P1542), contributing factor of (P1537), immediate cause of (P1536) and their inverse properties. We then group the event types that are linked directly or through their instances. Each link between event types is also annotated with a set of base scores derived from simple frequency analysis, e.g., the number of example pairs of instances, the number of triples for the event type and its instances, and the number of Wikipedia pages linked to instances of the type. The result is a collection of event types and their consequences, along with examples for each cause-effect pair and scores that can be used for ranking potential consequences for a given event.
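The grouping and frequency scoring described above can be illustrated with a minimal sketch. It is not the system's implementation: the function names, the in-memory triple format, and the toy identifiers are assumptions made for illustration, and only the number of example instance pairs is used as a score. Inverse properties are omitted; they could be handled by swapping subject and object before grouping.

```python
from collections import defaultdict

# Causal properties named in the text; in all three the subject is the cause.
CAUSAL_PROPERTIES = {"P1542", "P1537", "P1536"}


def build_type_links(instance_triples, instance_types):
    """Group instance-level causal triples into links between event types.

    instance_triples: iterable of (cause_instance, property_id, effect_instance)
    instance_types:   dict mapping an instance ID to a set of event type IDs
    Returns {(cause_type, effect_type): {"examples": [...], "pair_count": n}}.
    """
    links = defaultdict(lambda: {"examples": [], "pair_count": 0})
    for cause, prop, effect in instance_triples:
        if prop not in CAUSAL_PROPERTIES:
            continue  # skip non-causal statements
        for cause_type in sorted(instance_types.get(cause, ())):
            for effect_type in sorted(instance_types.get(effect, ())):
                link = links[(cause_type, effect_type)]
                link["examples"].append((cause, effect))
                link["pair_count"] += 1
    return dict(links)


def rank_consequences(links, cause_type):
    """Rank effect types for a cause type by number of example pairs."""
    candidates = [
        (effect_type, info["pair_count"])
        for (c_type, effect_type), info in links.items()
        if c_type == cause_type
    ]
    # Highest count first; ties broken alphabetically for determinism.
    return sorted(candidates, key=lambda item: (-item[1], item[0]))


# Toy data with placeholder identifiers (hypothetical, not real Wikidata items).
triples = [
    ("quake_A", "P1542", "tsunami_A"),
    ("quake_B", "P1542", "tsunami_B"),
    ("quake_B", "P1542", "fire_B"),
    ("quake_A", "P31", "earthquake"),  # not a causal property, ignored
]
types = {
    "quake_A": {"earthquake"},
    "quake_B": {"earthquake"},
    "tsunami_A": {"tsunami"},
    "tsunami_B": {"tsunami"},
    "fire_B": {"fire"},
}
links = build_type_links(triples, types)
print(rank_consequences(links, "earthquake"))
```

Keeping the example pairs next to each count means every ranked consequence can be shown together with the past events that support it.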
[Figure: system architecture diagram. Labeled components: Knowledge Graph of Events & Consequences (with examples); Causal Knowledge Extraction Pipelines, comprising Unsupervised Causal Knowledge Extraction (Neural QA Models; Pattern Matching + Neural NLI Models), Distantly Supervised Models for Causal Knowledge Extraction (Neural Relation Extraction Models), and Temporal Event Analysis (Temporal Event Models; Timeline Extraction); Dashboard (News Analysis, Profile, Effects Analysis, Event Analysis, Cause-Effect Analysis).]
Fig. 2. System Architecture
Causal Knowledge Extraction Pipelines. The base knowledge graph is augmented with causal knowledge extracted from Wikipedia articles using a number of causal knowledge extraction pipelines. This augmentation can be in the form of a) finding new consequences for a given event of interest, b) finding new example cause-effect pairs of instances for a pair of event types, and c) calculating scores reflecting the likelihood or significance for a causal relation between two events. Given the variety of ways that causal knowledge can be captured in Wikipedia documents as depicted in Figure 1, we need a number of different knowledge extraction approaches. In this demonstration, we show examples of three different kinds of pipelines we have implemented in our prototype:
Unsupervised Causal Knowledge Extraction: 1) An approach relying on pattern matching and neural Natural Language Inference (NLI) models. Briefly, we adapt the approach of Bhandari et al. [2], a fully unsupervised pipeline with a precision of nearly 80% in manual evaluations. We link the output phrases to Wikidata concepts using BLINK [11], keeping only high-confidence links. 2) An approach relying on neural Question Answering (QA) models that a) generates questions using a set of templates, such as “What could X cause?” or “What was a major consequence of X?” where X is a label of an event type or instance, b) uses pre-trained neural QA models and Wikipedia articles associated with the target event to retrieve an answer for the generated questions, and c) performs entity linking to link the answer to Wikidata.
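The three steps of the QA-based approach can be sketched as follows. This is an illustrative outline only: the two templates come from the text, while `extract_consequences`, the `answer_fn` and `link_fn` callables, the `min_score` threshold, and the toy stand-ins are assumptions. In a real pipeline `answer_fn` would wrap a pre-trained neural QA model and `link_fn` an entity linker.

```python
# Question templates quoted in the text; {label} stands for X.
QUESTION_TEMPLATES = [
    "What could {label} cause?",
    "What was a major consequence of {label}?",
]


def generate_questions(label):
    """Step a): instantiate each template with an event label."""
    return [template.format(label=label) for template in QUESTION_TEMPLATES]


def extract_consequences(label, passages, answer_fn, link_fn, min_score=0.5):
    """Steps b) and c): answer each question over each passage, then link.

    answer_fn(question, passage) -> (answer_text, score)   [hypothetical]
    link_fn(answer_text) -> concept ID or None             [hypothetical]
    Returns {concept_id: best_score}.
    """
    results = {}
    for question in generate_questions(label):
        for passage in passages:
            answer, score = answer_fn(question, passage)
            if score < min_score:
                continue  # drop low-confidence answers
            concept = link_fn(answer)
            if concept is None:
                continue  # drop answers that cannot be linked
            results[concept] = max(score, results.get(concept, 0.0))
    return results


# Toy stand-ins so the sketch runs without any model.
def toy_answer(question, passage):
    marker = " led to "
    if marker in passage:
        return passage.split(marker, 1)[1].rstrip("."), 0.9
    return "", 0.0


def toy_link(text):
    return {"a tsunami": "Q8070"}.get(text)


passages = ["The earthquake led to a tsunami.", "The city was rebuilt."]
print(generate_questions("an earthquake"))
print(extract_consequences("an earthquake", passages, toy_answer, toy_link))
```

Passing the QA model and the linker in as functions keeps the control flow testable and lets different models be swapped in without changing the loop.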
Supervised Models for Causal Knowledge Extraction: We use neural models [1, 4] trained on existing annotated data such as the BECauSE Corpus 2.0 [5] for extraction of cause-effect phrases from a corpus of event-related Wikipedia articles, and perform entity linking [6] on the phrases to map them to Wikipedia and then Wikidata concepts in the base knowledge graph. We are also exploring distantly supervised models [7] by constructing a training set through finding passages containing labels of pairs of events in the base knowledge graph, and using a neural model of relation extraction [9] to extract new causal relations.
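A minimal sketch of the distant supervision step, assuming whole-word, case-insensitive label matching: a passage becomes a training example for a cause-effect pair when it mentions the labels of both events. The function names and the example format are hypothetical, and the resulting labels are noisy by construction, since co-occurrence does not guarantee that the passage states the causal relation.

```python
import re


def _mentions(lowered_text, label):
    """True if the label occurs as a whole word or phrase in the text."""
    pattern = r"\b" + re.escape(label.lower()) + r"\b"
    return re.search(pattern, lowered_text) is not None


def build_distant_training_set(passages, causal_pairs, labels):
    """Label a passage with a (cause, effect) pair when it mentions both.

    causal_pairs: iterable of (cause_id, effect_id) from the base graph
    labels:       dict mapping a concept ID to its text label
    """
    examples = []
    for passage in passages:
        lowered = passage.lower()
        for cause, effect in causal_pairs:
            if _mentions(lowered, labels[cause]) and _mentions(lowered, labels[effect]):
                examples.append({"text": passage, "cause": cause, "effect": effect})
    return examples


# Item IDs are the ones quoted in the text; the passages are made up.
causal_pairs = [("Q7944", "Q8070")]
labels = {"Q7944": "earthquake", "Q8070": "tsunami"}
passages = [
    "The earthquake triggered a tsunami along the coast.",
    "A tsunami warning was issued.",
    "Earthquakes are common here.",
]
training_set = build_distant_training_set(passages, causal_pairs, labels)
print(training_set)
```

Only the first passage is kept here, because it is the only one that mentions both labels.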
Temporal Event Analysis: This pipeline first extracts event timelines from timeline sections and pages (examples shown in Figure 1), then maps the extracted sequences to Wikidata events, and finally applies existing and novel temporal event models [3] to the resulting sequences of events. This facilitates more complex analysis of potential temporal and causal relations between event types, and yields likelihood scores that better support the ranking of potential consequences for a given event and context.
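To make the idea of likelihood scores from event sequences concrete, here is a deliberately simple stand-in. It is not one of the temporal event models of [3]; it only estimates how often one event type is followed by another within a short window. The function name, the `window` parameter, and the toy sequences are assumptions.

```python
from collections import Counter


def follow_scores(sequences, window=2):
    """Fraction of occurrences of type a that are followed by type b.

    sequences: list of event-type lists, each in chronological order
    window:    how many later events count as "following" (assumed value)
    Returns {(a, b): score} with scores in (0, 1].
    """
    occurrences = Counter()
    followed = Counter()
    for seq in sequences:
        for i, a in enumerate(seq):
            occurrences[a] += 1
            # Use a set so b is counted at most once per occurrence of a.
            for b in set(seq[i + 1 : i + 1 + window]):
                if b != a:
                    followed[(a, b)] += 1
    return {pair: n / occurrences[pair[0]] for pair, n in followed.items()}


# Made-up timelines already mapped to event types.
sequences = [
    ["outbreak", "lockdown", "recession"],
    ["outbreak", "lockdown"],
    ["protest", "lockdown"],
]
scores = follow_scores(sequences)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 2))
```

Such counts capture temporal precedence only; treating them as evidence of causation requires the additional modeling the text refers to.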
Dashboard. The user dashboard exposes a number of API functions that use the knowledge graph to assist the user with event analysis and forecasting. The APIs allow the user to 1) retrieve the latest news events and their context, 2) retrieve a list of potential consequences for a given event/context, along with an explanation in the form of similar past events and their consequences, and 3) rank and re-rank the consequences based on different criteria. The user can optionally define a profile that will be used for ranking the consequences based on how interesting or surprising the consequence could be for the user.
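Ranking and re-ranking with an optional profile could look roughly like the sketch below. The `Consequence` class, the criterion names, and the representation of a profile as per-type weights are all assumptions; the paper does not specify the dashboard's API signatures.

```python
from dataclasses import dataclass, field


@dataclass
class Consequence:
    event_type: str
    scores: dict  # criterion name -> numeric score
    examples: list = field(default_factory=list)  # past (cause, effect) pairs


def rank(consequences, criterion, profile=None):
    """Sort consequences by one criterion, highest first.

    profile: optional dict mapping an event type to a weight that boosts
             (or dampens) types the user finds more (or less) interesting.
    """
    profile = profile or {}

    def key(consequence):
        base = consequence.scores.get(criterion, 0.0)
        return base * profile.get(consequence.event_type, 1.0)

    return sorted(consequences, key=key, reverse=True)


# Toy candidates with made-up scores.
candidates = [
    Consequence("tsunami", {"pair_count": 5, "page_count": 40}),
    Consequence("fire", {"pair_count": 2, "page_count": 90}),
]
print([c.event_type for c in rank(candidates, "pair_count")])
print([c.event_type for c in rank(candidates, "page_count")])
print([c.event_type for c in rank(candidates, "page_count", {"tsunami": 3.0})])
```

Because each candidate carries all of its scores, re-ranking is just another call with a different criterion or profile and needs no new query against the knowledge graph.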
3 Demonstration Plan
We plan to use a number of use cases involving different recent or ongoing events, and show the ranked list of consequences according to the base knowledge graph as well as different versions of the knowledge graph based on the extraction method used for knowledge augmentation. For this initial prototype demonstration, our primary focus will be on showing the quality and coverage of different versions of the knowledge graph, and how simple major consequences of different types of events are present or missing in different versions. We will use examples of different types of events, including: 1) a “protest” event, e.g., recent protests in Myanmar, highlighting some of the high-ranked extracted causes and consequences from past protest events, which include a coup d’état; 2) a “disease outbreak” event as the cause while excluding COVID-19 related articles from our source, showing how some actual consequences of the COVID-19 outbreak show up in the ranked results of different pipelines; 3) a hypothetical natural disaster event with context-specific forecasts, e.g., showing, similar to the work of Radinsky et al. [8], how an earthquake (Q7944) at a location near an ocean would result in a forecast of a tsunami (Q8070). We will also highlight a number of challenging examples and wrong forecasts and discuss a number of directions for future work that could turn this simple prototype into a powerful and reliable AI assistant for analysts.
4 Acknowledgements
This research is based upon work supported in part by U.S. DARPA KAIROS Program No. FA8750-19-C-0206. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of DARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein.
References
1. Awasthy, P., Ni, J., Barker, K., Florian, R.: IBM MNLP IE at CASE 2021 task 1: Multigranular and multilingual event detection on protest news. In: Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021). pp. 138–146 (2021)
2. Bhandari, M., Feblowitz, M., Hassanzadeh, O., Srinivas, K., Sohrabi, S.: Unsupervised causal knowledge extraction from text using natural language inference (student abstract). In: Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021. pp. 15759–15760 (2021)
3. Bhattacharjya, D., Gao, T., Subramanian, D.: Order-dependent event models for agent interactions. In: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020. pp. 1977–1983 (2020)
4. Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., Grave, E., Ott, M., Zettlemoyer, L., Stoyanov, V.: Unsupervised cross-lingual representation learning at scale. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL). pp. 8440–8451 (2020)
5. Dunietz, J., Levin, L.S., Carbonell, J.G.: The BECauSE corpus 2.0: Annotating causality and overlapping relations. In: Proceedings of the 11th Linguistic Annotation Workshop, LAW@EACL. pp. 95–104 (2017)
6. Li, B.Z., Min, S., Iyer, S., Mehdad, Y., Yih, W.: Efficient one-pass end-to-end entity linking for questions. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020. pp. 6433–6441 (2020)
7. Mintz, M., Bills, S., Snow, R., Jurafsky, D.: Distant supervision for relation extraction without labeled data. In: ACL 2009, Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the AFNLP. pp. 1003–1011 (2009)
8. Radinsky, K., Davidovich, S., Markovitch, S.: Learning to predict from textual data. J. Artif. Intell. Res. 45, 641–684 (2012)
9. Soares, L.B., FitzGerald, N., Ling, J., Kwiatkowski, T.: Matching the blanks: Distributional similarity for relation learning. In: Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019. pp. 2895–2905 (2019)
10. Vrandečić, D., Krötzsch, M.: Wikidata: a free collaborative knowledgebase. Commun. ACM 57(10), 78–85 (2014)
11. Wu, L., Petroni, F., Josifoski, M., Riedel, S., Zettlemoyer, L.: Scalable zero-shot entity linking with dense entity retrieval. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020. pp. 6397–6407 (2020)