Towards Causal Knowledge Graphs - Position Paper
Eva Blomqvist¹ and Marjan Alirezaie² and Marina Santini³

¹ Linköping University, Sweden, email: eva.blomqvist@liu.se
² Örebro University, Sweden, email: marjan.alirezaie@oru.se
³ RISE, Research Institutes of Sweden, email: marina.santini@ri.se

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. In this position paper, we highlight that the ability to analyse cause-effect relationships, in order to determine the causal status among a set of events, is an essential requirement in many contexts, and argue that it cannot be overlooked when building systems targeting real-world use cases. This is especially true for medical contexts, where understanding the cause(s) of a symptom, or observation, is of vital importance. However, most approaches purely based on Machine Learning (ML) do not explicitly represent and reason with causal relations, and may therefore mistake correlation for causation. In this paper, we therefore argue for an approach that extracts causal relations from text and represents them in the form of Knowledge Graphs (KGs), to empower downstream ML applications, or AI systems in general, with the ability to distinguish correlation from causation and to reason with causality in an explicit manner. So far, the bottlenecks in KG creation have been the scalability and accuracy of automated methods. Hence, we argue that two novel features are required from methods addressing these challenges: (i) the use of Knowledge Patterns to guide the KG generation process towards a certain resulting knowledge structure, and (ii) the use of a semantic referee to automatically curate the extracted knowledge. We claim that this will be an important step forward for supporting interpretable AI systems, and for integrating ML and knowledge representation approaches, such as KGs, and that it should also generalise well to other types of relations, apart from causality.

1 Introduction

Knowledge Graphs (KGs) have emerged in the past decade as a prominent form of knowledge representation, frequently used by large enterprises such as Google, Facebook, Amazon, Siemens, and many more [16]. A KG is simply a graph representing some set of data, usually coupled with a way to explicitly represent the meaning of the data, e.g., an ontology. This can be seen as a revival of graph-based knowledge representation, with roots in the early 1970s (for instance, the term knowledge graph was used as early as 1972 by [28]). The recent advances, however, are mainly related to the Semantic Web, such as Linked Data on the Web and Semantic Web ontologies. This renewed popularity has been accelerated by two main realisations regarding Machine Learning (ML), including Deep Learning (DL) models. Although they outperform humans on many specific tasks, ML/DL methods (i) are often unable to determine the semantics of the correlations found in the data, and (ii) lack the ability to transparently explain a prediction. A particularly challenging example is the case of causal relations. As pointed out by [17], the future development of AI depends on building systems that incorporate the notion of causality, e.g., to allow the system to reason about situations that have not been previously encountered, based on general principles. There is an active field of research developing specific ML/DL algorithms targeting causal learning and reasoning. However, targeting only ML/DL-based causal reasoning does not necessarily improve interpretability. Hence, there is a need to also develop methods for producing and utilising interpretable causal models, as we shall discuss further in Section 3.

KGs, being symbolic models, allow us to define the semantics of relations in data at the level of formalisation necessary for an intended task, e.g., through ontologies if needed. When integrated with ML/DL methods, they support the interpretability of predictions. Hence, KGs can be used to address both of the main shortcomings of ML/DL mentioned earlier. However, the construction of KGs is a major bottleneck in their adoption, just as was the case with knowledge representation in general in early AI systems. Outside large companies, such as Google, and huge crowdsourcing initiatives, such as Wikidata [32], it is usually infeasible to construct large-scale KGs "manually". Rather, they have to be bootstrapped from existing sources, such as semi-structured data or text. Current KG generation algorithms, however, either do not take the desired formalisation of the KG into account at all, or they hard-code it into the extraction algorithm. An example of the latter is DBpedia [20], which is specific to a Wiki source and expresses its results using a fixed ontology, which means that the method does not generalise to new settings or other input structures. Additionally, the quality of the generated KGs is usually poor [11], requiring manual curation. Further, no automated approach so far targets complex relations, e.g., causality. Therefore, our goal is to specifically target new methods and algorithms for KG generation from text, which (a) explicitly take KG requirements into account, e.g., by allowing the required schema of the output graph to be flexibly specified, and (b) automate the curation process, to radically improve the quality of the resulting KGs. In order to fulfil a specific set of KG requirements, as well as to achieve a sufficient level of accuracy, we propose to use the notion of Knowledge Patterns (KPs) [?] as formalisations of KG requirements. A KP represents both a linguistic frame that can be detected in text [2] and the representation of that frame in the desired KG output formalism, i.e., it is similar to the notion of Ontology Design Patterns (ODPs) [5, 4]. In order to tackle a particularly important obstacle to the future development of the AI field, namely the need for causal models and reasoning, we intend to specifically target KPs and KGs representing complex causal relations.

2 ML - Causality and Interpretability

While ML methods perform very well in learning complex connections between large amounts of input and output data, there is no guarantee that they capture causation (cause and effect relations). This shortcoming stems in part from the ignorance of data-driven
methods with respect to reasoning techniques, which humans apply effortlessly. Consider two imaginary groups of people. Group A consists of 100 asthmatic people, with a death rate of 40%. Group B consists of 100 asthmatic people who also suffer from pneumonia, with a death rate of 35%. An ML method fed solely with this data can only learn a nonsensical result: asthmatics with pneumonia have a better chance of survival! [8]. The learning method may well have learned the associations (or correlations) among the variables in the data correctly. However, due to the absence of context and common-sense knowledge, and the lack of reasoning abilities, the method has not been able to explicitly and correctly capture the cause-effect relations. That is why the outcome of the example above is not only counterintuitive but also misleading. By context, we refer to any information that may not be represented directly in the observed data. Such information may include the actual causes behind the observations, e.g., some set of background information about the setting. In the given example, people in Group B are higher-risk patients than those in Group A. The lower death rate of people in Group B can have different reasons. For instance, due to their high-risk status they may be more likely to be taken to the intensive care unit (ICU), or they may be taking more effective medicines. These are all factors (or features) not considered by the learning model [29]. Additionally, some common-sense knowledge could also have helped a system avoid the erroneous conclusion, such as the knowledge that additional diseases generally increase mortality rather than decrease it. This could be done, e.g., by using knowledge representations as a referee for the learned model [1], as we will discuss further later on in this paper.

To provide sufficient support for a reliable and precise prediction or diagnosis process, every prediction made by a system needs to be perfectly transparent and interpretable by the user. This is necessary for any autonomous system that is to act as support for humans in making decisions, and is even legally and ethically required in many domains, including the medical domain. Although ML should definitely be a part of the solution, what is predicted needs to be interpretable, so that any conclusion based on that knowledge can be explained in detail, most often including some notion of reliability or confidence. A solution to this shortcoming of ML methods is to integrate them with explicitly represented knowledge. In the case of causality, this means a formal causal model that reflects all the possible and existing relations, including cause-effect ones, among the concepts of a given domain.

3 Causal Models

By causal model, we refer to a parametric model that represents a set of probability densities over variables, including concepts defined in a system (e.g., diseases and symptoms in the context of medicine), together with the plausible causal relationships between them [31]. Once available, causal information (inferred from a causal model) can be integrated with the training (observational) data. This can enable an ML/DL method to also learn the causes behind its mistakes (i.e., misclassifications) [1], and consequently improve its performance. In this paper we specifically target causal relations, i.e., the focus is not on determining the probability distributions but rather on the underlying knowledge representation.

Although recent research reflects the considerable impact of causal inference in different domains, such as public health [15] or earth science and climate change [27], it remains challenging to involve causal models within a learning process. One of the hindering factors is, in fact, the lack of available domain-related causal models compatible with the data used for learning [22]. This leads to the need to manually create such models for each use case. However, for many domains nowadays, such as e-health and patient monitoring through smart homes, both the set of potential outcomes and the set of variables are extremely large. Therefore, manually constructing and maintaining causal models requires a huge effort, and such models cannot be easily adapted to a new domain. Moreover, manually constructing models that represent all the environmental features and relations may not even be practically feasible, due to the changing nature of the environment. This has already shifted the focus of research towards automatically generating causal models [22], a line of research to which we are also contributing.

Furthermore, causal relations are usually not as simple as one explicit link between two well-defined (cause and effect) concepts. Depending on the context and the conditions, we may, for instance, end up with a set of causations with different certainty values. The appropriate modelling of causal relations also depends heavily on the use cases of the resulting model, e.g., the kind of reasoning and prediction tasks that it should support. For instance, reasoning on potential guideline and treatment interactions in an individual patient context, e.g., the target use case of [7], requires a highly complex causal model, while in other cases a simpler one might suffice. In Fig. 1, we illustrate this through two examples. On the right, (b) is a highly complex conceptual model (inspired by the model in [7]) representing the belief that a causal relation exists, with some frequency and strength. On the left, (a) is also a causal relation, but represented as a much simpler conceptual model.

Our proposed method intends to address the lack of causal models by automating the generation of highly accurate causal KGs from text. We intend to cater for the differing requirements of specific use cases by using Knowledge Patterns (KPs) to represent those requirements. The KPs are conceptual models similar to those in Fig. 1, coupled with linguistic frames, and they ensure that the resulting causal model enables the required type of reasoning or predictions.

4 Proposed Approach: Generating Causal KGs from Text

The overarching goal of our research is to support the integration of ML/DL and Knowledge Representation, for improving both the accuracy and the interpretability of downstream AI applications. As discussed previously, we believe that KGs can play a crucial role in this integration, but then the KG construction bottleneck needs to be resolved. Therefore, we propose to develop new methods and algorithms for KG generation from text, which (a) explicitly take KG requirements into account, e.g., by allowing the required schema of the output graph to be flexibly specified, and (b) automate the KG curation process to radically improve the quality of the resulting KGs with minimal human effort. In order to fulfil a specific set of KG requirements, as well as to achieve a sufficient level of accuracy, we argue that the notion of Knowledge Patterns (KPs) [?] as formalisations of KG requirements is a crucial concept. We here specifically focus on KPs and KGs targeting causal relations, since causal models and causal reasoning are among the main challenges for ML approaches today. However, the approach we outline is generic: by exchanging the KPs used, it can target any type of complex relation that can be expressed in natural language. The proposed approach is a novel combination of methods from ML/DL for NLP with recent advancements in Knowledge Representation, such as KGs and KPs.

[Figure 1: (a) Cause --causation--> Effect. (b) Elements: Causation Belief (frequency, strength), hasAsCause, hasAsEffect, Event Type, Situation Type, Transition Type, Action Type, hasPreSituation, hasPostSituation, causes, incompatibleWith, similarTo, opposedTo.]

Figure 1. Abstract (conceptual) illustration of two different KPs (here called a and b) for expressing causation. The patterns produce models at different levels of detail and complexity, and are hence targeted at different use cases of the resulting KG. Notation in the figure is informal, but the models could be expressed using an ontology language, such as OWL (as in [7]). In that case, the boxes with rounded edges would represent classes, the unfilled arrows subclass relations, and the filled arrows object or datatype properties attached to classes based on domain and range restrictions.

[Figure 2: process diagram. (1) Inputs: KG Requirements, KP Selection, Knowledge Pattern(s), Text corpus, Trained model(s). (2) Tuning and retraining with ML/NLP tools. (3) KP Contextualisation. (4) KG Generation. (5) KG Curation & Repair, with an inference engine. Output: Causal Knowledge Graph for Downstream Tasks (future usage), e.g., Use case - Medical decision support (diagnosis/explanation); Use case - e-Science for climate change (mitigation action/consequence analysis); Use case …. Example annotations: input texts on COPD and COVID-19; extracted causations Smoking → COPD, COPD → Cough, Covid-19 → Continuous cough; step 5: "Check the graph using a reasoner against axioms in the patterns - repair if necessary. Additional quality checks?"]

Figure 2. We propose to use KPs to guide the iterative ML process for extracting a Knowledge Graph from unstructured texts, as well as to automate the curation process using a semantic referee. The generated Knowledge Graph can later be used to support an ML method in deriving causal relations from observational data.

As can be seen in Fig. 2, we propose a continuous process that iteratively improves its ML/DL models based on feedback from a curation step. As initial input (1), the process needs a set of KPs representing the requirements of the output KG, one or more language models, and a text corpus from which to extract the KG. The language models then need to be tuned (2) to the relations expressed by the specific KPs at hand. Next, initial instantiations of the KPs, i.e., of the linguistic frames they represent, are detected in the text corpus and formalised using the KP as a "schema" (3). These KP instantiations are then merged into an initial KG (4). The initial KG is then subjected to an automated curation and repair process (5), where a semantic referee uses the formalisation of the KPs to detect potential mistakes in the extracted KG and to suggest repair actions. The result of this curation process is not only a high-quality KG, but also feedback. This feedback is used to tune, or even retrain, the language models, and to iteratively extend the KG by continuously running the overall process. Below, we describe in more detail the ML/DL-based NLP methods to be used, the role and nature of the KPs, and the semantic referee used for curation, respectively.

4.1 Relation Extraction from Text

Causal relations can be extracted from running text by exploiting linguistic cues, and the detected relations can then be formalized, for instance, in the form of simple facts (triples). For example, the causal relation in the sentence "COVID-19 is caused by the SARS-CoV-2 virus" can be formalized as the fact <SARS-CoV-2 causes COVID-19>, which could be represented as a triple in a standard Knowledge Representation language, such as RDF⁴. However, in many cases more complex representations are needed, such as including unknown variables, as introduced by Pearl [24] in the notion of Structural Causal Models. This could mean creating a relation such as COVID-19 := f(SARS-CoV-2, randomness). Such a relation means that the appearance of the COVID-19 disease depends on the virus and on some other random variable(s) independent of the virus, e.g., environmental factors and features of the person in question⁵. Two conceptual models for representing causal relations were already illustrated in Fig. 1. To instantiate such models (i.e., such KPs), linguistic expressions of causation are identified as cues by NLP tools, such as Part-of-Speech Taggers, Dependency Parsers, lexicons, and the like [19, 9]. Examples of such expressions are caused by, cause, as a result, for this reason, due to the fact, consequently, and similar.

⁴ https://www.w3.org/RDF/
⁵ The notation is again informal, but the symbol := is here used to indicate the causal relation, and f() represents a function.

The NLP task that will contribute to our envisioned approach is mainly Information Extraction (IE), or more specifically Relation Extraction (RE). Recent work in this field includes [30], who describe an innovative approach for relation learning. It is based on the pre-training of a huge language model, such as BERT [10], passing sentences through its encoder to obtain an abstract notion of a relation, and then fine-tuning on a certain schema, like Wikidata or DBpedia, mainly containing simple binary relations. Our aim is similar, although we intend to develop a slightly different method that can be tuned to (a combination of) a set of smaller, abstract KPs, targeting more complex relations. An interesting aspect is that [30] are also able to extract generic relations, i.e., potential schema extensions, which might be a valuable addition in our proposed curation and feedback step. Earlier work on frame detection in text [14, 12], and on the generation of KGs from it, may also be relevant for comparison, especially since [14] also applied the notion of KPs related to the frames detected. However, they did not allow the frames to be preselected as the KG requirements, or exchanged.

Further, NELL [23] targets the learning of common facts extracted from natural language texts. Although their approach does not target a specific output structure or relation, i.e., specific KPs, its continuous improvement process is similar to our proposal. In other recent studies, such as [25], KGs are also generated from natural language text, but these approaches do not target complex relations such as causality, and they use a fixed output schema.

Very little research exists on extracting more complex relations directly from text, i.e., relations that cannot be expressed as single facts (triples), and causal relations in particular. One study that generated causal KGs from text is [26]. Our envisioned approach differs from it mainly in the types of input data. In addition, [26] targets one fixed logical structure of the output, i.e., a single fixed KP expressing simple direct relations between diagnoses and symptoms. To learn a more complex formalisation of causality, we may also need more complex learning, such as that suggested by [18]. They proposed a method for extracting a relation graph directly from natural language, where the relations express entailment rules rather than simple facts (triples).

Another area where NLP has been widely used is KG completion, e.g., link and relation prediction in an existing KG. Although we intend to generate a KG "from scratch", the KG generation from instantiated KPs, as well as the subsequent curation process, has some similarities with link and relation prediction. Hence, inspiration may come from work such as [33], who propose to use pre-trained language models for knowledge graph completion, scoring

Figure 2, we propose to tune the language models to detect the specific KPs required, and further generate a KG from the instantiated KPs.

Using KPs to guide the learning process makes it possible to capture different possible contextual situations separately, and to target different causal models, each focused on a certain specific downstream task. Depending on the relations that are found in the text, KPs will also allow us to calculate more precise certainty values for each captured cause. This is similar to how we have used knowledge representations as a referee for ML methods in our previous work [1]. It also allows us to filter out extracted knowledge that does not make sense, or is otherwise of questionable quality.

However, this also introduces new challenges. Although KPs have been studied to some extent for ontologies and the Semantic Web, there is so far no formal definition of a KP that can be used operationally (technically) by a system, in particular for KGs. For this purpose, we need to operationalise the definition in [13] by expanding on the connection between linguistic frames and ODPs, for use within our KG extraction framework.

4.3 Semantic Referee

Our work relates to the integration of ML/DL and symbolic models, and to the use of knowledge representation to verify and repair the results of ML/DL algorithms. Here we rely on the idea of a semantic referee introduced in our previous work [1]. In that work, we demonstrated the benefit of a semantic referee applied upon a causal model in the form of an ontology (OntoCity) for improving a satellite imagery data classifier. In particular, the ontology together with a reasoning process acted as a semantic referee to guide the ML method (i.e., the classifier).
candidate triples for addition through their KG-BERT model. This                  Using causal information represented in the ontology, the semantic
is similar to how we envision to assess potential links between the               referee was able to explain the causes behind errors, and send the ex-
instantiated KPs, when generating the overall KG. Another approach                planations as feedback to the classifier. In this way, the ML method
was recently proposed by [6], where language models such as GPT-2                 is able to know the causes behind its mistakes and therefore better
are combined with a seed KG, allowing the learning of its structure               learn from them [1]. We argue that this previous work, will be highly
and relations, whereafter the language model can generate new nodes               useful, when integrated as step (5) in our KG extraction framework,
and edges. However, our KPs are abstract and do not contain concrete              illustrated earlier in Fig. 2.
facts, which is a main difference to the seed KGs they used.


4.2    Knowledge Patterns                                                         5     Conclusion
The use of patterns in developing knowledge representation models
has a long tradition in AI, starting from the idea of Minsky in his pro-          In this paper, we propose a possible approach to capturing causal
posal of frames [21], and continued towards the notion of ontology                knowledge, in a scalable fashion, and representing it as a shared KG.
design patterns (ODP) in modern ontology engineering [5, 4]. ODPs                 We argue that the advantage of constructing causal KGs is the inte-
have also been generalised into KPs [13], where a KP may repre-                   gration of causality in reasoning and prediction processes, such as the
sents both a linguistic frame that can be detected in text [3], but also          medical diagnosis process, to improve the accuracy and reliability of
the representation of that frame in a desired output formalism. How-              existing ML/DL-based diagnosis methods, by producing transparent
ever, in [13], KPs are described and defined informally, and there is             justifications and explanations of the output.
currently no concrete formalism for representing and applying KPs                    More specifically we focus on KGs as a means for providing back-
specifically for KGs.                                                             ground knowledge and reasoning capabilities to ML/DL-based AI
   In order to capture specific types of knowledge from text, support-            systems, and target the KG creation bottleneck. In particular, we
ing a specific task, such as medical decision support, the knowledge              recognise the challenge related to causal relations, where the capabil-
extraction process needs to be carefully guided by the requirements               ity of performing causal reasoning is often lacking in pure ML-based
of the intended task of the resulting KG. Tasks may include different             systems. Therefore we propose to generate causal KGs from textual
types of queries, prediction, applying specific graph pattern matching            information, to then be used as the basis for causal models. Our novel
algorithms, or reasoning. To address this challenge we argue for ap-              framework is based on using a set of formal KPs as input, acting both
plying KPs as both a representation of the KG requirements, as well               as the requirements of the KG as well as the means for formalising
as acting as a “schema” for the resulting KG. In short, as shown in               the extracted knowledge and curate it through logical reasoning.
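The conclusion above summarises two roles for KPs: stating the KG requirements, and acting as a “schema” whose output is curated through reasoning. As a rough illustration only, the following Python sketch shows one way these roles could fit together; it is not the operational KP formalism that, as noted in Section 4.2, still remains to be defined. All names (`KnowledgePattern`, `instantiate`, `referee_accepts`), the slot types, and the example facts are hypothetical. The acyclicity check is a stand-in chosen for illustration, loosely inspired by the semantic referee of [1], and it assumes that the target causal model is acyclic, as in causal diagrams.

```python
from dataclasses import dataclass

# Requires Python 3.9+ (built-in generic types such as tuple[str, str, str]).
Triple = tuple[str, str, str]


@dataclass(frozen=True)
class KnowledgePattern:
    """Hypothetical, simplified KP: typed slots of a linguistic frame plus the
    triple templates that express a filled frame in the output KG."""

    name: str
    slot_types: dict[str, str]      # slot name -> required entity type
    templates: tuple[Triple, ...]   # "?slot" marks a variable in a template

    def instantiate(self, frame: dict[str, str], types: dict[str, str]) -> list[Triple]:
        """Return KG triples for a detected frame, or [] if any slot is
        missing or filled with an entity of the wrong type."""
        for slot, required in self.slot_types.items():
            value = frame.get(slot)
            if value is None or types.get(value) != required:
                return []

        def fill(term: str) -> str:
            # Replace "?slot" variables with the entity found in the frame.
            return frame[term[1:]] if term.startswith("?") else term

        return [(fill(s), p, fill(o)) for s, p, o in self.templates]


def referee_accepts(kg: set[Triple], triple: Triple) -> bool:
    """Toy referee check (an assumption, not the method of [1]): reject a
    'causes' triple that would close a cycle in the causal graph."""
    cause, predicate, effect = triple
    if predicate != "causes":
        return True
    # Depth-first search from the effect along existing 'causes' edges;
    # reaching the cause means the new edge would create a cycle.
    stack, seen = [effect], set()
    while stack:
        node = stack.pop()
        if node == cause:
            return False
        if node in seen:
            continue
        seen.add(node)
        stack.extend(o for s, p, o in kg if s == node and p == "causes")
    return True


# Hypothetical usage: one causal KP and three frames "detected" in text.
CAUSATION = KnowledgePattern(
    name="Causation",
    slot_types={"cause": "Condition", "effect": "Condition"},
    templates=(("?cause", "causes", "?effect"),),
)
types = {"influenza": "Condition", "pneumonia": "Condition", "fever": "Symptom"}
frames = [
    {"cause": "influenza", "effect": "pneumonia"},  # accepted
    {"cause": "pneumonia", "effect": "influenza"},  # rejected by the referee (cycle)
    {"cause": "fever", "effect": "pneumonia"},      # rejected by the KP (wrong slot type)
]
kg: set[Triple] = set()
for frame in frames:
    for triple in CAUSATION.instantiate(frame, types):
        if referee_accepts(kg, triple):
            kg.add(triple)
print(sorted(kg))  # [('influenza', 'causes', 'pneumonia')]
```

In this sketch the KP filters out frames whose slots do not match the required types, while the referee-style check rejects candidates that conflict with knowledge already in the KG. A real system would presumably rely on an ontology and a reasoner, and on certainty values rather than hard rejections.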

REFERENCES

[1] Marjan Alirezaie, Martin Längkvist, Michael Sioutis, and Amy Loutfi, ‘Semantic referee: A neural-symbolic framework for enhancing geospatial semantic segmentation’, 10, 863–880, (2019).
[2] Collin F. Baker, Charles J. Fillmore, and John B. Lowe, ‘The Berkeley FrameNet project’, in Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 1, ACL ’98/COLING ’98, pp. 86–90, USA, (1998). Association for Computational Linguistics.
[3] Collin F. Baker, Charles J. Fillmore, and John B. Lowe, ‘The Berkeley FrameNet project’, in Proceedings of the 17th International Conference on Computational Linguistics - Volume 1, pp. 86–90. Association for Computational Linguistics, (1998).
[4] Eva Blomqvist, Karl Hammar, and Valentina Presutti, ‘Engineering ontologies with patterns - the eXtreme Design methodology’, in Ontology Engineering with Ontology Design Patterns - Foundations and Applications, eds., Pascal Hitzler, Aldo Gangemi, Krzysztof Janowicz, Adila Krisnadhi, and Valentina Presutti, volume 25 of Studies on the Semantic Web, 23–50, IOS Press, (2016).
[5] Eva Blomqvist and Kurt Sandkuhl, ‘Patterns in ontology engineering: Classification of ontology patterns’, in ICEIS 2005, Proceedings of the Seventh International Conference on Enterprise Information Systems, Miami, USA, May 25-28, 2005, eds., Chin-Sheng Chen, Joaquim Filipe, Isabel Seruca, and José Cordeiro, pp. 413–416, (2005).
[6] Antoine Bosselut, Hannah Rashkin, Maarten Sap, Chaitanya Malaviya, Asli Celikyilmaz, and Yejin Choi, ‘COMET: Commonsense transformers for automatic knowledge graph construction’, arXiv preprint arXiv:1906.05317, (2019).
[7] V. Carretta Zamborlini, Knowledge Representation for Clinical Guidelines: with applications to Multimorbidity Analysis and Literature Search, Ph.D. dissertation, Vrije Universiteit Amsterdam, (2017).
[8] Rich Caruana, Yin Lou, Johannes Gehrke, Paul Koch, Marc Sturm, and Noemie Elhadad, ‘Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission’, in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’15, pp. 1721–1730, New York, NY, USA, (2015). Association for Computing Machinery.
[9] Tirthankar Dasgupta, Rupsa Saha, Lipika Dey, and Abir Naskar, ‘Automatic extraction of causal relations from text using linguistically informed deep neural networks’, in Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, pp. 306–316, Melbourne, Australia, (July 2018). Association for Computational Linguistics.
[10] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova, ‘BERT: Pre-training of deep bidirectional transformers for language understanding’, in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186, (2019).
[11] Michael Färber, Frederic Bartscherer, Carsten Menne, and Achim Rettinger, ‘Linked data quality of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO’, Semantic Web, 9(1), 77–129, (2018).
[12] Marco Fossati, Emilio Dorigatti, and Claudio Giuliano, ‘N-ary relation extraction for simultaneous T-Box and A-Box knowledge base augmentation’, Semantic Web, 9(4), 413–439, (2018).
[13] Aldo Gangemi and Valentina Presutti, ‘Towards a pattern science for the Semantic Web’, Semantic Web, 1(1-2), 61–68, (2010).
[14] Aldo Gangemi, Valentina Presutti, Diego Reforgiato Recupero, Andrea Giovanni Nuzzolese, Francesco Draicchio, and Misael Mongiovì, ‘Semantic Web machine reading with FRED’, Semantic Web, 8(6), 873–893, (2017).
[15] Thomas A. Glass, Steven N. Goodman, Miguel A. Hernán, and Jonathan M. Samet, ‘Causal inference in public health’, (March 2013).
[16] Aidan Hogan, Eva Blomqvist, Michael Cochez, Claudia d’Amato, Gerard de Melo, Claudio Gutierrez, José Emilio Labra Gayo, Sabrina Kirrane, Sebastian Neumaier, Axel Polleres, Roberto Navigli, Axel-Cyrille Ngonga Ngomo, Sabbir M. Rashid, Anisa Rula, Lukas Schmelzeisen, Juan Sequeda, Steffen Staab, and Antoine Zimmermann, ‘Knowledge graphs’, (2020).
[17] Judea Pearl and Dana Mackenzie, The Book of Why: The New Science of Cause and Effect, Basic Books, (2018).
[18] Mohammad Javad Hosseini, Nathanael Chambers, Siva Reddy, Xavier R. Holt, Shay B. Cohen, Mark Johnson, and Mark Steedman, ‘Learning typed entailment graphs with global soft constraints’, Transactions of the Association for Computational Linguistics, 6, 703–717, (2018).
[19] Christopher S. G. Khoo, Syin Chan, and Yun Niu, ‘Extracting causal knowledge from a medical database using graphical patterns’, in Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, pp. 336–343, Hong Kong, (October 2000). Association for Computational Linguistics.
[20] Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo N. Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick van Kleef, Sören Auer, and Christian Bizer, ‘DBpedia - A large-scale, multilingual knowledge base extracted from Wikipedia’, Semantic Web, 6(2), 167–195, (2015).
[21] Marvin Minsky, ‘A framework for representing knowledge’, MIT-AI Laboratory Memo 306.
[22] Riccardo Miotto, Fei Wang, Shuang Wang, Xiaoqian Jiang, and Joel Dudley, ‘Deep learning for healthcare: review, opportunities and challenges’, Briefings in Bioinformatics, 19(6), 1236–1246, (2018).
[23] Tom Mitchell, William Cohen, Estevam Hruschka, Partha Talukdar, Bishan Yang, Justin Betteridge, Andrew Carlson, Bhanava Dalvi, Matt Gardner, Bryan Kisiel, et al., ‘Never-ending learning’, Communications of the ACM, 61(5), 103–115, (2018).
[24] Judea Pearl, ‘Causal diagrams for empirical research’, Biometrika, 82(4), 669–688, (December 1995).
[25] Anderson Rossanez and Julio Cesar dos Reis, ‘Generating knowledge graphs from scientific literature of degenerative diseases’, (2019).
[26] Maya Rotmensch, Yoni Halpern, Abdulhakim Tlimat, Steven Horng, and David Sontag, ‘Learning a health knowledge graph from electronic medical records’, Scientific Reports, 7(1), 1–11, (2017).
[27] J. Runge, S. Bathiany, E. Bollt, G. Camps-Valls, D. Coumou, E. Deyle, C. Glymour, M. Kretschmer, M. D. Mahecha, E. H. van Nes, J. Peters, R. Quax, M. Reichstein, M. Scheffer, B. Schölkopf, P. Spirtes, G. Sugihara, J. Sun, K. Zhang, and J. Zscheischler, ‘Inferring causation from time series with perspectives in earth system sciences’, Nature Communications, (2019).
[28] Edward W. Schneider, ‘Course modularization applied: The interface system and its implications for sequence control and data analysis’, (1973).
[29] Peter Schulam and Suchi Saria, ‘Reliable decision support using counterfactual models’, in Advances in Neural Information Processing Systems 30, eds., I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 1697–1708, Curran Associates, Inc., (2017).
[30] Livio Baldini Soares, Nicholas FitzGerald, Jeffrey Ling, and Tom Kwiatkowski, ‘Matching the blanks: Distributional similarity for relation learning’, in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 2895–2905, (2019).
[31] Peter Spirtes, ‘Introduction to causal inference’, J. Mach. Learn. Res., 11, 1643–1662, (August 2010).
[32] Denny Vrandečić and Markus Krötzsch, ‘Wikidata: a free collaborative knowledgebase’, Commun. ACM, 57(10), 78–85, (2014).
[33] Liang Yao, Chengsheng Mao, and Yuan Luo, ‘KG-BERT: BERT for knowledge graph completion’, arXiv preprint arXiv:1909.03193, (2019).

