Towards Causal Knowledge Graphs - Position Paper

Eva Blomqvist1 and Marjan Alirezaie2 and Marina Santini3

1 Linköping University, Sweden, email: eva.blomqvist@liu.se
2 Örebro University, Sweden, email: marjan.alirezaie@oru.se
3 RISE, Research Institutes of Sweden, email: marina.santini@ri.se

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. In this position paper, we highlight that being able to analyse the cause-effect relationships for determining the causal status among a set of events is an essential requirement in many contexts, and argue that it cannot be overlooked when building systems targeting real-world use cases. This is especially true for medical contexts where the understanding of the cause(s) of a symptom, or observation, is of vital importance. However, most approaches purely based on Machine Learning (ML) do not explicitly represent and reason with causal relations, and may therefore mistake correlation for causation. In the paper, we therefore argue for an approach to extract causal relations from text, and represent them in the form of Knowledge Graphs (KG), to empower downstream ML applications, or AI systems in general, with the ability to distinguish correlation from causation and reason with causality in an explicit manner. So far, the bottlenecks in KG creation have been scalability and accuracy of automated methods. Hence, we argue that two novel features are required from methods for addressing these challenges, i.e. (i) the use of Knowledge Patterns to guide the KG generation process towards a certain resulting knowledge structure, and (ii) the use of a semantic referee to automatically curate the extracted knowledge. We claim that this will be an important step forward for supporting interpretable AI systems, and integrating ML and knowledge representation approaches, such as KGs, which should also generalise well to other types of relations, apart from causality.

1 Introduction

Knowledge Graphs (KGs) have emerged in the past decade as a prominent form of knowledge representation, frequently used by large enterprises such as Google, Facebook, Amazon, Siemens, and many more [16]. A KG is simply a graph representing some set of data, usually coupled with a way to explicitly represent the meaning of the data, e.g. an ontology. This can be seen as a revival of graph-based knowledge representation, with roots in the early 1970's (for instance, the term knowledge graph was used as early as 1972 by [28]), but with recent advances mainly related to the Semantic Web, such as Linked Data on the Web, and Semantic Web ontologies. This renewed popularity has been accelerated by two main realisations regarding Machine Learning (ML), including Deep Learning (DL) models: although outperforming humans on many specific tasks, ML/DL methods (i) are often unable to determine the semantics of the correlations found in the data, and (ii) lack the ability to transparently explain a prediction. A particularly challenging example is the case of causal relations. As pointed out by [17], the future development of AI depends on building systems that incorporate the notion of causality, e.g. to allow the system to reason about situations that have not been previously encountered, based on general principles. There is an active field of research developing specific ML/DL algorithms targeting causal learning and reasoning. However, only targeting ML/DL-based causal reasoning does not necessarily improve interpretability, hence there is a need to also develop methods for producing and utilising interpretable causal models, as we shall discuss further in Section 3.

KGs, being symbolic models, make it possible to define the semantics of relations in data, at the level of formalisation necessary for an intended task, e.g., through ontologies if needed, and by integration with ML/DL methods this supports interpretability of predictions. Hence, KGs can be used to address both the main shortcomings of ML/DL mentioned earlier, but the construction of KGs is a major bottleneck in their adoption, just as was the case with knowledge representation in general, in early AI systems. Outside large companies, such as Google, and huge crowdsourcing initiatives, such as Wikidata [32], it is usually infeasible to construct large scale KGs "manually". Rather, they have to be bootstrapped from existing sources, such as semi-structured data or text. Current KG generation algorithms, however, either do not take into account the desired formalisation of the KG at all, or they hard-code it into the extraction algorithm. An example of the latter is DBpedia [20], which is specific to a Wiki source and whose results are expressed using a fixed ontology, which means the method does not generalise to new settings or other input structures. Additionally, the quality of the generated KGs is usually poor [11], requiring manual curation, and further, no automated approach so far targets complex relations, e.g. causality. Therefore, it is our goal to specifically target new methods and algorithms for KG generation from text, which (a) explicitly take KG requirements into account, e.g. by allowing the required schema of the output graph to be flexibly specified, and (b) automate the curation process, to radically improve the quality of resulting KGs. In order to fulfil a specific set of KG requirements, as well as to achieve a sufficient level of accuracy, we propose to use the notion of Knowledge Patterns (KPs) [13] as formalisations of KG requirements. A KP represents both a linguistic frame that can be detected in text [2] and the representation of that frame in the desired KG output formalism, i.e. similar to the notion of Ontology Design Patterns (ODP) [5, 4]. In order to tackle a particularly important obstacle to the future development of the AI field, i.e., considering the importance of causal models and reasoning, we intend to specifically target KPs and KGs targeting complex causal relations.

2 ML - Causality and Interpretability

While ML methods perform very well in learning complex connections between large amounts of input and output data, there is no guarantee that they capture causation (cause and effect relations). This shortcoming stems in part from the ignorance of data-driven methods with respect to reasoning techniques, which are effortlessly applied by humans. Consider two imaginary groups of people: Group A: 100 asthmatic people with a death rate of 40%, and Group B: 100 asthmatic people who also suffer from pneumonia, with a death rate of 35%. An ML method fed solely with this data can only learn a nonsensical result: asthmatics with pneumonia have a better chance of survival! [8]. The learning method has perhaps learned the associations (or correlations) among the variables in the data correctly. However, due to the absence of context and common sense knowledge, and also the lack of reasoning abilities, the method has not been able to explicitly and correctly capture the cause-effect relations. That is why the outcome of the example above is not only counterintuitive, but also misleading. By context, we refer to any information that may not be represented in the observed data directly, and may include the actual causes behind the observations, e.g., some set of background information about the setting. In the given example, people in Group B are higher-risk patients than those in Group A. The lower death rate of people in Group B can have different reasons. For instance, due to their high risk status they may be more likely to be taken to the intensive care unit (ICU), or they may be taking more effective medicines, which are all factors (or features) not considered by the learning model [29]. Additionally, some common sense knowledge, such as that additional diseases generally increase mortality rather than decreasing it, could have also supported a system in avoiding the erroneous conclusion, e.g., through using knowledge representations as a referee for the learned model [1], as we will discuss further later on in this paper.

To provide sufficient support for a reliable and precise prediction or diagnosis process, every prediction made by a system needs to be perfectly transparent and interpretable by the user. This is necessary for any autonomous system to act as the support for humans in making decisions, and is even legally and ethically required in many domains, including the medical domain. Although ML should definitely be a part of the solution, what is predicted needs to be interpretable, so that any conclusion based on that knowledge can be explained in detail, most often including some notion of reliability or confidence. A solution to this shortcoming of ML methods is to integrate them with explicitly represented knowledge, such as, in the case of causality, a formal causal model that reflects all the possible and existing relations, including cause-effect ones, among the concepts of a given domain.

3 Causal Models

By causal model, we refer to a parametric model that represents a set of probability densities over variables including concepts defined in a system (e.g., diseases and symptoms in the context of medicine), together with the plausible causal relationships between them [31]. Once available, integration of causal information (inferred from a causal model) with the training (observational) data can enable an ML/DL method to also learn the causes behind its mistakes (i.e., misclassification) [1], and consequently improve its performance. In this paper we specifically target causal relations, i.e. the focus is not on determining the probability distributions but rather on the underlying knowledge representation.

Although recent research reflects the considerable impact of causal inference in different domains, such as public health [15] or earth science and climate change [27], it is still challenging to involve causal models within a learning process. One of the hindering factors is, in fact, the lack of available domain-related causal models compatible with the data used for learning [22], which leads to the need to manually create such models for each use case. However, for many domains nowadays, such as e-health and patient monitoring through smart homes, both the set of potential outcomes and the set of variables are extremely large. Therefore, manually constructing and maintaining causal models requires a huge effort, and cannot be easily adapted to a new domain. Even further, manual construction of models representing all the environmental features and relations may not even be practically feasible, due to the changing nature of the environment. This has already changed the focus of research to automatically generating causal models [22], which is a line of research we are also contributing to.

Furthermore, causal relations are usually not as simple as one explicit link between two well-defined (cause and effect) concepts. Depending on the context and the conditions, we may, for instance, end up with a set of causations with different certainty values. The appropriate modelling of the causal relations also heavily depends on the use cases of the resulting model, e.g., the kind of reasoning and prediction tasks that it should support. For instance, reasoning on potential guideline and treatment interactions in an individual patient context, e.g., the target use case of [7], requires a highly complex causal model, while in other cases a simpler one might suffice. In Fig. 1, we illustrate this through two examples. At the right (b) is a highly complex conceptual model (inspired by the model in [7]) representing the belief that a causal relation exists, with some frequency and strength. At the left (a) is also a causal relation, but represented as a much simpler conceptual model.

Our proposed method intends to address the lack of causal models, by automating the generation of highly accurate causal KGs from text. We intend to cater for the differing requirements of specific use cases by using Knowledge Patterns (KPs), similar to the conceptual models in Fig. 1 coupled with linguistic frames, to represent requirements that make sure the resulting causal model enables the required type of reasoning or predictions.

4 Proposed Approach: Generating Causal KGs from Text

The overarching goal of our research is to support the integration of ML/DL and Knowledge Representation, for improving both accuracy and interpretability of downstream AI applications. As discussed previously, we believe that KGs can play a crucial role in this integration, but then the KG construction bottleneck needs to be resolved. Therefore, we propose to develop new methods and algorithms for KG generation from text, which (a) explicitly take KG requirements into account, e.g. by allowing the required schema of the output graph to be flexibly specified, and (b) automate the KG curation process to radically improve the quality of resulting KGs with minimal human effort. In order to fulfil a specific set of KG requirements, as well as to achieve a sufficient level of accuracy, we argue that the notion of Knowledge Patterns (KPs) [13], as formalisations of KG requirements, is a crucial concept. We here specifically focus on KPs and KGs targeting causal relations, since causal models and causal reasoning are among the main challenges for ML approaches today. However, the approach we outline is generic, and by exchanging the KPs used, it can be used to target any type of complex relation that can be expressed in natural language. The proposed approach is a novel combination of methods from ML/DL for NLP, with recent advancement in Knowledge Representation, such as KGs and KPs.

As can be seen in Fig. 2, we propose a continuous process that iteratively improves its ML/DL models based on feedback from a curation step.

[Figure 1: two conceptual models, labelled a) and b). Labels recoverable from the figure: Cause, Effect, causation; Event Type, Action Type, Situation Type, Transition Type, Causation Belief; causes, hasAsCause, hasAsEffect, hasPreSituation, hasPostSituation, similarTo, opposedTo, incompatibleWith, frequency, strength.]

Figure 1. Abstract (conceptual) illustration of two different KPs (here called a and b) for expressing causation, where the patterns produce models at different levels of detail and complexity, hence, targeted at different use cases of the resulting KG. Notation in the figure is informal, but the models could be expressed using an ontology language, such as OWL (as in [7]), in which case the boxes with rounded edges would represent classes, the unfilled arrows subclass relations, and the filled arrows would be object or datatype properties attached to classes based on domain and range restrictions.
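The difference between the two kinds of pattern described in the Figure 1 caption can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not the authors' formalisation: the class names SimpleCausation and CausationBelief, the to_triples helper, the identifier belief-1 and the placeholder values for frequency and strength are all hypothetical, and plain tuples stand in for a real ontology language such as OWL. Only the property names (causation, hasAsCause, hasAsEffect, frequency, strength) are borrowed from the figure labels, and how they connect here is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A triple is (subject, predicate, object); plain strings stand in for IRIs.
Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class SimpleCausation:
    """Hypothetical rendering of the simple pattern (a): one direct link."""
    cause: str
    effect: str

    def to_triples(self) -> List[Triple]:
        return [(self.cause, "causation", self.effect)]


@dataclass(frozen=True)
class CausationBelief:
    """Hypothetical, heavily simplified rendering of pattern (b).

    The causal claim becomes a node of its own, so that qualifiers such as
    frequency and strength can be attached to the claim itself.
    """
    belief_id: str
    cause: str
    effect: str
    frequency: str
    strength: str

    def to_triples(self) -> List[Triple]:
        return [
            (self.belief_id, "type", "CausationBelief"),
            (self.belief_id, "hasAsCause", self.cause),
            (self.belief_id, "hasAsEffect", self.effect),
            (self.belief_id, "frequency", self.frequency),
            (self.belief_id, "strength", self.strength),
        ]


# Illustrative instance; the frequency and strength values are placeholders.
simple = SimpleCausation(cause="SARS-CoV-2", effect="COVID-19")
belief = CausationBelief(
    belief_id="belief-1",
    cause="SARS-CoV-2",
    effect="COVID-19",
    frequency="placeholder-frequency",
    strength="placeholder-strength",
)

for triple in simple.to_triples() + belief.to_triples():
    print(triple)
```

In this sketch the simple pattern yields one triple per causal statement, while the reified pattern yields five, which is what leaves room for attaching frequency and strength to the belief rather than to the cause or the effect.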
[Figure 2: process diagram with numbered steps 1-5. Labels recoverable from the figure: KG Requirements; Causal Knowledge Pattern(s); KP Selection; KP Contextualisation; Text corpus; ML/NLP tools; Trained model(s); Tuning and retraining; KG Generation; Knowledge Graph; KG Curation & Repair; Inference engine; Downstream Tasks (future usage), with the use cases "Medical decision support: Diagnosis/explanation" and "e-Science for climate change: Mitigation action/consequence analysis".]

Figure 2. We propose to use KPs to guide the iterative ML process for extracting a Knowledge Graph from unstructured texts, as well as automating the curation process using a semantic referee. The generated Knowledge Graph can later be used to support an ML method to derive causal relations from observational data.

As initial input (1), the process needs a set of KPs representing the requirements of the output KG, one or more language models, as well as a text corpus from which to extract the KG. The language models then need to be tuned (2) to the relations expressed by the specific KPs at hand. Next, initial instantiations of the KPs, i.e., the linguistic frames they represent, are detected in the text corpus and formalised using the KP as a "schema" (3), whereafter these KP instantiations are merged into an initial KG (4). The initial KG is then subjected to an automated curation and repair process (5), where the formalisation of the KPs is used by a semantic referee to detect potential mistakes in the extracted KG, and suggest repair actions. The result of this curation process is not only a high-quality KG, but also feedback sent back in order to tune, or even retrain, the language models, and to iteratively extend the KG by continuously running the overall process. Below, we go into more details of the ML/DL-based NLP methods to be used, the role and nature of the KPs, and the semantic referee used for curation, respectively.

4.1 Relation Extraction from Text

Causal relations can be extracted from running text by exploiting linguistic cues and then the detected relations can be formalized, for instance, in the form of simple facts (triples). For example, the causal relation in the sentence COVID-19 is caused by the SARS-CoV-2…virus, can be formalized as the fact [...], which could be represented as a triple in a standard Knowledge Representation language, such as RDF (https://www.w3.org/RDF/). However, in many cases more complex representations are needed, such as including unknown variables, as introduced by Pearl [24], in the notion of Structural Causal Models. An example is creating a relation such as: COVID-19 := f(SARS-CoV-2, randomness), which means that the appearance of the COVID-19 disease depends on the virus and some other random variable(s) independent from the virus, e.g. environmental factors, and features of the person in question. (The notation is again informal, but the symbol := is here used to indicate the causal relation, and f() represents a function.) An illustration of two conceptual models for representing causal relations was already given in Fig. 1. To instantiate such models (i.e., such KPs), linguistic expressions of causation such as caused by, cause, as a result, for this reason, due to the fact, consequently and similar are cues identified by NLP tools, such as Part-of-Speech Taggers, Dependency Parsers, lexicons, and the like [19, 9].

The NLP task that will contribute to our envisioned approach is mainly Information Extraction (IE), or more specifically Relation Extraction (RE). Recent work in this field includes [30], who describe an innovative approach for relation learning, based on the pre-training of a huge language model, such as BERT [10], passing sentences through its encoder to obtain an abstract notion of a relation, and then fine-tuning on a certain schema, like Wikidata or DBpedia, mainly containing simple binary relations. Our aim is similar, although we intend to develop a slightly different method that can be tuned to (a combination of) a set of smaller, abstract KPs, tar[...] also able to extract generic relations, i.e., potential schema extensions, which might be a valuable addition in our proposed curation and feedback step. Earlier work on frame detection in text [14, 12], and generation of KGs from this, may also be relevant for comparison, especially since [14] also applied the notion of KPs related to the frames detected. However, they did not allow for the frames to be preselected as the KG requirements, or exchanged.

Further, NELL [23] targets the learning of common facts, extracted from natural language texts. Although their approach does not target a specific output structure or relation, i.e., specific KPs, the continuous improvement process is similar to our proposal. In other recent studies, such as by [25], KGs are also generated from natural language text, but they do not target complex relations such as causality, and the approaches use a fixed output schema.

Very little research exists on extracting more complex relations, i.e. relations that cannot be expressed as single facts (triples), and in particular causal relations, directly from text. One study that generated causal KGs from text is [26]. The difference to our envisioned approach is mainly the types of input data, as well as that [26] targets one fixed logical structure of the output, i.e., a single fixed KP expressing simple direct relations between diagnoses and symptoms. To learn a more complex formalisation of causality, we may also need more complex learning, such as suggested by [18], who proposed a method for extracting a relation graph directly from natural language, where the relations express entailment rules rather than simple facts (triples).

Another area where NLP has been widely used is KG completion, e.g., link and relation prediction in an existing KG. Although we intend to generate a KG "from scratch", the KG generation from instantiated KPs, as well as the subsequent curation process, have some similarities with link and relation prediction. Hence, inspiration may come from work such as [33], who propose to use pre-trained language models for knowledge graph completion, scoring candidate triples for addition through their KG-BERT model. This is similar to how we envision assessing potential links between the instantiated KPs, when generating the overall KG. Another approach was recently proposed by [6], where language models such as GPT-2 are combined with a seed KG, allowing the learning of its structure and relations, whereafter the language model can generate new nodes and edges. However, our KPs are abstract and do not contain concrete facts, which is a main difference to the seed KGs they used.

4.2 Knowledge Patterns

The use of patterns in developing knowledge representation models has a long tradition in AI, starting from the idea of Minsky in his proposal of frames [21], and continued towards the notion of ontology design patterns (ODP) in modern ontology engineering [5, 4]. ODPs have also been generalised into KPs [13], where a KP may represent both a linguistic frame that can be detected in text [3] and the representation of that frame in a desired output formalism. However, in [13], KPs are described and defined informally, and there is currently no concrete formalism for representing and applying KPs specifically for KGs.

In order to capture specific types of knowledge from text, supporting a specific task, such as medical decision support, the knowledge extraction process needs to be carefully guided by the requirements of the intended task of the resulting KG. Tasks may include different types of queries, prediction, applying specific graph pattern matching algorithms, or reasoning. To address this challenge we argue for applying KPs both as a representation of the KG requirements and as a "schema" for the resulting KG. In short, as shown in Figure 2, we propose to tune the language models to detect the specific KPs required, and further generate a KG from the instantiated KPs.

Using KPs to guide the learning process makes it possible to capture different possible contextual situations separately, and target different causal models, each focused on a certain specific downstream task. Depending on the relations that are found in the text, KPs will also allow us to calculate more precise certainty values for each captured cause, similar to how we have used knowledge representations as a referee for ML methods in our previous work [1]. This also allows us to filter out extracted knowledge that does not make sense, or is otherwise of questionable quality.

However, this also introduces new challenges, because although KPs have been studied to some extent for ontologies and the Semantic Web, there is so far no formal definition of a KP that can be used operationally (technically) by a system, in particular for KGs. For this purpose we need to operationalise the definition in [13], by expanding on the connection between linguistic frames and ODPs, for use within our KG extraction framework.

4.3 Semantic Referee

Related to the integration of ML/DL and symbolic models, and using knowledge representation to verify and repair results of ML/DL algorithms, we rely on the idea of a semantic referee introduced in our previous work [1]. In that work, we demonstrated the benefit of a semantic referee applied upon a causal model in the form of an ontology (OntoCity) for improving a satellite imagery data classifier. In particular, the ontology together with a reasoning process acted as a semantic referee to guide the ML method (i.e., the classifier). Using causal information represented in the ontology, the semantic referee was able to explain the causes behind errors, and send the explanations as feedback to the classifier. In this way, the ML method is able to know the causes behind its mistakes and therefore better learn from them [1]. We argue that this previous work will be highly useful when integrated as step (5) in our KG extraction framework, illustrated earlier in Fig. 2.

5 Conclusion

In this paper, we propose a possible approach to capturing causal knowledge, in a scalable fashion, and representing it as a shared KG. We argue that the advantage of constructing causal KGs is the integration of causality in reasoning and prediction processes, such as the medical diagnosis process, to improve the accuracy and reliability of existing ML/DL-based diagnosis methods, by producing transparent justifications and explanations of the output.

More specifically we focus on KGs as a means for providing background knowledge and reasoning capabilities to ML/DL-based AI systems, and target the KG creation bottleneck. In particular, we recognise the challenge related to causal relations, where the capability of performing causal reasoning is often lacking in pure ML-based systems. Therefore we propose to generate causal KGs from textual information, to then be used as the basis for causal models. Our novel framework is based on using a set of formal KPs as input, acting both as the requirements of the KG and as the means for formalising the extracted knowledge and curating it through logical reasoning.

REFERENCES

[1] Marjan Alirezaie, Martin Längkvist, Michael Sioutis, and Amy Loutfi, ‘Semantic referee: A neural-symbolic framework for enhancing geospatial semantic segmentation’, 10, 863–880, (2019).
[2] Collin F. Baker, Charles J. Fillmore, and John B. Lowe, ‘The berkeley framenet project’, in Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 1, ACL ’98/COLING ’98, p. 86–90, USA, (1998). Association for Computational Linguistics.
[3] Collin F Baker, Charles J Fillmore, and John B Lowe, ‘The berkeley framenet project’, in Proceedings of the 17th international conference on Computational linguistics-Volume 1, pp. 86–90. Association for Computational Linguistics, (1998).
[4] Eva Blomqvist, Karl Hammar, and Valentina Presutti, ‘Engineering ontologies with patterns - the extreme design methodology’, in Ontology Engineering with Ontology Design Patterns - Foundations and Applications, eds., Pascal Hitzler, Aldo Gangemi, Krzysztof Janowicz, Adila Krisnadhi, and Valentina Presutti, volume 25 of Studies on the Semantic Web, 23–50, IOS Press, (2016).
[5] Eva Blomqvist and Kurt Sandkuhl, ‘Patterns in ontology engineering: Classification of ontology patterns’, in ICEIS 2005, Proceedings of the Seventh International Conference on Enterprise Information Systems, Miami, USA, May 25-28, 2005, eds., Chin-Sheng Chen, Joaquim Filipe, Isabel Seruca, and José Cordeiro, pp. 413–416, (2005).
[6] Antoine Bosselut, Hannah Rashkin, Maarten Sap, Chaitanya Malaviya, Asli Celikyilmaz, and Yejin Choi, ‘Comet: Commonsense transformers for automatic knowledge graph construction’, arXiv preprint arXiv:1906.05317, (2019).
[7] V. Carretta Zamborlini, Knowledge Representation for Clinical Guidelines: with applications to Multimorbidity Analysis and Literature Search, Ph.D. dissertation, Vrije Universiteit Amsterdam, 2017.
[8] Rich Caruana, Yin Lou, Johannes Gehrke, Paul Koch, Marc Sturm, and Noemie Elhadad, ‘Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission’, in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’15, p. 1721–1730, New York, NY, USA, (2015). Association for Computing Machinery.
[9] Tirthankar Dasgupta, Rupsa Saha, Lipika Dey, and Abir Naskar, ‘Automatic extraction of causal relations from text using linguistically informed deep neural networks’, in Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, pp. 306–316, Melbourne, Australia, (July 2018). Association for Computational Linguistics.
[10] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova, ‘Bert: Pre-training of deep bidirectional transformers for language understanding’, in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186, (2019).
[11] Michael Färber, Frederic Bartscherer, Carsten Menne, and Achim Rettinger, ‘Linked data quality of dbpedia, freebase, opencyc, wikidata, and YAGO’, Semantic Web, 9(1), 77–129, (2018).
[12] Marco Fossati, Emilio Dorigatti, and Claudio Giuliano, ‘N-ary relation extraction for simultaneous t-box and a-box knowledge base augmentation’, Semantic Web, 9(4), 413–439, (2018).
[13] Aldo Gangemi and Valentina Presutti, ‘Towards a pattern science for the semantic web’, Semantic Web, 1(1-2), 61–68, (2010).
[14] Aldo Gangemi, Valentina Presutti, Diego Reforgiato Recupero, Andrea Giovanni Nuzzolese, Francesco Draicchio, and Misael Mongiovì, ‘Semantic web machine reading with fred’, Semantic Web, 8(6), 873–893, (2017).
[15] Thomas A. Glass, Steven N. Goodman, Miguel A. Hernán, and Jonathan M. Samet, ‘Causal inference in public health’, (March 2013).
[16] Aidan Hogan, Eva Blomqvist, Michael Cochez, Claudia d’Amato, Gerard de Melo, Claudio Gutierrez, José Emilio Labra Gayo, Sabrina Kirrane, Sebastian Neumaier, Axel Polleres, Roberto Navigli, Axel-Cyrille Ngonga Ngomo, Sabbir M. Rashid, Anisa Rula, Lukas Schmelzeisen, Juan Sequeda, Steffen Staab, and Antoine Zimmermann. Knowledge graphs, 2020.
[17] J. Pearl and D. Mackenzie, The book of why: the new science of cause and effect, Basic Books, 2018.
[18] Mohammad Javad Hosseini, Nathanael Chambers, Siva Reddy, Xavier R Holt, Shay B Cohen, Mark Johnson, and Mark Steedman, ‘Learning typed entailment graphs with global soft constraints’, Transactions of the Association for Computational Linguistics, 6, 703–717, (2018).
[19] Christopher S. G. Khoo, Syin Chan, and Yun Niu, ‘Extracting causal knowledge from a medical database using graphical patterns’, in Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, pp. 336–343, Hong Kong, (October 2000). Association for Computational Linguistics.
[20] Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo N. Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick van Kleef, Sören Auer, and Christian Bizer, ‘Dbpedia - A large-scale, multilingual knowledge base extracted from wikipedia’, Semantic Web, 6(2), 167–195, (2015).
[21] Marvin Minsky, ‘A framework for representing knowledge’, MIT-AI Laboratory Memo 306.
[22] Riccardo Miotto, Fei Wang, Shuang Wang, Xiaoqian Jiang, and Joel Dudley, ‘Deep learning for healthcare: review, opportunities and challenges’, Briefings in bioinformatics, 19(6), 1236–1246, (2018).
[23] Tom Mitchell, William Cohen, Estevam Hruschka, Partha Talukdar, Bishan Yang, Justin Betteridge, Andrew Carlson, Bhanava Dalvi, Matt Gardner, Bryan Kisiel, et al., ‘Never-ending learning’, Communications of the ACM, 61(5), 103–115, (2018).
[24] Judea Pearl, ‘Causal diagrams for empirical research’, Biometrika, 82(4), 669–688, (December 1995).
[25] Anderson Rossanez and Julio Cesar dos Reis, ‘Generating knowledge graphs from scientific literature of degenerative diseases’, (2019).
[26] Maya Rotmensch, Yoni Halpern, Abdulhakim Tlimat, Steven Horng, and David Sontag, ‘Learning a health knowledge graph from electronic medical records’, Scientific reports, 7(1), 1–11, (2017).
[27] J. Runge, S. Bathiany, E. Bollt, G. Camps-Valls, D. Coumou, E. Deyle, C. Glymour, M. Kretschmer, M.D. Mahecha, E.H. van Nes, J. Peters, R. Quax, M. Reichstein, B. Scheffer, M. Schölkopf, P. Spirtes, G. Sugihara, J. Sun, Ka. Zhang, and J. Zscheischler, ‘Inferring causation from time series with perspectives in earth system sciences’, Nature Communications, (2019).
[28] Edward W. Schneider, ‘Course modularization applied: The interface system and its implications for sequence control and data analysis’, (1973).
[29] Peter Schulam and Suchi Saria, ‘Reliable decision support using counterfactual models’, in Advances in Neural Information Processing Systems 30, eds., I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, 1697–1708, Curran Associates, Inc., (2017).
[30] Livio Baldini Soares, Nicholas FitzGerald, Jeffrey Ling, and Tom Kwiatkowski, ‘Matching the blanks: Distributional similarity for relation learning’, in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 2895–2905, (2019).
[31] Peter Spirtes, ‘Introduction to causal inference’, J. Mach. Learn. Res., 11, 1643–1662, (August 2010).
[32] Denny Vrandecic and Markus Krötzsch, ‘Wikidata: a free collaborative knowledgebase’, Commun. ACM, 57(10), 78–85, (2014).
[33] Liang Yao, Chengsheng Mao, and Yuan Luo, ‘Kg-bert: Bert for knowledge graph completion’, arXiv preprint arXiv:1909.03193, (2019).