<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Using a General Prior Knowledge Graph to Improve Data-Driven Causal Network Learning</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Meghamala</forename><surname>Sinha</surname></persName>
							<email>sinham@oregonstate.edu</email>
							<affiliation key="aff0">
								<orgName type="department">School of Electrical Engineering and Computer Science</orgName>
								<orgName type="institution">Oregon State University</orgName>
								<address>
									<postCode>97331</postCode>
									<settlement>Corvallis</settlement>
									<region>OR</region>
									<country key="US">United States</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Stephen</forename><forename type="middle">A</forename><surname>Ramsey</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">School of Electrical Engineering and Computer Science</orgName>
								<orgName type="institution">Oregon State University</orgName>
								<address>
									<postCode>97331</postCode>
									<settlement>Corvallis</settlement>
									<region>OR</region>
									<country key="US">United States</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Department of Biomedical Sciences</orgName>
								<orgName type="institution">Oregon State University</orgName>
								<address>
									<postCode>97331</postCode>
									<settlement>Corvallis</settlement>
									<region>OR</region>
									<country key="US">United States</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff2">
								<orgName type="institution">Stanford University</orgName>
								<address>
									<addrLine>March 22-24</addrLine>
									<postCode>2021</postCode>
									<settlement>Palo Alto</settlement>
									<region>California</region>
									<country key="US">USA</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Using a General Prior Knowledge Graph to Improve Data-Driven Causal Network Learning</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">0CC2D2BC1EF9EDFACF82B83AEBA4814A</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T08:18+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Causal inference</term>
					<term>Structure learning</term>
					<term>Knowledge graph</term>
					<term>Informative prior</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We describe a method, "Kg2Causal", for using a large-scale, general-purpose biomedical knowledge graph as a prior for data-driven causal network structure learning. Given a set of observed nodes in a dataset and some relationship edges between the nodes derived from a knowledge graph, Kg2Causal uses the knowledge graph-derived edges to guide the data-driven inference of a causal Bayesian network. We tested Kg2Causal on several real-world biological datasets with known ground-truth networks and demonstrate an improvement in network learning accuracy relative to a baseline of an uninformative network structure prior. We also demonstrate the application of our method when data are collected under different experimental conditions, including interventions on the observed variables.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Causal modeling is a useful analytical tool in various fields due to its applicability in action planning, prediction, and diagnosis <ref type="bibr">[1,</ref><ref type="bibr">2,</ref><ref type="bibr" target="#b4">3,</ref><ref type="bibr" target="#b5">4,</ref><ref type="bibr" target="#b6">5]</ref>. However, learning a causal Bayesian network (CBN) solely from data is a challenging task <ref type="bibr" target="#b7">[6,</ref><ref type="bibr" target="#b8">7,</ref><ref type="bibr" target="#b9">8]</ref>. CBN learning can be framed as a model selection problem in which the model is a directed acyclic graph (DAG) and the goal is to find the graph 𝐺 that maximizes some objective (score) function of the dataset 𝐷. In some CBN learning methods, the score function is the likelihood 𝑝(𝐷 | 𝐺), representing the overall fit of 𝐺 to 𝐷 in the context of a generative model for the data. For a dataset with 𝑛 observables (features), the number of possible DAGs (and thus the amount of data required) grows super-exponentially with 𝑛 <ref type="bibr" target="#b10">[9]</ref>. In most network learning applications, prior knowledge exists about causal (or suspected causal) relationships among the observables; such prior knowledge can be a valuable resource for network structure learning <ref type="bibr" target="#b11">[10]</ref>. Supposing that this prior knowledge can be represented as a prior probability 𝑝(𝐺) on the network structure, one can alternatively base the CBN scoring function on the posterior probability 𝑝(𝐺 | 𝐷) = 𝑝(𝐷 | 𝐺) 𝑝(𝐺)/𝑝(𝐷). In contrast to the substantial amount of work on a variety of marginal likelihood and scoring methods, less attention has been given to the functional form (and associated parameterization) of the prior 𝑝(𝐺) for application contexts where structured prior knowledge is available. 
Without expert knowledge, standard network inference approaches by default assume a uniform (uninformative) prior, which can lead to erroneous relationships or relationship orientations due both to (i) the size of the space of networks and (ii) the degeneracy of Markov-equivalent networks. Proper incorporation of informative priors can enhance model efficiency <ref type="bibr" target="#b12">[11]</ref> and can also compensate for smaller datasets.</p><p>For most applications of causal modeling, some prior knowledge is available. For example, in medicine, prior knowledge about the etiology, symptoms, and treatment of underlying diseases or conditions can usually be obtained from the biomedical literature or from knowledgebases. Although structured prior knowledge (for example, ontologies) is broadly available in various scientific domains, it mostly comprises disparate information sources in various standards and formats, which makes integrating it into a single structure challenging. These problems motivated the building of large multigraphs called knowledge graphs (KG) <ref type="bibr" target="#b13">[12]</ref> that incorporate structured knowledge from multiple sources within a consistent schema. "Knowledge graph" is a term of art for a large graph-structured model that stores interlinked relationships between nodes representing concepts <ref type="bibr" target="#b14">[13]</ref>. These large-scale networks accommodate structural information which can be leveraged for reasoning, recommendation, or decision making. We hypothesized that combining information from structured databases of general prior knowledge with causal modeling based on context-specific multivariate measurements would improve the accuracy of the learned network compared to the result of data-driven causal modeling without prior knowledge. 
In this work, we propose a method, "Kg2Causal", for extracting relations as pairs of nodes from a knowledge graph, and for incorporating them as priors on the corresponding edges in a score-based, data-driven causal network learning method. In this study, prior edges from the knowledge graph are accounted for in the prior probability of the graph, using the method of Castelo and Siebes <ref type="bibr" target="#b12">[11]</ref>. We used a large-scale biomedical knowledge graph (KG) <ref type="foot" target="#foot_0">1</ref> that we and collaborators (see Acknowledgments) had previously constructed (see Sec. 2.5), containing millions of nodes representing drugs, genes, diseases, or phenotypes (Fig. <ref type="figure" target="#fig_0">1b</ref>), as well as edges between nodes representing various types of predicate relationships. For the measurement-based network learning component of Kg2Causal, we used an optimizing method combining the Tabu search algorithm <ref type="bibr" target="#b15">[14]</ref> with the Bayesian Dirichlet equivalent uniform (BDeu) <ref type="bibr" target="#b16">[15,</ref><ref type="bibr" target="#b17">16]</ref> network score. Using five different multivariate molecular biology datasets for which ground-truth networks were available (see Sec. 4), we empirically analyzed the network learning accuracy of Kg2Causal along with various types of uninformative priors to test the usefulness of adding KG-based priors. We provide a comparative benchmark of our method's performance over five real-world biological datasets and two synthetic datasets of varying sizes, where we found that Kg2Causal had superior network learning accuracy to methods that do not use a general knowledge graph as a network structure prior. Finally, we demonstrate (Sec. 4.3) the application of Kg2Causal when data are collected under different experimental conditions, including interventions on the observed variables. 
We implemented Kg2Causal in the R programming language (leveraging the bnlearn package <ref type="bibr" target="#b18">[17]</ref>) and provide the code as open-source software<ref type="foot" target="#foot_1">2</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related work and Background</head><p>In this section, we describe Kg2Causal's conceptual foundations including CBNs, score-based causal modeling, interventions, and knowledge graph-based priors in network learning.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Causal network: Brief Overview</head><p>A causal Bayesian Network <ref type="bibr">[1,</ref><ref type="bibr">2]</ref> is a DAG 𝐺 = (𝑉 , 𝐸), where 𝑉 = {𝑉 1 , … , 𝑉 𝑛 } denotes the set of variables (nodes) and 𝐸 ⊂ 𝑉 × 𝑉 denotes the causal relationships (edges). For an edge (𝑉 𝑖 , 𝑉 𝑗 ), we say that 𝑉 𝑖 is a parent (cause) of 𝑉 𝑗 , and 𝑉 𝑗 is a child (effect) of 𝑉 𝑖 . We will use Pa(𝑉 𝑖 ) to denote the set of parents of 𝑉 𝑖 . The conditional probability distribution 𝑝 𝑖 defines the probability of 𝑉 𝑖 given the states of its parents Pa(𝑉 𝑖 ). A causal network represents a joint distribution 𝑝 over variables 𝑉 as long as it satisfies two main assumptions: a) Causal Markov: Any given variable 𝑉 𝑖 is independent of its non-descendants, conditioned on all of its direct causes. The assumption implies that the joint distribution 𝑝(𝑉 ) can be factored as: 𝑝(𝑉 ) = ∏ 𝑛 𝑖=1 𝑝 𝑖 (𝑉 𝑖 | Pa(𝑉 𝑖 )). b) Faithfulness: The joint distribution 𝑝(𝑉 1 , … , 𝑉 𝑛 ) is faithful to 𝐺 if every conditional independence relation in 𝑝 is entailed by the causal Markov assumption applied to 𝐺 <ref type="bibr" target="#b19">[18]</ref>.</p></div>
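The causal Markov factorization above can be made concrete with a toy example. Below is a minimal Python sketch (the paper's own implementation is in R with bnlearn); the three-node DAG and the CPT values are invented purely for illustration.

```python
# Illustrative CPTs for a toy DAG A -> B <- C; values are made up.
cpts = {
    "A": {(): {0: 0.6, 1: 0.4}},                  # A has no parents
    "C": {(): {0: 0.7, 1: 0.3}},                  # C has no parents
    "B": {                                        # B is conditioned on (A, C)
        (0, 0): {0: 0.9, 1: 0.1},
        (0, 1): {0: 0.5, 1: 0.5},
        (1, 0): {0: 0.4, 1: 0.6},
        (1, 1): {0: 0.2, 1: 0.8},
    },
}
parents = {"A": (), "B": ("A", "C"), "C": ()}

def joint(assignment):
    """p(V) as the product of each node's CPT entry given its parents' states."""
    p = 1.0
    for node, pa in parents.items():
        pa_states = tuple(assignment[x] for x in pa)
        p *= cpts[node][pa_states][assignment[node]]
    return p

# p(A=1, B=1, C=0) = p(A=1) * p(B=1 | A=1, C=0) * p(C=0) = 0.4 * 0.6 * 0.7
p_example = joint({"A": 1, "B": 1, "C": 0})
```

By construction, summing `joint` over all eight assignments yields 1, as required of a valid joint distribution.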
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Constructing a causal network</head><p>Let us assume we have a dataset 𝐷 containing observations over a set of 𝑛 variables. One of the main classes of causal learning approaches is the score-based approach, derived from the classic Bayesian method, in which a scoring function evaluates the fit of a graph 𝐺 to the data 𝐷 <ref type="bibr" target="#b17">[16,</ref><ref type="bibr" target="#b7">6]</ref>, with a higher value indicating a better fit. A search algorithm is used to explore the space of all possible graphs in order to maximize the scoring function. Typical heuristic algorithms used for this purpose include hill-climbing and Tabu search <ref type="bibr" target="#b15">[14]</ref>. Other common score-based methods are GDS <ref type="bibr" target="#b20">[19]</ref> and GIES <ref type="bibr" target="#b7">[6]</ref>. By Bayes' rule, a causal graph 𝐺 is a DAG learned from given data 𝐷 as 𝑝(𝐺 | 𝐷) ∝ 𝑝(𝐺)𝑝(𝐷 | 𝐺), where 𝑝(𝐺) is the prior distribution over the space of all possible DAGs, reflecting prior knowledge, and 𝑝(𝐷 | 𝐺) is the marginal likelihood of the data 𝐷. As described in Sec. 3, the Kg2Causal method incorporates a score-based approach.</p></div>
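Score-based structure selection can be sketched compactly. The Python example below uses a K2-style Dirichlet marginal likelihood as a stand-in for BDeu (an assumption for brevity, not the paper's exact score) and exhaustively scores three candidate structures over two binary variables; it also illustrates why the two Markov-equivalent directed structures are hard to distinguish from observational data alone.

```python
# Sketch: Bayesian score-based model selection over tiny candidate DAGs.
# A K2-style Dirichlet(1) family score stands in for BDeu (illustrative).
import math
import random

def family_score(data, child, parents, r=2):
    """log marginal likelihood for one node's family (K2 score, r states)."""
    counts = {}
    for row in data:
        j = tuple(row[p] for p in parents)   # parent configuration
        counts.setdefault(j, [0] * r)[row[child]] += 1
    s = 0.0
    for njk in counts.values():
        nij = sum(njk)
        s += math.lgamma(r) - math.lgamma(nij + r)    # log (r-1)!/(N_ij+r-1)!
        s += sum(math.lgamma(n + 1) for n in njk)     # sum_k log N_ijk!
    return s

def graph_score(data, parent_sets):
    """Decomposable graph score: sum of family scores over all nodes."""
    return sum(family_score(data, c, ps) for c, ps in parent_sets.items())

# Simulate data in which X causes Y, then score candidate structures.
random.seed(0)
data = []
for _ in range(500):
    x = random.random() < 0.5
    y = random.random() < (0.9 if x else 0.1)   # Y depends strongly on X
    data.append({"X": int(x), "Y": int(y)})

candidates = {
    "X->Y": {"X": (), "Y": ("X",)},
    "Y->X": {"Y": (), "X": ("Y",)},
    "none": {"X": (), "Y": ()},
}
scores = {name: graph_score(data, ps) for name, ps in candidates.items()}
# Both directed (Markov-equivalent) models score far above the empty graph;
# an informative prior or interventional data is needed to pick a direction.
```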
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Learning with interventions</head><p>Interventions, i.e., external manipulations of nodes ("targets") in a network, are important for detecting causal relations and can help disambiguate Markov equivalent sub-networks <ref type="bibr" target="#b17">[16]</ref>. Let 𝐼 𝑒 represent the set of target nodes that are altered in interventional experiment 𝑒, and let 𝑂 𝑒 = 𝑉 \𝐼 𝑒 be the complementary set of observational variables. Each intervention can have one or more targets whose conditional probabilities are changed (so that, conditioned on the intervention, the target variable's distribution may depend only on a (possibly empty) subset of its parent observables). Hence, each intervention results in the deletion of arrows pointing towards the intervened nodes. The joint distribution of 𝑝 after the intervention is 𝑝(𝑉 1 , ..., 𝑉 𝑛 ) = ∏ 𝑉 𝑖 ∈𝑂 𝑒 𝑝 𝑖 (𝑉 𝑖 | 𝑃𝑎(𝑉 𝑖 ))⋅∏ 𝑉 𝑗 ∈𝐼 𝑒 𝑝 ′ 𝑗 (𝑉 𝑗 | 𝑃𝑎 ′ (𝑉 𝑗 )), where 𝑝 𝑖 (𝑉 𝑖 | 𝑃𝑎(𝑉 𝑖 )) is the usual conditional probability of 𝑉 𝑖 (given that 𝑉 𝑖 is not a target node), and 𝑝 ′ 𝑗 (𝑉 𝑗 | 𝑃𝑎 ′ (𝑉 𝑗 )) is the post-intervention conditional probability of 𝑉 𝑗 given its new set of parents 𝑃𝑎 ′ (𝑉 𝑗 ). For a so-called "perfect" intervention, one would set 𝑃𝑎 ′ (𝑉 𝑗 ) = ∅ <ref type="bibr">[1]</ref>. Score-based approaches are well-suited to mixed interventional-observational datasets, in contrast to constraint-based approaches, which are applicable only to observational data.</p></div>
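The graph mutilation implied by a perfect intervention (setting 𝑃𝑎′(𝑉 𝑗 ) = ∅ for each target) amounts to deleting every edge pointing into a target node. A minimal sketch, with an invented toy edge set:

```python
# Sketch: a "perfect" intervention deletes the edges into each target node,
# i.e. it sets Pa'(V_j) = {} for every V_j in the target set I_e.

def mutilate(edges, targets):
    """Return the edge set of the mutilated graph: drop edges into targets."""
    return [(u, v) for (u, v) in edges if v not in targets]

edges = [("A", "B"), ("B", "C"), ("D", "C")]
print(mutilate(edges, {"C"}))  # [('A', 'B')]
```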
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.">Incorporation of Priors</head><p>In this subsection we introduce three types of uninformative priors on the network structure 𝑝(𝐺): the uniform prior, the marginal prior, and the Bayesian variable selection prior (VSP). We then describe the knowledge graph-based prior that we use in the Kg2Causal method. In the absence of prior knowledge, the default choice for the prior 𝑝(𝐺) is a uniform prior distribution, as follows:</p><formula xml:id="formula_0">𝑝(𝐸 ∪ {𝑉 𝑖 , 𝑉 𝑗 } | 𝐷) 𝑝(𝐸 | 𝐷) = 𝑝(𝐸 ∪ {𝑉 𝑖 , 𝑉 𝑗 }) 𝑝(𝐸) 𝑝(𝐷 | 𝐸 ∪ {𝑉 𝑖 , 𝑉 𝑗 }) 𝑝(𝐷 | 𝐸)</formula><p>where a pair of nodes 𝑉 𝑖 and 𝑉 𝑗 can be in three possible states: 𝑉 𝑖 ⇒ 𝑉 𝑗 (representing (𝑉 𝑖 , 𝑉 𝑗 ) ∈ 𝐸), 𝑉 𝑖 ⇐ 𝑉 𝑗 (representing (𝑉 𝑗 , 𝑉 𝑖 ) ∈ 𝐸), or 𝑉 𝑖 ⇎ 𝑉 𝑗 (no arc), each with equal probability of occurrence. Since 𝑝(𝑉 𝑖 ⇒ 𝑉 𝑗 ) + 𝑝(𝑉 𝑖 ⇐ 𝑉 𝑗 ) + 𝑝(𝑉 𝑖 ⇎ 𝑉 𝑗 ) = 1, the probabilities are assigned as 𝑝(𝑉 𝑖 ⇒ 𝑉 𝑗 ) = 𝑝(𝑉 𝑖 ⇐ 𝑉 𝑗 ) = 𝑝(𝑉 𝑖 ⇎ 𝑉 𝑗 ) = 1/3. This implies 𝑝(𝑉 𝑖 ⇒ 𝑉 𝑗 ) + 𝑝(𝑉 𝑖 ⇐ 𝑉 𝑗 ) = 2/3, which favors the inclusion of new arcs and promotes the propagation of false positives in 𝐺. Hence, it is not always advisable to use a uniform prior, especially in cases where the data do not strongly support the learned DAG and where 𝑛 is large. A better variant is to use marginal probabilities instead, in which an independent prior is assumed for each arc with the same independent marginal probabilities as the uniform prior; this is also called the marginal uniform prior <ref type="bibr" target="#b21">[20]</ref>. In this case, the probability of inclusion of each edge is assigned as 𝑝(𝑉 𝑖 ⇒ 𝑉 𝑗 ) = 𝑝(𝑉 𝑖 ⇐ 𝑉 𝑗 ) = 1/4 and 𝑝(𝑉 𝑖 ⇎ 𝑉 𝑗 ) = 1/2. Compared to the uniform prior, the marginal uniform prior is less prone to false-positive edges in the posterior-probability-maximizing graph. The Bayesian variable selection prior (VSP) assigns a probability of inclusion to each possible parent node, with the default being 1/𝑛. 
The heart of Kg2Causal is the use of an informative prior based on a general-purpose knowledge graph; for this purpose we use an edge decomposition technique described by Castelo and Siebes <ref type="bibr" target="#b12">[11]</ref>. For any pair of vertices (𝑉 𝑖 , 𝑉 𝑗 ) for which an edge 𝑉 𝑖 ⇒ 𝑉 𝑗 exists in the general-purpose knowledge graph, we assign a prior probability 𝛽 = 1/2 to that edge, with probability 1/4 for 𝑉 𝑗 ⇒ 𝑉 𝑖 and probability 1/4 for 𝑉 𝑖 ⇎ 𝑉 𝑗 , since the latter two are the alternative states of the pair. For pairs that have no corresponding edge in the general knowledge graph, we use the uniform probability distribution, as shown in Fig. <ref type="figure" target="#fig_1">2</ref>. In this way we can create a complete prior probability (from partial knowledge) over the network 𝐺; on a log scale, we define 𝑝(𝐺) as log 𝑝(𝐺) = ∑ 𝑉 𝑖 ⇔𝑉 𝑗 ∈𝐸, 𝑖≠𝑗 log 𝑝(𝑉 𝑖 ⇔ 𝑉 𝑗 ) + ∑ 𝑉 𝑖 ⋯𝑉 𝑗 ∉𝐸, 𝑖≠𝑗 log 𝑝(𝑉 𝑖 ⇎ 𝑉 𝑗 ).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.5.">Knowledge Graphs</head><p>A "knowledge graph" <ref type="bibr" target="#b14">[13]</ref> is a multigraph consisting of nodes and edges (labeled by relationship type or by descriptions of instance attributes) between them. Although most relationships in knowledge graphs are context-based associations between entities, and such associations do not always imply a causal relationship, these links are nevertheless strong associations that can reinforce the causal relationships we seek to discover. The key idea of Kg2Causal is to use links from large knowledge graphs as generalized prior information to aid data-driven network learning in highly specific application contexts. For this work, we leveraged a general biomedical knowledge graph that we and collaborators (see Acknowledgments) had constructed, KG1 <ref type="foot" target="#foot_2">3</ref>.</p><p>KG1 has 130,443 nodes, 3.5M edges, 11 node semantic types, and 17 edge relation types, and was compiled from 20 different biomedical knowledge-bases (Monarch, COHD, ChEMBL, DGIdb, DisGeNet, Disease Ontology, GeneProf, HMDB, KEGG, miRBase, miRGate, mychem.info, mygene.info, NCBI Gene, OMIM, Pathway Commons, Pharos, PubChem, Reactome, and UniprotKB). We hosted KG1 in a Neo4j database (ver. 3.5.13) and used the Cypher query language to search for concept mappings between ground-truth network variables and concept nodes in the KG1 knowledge graph, and for edge connections between mapped concepts within KG1.</p></div>
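The edge-decomposition prior of Sec. 2.4 (𝛽 = 1/2 for a KG-supported direction, 1/4 for each alternative, and 1/3 per state for unsupported pairs) can be sketched as follows. This is an illustrative Python sketch, not the released Kg2Causal R code; the two-node example graph is invented.

```python
# Sketch of the Castelo-and-Siebes-style edge-decomposition prior with
# KG-supported edges at beta = 1/2 and a uniform 1/3 fallback.
import itertools
import math

def edge_prior(kg_edges, u, v, state):
    """state is 'u->v', 'v->u', or 'none' for the unordered pair {u, v}."""
    if (u, v) in kg_edges:
        return {"u->v": 0.5, "v->u": 0.25, "none": 0.25}[state]
    if (v, u) in kg_edges:
        return {"u->v": 0.25, "v->u": 0.5, "none": 0.25}[state]
    return 1.0 / 3.0  # no KG edge: uninformative fallback

def log_graph_prior(nodes, graph_edges, kg_edges):
    """log p(G) as a sum of independent per-pair edge-state log priors."""
    lp = 0.0
    for u, v in itertools.combinations(sorted(nodes), 2):
        if (u, v) in graph_edges:
            state = "u->v"
        elif (v, u) in graph_edges:
            state = "v->u"
        else:
            state = "none"
        lp += math.log(edge_prior(kg_edges, u, v, state))
    return lp

kg = {("smoking", "cancer")}          # hypothetical KG-derived edge
# A graph agreeing with the KG edge is a priori more probable than the
# reversed graph.
agree = log_graph_prior({"smoking", "cancer"}, {("smoking", "cancer")}, kg)
disagree = log_graph_prior({"smoking", "cancer"}, {("cancer", "smoking")}, kg)
```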
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Our Approach</head><p>We developed Kg2Causal to leverage a general-purpose biomedical knowledge graph (see Sec. 2.5) in order to improve context-specific, data-driven network learning from multivariate observations; such observations could consist of gene expression measurements, proteomics measurements, or electronic health records. The key ideas of our approach are (i) mapping each variable in the dataset to a node in the knowledge graph and querying the relationships between them; (ii) extracting a subgraph containing the connected variables with the edges between them; and (iii) using this edge set as prior knowledge to guide the score-optimization step of causal network inference. Mathematically, given a dataset 𝐷 with a set 𝑉 of observable variables and a general-purpose prior knowledge graph Γ (a multigraph), we want to learn a causal graph 𝐺 𝑓 = (𝑉 , 𝐸) that approximately maximizes the posterior probability, i.e., argmax 𝐺 (𝑝(𝐺 | 𝐷, Γ)), given a prior 𝑝(𝐺 | Γ). For comparison, we used three uninformative prior distributions, namely the uniform, marginal, and Bayesian variable selection priors, with each dataset in order to understand whether or not (and to what extent) an informative network prior improves the accuracy of causal network learning in a biomedical context. The Kg2Causal network discovery workflow, illustrated in Figure <ref type="figure" target="#fig_2">3</ref>, consists of the following steps:</p><p>• Map the variables 𝑉 to nodes in Γ, and extract a list 𝛽 of edges from Γ among the nodes (collapsing same-direction multiedges to single edges).</p><p>• Generate 100 random DAGs with nodes 𝑉 . We empirically determined, based on our previous study <ref type="bibr" target="#b22">[21]</ref>, that this number is adequate for medium-to-large datasets.</p><p>• In the score function, include edge probability contributions from the prior knowledge graph (we assign probability 0.5 to every edge in 𝛽). For each DAG, we used the stochastic Tabu search algorithm <ref type="bibr" target="#b15">[14]</ref> to find a DAG that maximizes the standard Bayesian Dirichlet equivalent uniform (BDeu) scoring function <ref type="bibr" target="#b16">[15,</ref><ref type="bibr" target="#b17">16]</ref>.</p><p>• The previous step yields 100 optimized networks. Using these, we compute the probability of each possible directed edge as its empirical frequency of occurrence among the DAGs. For example, if an edge (𝑋 , 𝑌 ) appears in 80 out of 100 optimized DAGs, we assign it an empirical probability of 0.80. We store the edge probabilities in a list.</p><p>• We threshold the edge probabilities in order to obtain the set of edges 𝐸 for 𝐺 𝑓 . Based on empirical studies, we chose a threshold of 0.85.</p><p>We chose Tabu search for its robustness, its simplicity (it uses few parameters), and its history-dependent ("memory") search, although Kg2Causal is in principle compatible with any optimizing method.</p></div>
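The final two workflow steps (edge-frequency estimation over the optimized DAGs, then thresholding at 0.85) can be sketched as follows. The list of DAGs below is a toy stand-in for the 100 Tabu-optimized networks; this is an illustrative Python sketch, not the released R code.

```python
# Sketch of arc-strength estimation and thresholding over optimized DAGs.
from collections import Counter

def arc_strengths(dags):
    """dags: list of directed-edge sets; returns {arc: empirical frequency}."""
    counts = Counter(arc for dag in dags for arc in dag)
    return {arc: n / len(dags) for arc, n in counts.items()}

def final_network(dags, threshold=0.85):
    """Keep only arcs whose empirical frequency meets the threshold."""
    return {arc for arc, p in arc_strengths(dags).items() if p >= threshold}

# Toy stand-in for 100 optimized DAGs: ('A','B') appears in 90 of them,
# ('B','C') in only 40, so only ('A','B') survives the 0.85 cutoff.
dags = [{("A", "B"), ("B", "C")}] * 40 + [{("A", "B")}] * 50 + [set()] * 10
print(final_network(dags))  # {('A', 'B')}
```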
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Observational experiment</head><p>In the case where the dataset 𝐷 is purely observational (i.e., no interventions) from a single experiment, Kg2Causal can be implemented algorithmically as described above; we provide a pseudocode description of the "observational" formulation of Kg2Causal in Algorithm 1.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Mix of Observational and Interventional experiments</head><p>With causal network learning based on a single observational dataset, it is difficult to differentiate between compatible Markov equivalent models <ref type="bibr" target="#b23">[22]</ref>. In the simple case of three variables 𝑉 𝑖 , 𝑉 𝑗 and 𝑉 𝑘 , there are three possible causal models 𝑉 𝑖 ⇒ 𝑉 𝑗 ⇒ 𝑉 𝑘 , 𝑉 𝑖 ⇐ 𝑉 𝑗 ⇐ 𝑉 𝑘 , and 𝑉 𝑖 ⇐ 𝑉 𝑗 ⇒ 𝑉 𝑘 ; all three structures are Markov equivalent. This ambiguity can be resolved by incorporating measurements from interventional experiments, which cause the Markov equivalent structures to have different likelihoods. However, in real-world settings, such interventional measurements are more difficult to obtain than observational measurements <ref type="bibr" target="#b24">[23]</ref>. Even when interventional datasets are available, learning a causal network from mixed observational and interventional data is challenging, for two reasons: (i) datasets collected from different experiments under different environmental conditions or batches are not identically distributed, in which case their underlying causal structures may differ, leading to errors if network inference is applied to the combined set of measurements; and (ii) in real-world settings, interventions are not "perfect" but rather "uncertain" (i.e., "imperfect" or "fat-hand"), meaning that the interventions have other, unknown targets, which if ignored would likely yield spurious interactions in network discovery. To deal with such cases, and based on our previous study demonstrating the effectiveness of the Learn and Vote algorithm <ref type="bibr" target="#b22">[21,</ref><ref type="bibr" target="#b25">24]</ref>, we extended Kg2Causal to learn from a multi-experiment dataset using a voting-based integration method in which experiment-specific causal networks are learned and then combined by weighted averaging into a consensus causal network. 
The additional steps are shown in Algorithm 2.</p></div>
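The voting-based integration can be sketched as follows: per-experiment edge probabilities are combined by a weighted average into a consensus network. This is an illustrative Python sketch; the sample-size weights and the toy probability values are assumptions for illustration, not necessarily the paper's exact weighting.

```python
# Sketch of voting-style integration (in the spirit of Learn and Vote):
# combine per-experiment edge probabilities by weighted averaging.

def consensus(edge_probs_per_exp, weights, threshold=0.85):
    """Weighted-average each arc's probability across experiments, then
    keep arcs whose consensus probability meets the threshold."""
    total = sum(weights)
    arcs = set().union(*[set(p) for p in edge_probs_per_exp])
    avg = {
        a: sum(w * p.get(a, 0.0) for w, p in zip(weights, edge_probs_per_exp)) / total
        for a in arcs
    }
    return {a for a, p in avg.items() if p >= threshold}

obs = {("A", "B"): 0.95, ("B", "C"): 0.60}    # observational experiment
intv = {("A", "B"): 0.90, ("B", "C"): 0.95}   # interventional experiment
# Weights proportional to sample size (an assumption): 500 samples each.
result = consensus([obs, intv], weights=[500, 500])
print(result)  # {('A', 'B')}: avg 0.925 passes, ('B','C') avg 0.775 does not
```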
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Analysis and Results</head><p>In this section, we describe the observational datasets and ground-truth networks (Sec. 4.1) and the simulated mixed interventional-observational datasets (Sec. 4.2) that we analyzed. We present (Sec. 4.3) the results of empirical studies of network learning performance of Kg2Causal on these datasets in comparison to other types of network structure priors.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Observational datasets that were analyzed</head><p>To assess the performance of Kg2Causal on biological network inference problems, we empirically analyzed five real-world datasets for which published ground-truth networks were available:</p><p>Hepatic encephalopathy: This is a clinical study of a serious liver complication called hepatic encephalopathy (HE) <ref type="bibr" target="#b26">[25]</ref>, with conditions such as electrolyte disorders, infections, and poor spirits. It is a categorical dataset with eight nodes and a ground-truth network containing ten edges.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Sachs et al. T cell signaling:</head><p>This is a study using mixed observational and interventional experiments to infer causal connections between eleven proteins and phospholipids in the intracellular signaling network of individual human CD4+ T-cells <ref type="bibr" target="#b27">[26]</ref>. The dataset contains single-cell measurements of these molecules, with a ground-truth network containing twenty edges.</p><p>Hematopoietic Stem Cell Differentiation (HSC): This is a real-world gene regulatory network for studying myeloid differentiation from multipotent myeloid progenitors into megakaryocytes, erythrocytes, granulocytes, and monocytes <ref type="bibr" target="#b28">[27]</ref> in mammals <ref type="bibr" target="#b29">[28]</ref>. The dataset contains gene expression measurements, with a ground-truth network having thirty edges.</p><p>Gonadal Sex Determination (GSD): This is a real-world model representing the gonadal differentiation circuit that controls the transformation of the bipotential gonadal primordium (BGP) into either female or male gonads <ref type="bibr" target="#b30">[29]</ref>. The network consists of eighteen genes and one node for the urogenital ridge. The dataset contains gene expression measurements, with a ground-truth network containing seventy-nine edges <ref type="bibr" target="#b29">[28]</ref>.</p><p>Yeast cell cycle: This is a dataset derived from a network model of thirty genes participating in cell-cycle regulation of yeast <ref type="bibr" target="#b31">[30]</ref>. The dataset was created by integrating gene expression data with transitive protein-protein interactions. The ground-truth network has 317 edges.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Mixed observational-interventional datasets</head><p>We tested Kg2Causal using the Sachs et al. interventional dataset and using observational and interventional measurement data simulated from synthetic networks with the bnlearn package. For observational data, we drew random samples; for interventional data, we set some target nodes in the network to fixed values in order to create mutilated networks <ref type="bibr" target="#b32">[31]</ref> before drawing samples from them. To simulate an uncertain ("fat-hand") intervention <ref type="bibr" target="#b33">[32]</ref>, we intervened on one or more of the child nodes of the intervention's target node.</p><p>Cancer: This is a synthetic network <ref type="bibr" target="#b34">[33]</ref> of causes and consequences of lung cancer. We simulated data from one observational and one interventional experiment, with an equal number of samples (500) from each experiment to avoid bias. For the interventional experiment we generated a mutilated network, cancer_mut, with one intervention (at node "Smoker").</p><p>Asia: This is a synthetic network <ref type="bibr" target="#b35">[34]</ref> about the occurrence of lung diseases and their epidemiological connection to a prior visit to Asia. We simulated one observational and two interventional experiments from the synthetic network, with an equal number of samples (500) from each experiment to avoid bias. For the interventional experiments we generated two mutilated networks: asia_mut1 with one intervention (at node "Lung Cancer") and asia_mut2 with two interventions (at nodes "Lung Cancer" and "Tuberculosis").</p></div>
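The simulation procedure (fix the intervened node's value, then forward-sample the rest) can be sketched as follows. This is an illustrative Python sketch with invented toy CPTs, not the bnlearn networks used in the paper.

```python
# Sketch: simulate interventional samples by fixing each target node to a
# value (mutilating the network) and forward-sampling in topological order.
import random

def sample(order, parents, cpt, do=None):
    """Draw one forward sample; `do` maps intervened nodes to fixed values."""
    do = do or {}
    state = {}
    for node in order:
        if node in do:
            state[node] = do[node]    # incoming edges effectively deleted
        else:
            pa = tuple(state[p] for p in parents[node])
            state[node] = int(random.random() < cpt[node][pa])
    return state

# Hypothetical two-node network loosely echoing the Cancer example.
order = ["Smoker", "Cancer"]
parents = {"Smoker": (), "Cancer": ("Smoker",)}
cpt = {"Smoker": {(): 0.3}, "Cancer": {(0,): 0.05, (1,): 0.4}}

random.seed(1)
interv = [sample(order, parents, cpt, do={"Smoker": 1}) for _ in range(1000)]
# Under do(Smoker=1), every sample has Smoker fixed at 1, while Cancer is
# drawn from its conditional distribution given Smoker=1.
```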
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Analysis of results</head><p>In this section we present the results of empirical studies of network learning performance on the five observational datasets (see Sec. 4.1) and three mixed observational-interventional datasets (see Sec. 4.2), for Kg2Causal in comparison with three other types of network structure priors.</p><p>To quantify performance, we considered the presence of an edge in the ground-truth network as a "true positive" and the absence of an edge as a "true negative" causal arc. For the observational datasets, we used Algorithm 1 with the indicated prior (KG, uniform, marginal, or Bayesian VSP) as described in Sec. 3. For the mixed interventional-observational datasets, we used Algorithm 2 with the indicated prior. For each dataset, we found (Fig. <ref type="figure" target="#fig_4">5</ref>) that using the general knowledge graph as a prior improves performance, by ROC, precision/recall, F1, and accuracy. Quantitatively, Kg2Causal had higher area under the ROC curve (AUROC) and area under the precision-recall curve (AUPR) scores than network learning with the three non-KG priors tested, for the five observational (Table <ref type="table" target="#tab_0">1</ref>) and three mixed interventional-observational (Table <ref type="table" target="#tab_1">2</ref>) datasets. Moreover, the results of a comparative analysis of Kg2Causal performance on the mixed datasets (Table <ref type="table" target="#tab_1">2</ref>) show the effect of pooling data from different experiments (Algorithm 1) as compared to voting (Algorithm 2): pooling is better for the small network (Cancer), consistent with our previous findings <ref type="bibr" target="#b22">[21]</ref>, whereas voting is better for the medium-sized networks (Asia and Sachs).</p></div>
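The evaluation treats ground-truth edges as positives and scores a predicted edge-probability map at a given cutoff, from which precision, recall, F1, and accuracy follow. A minimal Python sketch with invented toy numbers:

```python
# Sketch: confusion counts for an edge-probability map against ground truth.
def confusion(edge_probs, truth, all_pairs, cutoff):
    """Count TP/FP/TN/FN over candidate node pairs at the given cutoff."""
    tp = fp = tn = fn = 0
    for pair in all_pairs:
        predicted = edge_probs.get(pair, 0.0) >= cutoff
        actual = pair in truth
        tp += predicted and actual
        fp += predicted and not actual
        fn += (not predicted) and actual
        tn += (not predicted) and (not actual)
    return tp, fp, tn, fn

pairs = [("A", "B"), ("B", "C"), ("A", "C")]
truth = {("A", "B"), ("B", "C")}                 # ground-truth edges
probs = {("A", "B"): 0.9, ("A", "C"): 0.6}       # predicted edge strengths

tp, fp, tn, fn = confusion(probs, truth, pairs, cutoff=0.85)
precision = tp / (tp + fp)   # 1.0: the one predicted edge is real
recall = tp / (tp + fn)      # 0.5: one of two true edges recovered
```

Sweeping the cutoff and recomputing these counts yields the ROC and precision-recall curves (and hence AUROC/AUPR) used for the comparisons above.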
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Discussion and Conclusion</head><p>A limitation of this study is that, due to the lack of availability of large ground-truth causal networks, all datasets analyzed in this work correspond to small-to-medium-sized networks (8-30 nodes); due to the scalability issues of score-based methods, the Kg2Causal method as described here would be challenging to apply to larger networks (many hundreds to thousands of nodes and beyond), which is an area of future work. Further, we plan to explore ways to incorporate a network structure prior into constraint-based algorithms (for example, the PC algorithm <ref type="bibr">[2]</ref>), given the (in general) more favorable scalability of constraint-based algorithms and the overwhelming preponderance of observational-only datasets that are available. We also want to evaluate and compare alternative methods (other than the method of <ref type="bibr" target="#b12">[11]</ref> that we are using) for incorporating priors. The present work clearly demonstrates, for the case of causal network learning from small-to-medium-sized biomedical or biological datasets, the importance of aggregating and leveraging structured prior knowledge in order to maximize network learning accuracy.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: (a) A network containing known relationships between lung conditions and diseases, (b) the corresponding sub-graph in a knowledge graph</figDesc><graphic coords="2,89.29,371.54,229.17,86.28" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Complete prior by edge decomposition technique</figDesc><graphic coords="5,360.14,121.12,145.85,67.30" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Workflow of Kg2Causal: Step 1 -Data collection (can be a combination of observational and/or interventional studies). For example, in this figure, we show that node C (in red, top center) has been intervened. Step 2 -Cleaning and discretizing the data. Step 3 -Generating 100 random DAGs using the observed nodes. Step 4 -Optimizing each of the 100 DAGs using Tabu search. In this step, we also extract edges present in the KG and incorporate them as a prior. Step 5 -Calculating probability of occurrence for each possible arc in the 100 optimized DAGs. Step 6 -Constructing the final network by selecting arc strengths above a threshold.</figDesc><graphic coords="6,150.06,84.19,295.16,214.79" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Algorithm 1: Kg2Causal for observational data. We start by creating 𝑛 random starting DAGs using procedure createRandomNet and store them as randomDAG. Next, from each DAG in randomDAG, we learn an optimized network and store the 𝑛 networks in a list optimizedDAG. In this step, we also pass 𝛽 to createRandomDAG. Next, using the list of networks in optimizedDAG, we compute the probabilistic arc strength for each ordered pair of nodes as its empirical frequency, using procedure edgeStrength and store them as edgeProb. Finally, we use learnCausalDAG where we select the edges with weight above a predefined Threshold. Algorithm 2: Kg2Causal for mixed observational-interventional data.</figDesc><graphic coords="7,89.91,84.19,415.45,119.35" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Empirical performance (ROC, precision-vs-recall, F1 vs. cutoff, and accuracy vs. cutoff) of Kg2Causal in each of five datasets compared to learning with uninformative priors (uniform, marginal, and Bayesian VSP).</figDesc><graphic coords="9,313.59,349.38,82.17,67.62" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 Performance of "Kg2Causal" versus three uninformative priors, for network learning from observa- tional data:</head><label>1</label><figDesc>Each row represents a specific real-world dataset (see Sec. 4.1) with corresponding ground-truth network and is split by performance metric (AUROC, AUPR). Rows are ordered by network size (# of nodes).</figDesc><table><row><cell cols="2">Dataset Size</cell><cell cols="5">Metric Uniform Marginal Bayesian VSP Kg2Causal</cell></row><row><cell></cell><cell></cell><cell>AUROC</cell><cell>0.800</cell><cell>0.789</cell><cell>0.800</cell><cell>0.842</cell></row><row><cell>HE</cell><cell>8</cell><cell>AUPR</cell><cell>0.810</cell><cell>0.799</cell><cell>0.812</cell><cell>0.854</cell></row><row><cell></cell><cell></cell><cell>AUROC</cell><cell>0.857</cell><cell>0.854</cell><cell>0.849</cell><cell>0.879</cell></row><row><cell>Sachs</cell><cell>11</cell><cell>AUPR</cell><cell>0.732</cell><cell>0.729</cell><cell>0.718</cell><cell>0.800</cell></row><row><cell></cell><cell></cell><cell>AUROC</cell><cell>0.705</cell><cell>0.702</cell><cell>0.701</cell><cell>0.745</cell></row><row><cell>HSC</cell><cell>11</cell><cell>AUPR</cell><cell>0.547</cell><cell>0.542</cell><cell>0.543</cell><cell>0.564</cell></row><row><cell></cell><cell></cell><cell>AUROC</cell><cell>0.656</cell><cell>0.660</cell><cell>0.656</cell><cell>0.676</cell></row><row><cell>GSD</cell><cell>18</cell><cell>AUPR</cell><cell>0.457</cell><cell>0.460</cell><cell>0.458</cell><cell>0.473</cell></row><row><cell></cell><cell></cell><cell>AUROC</cell><cell>0.569</cell><cell>0.556</cell><cell>0.537</cell><cell>0.623</cell></row><row><cell>Yeast</cell><cell>30</cell><cell>AUPR</cell><cell>0.619</cell><cell>0.606</cell><cell>0.581</cell><cell>0.662</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 Performance of "Kg2Causal" versus three uninformative priors, for network learning from mixed interventional-observational data:</head><label>2</label><figDesc>Rows correspond to datasets and columns correspond to types of priors, split into analyses where data are pooled (per Algorithm 1) or voting is used (per Algorithm 2).</figDesc><table><row><cell cols="2">Dataset Size</cell><cell>Metric</cell><cell cols="2">Uniform</cell><cell cols="2">Marginal</cell><cell cols="2">Bayesian VSP</cell><cell cols="2">Kg2Causal</cell></row><row><cell></cell><cell></cell><cell></cell><cell cols="8">Pooling Voting Pooling Voting Pooling Voting Pooling Voting</cell></row><row><cell></cell><cell></cell><cell cols="2">AUROC 0.750</cell><cell>0.700</cell><cell>0.750</cell><cell>0.740</cell><cell>0.750</cell><cell>0.780</cell><cell>0.813</cell><cell>0.833</cell></row><row><cell>Cancer</cell><cell>5</cell><cell>AUPR</cell><cell>0.776</cell><cell>0.677</cell><cell>0.776</cell><cell>0.690</cell><cell>0.776</cell><cell>0.720</cell><cell>0.809</cell><cell>0.738</cell></row><row><cell></cell><cell></cell><cell cols="2">AUROC 0.878</cell><cell>0.944</cell><cell>0.878</cell><cell>0.944</cell><cell>0.887</cell><cell>0.884</cell><cell>0.903</cell><cell>0.956</cell></row><row><cell>Asia</cell><cell>8</cell><cell>AUPR</cell><cell>0.711</cell><cell>0.902</cell><cell>0.711</cell><cell>0.905</cell><cell>0.736</cell><cell>0.852</cell><cell>0.817</cell><cell>0.940</cell></row><row><cell></cell><cell></cell><cell cols="2">AUROC 0.857</cell><cell>0.873</cell><cell>0.854</cell><cell>0.867</cell><cell>0.849</cell><cell>0.855</cell><cell>0.879</cell><cell>0.883</cell></row><row><cell>Sachs</cell><cell>11</cell><cell>AUPR</cell><cell>0.732</cell><cell>0.777</cell><cell>0.729</cell><cell>0.739</cell><cell>0.718</cell><cell>0.728</cell><cell>0.800</cell><cell>0.812</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://github.com/RTXteam/RTX/code/reasoningtool/kg-construction</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://github.com/meghasin/Kg2Causal</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://github.com/RTXteam/RTX/code/reasoningtool/kg-construction</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Acknowledgments</head><p>This work was supported in part by the National Center for Advancing Translational Sciences (NCATS) through the Biomedical Data Translator program (OT2TR002520 &amp; OT2TR003428 to SAR). We thank David Koslicki, Eric Deutsch, Yao Yao, Zheng Liu, Deqing Qu, Finn Womack, and Ujjval Kumaria for their work on constructing the KG1 knowledge graph.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m">) that produced 𝑘 datasets with observed variables as (𝑉 ) and known intervention targets as 𝐼 𝑁 𝑇</title>
				<imprint/>
	</monogr>
	<note>Let there be 𝑘 experiments (can be observational and/or interventional. if any</note>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m">Repeat steps</title>
				<imprint>
			<biblScope unit="page" from="1" to="4" />
		</imprint>
	</monogr>
	<note>from Sec. 3) for all 𝑘 experiments</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Causality: models, reasoning, and inference</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pearl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Econometric Theory</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="page">46</biblScope>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Causation, prediction, and search. Adaptive computation and machine learning</title>
		<author>
			<persName><forename type="first">P</forename><surname>Spirtes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Glymour</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Scheines</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2000">2000</date>
			<publisher>MIT Press</publisher>
			<pubPlace>Cambridge, MA</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Student evaluation model using bayesian network in an intelligent e-learning system</title>
		<author>
			<persName><forename type="first">B</forename><surname>Chakraborty</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sinha</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Institute of Integrative Omics and Applied Biotechnology (IIOAB)</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">A probabilistic approach for detection and analysis of cognitive flow</title>
		<author>
			<persName><forename type="first">D</forename><surname>Chatterjee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sinha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sinha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">K</forename><surname>Saha</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">BMA@ UAI</title>
				<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="44" to="53" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Method and system for detection and analysis of cognitive flow</title>
		<author>
			<persName><forename type="first">D</forename><surname>Chatterjee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sinha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sinha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">K</forename><surname>Saha</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">US Patent</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page">164</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Learning equivalence classes of Bayesian-network structures</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">M</forename><surname>Chickering</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J Mach Learn Res</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="445" to="498" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Improving Markov chain Monte Carlo model search for data mining</title>
		<author>
			<persName><forename type="first">P</forename><surname>Giudici</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Castelo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">50</biblScope>
			<biblScope unit="page" from="127" to="158" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Being Bayesian about network structure. A Bayesian approach to structure discovery in Bayesian networks</title>
		<author>
			<persName><forename type="first">N</forename><surname>Friedman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Koller</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine learning</title>
		<imprint>
			<biblScope unit="volume">50</biblScope>
			<biblScope unit="page" from="95" to="125" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Experiments in stochastic computation for high-dimensional graphical models</title>
		<author>
			<persName><forename type="first">B</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Carvalho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Dobra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Carter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>West</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Statistical Science</title>
		<imprint>
			<biblScope unit="page" from="388" to="400" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Network inference using informative priors</title>
		<author>
			<persName><forename type="first">S</forename><surname>Mukherjee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">P</forename><surname>Speed</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Proc Nat Acad Sci</title>
		<imprint>
			<biblScope unit="volume">105</biblScope>
			<biblScope unit="page" from="14313" to="14318" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Priors on network structures. biasing the search for Bayesian networks</title>
		<author>
			<persName><forename type="first">R</forename><surname>Castelo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Siebes</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Int J Approx Reason</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="page" from="39" to="57" />
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Ji</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Cambria</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Marttinen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">S</forename><surname>Yu</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2002.00388</idno>
		<title level="m">A survey on knowledge graphs: Representation, acquisition and applications</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Towards a definition of knowledge graphs</title>
		<author>
			<persName><forename type="first">L</forename><surname>Ehrlinger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Wöß</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">SEMANTiCS (Posters, Demos, SuCCESS)</title>
		<imprint>
			<biblScope unit="volume">48</biblScope>
			<biblScope unit="page" from="1" to="4" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Future paths for integer programming and links to artificial intelligence</title>
		<author>
			<persName><forename type="first">F</forename><surname>Glover</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computers &amp; operations research</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="page" from="533" to="549" />
			<date type="published" when="1986">1986</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Learning Bayesian networks: The combination of knowledge and statistical data</title>
		<author>
			<persName><forename type="first">D</forename><surname>Heckerman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Geiger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">M</forename><surname>Chickering</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="page" from="197" to="243" />
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Causal discovery from a mixture of experimental and observational data</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">F</forename><surname>Cooper</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Yoo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence</title>
				<meeting>the Fifteenth conference on Uncertainty in artificial intelligence</meeting>
		<imprint>
			<publisher>Morgan Kaufmann Publishers Inc</publisher>
			<date type="published" when="1999">1999</date>
			<biblScope unit="page" from="116" to="125" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Scutari</surname></persName>
		</author>
		<idno type="arXiv">arXiv:0908.3817</idno>
		<title level="m">Learning Bayesian networks with bnlearn</title>
				<imprint>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">The role of assumptions in causal discovery</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Druzdzel</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Characterization and greedy learning of interventional markov equivalence classes of directed acyclic graphs</title>
		<author>
			<persName><forename type="first">A</forename><surname>Hauser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bühlmann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J Mach Learn Res</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="page" from="2409" to="2464" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">On the prior and posterior distributions used in graphical modelling</title>
		<author>
			<persName><forename type="first">M</forename><surname>Scutari</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Bayesian Analysis</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="505" to="532" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Voting-based integration algorithm improves causal network learning from interventional and observational data: an application to cell signaling network inference</title>
		<author>
			<persName><forename type="first">M</forename><surname>Sinha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Tadepalli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">A</forename><surname>Ramsey</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Plos one</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="page">e0245776</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<title level="m" type="main">Probabilistic graphical models: principles and techniques</title>
		<author>
			<persName><forename type="first">D</forename><surname>Koller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Friedman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Bach</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2009">2009</date>
			<publisher>MIT press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<title level="m" type="main">Causal reasoning through intervention, Causal learning: Psychology, philosophy, and computation</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Hagmayer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">A</forename><surname>Sloman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">A</forename><surname>Lagnado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">R</forename><surname>Waldmann</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="86" to="100" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Sinha</surname></persName>
		</author>
		<title level="m">Causal structure learning from experiments and observations</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Application of tabu search-based Bayesian networks in exploring related factors of liver cirrhosis complicated with hepatic encephalopathy and disease identification</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Song</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Scientific Reports</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page" from="1" to="8" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Causal protein-signaling networks derived from multiparameter single-cell data</title>
		<author>
			<persName><forename type="first">K</forename><surname>Sachs</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Perez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Pe'er</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">A</forename><surname>Lauffenburger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">P</forename><surname>Nolan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Science</title>
		<imprint>
			<biblScope unit="volume">308</biblScope>
			<biblScope unit="page" from="523" to="529" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Hierarchical differentiation of myeloid progenitors is encoded in the transcription factor network</title>
		<author>
			<persName><forename type="first">J</forename><surname>Krumsiek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Marr</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Schroeder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">J</forename><surname>Theis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">PLOS ONE</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data</title>
		<author>
			<persName><forename type="first">A</forename><surname>Pratapa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">P</forename><surname>Jalihal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">N</forename><surname>Law</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bharadwaj</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Murali</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nature Methods</title>
		<imprint>
			<biblScope unit="volume">17</biblScope>
			<biblScope unit="page" from="147" to="154" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">A boolean network model of human gonadal sex determination</title>
		<author>
			<persName><forename type="first">O</forename><surname>Ríos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Frias</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rodríguez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kofman</surname></persName>
		</author>
		<author>
			<persName><surname>Merchant</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Theor Biol Medic Model</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page">26</biblScope>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Fusing gene expressions and transitive protein-protein interactions for inference of gene regulatory networks</title>
		<author>
			<persName><forename type="first">W</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">C</forename><surname>Rajapakse</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">BMC Systems Biology</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="page">37</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">Graphical models for probabilistic and causal reasoning</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pearl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Quantified representation of uncertainty and imprecision</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="1998">1998</date>
			<biblScope unit="page" from="367" to="389" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">Exact Bayesian structure learning from uncertain interventions</title>
		<author>
			<persName><forename type="first">D</forename><surname>Eaton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Murphy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Artificial Intelligence and Statistics</title>
		<imprint>
			<biblScope unit="page" from="107" to="114" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">B</forename><surname>Korb</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">E</forename><surname>Nicholson</surname></persName>
		</author>
		<title level="m">Bayesian artificial intelligence</title>
				<imprint>
			<publisher>CRC Press</publisher>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<analytic>
		<title level="a" type="main">Local computations with probabilities on graphical structures and their application to expert systems</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">L</forename><surname>Lauritzen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">J</forename><surname>Spiegelhalter</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J Roy Stat Soc B</title>
		<imprint>
			<biblScope unit="page" from="157" to="224" />
			<date type="published" when="1988">1988</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
