<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>HC@AIxIA</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Risk Assessment of Lymph Node Metastases in Endometrial Cancer Patients: A Causal Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alessio Zanga</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alice Bernasconi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Peter J.F. Lucas</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hanny Pijnenborg</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Casper Reijnen</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Scutari</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabio Stella</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Data Science and Advanced Analytics, F. Hoffmann-La Roche Ltd</institution>
          ,
          <addr-line>Basel</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Informatics, Systems and Communication (DISCo)</institution>
          ,
          <institution>University of Milano - Bicocca</institution>
          ,
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Evaluative Epidemiology Unit, Department of Research, Fondazione IRCCS Istituto Nazionale dei Tumori</institution>
          ,
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>RadboudUMC</institution>
          ,
          <addr-line>Nijmegen</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Twente</institution>
          ,
          <addr-line>Enschede</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>1</volume>
      <fpage>0000</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>Assessing the pre-operative risk of lymph node metastases in endometrial cancer patients is a complex and challenging task. In principle, machine learning and deep learning models are flexible and expressive enough to capture the dynamics of clinical risk assessment. However, in this setting we are limited to observational data with quality issues, missing values, small sample size and high dimensionality: we cannot reliably learn such models from limited observational data with these sources of bias. Instead, we choose to learn a causal Bayesian network to mitigate the issues above and to leverage the prior knowledge on endometrial cancer available from clinicians and physicians. We introduce a causal discovery algorithm for causal Bayesian networks based on bootstrap resampling, as opposed to the single imputation used in related works. Moreover, we include a context variable to evaluate whether selection bias results in learning spurious associations. Finally, we discuss the strengths and limitations of our findings in light of the presence of missing data that may be missing-not-at-random, which is common in real-world clinical settings.</p>
      </abstract>
      <kwd-group>
        <kwd>Causal discovery</kwd>
        <kwd>Causal networks</kwd>
        <kwd>Bayesian networks</kwd>
        <kwd>Missing mechanism</kwd>
        <kwd>Selection bias</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <title>1.1. Artificial Intelligence in Medicine</title>
        <p>
          State of the Art. Artificial Intelligence (AI) has found many applications in medicine [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]
and, more specifically, in cancer research [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] in the form of predictive models for diagnosis
[
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], prognosis [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and therapy planning [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Within AI, Machine Learning (ML) and in
particular Deep Learning (DL) have achieved significant results, especially in image processing
[
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Nonetheless, ML and DL models have limited explainability [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] because of their black-box
design, which limits their adoption in the clinical field: clinicians and physicians are reluctant
to include models that are not transparent in their decision process [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. While recent research
on Explainable AI (XAI) [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] has attacked this problem, DL models are still opaque and difficult
to interpret. In contrast, in Probabilistic Graphical Models (PGMs) the interactions between
different variables are encoded explicitly: the joint probability distribution <inline-formula><tex-math>P</tex-math></inline-formula> of the variables
of interest factorizes according to a graph <inline-formula><tex-math>\mathcal{G}</tex-math></inline-formula>, hence the "graphical" connotation. Bayesian
Networks (BNs) [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], which we will describe in Section 3.1, are an instance of PGMs that can be
used as causal models. In turn, this makes them ideal to use as decision support systems and
overcome the limitations of the predictions based on probabilistic associations produced by
other ML models [
          <xref ref-type="bibr" rid="ref11">11, 12</xref>
          ].
        </p>
      </sec>
      <sec id="sec-1-2">
        <title>1.2. Lymph Node Metastases in Endometrial Cancer Patients</title>
        <p>Background. The present paper focuses on the development of a BN predictive model for
endometrial cancer (EC). Endometrial cancer is cancer of the mucous lining, or endometrium,
of the uterus. It is a common gynecological disease affecting hundreds of thousands of women
worldwide. Although most patients with EC are diagnosed at an early stage of the disease
and have a favorable prognosis, approximately 90,000 patients around the world die every
year because of EC [13]. Surgery to remove the uterus (hysterectomy), possibly together with
the ovaries (ovariectomy), is the typical initial treatment for EC; the choice of neo-adjuvant
(pre-surgery) or adjuvant (post-surgery) treatments depends on patient outcome prognosis. The
presence of pelvic and/or para-aortic lymph node metastases (LNM) is one of the most important
prognostic factors for poor outcome. The identification of LNM during the primary treatment
makes it possible to choose a suitable adjuvant treatment and improve survival in node-positive
EC [14, 15]. However, no consensus exists on how to determine which patients will benefit
from lymphadenectomy (or lymph node dissection). This procedure is usually performed after
or concomitantly with surgery to look for evidence of cancer spread, which helps the
medical team determine the progress of, and treatment options for, a patient’s malignancy. In
clinical early-stage EC, lymphadenectomy has been observed to have a marginal impact on
EC outcomes and to be associated with substantial long-term comorbidities. The diagnostic
accuracy for LNM is limited: approximately 50% of LNM is found in low- or intermediate-risk
patients [16, 17].</p>
        <p>Objectives. This work uses the BN model from Reijnen et al. [18] as a starting point to
improve the state of the art in three ways:
<list list-type="bullet">
<list-item><p>Extending the BN model to include the hospital of treatment as an additional variable to
detect, estimate and control for potential selection bias.</p></list-item>
<list-item><p>Addressing the bias introduced by the missing-value imputation step, which could induce
spurious correlations, hindering the interpretability of the discovered relationships.</p></list-item>
<list-item><p>Developing a causal model that integrates domain expert knowledge with observational
data to better identify patients with EC designated as being at low or intermediate risk of developing
LNM, in order to support stakeholders in decision-making.</p></list-item>
</list></p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Individualized treatment aims to minimize unnecessary exposure to therapy-related morbidity
and at the same time offers proper management according to patients’ risk stratification. In the
context of EC, predicting the risk of LNM before surgical treatment has received limited attention
in the literature. Koskas et al. [19] evaluated the performance of BN models within their cohort
of 519 patients. Only one model achieved an Area Under the Curve (AUC, see Definition 8) greater than 0.75, highlighting the need for
improved pre-operative risk stratification. Subsequent works [20, 21, 22] identified biomarkers
such as p53 and L1CAM as potential prognostic predictors, together with patients’ baseline
comorbidities and tumor characteristics such as histology, grading and staging. More recently,
Reijnen et al. [18] developed a model for the prediction of LNM and of disease-specific survival
(DSS) in EC patients. This model, called ENDORISK, is a BN built on clinical, histopathological
and molecular biomarkers that can be assessed pre-operatively, allowing for patient counseling
and shared decision-making before surgery. ENDORISK was shown to be competitive in both
goodness of fit and predictive accuracy, achieving AUC values between 0.82 and 0.85 [23].</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <sec id="sec-3-1">
        <title>3.1. Causal Bayesian Networks</title>
        <p>We first summarize the key definitions for BNs and causal models that we will need to
describe our contributions in Section 3.</p>
        <p>Definition 1 (Graph). A graph <inline-formula><tex-math>\mathcal{G} = (\mathbf{V}, \mathbf{E})</tex-math></inline-formula> is a mathematical object represented by a tuple of two
sets: a finite set of nodes V and a finite set of edges E ⊆ V × V. In the following, (V, E) will
be omitted unless specified otherwise.</p>
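        <p>We will focus on directed graphs where <inline-formula><tex-math>(X, Y) \neq (Y, X)</tex-math></inline-formula>, which is graphically represented
as <inline-formula><tex-math>X \to Y</tex-math></inline-formula>. A directed graph encodes a set of ordinal relationships, i.e. in <inline-formula><tex-math>X \to Y</tex-math></inline-formula> the node <inline-formula><tex-math>X</tex-math></inline-formula>
is called parent of <inline-formula><tex-math>Y</tex-math></inline-formula> and <inline-formula><tex-math>Y</tex-math></inline-formula> is said to be the child of <inline-formula><tex-math>X</tex-math></inline-formula>. Accordingly, the set of parents of <inline-formula><tex-math>X</tex-math></inline-formula> is denoted
<inline-formula><tex-math>\mathrm{Pa}(X)</tex-math></inline-formula>, while the set of children of <inline-formula><tex-math>X</tex-math></inline-formula> is denoted <inline-formula><tex-math>\mathrm{Ch}(X)</tex-math></inline-formula>.</p>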
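        <p>A directed path <inline-formula><tex-math>\pi</tex-math></inline-formula> is a finite ordered set of nodes <inline-formula><tex-math>\pi = (X_0 \to \cdots \to X_n)</tex-math></inline-formula> such that each
adjacent pair of nodes <inline-formula><tex-math>(X_i, X_{i+1})</tex-math></inline-formula> in <inline-formula><tex-math>\pi</tex-math></inline-formula> is a directed edge in E. A cycle is a path where the first
and the last node are the same node. A directed graph that contains no cycle is called a
Directed Acyclic Graph (DAG).</p>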
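        <p>As an illustration of the definitions above, the following minimal Python sketch stores a directed graph as a collection of (parent, child) pairs, recovers the parent set of a node and checks whether the graph is a DAG. The function names and node labels are assumptions introduced for this example only and are not taken from the original work.</p>
        <preformat>
```python
from collections import defaultdict, deque


def parents(node, edges):
    """Return Pa(node): every node with a directed edge into `node`."""
    return {parent for parent, child in edges if child == node}


def is_acyclic(nodes, edges):
    """Return True if the directed graph (nodes, edges) contains no cycle."""
    children = defaultdict(list)
    in_degree = {node: 0 for node in nodes}
    for parent, child in edges:
        children[parent].append(child)
        in_degree[child] += 1
    # Kahn's algorithm: repeatedly remove nodes with no remaining parents.
    queue = deque(node for node in nodes if in_degree[node] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                queue.append(child)
    # All nodes can be removed only when the graph has no directed cycle.
    return removed == len(nodes)


nodes = ["X", "Y", "Z"]
chain = [("X", "Y"), ("Y", "Z")]
loop = chain + [("Z", "X")]
print(parents("Y", chain), is_acyclic(nodes, chain), is_acyclic(nodes, loop))
```
        </preformat>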
        <p>
          Definition 2 (Causal Graph). A causal graph <inline-formula><tex-math>\mathcal{G} = (\mathbf{V}, \mathbf{E})</tex-math></inline-formula> [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] is a graph that encodes the
cause-effect relationships of a system.
        </p>
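        <p>Causes &amp; Effects. The set V contains the variables that describe the behavior of the system
under study, whereas the set E contains the edges that make explicit the interplay of the
variables. In particular, for each directed edge <inline-formula><tex-math>(X, Y) \in \mathbf{E}</tex-math></inline-formula>, <inline-formula><tex-math>X</tex-math></inline-formula> is said to be a direct cause of <inline-formula><tex-math>Y</tex-math></inline-formula>,
whereas <inline-formula><tex-math>Y</tex-math></inline-formula> is called a direct effect of <inline-formula><tex-math>X</tex-math></inline-formula>. This definition is recursive: a variable <inline-formula><tex-math>Z</tex-math></inline-formula> that is a direct
cause of <inline-formula><tex-math>X</tex-math></inline-formula>, but not of <inline-formula><tex-math>Y</tex-math></inline-formula>, is said to be an indirect cause of <inline-formula><tex-math>Y</tex-math></inline-formula>.</p>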
        <p>This mapping between a causal graph <inline-formula><tex-math>\mathcal{G}</tex-math></inline-formula> and the cause-effect relationships is formalized by
the causal edge assumption [24].</p>
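        <p>Definition 3 (Causal Edge Assumption). Let <inline-formula><tex-math>\mathcal{G} = (\mathbf{V}, \mathbf{E})</tex-math></inline-formula> be a causal graph. The value
assigned to each variable <inline-formula><tex-math>X_i \in \mathbf{V}</tex-math></inline-formula> is completely determined by the function <inline-formula><tex-math>f</tex-math></inline-formula> given its parents:
<disp-formula><label>(1)</label><tex-math>X_i := f(\mathrm{Pa}(X_i)) \qquad \forall X_i \in \mathbf{V}</tex-math></disp-formula></p>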
        <p>The causal edge assumption allows us to interpret the edges of a causal graph in a
non-ambiguous way: it enforces a recursive relationship over the structure of the graph, establishing
a chain of functional dependencies. Hence, this class of graphical models is inherently
explainable, even for researchers approaching them for the first time.</p>
        <p>When the causal graph is not known a priori, it is possible to recover it from a combination
of prior knowledge and data-driven approaches. This problem is called Causal Discovery [25].</p>
        <p>Definition 4 (Causal Discovery). Let <inline-formula><tex-math>\mathcal{G}^*</tex-math></inline-formula> be the true but unknown graph in the space of possible
graphs <inline-formula><tex-math>\mathbb{G}</tex-math></inline-formula> from which the data set D has been generated. The Causal Discovery problem consists in
recovering <inline-formula><tex-math>\mathcal{G}^*</tex-math></inline-formula> given the data set D and the prior knowledge K.</p>
        <p>
          Once the causal graph <inline-formula><tex-math>\mathcal{G}^*</tex-math></inline-formula> is recovered, it is possible to build a PGM with the given structure.
For example, BNs [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] are a widely known type of PGM.
        </p>
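        <p>Definition 5 (Bayesian Network). Let <inline-formula><tex-math>\mathcal{G}</tex-math></inline-formula> be a DAG and let <inline-formula><tex-math>P(\mathbf{X})</tex-math></inline-formula> be a global probability
distribution with parameters <inline-formula><tex-math>\Theta</tex-math></inline-formula>. A BN <inline-formula><tex-math>\mathcal{B} = (\mathcal{G}, \Theta)</tex-math></inline-formula> is a model in which each variable of X is a vertex
of <inline-formula><tex-math>\mathcal{G}</tex-math></inline-formula> and <inline-formula><tex-math>P(\mathbf{X})</tex-math></inline-formula> factorizes into local probability distributions according to <inline-formula><tex-math>\mathcal{G}</tex-math></inline-formula>:
<disp-formula><label>(2)</label><tex-math>P(\mathbf{X}) = \prod_{X_i \in \mathbf{X}} P(X_i \mid \mathrm{Pa}(X_i))</tex-math></disp-formula></p>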
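        <p>To make the factorization concrete, the sketch below evaluates the joint probability of a full assignment as the product of the local distributions of a three-node chain. The structure (Grade → LNM → Survival) and all probability values are made up for illustration; they are assumptions and do not come from the cohort or from any fitted model.</p>
        <preformat>
```python
# Toy DAG: Grade -> LNM -> Survival. Structure and numbers are illustrative only.
parents = {"Grade": (), "LNM": ("Grade",), "Survival": ("LNM",)}

# Local distributions P(X_i | Pa(X_i)), keyed by the tuple of parent values.
cpts = {
    "Grade": {(): {"low": 0.7, "high": 0.3}},
    "LNM": {
        ("low",): {"no": 0.9, "yes": 0.1},
        ("high",): {"no": 0.6, "yes": 0.4},
    },
    "Survival": {
        ("no",): {"yes": 0.95, "no": 0.05},
        ("yes",): {"yes": 0.7, "no": 0.3},
    },
}


def joint_probability(assignment):
    """Return P(X) as the product of P(X_i | Pa(X_i)) over all variables."""
    probability = 1.0
    for variable, variable_parents in parents.items():
        parent_values = tuple(assignment[p] for p in variable_parents)
        probability *= cpts[variable][parent_values][assignment[variable]]
    return probability


example = {"Grade": "high", "LNM": "yes", "Survival": "yes"}
# Product of P(Grade=high), P(LNM=yes | Grade=high), P(Survival=yes | LNM=yes).
print(joint_probability(example))
```
        </preformat>
        <p>Because every local table sums to one, summing the joint probability over all assignments also gives one, which is a quick sanity check for hand-written tables.</p>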
        <p>The key difference between a BN and a Causal BN (CBN) is the semantic interpretation of its
edges. Indeed, in a CBN an edge represents a cause-effect relationship between two variables,
whereas the same edge in a BN entails only a probabilistic dependence.</p>
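        <p>Definition 6 (Causal Bayesian Network). A Causal BN <inline-formula><tex-math>\mathcal{B} = (\mathcal{G}, \Theta)</tex-math></inline-formula> is a BN where the
associated DAG <inline-formula><tex-math>\mathcal{G}</tex-math></inline-formula> is a causal graph.</p>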
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Causal Discovery with Observational and Missing Data</title>
        <p>Causal discovery algorithms are usually divided into two classes: constraint-based and
score-based. The two classes have been extended to handle missing data in different ways:
constraint-based algorithms rely on test-wise deletion [26] to perform conditional independence tests
efficiently in order to mitigate the impact of missing observations, while score-based approaches
alternate data imputation and causal discovery [27].</p>
        <p>Causal Discovery with Missing Data. By default, causal discovery algorithms are not
designed to handle incomplete data. However, we can combine them with missing value
imputation approaches to complete the data and reduce the problem to a standard causal
discovery task. A widely used instance of this idea is the Expectation Maximization (EM) [28]
algorithm. In particular, the Structural EM [27] algorithm is specifically designed to iteratively
run the imputation step performed by EM and a causal discovery step performed by a score-based
algorithm, alternating them until convergence.</p>
        <p>Greedy Search: The Hill-Climbing Approach. A widely applied score-based algorithm for
causal discovery is Greedy Search (GS) [29]. GS traverses the space <inline-formula><tex-math>\mathbb{G}</tex-math></inline-formula> of the possible DAGs
over the set of variables V, selecting the optimal graph <inline-formula><tex-math>\mathcal{G}^*</tex-math></inline-formula> by a greedy evaluation of a function
<inline-formula><tex-math>S</tex-math></inline-formula>, known as the scoring criterion. There are multiple strategies to implement GS, one of which
is called Hill-Climbing (HC). At its core, HC repeatedly applies three fundamental operations
to change the current recovered structure, moving from one graph to another across the graph
space <inline-formula><tex-math>\mathbb{G}</tex-math></inline-formula>. These “moves” are the addition, deletion or reversal of an edge. If a move improves the
score <inline-formula><tex-math>S</tex-math></inline-formula>, then the graph is updated accordingly. The procedure halts when no move improves
the score and returns a DAG.</p>
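        <p>The search loop described above can be sketched in a few lines of Python. This is a simplified illustration rather than the implementation used in this work: it picks the best improving move at each step, and the scoring function is a toy stand-in (the negative number of edges that differ from a hypothetical target graph), whereas a real application would use a data-driven scoring criterion. All names are assumptions made for this example.</p>
        <preformat>
```python
from itertools import permutations


def is_acyclic(nodes, edges):
    """Peel off nodes without incoming edges; failure to do so means a cycle."""
    remaining, edges = set(nodes), set(edges)
    while remaining:
        roots = {n for n in remaining if not any(c == n for _, c in edges)}
        if not roots:
            return False  # every remaining node has a parent: a cycle exists
        remaining -= roots
        edges = {(p, c) for p, c in edges if p not in roots}
    return True


def neighbours(nodes, edges):
    """Yield every DAG reachable by one edge addition, deletion or reversal."""
    for x, y in permutations(nodes, 2):
        if (x, y) in edges:
            yield edges - {(x, y)}                      # deletion
            candidate = (edges - {(x, y)}) | {(y, x)}   # reversal
        elif (y, x) not in edges:
            candidate = edges | {(x, y)}                # addition
        else:
            continue
        if is_acyclic(nodes, candidate):
            yield candidate


def hill_climb(nodes, score, start=frozenset()):
    """Apply the best improving move until no move improves the score."""
    current = frozenset(start)
    current_score = score(current)
    while True:
        best, best_score = None, current_score
        for candidate in neighbours(nodes, current):
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
        if best is None:
            return current
        current, current_score = best, best_score


# Toy score (an assumption): closeness to a hypothetical target structure.
target = frozenset({("Grade", "LNM"), ("LNM", "Survival")})


def toy_score(edges):
    return -len(edges ^ target)


print(sorted(hill_climb(["Grade", "LNM", "Survival"], toy_score)))
```
        </preformat>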
        <p>While the graph space <inline-formula><tex-math>\mathbb{G}</tex-math></inline-formula> contains every graph that could be generated given the vertices
V, only a subset of them is compatible with the probability distribution induced by the
observed data. Moreover, not every graph compatible with said distribution is necessarily causal.
Therefore, it is possible to shrink the search space by adding constraints in terms of structural
properties, that is, by requiring or forbidding the existence of an edge in the optimal graph <inline-formula><tex-math>\mathcal{G}^*</tex-math></inline-formula>.</p>
        <p>Encoding Prior Knowledge. One could restrict the set of admissible graphs by encoding
prior knowledge through required or forbidden edge lists [30]. For instance, it is possible to
leverage expert knowledge to identify known relationships and encode them as required edges.
These lists can also encode a partial ordering when potential causes of other variables are known.</p>
        <p>For example, suppose that clinicians want to include their prior knowledge on the interaction
between biomarkers and LNM into the CBN. This inclusion would happen during the execution
of the causal discovery algorithm and, therefore, requires that the experts’ knowledge is encoded
programmatically. Causal discovery algorithms essentially learn a set of ordinal, parent-child
relationships: it is natural to encode prior knowledge in the same form. For instance, if we
know that p53 is not a direct cause of LNM, then the translation of such a concept would be
<inline-formula><tex-math>\mathrm{p53} \notin \mathrm{Pa}(\mathrm{LNM})</tex-math></inline-formula>. If, on the other hand, we know that LNM is a direct cause of L1CAM then we
would have <inline-formula><tex-math>\mathrm{LNM} \in \mathrm{Pa}(\mathrm{L1CAM})</tex-math></inline-formula>. This is a direct consequence of the Causal Edge Assumption
(Definition 3). Even this simple example shows the flexibility of this approach, which allows us to
encode different sources of prior knowledge without any restrictions.</p>
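        <p>A minimal sketch of how such knowledge could be encoded programmatically is given below, using required and forbidden edge lists and a helper that turns a partial temporal ordering into forbidden edges. The function names, the tier contents and the candidate graph are assumptions chosen for illustration; they do not reproduce the prior knowledge actually elicited for this study.</p>
        <preformat>
```python
def forbidden_from_tiers(tiers):
    """Forbid every edge pointing from a later tier back to an earlier one."""
    forbidden = set()
    for later_index, later in enumerate(tiers):
        for earlier in tiers[:later_index]:
            forbidden |= {(x, y) for x in later for y in earlier}
    return forbidden


def respects_prior_knowledge(edges, required, forbidden):
    """Check that a graph contains all required and no forbidden edges."""
    edges = set(edges)
    return required.issubset(edges) and edges.isdisjoint(forbidden)


# Edges are (parent, child) pairs, mirroring the two statements in the text:
# p53 is not a parent of LNM, and LNM is a parent of L1CAM.
required = {("LNM", "L1CAM")}
forbidden = {("p53", "LNM")}

# Hypothetical temporal tiers: later variables cannot cause earlier ones.
tiers = [["PreoperativeGrade"], ["Chemotherapy"], ["Survival5yr"]]
forbidden |= forbidden_from_tiers(tiers)

candidate = [("LNM", "L1CAM"), ("PreoperativeGrade", "Chemotherapy")]
print(respects_prior_knowledge(candidate, required, forbidden))
```
        </preformat>
        <p>A score-based search can then discard any candidate move whose resulting graph fails this check, which shrinks the search space without changing the scoring criterion.</p>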
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Results</title>
      <p>Causal discovery algorithms provide a correct solution to the causal discovery problem in the
limit of an infinite number of samples [31]. However, in real-world applications the available data are
finite, especially in medicine, where data samples are usually small. As a result, even small
amounts of noise in the data may result in a different structure. Therefore, it is important to
quantify our confidence in the presence of each edge in the causal BN, also called the "strength"
of an edge.</p>
      <sec id="sec-4-1">
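        <title>Estimating Edge Strength: A Bootstrap Approach</title>
        <p>The estimation of the strength of an
edge was performed through a bootstrap approach [32]. Here, a custom version with Structural
EM is reported in Algorithm 1, described as follows. In line 1, the procedure takes as input a data
set D, prior knowledge K, hyperparameters <inline-formula><tex-math>\alpha</tex-math></inline-formula> for Structural EM, number of bootstraps <inline-formula><tex-math>n</tex-math></inline-formula> and
number of samples to draw <inline-formula><tex-math>m</tex-math></inline-formula>. In line 2, the confidence matrix C is initialized. In lines 3-6, the
data set D is re-sampled <inline-formula><tex-math>n</tex-math></inline-formula> times with replacement, drawing <inline-formula><tex-math>m</tex-math></inline-formula> observations for each bootstrap
following a uniform distribution. For each sampled data set <inline-formula><tex-math>\mathbf{D}_i \subseteq \mathbf{D}</tex-math></inline-formula>, the causal discovery
algorithm is applied to induce a corresponding graph <inline-formula><tex-math>\mathcal{G}_i</tex-math></inline-formula>. Finally, line 7 computes
the strength of each edge as its relative frequency of inclusion across the <inline-formula><tex-math>n</tex-math></inline-formula> bootstraps.</p>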
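        <p>The bootstrap procedure can be sketched as follows. The structure learner is passed in as a function so that the sketch stays self-contained: here a deliberately trivial placeholder is used instead of Structural EM, which is not implemented. The names confidence_matrix and toy_learner, the seed argument and the example data are assumptions introduced for this illustration.</p>
        <preformat>
```python
import random
from collections import Counter


def confidence_matrix(data, learn_graph, n_bootstraps, n_samples, seed=0):
    """Estimate edge strength as the frequency of each edge across bootstraps."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_bootstraps):
        # Draw n_samples rows uniformly at random, with replacement.
        resample = rng.choices(data, k=n_samples)
        # Learn a graph on the resample and count each of its edges once.
        counts.update(set(learn_graph(resample)))
    # Normalize the counts into relative frequencies.
    return {edge: count / n_bootstraps for edge, count in counts.items()}


def toy_learner(rows):
    """Placeholder for Structural EM: adds A -> B when A and B mostly agree."""
    agreement = sum(row["A"] == row["B"] for row in rows) / len(rows)
    return [("A", "B")] if agreement > 0.7 else []


# Made-up data: A and B agree in 8 rows out of 10.
data = [{"A": 1, "B": 1}] * 8 + [{"A": 1, "B": 0}] * 2
print(confidence_matrix(data, toy_learner, n_bootstraps=50, n_samples=10))
```
        </preformat>
        <p>Edges that never appear are simply absent from the returned dictionary, which corresponds to a zero entry in the confidence matrix.</p>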
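        <p>The causal discovery algorithm developed is described in Algorithm 2. The procedure (line 1) builds on the
confidence matrix estimation computed by Algorithm 1. In line 2, the causal graph <inline-formula><tex-math>\mathcal{G}</tex-math></inline-formula> is initialized
to the empty graph and, in line 3, the associated confidence matrix C, i.e. the matrix containing
the edge strengths, is computed. Line 4 describes a generic strategy to select the edges to insert
into <inline-formula><tex-math>\mathcal{G}</tex-math></inline-formula> given C. Here, we relied on a threshold <inline-formula><tex-math>\tau</tex-math></inline-formula> to filter irrelevant edges to build the “average
graph”. In lines 5-6, the CBN parameters <inline-formula><tex-math>\Theta</tex-math></inline-formula> are fitted given <inline-formula><tex-math>\mathcal{G}</tex-math></inline-formula> by applying EM [28] to the data set
D with missing data.</p>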
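        <p>One simple edge-selection strategy is to keep every edge whose strength reaches the threshold. The sketch below shows only this filtering step; the edge strengths are made-up values, the function name is an assumption, and the sketch does not check that the resulting graph is acyclic, which a complete implementation would have to ensure.</p>
        <preformat>
```python
def average_graph(confidence, threshold):
    """Keep the edges whose bootstrap strength is at least `threshold`."""
    return {edge for edge, strength in confidence.items() if strength >= threshold}


# Made-up strengths, for illustration only.
confidence = {
    ("PreoperativeGrade", "p53"): 0.92,
    ("Hospital", "p53"): 0.55,
    ("p53", "LNM"): 0.10,
}

print(sorted(average_graph(confidence, threshold=0.5)))
```
        </preformat>
        <p>Raising the threshold yields a sparser graph with fewer, more stable edges, which is why it is treated as a hyperparameter to be tuned.</p>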
        <p>Definition and Selection of Variables. To conduct this analysis we used the cohort
presented by Reijnen et al. An overview of the cohort and of the data collection procedures
can be found in [18]. Briefly, the retrospective multicenter cohort study included 763 patients,
with a median age of 65 years, surgically treated for endometrial cancer between 1995 and 2013
at one of the 10 participating European hospitals. Clinical and histopathological variables
with prognostic value for the prediction of LNM were identified by a systematic review of the
literature. The variables used can be divided into three major temporal tiers:
<list list-type="bullet">
<list-item><p>Pre-operative clinical, histopathological variables and biomarkers: Estrogen Receptor (ER)
expression, Progesterone Receptor (PR) expression, L1CAM (cell migration) expression, p53
(tumour suppressor gene) expression, cervical cytology, platelet count (thrombocytosis),
lymphadenopathy on MRI or CT, lymphovascular space invasion (LVSI), Ca-125 serum
levels and pre-operative tumor grade,</p></list-item>
<list-item><p>Post-operative/treatment variables: adjuvant therapy (Chemotherapy and/or Radiotherapy),
post-operative tumour grade,</p></list-item>
<list-item><p>Late post-operative outcomes: 1-, 3-, 5-year disease-specific survival (DSS), Lymph Node
Metastases (LNM), Myometrial Invasion.</p></list-item>
</list></p>
        <p>All the described variables are discrete, with cardinality ranging from 2 to 3. Two
main changes were made in comparison to published works: the addition of hospital of treatment
(10 levels) to the model and the separation of adjuvant therapy into two different dichotomous
variables (chemotherapy and radiotherapy).</p>
        <p>Training and Testing. The data set D was split into a train set and a test set following a 70/30
ratio. For each configuration of hyperparameters <inline-formula><tex-math>(\alpha, n, m, \tau)</tex-math></inline-formula>, we applied Algorithm 2 to the
train set, with the same prior knowledge K. The resulting BNs were evaluated on the test set
by estimating the probability of LNM. The hyperparameter tuning was performed following a
grid search, as suggested in [31]. While cross validation (CV) is generally preferred over a naïve
train-test split, hyperparameter tuning over a learning procedure based on Structural EM
is computationally expensive and, therefore, it would require a non-negligible amount of time
when coupled with CV. Moreover, we considered the possibility of further splitting the train set to
obtain a validation set, but the reduced sample size hindered the feasibility of this additional
step. Finally, we computed the sensitivity, specificity, ROC and AUC<sup>2</sup> for each CBN model.</p>
        <p>Definition 7 (Sensitivity &amp; Specificity). Given a binary classification problem, the confusion
matrix is a 2 × 2 square integer matrix resulting from the application of a classification algorithm.
The values on the main diagonal are called true positives (<inline-formula><tex-math>TP</tex-math></inline-formula>) and true negatives (<inline-formula><tex-math>TN</tex-math></inline-formula>), while
the values off the diagonal are false positives (<inline-formula><tex-math>FP</tex-math></inline-formula>) and false negatives (<inline-formula><tex-math>FN</tex-math></inline-formula>). Then, the true
positive ratio (<inline-formula><tex-math>TPR</tex-math></inline-formula>) and the true negative ratio (<inline-formula><tex-math>TNR</tex-math></inline-formula>) are defined as follows:
<disp-formula><label>(3)</label><tex-math>TPR = \frac{TP}{TP + FN}</tex-math></disp-formula>
<disp-formula><label>(4)</label><tex-math>TNR = \frac{TN}{TN + FP}</tex-math></disp-formula>
The <inline-formula><tex-math>TPR</tex-math></inline-formula> and <inline-formula><tex-math>TNR</tex-math></inline-formula> are also called sensitivity and specificity, respectively.</p>
        <p>Definition 8 (ROC &amp; AUC). The Receiver Operating Characteristic (ROC) curve is a plot of
sensitivity against (1 - specificity) at different thresholds. The Area Under the Curve (AUC) is
the area under the ROC curve.</p>
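        <p>The two definitions above translate directly into code. The following sketch computes sensitivity and specificity at a given threshold and approximates the AUC with the trapezoidal rule over all distinct score thresholds. It assumes binary labels (1 for the positive class) with at least one positive and one negative example; the function names and the example scores are assumptions made for illustration and are unrelated to the study data.</p>
        <preformat>
```python
def sensitivity_specificity(y_true, y_score, threshold):
    """Return (TPR, TNR) when scores at or above `threshold` are positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, y_score):
    """Area under the ROC curve via the trapezoidal rule."""
    # Each threshold gives one ROC point: (1 - specificity, sensitivity).
    points = [(0.0, 0.0)]
    for threshold in sorted(set(y_score), reverse=True):
        sens, spec = sensitivity_specificity(y_true, y_score, threshold)
        points.append((1.0 - spec, sens))
    points.append((1.0, 1.0))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up labels and predicted probabilities.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(sensitivity_specificity(y_true, y_score, 0.5), roc_auc(y_true, y_score))
```
        </preformat>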
        <p>Algorithm 1 Confidence matrix from missing data and prior knowledge.</p>
        <preformat>1: procedure ConfidenceMatrix(D, K, α, n, m)
2:   C ← 0                                    ◁ Initialize a |V| × |V| matrix, with V the variables in D.
3:   for i ∈ [1, n] do
4:     D_i ← Sample(D, m)                     ◁ Sample from D with replacement.
5:     G_i ← StructuralEM(D_i, K, α)          ◁ Learn G_i from D_i and K.
6:     C[X, Y] ← C[X, Y] + 1, ∀ (X, Y) ∈ E_i  ◁ Increment the edge count.
7:   C ← C / n                                ◁ Normalize the confidence matrix.
8:   return C</preformat>
        <p>2We are aware of the issues related to the selection of a causal model based on its classification performances,
hence, we took into account both in-sample and out-of-sample metrics.</p>
        <p>[Figure 1: In-sample vs. Out-of-sample AUC Scatterplot. Vertical axis: out-of-sample AUC; color legend: parents space cardinality (0 to 6).]</p>
        <p>Algorithm 2 Learn CBN from missing data and prior knowledge.</p>
        <preformat>1: procedure CBN(D, K, α, n, m, τ)
2:   G ← (V, ∅)                            ◁ Initialize an empty graph over the variables V in D.
3:   C ← ConfidenceMatrix(D, K, α, n, m)   ◁ Compute the confidence matrix.
4:   Insert edges into G following a strategy w.r.t. C and τ.
5:   Θ̂ ← EM(G, D)                          ◁ Estimate the parameters using EM.
6:   ℬ ← (G, Θ̂)                            ◁ Build the CBN given G and Θ̂.
7:   return ℬ</preformat>
        <p>Figure 1 is a scatter plot of the results of the execution of Algorithm 2. The color mapping makes it possible
to clearly distinguish three well-separated clusters, grouped by the parents space cardinality.
Specifically, the red cluster represents the models where LNM has no parents, the light-red
cluster contains models where LNM has only Chemotherapy as parent, and finally the blue
cluster contains models where both Chemotherapy and Histology are parents of LNM.</p>
        <p>The structure presented in Figure 2 is built by encoding prior causal knowledge elicited
by clinicians and randomized controlled trials (RCTs). The encoding process is performed by
adding a directed edge from the expected cause to its effect. Each edge addition is supported by
biological and physiological knowledge, either obtained by querying experts or from reviewed
literature, without observational data. Note, for example, that the Therapy node does not have
incoming edges, since therapy is always assigned at random in an RCT (and only the outcome
matters).</p>
        <p>[Figure 2: graph with nodes ER, PR, PostoperativeGrade, Cytology, Therapy, MyometrialInvasion, LVSI, LNM, Platelets, CA125, L1CAM, p53, CTMRI, Recurrence, Survival1yr, Survival3yr, Survival5yr.]</p>
        <p>The graph presented in Figure 3 is the result of the application of Algorithm 2 on the collected
data set and encoded prior knowledge based on partial temporal ordering of variables. The
Therapy node is split into Radiotherapy and Chemotherapy to highlight the different impact of
adjuvant treatments.</p>
        <p>The two graphs share a common subset of edges, e.g. the ones related to Recurrence and
Survivals. A major difference lies in the edges related to the biomarkers cluster. Indeed, while
in Figure 2 biomarkers, such as p53, CA125 and L1CAM, are assumed to be strongly related
to LNM, in the recovered graph the PreoperativeGrade is observed as a common parent of the
variables contained in such cluster. Moreover, no biomarker is directly connected to LNM, neither
as a parent nor as a child, calling for further analyses of the collected data.</p>
        <p>Such similarities and differences also appear in Figure 4, where Hospital is introduced to
explore the potential presence of latent effects and selection bias. While the graph in Figure 4
is not completely different from the one in Figure 3 in terms of observed substructures, it
encodes different independence statements due to the presence of the newly introduced
Hospital.</p>
        <p>The crucial difference lies in the semantic interpretation of Hospital, which in this case is
not to be intended as a direct cause of its children, but rather as a proxy for other unobserved
variables or biases, i.e. a context variable. Indeed, while it could be that population heterogeneity
across hospitals affects the choice of adjuvant treatments, it would be nonsensical to conclude
that Hospital is a cause of Ca-125. Nonetheless, the causal discovery procedure includes a
set of edges that are related to spurious associations present in the data set. For example, the
directed edge that connects Hospital to p53 is an instance of such a pattern, which could be
caused by a missing-not-at-random (MNAR) mechanism [33]. Another example of the impact
of biases is represented by the directed edge from Hospital to PostoperativeGrade. In this case,
an unbalanced distribution of patients’ grading across geographical regions, which Hospital is a
proxy of, could act as a potential source of selection bias [34].</p>
        <p>The ROC curve depicted in Figure 5 is obtained by predicting the probability of the LNM class
on the test set, given the CBN fitted on the structure in Figure 4 and the train set. It achieves
an AUC of 0.883, with associated 95% CI 0.775-0.991, which is higher than the one obtained in
[18], although it was not possible to compare the metrics using a significance test due to the
different test sets.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and Future Works</title>
      <p>Given the known limitations of data-driven approaches when applied to observational data,
causal discovery techniques are used to explore and mitigate the impact of spurious associations
during the learning process. In this work we explored the task of learning a causal representation
to assess the pre-operative risk of developing LNM in endometrial cancer patients. Furthermore,
the recovered models were extended to include information from context variables, aiming to
uncover previously unobserved effects.</p>
      <p>The resulting procedure takes advantage of pre-existing techniques to reduce the bias
introduced during the imputation step in a bootstrap approach. This enabled us to compute
the strength of the observed associations in the obtained models across multiple re-sampled
instances, allowing a step of model averaging to recover less frequent substructures. The risk
assessment is performed by predicting the probability of developing LNM using a CBN fitted
on the recovered structure and given train set, showing an increased AUC over previous works.</p>
      <p>Still, we highlighted a set of potential issues that need to be addressed in future works.</p>
      <p>Missingness Mechanism. With the introduction of the Hospital variable we observed a set
of edges that hint at the presence of a potential missing-not-at-random pattern. If this is the
case, then it would require careful consideration in order to reduce the bias introduced during
the missing-value imputation step.</p>
      <p>Effect of Adjuvant Therapy. Once a causal graph is obtained, it is theoretically possible to
estimate the causal effect of each adjuvant therapy, either single or combined, on the
development of LNM. Before directly computing the effect, there are assumptions that need to be
carefully verified, e.g. positivity, consistency, unconfoundedness and non-interference [24].</p>
      <p>Impact of Selection Bias. While it is clear that observing an association between Hospital
and other variables is not sufficient to conclude that there is indeed a selection bias, it is a
strong hint that there are other unobserved variables that influence the causal mechanism. It
could be interesting to assess the impact of the selection bias mediated by the Hospital
variable alone.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The authors would like to thank anonymous reviewers for taking the time and effort necessary
to review the manuscript. We sincerely appreciate all valuable comments and suggestions,
which helped us to improve the quality of the manuscript.</p>
      <p>Alessio Zanga was granted a Ph.D. scholarship by F. Hoffmann-La Roche Ltd.</p>
      <p>[11] E. Bareinboim, J. D. Correa, D. Ibeling, T. Icard, On Pearl's Hierarchy and the Foundations
of Causal Inference, in: Probabilistic and Causal Inference: The Works of Judea Pearl, 2020,
pp. 1–62. URL: https://causalai.net/r60.pdf.
[12] S. Lee, E. Bareinboim, Structural causal bandits: Where to intervene?, in: Advances in
Neural Information Processing Systems, volume 2018-Decem, 2018, pp. 2568–2578.
[13] F. Bray, J. Ferlay, I. Soerjomataram, R. L. Siegel, L. A. Torre, A. Jemal, Global cancer
statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers
in 185 countries, CA: A Cancer Journal for Clinicians 68 (2018) 394–424. doi:10.3322/caac.21492.
[14] S. M. de Boer, M. E. Powell, L. Mileshkin, et al., Adjuvant chemoradiotherapy versus
radiotherapy alone in women with high-risk endometrial cancer (PORTEC-3): patterns of
recurrence and post-hoc survival analysis of a randomised phase 3 trial, The Lancet
Oncology 20 (2019) 1273–1285. doi:10.1016/S1470-2045(19)30395-X.
[15] D. Matei, V. Filiaci, M. E. Randall, et al., Adjuvant chemotherapy plus radiation for locally
advanced endometrial cancer, New England Journal of Medicine 380 (2019) 2317–2326.
doi:10.1056/NEJMoa1813181.
[16] S. Bendifallah, G. Canlorbe, P. Collinet, E. Arsène, F. Huguet, C. Coutant, D. Hudry,
O. Graesslin, E. Raimond, C. Touboul, E. Daraï, M. Ballester, Just how accurate are the
major risk stratification systems for early-stage endometrial cancer?, British Journal of
Cancer 112 (2015) 793–801. doi:10.1038/bjc.2015.35.
[17] J. Trovik, E. Wik, H. M. Werner, C. Krakstad, H. Helland, I. Vandenput, T. S. Njolstad,
I. M. Stefansson, J. Marcickiewicz, S. Tingulstad, A. C. Staff, F. Amant, L. A. Akslen, H. B.
Salvesen, Hormone receptor loss in endometrial carcinoma curettage predicts lymph node
metastasis and poor outcome in prospective multicentre trial, European Journal of Cancer
49 (2013) 3431–3441. doi:10.1016/j.ejca.2013.06.016.
[18] C. Reijnen, E. Gogou, N. C. Visser, H. Engerud, J. Ramjith, L. J. Van Der Putten, K. Van
De Vijver, M. Santacana, P. Bronsert, J. Bulten, M. Hirschfeld, E. Colas, A. Gil-Moreno,
A. Reques, G. Mancebo, C. Krakstad, J. Trovik, I. S. Haldorsen, J. Huvila, M. Koskas,
V. Weinberger, M. Bednarikova, J. Hausnerova, A. A. Van Der Wurff, X. Matias-Guiu,
F. Amant, L. F. Massuger, M. P. Snijders, H. V. Kusters-Vandevelde, P. J. Lucas, J. M.
Pijnenborg, Preoperative risk stratification in endometrial cancer (ENDORISK) by a
Bayesian network model: A development and validation study, PLoS Medicine 17 (2020).
doi:10.1371/journal.pmed.1003111.
[19] M. Koskas, M. Fournier, A. Vanderstraeten, F. Walker, D. Timmerman, I. Vergote, F. Amant,
Evaluation of models to predict lymph node metastasis in endometrial cancer: A
multicentre study, European Journal of Cancer 61 (2016) 52–60. doi:10.1016/j.ejca.2016.03.079.
[20] G. Getz, S. B. Gabriel, K. Cibulskis, et al., Integrated genomic characterization of
endometrial carcinoma, Nature 497 (2013). doi:10.1038/nature12113.
[21] F. K. Kommoss, A. N. Karnezis, F. Kommoss, A. Talhouk, F. A. Taran, A. Staebler, C. B.
Gilks, D. G. Huntsman, B. Krämer, S. Y. Brucker, J. N. McAlpine, S. Kommoss, L1CAM
further stratifies endometrial carcinoma patients with no specific molecular risk profile,
British Journal of Cancer 119 (2018). doi:10.1038/s41416-018-0187-6.
[22] L. J. V. D. Putten, N. C. Visser, K. V. D. Vijver, M. Santacana, P. Bronsert, J. Bulten,
M. Hirschfeld, E. Colas, A. Gil-Moreno, A. Garcia, G. Mancebo, F. Alameda, J. Trovik,
R. K. Kopperud, J. Huvila, S. Schrauwen, M. Koskas, F. Walker, V. Weinberger, L. Minar,
E. Jandakova, M. P. Snijders, S. V. D. B.-V. Erp, X. Matias-Guiu, H. B. Salvesen, F. Amant,
L. F. Massuger, J. M. Pijnenborg, L1CAM expression in endometrial carcinomas: An ENITEC
collaboration study, British Journal of Cancer 115 (2016). doi:10.1038/bjc.2016.235.
[23] P. Vinklerová, P. Ovesná, J. Hausnerová, J. M. A. Pijnenborg, P. J. F. Lucas, C. Reijnen,
S. Vrede, V. Weinberger, External validation study of endometrial cancer preoperative
risk stratification model (ENDORISK), Frontiers in Oncology 12 (2022). doi:10.3389/fonc.2022.939226.
[24] J. Pearl, M. Glymour, N. P. Jewell, Causal inference in statistics: A primer, John Wiley &amp;
Sons, 2016.
[25] A. Zanga, E. Ozkirimli, F. Stella, A Survey on Causal Discovery: Theory and Practice,
International Journal of Approximate Reasoning 151 (2022) 101–129. doi:10.1016/J.IJAR.2022.09.004.
[26] E. V. Strobl, S. Visweswaran, P. L. Spirtes, Fast causal inference with non-random
missingness by test-wise deletion, International Journal of Data Science and Analytics 6 (2018)
47–62. doi:10.1007/S41060-017-0094-6.
[27] N. Friedman, The Bayesian Structural EM Algorithm, Proceedings of the Fourteenth Conference on
Uncertainty in Artificial Intelligence (1998). URL: http://www.cs.huji.ac.il/~nir/Papers/Fr2.pdf.
[28] S. L. Lauritzen, The EM algorithm for graphical association models with missing data,
Computational Statistics and Data Analysis 19 (1995). doi:10.1016/0167-9473(93)E0056-A.
[29] M. Scutari, C. Vitolo, A. Tucker, Learning Bayesian networks from big data with
greedy search: computational complexity and efficient implementation, Statistics
and Computing 29 (2019) 1095–1108. URL: https://doi.org/10.1007/s11222-019-09857-1.
doi:10.1007/s11222-019-09857-1.
[30] C. Meek, Strong Completeness and Faithfulness in Bayesian Networks (2013). URL:
http://arxiv.org/abs/1302.4973.
[31] P. Spirtes, C. N. Glymour, R. Scheines, D. Heckerman, Causation, prediction, and search,
MIT Press, 2000.
[32] N. Friedman, M. Goldszmidt, A. Wyner, Data Analysis with Bayesian Networks: A
Bootstrap Approach (2013). URL: http://arxiv.org/abs/1301.6695.
[33] M. Scutari, Bayesian network models for incomplete and dynamic data (2020). doi:10.1111/stan.12197.
[34] K. M. Esterling, D. Brady, E. Schwitzgebel, The Necessity of Construct and External Validity
for Generalized Causal Claims, OSF Preprints (2021) 1–41. URL: https://osf.io/2s8w5/.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>V.</given-names>
            <surname>Kaul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Enslin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Gross</surname>
          </string-name>
          ,
          <article-title>History of artificial intelligence in medicine</article-title>
          ,
          <source>Gastrointestinal Endoscopy</source>
          <volume>92</volume>
          (
          <year>2020</year>
          )
          <fpage>807</fpage>
          -
          <lpage>812</lpage>
          . doi:10.1016/J.GIE.2020.06.040.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>O.</given-names>
            <surname>Troyanskaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Trajanoski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Carpenter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thrun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Razavian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Oliver</surname>
          </string-name>
          ,
          <article-title>Artificial intelligence and cancer</article-title>
          ,
          <source>Nature Cancer</source>
          <volume>1</volume>
          (
          <year>2020</year>
          )
          <fpage>149</fpage>
          -
          <lpage>152</lpage>
          . URL: http://www.ncbi.nlm.nih.gov/pubmed/35122011. doi:10.1038/s43018-020-0034-6.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <article-title>Artificial intelligence in cancer diagnosis and prognosis: Opportunities and challenges</article-title>
          ,
          <source>Cancer Letters</source>
          <volume>471</volume>
          (
          <year>2020</year>
          )
          <fpage>61</fpage>
          -
          <lpage>71</lpage>
          . URL: https://linkinghub.elsevier.com/retrieve/pii/S0304383519306135. doi:10.1016/j.canlet.2019.12.007.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>O.</given-names>
            <surname>Elemento</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Leslie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lundin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Tourassi</surname>
          </string-name>
          ,
          <article-title>Artificial intelligence in cancer research, diagnosis and therapy</article-title>
          ,
          <source>Nature Reviews Cancer</source>
          <volume>21</volume>
          (
          <year>2021</year>
          )
          <fpage>747</fpage>
          -
          <lpage>752</lpage>
          . URL: https://www.nature.com/articles/s41568-021-00399-1. doi:10.1038/s41568-021-00399-1.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ho</surname>
          </string-name>
          ,
          <article-title>Artificial intelligence in cancer therapy</article-title>
          ,
          <source>Science (New York, N.Y.)</source>
          <volume>367</volume>
          (
          <year>2020</year>
          )
          <fpage>982</fpage>
          -
          <lpage>983</lpage>
          . URL: http://www.ncbi.nlm.nih.gov/pubmed/32108102. doi:10.1126/science.aaz3023.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>W. L.</given-names>
            <surname>Bi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hosny</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. B.</given-names>
            <surname>Schabath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Giger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. J.</given-names>
            <surname>Birkbak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mehrtash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Allison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Arnaout</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Abbosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Dunn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. H.</given-names>
            <surname>Mak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Tamimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Tempany</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Swanton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Hofmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. H.</given-names>
            <surname>Schwartz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Gillies</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. J. W. L.</given-names>
            <surname>Aerts</surname>
          </string-name>
          ,
          <article-title>Artificial intelligence in cancer imaging: Clinical challenges and applications</article-title>
          ,
          <source>CA: A Cancer Journal for Clinicians</source>
          (
          <year>2019</year>
          ) caac.21552. URL: https://onlinelibrary.wiley.com/doi/abs/10.3322/caac.21552. doi:10.3322/caac.21552.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Holzinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Langs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Denk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zatloukal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <article-title>Causability and explainability of artificial intelligence in medicine</article-title>
          ,
          <source>WIREs Data Mining and Knowledge Discovery</source>
          <volume>9</volume>
          (
          <year>2019</year>
          )
          <elocation-id>e1312</elocation-id>
          . URL: https://onlinelibrary.wiley.com/doi/10.1002/widm.1312. doi:10.1002/widm.1312.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L.</given-names>
            <surname>Pumplun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fecho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Wahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Buxmann</surname>
          </string-name>
          ,
          <article-title>Adoption of Machine Learning Systems for Medical Diagnostics in Clinics: Qualitative Interview Study</article-title>
          ,
          <source>Journal of Medical Internet Research</source>
          <volume>23</volume>
          (
          <year>2021</year>
          )
          <elocation-id>e29301</elocation-id>
          . URL: https://www.jmir.org/2021/10/e29301. doi:10.2196/29301.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Gunning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stefik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Stumpf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.-Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>XAI-Explainable artificial intelligence</article-title>
          ,
          <source>Science Robotics</source>
          <volume>4</volume>
          (
          <year>2019</year>
          ). URL: https://www.science.org/doi/10.1126/scirobotics.aay7120. doi:10.1126/scirobotics.aay7120.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pearl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russell</surname>
          </string-name>
          ,
          <article-title>Bayesian networks</article-title>
          ,
          <source>Technical Report</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>E.</given-names>
            <surname>Bareinboim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Correa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ibeling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Icard</surname>
          </string-name>
          ,
          <article-title>On Pearl's Hierarchy and the Foundations of Causal Inference</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>