Case Study on the Development of a Recommender for
Apple Disease Diagnosis with a Knowledge-based Bayesian
Network
Gabriele Sottocornola¹, Sanja Baric¹, Fabio Stella² and Markus Zanker¹
¹ Free University of Bozen-Bolzano, Piazza Università, 1, 39100 Bolzano, Italy
² University of Milano-Bicocca, Piazza dell’Ateneo Nuovo, 1, 20126 Milano, Italy


Abstract
This paper presents a case study of a knowledge-based recommender system capable of diagnosing post-harvest diseases of apples. It describes the process of knowledge elicitation and construction of a Bayesian Network reasoning system as well as its evaluation with three different types of studies involving diseased apples. The ground truth of diseased instances has been established by genome sequencing in a lab. The paper demonstrates the performance differences of knowledge-based reasoning mechanisms due to different users interacting with the system under different conditions and proposes methods for boosting the performance by likelihood evidence learned from the estimated consensus of users’ and expert’s interactions.

Keywords
Case Study in Agriculture, Knowledge-based Recommendation, Bayesian Network, Likelihood Evidence


1. Introduction

Apple trees are the most common temperate fruit tree species, since their fruits can be stored for prolonged periods of time under controlled atmosphere conditions. However, physiological disorders and pathogenic microorganisms can deteriorate the quality and quantity of the production during storage, and lead to considerable economic losses [1]. For instance, in Northern Europe, storage losses due to pathogenic microorganisms were estimated to reach up to 10% in integrated production and up to 30% in organic production [2]. Therefore, an effective knowledge-based recommender system, able to suggest a correct diagnosis of diseases manifested on stored apples in a timely manner, is of crucial importance. For instance, the right strategy for immediate damage containment, and the plant protection scheme to recommend for the following year, both depend on the exact pathogen species. In order to reliably determine the nature of the disease, several macroscopic symptoms, such as appearance, color, texture and consistency of the rot, need to be considered by the system. Hence, we should provide a practical interface to elicit user feedback on the symptoms manifested on a diseased apple in order to guide the reasoning to recommend a diagnosis. Thus, we propose BN-DSSApple, a decision support system based on the framework of Bayesian Networks (BN), a graphical probabilistic method to reason about uncertain relationships among symptoms, signs, and diseases. The user observation (i.e., the evidence) is elicited incrementally through an adaptive question-answering interface, illustrated by visual explanations of the requested information in order to facilitate user understanding. Furthermore, we illustrate the process adopted to build the diagnostic knowledge base with the help of a domain expert in the field of post-harvest apple diseases. We analyse and address the problem of transferability of such an expert model to a larger cohort of users with different expertise levels. We thoroughly tested BN-DSSApple under different experimental conditions, simulated in 3 user studies, to assess the effectiveness of the system and its transferability across different environments.

The methodological contribution of this case study is organized according to this pipeline: a) in Section 3.1, we describe the application domain and the implemented BN-DSSApple system; b) in Section 3.2, we illustrate the process of knowledge elicitation from a domain expert for crafting the knowledge base of the BN; c) in Section 3.3, we formalize the recommendation mechanism responsible for the suggestion of a suitable diagnosis given the user feedback; d) in Section 3.4, we define the transferability problem of a knowledge-based model and we propose a possible solution, exploiting the so-called likelihood evidence.

3rd Edition of Knowledge-aware and Conversational Recommender Systems (KaRS) & 5th Edition of Recommendation in Complex Environments (ComplexRec) Joint Workshop @ RecSys 2021, September 27 – October 1, 2021, Amsterdam, Netherlands
Email: gsottocornola@unibz.it (G. Sottocornola); sanja.baric@unibz.it (S. Baric); fabio.stella@unimib.it (F. Stella); markus.zanker@unibz.it (M. Zanker)
ORCID: 0000-0001-9983-2330 (G. Sottocornola)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org
2. Background

A Bayesian Network (BN) [3, 4] is defined by its two main components: the qualitative part represented by its graphical structure and the quantitative part consisting of the conditional probabilities. More formally, a BN is graphically represented as a directed acyclic graph (DAG) $\mathcal{G} = (N, E)$, where $N = \{n_1, n_2, \ldots, n_l\}$ denotes the set of $l$ nodes and $E \subseteq N \times N$ the set of directed edges between pairs of nodes. Each node $n_i \in N$ in the DAG $\mathcal{G}$ is mapped one-to-one to a random variable $X_i \in \mathcal{X}$, where $\mathcal{X}$ denotes the set of random variables involved in the model. A random variable $X_i \in \mathcal{X}$ is represented by a set of mutually exclusive values (or states) in which the variable might be observed, $Val(X_i) = \{x_i^1, x_i^2, \ldots, x_i^m\}$, where $x_i^j \in Val(X_i)$ denotes the $j$-th value of variable $X_i$. We use the notation $X_i = x_i^j$ for an observed event, to express that variable $X_i \in \mathcal{X}$ is observed (or instantiated) in the state $x_i^j \in Val(X_i)$. A conditional probability table (CPT) is associated with each random variable $X_i \in \mathcal{X}$. The CPT specifies the conditional probability distribution $P(X_i \mid pa(X_i)) \in \mathcal{P}$ over the states of $X_i$, where $\mathcal{P}$ represents the set of conditional probabilities in the model, and $pa(X_i) \subset \mathcal{X}$ denotes the set of the so-called parents of the variable $X_i$ associated with the node $n_i$ in the DAG $\mathcal{G}$. Specifically, the parent set of $X_i$ is composed of every variable $X_j \in \mathcal{X}$ whose node $n_j$ in the DAG $\mathcal{G}$ is connected by a directed edge to $n_i$ (the so-called child node). More formally, $pa(X_i) = \{X_j \in \mathcal{X} : (n_j, n_i) \in E\}$. We can further define an ancestor variable $an(X_i)$ and a descendant variable $de(X_i)$ of the variable $X_i$ if there exists a directed path (i.e., a sequence of directed edges) connecting node $n_a$ (associated with variable $an(X_i)$) to $n_i$ (associated with variable $X_i$), and $n_i$ to $n_d$ (associated with variable $de(X_i)$); namely $\{(n_a, n_j), (n_j, n_i), (n_i, n_h), \ldots, (n_g, n_d)\} \subset E$. It is important to mention that the DAG $\mathcal{G}$ of the BN specifies a set of probabilistic relationships among variables in the model. Namely, if an edge $(n_j, n_i) \in E$ exists in the graph, this is generally taken to imply that a causal relation holds between the variables $X_j$ and $X_i$, associated with nodes $n_j$ and $n_i$. Specifically, we typically assume that the parent $X_j$ represents the cause and the child $X_i$ represents the effect in the domain. Thus, a fundamental assumption of conditional (in)dependence between variables can be derived. This assumption is the Local Markov Assumption (or Local Independence Assumption), and it states that, given its parents $pa(X_i) \subset \mathcal{X}$ defined in the DAG $\mathcal{G}$, a variable $X_i$ is conditionally independent of all its non-descendant variables. More formally, for each variable $X_i$: $(X_i \perp X_j \mid pa(X_i))$, where $X_j \notin de(X_i)$, the set of descendants of $X_i$. This property allows us to specify the joint distribution over the space of the variables $\mathcal{X}$ in the BN model through the probability factorization $P(\mathcal{X}) = \prod_{i=1}^{l} P(X_i \mid pa(X_i))$, usually referred to as the chain rule for Bayesian networks.

3. Methodology

3.1. System Description

The presented knowledge-based decision support system, named BN-DSSApple, is conceptualized as an interactive, easy-to-use web application that allows users with different levels of domain expertise in the area of apple production (e.g., farmers, quality controllers, and storage workers) to perform in-field diagnosis of post-harvest diseases of apple fruit, relying solely on the macroscopic symptoms observed on the stored fruit. The system is designed as a recommender engine which collects the feedback of the user (i.e., the evidence) on a specific apple fruit (i.e., the target apple), in order to suggest a suitable diagnosis (i.e., a set of recommended diseases). The reasoning is performed by a Bayesian Network (BN) based on an ad-hoc knowledge base, constructed with the help of a domain expert (as described in Section 3.2).

Specifically, the system collects the user’s feedback about the target apple by asking a set of dynamic multiple-choice questions related to the macroscopic features of the observed symptoms (e.g., the shape of the rot, the origin of the infection, etc.). Each question is illustrated with exemplary pictures, which also helps non-expert users understand it. Each question is mapped to a specific variable in the BN model. This part of the system is dynamic, since the system incrementally adapts the question path based on the previous answers given by the user. For instance, when the system learns that spores are visible on the infected apple, it will ask the user about further features of those spores (i.e., their mass distribution, colour, and origin). Furthermore, the system provides full flexibility to the user, i.e., it allows them to navigate the question path back and forth in order to revise previous answers, to provide multiple answers, or to skip questions when they lack confidence.

3.2. Knowledge Elicitation for Bayesian Network

In order to build a diagnostic reasoning system based on a Bayesian network (i.e., both the network structure and the CPTs), two options are available: learn from data, or elicit the knowledge from the domain literature or from experts; the two can also be combined. To the best of our knowledge, no datasets are publicly available to learn significant relationships among apple diseases and macroscopic symptoms. Thus, we started by analysing a large OWL ontology which captures the entire life cycle of apple cultivation, production, handling, and storage, presented in [5]. Hence, we extracted a smaller quantitative part of the presented ontology suitable for our goal, which allows a simple reasoning mechanism connecting symptoms to diseases, thanks to a set of SWRL rules [6].
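As a concrete illustration of the chain rule for Bayesian networks introduced in Section 2, the following minimal Python sketch evaluates the joint probability of a full assignment as the product of one CPT entry per variable. The three-node network, the variable names (`Spores`, `BrownRot`) and all probabilities are hypothetical values chosen for illustration; they are not taken from the BN-DSSApple knowledge base.

```python
from itertools import product

# Hypothetical toy network: Disease -> Spores, Disease -> BrownRot.
# Names and numbers are illustrative assumptions, not BN-DSSApple values.
parents = {"Disease": (), "Spores": ("Disease",), "BrownRot": ("Disease",)}
states = {
    "Disease": ("botrytis", "penicillium"),
    "Spores": (True, False),
    "BrownRot": (True, False),
}

# Each CPT maps a tuple of parent states to a distribution over the variable.
cpt = {
    "Disease": {(): {"botrytis": 0.4, "penicillium": 0.6}},
    "Spores": {
        ("botrytis",): {True: 0.3, False: 0.7},
        ("penicillium",): {True: 0.8, False: 0.2},
    },
    "BrownRot": {
        ("botrytis",): {True: 0.9, False: 0.1},
        ("penicillium",): {True: 0.5, False: 0.5},
    },
}


def joint(assignment):
    """Chain rule: P(X) = prod_i P(X_i | pa(X_i)) for a full assignment."""
    p = 1.0
    for var, pa in parents.items():
        pa_states = tuple(assignment[q] for q in pa)
        p *= cpt[var][pa_states][assignment[var]]
    return p


def all_assignments():
    """Enumerate every full assignment of the variables in the toy network."""
    names = list(states)
    for combo in product(*(states[n] for n in names)):
        yield dict(zip(names, combo))


# The factorized joint must sum to one over all assignments.
total = sum(joint(a) for a in all_assignments())
example = joint({"Disease": "botrytis", "Spores": True, "BrownRot": True})
```

Here `example` is the product 0.4 × 0.3 × 0.9, one factor per node, which is what the factorization prescribes for a node with no parents and two children of `Disease`.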
The graphical structure of this ontology is represented in Figure 1. To the best of our knowledge, the difficult task of (semi-)automatically constructing a BN from a domain ontology is still under-explored in the literature. A few practical, heuristic solutions can be found [7, 8], but they can hardly be applied to our case. The main limitation of such an effort lies in the fact that the two frameworks differ in the purpose they are used for. An ontology is more suitable to describe concepts and qualitative relationships (of different nature), while the BN requires quantitative (i.e., probabilistic) definitions of correlation relationships related to the reasoning mechanism of phenomena [9].

Figure 1: The initial ontology for BN-DSSApple.

We overcame this problem by directly interviewing a domain expert for the construction of the knowledge base. Specifically, we divided this task into two distinct phases: during the first phase, we identified the random variables (i.e., the macroscopic symptoms) which are relevant in the diagnostic process; during the second phase, we determined the probability values (i.e., the CPTs) quantitatively linking the diseases to the symptoms. We first asked the domain expert to review the available ontology, and to enrich and adapt it in order to obtain an effective tool for the diagnosis of post-harvest diseases of apple based on the macroscopic symptoms visible on the fruit. After a few rounds of interaction, we agreed on a set of 27 discrete random variables (12 boolean and 15 categorical) related to macroscopic symptoms and signs that could be observed on the infected apple skin and pulp, together with two hidden (target) variables, namely Disease and Stage. We assumed that a target apple could be infected by one and only one disease and thus, the random variable Disease encodes the whole set of fungal diseases of our study, namely the 7 diseases Val(Disease) = {alternaria_rot, alternaria_spot, bitter_rot, botrytis, mucor_rot, neofabraea, penicillium}. The Stage random variable was introduced to facilitate the expert’s probability elicitation task. The variable represents three discrete and symbolic stages of advancement of the post-harvest infection, namely Val(Stage) = {early, medium, late}. This workaround allows the expert to visualize a specific condition of the disease and thus specify a more reliable likelihood of the symptoms.

The final BN-DSSApple graph is reported in Figure 2. The central nodes in the network, bolded and empty, represent the two hidden diagnosis variables, namely Disease and Stage. In the top part of the network, colored in grey, are the nodes related to the lesion properties. In the right-most part, colored in yellow, are the rot properties, while in the left-most part, colored in green, are the lesion origin nodes. Finally, in the central-bottom part, colored in orange, are the nodes related to the lesion type and other symptoms and, under those, colored in cyan, the nodes representing the properties of the other symptoms.

Figure 2: The graph of the Bayesian Network for DSSApple.

In the second phase, we interviewed the domain expert in order to define the quantitative probabilistic dependencies among variables. For simplicity, we decided to start from a situation where all the symptom variables are conditionally independent of each other, given the states of Disease and Stage. Furthermore, they all depend on the two hidden variables responsible for the assessment of the diagnosis (i.e., Disease and Stage). We indicate the Disease variable as $D \in \mathcal{D}$, where $\mathcal{D}$ defines the set of hidden variables for the model. $Val(D) = \{d^1, d^2, \ldots, d^n\}$ represents the set of states of the variable $D$, where $d^i$ is the $i$-th state of the Disease variable (i.e., the $i$-th disease in our pool). The Stage variable is referred to as $T \in \mathcal{D}$ and $Val(T) = \{t^1, t^2, \ldots, t^m\}$ represents the set of states of variable $T$, where $t^i$ is the $i$-th state of the Stage variable. All other (observed) variables in the model are referred to as symptom variables and they belong to the set $\mathcal{S}$. A generic symptom variable $S_i \in \mathcal{S}$ is represented by a set of states $Val(S_i) = \{s_i^1, s_i^2, \ldots, s_i^q\}$, where $s_i^j$ is the $j$-th state of the symptom variable $S_i$. Moreover, we adapted the
procedures described in [10] for eliciting expert probabilities of our network. Specifically, we adopted a mixed symbolic questionnaire to make it easier for the expert to express the conditional probability of each event. In more detail, two techniques were applied depending on the support of the variable. For boolean variables (i.e., each symptom variable $S_i \in \mathcal{S}$ such that $Val(S_i) = \{true, false\}$), the expert was invited to answer the question: “How frequently do you observe symptom $S_i = true$, given that you have an apple infected by disease $D = d_l$ at stage $T = t_j$?”. We allowed her to select one option on a pre-defined 6-point scale, including Always (A), Very often (V), Often (O), Sometimes (S), Rarely (R), and Never (N). The expert had to fill in a form, providing the answer for each combination $(d_l, t_j) \in Val(D) \times Val(T)$. The symbolic scale is converted into an actual probability $P(S_i = true \mid D = d_l, T = t_j)$ according to the scheme reported in Table 1. The complementary probability is consequently defined as $P(S_i = false \mid D = d_l, T = t_j) = 1 - P(S_i = true \mid D = d_l, T = t_j)$.

    answer            $P(S_i = true \mid d_l, t_j)$
    Always (A)        0.999
    Very often (V)    0.8
    Often (O)         0.6
    Sometimes (S)     0.3
    Rarely (R)        0.01
    Never (N)         0.001

Table 1: Scale to convert expert knowledge into actual probabilities. How frequently do you observe symptom $S_i = true$, given that you have apples infected by disease $d_l$ at stage $t_j$?

For categorical variables (i.e., each symptom variable $S_i \in \mathcal{S}$ such that $Val(S_i) = \{s_i^1, s_i^2, \ldots, s_i^m\}$, where $m > 2$), such a process would have been too burdensome for the expert. Thus, we decided to adopt a lighter, yet effective, approach. For each categorical symptom variable $S_i \in \mathcal{S}$, given a specific disease $D = d_l$ at stage $T = t_j$, the expert was invited to simply indicate which values of $Val(S_i)$ are likely to be observed. Furthermore, we agreed on a 3-point symbolic annotation to denote the likelihood of each reported value, namely, common (no parentheses), less common (one pair of parentheses), and rare (two pairs of parentheses). The assumption underlying this choice is that many symptom values are never observed under some conditions (i.e., the resulting CPTs are sparse) and could be ignored to speed up the elicitation process. In order to convert likelihood annotations into actual probability distribution values we adopted the following heuristic. Consider a random variable $R$ with $Val(R) = \{a, b, c, d\}$, which is annotated as follows by the expert: a: common, b: less common, c: rare, and d is ignored; then $P(a) = 2P(b) = 4P(c) = 1.0$ and $P(d) = 0.0$, i.e., the unnormalized weights are 1.0, 0.5, 0.25 and 0.0. Furthermore, a small value $\epsilon = 0.001$ is added to each probability value in order to avoid null probabilities, then values are normalized such that $\sum_{r \in Val(R)} P(r) = 1.0$. This process completely defines a probability distribution for the categorical random variable $R$.

3.3. Recommendation Mechanism

In this section, we detail how a ranked list of recommended diseases (i.e., a diagnosis) is computed after the user provides the feedback on a target apple, answering the questions asked by the system.

The reasoning mechanism of the BN allows us to perform inference, namely, to estimate the posterior probability distribution on a target unobserved variable (i.e., the Disease variable $D$), given any set $\mathbf{S} \subseteq \mathcal{S}$ of observed variables as provided by the user (i.e., the evidence $\mathbf{E}$). The evidence set $\mathbf{E}$ is constructed incrementally by the application. At each step, the application requests the user to answer a multiple-choice question, related to a symptom variable $S_i \in \mathcal{S}$. When the user submits the observed state $s_i^j \in Val(S_i)$, BN-DSSApple includes the new information into the evidence set, $\mathbf{E} \leftarrow \mathbf{E} \cup \{S_i = s_i^j\}$. At the end of this elicitation process, the application will have access to the complete information provided by the user on the infected target apple she wants to diagnose. It is important to mention that the BN inference mechanism is robust to missing values; hence, the user is not forced to provide observations for every symptom variable $S_i \in \mathcal{S}$ in the model. If the user skips the question related to variable $S_m \in \mathcal{S}$, the evidence set $\mathbf{E}$ will simply not include an observation for that variable, $S_m \notin \mathbf{E}$. The goal of the reasoning system is then to provide a probability over the set of candidate diseases (i.e., the possible diagnoses). We estimate the posterior probability distribution $P(D \mid \mathbf{E})$ through an algorithm called loopy belief propagation [11]. Loopy belief propagation is an approximate message-passing method to perform inference on graphical models. In short, the algorithm iteratively updates the marginal distribution $P(N)$ of a node $N \in \mathcal{G}$ by recomputing, at the current iteration, the outgoing message from the node $N$ to each of its neighbors $V \in \mathcal{G}$ in terms of the messages that $N$ received from its other neighbors at the previous iteration.

In our recommendation engine, after completing the evidence collection process for a target apple $a$, the posterior probability computed by the BN when evidence $\mathbf{E}$ is provided is considered as a diagnosis score $s(d_i)_a$ for each disease $d_i \in Val(D)$. Namely, this probability distribution represents the confidence of the system in each disease $d_i \in Val(D)$ being the correct diagnosis for the target apple $a$. More formally, given the provided evidence set $\mathbf{E} = \{S_1 = s_1^o, S_2 = s_2^p, \ldots, S_l = s_l^q\}$, defined as the set of each observed state $s_i^j \in Val(S_i)$ for each random variable $S_i \in \mathcal{S}$, the diagnosis score related to target apple $a$ for
disease 𝑑𝑖 ∈ 𝐷 is computed as:                                         The problem of transferability is long-lasting in ma-
                                                                   chine learning and statistics and it has been addressed
                    𝑠(𝑑𝑖 )𝑎 = 𝑃(𝐷 = 𝑑𝑖 |E)                  (1)
                                                                   in causal terms, referred to as transportability [12, 14],
The ranked list of the 𝑘 suggested diseases 𝑅𝑘 =                   as well as in statistical terms, in the context of super-
{𝑑 1 , 𝑑 2 , … , 𝑑 𝑘 } shown to the user is then based on the      vised learning, where it is also known as covariate shift
score for each disease, such that 𝑠(𝑑 𝑖 ) ≥ 𝑠(𝑑 𝑖+1 ). The         or sample selection bias [15, 16]. One of the most common
parameter 𝑘 controls for the flexibility of the system to          approaches applies a direct correction to the learned prob-
show more or less recommended diseases to the user. In             ability distribution based on the estimates on the testing
our evaluation, the parameter is fixed to 𝑘 = 3.                   set [13]. Specifically inspired by the work presented in
                                                                   [17], we proposed a methodology, referred to as likeli-
3.4. Transferability and Likelihood                                hood evidence and tailored to our BN-based application, to
                                                                   correct the expert-defined distribution 𝒫 𝑒𝑥𝑝 towards the
     Evidence                                                      one derived by users 𝒫 𝑢𝑠𝑟 . We define the likelihood evi-
In knowledge-based modeling, but also with standard                dence (or likelihood finding) for each random symptom
supervised learning, we often face the problem of trans-           variable 𝑆𝑖 ∈ 𝒮 of our BN-DSSApple. Specifically, when a
ferring such a model on a different environment (i.e., pro-        symptom variables 𝑆𝑖 is observed and thus instantiated
viding external validity). This type of situation is referred      by a user, we assume that a certain degree of uncertainty
to as the transferability problem [12, 13]. For instance,          is associated with it (i.e., the difference of knowledge and
it might be difficult to allow a vast set of users, with           expertise between the user and the expert). We define the
different expertise level, to effectively exploit a diagnos-       actual user observation with another random variable
tic expert model, based on domain-specific knowledge.              𝑂𝑖 , such that 𝑉 𝑎𝑙(𝑂𝑖 ) = 𝑉 𝑎𝑙(𝑆𝑖 ), to distinguish it from the
In our application, the knowledge base of BN-DSSApple              variable as it should be observed by an expert 𝑆𝑖 . We
has been built with the information derived from do-               represent the uncertainty degree with a likelihood ratio
main literature and empirical knowledge of a domain                𝐿(𝑆𝑖 ), formally defined as:
expert. Nevertheless, different sets of users, with less ex-                                𝑗                           𝑗
perience in the field, might perceive the same attributes                         𝐿(𝑆𝑖 = 𝑠𝑖 ) = 𝑃(𝑂𝑖 = 𝑜𝑖𝑙 |𝑆𝑖 = 𝑠𝑖 )           (2)
(i.e., the symptoms) in a different way. In fact, the user
perception is mediated by her personal experience and              which represents the probability of a user observing value
specific knowledge biases. This mismatch invalidates the           𝑜𝑖𝑙 ∈ 𝑉 𝑎𝑙(𝑂𝑖 ) given that, in the same situation, the expert
                                                                                               𝑗
effectiveness and hence the diagnostic performance of BN-DSSApple. In this section, we formalize the
problem of transferability and propose a practical solution to bridge the gap between the expert
model and the user perception.
    In our scenario, the transferability problem is defined as the mismatch between the BN
probability distributions (CPTs) defined by the expert and the probability distributions derived
from the usage of the system. Formally, during the knowledge elicitation phase (described in
Section 3.2) the expert implicitly defined a complete set of probabilities
$\mathcal{P}^{exp} = \{P(\mathbf{S} \mid D = d_1), P(\mathbf{S} \mid D = d_2), \dots, P(\mathbf{S} \mid D = d_n)\} \subseteq \mathcal{P}$,
for each set of symptom random variables $\mathbf{S}$, given the target disease $D = d_i$. At testing
time, the users of our application produced a set of $u$ observations
$\mathcal{E} = \{(\mathbf{E}_1, d_1), (\mathbf{E}_2, d_2), \dots, (\mathbf{E}_u, d_u)\} \subseteq \mathcal{S} \times \mathcal{D}$,
where $\mathbf{E}_i = \{S_1 = s_1^o, S_2 = s_2^p, \dots, S_l = s_l^q\}$ represents the evidence
provided by a user during the $i$-th diagnosis session, as a set of instantiations of symptom
variables, and $d_i$ is the corresponding ground-truth disease. This set of user observations
defines a different set of probabilities
$\mathcal{P}^{usr} = \{P(\mathbf{S} \mid D = d_1), P(\mathbf{S} \mid D = d_2), \dots, P(\mathbf{S} \mid D = d_n)\} \subseteq \mathcal{P}$,
which is generally different from the one defined by the expert, $\mathcal{P}^{usr} \neq \mathcal{P}^{exp}$.
The problem then becomes that of finding a transferability function $T(\cdot)$ to be applied to the
expert model such that $\mathcal{P}^{usr} = T(\mathcal{P}^{exp})$.

… would have observed $s_i \in Val(S_i)$. Thus, we enrich our BN by adding, for each symptom
variable $S_i$, a virtual likelihood evidence node $O_i$ that encodes the likelihood ratio $L(S_i)$,
with $pa(O_i) = \{S_i\}$. The added set of random variables $\mathcal{O} = \{O_1, O_2, \dots, O_t\}$
is now the one observed by the user while providing the evidence $\mathbf{E}$ on the questions
asked by the application, while the random variables in $\mathcal{S}$ become hidden. We finally
need to define a new set of conditional probability tables $P(O_i \mid S_i)$ for each pair
$(S_i, O_i) \in \mathcal{S} \times \mathcal{O}$. We adopt a direct estimation of these probabilities
from the observed interactions of users with a set of apples $\mathcal{A}$ for which we know the
actual value observed by the expert. Namely, for each state $s_i^j \in Val(S_i)$ of each variable
$S_i \in \mathcal{S}$ we define a subset $\mathcal{A}_{s_i^j} \subseteq \mathcal{A}$ for which the
value of the symptom variable $S_i$ observed by the expert is $S_i = s_i^j$. Thus, the conditional
probability of the value $O_i = o_i^l$ observed by the users is defined as:

$$P(O_i = o_i^l \mid S_i = s_i^j) = \frac{1}{|\mathcal{A}_{s_i^j}|} \sum_{a_i \in \mathcal{A}_{s_i^j}} \mathbb{1}_{a_i}(o_i^l) \tag{3}$$

where $\mathbb{1}_{a_i}(o_i^l)$ is an indicator function which is equal to 1 if the user observed
$O_i = o_i^l$ in apple $a_i$, and 0 otherwise. The conditional probability defined for the
likelihood ratio is also referred to as the consensus among expert and users.
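
As an illustration of the count-based estimate in Eq. (3), the following Python sketch builds the table P(O_i | S_i) for a single symptom variable from paired expert and user observations. It is a minimal sketch under stated assumptions, not the implementation used in BN-DSSApple: the function name `estimate_likelihood_cpt`, the pair-based input format, and the toy data are hypothetical, and every user interaction with an apple is counted as one observation.

```python
from collections import Counter, defaultdict


def estimate_likelihood_cpt(observations):
    """Estimate P(O_i = o | S_i = s) for one symptom variable S_i.

    `observations` is an iterable of (expert_state, user_state) pairs, one per
    user interaction with an apple whose expert-observed state is known.
    (Hypothetical input format, assumed for this sketch.)
    """
    # counts[s][o] = number of interactions where the expert saw s and the user reported o
    counts = defaultdict(Counter)
    for expert_state, user_state in observations:
        counts[expert_state][user_state] += 1

    cpt = {}
    for expert_state, user_counts in counts.items():
        total = sum(user_counts.values())  # plays the role of |A_{s_i^j}| in Eq. (3)
        cpt[expert_state] = {
            user_state: n / total for user_state, n in user_counts.items()
        }
    return cpt


# Made-up toy data for a boolean symptom (not taken from the study).
toy_observations = [
    ("present", "present"),
    ("present", "present"),
    ("present", "absent"),
    ("absent", "absent"),
]
toy_cpt = estimate_likelihood_cpt(toy_observations)
```

Each row of the resulting table sums to one. User states that never co-occur with an expert state simply receive no entry, so a real deployment might want to add smoothing; that choice is left open here.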
4. Experiments

4.1. User Study Evaluation

We conducted a large user study to evaluate the effectiveness of BN-DSSApple in recommending the
correct diagnosis. Specifically, we divided the user study into three distinct phases to test the
system behaviour under different circumstances. The task submitted to the users involved in our
study was the same in all cases. The user received a “bucket” of infected apples, for which she had
to find the correct diagnosis leveraging BN-DSSApple. Each target apple was represented by a set of
two high-definition photos depicting an internal and an external view of the apple, and its
ground-truth disease was determined in the lab by genome sequencing. In each diagnostic round, the
user had to carefully inspect the target apple and interact with the system by providing
information (i.e., the evidence) about the symptoms and signs she was able to identify on the
apple. At the end, BN-DSSApple returned a ranked list of three suggested diagnoses, i.e., the three
diseases with the highest posterior given the available evidence, as computed by the BN. The three
phases of the presented study differed in the number of users, their expertise level, and the
number of distinct target apples involved. In detail, we performed:

      • Single Expert Study (SES): a domain expert (the one who collaborated in the construction of
        the BN) interacted with the system to diagnose 21 target apples in a time-span of around
        2 weeks.

      • Single User Study (SUS): a single user (an MSc student in Biology) interacted with the
        system during the course of an internship, lasting around 3 months, to diagnose 131 target
        apples.

      • Multiple Users Study (MUS): a group of 11 students of a Phytopathology class interacted
        with the system to diagnose a bucket of 7 target apples each. The apples were randomly
        sampled from the same set of 21 apples used for SES. The activity lasted for a total of
        4 hours.

In Table 2 we summarize the different characteristics of the three user studies performed.

             # users     expertise       # apples     time-span
   SES          1        high               21        2 weeks
   SUS          1        high-medium       131        3 months
   MUS         11        medium-low         21        4 hours

Table 2
Characteristics of the three user studies: Single Expert Study (SES), Single User Study (SUS), and
Multiple Users Study (MUS).

4.2. Results

In Table 3 we report the results of the three user studies in terms of recall@k. To formalize this
metric, consider a situation in which a set $N$ of $n$ diagnoses is performed by BN-DSSApple. The
set $N$ is composed of $n$ ranked lists of recommended diagnoses, namely
$N = \{R^k_{a_1}, R^k_{a_2}, \dots, R^k_{a_n}\}$, where $a_i$ represents the $i$-th apple processed
by the system. A generic $R^k_{a_i} = \{d^1_{a_i}, d^2_{a_i}, \dots, d^k_{a_i}\}$ is a ranked list
of $k$ suggested diagnoses $d^j_{a_i}$ for apple $a_i$ with a specific ground-truth disease
$t_{a_i}$. Thus, we formally define recall@k as:

$$\mathrm{recall@}k = \frac{1}{n} \sum_{R^k_{a_i} \in N} \mathbb{1}_{R^k_{a_i}}(t_{a_i}) \tag{4}$$

where $\mathbb{1}_{R^k_{a_i}}(t_{a_i})$ is an indicator function which is equal to 1 if
$t_{a_i} \in R^k_{a_i}$ and 0 otherwise.

                SES       SUS       MUS       ZeroR
   recall@1     .905      .489      .286      .143
   recall@2    1.000      .656      .403      .286
   recall@3    1.000      .763      .571      .429

Table 3
Recall@k for the three user studies performed, Single Expert Study (SES), Single User Study (SUS),
and Multiple Users Study (MUS). The ZeroR benchmark is also reported.

   From the results presented in Table 3 we highlight how the theoretical effectiveness of the
BN-DSSApple model is very high. Specifically, an expert user (SES), with strong knowledge in the
domain of post-harvest diseases of apples and a good capability to correctly identify symptoms on a
diseased apple, is able to reach a recall@1 above 90%. The performance of the system increases to
100% recall when evaluated at a larger cut-off of suggested diseases. Of course, we have to
consider that in the SES evaluation we are in the ideal situation where the expert user knows
exactly how to look at and evaluate the symptoms requested by BN-DSSApple. A more realistic
situation is depicted by the SUS evaluation. In this situation, a single user with a medium-high
level of expertise had months of time to interact with the system by evaluating a very large set of
apples (131). The performance of the system in terms of recall@1 is still convincing (49%), i.e.,
the correct disease is ranked first in about half of all diagnoses. The other metrics show that the
system does not scale up as well at larger cut-offs, achieving 66% recall@2 and 76% recall@3 (the
correct disease is within the first 3 recommendations in about 3/4 of the cases). Finally,
BN-DSSApple showed some limits in the situation where the users have a limited expertise
and training, and a limited amount of time (few hours) to use the system, as in the MUS evaluation.
In addition to the time and skill aspects, a lower intrinsic motivation to interact as accurately
as possible with the system could partially explain the deviation. In this case, the measured
recall of the system is significantly lower than in the two previous evaluations. In particular,
recall@1 does not reach 30%, while the best result is achieved by recall@3 with a value of 57%
(slightly more than half of the diagnoses include the correct disease in the top-3
recommendations). Nevertheless, despite the poor performance of BN-DSSApple in MUS, the collected
results are still superior to the ZeroR benchmark, namely, a classifier which always suggests the
class with the highest a priori probability. It is important to notice that the reported results
for ZeroR relate to the situation in which the class (ground-truth disease) distribution is
perfectly balanced, as for SES and MUS. In the comparison with ZeroR, the MUS evaluation of
BN-DSSApple shows double the recall@1 (28.6% against 14.3%), while recall@2 and recall@3 are closer
but still significantly better (+12 and +14 percentage points, respectively). The main cause of
this mismatch in performance between the expert and average users can be identified in the problem
of transferability of a knowledge-aware model. In the remainder of this section, we empirically
analyze and explain this phenomenon, and test possible solutions to correct and alleviate it.
   Foremost, we want to understand the impact of each expert-defined attribute in the model. In
Table 4 we report the ranked list of attributes, based on the likelihood ratio (i.e., consensus)
computed between the users of MUS and the expert of SES (whom we consider as ground truth) in
identifying the symptoms on the same set of 21 target apples. It is interesting to notice how the
users are effective in identifying the principal symptoms and signs, presented by the application
as boolean variables. Namely, Sclerotia (99%), Rot (96%), and Spot (95%) present a very high level
of agreement with the domain expert, while Mycelium_spore (81%) and Halo (78%) receive a high
consensus. Conversely, some qualitative attributes related to the appearance or the consistency of
the lesion and the rot are among the hardest to be correctly recognized by the users (i.e., they
show a poor consensus with the expert). For example, Lesion_appearance and Rot_texture_pressure
achieve a consensus below 50%, while Lesion_margin, Lesion_area, and Rot_texture_opaque are below
65%. Nevertheless, other categorical variables more related to quantitative aspects of the lesion
are easier for the users to spot. This is the case of the variables Lesion_size, Lesion_surface,
Lesion_form, and Lesion_crack, which show a consensus between 84% and 79%. Finally, it is
interesting to notice the behavior of the variables of the Lesion origin category. Most of them are
quite easy to be identified by the user, with a consensus above 90% with the expert. Nevertheless,
two of them, namely Wound and Lenticel, are equally difficult to recognize, with a consensus of
around 59%. This is probably due to the fact that the two origins might be perceived as quite
similar and could be confused without a careful inspection of the apple skin.

   rank     attribute                 consensus
     1      Sclerotia                   0.988
     2      Calyx                       0.985
     3      Rot                         0.964
     4      Spot                        0.950
     5      Stalk                       0.926
     6      Core                        0.917
     7      Spore_distribution          0.872
     8      Lesion_size                 0.837
     9      Lesion_surface              0.837
    10      Number_lesions              0.817
    11      Mycelium_spore              0.809
    12      Lesion_form                 0.792
    13      Lesion_crack                0.790
    14      Halo                        0.782
    15      Rot_shape                   0.760
    16      Rot_texture_dry             0.755
    17      Halo_colour                 0.750
    18      Rot_margin                  0.740
    19      Spore_colour                0.731
    20      Spore_origin                0.694
    21      Lesion_margin               0.636
    22      Lesion_area                 0.623
    23      Rot_texture_opaque          0.607
    24      Wound                       0.594
    25      Lenticel                    0.588
    26      Lesion_appearance           0.417
    27      Rot_texture_pressure        0.321

Table 4
Attributes ranking based on the rate of agreement (i.e., consensus) of the users of MUS with the
domain expert of SES.

   In Figure 3 we plot the recall@k achieved by BN-DSSApple for MUS and SES, by incrementally
selecting the attributes based on the consensus ranking reported in Table 4. On the x-axis, we
report the number of attributes in each model configuration. Namely, the $i$-th value represents
the BN model built with the attribute set $\mathcal{A}_i = \{a_1, a_2, \dots, a_{i-1}, a_i\}$, where
the rank $j$ of attribute $a_j$ is defined by expert consensus, as reported in Table 4. From the
graph in Figure 3a for the MUS evaluation, we immediately notice how the model achieves the best
performance for recall@1 and recall@2 with around 8-9 attributes. A larger set of attributes is
detrimental, causing a drop in recall of at least 10% in both situations. It is interesting to
notice how the performance seems to recover with the models based on 21-22 attributes, without
reaching the optimal level. In fact, for the recall@3 metric the global optimum is achieved by the
model with

Figure 3: Recall@k by incremental selection of attributes based on ranking of Table 4 for MUS (a) and SES (b).


20 attributes, with a significant improvement of around 10% over the smaller attribute set
configurations. Opposite considerations emerge from the graph in Figure 3b for the SES evaluation.
In this case, the recall@k metrics are linearly correlated with the number of attributes, and the
best performance is always achieved with the full set of attributes. This means that the expert is
able to correctly instantiate even the harder variables, by understanding the status of an infected
apple. Furthermore, these “hard-to-recognize” attributes are necessary to significantly improve the
diagnostic effectiveness of the model and reach the highest performance in terms of recall@k. For
instance, in both recall@2 and recall@3 the BN model registers around +20% improvement by
considering the full set of 27 attributes instead of just 21 attributes (i.e., by discarding the
6 “hardest” attributes, with lowest consensus).
   Finally, in Table 5 we compare the recall@k results for the MUS evaluation of the improved
versions of the BN model, in order to cope with the transferability problem discussed in
Section 3.4. Firstly, the smallest improvement is provided by the trained BN model (dubbed
TRAIN-BN), where the parameters are fine-tuned on MUS data with the Maximum Likelihood Estimation
(MLE) algorithm. The recall@1 improvement is marginal (around +2.5%), while recall@2 shows a +6.5%
with respect to the plain BN model. We already commented on the large improvements achieved by
selecting the optimal attribute set (BEST-ATTR model), where the gain in recall is between +14% and
+21%. Of course, this analysis is derived a posteriori, where the optimal number of attributes is
fixed after the evaluation. For this reason, the achievement of the model equipped with likelihood
evidence (LH-EV, methodology detailed in Section 3.4, where expert ground-truth data are derived
from SES) is even greater. For recall@1 the LH-EV outperforms TRAIN-BN by around +4%, while being
inferior to BEST-ATTR by around 8%. For recall@2, instead, the likelihood evidence achieves the
best result, outperforming also BEST-ATTR by +2.5%. Finally, for recall@3 the LH-EV model
significantly outscores TRAIN-BN (+13%), while being comparable with the results of BEST-ATTR.

                BN       TRAIN-BN     BEST-ATTR     LH-EV
   recall@1     .286       .312       .429 (8)      .351
   recall@2     .403       .468       .597 (9)      .623
   recall@3     .571       .636       .779 (20)     .766

Table 5
Recall@k for MUS when applying the plain BN-DSSApple (BN), the trained BN-DSSApple on MUS data
(TRAIN-BN), the incremental best attribute selection (BEST-ATTR), and the BN-DSSApple with
likelihood evidence (LH-EV). In the BEST-ATTR column, we report the results for the optimal
attribute set, with the number of selected attributes in parentheses.

5. Conclusions

This case study focused on knowledge elicitation and construction, and discussed the application of
likelihood evidence to enhance the performance and transferability of the knowledge-based
recommendation system BN-DSSApple. Major limitations of the presented approach concern the fact
that the knowledge base is fully based on qualitative probability elicitation from a single human
expert. Furthermore, the transferability problem of the crafted BN must be further investigated.
Further development of the method for other domains, as well as additional testing, is required.
Currently, deployment for real-life evaluation is ongoing. In future work, the integration of
additional evidence, such as microscopic images of fungal spores, will be considered.
References                                                                      in: Proceedings of the 2011 IEEE 11th Interna-
                                                                                tional Conference on Data Mining Workshops,
 [1] T. B. Sutton, H. S. Aldwinckle, A. Agnello, J. F. Wal-                     ICDMW ’11, IEEE Computer Society, USA, 2011,
     genbach (Eds.), Compendium of apple and pear dis-                          p. 540–547. URL: https://doi.org/10.1109/ICDMW.
     eases and pests, 2 ed., APS press, 2014.                                   2011.169. doi:1 0 . 1 1 0 9 / I C D M W . 2 0 1 1 . 1 6 9 .
 [2] P. Maxin, M. Williams, R. W. Weber, Control of fun-                   [13] J. Lu, V. Behbood, P. Hao, H. Zuo, S. Xue, G. Zhang,
     gal storage rots of apples by hot-water treatments:                        Transfer learning using computational intelligence:
     A northern european perspective, Erwerbs-Obstbau                           A survey, Knowl. Based Syst. 80 (2015) 14–23.
     56 (2014) 25–34.                                                      [14] A. Subbaswamy, S. Saria, Counterfactual normal-
 [3] D. Koller, N. Friedman, Probabilistic Graphical                            ization: Proactively addressing dataset shift using
     Models: Principles and Techniques, Adaptive                                causal mechanisms, in: R. Silva, A. Globerson,
     computation and machine learning, MIT Press,                               A. Globerson (Eds.), 34th Conference on Uncer-
     2009. URL: https://books.google.co.in/books?id=                            tainty in Artificial Intelligence 2018, UAI 2018, vol-
     7dzpHCHzNQ4C.                                                              ume 2, Association For Uncertainty in Artificial
 [4] U. B. Kjaerulff, A. L. Madsen, Bayesian Networks                           Intelligence (AUAI), 2018, pp. 947–957. 34th Confer-
     and Influence Diagrams: A Guide to Construction                            ence on Uncertainty in Artificial Intelligence 2018,
     and Analysis, 1st ed., Springer Publishing Company,                        UAI 2018 ; Conference date: 06-08-2018 Through
     Incorporated, 2010.                                                        10-08-2018.
 [5] A. Niederkofler, S. Baric, G. Guizzardi, G. Sotto-                    [15] J. Huang, A. J. Smola, A. Gretton, K. M. Borgwardt,
     cornola, M. Zanker, Knowledge models for diag-                             B. Scholkopf, Correcting sample selection bias by
     nosing postharvest diseases of apples, in: Proceed-                        unlabeled data, in: Proceedings of the 19th Interna-
     ings of the Joint Ontology Workshops 2019 Episode                          tional Conference on Neural Information Process-
     V: The Styrian Autumn of Ontology, Graz, Aus-                              ing Systems, NIPS’06, MIT Press, Cambridge, MA,
     tria, September 23-25, 2019, volume 2518 of CEUR                           USA, 2006, p. 601–608.
     Workshop Proceedings, CEUR-WS.org, 2019. URL:                         [16] M. Sugiyama, S. Nakajima, H. Kashima, P. v. Bü-
     http://ceur-ws.org/Vol-2518/paper-ODLS6.pdf.                               nau, M. Kawanabe, Direct importance estimation
 [6] M. Zanker, M. Jessenitschnig, W. Schmid, Prefer-                           with model selection and its application to covari-
     ence reasoning with soft constraints in constraint-                        ate shift adaptation, in: Proceedings of the 20th
     based recommender systems, Constraints 15 (2010)                           International Conference on Neural Information
     574–595.                                                                   Processing Systems, NIPS’07, Curran Associates
 [7] M. B. Messaoud, P. Leray, N. B. Amor, Sem-                                 Inc., Red Hook, NY, USA, 2007, p. 1433–1440.
     cado: A serendipitous strategy for causal discov-                     [17] A. B. Mrad, V. Delcroix, S. Piechowiak, P. Leicester,
     ery and ontology evolution., Knowl.-Based Syst.                            M. Abid, An explication of uncertain evidence in
     76 (2015) 79–95. URL: http://dblp.uni-trier.de/db/                         bayesian networks: likelihood evidence and proba-
     journals/kbs/kbs76.html#MessaoudLA15.                                      bilistic evidence - uncertain evidence in bayesian
 [8] A. M. Kalet, J. N. Doctor, J. H. Gennari, M. H.                            networks, Appl. Intell. 43 (2015) 802–824. URL:
     Phillips, Developing bayesian networks from a                              https://doi.org/10.1007/s10489-015-0678-6. doi:1 0 .
     dependency‐layered ontology: A proof‐of‐concept                            1007/s10489- 015- 0678- 6.
     in radiation oncology, Medical Physics 44 (2017)
     4350–4359. doi:1 0 . 1 0 0 2 / m p . 1 2 3 4 0 .
 [9] S. Fenz, An ontology-based approach for construct-
     ing bayesian networks, Data Knowl. Eng. 73 (2012)
     73–88. URL: http://dx.doi.org/10.1016/j.datak.2011.
     12.001. doi:1 0 . 1 0 1 6 / j . d a t a k . 2 0 1 1 . 1 2 . 0 0 1 .
[10] L. C. van der Gaag, S. Renooij, C. L. M. Witteman,
     B. M. P. Aleman, B. G. Taal, How to elicit many
     probabilities, in: Proceedings of the Fifteenth Con-
     ference on Uncertainty in Artificial Intelligence,
     UAI’99, Morgan Kaufmann Publishers Inc., San
     Francisco, CA, USA, 1999, p. 647–654.
[11] A. T. Ihler, J. W. Fischer III, A. S. Willsky, Loopy
     belief propagation: Convergence and effects of mes-
     sage errors, J. Mach. Learn. Res. 6 (2005) 905–936.
[12] J. Pearl, E. Bareinboim,                       Transportability of
     causal and statistical relations: A formal approach,