Improving Abductive Diagnosis Through Structural Features: A Meta-Approach

Roxane Koitz¹ and Franz Wotawa²

Abstract. While abductive reasoning provides an intuitive approach to diagnosis, its computational complexity remains an obstacle. Even though certain model representations are tractable, computing solutions for instances of reasonable size and complexity persists to pose a challenge. Hence, the discovery of efficient methods to derive abductive explanations presents itself as an appealing research area. In this paper, we investigate the structural properties inherent to formalizations suitable for abductive failure localization. Based on the features extracted we construct a meta-approach exploiting a machine learning classifier to predict the abductive reasoning technique yielding the "best" performance on a specific diagnosis scenario. To assess whether the proposed attributes are in fact sufficient for forecasting the appropriate abduction procedure and to evaluate the efficiency of our algorithm selection in comparison to traditional abductive reasoning approaches, we conducted an empirical experiment. The results obtained indicate that the trained model is capable of predicting the most efficient algorithm and further, we can show that the meta-approach is capable of outperforming each single abductive reasoning method investigated.

1 Introduction

Being able to accurately identify the source of an unintended system behavior is an essential objective in various application domains. Abductive reasoning appears to be a natural approach to diagnosis as it infers consistent explanations from background knowledge and observed symptoms based on the notion of entailment. Usually a set of constraints, such as minimality, is placed on whether a solution suffices as an abductive explanation or not.

While diagnosis is the most prevalent application area for abductive reasoning, abduction has been applied to a diverse set of problems such as planning [32], natural language processing [29], and image interpretation [39]. A large body of literature has investigated approaches to mechanizing abductive reasoning such as consequence finding [22], proof-tree completion [23], set covering [30], abductive model-based diagnosis [3] or abductive logic programming [15, 6].

Within this paper we concentrate on two methods, namely parsimonious set covering and abductive model-based diagnosis. The set covering theory by Peng and Reggia [31] utilizes a causal associative network recording the relations between disorders and their manifestations. In their simple model, these cause and effect sets are strictly disjoint. A diagnosis then is a set of disorders covering, i.e. explaining, the set of observed symptoms. Later the approach has been extended to incorporate probability theory, and several refinements to the basic theory have been proposed such as the improvement of models with additional knowledge or the inclusion of more complex covering relations [1].

In model-based diagnosis a formalization of the system under consideration is exploited to determine causes for anomalies [34]. While the traditional approach utilizes a representation of the correct system behavior and derives diagnoses via inconsistencies, the abductive variant operates on a model describing the way faults affect measurable system variables. Through the notion of entailment, abductive model-based diagnosis reasons about causes of a set of observations [3]. Considering a restricted problem space, simple set covering and abductive model-based diagnosis are equivalent [23].

Even though there exist certain subsets of logics where abduction is tractable, it generally is an NP-hard problem which grows exponentially in the size of the model [27]. Hence, within this paper we investigate algorithm selection as a means to efficiently compute abductive explanations in the context of diagnosis. First formalized by Rice [35], algorithm selection aims at identifying the "best performing" approach for a specific problem instance. The basic building blocks within this framework are a portfolio of algorithms to choose from, empirical performance data of the algorithms on representative problems, and a set of features which are used to get a notion of the difficulty of a problem instance [13]. On grounds of the empirical data and the feature vector, a predictor can be trained that is capable of determining the most suitable approach for a distinct sample from the problem space [17]. Machine learning has been identified as a feasible approach to use as a prediction tool. Leyton-Brown et al. [20] describe their portfolio approach to algorithm selection, where they train an empirical hardness model for each algorithm within their portfolio to forecast each approach's computation time on the instance and execute the one predicted as most efficient. Algorithm selection has been applied very successfully in the domains of SAT [38], graph coloring [26], and tree decomposition [24].

In this paper, we restrict the problem space to propositional Horn clause models which can be automatically generated from failure assessments available in practice. These analyses hold information on faults and their effects and thus are suitable knowledge sources for abductive diagnosis. The resulting logical system descriptions are characterized by certain structural properties, which we utilize as features for the algorithm selection process. We extracted these attributes for a collection of instances and evaluated several abductive diagnosis algorithms empirically on their computation time for the entire sample set. On the basis of the performance data and the features we trained a machine learning classifier to forecast the algorithm most suitable in regard to its runtime for a particular abductive diagnosis scenario.
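The selection scheme just outlined can be made concrete in a few lines. The sketch below is purely illustrative: the feature names, the runtime data, and the 1-nearest-neighbour predictor are our own assumptions, not the classifier or features used later in this paper.

```python
# Illustrative algorithm selection: pick the algorithm that performed best
# on the most similar known instance (all names and values are hypothetical).
TRAINING = [
    # ([number of hypotheses, number of effects, number of clauses], best algorithm)
    ([10, 20, 25], "berge"),
    ([200, 50, 400], "hs-dag"),
    ([40, 300, 350], "atms"),
]

def select_algorithm(features):
    """Return the best-known algorithm of the nearest training instance (1-NN)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, best = min((sq_dist(features, f), alg) for f, alg in TRAINING)
    return best

print(select_algorithm([12, 18, 30]))  # nearest to the first instance: berge
```

In the setting of this paper, the feature vector would hold the structural metrics of Section 3 and the predictor would be a classifier trained on empirical runtime data rather than a nearest-neighbour lookup.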
¹ Authors are listed in alphabetical order.
² Graz University of Technology, email: {rkoitz, wotawa}@ist.tugraz.at

We embedded the selection process within a meta-algorithm, which generates the structural metrics for a given diagnosis problem, categorizes it on the previously trained classifier and computes the diagnoses using the algorithm chosen by the predictor.

We organize our paper as follows. The next section provides a formal introduction to abductive model-based diagnosis as well as the parsimonious set covering approach. Further, we show that in their simplest form they are equivalent. Section 3 discusses the type of logical formalizations we examine and describes the structural characteristics forming our features for the algorithm selection. Subsequently, we discuss the meta-approach in more detail, and in Section 4.3 we empirically evaluate it in comparison to the algorithms in our portfolio. Lastly, we summarize the paper and provide an outlook on future work.

2 Abductive Diagnosis

Within this section we consider two abductive diagnosis approaches, namely abductive model-based diagnosis [3] operating on propositional Horn clauses and the simple set covering approach [31]. Specifically, we show their equivalence and describe the corresponding elements within each method.

2.1 Model-Based Diagnosis

As model-based diagnosis requires a formal description of the system, its abductive variant utilizes a representation of the connections between failures and their manifestations. Based on the information available, the task is to search for a set of consistent causes which together with the background theory logically entail a set of observed fault indicators. Since abduction is a hard problem, research has focused on subsets of logics which allow to compute explanations in polynomial time [27]. An important restriction on the underlying formulas is the Horn property, as for this fragment abduction is still tractable. Within the context of diagnosis, a further syntactical restriction often imposed is a definite Horn theory, since it often suffices to describe causal relations [7]. Therefore, we concentrate on propositional definite Horn descriptions and define, in a similar manner as Friedrich et al. [9], a knowledge base.

Definition 1 (Knowledge base (KB)). A knowledge base (KB) is a tuple (A, Hyp, Th) where A denotes the set of propositional variables, Hyp ⊆ A the set of hypotheses, and Th a set of Horn clause sentences over A.

The set of hypotheses Hyp comprises the propositional variables allowed to form a diagnosis, while the theory describes the relations between the variables. In this context, we further specify the propositional variables not constituting a hypothesis, i.e. A \ Hyp, as effects or symptoms. We will refer to this set of variables as Σ within this paper. Since we aim at identifying root causes for failure manifestations, an abduction problem considers a knowledge base KB as well as a set of symptoms to explain. We therefore define a Propositional Horn Clause Abduction Problem (PHCAP) as follows:

Definition 2 (Propositional Horn Clause Abduction Problem (PHCAP)). Given a knowledge base (A, Hyp, Th) and a set of observations Obs ⊆ A, the tuple (A, Hyp, Th, Obs) forms a Propositional Horn Clause Abduction Problem (PHCAP).

Definition 3 (Diagnosis; Solution of a PHCAP). Given a PHCAP (A, Hyp, Th, Obs). A set ∆ ⊆ Hyp is a solution if and only if ∆ ∪ Th |= Obs and ∆ ∪ Th ⊭ ⊥. A solution ∆ is parsimonious or minimal if and only if no set ∆′ ⊂ ∆ is a solution.

A solution to a PHCAP is an abductive diagnosis, as it provides hypotheses consistently explaining the occurrence of a set of observations. As in practice minimal solutions are preferred, we require the diagnoses to be subset minimal.

Example 1: Consider the following KB:

A = {h1, h2, h3, o1, o2, o3}, Hyp = {h1, h2, h3},
Th = {h1 → o1, h2 → o1, h2 → o2, h3 → o2, h3 → o3}

Assume we can observe o1 and o3, i.e. Obs = {o1, o3}. The solutions to the PHCAP, i.e. the minimal abductive diagnoses, are ∆1 = {h1, h3} and ∆2 = {h2, h3}.

2.2 Parsimonious Set Covering

Abduction by parsimonious set covering is based in its simplest form on an associative network encompassing the causal links between possible disorders and their manifestations [31]. A diagnosis problem is a 4-tuple P = ⟨D, M, C, M+⟩, where D is the set of disorders, M comprises the manifestations, C defines the causal connections, and M+ represents the current set of symptoms observed. The knowledge about the causal relations is defined by two sets: effects(di) and causes(mj). For each disorder di we can define effects(di) = {mj | ⟨di, mj⟩ ∈ C} as the set of manifestations caused by the disorder. Similarly, the set causes(mj) = {di | ⟨di, mj⟩ ∈ C} holds the disorders which directly trigger manifestation mj [31]. Thus, for any subset of disorders DI, we can determine the objects directly caused by it as

effects(DI) = ⋃_{di ∈ DI} effects(di)

Along similar lines, we can observe that

causes(MJ) = ⋃_{mj ∈ MJ} causes(mj)

As mentioned, within this approach abductive explanations are defined as the causes covering the observed symptoms. A set of disorders DI is said to cover a set of manifestations MJ ⊆ M if MJ ⊆ effects(DI), i.e. the former causally infers the latter. While minimality is not a necessary condition for a cover in the original definition of Peng and Reggia [31], we introduce the further requirement that the cover is subset minimal.

Definition 4 (Cover). A set DI ⊆ D is said to cover MJ ⊆ M if MJ ⊆ effects(DI) and there exists no DI′ ⊂ DI with MJ ⊆ effects(DI′).

Thus, we can define a solution to a set covering problem as a subset DI ⊆ D covering M+.

Definition 5 (Set Cover Diagnosis). Given a diagnosis problem P. A set ∆ ⊆ D is said to be a diagnosis iff ∆ covers M+.

In regard to the logic-based definitions discussed for model-based diagnosis, the disorders refer to the set Hyp in the PHCAP framework. Their manifestations constitute the effects Σ, M+ corresponds to Obs, and the network represents the domain theory. A causal relation ⟨di, mj⟩ is recorded in C whenever there is a logical implication of the form di → mj, where di ∈ Hyp and mj ∈ Σ, within the theory Th. Thus, it is apparent that the simple set covering framework is equivalent to logic-based abduction with a theory restricted to definite Horn clauses [23]. As both methods generate the explanatory causes based on the relationships between disorders and effects, they compute abductive explanations. That is, a set covering diagnosis, as defined previously, corresponds to a minimal diagnosis in the PHCAP framework.

Example 1 (cont): Considering our previous example, the diagnosis problem P can be reformalized in set covering:

D = {h1, h2, h3}, M = {o1, o2, o3}, M+ = {o1, o3},
C = {⟨h1, o1⟩, ⟨h2, o1⟩, ⟨h2, o2⟩, ⟨h3, o2⟩, ⟨h3, o3⟩}

We can obtain the set covering diagnoses by determining the disorder sets DI where effects(DI) cover M+, which are effects({h1, h3}) = {o1, o2, o3} and effects({h2, h3}) = {o1, o2, o3}. Hence, the diagnoses are ∆1 = {h1, h3} and ∆2 = {h2, h3}.

Table 1: Example 2: FMEA taken from the wind turbine domain.

  Component | Fault Mode | Effect
  Fan       | Corrosion  | P turbine
  Fan       | TMF        | T cabinet, P turbine
  IGBT      | HCF        | T cabinet, T nacelle

The set A then simply encompasses all proposition variables, while the component-fault pairs compose Hyp.

Example 2: Transforming the FMEA given in Table 1 would lead to the following KB:

Hyp = {mode(Fan, Corrosion), mode(Fan, TMF), mode(IGBT, HCF)}
A = {mode(Fan, Corrosion), T cabinet, P turbine, . . .}
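The transformation behind Example 2, in which each component-fault pair becomes a hypothesis implying each of its effects, can be sketched directly from the rows of Table 1. This is a simplified illustration of the mapping, not the paper's actual Java implementation; the row encoding is our own assumption.

```python
def fmea_to_kb(rows):
    """Map FMEA records (component, fault mode, effects) to a KB (A, Hyp, Th)."""
    hyp, th = set(), set()
    for component, fault, effects in rows:
        mode = f"mode({component},{fault})"  # component-fault pair becomes a hypothesis
        hyp.add(mode)
        for effect in effects:
            th.add((mode, effect))           # definite Horn clause: mode -> effect
    a = hyp | {effect for _, effect in th}   # A holds all propositional variables
    return a, hyp, th

# The three records of Table 1
rows = [
    ("Fan", "Corrosion", ["P turbine"]),
    ("Fan", "TMF", ["T cabinet", "P turbine"]),
    ("IGBT", "HCF", ["T cabinet", "T nacelle"]),
]
a, hyp, th = fmea_to_kb(rows)
print(len(hyp), len(th))  # 3 hypotheses and 5 clauses, as in Example 2
```

The five generated clauses correspond exactly to the theory Th of Example 2.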
The equivalence between set covering and the hitting set problem has been established [16], thus we can exploit the notion of hitting sets in order to define a diagnosis within the parsimonious set covering theory. In particular, we stated previously that a cover implies a causal dependency between disorders and manifestations and can be expressed through the effects relation. Along similar lines, causes(mj) comprises information on all disorders responsible for mj. Hence, by computing the hitting set of causes(mj) for a single manifestation, we derive a disjunction of all disorders possibly leading to mj, i.e. each disorder constitutes a diagnosis. For multiple observations, i.e. m1, m2, ..., mn ∈ M+, the hitting sets of all causes-sets of the current manifestations form the diagnoses. This is apparent since, in order to represent a solution, one disorder explaining each observation has to be present within each diagnosis. To impose the parsimony criterion, we restrict solutions to subset minimal hitting sets [30].

Definition 6 (Abductive Hitting Set Diagnosis). Given a diagnosis problem P. A set ∆ ⊆ D is said to be a minimal diagnosis iff ∆ is a minimal hitting set of S, where ∀mj ∈ M+ : causes(mj) ∈ S.

Example 1 (cont): The causes sets for the current manifestations are causes(o1) = {h1, h2} and causes(o3) = {h3}, thus causes(o1) ∈ S and causes(o3) ∈ S. The minimal hitting sets of S correspond to ∆1 = {h1, h3} and ∆2 = {h2, h3}.

3 Models

An essential issue in model-based diagnosis has been the construction of system descriptions suitable for identifying faults. Thus, numerous techniques to automatically extract models have been proposed, with a recent method taking advantage of Failure Mode Effect Analysis (FMEA) records [37]. This type of risk evaluation is becoming increasingly common and collects data on how faults on a component level influence system variables [12]. Table 1 depicts a simplified example of an FMEA where each row contains a component, a possible fault of said component and its corresponding effects. Since it captures the causal associations between defects and their consequences, it provides the information necessary for abductive reasoning.

In the straightforward mapping presented by Wotawa [37], each record of the FMEA is transformed into a set of Horn clause sentences, where the component-fault mode pair implies an effect. These formulas form the theory Th of a KB which we can utilize to compute abductive diagnoses based on a set of fault indicators.

Example 2 (cont): For the FMEA in Table 1, the mapping yields the theory

Th = { mode(Fan, Corrosion) → P turbine,
       mode(Fan, TMF) → P turbine,
       mode(Fan, TMF) → T cabinet,
       mode(IGBT, HCF) → T cabinet,
       mode(IGBT, HCF) → T nacelle }

3.1 Diagnosing the Models

As is apparent from the example, the result of the mapping is a model consisting of bijunctive definite Horn clauses. Thus, we can simply apply either model-based or set covering abduction to compute diagnoses. However, since we would like to extend our approach to different failure assessments in practice, we allow more expressive formalizations, i.e. conjunctions of causes and disjunctions of manifestations. In order to use these models within the PHCAP and simple set covering approach, we currently compile them into definite Horn clauses. It is apparent that this increases the theory's cardinality, requires reassembling the original hypotheses at the end of the diagnosis, and requires ensuring that subset minimality is still present.

Now we explain how we can utilize the logical models obtained on the basis of the FMEAs to perform abductive reasoning within the set cover approach. Given a model of this type, we can easily represent it as a hypergraph H = (V, E), where V is the set of vertices and E constitutes the set of hyperedges. The nodes of the hypergraph represent the propositional variables, while the hyperedges are determined by the theory. For each clause there exists a hyperedge containing all propositional variables of said clause, i.e. ∀a ∈ A: a ∈ V, and for each c ∈ Th the set ⋃_{l ∈ c} |l| is in E, where |·| is a function mapping literals to the underlying propositions ignoring negations, i.e. |¬p| = p and |p| = p for all p ∈ A. In Figure 1 on the right hand side a hypergraph representation of Example 1 is shown.

Following this representation we can assign a label to each vertex within a hyperedge E, such that:

label(v) = {v}                           if v ∈ Hyp
label(v) = ⋃_{x ∈ E, x ≠ v} label(x)     otherwise

In case a vertex represents a manifestation, its label corresponds to its causes-set, as it holds the hypotheses responsible for the effect. Thus, we can utilize the labels of the nodes representing the observations to compute the abductive diagnoses as hitting sets. Note that by relying on this notion we could further handle intermediate effects, which we do not discuss in more detail in this paper.

Figure 1: DAG and hypergraph representation of Example 1. The DAG shows shared hypotheses (left oval) and common effects (right oval) for pairs of nodes.

3.2 Structural Metrics

As stated, the models we are considering are bijunctive definite Horn clauses. Thus, we can easily define their structural properties based on various graph representations. In the simplest case the theory is characterized as a directed acyclic graph (DAG) with two disjoint node sets, namely the propositional variables constituting the causes and the effects, respectively. This representation is equivalent to the associative network as described by Peng and Reggia [31] in their set covering approach. Furthermore, it is apparent that by constructing an undirected graph we receive a bipartite graph.

Considering the graph representations of the model, we can extract certain characteristics of their structure which we subsequently use within the algorithm selection process. As abductive diagnosis is possibly exponential in the number of causes to consider, the cardinality of Hyp is an intuitive measure of complexity. In addition, we collect the number of effects and connections within the theory.

3.2.1 Outdegree and Indegree

Based on the DAG, we can compute for each vertex representing a hypothesis its outdegree, which specifies the number of manifestations affected by said cause. Similarly, we measure the indegree of each effect, i.e. the number of hypotheses inferring the manifestation. Considering the set covering framework we can define the degrees as follows:

outdegree(hi) = |effects(hi)|
indegree(mj) = |causes(mj)|

In regard to Example 1 we can observe outdegree(h2) = |effects(h2)| = 2 and indegree(o3) = |causes(o3)| = 1. Collected over the entire model, these measures provide an intuitive metric of the basic magnitude of the theory and the connectedness of the graph.

3.2.2 Covering and Overlap

Several disorders may cover the same effect, i.e. a manifestation can be explained by multiple causes. On the basis of this we can define a covering metric for each pair of hypotheses as the ratio between the number of common effects and the total number of symptoms induced by the hypotheses:

covering(hi, hj) = |effects(hi) ∩ effects(hj)| / |effects(hi) ∪ effects(hj)|

Figure 1 depicts on the right hand side of the DAG the shared observation o2 between h2 and h3 as a blue oval. Thus, we can see that covering(h2, h3) = 1/3.

In a similar manner, we define the overlap of two effects as their common sources in relation to all their causes:

overlap(oi, oj) = |causes(oi) ∩ causes(oj)| / |causes(oi) ∪ causes(oj)|

The overlap of o1 and o2 at h2 is shown as a red oval on the left side of the DAG in Figure 1. In turn we can compute overlap(o1, o3) = 0. Peng and Reggia [31] define a pathognomonic effect as an observation with a single cause. Thus, whenever a pathognomonic symptom is involved, we do not compute an overlap relation. By collecting these measures for any pair of hypotheses or effects, we can compute a value over the entire model.

3.2.3 Independent Diagnosis Subproblem

Whenever there exist several subproblems in our theory we refer to them as independent diagnosis subproblems. If several subproblems exist, the graphs representing the model are disconnected and each independent diagnosis subproblem itself is a connected subgraph. In case all effects are pathognomonic, each cause-effect relation represents its own independent diagnosis subproblem and thus we can observe that the model is orthogonal. Imagine the clause h2 → o2 missing from the theory of Example 1. In this case we would have two independent diagnosis subproblems, namely one including h1, h2 and o1 and the other one consisting of h3, o2 and o3. As an additional measure to the number of subproblems, we further compute the average size over all independent diagnosis subproblems in case several exist.

3.2.4 Path Length

Another measure of connectedness within the model is the minimal path length between any two nodes on the hypergraph. In particular, we measure the length of the minimal path between nodes representing hypotheses, thus we compute the minimal number of hyperedges to be traversed between each pair of hypothesis vertices. Note that for a single model there are possibly several hypergraphs, depending on the number of independent diagnosis subproblems, thus we disregard paths between nodes belonging to different subproblems. Considering the hypergraph in Figure 1, we can observe path(h1, h2) = 2.

3.2.5 Clustering Coefficient

The clustering coefficient is a known measure of node clusters within a graph. It is evident that we cannot compute a clustering coefficient from the graph representations used so far, i.e. the DAG, bipartite graph and hypergraph, due to the two disjoint node classes. Therefore, we transform the bipartite graph by projection [18]. In particular, we remove all nodes corresponding to manifestations and link two cause vertices vhi and vhk whenever they imply the same effect, i.e. effects(hi) ∩ effects(hk) ≠ ∅. Based on the resulting undirected graph featuring only the nodes corresponding to hypotheses, we compute for each node the local clustering coefficient as

c = 2n / (ki (ki − 1))

where ki is the number of neighbors of the node and n the number of edges between those neighbors. While in network analysis the projection of bipartite graphs results in coefficients differing from typical one-mode networks, this does not pose an issue in our case as we are solely interested in the models in our problem space. Thus, the clustering coefficient provides for our models another measure of covering between hypotheses.

3.2.6 Kolmogorov Complexity

A simple encoding-based measure on a graph is its Kolmogorov complexity, which defines a value equal to the length of the word necessary to encode the graph. A straightforward manner in this context is to compute the complexity based on the adjacency matrix of the undirected graph [25].

3.2.7 Observation Dependent Metrics

Based on the online and offline generated attributes we supply the feature vector φ with the measurements of the current diagnosis problem. By providing all features to the machine learning algorithm, we in turn retrieve a predicted best abduction method out of our portfolio for this specific scenario, based on the trained classifier and the instance's features. Subsequently, we can instantiate the diagnosis engine with the corresponding abduction method as well as the diagnosis problem and compute the set of abductive explanations, i.e. the ∆-Set.
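As a concrete illustration, the degree, covering and overlap metrics of Sections 3.2.1 and 3.2.2 reduce to a few set operations over the causes and effects sets. The sketch below recomputes the values quoted for Example 1; the function names are ours, not those of the paper's implementation.

```python
def outdegree(effects, h):
    return len(effects[h])

def indegree(causes, m):
    return len(causes[m])

def covering(effects, hi, hj):
    # shared effects relative to all effects induced by the two hypotheses
    return len(effects[hi] & effects[hj]) / len(effects[hi] | effects[hj])

def overlap(causes, oi, oj):
    # shared causes relative to all causes of the two effects
    return len(causes[oi] & causes[oj]) / len(causes[oi] | causes[oj])

# Example 1 model
effects = {"h1": {"o1"}, "h2": {"o1", "o2"}, "h3": {"o2", "o3"}}
causes = {"o1": {"h1", "h2"}, "o2": {"h2", "h3"}, "o3": {"h3"}}

print(outdegree(effects, "h2"))       # 2
print(indegree(causes, "o3"))         # 1
print(covering(effects, "h2", "h3"))  # 1/3
print(overlap(causes, "o1", "o3"))    # 0.0
```

Aggregating such pairwise values (maximum, average, standard deviation) over all pairs yields the model-wide features used for the classifier.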
Since not only the topology of the model is of interest, but also the In the remainder of this section we describe first our portfolio structure of the current diagnosis problem, we measure the indegree which currently includes five abductive diagnosis methods and sec- and the overlap among the elements of Obs as well as the number ond we list the metrics we have used within the meta-approach. of diagnosis subproblems involving variables of Obs, in case several exist. 4.1 Portfolio We employ a 1-of 5 portfolio, i.e. we select one approach from the 4 Meta-Approach static algorithm space containing five methods which can be utilized Algorithm selection aims at identifying the most appropriate method for abductive reasoning based on a propositional logic model. For out of a portfolio of techniques for a given problem instance in re- each technique, we give a brief description of the underlying notion. gard to its performance [35]. Performance in this context is most In particular, we utilize an Assumption-Based Truth Maintenance commonly associated with the computation time but could also re- Systems (ATMS) [4, 5] as a general abduction engine for proposi- fer to accuracy or simplicity. The model as described by Rice [35] tional Horn clauses [19]. Besides the ATMS our portfolio holds var- advocates for the use of features inherent to the problems within ious hitting set algorithms, which are capable of computing minimal the problem space in order to accurately map a new sample to its diagnoses as shown in Section 2.2. Thus, we simply compute for each most effective or efficient algorithm. This mapping is based on em- oi ∈ Obs the set causes(oi ), store them in the set S and derive the pirical performance data on representative samples of the approaches minimal hitting sets for S. The hitting set routines we included in our present in the algorithm space [17]. 
On basis of the features extracted meta-approach are the following: Binary Hitting Set Tree (BHSTree) and the execution records, a predictor can be trained which can deter- [21], HS-DAG [34, 10], HST [36], and Berge’s algorithm [2, 8]. mine aspects of the problem influencing the performance of an algo- rithm. Thereby each problem can be described by a set of attributes 4.1.1 ATMS which together with execution data allows a predictor to forecast the most valuable algorithm on an instance. Machine learning has been The ATMS operates on a graph representation of the logical theory, identified as a feasible approach to use as a prediction tool. where propositional variables are represented as nodes and the re- Generally, there are two possible objectives; either one algorithm lations within the theory determine the directed edges. By utilizing of the portfolio is to be selected based on a single predictor or for a label for each node, the ATMS determines the subset minimal set each approach within the portfolio the performance metric should be of hypotheses implying each vertex and thereby allows to directly determined as a basis of the selection. The latter requires a distinct record abductive explanations. Furthermore, by recording contradic- empirical hardness model for each method within the portfolio and tions it retains consistency. In order to generate the diagnoses for a thus whenever the algorithm for a new instance is to be selected, given PHCAP, a clause is added such that o1 ∧ o2 ∧ . . . ∧ on → obs, for each approach a prediction has to be made [13, 17]. SATzilla where o1 , o2 , . . . , on ∈ Obs and obs 6∈ A. The label of obs then [38] is an example of such a portfolio approach within the domain of contains the solution to the PHCAP. SAT solvers. For our meta-technique, however, we consider the first variant, where we train a single classifier for all abductive reasoning 4.1.2 HS-DAG methods to select a single approach for execution. 
We consider a 1-of n portfolio [38], where there are n algorithms Reiter’s [34] minimal hitting set approach exploits the structure of to choose from but only one is selected and executed to solve the di- a tree. To compute the hitting sets an initial set out of S is the ded- agnosis problem. Within the context of diagnosis our meta-approach icated root node. The tree is then iteratively extended in a breadth works the following way: As mentioned the foundation of model- first manner, where each node n is labeled by a set s ∈ S in case s based diagnosis is a description of the system to be diagnosed. Thus, is disjoint to the set of edge labels of the current path. If this is the the majority of the features can be computed offline on the diagnosis case an outgoing edge h(n) is generated for each σ ∈ s and after models present. Further, within this phase the empirical data on com- all s ∈ S have been processed each leaf represents a hitting set. To putation times of the various abductive reasoning approaches can be ensure minimality various techniques on pruning the tree have been collected and on basis of the metrics and the runtime information a developed. Some inadequacies of Reiter’s algorithm were corrected machine learning classifier is trained. Whenever the diagnosis pro- by Greiner et al. [10] and they further devised their HS-DAG as a cess is triggered by a detected anomaly, we retrieve our previously version of Reiter’s approach performed on a DAG instead of a tree. learned machine learning classifier as well as the offline determined metrics of the diagnosis model. Algorithm 1 describes the online 4.1.3 HST portion of the meta-approach, which is executed whenever new di- agnoses are to be computed. 
4.1.3 HST

The HST variant of HS-DAG operates on a tree instead of a graph and avoids the construction of unnecessary nodes as well as costly subset checks [36]. Based on an ordered list of the elements within S, the algorithm limits the outgoing edges of each node to a specific range within the ordered list. By checking the current path and the tree constructed so far, the algorithm decides whether a node has to be generated or whether the corresponding hitting set will be constructed later during the computation.

4.1.4 BHSTree

Lin and Jiang [21] propose the Binary Hitting Set Tree. First the tree is constructed by splitting the input sets on particular elements and recursively adapting the sets while building the tree. During a bottom-up traversal the hitting sets are then constructed by merging the data of the child nodes. The minimization is performed by a minimization function µ, which is not specified in detail in their paper.

4.1.5 Berge's Algorithm

This minimal hitting set algorithm, sometimes referred to as Berge's algorithm, uses an intuitive notion to compute hitting sets [8, 28]. By definition, a hitting set intersected with any set of S is not empty, i.e. each set of S has to contribute to each hitting set. In Berge's algorithm the minimal hitting sets are constructed and modified incrementally. Initially, the set of hitting sets H is empty. Whenever a new s ∈ S is considered, each η ∈ H is checked as to whether s ∩ η = ∅. In case it is not, i.e. η already hits s, η remains unchanged; otherwise it is removed from H and for each σ ∈ s a new set is created containing η and σ. By checking whether subsets are present within H, the algorithm ensures that the derived hitting sets are minimal.

4.2 Features

In Section 3 we discussed the metrics we extract from our logical problem instances. Here we simply list the properties we compute in our meta-approach.

1. Logic model specific
   • Number of hypotheses
   • Number of effects
   • Number of causal relations, i.e. clauses in the theory
   • Number of independent diagnosis subproblems
   • Average size of the independent diagnosis subproblems
2. DAG
   • Outdegree of hypothesis nodes (maximum, average, standard deviation)
   • Indegree of effect nodes (maximum, average, standard deviation)
   • Covering (maximum, average, standard deviation)
   • Overlap (maximum, average, standard deviation)
3. Undirected graph
   • Kolmogorov complexity based on the adjacency matrix
   • Local clustering coefficient (maximum, average, standard deviation)³
4. Hypergraph
   • Path length (maximum, average, standard deviation)⁴
5. Instance specific/observation dependent
   • Number of observations
   • Indegree of the current observation nodes (maximum, average, standard deviation)
   • Overlap of the current observations (maximum, average, standard deviation)
   • Number of independent diagnosis subproblems including the current observations

While Hutter et al. [14] state that the feature extraction method should be efficient, in our framework only a subset of these attributes has to be computed online, namely the instance-specific metrics. The creation of the diagnosis models, the computation of the basic model features, i.e. feature sets 1 to 4, and the training of the classifier are performed offline. Thus, online the observation-specific metrics have to be extracted, i.e. feature set 5, the algorithm appropriate for the problem instance has to be predicted, and the diagnoses have to be calculated. Online we therefore collect the current PHCAP's instance-based features, such as |Obs| or the number of independent diagnosis subproblems including the current observations (Algorithm 1).

Algorithm 1 MetAB
procedure MetAB(A, Hyp, Th, Obs)
    m ← retrieveClassifier()                      ▷ retrieve the trained model
    φ_offline ← retrieveMetrics(A, Hyp, Th)       ▷ retrieve the previously computed model metrics
    φ_online ← computeMetrics(A, Hyp, Th, Obs)    ▷ compute the instance-based features
    φ ← φ_offline ∪ φ_online
    algorithm ← predict(φ, m)                     ▷ forecast the best performing algorithm for the diagnosis problem
    ∆-Set ← diagnose(algorithm, A, Hyp, Th, Obs)  ▷ compute diagnoses based on the predicted algorithm for the PHCAP
    return ∆-Set
end procedure

4.3 Empirical Evaluation

In this section we evaluate the feasibility of our meta-approach. On the one hand, we assess the quality of the model properties to train a machine learning classifier capable of predicting the most efficient algorithm out of the portfolio in regard to its runtime for a specific PHCAP instance. On the other hand, we examine the efficiency of the meta-approach overall in comparison to the abductive reasoning algorithms in the portfolio.

The meta-approach itself is implemented in Java, as are the ATMS engine, BHSTree⁵ and Berge's algorithm. For the remaining hitting set algorithms, i.e. HS-DAG and HST, we exploit PyMBD⁶ [33], a publicly available Python library of minimal hitting set algorithms. To create a predictor based on the features we utilize the Waikato Environment for Knowledge Analysis (WEKA) library [11], which provides a vast variety of classification algorithms. The experiment was conducted on a Mac Pro (Late 2013) with a 2.7 GHz 12-Core Intel Xeon E5 processor and 64 GB of RAM running OS X 10.10.5.

³ Based on the projection of the undirected graph only containing hypothesis nodes.
⁴ Path length between hypothesis vertices.
⁵ http://www.ist.tugraz.at/modremas/index.html
⁶ http://modiaforted.ist.tugraz.at/downloads/pymbd.zip

4.3.1 Data

In order to evaluate the meta-algorithm we generated a test suite of artificial samples.
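A generator for such artificial instances might look roughly as follows — a hypothetical sketch in which all names and the exact structure are our assumptions, not the generator actually used for the test suite:

```python
import random

def generate_instance(n_hyp, n_eff, n_clauses, max_causes=3, n_obs=5, seed=None):
    """Create a random propositional abduction instance consisting of
    hypotheses, effects, causal clauses (set of causes -> effect) and
    a subset of observed effects."""
    rng = random.Random(seed)
    hypotheses = [f"H{i}" for i in range(n_hyp)]
    effects = [f"e{j}" for j in range(n_eff)]
    theory = []
    for _ in range(n_clauses):
        # a (possibly conjunctive) set of causes implying one effect
        causes = tuple(rng.sample(hypotheses, rng.randint(1, max_causes)))
        theory.append((causes, rng.choice(effects)))
    observations = rng.sample(effects, min(n_obs, n_eff))
    return hypotheses, effects, theory, observations

hyp, eff, th, obs = generate_instance(n_hyp=50, n_eff=40, n_clauses=120, seed=42)
```

Varying `n_hyp`, `n_eff` and `n_clauses` controls model size, while the sampling of causes and effects determines covering and overlap.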
Note here that while there are real-world samples based on FMEAs, we do not incorporate them in this evaluation, as the size of the practical models is rather limited. The synthetic samples differ in their number of hypotheses, effects and connections as well as in the covering and overlap between causes and manifestations, respectively. Conjunctions of causes and disjunctions of manifestations were created randomly within the samples, and it was ensured that instances comprising various independent diagnosis problems were present in the test suite. A total of 195 samples was created with a varying number of hypotheses (12 ≤ |Hyp| ≤ 3120), effects (1 ≤ |M| ≤ 5000), clauses (12 ≤ |Th| ≤ 5100) and observations (1 ≤ |Obs| ≤ 30). With each experiment run we collected the 31 metrics described in the previous section to build our feature vector⁷ for the classification. For each problem instance we executed our abductive reasoning algorithms and recorded the most efficient one. To evaluate the classification based on the features, we randomly split the test suite into a training set comprising 80% of the data and a test set holding the remaining 20% of the examples. We standardized our data before performing any classification.

⁷ Note that the feature vector itself holds 32 values, as one is the nominal value corresponding to the algorithm's name; this last feature is the one to be predicted.

4.3.2 Results

Before selecting the classification method, we performed cross-validation on several classification algorithms available in WEKA on the training data. Based on the accuracy obtained, we decided to use WEKA's implementation of a general algorithm for locally weighted learning, LWL. While within this experiment we did not compare different classifiers beyond this initial informal evaluation, we believe that examining various machine learning algorithms will be of interest in future research on this matter. LWL performs prediction by building a local model based on the weighted neighborhood data of the attribute of interest. In our case this attribute is nominal and simply corresponds to the algorithm's name. In regard to the parameters, we utilized a brute-force nearest neighbor search, included all neighbors in the linear weighting function and used an entropy-based classifier.

The classification utilizing LWL and based on the metrics reaches a satisfactory success rate of 71.79% (MAE=0.22, RMSE=0.31) correctly predicted instances on the test set, i.e. the selected algorithm was in fact the most efficient on the problem. The confusion matrix in Table 2 shows the number of correctly and wrongly classified instances. From the contingency table it is apparent that within the test set Berge's algorithm was the dominant approach; thus, our predictor classified all but one instance as Berge's algorithm. A known limitation of this type of algorithm selection, where simply a single approach is chosen and executed on the instance, is that in case the prediction is incorrect the meta-approach might be rather inefficient [17]. It is to note that in case our classifier chose a slower approach, the selected algorithm was the second fastest within the portfolio. On the test set the meta-approach was on average 1.57% slower than an optimal algorithm selection (MetAB_Opt), i.e. a predictor that classifies every instance correctly, would have been.

Table 2: Confusion matrix for the artificial test set. The rows represent the actual number of instances within the category, while the columns show the predicted outcome.

            Berge  HS-DAG  ATMS  BHSTree  HST  Total
  Berge        27       0     0        0    0     27
  HS-DAG        1       0     0        0    0      1
  ATMS          2       0     0        0    0      2
  BHSTree       8       0     0        0    0      8
  HST           0       0     0        0    1      1
  Total        38       0     0        0    1     39

Figure 2 shows the distribution of the log runtime data for the test set.

Figure 2: Underlying statistical distributions of the log runtimes [in ms] for the test set.

We explored WEKA's attribute selection in order to determine whether we could remove certain features while achieving the same prediction accuracy. Utilizing the meta-classifier with the LWL classifier, we examined various selection approaches on the training data and could diminish the set of features significantly from 32 to around four⁸. The number and composition of the reduced attribute set depends highly on the performed selection process. For example, utilizing the OneR evaluator and limiting the number of features in the reduced set to three, we receive an attribute set consisting of the number of current diagnosis problems, the number of observations and the number of effects of the entire model. Attribute selection on grounds of the information gain results in the number of current diagnosis problems, the number of observations, the average path length on the hypergraph and its standard deviation. Utilizing the SVM-based reduction, we receive the number of observations, the standard deviation of the indegree of the nodes representing the observations, the average current covering relation and its standard deviation, as well as the average of the covering relation over the entire model. As can be seen, the size of Obs plays an essential role in predicting the preferable algorithm. The remaining selected properties provide information on the PHCAP, i.e. the current observations, and various metrics on how hypotheses are connected through effects. An in-depth analysis of reduced feature sets remains a topic for further research.

⁸ The feature denoting the most efficient algorithm of course remains within the set after selection and therefore accounts for one feature of the reduced vector.
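As an illustration of such a filter-style reduction, ranking discrete features by information gain — the criterion behind one of the evaluators discussed above — can be sketched as follows. This is a simplified stand-in for WEKA's evaluators, not the code used in the experiments:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_column, labels):
    """Entropy reduction of the class labels after splitting on a
    discrete feature column."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature_column):
        subset = [l for f, l in zip(feature_column, labels) if f == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

# toy example: feature f1 perfectly separates the classes, f2 does not
labels = ["Berge", "Berge", "HS-DAG", "HS-DAG"]
f1 = ["low", "low", "high", "high"]   # information gain: 1 bit
f2 = ["low", "high", "low", "high"]   # information gain: 0 bits
ranked = sorted(["f1", "f2"],
                key=lambda name: -information_gain({"f1": f1, "f2": f2}[name], labels))
```

Ranking all 31 metrics this way and keeping the top few mirrors the reduction from 32 attributes to around four described above.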
In case of our artificial examples the meta-approach MetAB (M=0.83 ms, SD=1.54 ms) performs well, i.e. it is on average the most efficient. On average it is 99% faster than HST (M=82.78 ms, SD=455.65 ms), 94.86% more efficient than the ATMS (M=16.16 ms, SD=76.76 ms), around 85% faster than BHSTree (M=5.57 ms, SD=30.43 ms) and Berge's algorithm (M=5.9 ms, SD=32.72 ms), and still computes diagnoses around 58.27% faster than HS-DAG (M=1.99 ms, SD=4.5 ms).

The overall runtime of the meta-algorithm is determined by (1) the computation of the online metrics, (2) the time it takes to create the feature vector, supply it to the classifier and predict the best algorithm, and (3) the diagnosis time of the suggested abduction procedure. In regard to the feasibility of the meta-approach overall in comparison to other abductive diagnosis methods, we would like to refer back to a particular characteristic of model-based diagnosis, namely the availability of a system description offline. As the model has to be present before the computation of the diagnoses, it allows us to extract most of the metrics utilized in the algorithm selection offline. Thus, the online computation of the features inherent to the specific instance of the PHCAP is negligible (< 0.1 ms). The time to predict the algorithm for a single instance was insignificant for the diagnosis models we investigated. The third factor, the diagnosis time, is largely dependent on the predictive capabilities of the classifier.

A premature analysis of the results on the test data would suggest that applying Berge's method to every instance would yield the optimal runtime for most problems. However, from the cumulative log runtimes in Figure 3a we can observe that, based on the entire set of problems, i.e. test and training, Berge is not the most efficient approach, as on several instances its computation time is notably larger than that of other algorithms. On the entire sample we observe that, considering only the algorithms from the portfolio, HS-DAG is on average the best performing approach (M=5.5 ms, SD=32.31 ms), followed by Berge's algorithm (M=15.62 ms, SD=68.39 ms) and BHSTree (M=16.88 ms, SD=76.6 ms), while the ATMS (M=101.37 ms, SD=563.85 ms) still outperforms HST (M=8968.04 ms, SD=70873.65 ms). Furthermore, based on the prediction accuracy on the test set, we have created a mock meta-approach (MetAB_71.79%) with the 71.79% accuracy we experienced on the test data. Thus, for 71.79% of the samples we recorded the optimal time for this approach, and for the remaining 28.21% the second fastest time; we chose the instances with the slower runtimes randomly. This mock meta-approach would still outperform the other algorithms in the portfolio. In Figure 3b we have depicted the distribution of the log runtimes of the various approaches.

Figure 3: Runtime plots over the entire sample set. (a) Cumulative runtimes over the entire sample set (Berge, ATMS, BHSTree, HS-DAG, HST, MetAB_Opt, MetAB_71.79%). (b) Underlying statistical distributions of the log runtimes [in ms] for the entire sample set.

5 Conclusion

Even though abduction is an intuitive approach to diagnosis, its computational complexity remains a disadvantage. Since the complexity is inherent to the underlying diagnosis problem, we investigate algorithm selection as a method to predict the most efficient abduction approach for a particular instance. An essential part of algorithm selection is the exploration of characteristics of the problems which contribute to the computational effort. Hence, we consider various metrics characterizing the type of models generated from failure assessments available in practice. An advantage of this approach in the context of abductive diagnosis is that the majority of these features can be collected offline. Based on the attributes we form a feature vector for a machine learning classifier as part of a meta-approach. The empirical evaluation showed that the extracted properties of the instances allow determining the "best" abduction method. Even in cases where the classification is incorrect, the approach selects the second most efficient algorithm and thus overall outperforms the other diagnosis methods. Therefore, we believe that this meta-approach is a feasible alternative to continuously using a single abduction procedure. Despite the satisfactory classification results, we plan to further extend the set of problem metrics, explore the capabilities of various machine learning classifiers, and determine different attribute combinations yielding a more accurate classification result.

ACKNOWLEDGEMENTS

The work presented in this paper has been supported by the FFG project Applied Model Based Reasoning (AMOR) under grant 842407. We would further like to express our gratitude to our industrial partner, Uptime Engineering GmbH.

REFERENCES

[1] Joachim Baumeister and Dietmar Seipel, 'Diagnostic reasoning with multilevel set-covering models', Technical report, DTIC Document, (2002).
[2] Claude Berge, Hypergraphs, volume 45 of North-Holland Mathematical Library, (1989).
[3] Luca Console, Daniele Theseider Dupre, and Pietro Torasso, 'On the Relationship Between Abduction and Deduction', Journal of Logic and Computation, 1(5), 661–690, (1991).
[4] Johan de Kleer, 'An assumption-based TMS', Artificial Intelligence, 28, 127–162, (1986).
[5] Johan de Kleer, 'Problem solving with the ATMS', Artificial Intelligence, 28(2), 197–224, (1986).
[6] Marc Denecker and Antonis Kakas, 'Abduction in logic programming', in Computational Logic: Logic Programming and Beyond, 402–436, Springer, (2002).
[7] Thomas Eiter and Georg Gottlob, 'The complexity of logic-based abduction', Journal of the ACM (JACM), 42(1), 3–42, (1995).
[8] Thomas Eiter, Kazuhisa Makino, and Georg Gottlob, 'Computational aspects of monotone dualization: A brief survey', Discrete Applied Mathematics, 156(11), 2035–2049, (2008).
[9] Gerhard Friedrich, Georg Gottlob, and Wolfgang Nejdl, 'Hypothesis classification, abductive diagnosis and therapy', in Expert Systems in Engineering Principles and Applications, 69–78, Springer, (1990).
[10] Russell Greiner, Barbara A Smith, and Ralph W Wilkerson, 'A correction to the algorithm in Reiter's theory of diagnosis', Artificial Intelligence, 41(1), 79–88, (1989).
[11] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H Witten, 'The WEKA data mining software: an update', ACM SIGKDD Explorations Newsletter, 11(1), 10–18, (2009).
[12] Peter G. Hawkins and Davis J. Woollons, 'Failure modes and effects analysis of complex engineering systems using functional models', Artificial Intelligence in Engineering, 12, 375–397, (1998).
[13] Frank Hutter, Youssef Hamadi, Holger H Hoos, and Kevin Leyton-Brown, 'Performance prediction and automated tuning of randomized and parametric algorithms', in Principles and Practice of Constraint Programming – CP 2006, 213–228, Springer, (2006).
[14] Frank Hutter, Lin Xu, Holger H Hoos, and Kevin Leyton-Brown, 'Algorithm runtime prediction: Methods & evaluation', Artificial Intelligence, 206, 79–111, (2014).
[15] Antonis C. Kakas, Robert A. Kowalski, and Francesca Toni, 'Abductive logic programming', Journal of Logic and Computation, 2(6), 719–770, (1992).
[16] Richard M Karp, Reducibility among combinatorial problems, Springer, (1972).
[17] Lars Kotthoff, 'Algorithm selection for combinatorial search problems: A survey', arXiv preprint arXiv:1210.7959, (2012).
[18] Matthieu Latapy, Clémence Magnien, and Nathalie Del Vecchio, 'Basic notions for the analysis of large two-mode networks', Social Networks, 30(1), 31–48, (2008).
[19] Hector J Levesque, 'A knowledge-level account of abduction', in IJCAI, pp. 1061–1067, (1989).
[20] Kevin Leyton-Brown, Eugene Nudelman, Galen Andrew, Jim McFadden, and Yoav Shoham, 'A portfolio approach to algorithm selection', in IJCAI, volume 1543, p. 2003, (2003).
[21] Li Lin and Yunfei Jiang, 'The computation of hitting sets: Review and new algorithms', Information Processing Letters, 86(4), 177–184, (2003).
[22] Pierre Marquis, 'Consequence finding algorithms', in Handbook of Defeasible Reasoning and Uncertainty Management Systems, 41–145, Springer, (2000).
[23] Sheila A McIlraith, 'Logic-based abductive inference', Knowledge Systems Laboratory, Technical Report KSL-98-19, (1998).
[24] Michael Morak, Nysret Musliu, Reinhard Pichler, Stefan Rümmele, and Stefan Woltran, 'Evaluating tree-decomposition based algorithms for answer set programming', in Learning and Intelligent Optimization, 130–144, Springer, (2012).
[25] Abbe Mowshowitz and Matthias Dehmer, 'Entropy and the complexity of graphs revisited', Entropy, 14(3), 559–570, (2012).
[26] Nysret Musliu and Martin Schwengerer, 'Algorithm selection for the graph coloring problem', in Learning and Intelligent Optimization, 389–403, Springer, (2013).
[27] Gustav Nordh and Bruno Zanuttini, 'What makes propositional abduction tractable', Artificial Intelligence, 172(10), 1245–1284, (2008).
[28] Mattias Nyberg, 'A generalized minimal hitting-set algorithm to handle diagnosis with behavioral modes', IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 41(1), 137–148, (2011).
[29] Ekaterina Ovchinnikova, Niloofar Montazeri, Theodore Alexandrov, Jerry R Hobbs, Michael C McCord, and Rutu Mulkar-Mehta, 'Abductive reasoning with a large knowledge base for discourse processing', in Computing Meaning, 107–127, Springer, (2014).
[30] Yun Peng and James A Reggia, 'Plausibility of diagnostic hypotheses: The nature of simplicity', in AAAI, volume 86, pp. 140–145, (1986).
[31] Yun Peng and James A Reggia, Abductive inference models for diagnostic problem-solving, Springer, (1990).
[32] David Poole and Keiji Kanazawa, 'A decision-theoretic abductive basis for planning', in AAAI Spring Symposium on Decision-Theoretic Planning, (1994).
[33] Thomas Quaritsch and Ingo Pill, 'PyMBD: A library of MBD algorithms and a light-weight evaluation platform', Proceedings of DX-2014, (2014).
[34] Raymond Reiter, 'A theory of diagnosis from first principles', Artificial Intelligence, 32(1), 57–95, (1987).
[35] John R Rice, 'The algorithm selection problem', (1975).
[36] Franz Wotawa, 'A variant of Reiter's hitting-set algorithm', Information Processing Letters, 79(1), 45–51, (2001).
[37] Franz Wotawa, 'Failure mode and effect analysis for abductive diagnosis', in Proceedings of the International Workshop on Defeasible and Ampliative Reasoning (DARe-14), volume 1212, CEUR Workshop Proceedings, ISSN 1613-0073, (2014). http://ceur-ws.org/Vol-1212/
[38] Lin Xu, Frank Hutter, Holger H Hoos, and Kevin Leyton-Brown, 'SATzilla: portfolio-based algorithm selection for SAT', Journal of Artificial Intelligence Research, 565–606, (2008).
[39] Yifan Yang, Jamal Atif, and Isabelle Bloch, 'Abductive reasoning using tableau methods for high-level image interpretation', in KI 2015: Advances in Artificial Intelligence, 356–365, Springer, (2015).