Computing Weighted Solutions in ASP: Representation-Based Method vs. Search-Based Method Duygu Çakmak, Halit Erdoğan, and Esra Erdem Faculty of Engineering and Natural Sciences, Sabancı University, Istanbul, Turkey Abstract For some problems with many solutions, like planning and phylogeny reconstruction, one way to compute more desirable solutions is to assign weights to solutions, and then pick the ones whose weights are over (resp. below) a threshold. This paper studies computing weighted solutions to such problems in Answer Set Programming. We investigate two sorts of methods for computing weighted solutions: one suggests modifying the representa- tion of the problem and the other suggests modifying the search procedure of the answer set solver. We show the applicability and the effectiveness of these methods in reconstructing weighted phylogenies for Indo-European languages. We also compare these methods in terms of computational effi- ciency; the experimental results show that the search-based method is better. 1 Introduction In Answer Set Programming (ASP) [10], a computational problem is described as an ASP program whose answer sets correspond to solutions, and answer sets for this program are computed using answer set solvers. Some problems, like planning and phylogeny reconstruction, have many solutions. Moreover, the correspondence between the answer sets and the solutions may not be one-to-one; there may be many answer sets that denote the same solution. For such problems, one way to compute more desirable solutions is to assign weights to solutions, and then pick the distinct solutions whose weights are over (resp. below) a threshold. For ex- ample, in a planning problem, we can define the weight of a plan in terms of the costs of actions (or action sequences), and then compute the distinct plans whose weights are less than a given value. In puzzle generation, we can define the weight of a puzzle instance by means of some difficulty measure, and then generate diffi- cult puzzles whose weights are over a given value. Motivated by such applications, we study the problem of computing weighted solutions in ASP and show the appli- cability of our approach in phylogeny reconstruction (i.e., computing leaf-labeled trees, called phylogenies, to model the evolutionary history of a set of species). We study two sorts of methods for computing weighted solutions and compare them experimentally: Representation-based methods The idea is to modify the representation of the problem, to compute weighted solutions. In some cases, we do not have to define the weight of a solution explicitly; we can use aggregates (e.g., Proceedings of the 17th International RCRA workshop (RCRA 2010): Experimental Evaluation of Algorithms for Solving Problems with Combinatorial Explosion Bologna, Italy, June 10–11, 2010 CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings sum, count, times) to compute the weight, in the sense of [14, 8, 15]. Some other problems require an explicit ASP definition of the weight of a solution. Phylogeny reconstruction problems we consider are in the latter group: the weight of a phylogeny does not only depend on the weights of some parts of the phylogeny but also some domain-specific information. Search-based methods These methods do not modify the ASP representation of the problem, but define the weight function externally (e.g., as a C++ pro- gram) and modify the search algorithm of the answer set solver to compute solutions over (resp. below) a given threshold. There is no answer set solver that can compute weighted solutions in such a way. In our studies, we con- sider the solver C LASP [9], and modify its search algorithm to implement the suggested method. We apply these methods to phylogeny reconstruction. Reconstructing phylo- genies for a given set of taxonomic units is important for various research such as historical linguistics, zoology, anthropology, archeology, etc.. For example, a phylogeny of languages may help scientists to better understand human migra- tions [16]. For a given set of taxonomic units, some existing phylogenetic systems, like that of [1], generate more than one phylogeny that explains the evolutionary relationships between the given taxonomic units. In such cases, to pick the most plausible ones, an expert needs to go over these phylogenies manually and in detail. There is no phylogenetic system that can help experts to order phylogenies with re- spect to a desirability/plausibility measure that includes also some domain-specific information. In this study, we define the weight of a phylogeny for the family of Indo-European languages studied in [1], in such a way as to reflect the plausibility criterion provided by the historical linguistics. Using this weight function, we show the applicability and effectiveness of the methods above (for computing weighted solutions) in reconstructing plausible phylogenies for Indo-European languages, and compare them in terms of computational efficiency. To summarize, the main contributions of this paper are as follows: • We precisely describe the decision/optimization problems for computing weighted solutions in ASP (Section 2). • We introduce a search-based method for computing weighted solutions, and implement it by modifying the search algorithm of the answer set solver C LASP (Section 3); the ASP representation of the problem is not modified. The new algorithm is called C LASP -W. In this method, the weight of a so- lution is defined in C++ in a separate file; this allows us to use C LASP -W to compute weighted solutions to various problems. • We also describe a representation-based method for computing weighted so- lutions, where the weight of a solution is defined as an ASP program (Sec- tion 3). With this method, we can use an existing ASP solver as is. 2 • We compare the representation-based method and the search-based method in terms of computational efficiency (e.g., computation time and memory consumed) (Section 7), for reconstructing weighted phylogenies for Indo- European languages [1] (Section 5). Experimental results show that the search-based method is better than the representation-based method. • Extending the applicability of the methods of [6] for computing similar (diverse) solutions, we describe the problem of computing diverse/similar weighted solutions in ASP, introduce a search-based method to compute these solutions, and implement it by modifying the search algorithm of C LASP (Section 6). The new algorithm is called C LASP -NKW. • While evaluating the methods above experimentally in reconstructing phy- logenies for Indo-European languages, we also illustrate their effectiveness in reconstructing plausible phylogenies (Section 5). For that, we introduce a weight function that takes into account the compatibility criterion for recon- structing phylogenies as well as some domain-specific information provided by the historical linguists (Section 4). The experimental results illustrate the usefulness of our methods for phylogenetics. This paper extends [3] by a detailed discussion on the representation-based method and the search-based method for computing weighted solutions as well as a discussion on computing similar/diverse weighted solutions. It also extends the discussion of experimental results. 2 Weighted Solutions We are interested in the following sorts of computational problems for computing weighted solutions: AT LEAST (resp. AT MOST ) w- WEIGHTED SOLUTION: Given an ASP pro- gram P that formulates a computational problem P , a weight measure ω that maps a solution for P to a nonnegative integer, and a nonnegative inte- ger w, decide whether a solution S exists for P such that ω(S) ≥ w (resp. ω(S) ≤ w). For instance, suppose that P describes the phylogeny reconstruction problem for Indo-European languages, and that ω describes the total weight of the charac- ters compatible with the to-be-constructed phylogeny and takes into account some domain-specific information. Then finding phylogenies whose weights are at least 45 is an instance of the problem above. Suppose that the ASP program P is a propositional normal logic program and deciding ω(S) ≤ w (resp. ω(S) ≥ w) for a given w is in NP. Then, we can conclude that Theorem 1. AT LEAST (resp. AT MOST ) w- WEIGHTED SOLUTION is NP-complete. 3 w Compute.lp (P): (Computes a solution to the problem) Weight.lp (W): (Computes the weight of a solution) Constraint.lp (C): (Eliminates the solutions whose weight is above/below w) w – weighted solutions Figure 1: Representation-based method: computing at most/least w-weighted so- lutions. Membership follows from the fact that we can not only guess a candidate solution S but also a witness for ω(S) ≤ w (resp. ω(S) ≥ w), and check in polynomial time whether ω(S) ≤ w (resp. ω(S) ≥ w). For hardness, we can reduce the answer-set existence for normal propositional programs, which is an NP-complete problem, to this problem. 3 Computing Weighted Solutions We study two sorts of methods, representation-based and search-based, for com- puting at least/most w-weighted solutions of a given problem P , given an ASP program P, a weight measure ω that maps a solution to a nonnegative integer, and a nonnegative integer w. The idea behind the representation-based methods is to modify the represen- tation of the problem, to compute weighted solutions. In some cases, we do not have to define the weight of a solution explicitly; we can use aggregates (e.g., sum, count, times) to compute the weight, in the sense of [14, 8, 15]. Some other prob- lems require an explicit definition of the weight of a solution. Phylogeny recon- struction problems we consider are in the latter group: the weight of a phylogeny does not only depend on the weights of some parts of the phylogeny but also some domain-specific information. Therefore, we consider representation-based meth- ods (as outlined in Figure 1) that formulate the weight function ω as an ASP pro- gram W, the constraints on the weights of the solutions also as an ASP program C, and then compute at least (resp. most) w-weighted solutions by computing answer sets for the ASP program P ∪ W ∪ C. Search-based methods (as outlined in Figure 2) on the other hand do not modify the ASP representation of the problem, but define the weight function externally (e.g., as a C++ program) and modify the search algorithm of the answer set solver to compute solutions over (resp. below) a given threshold. There is no answer set solver that can compute weighted solutions in such a way. In our studies, we consider the solver C LASP [9], and modify its search algo- rithm to implement the search-based method above. Before describing these mod- 4 w Compute.lp (P): (Computes a solution to the problem) *The weight function is integrated into the solver. w – weighted solutions Figure 2: Search-based method: computing at most/least w-weighted solutions. Algorithm 1 CLASP Input: An ASP program Π Output: An answer set A for Π A ← ∅ // current assignment of literals 5 ← ∅ // set of conflicts while No Answer Set Found do // propagate according to the current assignment and conflicts; update the current assignment PROPAGATION (Π, A, 5) if There is a conflict in the current assignment then RESOLVE - CONFLICT (Π, A, 5) // learn and update the conflict set and do backtracking else if Current assignment does not yield an answer set then SELECT (Π, A, 5) // select a literal to continue search else return A end if end if end while ifications, let us describe first C LASP’s algorithm. C LASP does a conflict-driven DPLL-like [4] branch and bound search to find an answer set for the program: at each level, it does propagation followed by backtracking or selection of new lit- erals according to the current conflicts. A rough working principle of C LASP is shown in Algorithm 1. As can be seen, C LASP goes through three main steps to find an answer set. In the PROPAGATION step, it decides the literals that have to be included in the answer set due to the current assignment and conflicts. In the RESOLVE - CONFLICT step, it tries to resolve the conflicts encountered in the previ- ous step. If there is a conflict, then C LASP learns it and does backtracking to an appropriate level. If there are no conflicts and the currently selected literals do not represent an answer set, in SELECT, C LASP selects a new literal (based on some heuristics) to continue search. We modify the C LASP’s algorithm (Algorithm 1) to compute at least (resp. at most) w-weighted solutions, as shown in Algorithm 2; the parts in red denote these modifications. The new algorithm is called C LASP -W. Note that compared to C LASP, C LASP -W has an additional procedure called WEIGHT- ANALYZE; there- fore, in order to use C LASP -W, we need to implement the WEIGHT- ANALYZE 5 Algorithm 2 CLASP-W Input: An ASP program Π and a nonnegative integer w Output: An answer set for Π, that describes an at least (resp. at most) w-weighted solution A ← ∅ // current assignment of literals 5 ← ∅ // set of conflicts while A does not represent an answer set do // propagate according to the current assignment and conflicts;update the current assignment PROPAGATION (Π, A, 5) // compute an upper (resp. lower) bound for the weight of a solution that contains A weight ← WEIGHT- ANALYZE(A) // if the upper bound weight is less than the desired weight value w // then no need to continue search to find an at least w-weighted solution if There is a conflict in propagation OR weight < w then RESOLVE - CONFLICT (Π, A, 5) // learn and update the conflict set and do backtracking end if if Current assignment does not yield an answer set then SELECT(Π, A, 5) // select a literal to continue search else return A end if end while return false function according to the weight measure of the specific domain. The WEIGHT- ANALYZE function is called at each step of the search; therefore, it should be capable of identifying the partial solution formed by the currently selected literals, and measuring the weight of that partial solution. Since a partial solution may extend to many complete solutions, the WEIGHT- ANALYZE function computes instead an upper bound (resp. a lower bound) for the weight of a solution that extends the current partial solution. Computing an exact upper bound (resp. a lower bound) might be hard and inefficient; therefore, one may be interested in implementing a heuristic function that computes an approximate upper bound (resp. lower bound) for a solution. To guarantee to find a complete solution, the heuristic function shall be admissible. In other words, the upper bound (resp. lower bound) computed by the heuristic function shall be greater (resp. less) than or equal to the exact upper bound (resp. lower bound). If this is not the case, then we have a risk of missing a solution. Once we define the WEIGHT- ANALYZE function to estimate the weight of a solution, we check whether the estimated weight is less (resp. greater) than the given weight threshold w. If the upper bound (resp. the lower bound) computed by the heuristic function is already less (resp. greater) than the given weight threshold w, then there is no solution that can be characterized by the current assignment of literals and that has a weight greater (resp. smaller) than w. Therefore; we set the current assignment of literals as conflict in that case. After setting an assignment as conflict, C LASP learns that assignment and does backtracking and never selects those assignment in the further stages of the search. 6 4 Weighted Phylogeny Reconstruction Problem The evolutionary relations between species (or “taxonomic units”) based on their shared traits can be modeled as a phylogeny—a tree whose leaves represent the species, internal vertices represent their ancestors and edges in between represent the relationships between them. The problem of phylogeny reconstruction asks for “plausible” phylogenies for a given set of taxonomic units. The plausibility of phylogenies can be evaluated with respect to domain-specific information (e.g., biological evidence, archeological evidence, historical linguistics) provided by the experts. There have been various studies to compute plausible phylogenies (cf. [1]). We consider a character-based cladistics approach with respect to the compatibility criterion, as in [1]. In this approach, we describe each taxonomic unit with a set of “characters”—traits that every taxonomic unit can instantiate in a variety of ways. The taxonomic units that instantiate the character in the same way are assigned the same “state” of that character. Once we describe the state of every character for every taxonomic unit, we can reconstruct phylogenies using the “compatibility” criterion: our goal is to find a phylogeny with a small number of incompatible characters. The problem of reconstructing a phylogeny with at most k incompatible characters (let us call this problem as k-CP) is NP-hard [5]. While reconstructing phylogenies, some characters may give more information than the others. For instance, to model the evolutionary history of a family of lan- guages, morphological/phonological characters are more informative than lexical characters. In order to emphasize the role of such characters in reconstructing a phylogeny, we define the concept of a weighted phylogeny. Before we define weighted phylogenies, let us define a phylogeny. A phylogeny for a set of taxonomic units is a finite rooted binary tree hV, Ei along with two finite sets I and S and a function f from L x I to S, where L is the set of leaves of the tree. The set L represents the given taxonomic units, whereas the set V describes their ancestral units and the set E describes the genetic relationships between them. The elements of I represent, intuitively, qualitative characters, and elements of S are possible states of these characters. The function f “labels” every leaf v by mapping every index i to the state f (v, i) of the corresponding character in that taxonomic unit. A weighted phylogeny is a phylogeny (V, E, L, I, S, f ) along with a weight function Φ that maps every character i ∈ I to a nonnegative integer. The weight of a phylogeny (V, E, L, I, S, f ) can be defined in various ways with respect to Φ; in the following, we consider the weight of a phylogeny as the sum of the weights of all characters that are compatible with that phylogeny. We are interested in computing weighted phylogenies for a given set of taxo- nomic units, and thus the following decision problem: AT LEAST w- WEIGHTED k- COMPATIBLE PHYLOGENIES (wk-WCP): Given three sets L, I, S, a function f from L × I to S, a function Φ from I to non- 7 negative integers, and two nonnegative integers w and k, decide the existence of a phylogeny (V, E, L, I, S, f ) with at most k incompatible characters and whose weight is at least w. Note that wk-WCP is NP hard: we can reduce k-CP to wk-WCP by taking Φ(i) = 1 for every i ∈ I. 5 Computing Weighted Phylogenies for Indo-European Lan- guages We can compute weighted phylogenies for the family of Indo-European languages described in [1], using the representation-based method or the search-based method described in Section 3. Before we show the applicability of these methods, let us define the weight Φ of a character for languages, and the weight of a phylogeny (V, E, L, I, S, f ) for Indo-European languages. 5.1 Weight of a Phylogeny for Indo-European As discussed in [1], out of all given characters and character states, only informa- tive characters1 and their essential states2 play a role in reconstructing phylogenies. Therefore, we consider informative characters and their essential states only while reconstructing phylogenies for Indo-European languages. We have observed that some informative characters have less number of essential states whereas some have many; the ones with many essential states give more information as to how the languages are related to each other. Based on this domain-specific informa- tion, we define the weight Φ of an informative character i ∈ I with respect to a phylogeny (V, E, L, I, S, f ), as the number of languages that are mapped to an essential state for that character: ΦL,I,S,f (i) = |{l ∈ L : f (i, l, s) = s, i is informative, s is essential}| (1) In the following, we drop the subscript from ΦL,I,S,f when L, I, S, f are clear from the context. We can define the weight of a phylogeny (V, E, L, I, S, f ) for Indo-European languages simply as the sum of the weights of compatible characters with that phylogeny: X weight(V, E, L, I, S, f ) = Φ(i) (2) i∈I,i is informative To get more plausible phylogenies, we also incorporate further domain-specific in- formation. It is told us by historical linguist Don Ringe that it is least likely that Greco-Armenian (GA) languages are siblings with Balto-Slavic (BS) languages. 1 A character is informative if it has at least 2 essential states. 2 For a phylogeny (V, E, L, I, S, f ), a state s ∈ S is essential with respect to a character j ∈ I if there exist two different leaves l1 and l2 in L such that f (l1 , j) = f (l2 , j) = s. 8 Similarly, but not as least likely as GA and BS, is the grouping of GA with Ger- manic (GE) languages. If the to-be-reconstructed phylogenies have such odd group- ings of languages, we reduce some amount from the total weight of the phylogeny making sure that the weight of a phylogeny is not negative. 5.2 The Representation-Based Method The representation-based method (Section 3, Figure 1) suggests describing the weight of a phylogeny as an ASP program W , the weight constraints as an ASP program C, and then compute weighted phylogenies by means of computing an- swer sets for the program P ∪W ∪C where P is a program that describes phylogeny reconstruction. We describe ASP programs in the language of L PARSE [13]. We take P as the phylogeny program of [1]. We describe the weight of a phylogeny (defined in the previous section) as an ASP program W in four parts, as follows. Suppose that PW, W denote phylogeny weights, IC denotes an incompatible character, CW denotes a character weight, and C denotes a character. First we describe the weight CW of an informative character IC as in (1): weightOfChar(IC,CW) :- CW{leaf(V):f(V,IC,S):essential_state(IC,S)}CW. In the second part, first we define the sum of the weights of characters compat- ible with the to-be-reconstructed phylogeny, as in (2): totalWeightOfChars(PW) :- addCharWeights(PW,C), maxChar(C). addCharWeights(PW,1) :- compatible(1), weightOfChar(1,PW). addCharWeights(0,1) :- not compatible(1). addCharWeights(PW+CW,C) :- compatible(C), weightOfChar(C,CW), addCharWeights(PW,C-1). addCharWeights(PW,C) :- not compatible(C), addCharWeights(PW,C-1). Observe in the third and the fifth rules that the incompatible characters do not contribute to the weight of a phylogeny. In the third part, we take into account the domain-specific information dis- cussed in the previous section. For instance, it is least likely that Greco-Armenian (GA) languages be siblings with Balto-Slavic (BS) languages. Therefore, we de- crease some amount, say 30, from the weight of a phylogeny if it groups GA and BS as siblings. For Indo-European, historical linguist Don Ringe identified 4 odd groupings. We define the sum W of deductions due to all these odd groupings: totalDeductions(W) :- addDeductions(W,4). addDeductions(W,1) :- oddGroup(1,W). addDeductions(W+PW,G+1) :- oddGroup(G+1,PW), addDeductions(W,G). Finally, in the fourth part, we define the weight of a phylogeny by subtracting the total deductions of odd groupings from the total weights of characters: 9 weightOfPhylogeny(PW-W) :- totalWeightOfChars(PW), totalDeductions(W). We describe the weight constraints, to ensure that the weight of the phylogeny is larger than w, as another ASP program C as follows: :- weightOfPhylogeny(W), W k) then RESOLVE - CONFLICT (Π, A, 5) end if if Current assignment does not yield an answer set then SELECT(Π, A, 5) else return X ← X ∪ A end if end while return X We introduce a search-based method for computing n k-similar (resp. k-diverse) at least (resp. at most) w-weighted solutions in ASP, and implement this method by modifying the search algorithm C LASP -W as in Algorithm 3; the parts in blue denote these modifications. The modified version is called C LASP -NKW. Note that AT LEAST (resp. AT MOST ) w- WEIGHTED SOLUTION can be solved using C LASP -NK of [6] by defining the distance function in terms of the given weight function ω. However, in general, we cannot compute n k-similar (resp. diverse) at least (resp. most) w-weighted solutions in this way. To be able to solve all four variations of n k- SIMILAR (resp. k- DIVERSE) AT LEAST (resp. AT MOST ) w- WEIGHTED SOLUTIONS using C LASP -NK, we need to define four dif- ferent distance functions (rather than a generic distance function) in terms of the given distance function ∆ and the given weight function ω. 7 Experimental Results We applied the computational methods described above (i.e., the representation- based method, and the search-based method) to reconstruct weighted phylogenies for Indo-European languages. We used the dataset assembled by Don Ringe and Ann Taylor [12]. While computing phylogenies, we also took into account some domain-specific information. Let us first consider computing an at least w-weighted phylogeny with at most c incompatible characters (Table 1). Based on the results of [1], we consider c = 16, 17, 18 and w = 45. For each problem, for each method, we present the computation time, the size of the ground program, and the size of the mem- 11 ory used in computation.3 For instance, let us consider computing a phylogeny with at most 17 incompatible characters, and whose weight is at least 45. With the representation-method, C LASP takes 15.32 CPU sec.s to compute such a phy- logeny; the ground program has 79229 atoms and 1585419 rules; the computation of the phylogeny consumes 369 MB of memory. On the other hand, with the search-based method, C LASP -W takes 1.30 CPU sec.s to compute such a phy- logeny; the ground program has 3744 atoms and 55219 rules; the computation of the phylogeny consumes 22 MB of memory. Observe in Table 1 that in terms of both computation time and the memory used, the search-based method performs better than the representation-method. These results conform with our expectations. The representation-based method explicitly defines the weight function, and thus the program/memory size is larger. The search-based method deals with the time consuming computation of weights of phylogenies, not at the representation level but at the search level: It does not require an ASP representation of the weight function but requires a modification of the solver to guide it find a plausible phylogeny. Therefore, in the search-based methods, the program/memory size is smaller as well as the computation time. Now let us examine the results of experiments, while computing all at least w- weighted phylogenies with at most c incompatible characters (Table 2): In terms of both computation time and the memory used, the search-based method performs better than the representation-method. For instance, let us consider the problem of computing all phylogenies with at most 17 incompatible characters, and whose weight is at least 45. With the representation-method, C LASP takes 126 CPU sec.s to compute all 8 phylogenies; whereas, with the search-based method, C LASP -W takes 17 CPU sec.s to compute them. In [1], after computing all 45 phylogenies, the authors examine them manually and identify 34 of them as plausible (1 with 16 incompatible characters, 7 with 17 incompatible characters, 6 with 18 incompatible characters, 3 with 19 incom- patible characters, 17 with 20 incompatible characters). With the weight measure defined in Section 5, and the representation/search-based methods described above for computing weighted phylogenies, we could automatically compute all plausible phylogenies with at most 18 incompatible characters. We also applied our search-based method for computing similar/diverse phy- logenies for Indo-European languages, using C LASP -NKW. We considered the nodal distance (as in [6]) to measure the similarity/diversity of a set of phyloge- nies. For instance, we computed 2 12-similar and at least 45-weighted phylogenies using C LASP -NKW; the computation took 2.3 CPU sec.s. Likewise, we computed 2 24-diverse and at least 45-weighted phylogenies using C LASP -NKW; the com- putation took 2.7 CPU sec.s. In [1], after examining all 34 plausible phylogenies manually, the authors come up with three forms of tree structures, and then “filter” the phylogenies accordingly. The results of our experiments comply with these 3 All CPU times are in seconds, for a workstation with a 1.5GHz Xeon processor and 4x512MB RAM, running Red Hat Enterprise Linux (Version 4.3). 12 Table 1: The representation-based method vs. the search-based method: computing an at least w-weighted phylogeny with at most c incompatible characters. # of incompatibleweightmethod time program size memory size characters (c) (w) (CPU sec.s) (MB) 16 45 representation-based15.52 # of atoms: 79229 369 # of rules: 1585419 search-based 1.34 # of atoms: 3744 22 # of rules: 55219 17 45 representation-based15.32 # of atoms: 79229 369 # of rules: 1585419 search-based 1.30 # of atoms: 3744 22 # of rules: 55219 18 45 representation-based15.47 # of atoms: 79229 369 # of rules: 1585419 search-based 1.10 # of atoms: 3744 22 # of rules: 55219 Table 2: The representation-based method vs. the search-based method: computing all at least w-weighted phylogenies with at most c incompatible characters. # of incompatibleweight# of phylogeniesmethod time memory size characters (c) (w) (CPU sec.s)(MB) 16 45 1 representation-based14.49 369 search-based 3.67 22 17 45 8 representation-based126.06 369 search-based 17 22 18 45 14 representation-based210.32 369 search-based 22.35 22 groupings: the similar phylogenies computed by C LASP -NKW are in the same group; the diverse phylogenies are in different groups. 8 Discussion We studied the problem of computing weighted solutions in ASP, where the weight of a solution is defined explicitly. We introduced two methods to solve this prob- lem. One method is based on the idea of modifying the representation of the prob- lem so that we can use an existing ASP solver to compute weighted solutions. In this method, the weight measure and the weight constraints are encoded as ASP programs, and the main program is modified in an elaboration tolerant way by sim- ply adding these programs. The other method is based on the idea of modifying the search algorithm of an ASP solver so that we can compute weighted solutions without modifying the main program. In this method, the weight of a solution is 13 implemented as a C++ program (in a separate file), and the search algorithm of the ASP solver C LASP is modified to compute weighted solutions with respect to this program. The modified version of C LASP is called C LASP -W. We compared the search-based method with the representation-based method in reconstructing weighted phylogenies for Indo-European languages. We observed that C LASP -W performs better in terms of computation time and space. One benefit of the search-based method (other than computational efficiency) is that it allows implementing complex numerical weight measures which cannot be easily encoded in ASP. On the other hand, to take advantage of computational efficiency of this method, one has to devise an admissible heuristic function for the weight measure. Though, in some cases (e.g., when it is easier to represent the weight measure in ASP) it might be desirable to use the representation-based method. For some problems, we do not have to define the weight of a solution explicitly, and we can use aggregates (e.g., sum, count, times) to compute the weight, in the sense of [14, 8, 15]. For such problems, other approaches assign weights to weak constraints [7, 2] or some rules [11] and then find the answer set that minimizes the sum of the weights of the violated weak constraints or violated rules. However, these representation-based methods are not general enough to handle more com- plicated weight functions that requires further new definitions, like that of a phy- logeny above where we need to define total deductions relative to domain-specific information. Based on the promising experimental results above, we extended our search- based method to be able to compare weighted solutions; for that we modified C LASP -W in the spirit of [6]. The modified version of C LASP -W is called C LASP - NKW. By this way, given a weight measure ω for a phylogeny and a distance mea- sure ∆ for a set of phylogenies (both as C++ programs, in separate files), we can compute similar/diverse weighted phylogenies. Our methods for computing (similar/diverse) weighted solutions, and the solvers C LASP -W and C LASP -NKW provide useful tools for analyzing/comparing solu- tions both in ASP and in phylogenetics. In ASP, there are many appealing appli- cations (e.g., product configuration, planning) for which finding weighted and/or similar/diverse solutions could be useful. In phylogenetics, there is no software system that has utilities for analying/comparing phylogenies; in that sense, our search-based methods (including the weight/distance measures) allow experts to automatically analyze phylogenies. Acknowledgments This work has been supported by TUBITAK Grant 107E229. References [1] Daniel R. Brooks, Esra Erdem, Selim T. Erdogan, James W. Minett, and Don- ald Ringe. Inferring phylogenetic trees using answer set programming. JAR, 14 39(4):471–511, 2007. [2] Francesco Buccafurri, Nicola Leone, and Pasquale Rullo. Strong and weak constraints in disjunctive datalog. In Proc. of LPNMR, pages 2–17, 1997. [3] Duygu Cakmak, Esra Erdem, and Halit Erdogan. Computing weighted solu- tions in answer set programming. In Proc. of LPNMR, pages 416–422, 2009. [4] M. Davis, G. Logemann, and D. Loveland. A machine program for theorem- proving. Communications of the ACM, 5:394–397, 1962. [5] William H. E. Day and David Sankoff. Computational complexity of inferring phylogenies by compatibility. Systematic Zoology, 35(2):224–229, 1986. [6] Thomas Eiter, Esra Erdem, Halit Erdogan, and Michael Fink. Finding similar or diverse solutions in answer set programming. In Proc. of ICLP, 2009. [7] Thomas Eiter, Wolfgang Faber, Nicola Leone, and Gerald Pfeifer. Computing preferred and weakly preferred answer sets by meta interpretation in answer set programming. In Proc. of ASP Workshop, 2001. [8] Wolfgang Faber, Gerald Pfeifer, Nicola Leone, Tina Dell’Armi, and Giuseppe Ielpa. Design and implementation of aggregate functions in the dlv system. TPLP, 8(5-6):545–580, 2008. [9] Martin Gebser, Benjamin Kaufmann, Andr Neumann, and Torsten Schaub. Conflict-driven answer set solving. In Proc. of IJCAI, pages 386–392, 2007. [10] Vladimir Lifschitz. What is answer set programming? In Proc. of AAAI, 2008. [11] Emilia Oikarinen and Matti Järvisalo. Max-ASP: Maximum satisfiability of answer set programs. In Proc. of LPNMR, pages 236–249, 2009. [12] Donald Ringe, Tandy Warnow, and Ann Taylor. Indo-European and compu- tational cladistics. Transactions of the Philological Society, 100(1):59–129, 2002. [13] Patrik Simons, Ilkka Niemelä, and Timo Soininen. Extending and implement- ing the stable model semantics. Artificial Intelligence, 138:181–234, 2002. [14] Patrik Simons and Timo Soininen. Stable model semantics of weight con- straint rules. In Proc. of LPNMR, pages 317–331, 1999. [15] Tran Cao Son and Enrico Pontelli. A constructive semantic characterization of aggregates in answer set programming. TPLP, 7(3):355–375, 2007. [16] J.P White and J.F. O’Connell. A Prehistory of Australia, New Guinea, and Sahul. Academic, San Diego, CA, 1982. 15