Hardness of MSA with Selective Column Scoring? Andrea Caucchiolo and Ferdinando Cicalese1[0000−0003−1652−0599] Department of Computer Science, University of Verona, Italy {andrea.caucchiolo, ferdinando.cicalese}@univr.it Abstract. Multiple Sequence Alignment (MSA for short) is a well known problem in the field of computational biology. In order to evaluate the quality of a solution, many different scoring functions have been intro- duced, the most widely used being the Sum-of-pairs score (SP-score). It is known that computing the best MSA under the SP-score measure is NP-hard. In this paper, we introduce a variant of the Column score (defined in Thompson et al. 1999), which we refer to as Selective Column score: Given a symbol a ∈ Σ, the score of the i-th column is one if and only if all symbols of the same column are a, and otherwise zero. The a- column score of an alignment is then the number of columns made of only character a. We show that finding the optimal MSA under the Selective Column Score is NP-hard for all alphabets of size |Σ| ≥ 2 by reducing from MIN-2-SAT. Keywords: Multiple Sequence Alignment · Column score · NP-completeness 1 Introduction Alignments are fundamental methods for the comparison of biological sequences. They are extensively employed for assessing similarity among sequences as well as in subprocedures of more complex operations, like the computation of ap- proximate overlaps in the context of sequencing. Definition 1 (Multiple alignment). Let s1 , . . . , sm , be m strings over some alphabet Σ. Let - 6∈ Σ be a gap symbol and Σ 0 = Σ ∪ {-}. A multiple alignment S̃ of s1 , . . . , sm of length ` is a tuple of strings s̃1 , . . . , s̃m ∈ Σ 0 such that – |s̃1 | = |s̃2 | = · · · |s̃m | = `, – for each i = 1, . . . , m, if we remove from s̃i all and only the gap characters we obtain the string si , – there does not exist a position j where a gap occurs in all s̃i , i.e., for all j = 1, . . . , ` there exists an i ∈ {1, . . . , m}, such that1 s̃i [j] 6= -. ? Copyright c 2021 for this paper by its authors. Use permitted under Creative Com- mons License Attribution 4.0 International (CC BY 4.0) 1 For a string s and integer i, we denote by s[i] the ith character of s. 2 A. Caucchiolo and F. Cicalese It is useful to visualize the alignment as a matrix, where rows are the strings s̃i . Therefore, for each j = 1, . . . , ` we refer to the sequence s̃1 [j], . . . , s̃m [j] as the jth column of the alingment. Many proposals exist regarding the choice of the most appropriate objective function that should be optimized when computing an alignment of a set of strings [1, 11, 12]. In this paper, we focus on the so called column score analyzed in [12], where the alignment is assessed by counting the number of columns consisting of a single character. Definition 2 (Column Score). Let s1 , . . . , sm be a sequence of strings from some alphabet Σ. Given an alignment S̃ = s̃1 , . . . , s̃m of length ` of s1 , . . . , sm , and a character a ∈ Σ, the a-column score of S̃, denoted csa (S̃) is the number of columns made of only character a, i.e., csa (S̃) = |{j ∈ {1, . . . , `} | ∀i = 1, . . . , m, s̃i [j] = a}|. The column score of S̃, denoted cs(S̃) is the number of columns made of a single character a, i.e., X cs(S̃) = | ∪a∈Σ {j ∈ {1, . . . , `} | ∀i = 1, . . . , m, s̃i [j] = a}| = csa (S̃). a∈Σ In particular we prove the NP-completeness of the following problem. Selective Column Score Alignment (SCS-Align) Input: Strings s1 , . . . , sm , over some alphabet Σ, an integer M ≥ maxi |si |, a non-negative integer κ, and a selected character a ∈ Σ. Question: Is there an aligment S̃ = s̃1 , . . . , s̃m of s̃1 , . . . , s̃m such that the length of S̃ is at most M and the a-column score of S̃ is at least κ? In contrast with other objective functions for multiple sequence alignment, to the best of our knowledge, before our result, nothing was known about the tractability of computing the best possible alignment under the (selective) a- column score. Main Result. In this paper, we show that the Selective Column Score Align- ment problem (and several of its variants) is NP-complete. Related work. It is not hard to see that without the upper bound M on the alignment length, computing an alignment of maximum possible column score is equivalent to finding the longest common subsequence of the input strings, which is known to be an NP-hard problem [9]. However, in contrast to our SCS- Align, the Longest Common Subsequence problem is easily solvable in polynomial time when the output is restricted to subsequences made of a single given character—note that a a-column score alignment asks for a subsequence of Selective Column Score Alignment 3 this type. Another important issue/difference with respect to modelling column score alignment as an instance of LCS is that an alignment obtained starting from a longest common subsequence might have an uncontrollable number of gaps per row. We show that the hardness result on SCS holds for an equivalent variant of the problem where we impose a bound of 1 on the number of gaps per row. In this respect, the length constraint on the solutions of SCS-Align have some similarity with the gapped common subsequences considered in [4, 5]. In these papers, however, the authors consider only the special case of two input strings and do not discuss complexity issues of the general instances. The most commonly considered objective function for optimizing multiple sequence alignments is the so called Sum of Pairs score (SP-score) introduced by Carrillo and Lipman [3]. The SP-score generalizes the edit distance based score for pairwise alignments (equivalently multiple alignments of only two strings). An optimal pairwise alignment can be computed in quadratic time by the Needleman and Wunsh dynamic programming approach [10]. In contrast, computing an optimal SP-score based multiple sequence alignment is known to be NP-hard [13]. Alternative scoring measures, based on the distance to a consensus string, are investigated in Li et al. [8]. In its simplest form, given an alignment a consensus string can be obtained by selecting for each column a majority symbol. The score of the alignment is the sum of the Hamming distance between each input string and the consensus sequence. The authors of [8] investigate several variants of this approach showing that they are in general NP-hard. They also consider a sort of inverse procedure, where a median string of the input string is computed that minimizes the sum of gap-bounded edit distance of each string from the median. Once this string has been found the alignment can be constructed by optimally aligning each input string to the median. Both the above approaches (SP-score and distance from consensus) are based on pairwise comparisons of strings. More precisely, in the SP-score case each pos- sible pairwise comparison of the input strings is computed and the corresponding scores are added up to obtain the final score. In the case of distance to consensus, a median string is computed and the alignment score is defined as the sum of pairwise alignment scores of the input strings to the median. The column score is a basic model of scoring functions that are obtained by evaluating the set of elements in a column, rather than composing evaluations of single pairs of elements in a column. Organization of the paper. In section 2, we prove the NP-completeness of our basic version of the column score, Selective Column Score Alignment, by reducing from MIN-2-SAT. In section 3, we show how the reduction can be extended to variants where there is no selected character, and the bound on the length of the alignment is substituted with a very sharp bound on the number of gaps per row of the alignment. Section 4 concludes the paper with some open questions. 4 A. Caucchiolo and F. Cicalese 2 The NP-completeness of Selective Column Score Alignment In this section, we establish the complexity of the SCS-Align problem defined in the introduction. We provide a reduction from the NP-complete problem MIN-2-SAT [7]. Min-2-Sat Input: A 2-CNF formula φ and an integer k Question: Is there an assignment to the variables of φ that satisfies at most k clauses? The Reduction. Let I = (φ, k) be an instance of MIN-2-Sat. Let x1 , . . . , xn be the variables of φ and C1 , . . . , Cm be the clauses of φ. Then, φ = C1 ∧C2 ∧· · ·∧Cm , where for each i = 1, . . . , m, Ci = (l1i ∨ l2i ), with lji ∈ {x1 , . . . xn , x1 , . . . , xn }. From I, we now build an instance J = ({s0 , . . . , sn+1 }, M, κ, a) for SCS- Align, where each si is a binary string, M is a bound on the alignment length, a is a character in the alphabet Σ = {0, 1}, and κ is the bound on the number of columns of the alignments that are expected to contain only the selected character a. We set M = 8mn+3m+2, a = 0 and build n+2 binary strings: s0 , s1 , . . . , sn , sn+1 where |s0 | = |sn+1 | = M and |si | = M − 1 for each i = 1, . . . , n. More precisely we define: s0 = 1(01)2mn (101)m (10)2mn 1, (1) and for each i = 1, . . . , n, (1) (m) si = 0(00)(i−1)2m (10)2m (00)(n−i)2m ci . . . ci (00)(i−1)2m (10)2m (00)(n−i)2m , (2) where, for each j = 1, . . . , m,  100 if xi ∈ Cj  (j) ci = 010 if xi ∈ Cj (3)  000 else  Finally, we define sn+1 = 0M , (4) and set κ = 2mn + m − k. Selective Column Score Alignment 5 Example 1: Let φ = (x1 ∨ x2 ) ∧ (x2 ∨ x3 ) be a 2-CNF formula with m = 2 and n = 3. The strings s0 , . . . , s4 obtained by the above process are then: s0 : 1(01)4 (01)4 (01)4 101101(10)4 (10)4 (10)4 1 s1 : 0(10)4 (00)4 (00)4 100000(10)4 (00)4 (00)4 s2 : 0(00)4 (10)4 (00)4 100010(00)4 (10)4 (00)4 s3 : 0(00)4 (00)4 (10)4 000010(00)4 (00)4 (10)4 s4 : 0(00)4 (00)4 (00)4 000000(00)4 (00)4 (00)4 0 with s1 , s2 , s3 respectively representing the variables x1 , x2 , x3 . Remark 1. Let S̃ = s̃0 , s̃1 , . . . , s̃n+1 be an alignment of length ` ≤ M of the strings s0 , . . . , sn+1 . The following three facts are easy implied by the structure of the instance produced by our reduction. 1. Since |s0 | = |sn+1 | = M we have that it must be ` = M, hence s̃0 = s0 and s̃n+1 = sn+1 . Moreover, for each i = 1, . . . , n, because of |si | = M − 1 it follows that s̃i contains exactly one gap. We will use gap(i) to denote the position of the single gap in s̃i , i.e., the index j = 1, . . . , M, such that s̃i [j] = -. 2. Following from Fact 1. and since s̃n+1 is made of only zeroes there exist no columns with only 1, and the only columns that can contain only 0 are those corresponding to the positions of the zeroes in s0 . In other words, for the instances produced by our reduction, we have cs0 (S̃) = cs(S̃). 3. For each i = 1, . . . , n and j = 1, . . . , M, we say that s̃i blocks position j if s̃0 [j] = s0 [j] = 0 and s̃i [j] = 1. A column j is made of only zeroes if and only if there is no i = 1, . . . , n such that s̃i blocks position j. For each i = 1, . . . , n, let sTi = si - be the string obtained by adding a gap at the end of si . Let sF i = -si be the string obtained by adding a gap at the beginning of si . Then, we have (1) (m) sTi = 0(00)(i−1)2m (10)2m (00)(n−i)2m ci,T . . . ci,T (00)(i−1)2m (10)2m (00)(n−i)2m -, (1) (m) sF i = -(00) (i−1)2m (01)2m (00)(n−i)2m ci,F . . . ci,F (00)(i−1)2m (01)2m (00)(n−i)2m 0, (j) (j) where, for each j = 1, . . . , m, ci,T is equal to ci (defined in (3)) and  010 if xi ∈ Cj  (j) ci,F = 001 if xi ∈ Cj  000 else  Let us define the head of s0 to be the prefix sh0 = s0 [1] . . . s0 [4mn + 1] = 1(01)2mn . 6 A. Caucchiolo and F. Cicalese Let us define the core of s0 to be the substring sp0 = s0 [4mn + 2] . . . s0 [4mn + 3m + 1] = (101)m . Finally, let us define the tail of s0 to be the suffix st0 = s0 [4mn + 3m + 2] . . . s0 [8mn + 3m + 2] = (10)2mn 1. By a direct comparison with the definition of s0 in (1), we have the following proposition, stating useful properties of strings sF T i , si . Proposition 1. For each i = 1, . . . , n, the string sTi blocks – exactly 2m positions of the head of s0 ; – no position of the tail of s0 ; – the position of the jth zero in the core of s0 if and only if xi ∈ Cj , i.e., if the assignment xi = T rue satisfies clause Cj . Analogously, we have that the string sF i blocks – exactly 2m positions of the tail of s0 ; – no position of the head of s0 ; – the position of the jth zero in the core of s0 if and only if xi ∈ Cj , i.e., if the assignment xi = F alse satisfies clause Cj . Moreover, for each i 6= i0 the strings sTi , sF T F i , si0 , si0 block distinct positions in the head and tail of s0 . The following lemma proves that our reductions maps yes instances of MIN- 2-Sat to yes-instances of SCS-Align. Lemma 1. If there exists an assignment A for φ that satisfies at most k clauses then there exists an alignment S̃ = s̃0 , . . . , s̃n+1 of s0 , . . . , sn+1 such that the length of S̃ is M and there are at least κ = 2mn + m − k columns made only of zeroes. Proof. Given an assignment A that satisfies at most k clauses of φ, we define the alignment S̃ as follows: for each i = 0, . . . , n + 1 we let  si if i = 0 or i = n + 1  s̃i = sTi if A sets xi = T rue (5)   F si if A sets xi = F alse By the definition of sF T i and si and Proposition 1, we have that strings s̃1 , . . . , s̃n block exactly 2mn of the positions in the head and the tail of s0 . Moreover, the position of the jth zero in the core of s0 is blocked by some string s̃i if and only if the truth value of xi in A satisfies Cj . Since A satisfies at most k clauses, we have that at most k positions of the core of s0 are blocked by some s̃i . It total there are 4mn + m positions in s0 corresponding to a zero character and at most 2mn + k are blocked by the strings of the alignment. Hence, there are at least 2mn + m − k positions which are not blocked, equivalently ≥ 2mn + m − k columns made of only zeros. Selective Column Score Alignment 7 Example 2: Using the same CNF formula as Example 1 with the assignment A = {x1 = T rue, x2 = F alse, x3 = F alse}, by Lemma 1 we get this alignment: s˜0 = s0 : 1(01)4 (01)4 (01)4 101101(10)4 (10)4 (10)4 1 s˜1 = sT1 : 0(10)4 (00)4 (00)4 100000(10)4 (00)4 (00)4 - s˜2 = sF 4 4 4 4 4 4 2 : -(00) (01) (00) 010001(00) (01) (00) 0 s˜3 = sF 4 4 4 4 4 4 3 : -(00) (00) (01) 000001(00) (00) (01) 0 s˜4 = s4 : 0(00)4 (00)4 (00)4 000000(00)4 (00)4 (00)4 0 We can see that the strings sT1 , sF F 2 , s3 block a total of 13 columns, and knowing that the assignment A satisfies only one clause we have k = 1, therefore it holds 13 ≥ 2mn + k = 13. Lemma 2. Let κ∗ be the maximum column score achievable by an alignment of length M of the strings s0 , . . . , sn+1 . Then, there exists an alignment S̃ of s0 , . . . , sn+1 , such that 1. the length of S̃ is M, 2. cs(S̃) = κ∗ , 3. for each i = 1, . . . , n, we have gap(i) ∈ {1, M }, where gap(i) denotes the position of the gap in s̃i , i.e., the index j ∈ {1, . . . , M } such that s̃i [j] = -. Proof. Let S̃ be an alignment that, among those satisfying conditions 1. and 2., has the minimum number of strings s̃i violating condition 3. And assume, by contradiction, that such number is greater than 0. Then there exists i ∈ {1, 2, . . . , n} such that the position of the gap in s̃i is not in {1, M }. We are going to show that we can modify s̃i moving the gap in position 1 or M and obtain another alignment satisfying condition 1. and 2. The existence of such an alignment contradicts the minimality of S̃ with respect to condition 3, implying that in fact in S̃ for all i = 1, . . . , n we must have gap(i) ∈ {1, M } as desired. Let ni be the number of positions that s̃i blocks altogether in the tail and the head of s0 . We have     2m if gap(i) < 4m(i − 1) + 2, 2m if gap(i) ≥ 4mi + 2 + 4mn + 3m,     ni = 4m if 4mi + 2 ≤ gap(i) ≤ (i − 1)4m + 1 + 4mn + 3m, x  2m + d 2 e if gap(i) = (i − 1)4m + 1 + x, 1 ≤ x ≤ 4m,     2m + d 4m−x+1 e if gap(i) = 4m(i − 1) + 1 + 4mn + 3m + x, 1 ≤ x ≤ 4m.  2 Note that these numbers refer to positions that can be blocked only by s̃i . Hence, unblocking any of such positions implies a net increase in the column score of the resulting alignment. We apply the following three cases: Case 1. gap(i) ≤ 4mi + 1. In this case s̃i blocks 2m positions in the tail of s0 and possibly some positions in the head of s̃0 . If we modify s̃i by moving the 8 A. Caucchiolo and F. Cicalese gap to the first position (equivalently, we set s̃i = sFi ) we reduce the number of positions blocked in the tail and head of s̃0 to 2m. Moreover, the characters of the new s̃i which are aligned to the core of s̃0 do not change. Then, the new alignment has not decreased the number of columns made of only zeros, i.e., it satisfies conditions 1 and 2 and has one more string satisfying condition 3. Case 2. gap(i) ≥ 4mn + 3m + 4m(i − 1) + 2. In this case s̃i blocks 2m positions in the head of s0 and possibly some positions in the tail of s̃0 . If we modify s̃i by moving the gap to the last position (equivalently, we set s̃i = sTi ) we reduce the number of positions blocked altogether in the tail and head of s̃0 to only 2m. Moreover, the characters of the new s̃i which are aligned to the core of s̃0 do not change. Then, the new alignment has not decreased the number columns made of only zeros, i.e., it satisfies conditions 1 and 2 and has one more string satisfying condition 3. Case 3. 4mi + 2 ≤ gap(i) ≤ 4m(i − 1) + 1 + 4mn + 3m. In this case, s̃i blocks 2m positions in the head of s0 and 2m positions in the tail of s0 . It is possible that the position of the gaps allows to align the central zeros of s̃i (those belonging (j) to the substrings ci ) to the zeroes of the core of s0 . If we modify s̃i by moving the gap to the first position (equivalently, we set s̃i = sF i ) we reduce the number of positions blocked altogether in the tail and head of s̃0 to 2m. Even if in the resulting new alignment the new s̃i blocked all the positions in the core of s0 , which are m, the net gain in the number of columns made of only zeroes would be m. Then, the new alignment has strictly increased the number of columns made of only zeros, i.e., it satisfies conditions 1 and 2 and has one more string satisfying condition 3. We can see that the strings sT1 , sF F 2 , s3 block a total of 13 columns, and knowing that the assignment A satisfies only one clause we have k = 1, therefore it holds 13 ≥ 2mn + k = 13. Lemma 3. If there exists an alignment S̃ of s0 , . . . , sn+1 that has length M and such that cs(S̃) ≥ κ, then there exists an assignment A for φ that satisfies at most k clauses. Proof. By Lemma 2, we can assume w.l.o.g., that in S̃ for each i = 1, . . . , n, we have gap(i) ∈ {1, M }. Equivalently, for each i = 1, . . . , n, we have that s̃i ∈ {sTi , sF i }. We now create the assignment A by setting for each i = 1, . . . , n, ( T rue if s̃i = sTi xi = (6) F alse if s̃i = sF i By the definition of sF T i and si and Proposition 1, we have that strings s̃1 , . . . , s̃n block exactly 2mn of the positions in the head and the tail of s0 . Hence, exactly 2mn columns corresponding to the non-blocked positions in the head and tail of s0 are made of only zeroes. Since cs(S̃) ≥ κ = 2mn + m − k, Selective Column Score Alignment 9 we have that at least m − k columns corresponding to positions in the core of s0 must be made of only zeroes. Equivalently, at most k positions in the core of s0 are blocked by the strings s̃1 , . . . , s̃n . By Proposition 1, we have that the position of the jth zero in the core of s0 is blocked by some string s̃i if and only if the truth value of xi in A satisfies Cj . Since at most k positions of the core of s0 are blocked by some s̃i , it follows that A satisfies at most k clauses. As a result of Lemmas 1 and 3, we have the following. Theorem 1. The problem Selective Column Alignment is NP-complete already over an alphabet of size ≥ 2. 3 Variants and Extensions In this section we extend the NP-completeness of the SCS-Align to show the NP-completeness of other variants or reformulations of the problem that are also of interest. We start by the simple observation that the problem remains NP-complete even if we consider as objective function the column score, rather than the a-column score with respect to a specific character a. Bounded Column Score Alignment (BCS-Align) Input: Strings s1 , . . . , sm over some alphabet Σ, an integer M ≥ maxi |si |, and a non-negative integer κ. Question: Is there an aligment S̃ = s̃1 , . . . , s̃m of s1 , . . . , sm such that the length of S̃ is at most M and the column score of S̃ is at least κ? In fact, due to the presence of the string sn+1 being made of only zeroes as observed in point 2 of Remark 1, the proof in the previous section also shows the result in the following corollary. It is enough to observe that sn+1 forces only columns entirely made of 0 to be accepted. Corollary 1. The Bounded Column Score Alignment problem is NP- complete We can also reformulate the problem by interpreting the bound on the length of the alignment as a bound on the number of gaps. In fact, the following for- mulation is also easily shown to be NP-complete gap-Bounded Column Score Alignment (gap-CS-Align) Input: Strings s1 , . . . , sm over some alphabet Σ, an integer bound g on the number of allowed gaps per string, and a non-negative integer bound κ on the number of uniform columns to be found. Question: Is there an aligment S̃ = s̃1 , . . . , s̃m of s̃1 , . . . , s̃m such that for each i ∈ [m], s̃i contains at most g gaps, and the column score of S̃ is at least κ? 10 A. Caucchiolo and F. Cicalese Corollary 2. The gap-Bounded Column Score Alignment problem is NP- complete Proof. We use exactly the same reduction employed for the NP-completeness of SCS-Align. We show that the particular case of gap-CS-Align with g = 1 is NP-complete. The simple observation needed is that in any alignment S̃ = s̃0 , . . . , s̃n+1 of the strings s0 , . . . , sn+1 no gap can be present in s̃0 , (nor in s̃n+1 ); otherwise, at least 2 gaps should be present in each s̃i (i = 1, . . . , n). Hence, we have that Remark 1 (and in particular point 1.) holds also on the basis of the bound on the number of gaps. 4 Final Remarks We proved that computing the best multiple sequence alignment with respect to the column score is NP-hard even in a very restricted model where: strings are binary, we only try to optimize the number of columns of the alignment made of only zeroes, and we bound the number of gaps per row to 1. It would be interesting to better understand the relationships of this problem to parameterized variants of the Longest Common Supersequence (LCS) problem [4, 5, 2, 6], since LCS is the natural model of column score based multiple sequence alignment in the absence of any bounds on the number of gaps or restrictions on the uniform columns. Another natural open problem regards the limits of approximability of our problems. References 1. Berkemer, S.J., Höner zu Siederdissen, C., Stadler, P.F.: Compositional properties of alignments. Mathematics in Computer Science (2020) 2. Blin, G., Bulteau, L., Jiang, M., Tejada, P.J., Vialette, S.: Hardness of longest common subsequence for sequences with bounded run-lengths. In: Kärkkäinen, J., Stoye, J. (eds.) Combinatorial Pattern Matching. pp. 138–148. Springer Berlin Heidelberg, Berlin, Heidelberg (2012) 3. Carrillo, H., Lipman, D.: The multiple sequence alignment problem in biology. SIAM Journal on Applied Mathematics 48(5), 1073–1082 (1988) 4. Iliopoulos, C.S., Kubica, M., Rahman, M.S., Waleń, T.: Algorithms for comput- ing the longest parameterized common subsequence. In: Ma, B., Zhang, K. (eds.) Combinatorial Pattern Matching. pp. 265–273. Springer Berlin Heidelberg, Berlin, Heidelberg (2007) 5. Iliopoulos, C.S., Sohel Rahman, M.: Algorithms for computing variants of the longest common subsequence problem. Theoretical Computer Science 395(2), 255– 267 (2008) 6. Keller, O., Kopelowitz, T., Lewenstein, M.: On the longest common parameterized subsequence. Theoretical Computer Science 410(51), 5347–5353 (2009), combina- torial Pattern Matching 7. Kohli, R., Krishnamurti, R., Mirchandani, P.: The minimum satisfiability problem. SIAM Journal on Discrete Mathematics 7(2), 275–283 (1994) Selective Column Score Alignment 11 8. Li, M., Ma, B., Wang, L.: Finding similar regions in many sequences. Journal of Computer and System Sciences 65(1), 73–96 (2002) 9. Maier, D.: The complexity of some problems on subsequences and supersequences. J. ACM 25(2), 322?336 (1978) 10. Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for sim- ilarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48(3), 443–453 (1970) 11. Notredame, C.: Recent progress in multiple sequence alignment: a survey. Phar- macogenomics 3(1), 131–144 (January 2002) 12. Thompson, J.D., Plewniak, F., Poch, O.: A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Research 27(13), 2682–2690 (07 1999) 13. Wang, L., Jiang, T.: On the complexity of multiple sequence alignment. Journal of Computational Biology 1(4), 337–348 (1994)