=Paper= {{Paper |id=Vol-3072/paper27 |storemode=property |title=Hardness of MSA with Selective Column Scoring |pdfUrl=https://ceur-ws.org/Vol-3072/paper27.pdf |volume=Vol-3072 |authors=Andrea Caucchiolo,Ferdinando Cicalese |dblpUrl=https://dblp.org/rec/conf/ictcs/CaucchioloC21 }} ==Hardness of MSA with Selective Column Scoring== https://ceur-ws.org/Vol-3072/paper27.pdf
       Hardness of MSA with Selective Column
                     Scoring?

       Andrea Caucchiolo and Ferdinando Cicalese1[0000−0003−1652−0599]

            Department of Computer Science, University of Verona, Italy
             {andrea.caucchiolo, ferdinando.cicalese}@univr.it



       Abstract. Multiple Sequence Alignment (MSA for short) is a well known
       problem in the field of computational biology. In order to evaluate the
       quality of a solution, many different scoring functions have been intro-
       duced, the most widely used being the Sum-of-pairs score (SP-score). It
       is known that computing the best MSA under the SP-score measure is
       NP-hard.
       In this paper, we introduce a variant of the Column score (defined in
       Thompson et al. 1999), which we refer to as Selective Column score:
       Given a symbol a ∈ Σ, the score of the i-th column is one if and only
       if all symbols of the same column are a, and otherwise zero. The a-
       column score of an alignment is then the number of columns made of
       only character a.
       We show that finding the optimal MSA under the Selective Column Score
       is NP-hard for all alphabets of size |Σ| ≥ 2 by reducing from MIN-2-SAT.

       Keywords: Multiple Sequence Alignment · Column score · NP-completeness


1    Introduction

Alignments are fundamental methods for the comparison of biological sequences.
They are extensively employed for assessing similarity among sequences as well
as in subprocedures of more complex operations, like the computation of ap-
proximate overlaps in the context of sequencing.

Definition 1 (Multiple alignment). Let s1 , . . . , sm , be m strings over some
alphabet Σ. Let - 6∈ Σ be a gap symbol and Σ 0 = Σ ∪ {-}. A multiple alignment
S̃ of s1 , . . . , sm of length ` is a tuple of strings s̃1 , . . . , s̃m ∈ Σ 0 such that

 – |s̃1 | = |s̃2 | = · · · |s̃m | = `,
 – for each i = 1, . . . , m, if we remove from s̃i all and only the gap characters
   we obtain the string si ,
 – there does not exist a position j where a gap occurs in all s̃i , i.e., for all
   j = 1, . . . , ` there exists an i ∈ {1, . . . , m}, such that1 s̃i [j] 6= -.
?
  Copyright c 2021 for this paper by its authors. Use permitted under Creative Com-
  mons License Attribution 4.0 International (CC BY 4.0)
1
  For a string s and integer i, we denote by s[i] the ith character of s.
2        A. Caucchiolo and F. Cicalese

      It is useful to visualize the alignment as a matrix, where rows are the strings
s̃i . Therefore, for each j = 1, . . . , ` we refer to the sequence s̃1 [j], . . . , s̃m [j] as
the jth column of the alingment.

    Many proposals exist regarding the choice of the most appropriate objective
function that should be optimized when computing an alignment of a set of
strings [1, 11, 12].
    In this paper, we focus on the so called column score analyzed in [12], where
the alignment is assessed by counting the number of columns consisting of a
single character.

Definition 2 (Column Score). Let s1 , . . . , sm be a sequence of strings from
some alphabet Σ. Given an alignment S̃ = s̃1 , . . . , s̃m of length ` of s1 , . . . , sm ,
and a character a ∈ Σ, the a-column score of S̃, denoted csa (S̃) is the number
of columns made of only character a, i.e.,

                csa (S̃) = |{j ∈ {1, . . . , `} | ∀i = 1, . . . , m, s̃i [j] = a}|.

   The column score of S̃, denoted cs(S̃) is the number of columns made of a
single character a, i.e.,
                                                                             X
    cs(S̃) = | ∪a∈Σ {j ∈ {1, . . . , `} | ∀i = 1, . . . , m, s̃i [j] = a}| =   csa (S̃).
                                                                                a∈Σ

    In particular we prove the NP-completeness of the following problem.
    Selective Column Score Alignment (SCS-Align)
    Input:    Strings s1 , . . . , sm , over some alphabet Σ, an integer M ≥
              maxi |si |, a non-negative integer κ, and a selected character
              a ∈ Σ.
    Question: Is there an aligment S̃ = s̃1 , . . . , s̃m of s̃1 , . . . , s̃m such that the
              length of S̃ is at most M and the a-column score of S̃ is at least
              κ?
    In contrast with other objective functions for multiple sequence alignment,
to the best of our knowledge, before our result, nothing was known about the
tractability of computing the best possible alignment under the (selective) a-
column score.
Main Result. In this paper, we show that the Selective Column Score Align-
ment problem (and several of its variants) is NP-complete.
Related work. It is not hard to see that without the upper bound M on the
alignment length, computing an alignment of maximum possible column score
is equivalent to finding the longest common subsequence of the input strings,
which is known to be an NP-hard problem [9]. However, in contrast to our SCS-
Align, the Longest Common Subsequence problem is easily solvable in
polynomial time when the output is restricted to subsequences made of a single
given character—note that a a-column score alignment asks for a subsequence of
                                          Selective Column Score Alignment        3

this type. Another important issue/difference with respect to modelling column
score alignment as an instance of LCS is that an alignment obtained starting
from a longest common subsequence might have an uncontrollable number of
gaps per row. We show that the hardness result on SCS holds for an equivalent
variant of the problem where we impose a bound of 1 on the number of gaps per
row. In this respect, the length constraint on the solutions of SCS-Align have
some similarity with the gapped common subsequences considered in [4, 5]. In
these papers, however, the authors consider only the special case of two input
strings and do not discuss complexity issues of the general instances.
    The most commonly considered objective function for optimizing multiple
sequence alignments is the so called Sum of Pairs score (SP-score) introduced by
Carrillo and Lipman [3]. The SP-score generalizes the edit distance based score
for pairwise alignments (equivalently multiple alignments of only two strings). An
optimal pairwise alignment can be computed in quadratic time by the Needleman
and Wunsh dynamic programming approach [10]. In contrast, computing an
optimal SP-score based multiple sequence alignment is known to be NP-hard
[13].
    Alternative scoring measures, based on the distance to a consensus string, are
investigated in Li et al. [8]. In its simplest form, given an alignment a consensus
string can be obtained by selecting for each column a majority symbol. The score
of the alignment is the sum of the Hamming distance between each input string
and the consensus sequence. The authors of [8] investigate several variants of
this approach showing that they are in general NP-hard. They also consider a
sort of inverse procedure, where a median string of the input string is computed
that minimizes the sum of gap-bounded edit distance of each string from the
median. Once this string has been found the alignment can be constructed by
optimally aligning each input string to the median.
    Both the above approaches (SP-score and distance from consensus) are based
on pairwise comparisons of strings. More precisely, in the SP-score case each pos-
sible pairwise comparison of the input strings is computed and the corresponding
scores are added up to obtain the final score. In the case of distance to consensus,
a median string is computed and the alignment score is defined as the sum of
pairwise alignment scores of the input strings to the median. The column score
is a basic model of scoring functions that are obtained by evaluating the set
of elements in a column, rather than composing evaluations of single pairs of
elements in a column.


Organization of the paper. In section 2, we prove the NP-completeness of
our basic version of the column score, Selective Column Score Alignment,
by reducing from MIN-2-SAT. In section 3, we show how the reduction can be
extended to variants where there is no selected character, and the bound on the
length of the alignment is substituted with a very sharp bound on the number
of gaps per row of the alignment. Section 4 concludes the paper with some open
questions.
4         A. Caucchiolo and F. Cicalese

2      The NP-completeness of Selective Column Score
       Alignment

In this section, we establish the complexity of the SCS-Align problem defined
in the introduction. We provide a reduction from the NP-complete problem
MIN-2-SAT [7].

     Min-2-Sat
     Input:    A 2-CNF formula φ and an integer k
     Question: Is there an assignment to the variables of φ that satisfies at most
               k clauses?

The Reduction. Let I = (φ, k) be an instance of MIN-2-Sat. Let x1 , . . . , xn be
the variables of φ and C1 , . . . , Cm be the clauses of φ. Then, φ = C1 ∧C2 ∧· · ·∧Cm ,
where for each i = 1, . . . , m, Ci = (l1i ∨ l2i ), with lji ∈ {x1 , . . . xn , x1 , . . . , xn }.
    From I, we now build an instance J = ({s0 , . . . , sn+1 }, M, κ, a) for SCS-
Align, where each si is a binary string, M is a bound on the alignment length,
a is a character in the alphabet Σ = {0, 1}, and κ is the bound on the number
of columns of the alignments that are expected to contain only the selected
character a.

   We set M = 8mn+3m+2, a = 0 and build n+2 binary strings: s0 , s1 , . . . , sn , sn+1
where |s0 | = |sn+1 | = M and |si | = M − 1 for each i = 1, . . . , n.
   More precisely we define:


                              s0 = 1(01)2mn (101)m (10)2mn 1,                                 (1)

and for each i = 1, . . . , n,

                                              (1)      (m)
    si = 0(00)(i−1)2m (10)2m (00)(n−i)2m ci . . . ci         (00)(i−1)2m (10)2m (00)(n−i)2m ,
                                                                                           (2)
where, for each j = 1, . . . , m,
                                      
                                      100 if xi ∈ Cj
                                      
                                  (j)
                                 ci = 010 if xi ∈ Cj                                          (3)
                                      
                                        000 else
                                      


Finally, we define
                                         sn+1 = 0M ,                                          (4)

and set
                                     κ = 2mn + m − k.
                                                Selective Column Score Alignment      5

Example 1: Let φ = (x1 ∨ x2 ) ∧ (x2 ∨ x3 ) be a 2-CNF formula with m = 2 and
n = 3. The strings s0 , . . . , s4 obtained by the above process are then:

                    s0 : 1(01)4 (01)4 (01)4 101101(10)4 (10)4 (10)4 1
                    s1 : 0(10)4 (00)4 (00)4 100000(10)4 (00)4 (00)4
                    s2 : 0(00)4 (10)4 (00)4 100010(00)4 (10)4 (00)4
                    s3 : 0(00)4 (00)4 (10)4 000010(00)4 (00)4 (10)4
                    s4 : 0(00)4 (00)4 (00)4 000000(00)4 (00)4 (00)4 0

with s1 , s2 , s3 respectively representing the variables x1 , x2 , x3 .


Remark 1. Let S̃ = s̃0 , s̃1 , . . . , s̃n+1 be an alignment of length ` ≤ M of the
strings s0 , . . . , sn+1 . The following three facts are easy implied by the structure
of the instance produced by our reduction.
 1. Since |s0 | = |sn+1 | = M we have that it must be ` = M, hence s̃0 = s0
    and s̃n+1 = sn+1 . Moreover, for each i = 1, . . . , n, because of |si | = M − 1
    it follows that s̃i contains exactly one gap. We will use gap(i) to denote
    the position of the single gap in s̃i , i.e., the index j = 1, . . . , M, such that
    s̃i [j] = -.
 2. Following from Fact 1. and since s̃n+1 is made of only zeroes there exist no
    columns with only 1, and the only columns that can contain only 0 are those
    corresponding to the positions of the zeroes in s0 . In other words, for the
    instances produced by our reduction, we have cs0 (S̃) = cs(S̃).
 3. For each i = 1, . . . , n and j = 1, . . . , M, we say that s̃i blocks position j if
    s̃0 [j] = s0 [j] = 0 and s̃i [j] = 1. A column j is made of only zeroes if and
    only if there is no i = 1, . . . , n such that s̃i blocks position j.

    For each i = 1, . . . , n, let sTi = si - be the string obtained by adding a gap
at the end of si . Let sF  i = -si be the string obtained by adding a gap at the
beginning of si . Then, we have
                                           (1)      (m)
sTi = 0(00)(i−1)2m (10)2m (00)(n−i)2m ci,T . . . ci,T (00)(i−1)2m (10)2m (00)(n−i)2m -,
                                          (1)       (m)
sF
 i = -(00)
           (i−1)2m
                   (01)2m (00)(n−i)2m ci,F . . . ci,F (00)(i−1)2m (01)2m (00)(n−i)2m 0,


                                   (j)                (j)
where, for each j = 1, . . . , m, ci,T is equal to ci       (defined in (3)) and
                                     
                                     010 if xi ∈ Cj
                                     
                                (j)
                               ci,F = 001 if xi ∈ Cj
                                     
                                       000 else
                                     

   Let us define the head of s0 to be the prefix sh0 = s0 [1] . . . s0 [4mn + 1] =
1(01)2mn .
6         A. Caucchiolo and F. Cicalese

     Let us define the core of s0 to be the substring sp0 = s0 [4mn + 2] . . . s0 [4mn +
3m + 1] = (101)m .
     Finally, let us define the tail of s0 to be the suffix st0 = s0 [4mn + 3m +
2] . . . s0 [8mn + 3m + 2] = (10)2mn 1.
     By a direct comparison with the definition of s0 in (1), we have the following
proposition, stating useful properties of strings sF     T
                                                    i , si .

Proposition 1. For each i = 1, . . . , n, the string sTi blocks

    – exactly 2m positions of the head of s0 ;
    – no position of the tail of s0 ;
    – the position of the jth zero in the core of s0 if and only if xi ∈ Cj , i.e., if
      the assignment xi = T rue satisfies clause Cj .

Analogously, we have that the string sF
                                      i blocks

    – exactly 2m positions of the tail of s0 ;
    – no position of the head of s0 ;
    – the position of the jth zero in the core of s0 if and only if xi ∈ Cj , i.e., if
      the assignment xi = F alse satisfies clause Cj .

Moreover, for each i 6= i0 the strings sTi , sF    T     F
                                              i , si0 , si0 block distinct positions in the
head and tail of s0 .
   The following lemma proves that our reductions maps yes instances of MIN-
2-Sat to yes-instances of SCS-Align.

Lemma 1. If there exists an assignment A for φ that satisfies at most k clauses
then there exists an alignment S̃ = s̃0 , . . . , s̃n+1 of s0 , . . . , sn+1 such that the
length of S̃ is M and there are at least κ = 2mn + m − k columns made only of
zeroes.

Proof. Given an assignment A that satisfies at most k clauses of φ, we define
the alignment S̃ as follows: for each i = 0, . . . , n + 1 we let
                              
                              si if i = 0 or i = n + 1
                              
                         s̃i = sTi if A sets xi = T rue                   (5)
                              
                               F
                                si if A sets xi = F alse

      By the definition of sF           T
                                 i and si and Proposition 1, we have that strings
s̃1 , . . . , s̃n block exactly 2mn of the positions in the head and the tail of s0 .
Moreover, the position of the jth zero in the core of s0 is blocked by some string
s̃i if and only if the truth value of xi in A satisfies Cj . Since A satisfies at most k
clauses, we have that at most k positions of the core of s0 are blocked by some s̃i .
It total there are 4mn + m positions in s0 corresponding to a zero character and
at most 2mn + k are blocked by the strings of the alignment. Hence, there are at
least 2mn + m − k positions which are not blocked, equivalently ≥ 2mn + m − k
columns made of only zeros.
                                            Selective Column Score Alignment            7

Example 2: Using the same CNF formula as Example 1 with the assignment
A = {x1 = T rue, x2 = F alse, x3 = F alse}, by Lemma 1 we get this alignment:

                s˜0 = s0 : 1(01)4 (01)4 (01)4 101101(10)4 (10)4 (10)4 1
                s˜1 = sT1 : 0(10)4 (00)4 (00)4 100000(10)4 (00)4 (00)4 -
                s˜2 = sF        4    4    4          4    4    4
                       2 : -(00) (01) (00) 010001(00) (01) (00) 0

                s˜3 = sF        4    4    4          4    4    4
                       3 : -(00) (00) (01) 000001(00) (00) (01) 0
                s˜4 = s4 : 0(00)4 (00)4 (00)4 000000(00)4 (00)4 (00)4 0

We can see that the strings sT1 , sF    F
                                   2 , s3 block a total of 13 columns, and knowing
that the assignment A satisfies only one clause we have k = 1, therefore it holds
13 ≥ 2mn + k = 13.
Lemma 2. Let κ∗ be the maximum column score achievable by an alignment
of length M of the strings s0 , . . . , sn+1 . Then, there exists an alignment S̃ of
s0 , . . . , sn+1 , such that
1. the length of S̃ is M,
2. cs(S̃) = κ∗ ,
3. for each i = 1, . . . , n, we have gap(i) ∈ {1, M }, where gap(i) denotes the
   position of the gap in s̃i , i.e., the index j ∈ {1, . . . , M } such that s̃i [j] = -.
Proof. Let S̃ be an alignment that, among those satisfying conditions 1. and 2.,
has the minimum number of strings s̃i violating condition 3. And assume, by
contradiction, that such number is greater than 0.
    Then there exists i ∈ {1, 2, . . . , n} such that the position of the gap in s̃i
is not in {1, M }. We are going to show that we can modify s̃i moving the gap
in position 1 or M and obtain another alignment satisfying condition 1. and 2.
The existence of such an alignment contradicts the minimality of S̃ with respect
to condition 3, implying that in fact in S̃ for all i = 1, . . . , n we must have
gap(i) ∈ {1, M } as desired.
    Let ni be the number of positions that s̃i blocks altogether in the tail and
the head of s0 . We have
     
     
     
       2m                 if gap(i) < 4m(i − 1) + 2,
        2m                 if gap(i) ≥ 4mi + 2 + 4mn + 3m,
     
     
     
     
ni = 4m                    if 4mi + 2 ≤ gap(i) ≤ (i − 1)4m + 1 + 4mn + 3m,
                x
     
        2m + d 2 e         if gap(i) = (i − 1)4m + 1 + x, 1 ≤ x ≤ 4m,
     
     
     
     
     2m + d 4m−x+1 e if gap(i) = 4m(i − 1) + 1 + 4mn + 3m + x, 1 ≤ x ≤ 4m.
     
                   2

   Note that these numbers refer to positions that can be blocked only by s̃i .
Hence, unblocking any of such positions implies a net increase in the column
score of the resulting alignment.
   We apply the following three cases:
Case 1. gap(i) ≤ 4mi + 1. In this case s̃i blocks 2m positions in the tail of s0
and possibly some positions in the head of s̃0 . If we modify s̃i by moving the
8       A. Caucchiolo and F. Cicalese

gap to the first position (equivalently, we set s̃i = sFi ) we reduce the number
of positions blocked in the tail and head of s̃0 to 2m. Moreover, the characters
of the new s̃i which are aligned to the core of s̃0 do not change. Then, the new
alignment has not decreased the number of columns made of only zeros, i.e., it
satisfies conditions 1 and 2 and has one more string satisfying condition 3.
Case 2. gap(i) ≥ 4mn + 3m + 4m(i − 1) + 2. In this case s̃i blocks 2m positions
in the head of s0 and possibly some positions in the tail of s̃0 . If we modify s̃i
by moving the gap to the last position (equivalently, we set s̃i = sTi ) we reduce
the number of positions blocked altogether in the tail and head of s̃0 to only
2m. Moreover, the characters of the new s̃i which are aligned to the core of s̃0
do not change. Then, the new alignment has not decreased the number columns
made of only zeros, i.e., it satisfies conditions 1 and 2 and has one more string
satisfying condition 3.
Case 3. 4mi + 2 ≤ gap(i) ≤ 4m(i − 1) + 1 + 4mn + 3m. In this case, s̃i blocks 2m
positions in the head of s0 and 2m positions in the tail of s0 . It is possible that
the position of the gaps allows to align the central zeros of s̃i (those belonging
                    (j)
to the substrings ci ) to the zeroes of the core of s0 . If we modify s̃i by moving
the gap to the first position (equivalently, we set s̃i = sF
                                                           i ) we reduce the number
of positions blocked altogether in the tail and head of s̃0 to 2m. Even if in the
resulting new alignment the new s̃i blocked all the positions in the core of s0 ,
which are m, the net gain in the number of columns made of only zeroes would
be m. Then, the new alignment has strictly increased the number of columns
made of only zeros, i.e., it satisfies conditions 1 and 2 and has one more string
satisfying condition 3.

We can see that the strings sT1 , sF    F
                                   2 , s3 block a total of 13 columns, and knowing
that the assignment A satisfies only one clause we have k = 1, therefore it holds
13 ≥ 2mn + k = 13.

Lemma 3. If there exists an alignment S̃ of s0 , . . . , sn+1 that has length M and
such that cs(S̃) ≥ κ, then there exists an assignment A for φ that satisfies at
most k clauses.

Proof. By Lemma 2, we can assume w.l.o.g., that in S̃ for each i = 1, . . . , n,
we have gap(i) ∈ {1, M }. Equivalently, for each i = 1, . . . , n, we have that
s̃i ∈ {sTi , sF
              i }.
     We now create the assignment A by setting for each i = 1, . . . , n,
                                 (
                                  T rue if s̃i = sTi
                            xi =                                            (6)
                                  F alse if s̃i = sF
                                                   i


      By the definition of sF           T
                                 i and si and Proposition 1, we have that strings
s̃1 , . . . , s̃n block exactly 2mn of the positions in the head and the tail of s0 .
Hence, exactly 2mn columns corresponding to the non-blocked positions in the
head and tail of s0 are made of only zeroes. Since cs(S̃) ≥ κ = 2mn + m − k,
                                              Selective Column Score Alignment             9

we have that at least m − k columns corresponding to positions in the core of
s0 must be made of only zeroes. Equivalently, at most k positions in the core of
s0 are blocked by the strings s̃1 , . . . , s̃n .
    By Proposition 1, we have that the position of the jth zero in the core of s0
is blocked by some string s̃i if and only if the truth value of xi in A satisfies Cj .
Since at most k positions of the core of s0 are blocked by some s̃i , it follows that
A satisfies at most k clauses.

    As a result of Lemmas 1 and 3, we have the following.

Theorem 1. The problem Selective Column Alignment is NP-complete
already over an alphabet of size ≥ 2.


3    Variants and Extensions

In this section we extend the NP-completeness of the SCS-Align to show the
NP-completeness of other variants or reformulations of the problem that are
also of interest. We start by the simple observation that the problem remains
NP-complete even if we consider as objective function the column score, rather
than the a-column score with respect to a specific character a.
    Bounded Column Score Alignment (BCS-Align)
    Input:    Strings s1 , . . . , sm over some alphabet Σ, an integer M ≥
              maxi |si |, and a non-negative integer κ.
    Question: Is there an aligment S̃ = s̃1 , . . . , s̃m of s1 , . . . , sm such that the
              length of S̃ is at most M and the column score of S̃ is at least
              κ?
    In fact, due to the presence of the string sn+1 being made of only zeroes as
observed in point 2 of Remark 1, the proof in the previous section also shows
the result in the following corollary. It is enough to observe that sn+1 forces only
columns entirely made of 0 to be accepted.

Corollary 1. The Bounded Column Score Alignment problem is NP-
complete

    We can also reformulate the problem by interpreting the bound on the length
of the alignment as a bound on the number of gaps. In fact, the following for-
mulation is also easily shown to be NP-complete
    gap-Bounded Column Score Alignment (gap-CS-Align)
    Input:    Strings s1 , . . . , sm over some alphabet Σ, an integer bound g on
              the number of allowed gaps per string, and a non-negative integer
              bound κ on the number of uniform columns to be found.
    Question: Is there an aligment S̃ = s̃1 , . . . , s̃m of s̃1 , . . . , s̃m such that for
              each i ∈ [m], s̃i contains at most g gaps, and the column score
              of S̃ is at least κ?
10       A. Caucchiolo and F. Cicalese

Corollary 2. The gap-Bounded Column Score Alignment problem is NP-
complete

Proof. We use exactly the same reduction employed for the NP-completeness of
SCS-Align. We show that the particular case of gap-CS-Align with g = 1
is NP-complete. The simple observation needed is that in any alignment S̃ =
s̃0 , . . . , s̃n+1 of the strings s0 , . . . , sn+1 no gap can be present in s̃0 , (nor in s̃n+1 );
otherwise, at least 2 gaps should be present in each s̃i (i = 1, . . . , n). Hence, we
have that Remark 1 (and in particular point 1.) holds also on the basis of the
bound on the number of gaps.


4    Final Remarks

We proved that computing the best multiple sequence alignment with respect to
the column score is NP-hard even in a very restricted model where: strings are
binary, we only try to optimize the number of columns of the alignment made
of only zeroes, and we bound the number of gaps per row to 1.
    It would be interesting to better understand the relationships of this problem
to parameterized variants of the Longest Common Supersequence (LCS)
problem [4, 5, 2, 6], since LCS is the natural model of column score based multiple
sequence alignment in the absence of any bounds on the number of gaps or
restrictions on the uniform columns. Another natural open problem regards the
limits of approximability of our problems.


References
 1. Berkemer, S.J., Höner zu Siederdissen, C., Stadler, P.F.: Compositional properties
    of alignments. Mathematics in Computer Science (2020)
 2. Blin, G., Bulteau, L., Jiang, M., Tejada, P.J., Vialette, S.: Hardness of longest
    common subsequence for sequences with bounded run-lengths. In: Kärkkäinen,
    J., Stoye, J. (eds.) Combinatorial Pattern Matching. pp. 138–148. Springer Berlin
    Heidelberg, Berlin, Heidelberg (2012)
 3. Carrillo, H., Lipman, D.: The multiple sequence alignment problem in biology.
    SIAM Journal on Applied Mathematics 48(5), 1073–1082 (1988)
 4. Iliopoulos, C.S., Kubica, M., Rahman, M.S., Waleń, T.: Algorithms for comput-
    ing the longest parameterized common subsequence. In: Ma, B., Zhang, K. (eds.)
    Combinatorial Pattern Matching. pp. 265–273. Springer Berlin Heidelberg, Berlin,
    Heidelberg (2007)
 5. Iliopoulos, C.S., Sohel Rahman, M.: Algorithms for computing variants of the
    longest common subsequence problem. Theoretical Computer Science 395(2), 255–
    267 (2008)
 6. Keller, O., Kopelowitz, T., Lewenstein, M.: On the longest common parameterized
    subsequence. Theoretical Computer Science 410(51), 5347–5353 (2009), combina-
    torial Pattern Matching
 7. Kohli, R., Krishnamurti, R., Mirchandani, P.: The minimum satisfiability problem.
    SIAM Journal on Discrete Mathematics 7(2), 275–283 (1994)
                                           Selective Column Score Alignment        11

 8. Li, M., Ma, B., Wang, L.: Finding similar regions in many sequences. Journal of
    Computer and System Sciences 65(1), 73–96 (2002)
 9. Maier, D.: The complexity of some problems on subsequences and supersequences.
    J. ACM 25(2), 322?336 (1978)
10. Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for sim-
    ilarities in the amino acid sequence of two proteins. Journal of Molecular Biology
    48(3), 443–453 (1970)
11. Notredame, C.: Recent progress in multiple sequence alignment: a survey. Phar-
    macogenomics 3(1), 131–144 (January 2002)
12. Thompson, J.D., Plewniak, F., Poch, O.: A comprehensive comparison of multiple
    sequence alignment programs. Nucleic Acids Research 27(13), 2682–2690 (07 1999)
13. Wang, L., Jiang, T.: On the complexity of multiple sequence alignment. Journal
    of Computational Biology 1(4), 337–348 (1994)