<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Maximum Letter-Duplicated Subsequence (short paper)⋆</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Riccardo Dondi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mehdi Hosseinzadeh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexandru Popa</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Università degli Studi di Bergamo</institution>
          ,
          <addr-line>Bergamo</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Bucharest</institution>
          ,
          <country country="RO">Romania</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this contribution we consider Max-LL-DUP, an optimization problem that asks for a letter-duplicated subsequence of an input string that contains the maximum number of letters of the alphabet over which the input string is defined. When each letter has at most four occurrences in the input string, we prove that the problem is APX-hard. Then, we give a linear-time algorithm when each letter has at most three occurrences in the input string.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;String Algorithms</kwd>
        <kwd>LCS</kwd>
        <kwd>APX-hardness</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>and in Section 4, we present a polynomial-time algorithm when each letter has at most three
occurrences. We conclude this contribution with some future directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Definitions</title>
      <p>Given a string , we denote by || its length. We denote by [] the letter of  in position ,
with 1 ≤  ≤ | |. Given two positions  and , with 1 ≤  ≤  ≤ | |, we denote by [, ] the
substring of  between  and . If  = , then [, ] is the letter []. Given a letter  ∈ Σ , we
denote by , with  ≥ 1, a string consisting of  occurrences of . A subsequence ′ of  is
obtained by removing some letters, possibly none, of . Now, we introduce the definition of
letter-duplicated subsequence.</p>
      <p>Definition 1. Given a string  on alphabet Σ , a subsequence ′ of  is letter-duplicated if
′ = 11 . . .  where  ∈ Σ and  ≥ 2, with 1 ≤  ≤ , and  ̸= +1, with 1 ≤  ≤  − 1.</p>
      <p>A letter-duplicated subsequence of  of length two is called a pair. Consider two pairs,
[][], with 1 ≤  &lt;  ≤ | |, and [][ℎ] with 1 ≤  &lt; ℎ ≤ | |, where [] = [] ̸= [] =
[ℎ]. The two pairs are crossing when, assuming  &lt;  it holds that: 1 ≤  &lt;  &lt;  &lt; ℎ ≤ | |
or 1 ≤  &lt;  &lt; ℎ &lt;  ≤ | |. Note that at most one of two crossing pairs can belong to a
letter-dupplicated subsequence of . Now, we define the problem we are interested into.</p>
      <sec id="sec-2-1">
        <title>Problem 1. Max-LL-DUP</title>
        <p>Input: A string  on alphabet Σ .</p>
        <p>Output: A letter-duplicated subsequence ′ that contains the maximum number of letters in Σ .</p>
        <p>Since the objective function of Max-LL-DUP is the number of letters appearing in a solution
and not the length of the subsequence, we assume in the following that each solution of
Max-LLDUP contains at most one pair for each letter in Σ . We denote by -Max-LL-DUP, 1 ≤  ≤ | |,
the restriction of Max-LL-DUP where each letter appears at most  times in the input sequence.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. APX-Hardness of 4-Max-LL-DUP</title>
      <p>In this section, we show that 4-Max-LL-DUP is APX-hard. We prove the result by giving a
reduction from the Maximum Independent Set on cubic graphs (3-MIS). Recall that in a cubic
graph each vertex has degree 3 and that, given a cubic graph  = (, ), 3-MIS asks for a
subset  ′ ⊆  of maximum cardinality such that no two vertices in  ′ are adjacent.</p>
      <p>Given an instance  = (, ) of 3-MIS we define a corresponding instance of 4-Max-LL-DUP
as follows. We start by defining the alphabet Σ and the string . The alphabet Σ is defined as
follows:
Σ =</p>
      <p>{,  :  ∈ , 1 ≤  ≤ |  |} ∪ {, , , : {,  } ∈ , 1 ≤ ,  ≤ |  |}.</p>
      <p>Given a vertex  ∈  , the letters in {, , ,, ,, , ∈ Σ :  ∈  } are called the letters
related to . Now,  consists of diferent substrings. For each , 1 ≤  ≤ |  |,  contains the
following substrings 1(), 2():</p>
      <p>1() =  2() = , , ,,,,.</p>
      <p>For each {,  } ∈ , with 1 ≤  &lt;  ≤ |  |, the following substring (, ) is defined:
(, ) = , ,, ,.</p>
      <p>The input string  is obtained by concatenating first the substrings 1(), 1 ≤  ≤ |  |, in
lexicographic order, then 2(), 1 ≤  ≤ |  |, in lexicographic order, then (,  ), where
{,  } ∈  and 1 ≤  &lt;  ≤ |  |. First, we show that  is indeed an instance of
4-Max-LLDUP.</p>
      <p>Lemma 1. Let  be an instance of 3-MIS and  be a corresponding string built by the reduction,
then each letter of Σ has at most four occurrences in .</p>
      <p>Proof. We show that each letter in Σ has at most four occurrences in . Letter , 1 ≤  ≤ |  |,
has two occurrences in , since it appears only in substring 1(). Letter , 1 ≤  ≤ |  |,
has four occurrences in , since it appears (twice) only in substrings 1(), 2(). Letter , ,
1 ≤ ,  ≤ |  |, has four occurrences in , since it appears only in substrings 2() (twice) and
(, ) (twice).</p>
      <p>Now, we prove that starting from an independent set of , we can compute in polynomial
time a solution of 4-Max-LL-DUP on instance .</p>
      <p>Lemma 2. Let  be an instance of 3-MIS and  be the corresponding instance of 4-Max-LL-DUP.
Given an independent set  ′ of , we can compute in polynomial time a solution of 4-Max-LL-DUP
on instance  that contains 5| ′| + 4| ∖  ′| pairs.</p>
      <p>Proof. Given an independent set  ′ of , define a solution ′ of 4-Max-LL-DUP as follows:
- For each  ∈  ′, add to ′ pair  in 1(), and  in 2(); moreover, for each  (assume
w.l.o.g  &lt; ), add to ′ the pair , , , , in (, ). Hence five pairs of letters related to  are
in ′ (recall that  is a cubic graph).
- For each  ∈  ∖  ′, add to ′ the pair  in 1(), and the pairs , , , , ,, , ,, ,
in 2(), where {,  }, {, ℎ}, {, } are the edges of  incident in . Hence in this case
four pairs of letters related to  are in ′.
′ contains 5| ′| + 4| ∖  ′| pairs of letters, thus concluding the proof.</p>
      <p>Lemma 3. Let  be an instance of 3-MIS and  be a corresponding instance of 4-Max-LL-DUP.
Given a solution of 4-Max-LL-DUP on instance  that contains 5 + 4(| | − ) pairs, we can
compute in polynomial time an independent set  ′ of  with | ′| ≥ .</p>
      <p>Proof. Consider a solution ′ of 4-Max-LL-DUP that contains 5 + 4(| | − ) pairs. First, we
notice that there exists a set  ′ ⊆  of vertices in  such that | ′| ≥  and such that, for each
 ∈  ′, ′ contains all the five pairs of letters related to , that is ,  and for each , with
{,  } ∈  (assume w.l.o.g  &lt; ), , , , . Indeed, assume aiming at a contradiction that this
is not the case. Since an instance of 4-Max-LL-DUP contains 5 pairs of letters related to each
vertex  ∈  , then ′ can contain at most 5( − 1) + 4(| | −  + 1) &lt; 5 + 4(| | − ) pairs.
Consider now  ∈  ′. Then ′ contains 5 pairs of letters related to . We claim that for each
{,  } ∈  (we assume w.l.o.g.  &lt; ), ′ contains one pair , , in each substring (, ).
Assume this is not the case, then ′ must contain a pair , , in a substring of  diferent from
(, ), that is in 2(). Thus ′ cannot contain pair  in 2(), since it is crossing with
pair , , . Since ′ can contain at most one of the crossing pairs ,  of 1(), it follows
that ′ contains at most 4 pairs of letters related to , which contradicts the assumption that
 ∈  ′.</p>
      <p>Consider two vertices ,  ∈  ′, with 1 ≤  &lt;  ≤ |  |. Since ′ contains 5 pairs of letters
related to each of , , then for  ∈ {, } ′ must contain a pair , , in each (, ). Then
{, } ∈/ , otherwise the pairs ,, and ,, would be crossing in (, ) and could
not be both in ′. Thus no edge {, } ∈  and  ′ is an independent set of  of size at least
.</p>
      <sec id="sec-3-1">
        <title>We can conclude with the main result of this section.</title>
        <p>Theorem 1. 4-Max-LL-DUP is APX-hard.</p>
        <p>Proof. Due to the results of Lemma 2 and in Lemma 3, we have described an -reduction from
3-MIS to 4-Max-LL-DUP (see [5] for details on L-reductions). Indeed, an optimal solution of
3-MIS, denoted by OPT(3-MIS) contains at least |4 | vertices (a greedy algorithm that selects a
vertex , adds it to the independent set and remove  and its neighbors, computes an
independent set of size at least |4 | ). Hence, denoted by OPT(4-Max-LL-DUP) an optimal solution of
4-Max-LL-DUP, by Lemma 2 there exists a constant  such that:</p>
        <p>(4-Max-LL-DUP) ≤    (3-MIS).</p>
        <p>Moreover, given an approximated solution of 4-Max-LL-DUP on instance  of value
APX(4-Max-LL-DUP) by Lemma 3 we can compute in polynomial time an approximated
solution of 3-MIS on instance  containing APX(3-MIS) vertices, such that there exists a constant 
and it holds that</p>
        <p>(3-MIS) −  (3-MIS) ≤  (  (4-Max-LL-DUP) −  (4-Max-LL-DUP)).
It follows that have designed an L-reduction from 3-MIS to 4-Max-LL-DUP. Since 3-MIS is
APX-hard [6], then also 4-Max-LL-DUP is APX-hard.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. A Linear-Time Algorithm for 3-Max-LL-DUP</title>
      <p>We now show that 3-Max-LL-DUP is solvable in linear time. We present how to compute the
value of an optimal solution of 3-Max-LL-DUP, the algorithm can be easily adapted to compute
the actual solution.</p>
      <p>Consider a function [], with 0 ≤  ≤ | |, that represents the length of an optimal solution
of 3-Max-LL-DUP on instance [1, ]. Function [] can be computed with the following
recurrence. For  ∈ 0, 1, [] = 0; for  ≥ 2, we have that:</p>
      <p>⎧ [ − 1]
[] = max ⎨ [ − 1] + 1
⎩
where [] =  and  is the rightmost
occurrence of  in [1,  − 1].
(1)</p>
      <sec id="sec-4-1">
        <title>We start by considering the correctness of the recurrence.</title>
        <p>Lemma 4. There exists a solution of 3-Max-LL-DUP on [1, ] consisting of ℎ pairs if and only if
[] = ℎ.</p>
        <p>
          Proof. We prove the lemma by induction. In the base case, for  = 0, 1, there is no pair in [
          <xref ref-type="bibr" rid="ref1 ref1">1, 1</xref>
          ]
and by definition, [0] = 0 and [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] = 0. Assume that the lemma holds for a string [1, ],
with  &lt; , we show that it holds for . Consider a solution ′ of 3-Max-LL-DUP on instance
[1, ] of size ℎ. If ′ does not include [], then ′ is a solution of 3-Max-LL-DUP on instance
[1,  − 1] of size ℎ and by induction hypothesis [ − 1] ≥ ℎ, thus [] ≥ ℎ. Assume that ′
includes [], then it must include also another occurrence of  = [] in [1,  − 1] we can
assume that it is []. Thus ′ contains ℎ − 1 pairs of [1,  − 1], thus by induction hypothesis
[ − 1] ≥ ℎ − 1 and, by Recurrence 1, [] ≥ ℎ. Note indeed that, since  contains at most
three occurrences of each letter, then no solution of 3-Max-LL-DUP on instance [1,  − 1]
contains .
        </p>
        <p>Assume that [] = ℎ, as computed by Recurrence 1. If [] = [ − 1] = ℎ, then by
induction hypothesis there exists a solution ′ of 3-Max-LL-DUP on instance [1,  − 1], hence
also on [1, ], of size at least ℎ. Assume that [] = [ − 1] + 1, then [ − 1] = ℎ − 1.
It follows by induction hypothesis that there exists a solution ′′ of 3-Max-LL-DUP on instance
[1,  − 1] of size at least ℎ − 1. By adding the pair ([], []) to ′′ we obtain a solution of
3-Max-LL-DUP on instance [1, ] of size at least ℎ.</p>
        <p>Next, we show that  can be computed in linear time. Notice that  contains a linear
number of entries. Each [] can be updated in constant time as described in the following.
First, assume that we have computed in linear time an array  of size |Σ | and indexed by Σ ,
that for each  ∈ Σ stores the positions of the two leftmost occurrences of  (recall that each
letter has at most three occurrences in ; note that we can assume that each letter of Σ has at
least two occurrences in , otherwise we can delete it from ). Array  can be computed in
linear time, scanning  from left to right. Using array , given  = [], we can compute in
constant time the value of , since it is the maximum value smaller than  contained in the
entry of  associated with . Then [] can be computed in constant time, since we have to
consider two values, namely [ − 1], that can be computed in constant time, and [ − 1],
that can be computed in constant time using array . Finally, the length of an optimal solution
of 3-Max-LL-DUP on instance  is stored in [||].</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>We have introduced Max-LL-DUP, a problem that asks for a letter-duplicated subsequence of
an input string that contains the maximum number of letters of the alphabet. An interesting
future direction is to further study the approximation complexity of the problem, in particular,
the design of constant-factor approximation algorithms.
Mol. Biol. 16 (2021) 11. URL: https://doi.org/10.1186/s13015-021-00191-8. doi:10.1186/
S13015-021-00191-8.
[2] R. Dondi, F. Sikora, The longest run subsequence problem: Further complexity results,
in: P. Gawrychowski, T. Starikovskaya (Eds.), 32nd Annual Symposium on Combinatorial
Pattern Matching, CPM 2021, July 5-7, 2021, Wrocław, Poland, volume 191 of LIPIcs, Schloss
Dagstuhl - Leibniz-Zentrum für Informatik, 2021, pp. 14:1–14:15. URL: https://doi.org/10.
4230/LIPIcs.CPM.2021.14. doi:10.4230/LIPICS.CPM.2021.14.
[3] W. Lai, A. Liyanage, B. Zhu, P. Zou, Beyond the longest letter-duplicated subsequence
problem, in: H. Bannai, J. Holub (Eds.), 33rd Annual Symposium on Combinatorial Pattern
Matching, CPM 2022, June 27-29, 2022, Prague, Czech Republic, volume 223 of LIPIcs, Schloss
Dagstuhl - Leibniz-Zentrum für Informatik, 2022, pp. 7:1–7:12. doi:10.4230/LIPIcs.CPM.
2022.7.
[4] M. Lafond, W. Lai, A. Liyanage, B. Zhu, The longest subsequence-repeated subsequence
problem, in: W. Wu, J. Guo (Eds.), Combinatorial Optimization and Applications - 17th
International Conference, COCOA 2023, Hawaii, HI, USA, December 15-17, 2023, Proceedings,
Part I, volume 14461 of Lecture Notes in Computer Science, Springer, 2023, pp. 446–458. URL:
https://doi.org/10.1007/978-3-031-49611-0_32. doi:10.1007/978-3-031-49611-0\_32.
[5] D. P. Williamson, D. B. Shmoys, The Design of Approximation Algorithms, Cambridge
University Press, 2011. URL: http://www.cambridge.org/de/knowledge/isbn/item5759340/
?site_locale=de_DE.
[6] P. Alimonti, V. Kann, Some apx-completeness results for cubic graphs, Theor. Comput.</p>
      <p>Sci. 237 (2000) 123–134. URL: https://doi.org/10.1016/S0304-3975(98)00158-3. doi:10.1016/
S0304-3975(98)00158-3.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Schrinner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Goel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wulfert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Spohr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Schneeberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. W.</given-names>
            <surname>Klau</surname>
          </string-name>
          ,
          <article-title>Using the longest run subsequence problem within homology-based scafolding</article-title>
          , Algorithms
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>