<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An approach to analysis of the similarity ofDNA-sequences</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>B F Melnikov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A Trenina</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Russian State Social University</institution>
          ,
          <addr-line>Wilhelm Pieck str. 4, Moscow, Russia, 129226</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Togliatti State University</institution>
          ,
          <addr-line>Belorusskaya str. 14, Togliatti, Russia, 445020</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>63</fpage>
      <lpage>72</lpage>
      <abstract>
        <p>This paper describes algorithms, corresponding computer programs and the results of computations, supplementing results published earlier. We consider the multiplesequence alignment problem, which can be nominated by a central problem in computatiobniaollogy. For it, we continue to consider some di erent versions of so-called “triangular norm” defined on the set of triangles formed by the dierent distance between genomes computed by di erent algorithms. Besides, one of the problems considered in biocybernetics is the problem roefconstructing the distance matrix between DNA sequences, when not all the elements of the matrix under consideration are known at the input of thealgorithm. In this connection, the problem arises that the developed method of comparative evaluation of algorithms for calculating the distances between sequences should be used for another problem, i.e., for reconstructing the matrix of distances between DNA sequences. In this paper, we consider the possibility of applying themethod of comparative evaluation of the algorithms for calculating the distances betweena pair of DNA strings that we developed and studied earlier for the reconstruction ofa partially filled distance matrix. The restoration of the matrix occurs as a result of several computationapl asses. Estimates of unknown matrix elements are averaged in a special way using so-called fruisnkctions, and the result ofthis averaging is considered as the received value of the unknown element.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction and motivation</title>
      <p>
        In this paper, we describe the continuation of research on the issues that we started in
[
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5">1, 2, 3, 4, 5</xref>
        ]. We describe algorithms, corresponding computer programs and the results
of computations, supplementing results published earlier. We consider the multiple sequence
alignment problem, which can be nominated by a central problem in computational biology. For
it, we continue to consider some di erent versions of so-called \triangular norm". (The name
\triangular metric", also sometimes encountered in our previous papers, is also quite possible
and does not contain errors, we will not give detailed comments on this thing. However, the
\norm" in our case is some more correct, both when we speak about the whole matrix, and when
we speak only about the only triangle.) Such norm de ned on the set of triangles formed by the
di erent distance between genomes computed by di erent algorithms. Thus, in this paper the
word \metric" will be used only as the distance between the genomes, and the word \norm" as
an indicator of the badness of a certain set of such distances.
      </p>
      <p>In previous cited papers, we described Panin's metric and some algorithms for its
improvement. In this paper, we consider some variants of investigation of various variants
of the triangular norm. Let us remark, that both these directions are interrelated. Namely,
we improve the interpretation of the Panin's metric in the same way as it is done in some
interpretations of genetic algorithms: we are trying to achieve a combination of parameters, for
which the triangular norm reaches a minimum value (or is close to it).</p>
      <p>Considering this second problem, i.e., the study of variants of the triangular norm, we proceed
as follows. We consider incorrect variants of obtaining the triangle inequality, which for matrices
of the order of about 50 50 is violated in the two most successful metrics (including the
Panin's metric earlier developed by us) in less than 1% of cases. It is important to note that
in order to improve the decrease in the quantitative index of the badness of the entire matrix
of considerations, we also, rst of all, consider triangles in which the triangle inequality is not
violated, but in which the badness value is relatively large. Possible improvements are related to
the use of neural networks that were not used by us in previous calculations. In this case, neural
networks solve the inverse problem: we improve (reduce) the overall badness of the matrix of
distances between genomes, forcibly changing the previously obtained distances; further, we try
to re ect these forced changes in the original algorithms for calculating distances.</p>
      <p>
        Thus, to determine the distance between genomes, we need heuristic algorithms; and, if
possible, they should not require too much time. There are various such algorithms, but
their obvious disadvantage is getting a few di ering results when using di erent heuristic
algorithms applied to the distance calculation between the same pair of DNA strings. Therefore,
the problem of quality evaluation of the metrics used (distances) arises; and based on the
results obtained in solving this problem, one can draw conclusions about the applicability of
a particular algorithm for calculating distances to various applied studies. A possible approach
to determining the quality of metrics was given in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]; also partially we consider these approaches
below in this paper.
      </p>
      <p>
        However, we used the same approach for a completely di erent problem; it is as follows.
Even heuristic algorithms require often large time-consuming costs: for example, to construct a
matrix of the order of 50 50 into which the distances computed by the Needleman { Wunsch
algorithm ([
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] etc.) are recorded, it takes about 28 hours (at a processor clock frequency of
about 2 GHz, see [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]). Therefore, one of the problems considered in the biocybernetics is the
problem of restoration of the distance matrix between DNA sequences (below we simply shall
write \DNA matrix"), in which not all elements of the matrix under consideration are known at
the input of the algorithm, see [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ], compare also [9, 1I0n].connection with all this, another
problem arises: to use the developed method of comparative evaluation of algorithms for
calculating distances between sequences for a completely di erent purpose, namely, for the brie y
described problems of reconstructing the distance matrix between DNA sequences. For this
problem, in this article we consider the application of the previously developed method of
comparative evaluation ofdistance calculation algorithms between a pair of DNA strings.
      </p>
      <p>
        Using this approach (i.e., using the method of comparative evaluation of algorithms for
calculating distances to matrix reconstruction), the reconstruction itself occurs as a result of
several computational passes. On each of the passes, for some of the as-yet-un lled (unknown)
elements of the matrix, di erent estimates are obtained; these estimates are averaged in a
special way-and the result of averaging is taken as the value of the unknown element. From the
physical point of view, the applied averaging gives the position of the center of gravity of a
onedimensional system of bodies whose mass is speci ed by a special function, i.e., the risk function,
see [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ]. We note that earlier we used risk functions in completely di erent subject areas; these
areas were always connected with auxiliary algorithms related to multicriteria optimization.
      </p>
      <p>Below, the matrices to be reconstructed are referred to as incompletely lled with distance
matrices. We enter this term for the matrix, from which a number of elements are \crossed out".</p>
    </sec>
    <sec id="sec-2">
      <title>2. Preliminaries</title>
      <p>
        Thus, like our previous papers, we consider the square symmetric matrix of distances between
genomes. There it is necessary to note the following. First, the genomes we choose random enough
and took them o the site [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Second, like previous works, we consider in fact three variants for
each of the considered problems:
for very distant species, including, for example, a mammal\Bison bison" and a reptilia
\Apalone spinifera" (we use the o cial scienti c Latin names), see detailed the species' list
by the link for direct download [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ];
for a su ciently close species (human and apes);
and also for human races (in fact, they can be considered as subspecies of a biological
species).
      </p>
      <p>Let us note, looking ahead, we believe that our a theoretical construction is best applicable for
more distant species, however, acceptable results are obtained also in two other cases.</p>
      <p>
        Thus, we look at algorithms of comparing the quality of di erent algorithms that calculate
the distance between two genomes. Apparently, all these algorithms are based on the use of
various versions of the Levenshtein distance (or Levenshtein metric), see [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and very many
other following papers. It is very important to note, that for the simplest formulation of the
problem (to make the strict calculating the value of Levenshtein metric for two given genomes),
we unfortunately obtain a very long-running algorithm (program): it has to do with the actual
length of the strings of genomes. Therefore, in each of the actual algorithms (see [
        <xref ref-type="bibr" rid="ref16 ref17 ref18 ref19">16, 17, 18, 19</xref>
        ]
etc.), computation of distances between genomes in reality is a heuristic extension of the exact
algorithm for calculating the Levenshtein distance. And apparently, the approach closest to
our one is given by [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]; let us note, a little running forward, that it also uses a version of the
branch-and-bound method.
      </p>
      <p>
        Thus, the considered in our previous papers Panin's metric is no exclusion, It also is a heuristic
algorithm for calculating the close version of such metric; it is an optimization problems, see [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]
etc. However, we used in it a special approach (so called multi-heuristic approach for discrete
optimization problems), and for it, we use the same heuristic as in very di erent problems. From
many such problems, let us mention two ones only:
the classical traveling salesman problem (however, we consider our own approach to this
problem, and, most importantly, our original way of specifying the input data, di erent from
the traditional geometric placement, see, e.g., [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]);
and the problem of state-minimization for nondeterministic nite automata, see, e.g.,[
        <xref ref-type="bibr" rid="ref23 ref24">23, 24</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. The triangular norms: their study and possible attempts of improvement</title>
      <p>
        It was justi ed in our previous works cited above, there is desirable that in the matrix of distances
between genomes, any of the resulting triangles be close to an acute angled isosceles one with
two angles exceeding 60 degrees. Several various empirically selected numerical characteristics
describing such di erences are also given in our works, see [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] etc. However, in the previous
articles we did not consider detailed examples, let us consider them in this section.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], the results of calculations for several metrics and several norms are presented. In this
section, we shall consider the Panin's metric only, and 3 norms (\badnesses" for the triangle
under consideration). Thus, for each triangle with the sides a b c and the angles
we considered the following norms:
bad1 = (
)= ;
bad2 = (
)= ;
bad3 = (a
b)=a :
In case if 90o, we have considered each norm by the maximum possible (1:0) or even usually
exceeding this value. We assigned an even greater value to the value of badness in the case when
the three considered sides do not form a triangle at all (that is, they do not satisfy the triangle
inequality); let us note, running ahead, that similar situations is happened for any of metrics
considered by us.
      </p>
      <p>The resulting value of the norm of the whole matrix was considered as the arithmetic mean of
the norms of all triangles. We note that for the matrices of distances between genomes (usually
from 30 30 to 50 50), the number of triangles is
30 29 28
2 3
= 4060
for dimensions 30, and 19600 for dimension 50; from these values, it is clear that the calculations
we need are quite di cult.</p>
      <p>
        Thus, let us consider the part of the table given on the page titled \Panin's metrics" of [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]
(they are designed as an xlsx- le and are available for the direct download), see the table on Fig.
1. (The names of the considered species can be found there on the page titled \Types ofanimals".)
      </p>
      <p>
        Let us choose species with the numbers 5, 15, 25; while doing so, we speci cally chose exactly the
three of those considered, where the metric gives rather poor results (that means the following:
the badnesses of Panin's metric give relatively worse results than other metrics comparing most
other triangles of the matrix under consideration). For all 5 considered metrics, we obtain the
following 5 triangles corresponding to the species with the numbers 5, 15, 25:
1) sides 3533, 3142, and 2940 (Panin's metric);
2) sides 1215, 1179, and 704 (van der Loo's 1st metric);
3) sides 4379, 4036, and 4029 (van der Loo's 2nd metric);
4) sides 2943, 2674, and 2492 (Pages' 1st metric);
5) sides 3444, 3046, and 2838 (Pages' 2nd metric)
(here, the numbers correspond to numbers of metrics of [
        <xref ref-type="bibr" rid="ref14 ref4">4, 14</xref>
        ]). Let us consider these 5 triangles
and also 3 other ones, see Fig. 2:
      </p>
      <p>Let us note once again, that we only need the relative lengths of the sides of the triangle.</p>
      <p>The further calculations yield the following auxiliary values and values of badnesses, see
Table 1.</p>
      <p>No. a
1 3533
2 1215
3 4379
4 2943
5 3444
6 5
7 7
8 10</p>
      <p>
        We can see, that three bad orderings for all ve triangles give the same sequence of metrics.
Once again, we mention (above, this thing has already been said, but in connection with other
facts) that this ordering di ers signi cantly from the ordering of the badness considered for
complete matrices, see the results of the calculations below and, in more detail, in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. However,
in all our calculations (in any case, with the exception of less than 1% of all triangles, i.e.,
including ordering for complete matrices), such ordering turns out to be the same for all three
norms (badnesses).
      </p>
      <p>
        Another option for investigating the comparative characteristics of norms is the following one
(we will continue to consider 30 species, and the matrix of distance between genomes, given in
[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]). We choose two metrics and for any one xed norm we arrange 4060 triangles in order of
increasing values of this norm. At the same time, when reading that both these norms are good
(that is, they give acceptable results), we should ideally obtain identical sequences of triangles.
Actually, one of these two sequences of triangles is obtained from the other by some sequence
transpositions of neighboring elements. Since, as we have noted, the number of triangles in our
case is 4060, then the maximum possible number of transpositions of neighboring elements is
equal to
4060 4059
      </p>
      <p>= 8 239 770:
2
Let us give concrete results of work (Table 2) for the rst norm (value \bad1") only.</p>
      <p>Let us give some comments to this table. The pair of metrics is selected in the rst line.
The number of transpositions of neighboring elements (for obtaining the monotone sequence)
is given in the second line. The percent of the maximum possible number of transpositions
of neighboring elements (8 239 770) is given in the third line. We do not use Spearman's rank
correlation coe cient (and some others correlation coe cients used in similar problems); instead
of them, we use the linear function having value 1 for 0 transpositions and value 1 for 8 239 770
transpositions.</p>
      <p>Still, we note that the two other norms give somewhat worse results; but for them, the number
of transpositions does not exceed 600 000 (i.e., less than 7:5%), and, therefore, calculated by our
method rank correlation coe cient is more than 0:85.</p>
      <p>Thus, all three norms almost always give similar results. Therefore we often use the singular
for the word \norm": \some di erent versions of so-called triangular norm" etc.</p>
    </sec>
    <sec id="sec-4">
      <title>4. About one method of DNA matrix reconstruction</title>
      <p>In this section, one of the methods of comparative analysis of various algorithms for calculating
distances between DNA sequences is presented, and on the basis of this, a method for
reconstructing an incompletely lled matrix is developed. In order to carry out this comparative
analysis, we propose for the resulting algorithm to calculate the distances between genomes,
consider all possible triangles, because ideally they should be acute-angled isosceles.</p>
      <p>
        To answer the question of how \correct" is the matrix obtained as a result of some heuristic
algorithm, we propose to use the \characteristic of the departure" of the obtained triangles
from \elongated isosceles" triangles; i.e., the \badness", below we shall write this term without
quotes. In this case, as one of the variants of the badness, the following formula can be used:
=
(1)
where , and are the angles, and we admit, that [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In the opinion of the
authors of this paper, this formula best characterizes the requirements described by us. Here is
the informal explanation for this: The closer the triangle to the isosceles triangle, the less the
di erence between and , and in the ideal case, the numerator is 0; in accordance with our
assumptions, an obtuse (or rectangular) isosceles triangle cannot be obtained. The performance
of the properties of acute angles increases the denominator. Consequently, the approximation
of a triangle to an isosceles triangle reduces the numerator and increases the denominator in the
formula, i.e., tends to 0.
      </p>
      <p>When calculating the badness of the entire matrix for each recovery option, we can:
either summing the corresponding badness over all possible triangles of the matrices in
question;</p>
      <p>or take the maximum badness for these triangles.</p>
      <p>In the future, we propose to consider other approaches to calculating the badness of the entire
matrix.</p>
      <p>To determine the unknown element, we consider all the possible triangles formed from
elements of this matrix for which one of the sides is unknown. For each such triangle, from
the condition that it is isosceles acute-angled, we get one of the possible values of this unknown
side. Next, we calculate the nal value of this side (an unknown element) in a special way.
Namely, for its calculation, on the basis of all the estimates obtained, the element is assumed to
be equal to the arithmetic mean of all the values obtained. As an alternative, we can exclude
the largest and smallest of the values obtained.</p>
      <p>With a large number of missing elements, the matrix of the triangles with two known sides
will be small, so the restoration of the matrix in one pass is usually impossible. When restoring
the matrix on the second and subsequent passages, we can either use only the elements of the
matrix of the last pass, or use all the matrices obtained in the previous passes. In the second
case, with each successive passage in the matrix, there are more and more elements calculated
approximately. Therefore, when evaluating an unknown element, it is possible to use the analog
of the risk function, which will adjust the weight of the elements depending on the pass number.</p>
      <p>When using the so-called static risk function, the weight of the elements with each pass
decreases with the same coe cient, and to estimate the unknown element of the matrix, formula
E =
c0E0 + c1E1 + : : : + ckEk
c0 + c1 + : : : + ck
;</p>
      <sec id="sec-4-1">
        <title>Ei is the value of the matrix element, calculated on i-th pass;</title>
        <p>c0 ; : : : ; ck are some specially chosen coe cients.</p>
        <p>
          In practice (see [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]), good results are achieved when the following formulas are used for the
coe cients:
        </p>
        <p>
          By [
          <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
          ], the risk function can be also dynamic: when using it, we take averaging, depending
on the \rough estimate" of the nal value: whether it is \good", \middle" or \bad". Besides, we
can also consider the sequence of such dynamic risk functions, where at each stage we rely on the
value obtained in the previous step as such a \rough estimate". In our case, to evaluate the
unknown element of the matrix of distances between DNA strings, the formula
is used; f (x) is some specially chosen decreasing function.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. The formal description of the algorithm</title>
      <p>Let us consider the detailed formal description of the algorithm brie y considered before.</p>
      <sec id="sec-5-1">
        <title>Algorithm 1 (Restoring a matrix using a static risk function)</title>
        <p>Input: Incompletely de ned matrix A = aij (all elements equal to zero outside the main diagonal
are assumed to be unknown).
Used auxiliary variables: bi is the array of unknown element estimates.</p>
        <sec id="sec-5-1-1">
          <title>Description of the algorithm.</title>
          <p>Step 1: We set s := 1 (the number of the pass).</p>
          <p>Step 2: We count h, i.e. the number of elements of the upper triangle, which are equal to 0.
Step 3:
if aij = 0 and i 6= j then
begin
kol := 0 fWe count the number of triangles,</p>
          <p>built on the unknown element under considerationg
for k := 0 to n do begin
if k 6= i and k 6= j and aki 6= 0 and akj 6= 0 then
begin
kol := kol + 1; c0 := 1; cs := cs 1 p;</p>
          <p>Eki :=
Ekj :=
c0Ek0i + : : : + csEksi ;</p>
          <p>c0 + : : : + cs
c0Ek0j + : : : + csEksj ;</p>
          <p>c0 + : : : + cs
end;</p>
          <p>if Eki &gt; Ekj then bkol := Eki else bkol := Ekj
end;
end;
aij :=
b1 + : : : + bkol :</p>
          <p>kol
Step 4: We count h1, i.e., the number of elements of the upper triangle, which are equal to 0
after the next pass.</p>
        </sec>
        <sec id="sec-5-1-2">
          <title>Step 5:</title>
          <p>if h1 = 0 then goto Output 1;
if h1 = h then goto Output 2;
s := s + 1;
goto Step 2.</p>
          <p>Output 1: Filled matrix A.</p>
          <p>Output 2: Matrix A cannot be lled.</p>
        </sec>
        <sec id="sec-5-1-3">
          <title>End of description of the algorithm.</title>
          <p>After execution of the algorithm for performing a comparative analysis of the results of the
reconstruction of the matrix, we use such an indicator as the residual; it characterizes the
deviation of the resulting matrix from the original one. We calculate the residual on the basis
of the natural metric
d =
qPn 1 Pn
i=1 j=i+1(aij
n(n
1)=2
aij )2
f
;
aij are elements of the matrix obtained as a result of applying some algorithm for calculating
f
the distances between a pair of genomes (in our case, the Needleman { Wunsch algorithm);
aij are elements of the matrix restored as a result of the above algorithm.</p>
          <p>
            Some examples of operation of the algorithm and corresponding values of residual are given
in [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ].
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>Thus, a very small change of the given matrix greatly reduces its badness. The values marked
in red are chosen by us manually. However, at the present time we have a neural network that
implements such an algorithm; we are going to describe this neural network in the following
publication. But the following is much more important: these \red" changes give input
information to another neural network, the one that computes the constants for the metric.
Thus, along with one \loop" of algorithms already described in previous section, we get one
more. Table 3 above already takes into account similar improvements in the metric.</p>
      <p>
        As we said before, the received results of our computer programs are designed as an xlsx- le
and are available for the direct download by [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. The whole article i s actually a comment to this
le.
      </p>
      <p>We also note that di erent \places" of di erent metrics for di erent cases (i.e., the case of
distant animal species, the case of close animal species and the case of subspecies) talk about
the need to continue research in this direction.</p>
      <p>Also in the near future we expect to develop other approaches for comparative analysis of
various algorithms for calculating distances between sequences, as well as describe the algorithms
for reconstructing matrices based on these approaches. At present, we are working on comparing
two of such algorithms, both for application in the \normal" problems of DNA analysis, and in
the problems of DNA matrix reconstruction close to those considered in the present paper.
Acknowledgements
The authors of the article express their gratitude to Vladislav Dudnikov (Togliatti State University,
Russia) for his help in preparing this paper. The reported study was partially supported by RFBR
according to the research project No. 16-47-630829.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Melnikov</surname>
            <given-names>B</given-names>
          </string-name>
          and
          <string-name>
            <surname>Panin</surname>
            <given-names>A 2012</given-names>
          </string-name>
          <string-name>
            <surname>On</surname>
          </string-name>
          <article-title>a parallel implementation of the multi-heuristic approach in the problem of comparison of genetic sequences</article-title>
          <source>Vektor Nauki of Togliatti State University</source>
          <volume>4</volume>
          (
          <issue>22</issue>
          )
          <fpage>83</fpage>
          -
          <lpage>86</lpage>
          (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Makarkin</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Melnikov</surname>
            <given-names>B</given-names>
          </string-name>
          and
          <article-title>Panin A 2013 A parallel implementation of the multi-heuristic approach in the task of comparing genetic sequences</article-title>
          <source>Applied Mathematics</source>
          <volume>4</volume>
          (
          <issue>10</issue>
          )
          <fpage>35</fpage>
          -
          <lpage>39</lpage>
          DOI: 10.4236/ am.
          <year>2013</year>
          .410A1006
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Melnikov</surname>
            <given-names>B</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pivneva</surname>
            <given-names>S</given-names>
          </string-name>
          and
          <string-name>
            <surname>Trifonov</surname>
            <given-names>M 2017</given-names>
          </string-name>
          <article-title>Multiheuristic approach to compare the quality of defined metrics on the set of DNA sequences Modern Information Technologies and</article-title>
          IT Education 13(
          <issue>2</issue>
          )
          <fpage>89</fpage>
          -
          <lpage>96</lpage>
          (in Russian) DOI:
          <fpage>10</fpage>
          .25559/SITITO.
          <year>2017</year>
          .
          <volume>2</volume>
          .
          <fpage>235</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Melnikov</surname>
            <given-names>B</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pivneva</surname>
            <given-names>S</given-names>
          </string-name>
          and
          <string-name>
            <surname>Trifonov</surname>
            <given-names>M</given-names>
          </string-name>
          <article-title>2017Various algorithms, calculating distances of DNA sequences, and some computational recommendationsfor use such algorithms</article-title>
          CEUR Workshop Proceedings 1902
          <fpage>43</fpage>
          -
          <lpage>50</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Melnikov</surname>
            <given-names>B</given-names>
          </string-name>
          and
          <string-name>
            <surname>Trenina M 2018 On</surname>
          </string-name>
          <article-title>a problem of the reconstruction of distance matrices between DNA sequences</article-title>
          <source>International Journal of Open Information Technologies</source>
          <volume>6</volume>
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Needleman</surname>
            <given-names>S</given-names>
          </string-name>
          and
          <article-title>Wunsch Ch 1970 A generalmethod applicable to the search for similarities in the amino acid sequence of two proteins</article-title>
          <source>Journal of Molecular Biology</source>
          <volume>48</volume>
          (
          <issue>3</issue>
          )
          <fpage>443</fpage>
          -
          <lpage>453</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Eckes</surname>
            <given-names>B</given-names>
          </string-name>
          ,
          <article-title>NischtR</article-title>
          and Krieg T 2010 Cell
          <article-title>-matrix interactions in dermal repair</article-title>
          and
          <source>scarring Fibrogenesis Tissue Repair</source>
          <volume>3</volume>
          (
          <issue>4</issue>
          )
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          DOI: 10.1186/
          <fpage>1755</fpage>
          -1536-3-4
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Midwood</surname>
            <given-names>K S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williams L Vand Schwarzbauer J E 2004</surname>
          </string-name>
          <article-title>Tissue repair and the dynamics of the extracellular matrix</article-title>
          <source>International Journal of Biochemist&amp;ry Cell Biology</source>
          <volume>36</volume>
          (
          <issue>6</issue>
          )
          <fpage>1031</fpage>
          -
          <lpage>1037</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Evdokimova</surname>
            <given-names>N I</given-names>
          </string-name>
          and
          <string-name>
            <surname>Kuznetsov</surname>
            <given-names>A V</given-names>
          </string-name>
          <year>2017</year>
          <article-title>Local patterns in the copy-move detection problem solution</article-title>
          <source>Computer Optics</source>
          <volume>41</volume>
          (
          <issue>1</issue>
          )
          <fpage>79</fpage>
          -
          <lpage>87</lpage>
          DOI: 10.18287/
          <fpage>2412</fpage>
          -6179-2017-41-1-
          <fpage>79</fpage>
          -87
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Evsutin</surname>
            <given-names>O O</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shelupanov</surname>
            <given-names>A A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meshcheryakov</surname>
            <given-names>R V</given-names>
          </string-name>
          and
          <string-name>
            <surname>Bondarenko</surname>
            <given-names>D O</given-names>
          </string-name>
          <year>2017</year>
          <article-title>An algorithm for information embedding into compressed digital images based on replacement procedures with use of optimization</article-title>
          <source>Computer Optics</source>
          <volume>41</volume>
          (
          <issue>3</issue>
          )
          <fpage>412</fpage>
          -
          <lpage>421</lpage>
          DOI: 10.18287/
          <fpage>2412</fpage>
          -6179-2017-41-3-
          <fpage>412</fpage>
          -421
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Melnikov</surname>
            <given-names>B</given-names>
          </string-name>
          and
          <article-title>Radionov A 1998 On the choice of strategy in nondeterministic antagonistic games Programming</article-title>
          and
          <source>Computer Software</source>
          <volume>5</volume>
          <fpage>55</fpage>
          -
          <lpage>62</lpage>
          (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Melnikov</surname>
            <given-names>B 2001</given-names>
          </string-name>
          <article-title>Heuristics in programming of nondeterministic games Programming</article-title>
          and
          <source>Computer Software</source>
          <volume>5</volume>
          <fpage>277</fpage>
          -
          <lpage>288</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Nucleotide</surname>
          </string-name>
          (
          <article-title>The Nucleotide database) (Access mode</article-title>
          : http://www.ncbi.nlm.nih.gov/nuccore)
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Melnikov B</surname>
          </string-name>
          <article-title>The processed results of the computer calculations (Access mode: http://bormel</article-title>
          .ru/ BorMel-DNA.xlsx)
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Levenshtein</surname>
            <given-names>V 1966</given-names>
          </string-name>
          <article-title>Binary codes capable of correcting deletions, insertions</article-title>
          ,
          <source>and reversals Soviet Physics Doklady10</source>
          (
          <volume>8</volume>
          )
          <fpage>707</fpage>
          -
          <lpage>710</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Winkler</surname>
            <given-names>W 1990</given-names>
          </string-name>
          <article-title>String comparator metrics and enhanced decision rules in the Fellegi-Sunter model of record linkage Proceedings of the survey research methods sections</article-title>
          ,
          <source>American Statistical Association</source>
          <volume>4</volume>
          (
          <issue>22</issue>
          )
          <fpage>354</fpage>
          -
          <lpage>359</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Pages</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aboyoun</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gentleman</surname>
            <given-names>R</given-names>
          </string-name>
          and
          <string-name>
            <surname>DebRaoy S Biostrings</surname>
          </string-name>
          <article-title>: String objects representing biological sequences, and matching algorithms (Access mode: https://rdrr</article-title>
          .io/bioc/Biostrings/)
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Morgan</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anders Sand Lawrence M ShortRead:</surname>
          </string-name>
          <article-title>a bioconductor package for input, quality assessment and exploration of high-throughput sequence data (Access mode: https:// www</article-title>
          .ncbi.nlm.nih.gov/pmc/articles/PMC2752612/)
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Van der Loo M 2014 The Stringdist</surname>
          </string-name>
          <article-title>Package for Approximate String Matching</article-title>
          <source>R Journal</source>
          <volume>6</volume>
          <fpage>111</fpage>
          -
          <lpage>122</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Althaus</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caprara</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>LenhofH-P and Reinert</surname>
            <given-names>K 2006</given-names>
          </string-name>
          <article-title>A branch-and-cut algorithm for multiple sequence alignment</article-title>
          <source>Mathematical Programming</source>
          <volume>105</volume>
          <fpage>387</fpage>
          -
          <lpage>425</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Melnikov</surname>
            <given-names>B 2006</given-names>
          </string-name>
          <article-title>Multiheuristic approach to discrete optimization problems Cybernetics</article-title>
          and
          <source>Systems Analysis</source>
          <volume>42</volume>
          (
          <issue>3</issue>
          )
          <fpage>335</fpage>
          -
          <lpage>41</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Makarkin</surname>
            <given-names>S</given-names>
          </string-name>
          and
          <string-name>
            <surname>Melnikov</surname>
            <given-names>B 2013</given-names>
          </string-name>
          <article-title>Geometrical methods for solving the pseudo-geometric version of the traveling salesman problem Stochastic optimization in informatics 9(2</article-title>
          )
          <fpage>54</fpage>
          -
          <lpage>72</lpage>
          (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Melnikov</surname>
            <given-names>B 2000</given-names>
          </string-name>
          <article-title>Once more about the state-minimization of the nondeterministic finite automata</article-title>
          <source>Journal of Applied Mathematics and Computing</source>
          <volume>7</volume>
          (
          <issue>3</issue>
          )
          <fpage>655</fpage>
          -
          <lpage>662</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Melnikov</surname>
            <given-names>B</given-names>
          </string-name>
          and
          <string-name>
            <surname>Tsyganov</surname>
            <given-names>A 2012</given-names>
          </string-name>
          <article-title>The state minimization problem for nondeterministic finite automata: the parallel implementation of the truncated branchand bound method Proceedings 5th International Symposium on Parallel Architectures, Algorithms</article-title>
          and Programming (Taipei)
          <fpage>194</fpage>
          -
          <lpage>201</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>