<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Acceleration For Bioinformatics-Based Machine Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="editor">
          <string-name>Anderson Acceleration, SVM, Sequence Analysis</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Georgia State University</institution>
          ,
          <addr-line>Atlanta</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <abstract>
        <p>Anderson acceleration (AA) is a well-known method for accelerating the convergence of iterative algorithms, with applications in various fields, including deep learning and optimization. Despite its popularity in these areas, the effectiveness of AA in classical machine learning classifiers has not been thoroughly studied. Tabular data, in particular, presents a unique challenge for deep learning models, and classical machine learning models are known to perform better in these scenarios. However, the convergence analysis of these models has received limited attention. To address this gap in research, we implement a support vector machine (SVM) classifier variant incorporating AA to speed up convergence. We evaluate the performance of our SVM with and without Anderson acceleration on several datasets from the biology domain and demonstrate that the use of AA significantly improves convergence and reduces the training loss as the number of iterations increases. Our findings provide a promising perspective on the potential of Anderson acceleration in training simple machine learning classifiers and underscore the importance of further research in this area. By showing the effectiveness of AA in this setting, we aim to inspire more studies that explore the applications of AA in classical machine learning.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <p>Anderson acceleration is a method that can be used to speed up the convergence of iterative optimization algorithms. Based on the difference between the current and prior weight vectors, a correction term is added to the weight vector updates at each iteration. When the gradients are changing quickly, or the optimization landscape is highly non-convex, this correction term can aid in reducing oscillations and speeding convergence. To see why, consider the optimization problem as a trajectory in the weight space, where the weight vector reflects the position at each iteration. Without Anderson acceleration, the gradients at each location alone control the trajectory of the optimization process. With Anderson acceleration, the trajectory is also affected by the difference between the current and prior weight vectors, which can smooth out the trajectory and minimize oscillations.</p>
        <p>Minimizing a convex objective with gradient descent is a typical problem in optimization. Newton’s methods use the inverse Hessian matrix [1] to accelerate gradient descent, and they achieve a faster rate of convergence than gradient descent or accelerated gradient descent, but they are very expensive. Knowledge of the curvature of the loss function has also been utilized for deep learning [9]. Although it might not offer as big a gain in convergence speed as it does for more complex models, Anderson acceleration may still be effective for optimizing simpler classical ML models.</p>
        <p>Ultimately, the precise characteristics of the optimization problem being solved will determine how effective Anderson acceleration is in each given scenario.</p>
        <p>In this work, we propose a robust approach to perform Anderson acceleration (AA) to speed up the training of SVM classifier models for multi-dataset training from the domain of biological sequencing. We regularize AA by including it in the loss optimization of simple linear classifier models (SVMs) and classical ML training, in contrast to previous work in complex deep learning models. We numerically demonstrate the effectiveness of the proposed acceleration by comparing the training loss with an increasing number of iterations on different sets of biological sequences. The results show that using AA significantly improves convergence and efficiently accelerates the training of traditional ML models.</p>
        <p>2. Related Work</p>
        <p>Iterative optimization methods like gradient descent and its variants are widely used for training ML models, but convergence can be slow, especially for high-dimensional problems. Anderson acceleration (AA) is a technique for speeding up the convergence of these methods by exploiting the geometry of the search space. It was first introduced by Anderson [10] as a way to accelerate the convergence of the conjugate gradient method and has since been applied to a variety of optimization techniques, such as Newton’s method [11], stochastic gradient descent [12], and the Nelder-Mead simplex algorithm [13].</p>
        <p>In recent years, there has been a growing interest in using Anderson acceleration for training deep neural networks, where it has been applied to a variety of tasks, such as image classification [14], natural language processing, and reinforcement learning [15]. Anderson acceleration is particularly well-suited for deep learning problems, where it has been shown to improve convergence and generalization performance [16, 17]. A method to estimate a sparse generalized linear model with convex or non-convex separable penalties using Anderson acceleration is also proposed in [17]. In these approaches, Anderson acceleration has been shown to improve convergence and generalization performance compared to traditional optimization methods, such as gradient descent. In addition, it has also been applied to logistic regression [18] and other ML models.</p>
        <p>However, despite these advances, Anderson acceleration has not been widely applied to classical machine learning classifiers, such as support vector machines (SVM), despite the potential for improved convergence rates. This is particularly relevant for tabular data, where classical machine learning classifiers are widely used. The limited exploration of Anderson acceleration in classical machine learning classifiers is surprising, given that the technique is effective in improving convergence in other optimization problems [10, 11, 12, 13].</p>
        <p>Another area where Anderson acceleration has shown promise is in training sparse models, such as sparse coding and dictionary learning [19]. In these applications, Anderson acceleration effectively improves convergence and achieves sparsity, an essential consideration in many machine-learning models.</p>
        <p>In recent years, researchers have also explored the use of Anderson acceleration in the training of generative adversarial networks (GANs) [20]. In these applications, Anderson acceleration has been shown to improve convergence and stability and to produce high-quality synthesized images.</p>
        <p>Finally, it’s worth noting that Anderson acceleration has also been applied to the training of models that are robust to outlier examples and to adversarial attacks [21]. In these applications, Anderson acceleration effectively improves the robustness of machine learning models and defends against adversarial attacks.</p>
        <p>3. Proposed Approach</p>
        <p>This section first discusses the algorithm we use for the proposed method. Later, we discuss the theoretical understanding of Anderson Acceleration and the assumptions considered.</p>
        <p>Anderson Acceleration (AA) attempts to make greater use of previous data than the fixed-point iteration, which only takes the most recent iterate to produce a new estimate, <italic>x</italic><sub><italic>k</italic>+1</sub> = <italic>g</italic>(<italic>x</italic><sub><italic>k</italic></sub>). The proposed method’s algorithmic pseudocode is provided in Algorithm 1, and the model training flow chart is shown in Figure 1. For model training, given the feature embedding X made from SARS-CoV-2 sequences and their lineages (variants) as labels Y, the first step involves the embedding generation using the methods discussed in Section 3.2; the generated feature vectors and the labels for the sequences are then supplied to the algorithm. In the algorithm, the weight vector is first initialized with random values (1 × length of sequence). We then initialize the gradient and sample loss values (lines 2-3 in Algorithm 1) for each iteration. Afterward, for each input sample X and its label y, we predict using the weight vector <italic>w</italic> (line 5 in Algorithm 1), the predicted value is normalized, and the gradient is updated (lines 5 and 6 in Algorithm 1). The sample loss is updated, and the iteration loss list is maintained (lines 8 to 9 in Algorithm 1). After every sample is processed, the gradient is averaged out, and the weight history is maintained for the iteration (lines 11 and 12 in Algorithm 1), also shown in Figure 1-e. Anderson acceleration is used to update the weight vector from the third iteration onward since we need at least two weight histories. The difference between the last two weight histories is computed and multiplied by the Anderson factor β, as shown in lines 14 and 15 in Algorithm 1 and in Figure 1-ii. The loss and accuracy for the iteration are saved, and the next iteration repeats the same steps. Finally, after all iterations, the loss list is returned for the given input feature vectors. The loss at each iteration is recorded and used to argue that Anderson Acceleration yields faster convergence.</p>
        <sec id="sec-1-1-1">
          <title>3.1. Anderson Acceleration</title>
          <p>One way to formally prove the convergence of Anderson acceleration is to use the concept of “linear convergence”, which refers to the rate at which the optimization process approaches the optimal solution. Specifically, we can show that under certain conditions, the Anderson acceleration optimization process converges linearly, meaning that the error decreases by a constant factor at each iteration. This contrasts with standard gradient descent, which converges at a sublinear rate (i.e., the error is not guaranteed to shrink by a fixed constant factor at each iteration).</p>
          <p>To prove this result, we can start by considering the optimization problem in the form of a series of updates to the weight vector, where the update at each iteration is given by:
            <disp-formula><label>(1)</label><tex-math>w_{k+1} = w_k - \alpha \nabla f(w_k)</tex-math></disp-formula>
            where <italic>w</italic><sub><italic>k</italic></sub> is the weight vector at iteration <italic>k</italic>, α is the learning rate, and ∇<italic>f</italic>(<italic>w</italic><sub><italic>k</italic></sub>) is the gradient of the objective function <italic>f</italic> at <italic>w</italic><sub><italic>k</italic></sub>. Now, we can add the Anderson acceleration term to the update, resulting in:
            <disp-formula><label>(2)</label><tex-math>w_{k+1} = w_k + \beta (w_k - w_{k-1}) - \alpha \nabla f(w_k)</tex-math></disp-formula>
            Next, we can define the error at each iteration as:
            <disp-formula><label>(3)</label><tex-math>e_k = w_k - w^{*}</tex-math></disp-formula>
            where <italic>w</italic><sup>*</sup> is the optimal weight vector. Now, we can substitute the expression for the update into the expression for the error and rearrange it to get:
            <disp-formula><label>(4)</label><tex-math>e_{k+1} = (1 - \beta) e_k + \beta e_{k-1} - \alpha \nabla f(w_k)</tex-math></disp-formula>
            where we have used the fact that <italic>w</italic><sup>*</sup> = <italic>w</italic><sub><italic>k</italic></sub> − ∇<italic>f</italic>(<italic>w</italic><sub><italic>k</italic></sub>). Now, we can define the “damping factor” as:
            <disp-formula><label>(5)</label><tex-math>\gamma = 1 - \beta</tex-math></disp-formula>
            and rewrite the expression for the error as:
            <disp-formula><label>(6)</label><tex-math>e_{k+1} = \gamma e_k + (1 - \gamma) e_{k-1}</tex-math></disp-formula>
            This expression has the form of a weighted average, where the weight of the current error is given by γ, and the weight of the previous error is given by 1 − γ. Now, we can make the following assumptions:</p>
          <list list-type="order">
            <list-item>
              <p>The gradient of the objective function <italic>f</italic> is Lipschitz continuous, i.e., there exists a constant <italic>L</italic> such that
                <disp-formula><label>(7)</label><tex-math>|\nabla f(x) - \nabla f(y)| \leq L |x - y|</tex-math></disp-formula>
                for all <italic>x</italic>, <italic>y</italic>.</p>
            </list-item>
            <list-item>
              <p>The objective function <italic>f</italic> is bounded below, i.e., there exists a constant <italic>f</italic><sub>min</sub> such that <italic>f</italic>(<italic>x</italic>) ≥ <italic>f</italic><sub>min</sub> for all <italic>x</italic>.</p>
            </list-item>
            <list-item>
              <p>The optimization algorithm uses a fixed step size α, and the sequence of points <italic>x</italic><sub><italic>k</italic></sub> generated by the algorithm satisfies
                <disp-formula><label>(8)</label><tex-math>|x_{k+1} - x_k| \leq C</tex-math></disp-formula>
                for some constant <italic>C</italic> and all <italic>k</italic>.</p>
            </list-item>
          </list>
        </sec>
      </sec>
      <sec id="sec-1-2">
        <p>These assumptions are typically made in the analysis of gradient descent algorithms. They allow us to establish certain convergence properties of the algorithm. Specifically, under these assumptions, it can be shown that the sequence of points generated by gradient descent with Anderson acceleration converges to a stationary point (a point where the gradient is zero) of the objective function <italic>f</italic> at a rate of <italic>O</italic>(1/<italic>k</italic>), where <italic>k</italic> is the iteration number. This convergence rate is faster than the <italic>O</italic>(1/<italic>k</italic><sup>2</sup>) achieved by plain gradient descent without Anderson acceleration.</p>
        <p>Intuitively, Anderson acceleration can be thought of as a way to incorporate information from past iterations into the current iteration to improve the convergence rate of the optimization algorithm. This is achieved using a weighted combination of the current gradient and the difference between the current and previous iterates. The weights are chosen such that the resulting update direction better approximates the true gradient at the current iterate, leading to faster convergence.</p>
        <p>To compute the loss, we use the “cross-entropy loss”, given by the following expression:
          <disp-formula><label>(9)</label><tex-math>\text{Cross Entropy Loss} = -\left( y \times \log\left( \mathit{yPred} + 10^{-10} \right) \right)</tex-math></disp-formula>
          where y is the true label, and yPred is the predicted label. The 1e−10 term is added to yPred to avoid the log of zero, which would cause an infinity error. The negative sign ensures the optimization problem is formulated as a minimization problem (hence, our loss can be negative).</p>
        <p>The flowchart for training with Anderson Acceleration (AA) is shown in Figure 1. We provide the feature vectors X as input along with the labels Y. A few parameter initializations are required, such as the number of iterations and the weight vector, which is initialized with random values. We also set the Anderson acceleration factor β, for which we tried several values to study its impact and select the best value. An empty list for loss is also shown in Figure 1-b. For a given number of iterations, we process the samples to compute the gradient and loss for each sample. The gradient is averaged out, and we update the weight using Anderson Acceleration for that iteration, also shown in Figure 1-ii. The process is repeated for the given number of iterations.</p>
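        <p>The training loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reproduction of Algorithm 1: the linear scorer, the sigmoid used to normalize predictions, labels in {0, 1}, the two-class form of the cross-entropy loss, the default learning rate, and the function name train_with_aa are hypothetical choices made for the example. Only the correction term β(w_k − w_{k−1}), the 1e−10 offset inside the logarithm, and the use of the correction from the third iteration onward are taken from the text.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def train_with_aa(X, y, beta=0.5, lr=0.1, n_iters=50, seed=0):
    """Gradient descent on a linear scorer with the correction term
    beta * (w_k - w_{k-1}) added from the third iteration onward.

    Sigmoid normalization, 0/1 labels, and the two-class cross-entropy
    are assumptions of this sketch, not details taken from the paper.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])  # random initial weights
    history = []                                 # weights after each update
    losses = []                                  # per-iteration loss list
    for _ in range(n_iters):
        grad = np.zeros_like(w)
        total_loss = 0.0
        for x_i, y_i in zip(X, y):
            y_pred = 1.0 / (1.0 + np.exp(-(x_i @ w)))  # normalized prediction
            # Gradient of the two-class cross-entropy (ignoring the offset).
            grad += (y_pred - y_i) * x_i
            # The 1e-10 offset avoids taking the log of zero.
            total_loss += -(y_i * np.log(y_pred + 1e-10)
                            + (1.0 - y_i) * np.log(1.0 - y_pred + 1e-10))
        grad /= len(X)                                 # average the gradient
        step = -lr * grad
        if len(history) >= 2:                          # two histories needed
            step = step + beta * (history[-1] - history[-2])
        w = w + step
        history.append(w.copy())
        losses.append(total_loss / len(X))
    return w, losses
```
]]></preformat>
        <p>Calling train_with_aa with beta=0.0 reduces the update to plain gradient descent, so the two loss lists can be compared iteration by iteration on the same data.</p>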
      </sec>
      <sec id="sec-1-3">
        <title>We employ the three representation learning techniques below to convert the biological sequences into low-dimensional embeddings.</title>
        <p>3.2.1. Spike2Vec [22]</p>
      </sec>
      <sec id="sec-1-4">
        <p>This technique offers numerical embedding of the supplied input spike sequences to facilitate the use of ML models. Initially, it produces <italic>k</italic>-mers of the supplied spike sequence because <italic>k</italic>-mers are known to maintain the sequence’s ordering information. For a sequence of length <italic>N</italic>, the total number of <italic>k</italic>-mers produced is <italic>N</italic> − <italic>k</italic> + 1.</p>
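        <p>The sliding-window construction of k-mers and the resulting count vector can be sketched as below. The function names kmers and kmer_frequency_vector, the toy four-letter alphabet, and the choice to skip k-mers containing characters outside the alphabet are assumptions made for illustration and are not taken from the Spike2Vec implementation.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from itertools import product


def kmers(sequence, k):
    """Return the N - k + 1 contiguous k-mers of a sequence of length N."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_frequency_vector(sequence, k, alphabet):
    """Count every possible k-mer over `alphabet` in `sequence`.

    The result has len(alphabet) ** k entries, ordered as produced by
    itertools.product over the given alphabet.
    """
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    vector = [0] * len(index)
    for kmer in kmers(sequence, k):
        if kmer in index:  # skip k-mers with characters outside the alphabet
            vector[index[kmer]] += 1
    return vector


# Toy example with a hypothetical 4-letter alphabet.
seq = "ACGTAC"
print(kmers(seq, 3))                               # ['ACG', 'CGT', 'GTA', 'TAC']
print(len(kmer_frequency_vector(seq, 3, "ACGT")))  # 64
```
]]></preformat>
        <p>For the toy sequence of length 6 and k = 3, the sketch yields 6 − 3 + 1 = 4 k-mers, and the count vector has 4<sup>3</sup> = 64 entries.</p>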
        <p>For every particular sequence, <italic>k</italic>-mers is a collection of (contiguous) amino acids (also known as mers) of length <italic>k</italic> (also called <italic>n</italic>-grams in the NLP domain). To convert the alphabetical <italic>k</italic>-mers data into a numerical representation, Spike2Vec computes the frequency vector based on <italic>k</italic>-mers. This vector comprises the counts of each <italic>k</italic>-mer in the sequence. A fixed-length feature vector is then made using the generated <italic>k</italic>-mers and their frequencies in a sequence. The character alphabet Σ and the length of the <italic>k</italic>-mers are used to calculate the length of this feature vector, which is |Σ|<sup><italic>k</italic></sup>.</p>
        <p>3.2.2. Minimizer [23]</p>
        <p>3.2.3. Spaced <italic>k</italic>-mers</p>
        <p>The performance of sequence classification is significantly impacted by the size and sparsity of feature vectors for sequences based on <italic>k</italic>-mers frequencies. Spaced <italic>k</italic>-mers propose employing non-contiguous length-<italic>k</italic> sub-sequences (<italic>g</italic>-mers) to create compact feature vectors with reduced sparsity and size. The method first computes <italic>g</italic>-mers using a spike sequence as input. We then calculate <italic>k</italic>-mers, where <italic>k</italic> &lt; <italic>g</italic>, from those <italic>g</italic>-mers. To conduct the trials, we used <italic>k</italic> = 4 and <italic>g</italic> = 9. The gap size is determined by <italic>g</italic> − <italic>k</italic>. However, this approach still involves bin scanning, which is computationally expensive and generates a very high dimensional feature representation. We took 500 principal components by applying PCA [25] for high dimensional embeddings (feature vector length &gt; 1000).</p>
      </sec>
      <sec id="sec-1-5">
        <p>The cross-entropy loss penalizes the predicted scores for the incorrect classes and rewards the predicted score for the correct class. During training, the goal is to minimize the cross-entropy loss so that the predicted scores for the correct class are as high as possible compared to those for incorrect classes.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>4. Experimental Evaluation</title>
      <sec id="sec-2-1">
        <title>To perform evaluation, we use datasets including Genome and Host. The details are as follows:</title>
        <sec id="sec-2-1-1">
          <title>4.1. Dataset Statistics</title>
          <p>4.1.1. Genome Dataset</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <p>Using the well-known and widely used database of SARS-CoV-2, GISAID [26], we retrieve the full-length nucleotide sequences of the coronavirus.</p>
      </sec>
      <sec id="sec-2-3">
        <p>Our dataset includes the COVID-19 variant information and 8220 nucleotide sequences. In our sample, there are 41 different lineages altogether.</p>
      </sec>
      <sec id="sec-2-4">
        <p>The goal is to classify the sequences and predict the lineage each belongs to.</p>
        <p>4.1.2. Host Dataset</p>
        <p>The National Institute of Allergy and Infectious Diseases (NIAID) Virus Pathogen Database and Analysis Resource (ViPR) [27] and GISAID were used to retrieve the spike protein sequences from a collection of spike sequences from several clades of the Coronaviridae family, along with details about the hosts that each spike sequence has infected. The host name is used as the class label in our classification tasks for this dataset. It displays the distribution of the dataset across the various host types (grouped by family).</p>
        <sec id="sec-2-4-1">
          <title>4.2. Evaluation Metrics</title>
          <p>For performance evaluation of SVM without and with Anderson acceleration, we use cross-entropy loss. The cross-entropy loss, also known as the negative log-likelihood loss, is commonly used in supervised learning problems with categorical targets. The cross-entropy loss for a single sample can be expressed mathematically as follows:
            <disp-formula><tex-math>L = -\log\left( \frac{e^{s_y}}{\sum_{j=1}^{C} e^{s_j}} \right)</tex-math></disp-formula>
            where <italic>s</italic><sub><italic>y</italic></sub> is the predicted score for the correct class and <italic>C</italic> is the number of classes. The cross-entropy loss is averaged over the entire training set to obtain the final objective function optimized during training.</p>
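          <p>The per-sample loss and its average over a training set can be sketched as follows. The function names softmax_cross_entropy and mean_cross_entropy are hypothetical, and subtracting the maximum score before exponentiating is a standard numerical safeguard added here; it does not change the value of the loss and is not described in the text.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def softmax_cross_entropy(scores, true_class):
    """Cross-entropy loss -log(exp(s_y) / sum_j exp(s_j)) for one sample."""
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max()  # subtracting the max avoids overflow
    log_sum_exp = np.log(np.exp(shifted).sum())
    return -(shifted[true_class] - log_sum_exp)


def mean_cross_entropy(score_matrix, labels):
    """Average the per-sample loss over a batch of score rows."""
    return float(np.mean([softmax_cross_entropy(s, y)
                          for s, y in zip(score_matrix, labels)]))
```
]]></preformat>
          <p>When all C class scores are equal, the loss of a sample is log C, which gives a convenient sanity check for an untrained classifier.</p>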
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>5. Results And Discussion</title>
      <sec id="sec-3-1">
        <title>In this section, we compare results without and with Anderson acceleration using cross-entropy loss for different biological sequence datasets.</title>
        <sec id="sec-3-1-1">
          <title>5.1. Results For Genome Data</title>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <p>The results for genome data using all embedding methods are reported in Figure 2 for the best value of the Anderson Acceleration (AA) factor β. We use cross-validation to get the best value for β from the grid (0, 0.1, 0.2, ⋯, 1.0) for the respective embeddings, where 0 implies no AA and 1.0 shows maximum AA. For Spike2Vec embedding, we can observe that although the cross-entropy loss without Anderson acceleration is smaller with fewer iterations, as we increase the iterations, the loss increases too. On the other hand, the loss does not increase significantly while using Anderson acceleration in SVM. Moreover, with AA, the loss started to converge after 300 iterations, which is almost half the number needed for loss convergence without AA (i.e., ≈ 600 iterations).</p>
      </sec>
      <sec id="sec-3-3">
        <p>For Minimizer-based embedding, although we can observe more fluctuation in loss compared to Spike2Vec, the loss is lower (and convergence is faster) when SVM is used along with AA.</p>
      </sec>
      <sec id="sec-3-4">
        <p>Similarly, the behavior of spaced <italic>k</italic>-mers-based embedding differs from both Spike2Vec and Minimizer-based embedding. Although we can see an overall increasing trend in loss with an increasing number of iterations, the SVM with AA loss is lower than without AA when the number of iterations increases.</p>
      </sec>
      <sec id="sec-3-5">
        <p>Overall, it is evident from all three embedding results that the loss with AA is less than the loss without AA for different embedding methods as we increase the number of iterations, showing the significance of using AA for the training of SVM.</p>
      </sec>
      <sec id="sec-3-6">
        <p>[Figure: cross-entropy loss versus number of iterations. The x-axis shows the number of iterations, while the y-axis shows the cross entropy loss. The figure is best seen in color.]</p>
        <sec id="sec-3-6-1">
          <title>5.2. Results For Host Data</title>
          <p>The results for host data using all embedding methods are reported in Figure 3 for the best value of the Anderson Acceleration (AA) factor β. We use cross-validation to get the best value for β from the grid (0, 0.1, 0.2, ⋯, 1.0) for the respective embeddings, where 0 implies no AA and 1.0 shows maximum AA. For Spike2Vec-based embedding, the behavior is not different from the same embedding in the case of Genome data. Although SVM without and with Anderson acceleration converges very fast (i.e., in &lt; 100 iterations), the cross-entropy loss with AA is smaller than that of SVM without AA. We observed some improvement in the SVM without AA in the Minimizer and spaced <italic>k</italic>-mers-based embedding methods. However, when the number of iterations is smaller, we can observe some fluctuation in the cross-entropy loss for SVM without AA, compared to the smooth loss curve for SVM with AA, showing its significance in efficient training of the SVM classifier.</p>
        </sec>
      </sec>
      <sec id="sec-3-7">
        <p>In this work, we applied Anderson acceleration to the training of a support vector machine (SVM) classifier. Our experiments on several sequence-based bioinformatics datasets show that Anderson acceleration results in a considerable decrease in training loss and improved convergence compared to the standard SVM. In the future, we will investigate more traditional linear classifier models, such as the Perceptron, and bigger biological data to assess their scalability and resilience. Moreover, evaluating the robustness and generalizability of the proposed Anderson acceleration method is also an interesting future extension.</p>
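        <p>The selection of the AA factor β over the grid (0, 0.1, ⋯, 1.0) by cross-validation, as described in the results, can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the experiments: a least-squares model stands in for the SVM, the correction term is applied from the second update, and the names fit_linear and select_beta, the number of folds, the learning rate, and the iteration count are all hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def fit_linear(X, y, beta, lr=0.05, n_iters=100):
    """Least-squares fit by gradient descent with the correction term
    beta * (w_k - w_{k-1}); a stand-in for the SVM training routine."""
    w_prev = np.zeros(X.shape[1])
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (X @ w - y) / len(X)
        # The right-hand side uses the old w and w_prev before reassignment.
        w, w_prev = w + beta * (w - w_prev) - lr * grad, w
    return w


def select_beta(X, y, betas, n_folds=5, seed=0):
    """Return the beta with the lowest mean validation loss over the folds."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, n_folds)
    mean_losses = []
    for beta in betas:
        fold_losses = []
        for i in range(n_folds):
            val = folds[i]
            train = np.concatenate([f for j, f in enumerate(folds) if j != i])
            w = fit_linear(X[train], y[train], beta)
            fold_losses.append(np.mean((X[val] @ w - y[val]) ** 2))
        mean_losses.append(float(np.mean(fold_losses)))
    return betas[int(np.argmin(mean_losses))], mean_losses


betas = [round(0.1 * i, 1) for i in range(11)]  # 0, 0.1, ..., 1.0
```
]]></preformat>
        <p>Here a value of 0 in the grid corresponds to training without AA, so the plain model is always one of the candidates being compared.</p>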
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>ference on Machine Learning</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>6620</fpage>
          -
          <lpage>6629</lpage>
          . [13]
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Barton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Ivey</surname>
          </string-name>
          <string-name>
            <surname>Jr</surname>
          </string-name>
          , Modifications of the Nelder-
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <article-title>sponse optimization</article-title>
          ,
          <source>Technical Report</source>
          ,
          <year>1991</year>
          . [14]
          <string-name>
            <surname>M. L. Pasini</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Reshniak</surname>
            ,
            <given-names>M. K.</given-names>
          </string-name>
          <string-name>
            <surname>Stoyanov</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <article-title>deep learning models</article-title>
          ,
          <source>in: SoutheastCon</source>
          <year>2022</year>
          ,
          <year>2022</year>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          pp.
          <fpage>289</fpage>
          -
          <lpage>295</lpage>
          . [15]
          <string-name>
            <given-names>G.</given-names>
            <surname>Zuo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gong</surname>
          </string-name>
          , Offline rein-
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <source>robotic tasks, Applied Intelligence</source>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          . [16]
          <string-name>
            <surname>M. L. Pasini</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Reshniak</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Stoyanov</surname>
          </string-name>
          , Sta-
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <source>preprint arXiv:2110.14813</source>
          (
          <year>2021</year>
          ). [17]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Bertrand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Klopfenstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-A.</given-names>
            <surname>Bannier</surname>
          </string-name>
          , G. Gidel,
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Massias</surname>
          </string-name>
          , Beyond l1:
          <article-title>Faster and better</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <source>arXiv:2204.07826</source>
          (
          <year>2022</year>
          ). [18]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Bertrand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Massias</surname>
          </string-name>
          , Anderson acceleration of
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <source>on Artificial Intelligence and Statistics</source>
          ,
          <year>2021</year>
          , pp.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          1288-
          <fpage>1296</fpage>
          . [19]
          <string-name>
            <given-names>P.</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          ,
          <article-title>Computational assessment of the an-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <source>(STSIVA)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          . [20]
          <string-name>
            <given-names>H.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Ho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Saad</surname>
          </string-name>
          , Solve min-
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>tions</surname>
          </string-name>
          ,
          <year>2022</year>
          . [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Garstka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cannon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Goulart</surname>
          </string-name>
          , Safeguarded
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <year>2022</year>
          , pp.
          <fpage>435</fpage>
          -
          <lpage>440</lpage>
          . [22]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Patterson, Spike2vec: An efficient and</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <article-title>scalable embedding approach for covid-19 spike</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Big</given-names>
            <surname>Data (Big Data)</surname>
          </string-name>
          ,
          <year>2021</year>
          , pp.
          <fpage>1533</fpage>
          -
          <lpage>1540</lpage>
          . [23]
          <string-name>
            <given-names>M.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hayes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hunt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mount</surname>
          </string-name>
          , J. Yorke,
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>parison</surname>
          </string-name>
          ,
          <source>Bioinformatics</source>
          <volume>20</volume>
          (
          <year>2004</year>
          )
          <fpage>3363</fpage>
          -
          <lpage>3369</lpage>
          . [24]
          <string-name>
            <given-names>R.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sekhon</surname>
          </string-name>
          , et al.,
          <source>Gakco: a fast gapped</source>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <source>and Knowledge Discovery in Databases</source>
          ,
          <year>2017</year>
          , pp.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          356-
          <fpage>373</fpage>
          . [25]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Esbensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Geladi</surname>
          </string-name>
          , Principal compo-
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <source>ratory systems 2</source>
          (
          <year>1987</year>
          )
          <fpage>37</fpage>
          -
          <lpage>52</lpage>
          . [26]
          <string-name>
            <given-names>GISAID</given-names>
            <surname>Website</surname>
          </string-name>
          , https://www.gisaid.org/,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [Online; accessed 17-October-2022]. [27]
          <string-name>
            <given-names>B. E.</given-names>
            <surname>Pickett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. L.</given-names>
            <surname>Sadat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Noronha</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <source>acids research 40</source>
          (
          <year>2012</year>
          )
          <fpage>D593</fpage>
          -
          <lpage>D598</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>