=Paper=
{{Paper
|id=Vol-2962/paper11
|storemode=property
|title=Alternative Base Callers Aid Real-time Analysis of SARS-CoV-2 Sequencing Runs
|pdfUrl=https://ceur-ws.org/Vol-2962/paper11.pdf
|volume=Vol-2962
|authors=Vladimír Boža,Matej Fedor,Kristína Boršová,Viktória Čabanová,Jana Černíková,Viktória Hodorová,Peter Perešíni,Klára Sládečková,Boris Klempa,Jozef Nosek,Broňa Brejová,Tomáš Vinař
|dblpUrl=https://dblp.org/rec/conf/itat/BozaFBCCHPSKNBV21
}}
==Alternative Base Callers Aid Real-time Analysis of SARS-CoV-2 Sequencing Runs==
<pdf width="1500px">https://ceur-ws.org/Vol-2962/paper11.pdf</pdf>
<pre>
             Alternative Base Callers Aid Real-Time Analysis of SARS-CoV-2
                                    Sequencing Runs

    Vladimír Boža1 , Matej Fedor1 , Kristína Boršová2,3 , Viktória Čabanová3 , Jana Černíková1 , Viktória Hodorová2 , Peter
               Perešíni1 , Klára Sládečková1 , Boris Klempa3 , Jozef Nosek2 , Broňa Brejová1 , Tomáš Vinař1

                      1 Faculty of Mathematics, Physics and Informatics, Comenius University, Bratislava, Slovakia
                                 2 Faculty of Natural Sciences, Comenius University, Bratislava, Slovakia
                         3 Biomedical Research Center of the Slovak Academy of Sciences, Bratislava, Slovakia


Abstract: One of the advantages of nanopore sequencing is its ability to provide data in real time,
which allows monitoring, early stopping, and fast identification of mutations in sequenced material.
A nanopore sequencer measures the electrical current induced by DNA passing through a pore, and this
signal needs to be translated to a string over the alphabet {A,C,G,T} through a process called base
calling. To achieve base calling in real time, the mainstream tools (such as Guppy provided by Oxford
Nanopore Technologies) require the support of high-performance GPUs. This is prohibitive in many
settings. Here, we evaluate the accuracy of several alternative base callers, which only require a
desktop CPU or a low-cost USB-connected accelerator. While their accuracy is, in general, lower than
that of Guppy in high-accuracy mode using GPUs, we show that these alternative base callers can act
as a replacement for monitoring and mutation detection in SARS-CoV-2 sequencing runs, without
sacrificing the accuracy of the final result.
   Availability: http://compbio.fmph.uniba.sk/sars-cov-2-sequencing/

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).


1     Introduction

The ARTIC protocol was originally developed for sequencing viral genomes with nanopore sequencing
devices (Quick et al., 2016), and it has become a commonly used protocol for SARS-CoV-2 sequencing
(Tyson et al., 2020). Briefly, overlapping segments of the viral genome are first amplified using
PCR, and the resulting amplicons are sequenced using nanopore sequencing (see a simplified
illustration in Figure 1). Typically, multiple samples are sequenced in parallel using barcoding. In
bioinformatics post-processing, the individual reads are first assigned to individual samples, using
demultiplexing according to the barcodes. Stricter parameters (requiring the presence of barcodes on
both ends of the read) are typically used in order to avoid barcode bleeding and to discard partially
sequenced reads. The reads are then aligned to the reference genome and mutations are discovered
with the aid of the raw sequencing signal using nanopolish (Loman et al., 2015).
   One of the problems with this protocol is that the PCR amplification step introduces wide
variation in coverage, both between samples and between different amplicons within a sample. Due to
the high error rate of nanopore sequencing, it is not advisable to determine mutations in regions
with low coverage (the standard pipeline sets the coverage threshold at 20). In such scenarios, it is
difficult to estimate when to stop data acquisition. Fortunately, results of nanopore sequencing can
be processed in real time, and on-the-fly monitoring during sequencing helps to inform decisions on
when to stop the run.
   A nanopore sequencer reads an electrical signal induced by the DNA passing through a pore, and
before subsequent analysis this signal needs to be translated to DNA bases via base calling. The base
caller provided by the manufacturer (called Guppy) requires a machine with a high-performance GPU,
which is not available in many laptop computers and is also a problem in desktops due to current
NVIDIA GPU shortages.
   In this work, we propose to use alternative base callers with lower demands on computational
resources, albeit producing reads with a slightly lower accuracy (Boža et al., 2020; Perešíni et al.,
2020; Boža et al., 2021). We demonstrate that using our alternative base caller not only allows
monitoring, but can also produce a final sequence of similar quality to that obtained with the
standard base caller. Moreover, we are able to call tentative variants during sequencing from
incomplete sequence using a custom-made classifier. This allows us to report important information
about virus lineage determination already during the sequencing run, well before the full sequence
is determined.


2   Evaluation of Alternative Base Callers

We have evaluated three alternative base callers that can achieve real-time base calling without the
use of a GPU: Deepnano-blitz (Boža et al., 2020), Deepnano-Coral (Perešíni et al., 2020), and Osprey
(Boža et al., 2021). There are also other alternative base callers such as Bonito (Seymour, 2020)
and SACall (Huang et al., 2020), but none of them offers real-time base calling on a CPU or a
low-power USB-connected TPU.

[Figure 1 image not reproduced: panels (a) to (g). Labels recovered from the image: Input cDNA,
Primer pool 1, Primer pool 2, Combination of pools.]

Figure 1: A simplified illustration of the ARTIC protocol workflow. (a) First each virus sample is amplified, using
target-specific primers. Two pools of primers are used to obtain overlapping amplicons. (b) Amplicons from multiple
samples are tagged using barcodes and sequenced together. (c) All reads are sequenced on a MinION device which for
each sequenced molecule produces an electrical signal. (d) The electrical signal is converted to the string using base
calling software. (e) Barcodes at ends of reads are recognized by the demultiplexing software and individual reads are
assigned to their samples. (f) Reads are mapped to the reference. (g) Mutations are called based on the consensus of
multiple reads.
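
As a rough illustration of step (g) and of the coverage threshold of 20 mentioned in the
Introduction, the sketch below calls a consensus base by majority vote and reports N when too few
reads cover a position or when the reads disagree. This is a simplification under stated
assumptions: the ARTIC pipeline itself relies on Nanopolish and the raw signal, and the names
consensus_base, consensus and the min_fraction cut-off are hypothetical, not taken from the paper.

```python
from collections import Counter

MIN_COVERAGE = 20  # coverage threshold quoted in the text for the standard pipeline


def consensus_base(column, min_coverage=MIN_COVERAGE, min_fraction=0.5):
    """Call one position from the bases that the aligned reads show there.

    `column` is a string of read bases at this position. `min_fraction`
    is a hypothetical agreement cut-off, not a value from the paper.
    """
    counts = Counter(base for base in column if base in "ACGT")
    depth = sum(counts.values())
    if depth < min_coverage:
        return "N"  # too few reads: mask the position as unknown
    base, votes = counts.most_common(1)[0]
    # Conflicting reads without a clear majority are also reported as unknown.
    return base if votes / depth > min_fraction else "N"


def consensus(columns, **options):
    """Join per-position calls into a consensus string."""
    return "".join(consensus_base(column, **options) for column in columns)


columns = ["A" * 25, "A" * 10, "A" * 12 + "C" * 12, "A" * 5 + "T" * 20]
print(consensus(columns))  # prints ANNT
```

The second column is masked because only 10 reads cover it, and the third because the reads split
evenly between A and C.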
   Deepnano-blitz (Boža et al., 2020) is a real-time CPU base caller based on recurrent neural
networks. Deepnano-blitz allows adjustment of the time vs. accuracy tradeoff by changing the size of
the neural network model. The smaller version (48) can run in real time on a single CPU core, while
the larger version (96) requires multiple cores to achieve real-time performance. The accuracy of
the smaller version is slightly lower than the accuracy of Guppy 4.4 in the fast mode; the larger
version is comparable to Guppy 4.4 in the fast mode.
   Deepnano-Coral (Perešíni et al., 2020) is a convolutional neural network base caller. It requires
a Coral Edge TPU, which is a sub-$100 accelerator from Google that can connect to a USB port, with
very low power requirements. Deepnano-Coral is best suited for laptop computers that do not have GPU
support, as well as for scenarios where power consumption may become a limiting factor (such as
sequencing in the field). The accuracy of real-time base calling with Deepnano-Coral falls between
Guppy 4.4 fast and high-accuracy (HAC) modes.
   Osprey (Boža et al., 2021) is a CPU-based base caller that uses an architecture similar to
Deepnano-Coral, but is further improved by using a technique called dynamic pooling and decoding via
transducers. The accuracy of real-time base calling is equivalent to Guppy 3.4 HAC and better than
Guppy 4.4 fast. Computational requirements are similar to Deepnano-blitz 96.
   Using faster base callers usually results in sacrificing accuracy at the individual read level.
However, in the case of the ARTIC pipeline, multiple reads are aligned to each region, and only
differences that consistently occur in many reads are considered proper variants. Moreover, the
ARTIC pipeline uses Nanopolish (Loman et al., 2015), which works directly with the raw sequencing
signal, as an underlying variant caller. Therefore the base calling accuracy is not as important,
since base calls are only used for demultiplexing and for the initial alignment of the read to the
reference in Nanopolish.
   The ARTIC pipeline sometimes calls a particular base as unknown (denoted as N in the sequence).
This can happen for two reasons: low coverage of an amplicon or conflicting information from
different sequencing reads. Assigning an unknown base represents a conservative decision and is used
wherever it is impossible to decide with high enough confidence whether a particular base is the
same as the reference or represents a mutation.
   We have evaluated the performance of each of the above mentioned base callers in the context of
the ARTIC pipeline. For the evaluation purposes, we have used a sequencing run from January 13, 2021
with 23 barcoded SARS-CoV-2 samples (the 24th sample was excluded due to very low coverage) using a
MinION run with R9.4.1 flow cell, LSK109 chemistry, and the 2-kbp amplicon scheme by Resende et al.
(2020). In the standard software pipeline, we use Guppy 4.4 (the highest version available at the
time of analysis) in the high-accuracy mode to base call the reads, followed by the ARTIC pipeline
for variant calling. We used the results of this standard pipeline as a ground truth.

[Figure 2 plot not reproduced: coverage (logarithmic axis, 10^0 to 10^4) against genome position
(0 to 30000), one curve per barcode (barcodes 01 to 20 and 22 to 24).]
Figure 2: Variance in the coverage within and between samples.

   For each of the above mentioned base callers, as well as for Guppy 3.4 in the high-accuracy mode,
we reran the ARTIC pipeline with their base calls and compared the results (see Table 1). Guppy 3.4
was used as a representative base caller from a year ago. We also ran all of our base callers with a
lower demultiplexing threshold, which slightly increases the coverage, due to more reads being
demultiplexed to individual samples.
   Only very few positions (up to 2 in 23 samples) are called differently (B→B column). Even though
these clearly represent erroneous base calls (see Table 2), there are so few of them that they do
not impact the overall accuracy significantly. The largest problem is the increased number of
“unknown” calls (B→N column). These are mainly concentrated within a single 310bp region
(21242-21551), which in several samples had an extremely low coverage (see Figure 2). With lower
efficiency of demultiplexing due to base calling errors, the coverage of this region was in some
samples pushed below the minimum coverage threshold of the ARTIC pipeline and consequently was
masked with Ns in the result. There were several additional “unknown” calls of individual bases
which were clustered around certain positions in the genome. We suspect that this is due to some
biases stemming from nanopore sequencing, where variants of some bases in certain contexts are
difficult to distinguish.
   On the other hand, some additional bases are called compared to the baseline (N→B column). In all
cases, these were called as the original reference. Almost all cases were at positions 16255 and
16256, and one case was in the region 21220-21296, where Coral-Q50 increased the coverage over the
minimum threshold.
   While in some cases the use of our alternative base callers may result in an incomplete sequence
(compared to the baseline), in general our results show that each of these tools is a viable
alternative to the standard base calling with Guppy 4.4 in high-accuracy mode with similar quality
of the final sequence. While Guppy 4.4 HAC re-
Table 1: Comparison of the results of the ARTIC pipeline using different base callers. The values represent the total
number and median of differences in 23 consensus sequences compared to the baseline. N→B: position marked as
unknown in the baseline was resolved as a base. B→N: position resolved as a base in the baseline was marked as
unknown. B→B: a different base was called. Q50: lower the required demultiplexing score from 60 to 50.

                                 Total 23 samples        Median 23 samples
         Base caller         N→B B→N B→B                N→B B→N B→B                  Hardware to achieve real time
         Guppy 3.4 HAC        2        116      0        0      0       0            High-performance GPU
         Blitz48               4       1135     1        0      4       0            Single desktop CPU core
         Blitz48-Q50           4       404      2        0      3       0
         Blitz96               4       273      0        0      1       0            Multi-core desktop CPU
         Blitz96-Q50           4       131      0        0      2       0
         Coral                 4       261      0        0      1       0            Sub-$100 Coral accelerator
         Coral-Q50            80        113     0        0      1       0
         Osprey                4       253      1        0      0       0            Multi-core desktop CPU
         Osprey-Q50            4       108      1        0      0       0
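
The three difference categories of Table 1 can be counted for any pair of consensus sequences with
a few lines of code. The following is a minimal sketch, assuming both sequences are already in
reference coordinates and therefore have equal length; the function names and the ASCII keys
N_to_B, B_to_N and B_to_B are hypothetical and are not part of the ARTIC pipeline.

```python
from statistics import median

KEYS = ("N_to_B", "B_to_N", "B_to_B")


def classify_differences(baseline, candidate):
    """Count N->B, B->N and B->B differences between two consensus strings.

    Assumption: both strings share reference coordinates (equal length).
    """
    if len(baseline) != len(candidate):
        raise ValueError("sequences must have equal length")
    counts = dict.fromkeys(KEYS, 0)
    for old, new in zip(baseline, candidate):
        if old == new:
            continue
        if old == "N":
            counts["N_to_B"] += 1  # unknown in baseline, resolved now
        elif new == "N":
            counts["B_to_N"] += 1  # resolved in baseline, unknown now
        else:
            counts["B_to_B"] += 1  # a different base was called
    return counts


def summarize(pairs):
    """Return per-category totals and medians over (baseline, candidate) pairs."""
    per_sample = [classify_differences(b, c) for b, c in pairs]
    totals = {k: sum(s[k] for s in per_sample) for k in KEYS}
    medians = {k: median([s[k] for s in per_sample]) for k in KEYS}
    return totals, medians


pairs = [("ACGTN", "ACNTA"), ("ACGT", "ATGT"), ("AAAA", "AAAA")]
totals, medians = summarize(pairs)
print(totals, medians)
```

Reporting both the total and the median, as Table 1 does, separates a problem spread over all
samples from one concentrated in a few low-coverage samples.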


               Table 2: Mutations identified by various base callers in addition to the gold standard calls.

    Base caller    Barcode     Position    Reference    Variant     Coverage                      Notes
      Blitz48        18         22009          C         CA            31        Frameshift, thus likely invalid mutation
    Blitz48-Q50      18         22009          C         CA            46
    Blitz48-Q50       7         1706          TC          T           473        Frameshift, thus likely invalid mutation
      Osprey         23          237          G          GT            25           Not present in GISAID before,
    Osprey-Q50       23          237          G          GT            30                   probably invalid


quires a high-performance GPU to have a reasonable running time, the alternatives only require a CPU
or a sub-$100 accelerator connected through a standard USB port.


3   Determining Virus Lineages During Sequencing from Incomplete Data

One of the key tasks in the analysis of sequenced SARS-CoV-2 samples is determination of the virus
lineage according to the standardized lineage classification (Rambaut et al., 2020). The standard
tool to accomplish this task is pangolin (O’Toole et al., 2021), which uses a machine learning
approach to determine the lineage from the finished sequence. Pangolin currently fits a (single)
decision tree classifier to sequence data to determine the lineage. While this approach seems to
have high accuracy for complete sequence data, it handles incomplete sequences by simply filling
them using bases from the reference sequence. This naturally leads to unpredictable changes in
classification as the sequence is being completed, since each new mutation might lead to a complete
change in the decision tree path.
   To quickly make a provisional lineage classification, even for incomplete sequences during the
sequencing, we propose a simple classification scheme based on a manually curated list of
characteristic mutations. We identify a list of these characteristic mutations for expected lineages
of interest for a particular country at a particular time, and each lineage also has a threshold for
the number of mutations required to be present to make a call, as shown in an example in Figure 3.
   During the sequencing run, we use a fast base caller (Deepnano-blitz 48 in our experiments) to
provide live base calling. By aligning individual sequencing reads to the reference sequence and
simply counting the support for a mutation at a particular position, we make provisional variant
calls. Note that this would be highly imprecise for insertions and deletions due to the frequent
indels in nanopore sequencing reads. For this reason, we only focus on single nucleotide variants.
Once the number of characteristic mutations passes the threshold for a particular sample, the
lineage is provisionally called.
   We have integrated our tool within the RAMPART sequencing run monitoring framework (Hadfield,
2021) and tested its performance on three runs: one run with 24 barcoded samples, and two with 96
barcoded samples each. There were no disagreements when both our tool and pangolin called the
lineage; however, in certain cases one or the other tool did not make a call (see a summary of
results in Table 3).
   Figure 4 shows that our tool can provide early information about lineages detected in the
sequencing run. Even though barcodes in our samples were highly unbalanced, some samples can be
identified within minutes of start-
B.1.1.7 14 C3267T C5388A T6954C A23063T C23271A C23604A C23709T T24506G G24914C C27972T G28048T
           A28111G C28977T G28280C A28281T T28282A
B.1.160 5 G9526T G15766T A16889G G17019T G22992A T26876C
B.1.177 6 T445C C6286T G21255C C22227T C26801G C28932T G29645T
B.1.258 4 G12988T G15598A G18028T T24910C T26972C
B.1.221 4 C21855T A25505G G25906C C28651T C28869T
A.23.1 3 C10747T G11521T C23604G T24097C
B.1.351 6 G174T G5230T G23012A A21801C A23063T C28253T C23664T
P.1 8 T733C C2749T C3828T A5648C C12778T C13860T G17259T C21614T C21621A
P.2 7 T10667G C11824T A12964G G23012A C28253T G28628T G28975T C29754T
B.1.427/9 4 G17014T G22018T C26681T A28272T C28887T
B.1.525 8 C1498T A1807G G2659A C6285T T8593C C14407T A21717G C21762T T24224C
B.1.526 8 T9867C C25517T C27925T A20262G C21575T C21846T A22320G C23664T C28869T
B.1.617 8 G210T T22917G C23604G C25469T T27638C G28881T G29402T G29742T
R.1 8 C14340T G17551A C18877T A19167G C19274A G22017T G23012A G23868T T26604C


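Figure 3: Mutation specification for determining virus lineages. Each line gives the lineage name,
the required number of mutations, and the list of characteristic mutations. Note that only certain
lineages of interest are included. For example, to identify a sample as the B.1.1.7 variant, we
require 14 out of the 16 mutations listed.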
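
The threshold rule behind Figure 3 can be sketched as follows. This is an illustration rather than
the code of our tool: it assumes one lineage per line in the format "name threshold mutation ...",
the function names parse_spec and provisional_lineages are hypothetical, and returning every
lineage that passes its threshold is an assumption, since the text does not say how ties would be
handled. The two example lines are copied from Figure 3.

```python
SPEC_TEXT = """
B.1.258 4 G12988T G15598A G18028T T24910C T26972C
B.1.221 4 C21855T A25505G G25906C C28651T C28869T
"""


def parse_spec(text):
    """Parse lines of the form 'lineage threshold mutation mutation ...'."""
    spec = {}
    for line in text.strip().splitlines():
        name, threshold, *mutations = line.split()
        spec[name] = (int(threshold), set(mutations))
    return spec


def provisional_lineages(observed, spec):
    """Return lineages whose count of observed characteristic mutations
    reaches the lineage-specific threshold."""
    observed = set(observed)
    return sorted(
        name
        for name, (threshold, mutations) in spec.items()
        if len(mutations & observed) >= threshold
    )


spec = parse_spec(SPEC_TEXT)
early = {"G12988T", "G15598A", "G18028T"}  # three mutations seen so far
later = early | {"T24910C"}                # a fourth one appears later in the run
print(provisional_lineages(early, spec))   # prints []
print(provisional_lineages(later, spec))   # prints ['B.1.258']
```

Because the call depends only on the mutations observed so far, a sample stays unclassified until
enough positions are covered, instead of being assigned a lineage from reference-filled gaps.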


ing the run, and our tool has provided accurate detection of lineages for 50% of barcodes as early
as 40 minutes from the start of a 96-barcode run. Due to the low quality of some samples, we
typically run the sequencing for approximately 24 hours, so such on-the-fly analysis provides us an
opportunity to report the basic information on sequenced samples to health authorities as early as
one day before the final analysis is finished.
   While our determination of single nucleotide sequence variants is somewhat simplistic, Figure 5
shows that on real data even such a simple method can achieve results with high confidence. In all
cases, mutations were supported by over 85% of reads and there were no calls that would suffer from
ambiguity.

Table 3: Comparison of our lineage identification with pangolin results. There were no disagreements
(if both tools identified a lineage, the output was always the same). In a small number of cases, a
lineage was identified only by one of the tools.

                                 Lineage identified by
    Dataset date   barcodes   both   pangolin   our tool
    2021-02-03        24       22       2          0
    2021-03-11        96       80       1          0
    2021-03-25        96       70       1          2

[Figure 4 plot not reproduced: samples classified (0 to 80) against minutes from start of
sequencing (0 to 250), one curve per run: 96 barcodes, 2021-03-25; 96 barcodes, 2021-03-11;
24 barcodes, 2021-02-03.]
Figure 4: The number of samples with lineage classification over time. Horizontal lines show the
number of classified samples at the end of the sequencing run.

[Figure 5 plot not reproduced: panel title "NB71 - B.1.1.7"; axis "% of mutated bases" (0.86 to
1.00) against minutes from start of sequencing (0 to 500); one curve per mutation: C3267T, C5388A,
T6954C, A23063T, C23271A, C23604A, C23709T, T24506G, G24914C, C27972T, G28048T, A28111G, C28977T,
G28280C, A28281T, T28282A.]
Figure 5: The percentage of overlapping reads supporting individual identified mutations over time.
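
The quantity plotted in Figure 5 is the share of reads overlapping a position that carry the
mutated base. A minimal sketch of this counting is given below. It makes strong simplifying
assumptions that are not from the paper: each read is represented as an ungapped alignment
(0-based start, sequence), whereas a real run needs a proper aligner and handling of indels, and
the names parse_snv and support_fraction are hypothetical.

```python
import re


def parse_snv(label):
    """Split a label such as 'C3267T' into (reference base, 1-based position, variant base)."""
    match = re.fullmatch(r"([ACGT])(\d+)([ACGT])", label)
    if match is None:
        raise ValueError(f"not a single nucleotide variant label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def support_fraction(reads, label):
    """Fraction of overlapping reads that show the variant base.

    `reads` is an iterable of (start, sequence) pairs with 0-based starts and
    no gaps (a simplifying assumption). Returns None if no read overlaps.
    """
    _ref, position, variant = parse_snv(label)
    index = position - 1  # labels use 1-based genome coordinates
    overlapping = supporting = 0
    for start, sequence in reads:
        offset = index - start
        if 0 <= offset < len(sequence):
            overlapping += 1
            if sequence[offset] == variant:
                supporting += 1
    return supporting / overlapping if overlapping else None


# Toy reads: three of them overlap position 3, two show the variant base T.
reads = [(0, "ACGT"), (1, "CTT"), (2, "TT"), (10, "AAAA")]
print(support_fraction(reads, "G3T"))
```

A provisional call could then require the fraction to exceed a chosen cut-off; in our data all
reported mutations were supported by over 85% of reads.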

4     Conclusions and Discussion

One of the great advantages of nanopore sequencing is the ability to analyze data as they are
sequenced. Fast base callers that can replace default base callers provided by Oxford Nanopore
Technologies are key to utilizing this advantage. Here, we have evaluated fast base callers in the
context of the ARTIC pipeline and determined that they can provide results with similar quality at a
fraction of the computational cost.
   In the case of the ARTIC pipeline, the quality of base calls mainly affects the demultiplexing
stage, and does not play as important a role in the variant calling, since this is done with the
assistance of the raw sequencing signal. Moreover, we have also demonstrated that fast base callers
can be used in the context of the RAMPART monitoring tool to identify virus lineages on-the-fly
during the sequencing. Such an application allows us to relay important information to health
authorities much faster.
   One of the advantages of the RAMPART monitoring tool is that it can monitor in real time the
coverage of all regions in all barcoded samples, allowing us to make an informed determination of
when to stop the sequencing run. As future work, we would like to use a similar framework in
connection with selective sequencing (Payne et al., 2021) to achieve a more uniform coverage between
samples, as well as to mitigate uneven coverage within samples stemming from varying efficiency of
individual PCR primers, by rejecting reads belonging to the regions that are already well covered.

Acknowledgements. This research was supported by a grant ITMS:313011ATL7 “Pangenomics for
personalized clinical management of infected persons based on identified viral genome and human
exome” from the Operational Program Integrated Infrastructure (90%) co-financed by the European
Regional Development Fund. The research was also supported by VEGA 1/0458/18 to TV (10%).


References

Boža, V., Perešíni, P., Brejová, B., and Vinař, T. (2020). Deepnano-blitz: a fast base caller for
  MinION nanopore sequencers. Bioinformatics, 36(14):4191–4192.

Boža, V., Perešíni, P., Brejová, B., and Vinař, T. (2021). Dynamic Pooling Improves Nanopore Base
  Calling Accuracy. London Calling 2021 poster.

Hadfield, J. (2021). RAMPART: Read assignment, mapping, and phylogenetic analysis in real time.
  https://github.com/artic-network/rampart.

Huang, N., Nie, F., Ni, P., Luo, F., and Wang, J. (2020). SACall: a neural network basecaller for
  Oxford Nanopore sequencing data based on self-attention mechanism. IEEE/ACM Transactions on
  Computational Biology and Bioinformatics.

Loman, N. J., Quick, J., and Simpson, J. T. (2015). A complete bacterial genome assembled de novo
  using only nanopore sequencing data. Nat Methods, 12(8):733–735.

O’Toole, A., Scher, E., Underwood, A., Jackson, B., Hill, V., McCrone, J., Ruis, C., Abu-Dahab, K.,
  Taylor, B., Yeats, C., du Plessis, L., Aanensen, D., Holmes, E., Pybus, O., and Rambaut, A.
  (2021). pangolin: lineage assignment in an emerging pandemic as an epidemiological tool.
  github.com/cov-lineages/pangolin.

Payne, A., Holmes, N., Clarke, T., Munro, R., Debebe, B. J., and Loose, M. (2021). Readfish enables
  targeted nanopore sequencing of gigabase-sized genomes. Nature biotechnology, 39(4):442–450.

Perešíni, P., Boža, V., Brejová, B., and Vinař, T. (2020). Nanopore Base Calling on the Edge.
  Technical Report arXiv:2011.04312, arXiv.

Quick, J., Loman, N. J., Duraffour, S., Simpson, J. T., Severi, E., Cowley, L., Bore, J. A.,
  Koundouno, R., Dudas, G., Mikhail, A., Ouedraogo, N., Afrough, B., Bah, A., Baum, J. H.,
  Becker-Ziaja, B., Boettcher, J. P., Cabeza-Cabrerizo, M., Camino-Sanchez, A., Carter, L. L.,
  Doerrbecker, J., Enkirch, T., Dorival, I. G. G., Hetzelt, N., Hinzmann, J., Holm, T.,
  Kafetzopoulou, L. E., Koropogui, M., Kosgey, A., Kuisma, E., Logue, C. H., Mazzarelli, A.,
  Meisel, S., Mertens, M., Michel, J., Ngabo, D., Nitzsche, K., Pallash, E., Patrono, L. V.,
  Portmann, J., Repits, J. G., Rickett, N. Y., Sachse, A., Singethan, K., Vitoriano, I.,
  Yemanaberhan, R. L., Zekeng, E. G., Trina, R., Bello, A., Sall, A. A., Faye, O., Faye, O.,
  Magassouba, N., Williams, C. V., Amburgey, V., Winona, L., Davis, E., Gerlach, J., Washington, F.,
  Monteil, V., Jourdain, M., Bererd, M., Camara, A., Somlare, H., Camara, A., Gerard, M., Bado, G.,
  Baillet, B., Delaune, D., Nebie, K. Y., Diarra, A., Savane, Y., Pallawo, R. B., Gutierrez, G. J.,
  Milhano, N., Roger, I., Williams, C. J., Yattara, F., Lewandowski, K., Taylor, J., Rachwal, P.,
  Turner, D., Pollakis, G., Hiscox, J. A., Matthews, D. A., O’Shea, M. K., Johnston, A. M.,
  Wilson, D., Hutley, E., Smit, E., Di Caro, A., Woelfel, R., Stoecker, K., Fleischmann, E.,
  Gabriel, M., Weller, S. A., Koivogui, L., Diallo, B., Keita, S., Rambaut, A., Formenty, P.,
  Gunther, S., and Carroll, M. W. (2016). Real-time, portable genome sequencing for Ebola
  surveillance. Nature, 530(7589):228–232.

Rambaut, A., Holmes, E. C., O’Toole, A., Hill, V., McCrone, J. T., Ruis, C., du Plessis, L., and
  Pybus, O. G. (2020). A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic
  epidemiology. Nat Microbiol, 5(11):1403–1407.

Resende, P. C. et al. (2020). SARS-CoV-2 genomes recovered by long amplicon tiling multiplex
  approach using nanopore sequencing and applicable to other sequencing platforms. Technical Report
  doi:10.1101/2020.04.30.069039, bioRxiv.

Seymour, C. (2020). Bonito: A PyTorch basecaller for Oxford Nanopore reads.
  https://github.com/nanoporetech/bonito.

Tyson, J. R. et al. (2020). Improvements to the ARTIC multiplex PCR method for SARS-CoV-2 genome
  sequencing using nanopore. Technical Report doi:10.1101/2020.09.04.283077, bioRxiv.

</pre>