=Paper=
{{Paper
|id=Vol-2962/paper11
|storemode=property
|title=Alternative Base Callers Aid Real-time Analysis of SARS-CoV-2 Sequencing Runs
|pdfUrl=https://ceur-ws.org/Vol-2962/paper11.pdf
|volume=Vol-2962
|authors=Vladimír Boža,Matej Fedor,Kristína Boršová,Viktória Čabanová,Jana Černíková,Viktória Hodorová,Peter Perešíni,Klára Sládečková,Boris Klempa,Jozef Nosek,Broňa Brejová,Tomáš Vinař
|dblpUrl=https://dblp.org/rec/conf/itat/BozaFBCCHPSKNBV21
}}
==Alternative Base Callers Aid Real-time Analysis of SARS-CoV-2 Sequencing Runs==
Vladimír Boža<sup>1</sup>, Matej Fedor<sup>1</sup>, Kristína Boršová<sup>2,3</sup>, Viktória Čabanová<sup>3</sup>, Jana Černíková<sup>1</sup>, Viktória Hodorová<sup>2</sup>, Peter Perešíni<sup>1</sup>, Klára Sládečková<sup>1</sup>, Boris Klempa<sup>3</sup>, Jozef Nosek<sup>2</sup>, Broňa Brejová<sup>1</sup>, Tomáš Vinař<sup>1</sup>

<sup>1</sup> Faculty of Mathematics, Physics and Informatics, Comenius University, Bratislava, Slovakia

<sup>2</sup> Faculty of Natural Sciences, Comenius University, Bratislava, Slovakia

<sup>3</sup> Biomedical Research Center of the Slovak Academy of Sciences, Bratislava, Slovakia

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract: One of the advantages of nanopore sequencing is its ability to provide data in real time, which allows monitoring, early stopping, and fast identification of mutations in sequenced material. A nanopore sequencer measures the electrical current induced by DNA passing through a pore, and this signal needs to be translated to a string over the alphabet {A,C,G,T} through a process called base calling. To achieve base calling in real time, the mainstream tools (such as Guppy provided by Oxford Nanopore Technologies) require the support of high-performance GPUs. This is prohibitive in many settings. Here, we evaluate the accuracy of several alternative base callers, which only require a desktop CPU or a low-cost USB-connected accelerator. While their accuracy is, in general, lower than that of Guppy in high-accuracy mode using GPUs, we show that these alternative base callers can act as a replacement for monitoring and mutation detection in SARS-CoV-2 sequencing runs, without sacrificing the accuracy of the final result.

Availability: http://compbio.fmph.uniba.sk/sars-cov-2-sequencing/

===1 Introduction===

The ARTIC protocol was originally developed for sequencing viral genomes with nanopore sequencing devices (Quick et al., 2016), and it has become a commonly used protocol for SARS-CoV-2 sequencing (Tyson et al., 2020). Briefly, overlapping segments of the viral genome are first amplified using PCR, and the resulting amplicons are sequenced using nanopore sequencing (see a simplified illustration in Figure 1). Typically, multiple samples are sequenced in parallel using barcoding. In bioinformatics post-processing, the individual reads are first assigned to individual samples, using demultiplexing according to the barcodes. Stricter parameters (requiring the presence of barcodes on both ends of the read) are typically used in order to avoid barcode bleeding and to discard partially sequenced reads. The reads are then aligned to the reference genome and mutations are discovered with the aid of the raw sequencing signal using Nanopolish (Loman et al., 2015).

One of the problems with this protocol is that the PCR amplification step introduces wide variation in coverage, both between samples and between different amplicons within a sample. Due to the high error rate of nanopore sequencing, it is not advisable to determine mutations in regions with low coverage (the standard pipeline sets the coverage threshold at 20). In such scenarios, it is difficult to estimate when to stop data acquisition. Fortunately, results of nanopore sequencing can be processed in real time, and on-the-fly monitoring during sequencing helps to inform decisions on when to stop the run.

A nanopore sequencer reads an electrical signal induced by the DNA passing through a pore, and before subsequent analysis this signal needs to be translated to DNA bases via base calling. The base caller provided by the manufacturer (called Guppy) requires a machine with a high-performance GPU, which is not available in many laptop computers and is also a problem in desktops due to current NVIDIA GPU shortages.

In this work, we propose to use alternative base callers with lower demands on computational resources, albeit producing reads with a slightly lower accuracy (Boža et al., 2020; Perešíni et al., 2020; Boža et al., 2021). We demonstrate that using our alternative base callers not only allows monitoring, but can also produce a final sequence of similar quality to that obtained with the standard base caller. Moreover, we are able to call tentative variants during sequencing from incomplete sequence using a custom-made classifier. This allows us to report important information about virus lineage determination already during the sequencing run, well before the full sequence is determined.

===2 Evaluation of Alternative Base Callers===

We have evaluated three alternative base callers that can achieve real-time base calling without the use of a GPU: Deepnano-blitz (Boža et al., 2020), Deepnano-Coral (Perešíni et al., 2020), and Osprey (Boža et al., 2021). There are also other alternative base callers such as Bonito (Seymour, 2020) and SACall (Huang et al., 2020), but none of them offers real-time base calling on a CPU or a low-power USB-connected TPU.

Figure 1: A simplified illustration of the ARTIC protocol workflow. (a) First, each virus sample is amplified using target-specific primers. Two pools of primers are used to obtain overlapping amplicons. (b) Amplicons from multiple samples are tagged using barcodes and sequenced together. (c) All reads are sequenced on a MinION device, which produces an electrical signal for each sequenced molecule. (d) The electrical signal is converted to a string using base calling software. (e) Barcodes at the ends of reads are recognized by the demultiplexing software and individual reads are assigned to their samples. (f) Reads are mapped to the reference. (g) Mutations are called based on the consensus of multiple reads. [Figure not reproduced; panel labels: Input cDNA, Primer pool 1, Primer pool 2, Combination of pools.]

Deepnano-blitz (Boža et al., 2020) is a real-time CPU base caller based on recurrent neural networks. Deepnano-blitz allows adjustment of the time vs. accuracy tradeoff by changing the size of the neural network model. The smaller version (48) can run in real time on a single CPU core; the larger version (96) requires multiple cores to achieve real-time performance. The accuracy of the smaller version is slightly lower than the accuracy of Guppy 4.4 in the fast mode; the larger version is comparable to Guppy 4.4 in the fast mode.

Deepnano-Coral (Perešíni et al., 2020) is a convolutional neural network base caller. It requires a Coral Edge TPU, which is a sub-$100 accelerator from Google that connects to a USB port and has very low power requirements. Deepnano-Coral is best suited for laptop computers that do not have GPU support, as well as for scenarios where power consumption may become a limiting factor (such as sequencing in the field). The accuracy of real-time base calling with Deepnano-Coral falls between Guppy 4.4 fast and high-accuracy (HAC) modes.

Osprey (Boža et al., 2021) is a CPU-based base caller that uses an architecture similar to Deepnano-Coral, but is further improved by using a technique called dynamic pooling and decoding via transducers. The accuracy of real-time base calling is equivalent to Guppy 3.4 HAC and better than Guppy 4.4 fast. Computational requirements are similar to Deepnano-blitz 96.

Using faster base callers usually results in sacrificing accuracy at the individual read level. However, in the case of the ARTIC pipeline, multiple reads are aligned to each region, and only differences that consistently occur in many reads are considered proper variants. Moreover, the ARTIC pipeline uses Nanopolish (Loman et al., 2015), which works directly with the raw sequencing signal, as an underlying variant caller. Therefore the base calling accuracy is not as important, since base calls are only used for demultiplexing and for the initial alignment of the read to the reference in Nanopolish.

The ARTIC pipeline sometimes calls a particular base as unknown (denoted as N in the sequence). This can happen for two reasons: low coverage of an amplicon or conflicting information from different sequencing reads. Assigning an unknown base represents a conservative decision and is used wherever it is impossible to decide with high enough confidence whether a particular base is the same as the reference or represents a mutation.

We have evaluated the performance of each of the above-mentioned base callers in the context of the ARTIC pipeline. For evaluation purposes, we have used a sequencing run from January 13, 2021 with 23 barcoded SARS-CoV-2 samples (the 24th sample was excluded due to very low coverage) using a MinION run with an R9.4.1 flow cell, LSK109 chemistry, and the 2-kbp amplicon scheme by Resende et al. (2020). In the standard software pipeline, we use Guppy 4.4 (the highest version available at the time of analysis) in the high-accuracy mode to base call the reads, followed by the ARTIC pipeline for variant calling. We used the results of this standard pipeline as the ground truth.

For each of the above-mentioned base callers, as well as for Guppy 3.4 in the high-accuracy mode, we reran the ARTIC pipeline with their base calls and compared the results (see Table 1). Guppy 3.4 was used as a representative base caller from a year ago. We also ran all of our base callers with a lower demultiplexing threshold, which slightly increases the coverage, due to more reads being demultiplexed to individual samples.

{| class="wikitable"
|+ Table 1: Comparison of the results of the ARTIC pipeline using different base callers. The values represent the total number and median of differences in 23 consensus sequences compared to the baseline. N→B: position marked as unknown in the baseline was resolved as a base. B→N: position resolved as a base in the baseline was marked as unknown. B→B: a different base was called. Q50: the required demultiplexing score was lowered from 60 to 50.
! rowspan="2" | Base caller
! colspan="3" | Total, 23 samples
! colspan="3" | Median, 23 samples
! rowspan="2" | Hardware to achieve real time
|-
! N→B !! B→N !! B→B !! N→B !! B→N !! B→B
|-
| Guppy 3.4 HAC || 2 || 116 || 0 || 0 || 0 || 0 || High-performance GPU
|-
| Blitz48 || 4 || 1135 || 1 || 0 || 4 || 0 || rowspan="2" | Single desktop CPU core
|-
| Blitz48-Q50 || 4 || 404 || 2 || 0 || 3 || 0
|-
| Blitz96 || 4 || 273 || 0 || 0 || 1 || 0 || rowspan="2" | Multi-core desktop CPU
|-
| Blitz96-Q50 || 4 || 131 || 0 || 0 || 2 || 0
|-
| Coral || 4 || 261 || 0 || 0 || 1 || 0 || rowspan="2" | Sub-$100 Coral accelerator
|-
| Coral-Q50 || 80 || 113 || 0 || 0 || 1 || 0
|-
| Osprey || 4 || 253 || 1 || 0 || 0 || 0 || rowspan="2" | Multi-core desktop CPU
|-
| Osprey-Q50 || 4 || 108 || 1 || 0 || 0 || 0
|}

Only very few positions (up to 2 in 23 samples) are called differently (B→B column). Even though these clearly represent erroneous base calls (see Table 2), there are so few of them that they do not impact the overall accuracy significantly. The largest problem is an increased number of “unknown” calls (B→N column). These are mainly concentrated within a single 310 bp region (21242-21551), which in several samples had an extremely low coverage (see Figure 2). With lower efficiency of demultiplexing due to base calling errors, the coverage of this region was in some samples pushed below the minimum coverage threshold of the ARTIC pipeline, and the region was consequently masked with Ns in the result. There were several additional “unknown” calls of individual bases, which were clustered around certain positions in the genome. We suspect that this is due to some biases stemming from nanopore sequencing, where variants of some bases in certain contexts are difficult to distinguish.

{| class="wikitable"
|+ Table 2: Mutations identified by various base callers in addition to the gold standard calls.
! Base caller !! Barcode !! Position !! Reference !! Variant !! Coverage !! Notes
|-
| Blitz48 || 18 || 22009 || C || CA || 31 || rowspan="2" | Frameshift, thus likely invalid mutation
|-
| Blitz48-Q50 || 18 || 22009 || C || CA || 46
|-
| Blitz48-Q50 || 7 || 1706 || TC || T || 473 || Frameshift, thus likely invalid mutation
|-
| Osprey || 23 || 237 || G || GT || 25 || rowspan="2" | Not present in GISAID before, probably invalid
|-
| Osprey-Q50 || 23 || 237 || G || GT || 30
|}

Figure 2: Variance in the coverage within and between samples. [Figure not reproduced; x-axis: Position (0 to 30000), y-axis: Coverage (logarithmic, 10<sup>0</sup> to 10<sup>4</sup>), one curve per barcode.]

On the other hand, some additional bases are called compared to the baseline (N→B column). In all cases, these were called as the original reference. Almost all cases were at positions 16255 and 16256, and one case was in the region 21220-21296, where Coral-Q50 increased the coverage over the minimum threshold.

While in some cases the use of our alternative base callers may result in an incomplete sequence (compared to the baseline), in general our results show that each of these tools is a viable alternative to the standard base calling with Guppy 4.4 in high-accuracy mode, with similar quality of the final sequence. While Guppy 4.4 HAC requires a high-performance GPU to have a reasonable running time, the alternatives only require a CPU or a sub-$100 accelerator connected through a standard USB port.

===3 Determining Virus Lineages During Sequencing from Incomplete Data===

One of the key tasks in the analysis of sequenced SARS-CoV-2 samples is determination of the virus lineage according to the standardized lineage classification (Rambaut et al., 2020). The standard tool to accomplish this task is pangolin (O’Toole et al., 2021), which uses a machine learning approach to determine the lineage from the finished sequence. Pangolin currently fits a (single) decision tree classifier to sequence data to determine the lineage. While this approach seems to have high accuracy for complete sequence data, it handles incomplete sequences by simply filling them using bases from the reference sequence. This naturally leads to unpredictable changes in classification as the sequence is being completed, since each new mutation might lead to a complete change in the decision tree path.

To quickly make a provisional lineage classification, even for incomplete sequences during the sequencing, we propose a simple classification scheme based on a manually curated list of characteristic mutations. We identify a list of these characteristic mutations for expected lineages of interest for a particular country at a particular time, and each lineage also has a threshold for the number of mutations required to be present to make a call, as shown in an example in Figure 3.

{| class="wikitable"
|+ Figure 3: Mutation specification for determining virus lineages. Note that only certain lineages of interest are included. For example, to identify a sample as the B.1.1.7 variant, we require 14 out of the 16 mutations listed.
! Lineage !! Threshold !! Characteristic mutations
|-
| B.1.1.7 || 14 || C3267T C5388A T6954C A23063T C23271A C23604A C23709T T24506G G24914C C27972T G28048T A28111G C28977T G28280C A28281T T28282A
|-
| B.1.160 || 5 || G9526T G15766T A16889G G17019T G22992A T26876C
|-
| B.1.177 || 6 || T445C C6286T G21255C C22227T C26801G C28932T G29645T
|-
| B.1.258 || 4 || G12988T G15598A G18028T T24910C T26972C
|-
| B.1.221 || 4 || C21855T A25505G G25906C C28651T C28869T
|-
| A.23.1 || 3 || C10747T G11521T C23604G T24097C
|-
| B.1.351 || 6 || G174T G5230T G23012A A21801C A23063T C28253T C23664T
|-
| P.1 || 8 || T733C C2749T C3828T A5648C C12778T C13860T G17259T C21614T C21621A
|-
| P.2 || 7 || T10667G C11824T A12964G G23012A C28253T G28628T G28975T C29754T
|-
| B.1.427/9 || 4 || G17014T G22018T C26681T A28272T C28887T
|-
| B.1.525 || 8 || C1498T A1807G G2659A C6285T T8593C C14407T A21717G C21762T T24224C
|-
| B.1.526 || 8 || T9867C C25517T C27925T A20262G C21575T C21846T A22320G C23664T C28869T
|-
| B.1.617 || 8 || G210T T22917G C23604G C25469T T27638C G28881T G29402T G29742T
|-
| R.1 || 8 || C14340T G17551A C18877T A19167G C19274A G22017T G23012A G23868T T26604C
|}

During the sequencing run, we use a fast base caller (Deepnano-blitz 48 in our experiments) to provide live base calling. By aligning individual sequencing reads to the reference sequence and simply counting the support for a mutation at a particular position, we make provisional variant calls. Note that this would be highly imprecise for insertions and deletions due to the frequent indels in nanopore sequencing reads. For this reason, we only focus on single nucleotide variants. Once the number of characteristic mutations passes the threshold for a particular sample, the lineage is provisionally called.

We have integrated our tool within the RAMPART sequencing run monitoring framework (Hadfield, 2021) and tested its performance on three runs: one run with 24 barcoded samples, and two with 96 barcoded samples each. There were no disagreements when both our tool and pangolin called the lineage; however, in certain cases one or the other tool did not make a call (see a summary of results in Table 3).

{| class="wikitable"
|+ Table 3: Comparison of our lineage identification with pangolin results. There were no disagreements (if both tools identified a lineage, the output was always the same). In a small number of cases, a lineage was identified only by one of the tools.
! rowspan="2" | Dataset date
! rowspan="2" | Barcodes
! colspan="3" | Lineage identified by
|-
! both !! pangolin !! our tool
|-
| 2021-02-03 || 24 || 22 || 2 || 0
|-
| 2021-03-11 || 96 || 80 || 1 || 0
|-
| 2021-03-25 || 96 || 70 || 1 || 2
|}

Figure 4 shows that our tool can provide early information about lineages detected in the sequencing run. Even though barcodes in our samples were highly unbalanced, some samples can be identified within minutes of starting the run, and our tool has provided accurate detection of lineages for 50% of barcodes as early as 40 minutes from the start of a 96-barcode run. Due to the low quality of some samples, we typically run the sequencing for approximately 24 hours, so such on-the-fly analysis provides us with an opportunity to report the basic information on sequenced samples to health authorities as early as one day before the final analysis is finished.

Figure 4: The number of samples with lineage classification over time. Horizontal lines show the number of classified samples at the end of the sequencing run. [Figure not reproduced; x-axis: Minutes from start of sequencing (0 to 250), y-axis: Samples classified (0 to 80), series: 24 barcodes, 2021-02-03; 96 barcodes, 2021-03-11; 96 barcodes, 2021-03-25.]

While our determination of single nucleotide sequence variants is somewhat simplistic, Figure 5 shows that on real data even such a simple method can achieve results with high confidence. In all cases, mutations were supported by over 85% of reads and there were no calls that would suffer from ambiguity.

Figure 5: The percentage of overlapping reads supporting individual identified mutations over time. [Figure not reproduced; panel title: NB71 - B.1.1.7, x-axis: Minutes from start of sequencing (0 to 500), y-axis: % of mutated bases (0.86 to 1.00), one curve for each of the 16 B.1.1.7 mutations.]

===4 Conclusions and Discussion===

One of the great advantages of nanopore sequencing is the ability to analyze data as they are sequenced. Fast base callers that can replace the default base callers provided by Oxford Nanopore Technologies are key to utilizing this advantage. Here, we have evaluated fast base callers in the context of the ARTIC pipeline and determined that they can provide results of similar quality at a fraction of the computational cost.

In the case of the ARTIC pipeline, the quality of base calls mainly affects the demultiplexing stage, and does not play as important a role in variant calling, since this is done with the assistance of the raw sequencing signal. Moreover, we have also demonstrated that fast base callers can be used in the context of the RAMPART monitoring tool to identify virus lineages on-the-fly during the sequencing. Such an application allows us to relay important information to health authorities much faster.

One of the advantages of the RAMPART monitoring tool is that it can monitor in real time the coverage of all regions in all barcoded samples, allowing us to make an informed determination of when to stop the sequencing run. As future work, we would like to use a similar framework in connection with selective sequencing (Payne et al., 2021) to achieve a more uniform coverage between samples, as well as to mitigate uneven coverage within samples stemming from varying efficiency of individual PCR primers, by rejecting reads belonging to the regions that are already well covered.

Acknowledgements. This research was supported by a grant ITMS:313011ATL7 “Pangenomics for personalized clinical management of infected persons based on identified viral genome and human exome” from the Operational Program Integrated Infrastructure (90%) co-financed by the European Regional Development Fund. The research was also supported by VEGA 1/0458/18 to TV (10%).

===References===

* Boža, V., Perešíni, P., Brejová, B., and Vinař, T. (2020). Deepnano-blitz: a fast base caller for minion nanopore sequencers. Bioinformatics, 36(14):4191–4192.
* Boža, V., Perešíni, P., Brejová, B., and Vinař, T. (2021). Dynamic Pooling Improves Nanopore Base Calling Accuracy. London Calling 2021 poster.
* Hadfield, J. (2021). Rampart: Read assignment, mapping, and phylogenetic analysis in real time. https://github.com/artic-network/rampart.
* Huang, N., Nie, F., Ni, P., Luo, F., and Wang, J. (2020). Sacall: a neural network basecaller for oxford nanopore sequencing data based on self-attention mechanism. IEEE/ACM Transactions on Computational Biology and Bioinformatics.
* Loman, N. J., Quick, J., and Simpson, J. T. (2015). A complete bacterial genome assembled de novo using only nanopore sequencing data. Nat Methods, 12(8):733–735.
* O’Toole, A., Scher, E., Underwood, A., Jackson, B., Hill, V., McCrone, J., Ruis, C., Abu-Dahab, K., Taylor, B., Yeats, C., du Plessis, L., Aanensen, D., Holmes, E., Pybus, O., and Rambaut, A. (2021). pangolin: lineage assignment in an emerging pandemic as an epidemiological tool. github.com/cov-lineages/pangolin.
* Payne, A., Holmes, N., Clarke, T., Munro, R., Debebe, B. J., and Loose, M. (2021). Readfish enables targeted nanopore sequencing of gigabase-sized genomes. Nature biotechnology, 39(4):442–450.
* Perešíni, P., Boža, V., Brejová, B., and Vinař, T. (2020). Nanopore Base Calling on the Edge. Technical Report arXiv:2011.04312, arXiv.
* Quick, J., Loman, N. J., Duraffour, S., Simpson, J. T., Severi, E., Cowley, L., Bore, J. A., Koundouno, R., Dudas, G., Mikhail, A., Ouedraogo, N., Afrough, B., Bah, A., Baum, J. H., Becker-Ziaja, B., Boettcher, J. P., Cabeza-Cabrerizo, M., Camino-Sanchez, A., Carter, L. L., Doerrbecker, J., Enkirch, T., Dorival, I. G. G., Hetzelt, N., Hinzmann, J., Holm, T., Kafetzopoulou, L. E., Koropogui, M., Kosgey, A., Kuisma, E., Logue, C. H., Mazzarelli, A., Meisel, S., Mertens, M., Michel, J., Ngabo, D., Nitzsche, K., Pallash, E., Patrono, L. V., Portmann, J., Repits, J. G., Rickett, N. Y., Sachse, A., Singethan, K., Vitoriano, I., Yemanaberhan, R. L., Zekeng, E. G., Trina, R., Bello, A., Sall, A. A., Faye, O., Faye, O., Magassouba, N., Williams, C. V., Amburgey, V., Winona, L., Davis, E., Gerlach, J., Washington, F., Monteil, V., Jourdain, M., Bererd, M., Camara, A., Somlare, H., Camara, A., Gerard, M., Bado, G., Baillet, B., Delaune, D., Nebie, K. Y., Diarra, A., Savane, Y., Pallawo, R. B., Gutierrez, G. J., Milhano, N., Roger, I., Williams, C. J., Yattara, F., Lewandowski, K., Taylor, J., Rachwal, P., Turner, D., Pollakis, G., Hiscox, J. A., Matthews, D. A., O’Shea, M. K., Johnston, A. M., Wilson, D., Hutley, E., Smit, E., Di Caro, A., Woelfel, R., Stoecker, K., Fleischmann, E., Gabriel, M., Weller, S. A., Koivogui, L., Diallo, B., Keita, S., Rambaut, A., Formenty, P., Gunther, S., and Carroll, M. W. (2016). Real-time, portable genome sequencing for Ebola surveillance. Nature, 530(7589):228–232.
* Rambaut, A., Holmes, E. C., O’Toole, A., Hill, V., McCrone, J. T., Ruis, C., du Plessis, L., and Pybus, O. G. (2020). A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat Microbiol, 5(11):1403–1407.
* Resende, P. C. et al. (2020). SARS-CoV-2 genomes recovered by long amplicon tiling multiplex approach using nanopore sequencing and applicable to other sequencing platforms. Technical Report doi:10.1101/2020.04.30.069039, bioRxiv.
* Seymour, C. (2020). Bonito: A pytorch basecaller for oxford nanopore reads. https://github.com/nanoporetech/bonito.
* Tyson, J. R. et al. (2020). Improvements to the ARTIC multiplex PCR method for SARS-CoV-2 genome sequencing using nanopore. Technical Report doi:10.1101/2020.09.04.283077, bioRxiv.
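
The provisional lineage call described in Section 3 (count the characteristic mutations supported by the aligned reads and call a lineage once its threshold is reached) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the thresholds and mutation lists for two lineages are copied from Figure 3, while the function names, the `pileup` input format (reference position mapped to per-base read counts) and the `MIN_DEPTH` and `MIN_SUPPORT` cut-offs are assumptions made here for the example.

```python
from collections import Counter

# Thresholds and mutation lists copied from Figure 3 (two lineages only).
LINEAGES = {
    "B.1.1.7": (14, [
        "C3267T", "C5388A", "T6954C", "A23063T", "C23271A", "C23604A",
        "C23709T", "T24506G", "G24914C", "C27972T", "G28048T", "A28111G",
        "C28977T", "G28280C", "A28281T", "T28282A",
    ]),
    "B.1.258": (4, ["G12988T", "G15598A", "G18028T", "T24910C", "T26972C"]),
}

# Hypothetical cut-offs chosen for illustration; they are not taken from the paper.
MIN_DEPTH = 10      # minimum number of reads covering the position
MIN_SUPPORT = 0.5   # minimum fraction of those reads showing the variant base


def is_supported(mutation, pileup):
    """Return True if most reads covering the position show the variant base."""
    position, variant = int(mutation[1:-1]), mutation[-1]
    counts = pileup.get(position, {})
    depth = sum(counts.values())
    return depth >= MIN_DEPTH and counts.get(variant, 0) / depth > MIN_SUPPORT


def call_lineage(pileup):
    """Return a provisional lineage, or None unless exactly one lineage passes."""
    passing = [
        lineage
        for lineage, (threshold, mutations) in LINEAGES.items()
        if sum(is_supported(m, pileup) for m in mutations) >= threshold
    ]
    return passing[0] if len(passing) == 1 else None


# Usage: 14 of the 16 B.1.1.7 mutations are each observed in 30 reads.
observed = LINEAGES["B.1.1.7"][1][:14]
pileup = {int(m[1:-1]): Counter({m[-1]: 30}) for m in observed}
print(call_lineage(pileup))  # prints B.1.1.7
```

Returning no call when several lineages pass at once is a conservative choice made for this sketch; some characteristic mutations in Figure 3 (for example A23063T) are shared between lineages, so a real tool would need an explicit tie-breaking rule.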
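
The N→B, B→N and B→B categories reported in Table 1 amount to a position-by-position comparison of a consensus sequence against the baseline consensus. The sketch below shows one way to compute them; it assumes that both sequences are in reference coordinates and have equal length (insertions and deletions would need a proper alignment first), and the function name and the ASCII category labels are hypothetical.

```python
def compare_consensus(baseline, candidate):
    """Count differences between two equal-length consensus sequences.

    "N->B": unknown in the baseline, resolved as a base in the candidate.
    "B->N": resolved in the baseline, unknown in the candidate.
    "B->B": both resolved, but to different bases.
    """
    if len(baseline) != len(candidate):
        raise ValueError("sequences must have equal length")
    counts = {"N->B": 0, "B->N": 0, "B->B": 0}
    for base, cand in zip(baseline.upper(), candidate.upper()):
        if base == cand:
            continue
        if base == "N":
            counts["N->B"] += 1
        elif cand == "N":
            counts["B->N"] += 1
        else:
            counts["B->B"] += 1
    return counts


# Usage: one masked position, one newly resolved position, no changed base.
print(compare_consensus("ACGTN", "ACNTA"))
```

Summing these counts over all samples gives totals of the kind shown in the first three columns of Table 1, and taking the per-sample median gives the next three.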