=Paper=
{{Paper
|id=Vol-2962/paper10
|storemode=property
|title=Estimation of Proportions of SARS-CoV-2 Variants in a Mixed Sequencing Sample
|pdfUrl=https://ceur-ws.org/Vol-2962/paper10.pdf
|volume=Vol-2962
|authors=Askar Gafurov,Andrej Baláž,Tomáš Vinař,Broňa Brejová
|dblpUrl=https://dblp.org/rec/conf/itat/GafurovBVB21
}}
==Estimation of Proportions of SARS-CoV-2 Variants in a Mixed Sequencing Sample==
Estimation of Proportions of SARS-CoV-2 Variants in a Mixed Sequencing Sample Askar Gafurov1 , Andrej Baláž2 , Tomáš Vinař2 , and Broňa Brejová1 1 Department of Computer Science, Faculty of Mathematics, Physics and Informatics, Comenius University in Bratislava, Slovakia 2 Department of Applied Informatics, Faculty of Mathematics, Physics and Informatics, Comenius University in Bratislava, Slovakia Abstract: Estimation of proportions of SARS-CoV-2 2 Problem description genome variants (e.g. variant B.1.1.7 originating from Britain, variant B.1.351 originating from South-Africa) in 2.1 Variants a population is currently done by sequencing individual In our experiments, we consider several variants of the samples, which demands individual laboratory processing SARS-CoV-2 virus, which are characterized by specific of each sample. This labor can be significantly reduced mutations described in the Table 1. For instance, if a par- by mixing several samples together and processing them ticular SARS-CoV-2 genome has the nucleotide at posi- in one batch. Our project aims to estimate the proportion tion 3267 mutated from cytosine to thymine, it has one of samples with given variants from such mixtures using of the mutations characteristic for the variant which origi- probabilistic modeling. nated in the United Kingdom [14] and is known as a vari- ant of concern alpha [16]. In the Pangolin SARS-CoV-2 1 Introduction lineage classification [15], it is denoted B.1.1.7. In this paper, we will denote this variant by acronym UK listed Since December 2019, the severe acute respiratory syn- in the table. In our list of variants, we have included two drome coronavirus 2 (SARS-CoV-2) is rapidly spreading additional variant of concern (beta and gamma) as well as across the world, causing the coronavirus infectious dis- several variants that had high prevalence in Slovakia in the ease (COVID-19) pandemic. Up to April 5, 2021, there fall of 2020 and early 2021, which is the time from which were 132 mil. cases of COVID-19 infection. Among these, our data originate. For each variant, the table also con- 109 mil. cases have already had an outcome from which tains the minimum number of these characteristic muta- there was 2.8 mil. deaths, resulting in an estimate of the tions which need to be observed in the genome in order to mortality rate around 2.5% [18]. be considered as belonging to a given variant in our study. Due to its worldwide spread and a mutation rate of about If a genome did not reach the number of characteristic mu- 2 mutations per month [9], the SARS-CoV-2 developed tations for any considered variant, it was assigned the label multiple variants. Some emerging variants have accumu- “other”. lated significantly more mutations and proved to be more This characterization was used on all samples from the dangerous, causing concerns around the globe [4]. Scien- GISAID database (downloaded on Mar 4, 2021) [5], which tists estimate that continued transmission of SARS-CoV-2 provides a comprehensive resource of more than 650 thou- and selective pressures, such as vaccines, are creating ideal sands fully assembled SARS-CoV-2 genomes. Individ- conditions for additional significant virus evolution [11]. ual genomes were aligned to the SARS-CoV-2 reference Therefore, it is critical to characterize the virus strains fur- Wuhan/Hu-1/2019 by minimap2 [10]. Then, each se- ther and monitor the spread of the variants in the popula- quence was assigned a label as described earlier. Fi- tion in order to inform public policies and assess the effec- nally, a matrix P was constructed, which consisted of tiveness of containment strategies. entries pk,i,a representing the relative frequency of nu- Monitoring of the SARS-CoV-2 variants requires con- cleotide a at genomic position i of variant k among aligned ducting many sequencing experiments to determine the genomes. genome of the virus and to identify its mutations compared This matrix thus characterizes each variant not only to a reference genome. Based on this, it is then possi- by the selected mutations listed in Table 1, but also cap- ble to decide which variant was sequenced. Some part of tures any additional significant mutations present in the the laboratory work is done individually for each sample, genomic sequences classified to the variant. This approach which is time-consuming and resource-intensive. In this can be easily applied to a different set of variants by simply work, we propose a model capable of estimating the pro- modifying the input table of variants and their characteris- portions of different variants in a pooled sample, which tic mutations. allows to combine multiple samples into one sequencing experiment. 2.2 Sequencing data Copyright c 2021 for this paper by its authors. Use permitted un- der Creative Commons License Attribution 4.0 International (CC BY In order to identify the SARS-CoV-2 variant of a new 4.0). patient, it is necessary to sequence an obtained sam- Variant Acronym Required Mutations P.1 (gamma) Brazil 8 T733C, C2749T, C3828T, A5648C, C12778T, B.1.258 CZ 4 G12988T, G15598A, G18028T, T24910C, T26972C B.1.177 EU1 4 C22227T, C28932T, G29645T, G21255C, C26801G B.1.160 EU2 3 C4543T, G5629T, G22992A, T26876C B.1.351 (beta) South.Afr 3 G23012A, A21801C, A23063T, G22813T B.1.1.7 (alpha) UK 11 C3267T, C5388A, T6954C, A23063T, C23271A, C23604A, C23709T, T24506G, G24914C, C27972T, G28048T, A28111G, C28977T C13860T, G17259T, C21614T, C21621A B.1.1.170 UKBA318 3 G12824A, C25777T, G26062T, C29754T Table 1: Characteristic mutations of individual variants considered in this study. ple. In this work, the used samples were sequenced us- 5 months after the analysis. This suggests that sewage ing the ARTIC amplicon protocol [17] and the Oxford samples may provide a more comprehensive snapshot of Nanopore MinION sequencer. currently circulating SARS-CoV-2 variants in comparison In the ARTIC protocol, the RNA of the virus is first with solely clinical cases. reverse transcribed into DNA and specific regions of the DNA are amplified. These amplicons cover almost A wastewater study conducted in Spain [13] identified the entire length of the genome with slight overlaps be- 238 SNVs and 6 deletions in comparison with the refer- tween adjacent regions. In our data, the amplicons had ence genome of SARS-CoV-2 isolate Wuhan-Hu-1. The length about 1700 bp. Amplified DNA was sequenced study used 40 samples, which were sequenced with AR- by the MinION sequencer, and the resulting sequences are TIC protocol v.3, analysed using the iVAR software. called reads. The reads are then mapped to the reference genome by tool minimap2 [10] and stored in the BAM for- In the wastewater study from Switzerland [7], the au- mat. thors showed that it is possible to use Illumina reads to de- Because each read comes from a genome of a specific tect the variants before they appear in clinical cases. The variant, it is expected that it contains mutations that are 48 samples were collected as 24-hours composite or grab characteristic for this variant. When analyzing a mixture samples. Amplicons were created using the ARTIC v.3 of reads from different patient samples, the task at hand is protocol and sequenced using Illumina NovaSeq 6000, re- to estimate the proportions of reads coming from individ- sulting in paired reads of length 250bp. The mutations ual variants, based on the presence of mutations character- were identified from these reads using the V-pipe bioin- istic for these variants. formatics pipeline. These mutations were then compared with the clinical cases from Switzerland in the GISAID 3 Related work database for the presence of characteristic mutations of the B.1.1.7 and B.1.351 variants using Fisher’s exact test. There had been several attempts to identify SARS-CoV-2 The authors also looked at mutations co-occurring in the variants from mixed samples. Most commonly, these were same read and found further evidence of the presence of wastewater-based epidemiological studies. the B.1.1.7 variant in Switzerland already in early Decem- Crits-Christoph et al. [3] showed that single nucleotide ber. variants (SNVs)1 detected from the sewage water from San Francisco Bay Area were significantly similar to local Mixed samples can also originate from environmental California-based patient-derived genotypes, thus demon- sources other than sewage water. The first report on re- strating the possibility of identifying local SARS-CoV-2 covering near-complete SARS-CoV-2 genome sequences variants in the wastewater samples. The SNVs were ob- from environmental surface swabs assessed the contami- tained via SNV caller inStrain v1.3.2 [12] and the similar- nation with the virus in a hospital [2]. The authors of this ity of genotypes was established via Fisher’s exact test. study confirmed the low likelihood that SARS-CoV-2 con- Another study [6] analyzed 91 wastewater samples from tamination on hospital surfaces contains infectious virus. 11 states in the USA and identified 7973 SNVs, of which 5680 were “novel” at the time of the analysis with respect Although many studies have identified SARS-CoV-2 to the global clinically derived data. Interestingly, almost variants from mixed environmental samples, all have qual- half of the “novel” variants were confirmed within the next itative results. In this paper, we estimate the proportions 1 Single nucleotide variants are simple mutations substituting one of the variants in the sample, which we believe is corre- DNA base for another, not to be confused with virus variants, which are lated with the proportions of the variants circulating in the groups of evolutionarily related viruses. community. 4 Model description do the memory requirements of data preprocessing, allow- ing for the efficient processing of gigabytes of sequencing In our model, we assume that in a set of aligned reads data even on common computers. All processes are easily coming from one variant, the expected observed symbol parallelizable and could be potentially accelerated using counts at any given position in genome are proportional a GPU. to the symbol frequencies among all sequences in a given variant, which are captured in matrix P described in sub- 4.3 Adding error model section 2.1. The observed symbol counts at a particular position can be thus described by a multinomial distribu- Sequencing reads contain errors, where the symbol in the tion. To simplify the model, we further assume that indi- read differs from the actual genome being sequenced. We vidual positions in the genome are independent. When se- assume that these errors occur uniformly at random and quencing a mixed sample, we assume that individual vari- add them to the model as follows. Let ε ∈ (0, 1) be the ants are present at some unknown proportions which are substitution error rate. Let m̄i,a (W ) := 1 − mi,a (W ) denote represented as weights in a mixture model. We assume the probability of observing a base different from base a that these proportions stay the same at all positions in the at position i in a mixture sample without sequencing er- genome. A more detailed description of the model is given rors. The probability of observing symbol a at position i in the next subsection. in a mixture sample sequenced with errors is then equal to: ε qi,a (W, ε) = (1 − ε) · mi,a (W ) + · m̄i,a (W ). 4.1 Basic model 3 Let Σ = {A,C, G, T } be the symbol alphabet. Let K de- The likelihood of observing O given mixture weights W note the number of virus variants and L denote the length and substitution rate ε is then proportional to: of the reference genome. Let W = (w1 , . . . , wK ) denote L the (unknown) weights of individual variants in a mixture. Pr[O|W, ε] ∼ ∏ ∏ qi,a (W, ε)Oi,a . Let pk,i,a denote the probability of observing symbol a at i=1 a∈Σ position i in the k-th variant and Oi,a denote the count of The error rate ε can be set to a particular value, or be symbol a at position i in the observed reads. inferred simultaneously with the mixture weights W : The probability of observing symbol a at position i in a mixture with weights W is then equal to mi,a (W ) := L ∑Kk=1 wk · pk,i,a . (W ∗ , ε ∗ ) := arg min − ∑ ∑ Oi,a · log qi,a (W, ε) W,ε i=1 a∈Σ The total probability of observations O given mixture weights W is then proportional to: In our experiments, we used the latter option. L Pr[O|W ] ∼ ∏ ∏ mi,a (W )Oi,a . 4.4 Evaluation of a posteriori probability of a given i=1 a∈Σ read belonging to a particular variant The inference of the mixture weights from observations We can represent a read as a data set consisting of only one can be then solved via maximisation of the log-likelihood read, i.e. Oi,a is equal to 1 if read has symbol a aligned to function: reference’s position i, and 0 otherwise. Let V ∈ {1, . . . , K} L be a random variable representing the variant that the read W ∗ := arg max ∑ ∑ Oi,a · log mi,a (W ). belongs to. The likelihood of a read belonging to variant W i=1 a∈Σ k is equal to Pr[O|V = k] = Pr[O|W = ek ], where ek is the This task is equivalent to minimisation of cross-entropy vector of length K with value 1 at position k and value 0 between m and O: elsewhere. The probability of the read belonging to variant k given its sequence is then, by the Bayes theorem, equal L to: W ∗ := arg min − ∑ ∑ Oi,a · log mi,a (W ) W i=1 a∈Σ Pr[O|V = k] · Pr[V = k] Pr[V = k|O] = . ∑Kj=1 Pr[O|V = j] · Pr[V = j] 4.2 Minimisation and efficiency 1 Assuming a uniform prior Pr[V ] = , the formula reduces The minimisation is done using the L-BFGS-B algorithm K [19] implemented in Python library scipy [8]. Since the to: Pr[O|V = k] input data for the minimisation process are essentially two Pr[V = k|O] = K . tables of fixed sizes L × |Σ| and K × L × |Σ|, the time and ∑ j=1 Pr[O|V = j] memory requirements of the optimisation process itself do The notion of posterior probability enables us to clas- not depend on the amount of sequencing reads and neither sify individual reads into variants by choosing the variant Variant Isolate The true proportions of variants in the samples is calcu- CZ UKBA-702, UKBA-722, UKBA-809, lated as the proportion of the sums of read lengths for each UKBA-818 variant. EU1 UKBA-716, UKBA-717 EU2 UKBA-701 5.2 Results on all reads UK UKBA-703, UKBA-704, UKBA-705, The results on the simulated mixes are shown in Table 3. UKBA-706, UKBA-707, UKBA-708, We can see that the model is able to distinguish between UKBA-713, UKBA-714, UKBA-718, present and absent variants, while significantly overesti- UKBA-719, UKBA-720, UKBA-801, mating the “other” variant in all samples. UKBA-802, UKBA-803, UKBA-804, In order to find an explanation for this discrepancy, we UKBA-805, UKBA-806, UKBA-807, analysed the posterior probabilities (see subsection 4.4) of UKBA-808, UKBA-814, UKBA-815, individual reads (see Figure 1). The analysis shows that UKBA-816, UKBA-817 there are many “uncertain” reads (i.e. reads with no strong UKBA318 UKBA-709, UKBA-710, UKBA-711, affinity towards any variant), particularly so among reads, UKBA-712, UKBA-723, UKBA-724 classified into “other” variant. We decided to formalize the notion of “uncertainty” and other UKBA-715 filter out such reads from the data set before running the Table 2: Table of used samples model. with the maximum a posteriori probability (MAP). We use 5.3 Filtering out uncertain reads this read classification for visualization and filtering as dis- cussed in the next section. Let k-certainty of a read be defined as a sum of its k highest posterior probabilities of belonging to a particular variant. Unfortunately, this approach requires to set the substi- tution rate ε in advance. In our experiments, we estimated We decided to use 1-certainty criterion in our experi- the substitution rate by first running the minimisation al- ments. gorithm on the available data. The pipeline works as follows: first we estimate the sub- stitution error rate by running the model on the full data set. Then, using the estimated error rate, we evaluate the 5 Experiments 1-certainty for each read. Reads with 1-certainty below a given threshold are then removed from the data set, and 5.1 Simulated data the observed counts Oi,a are calculated. Finally, we run the model on these counts, reporting the mixture weights To evaluate the model, we created simulated data sets, W as the final answer. where the correct proportions of the variants were known. We ran our algorithm on the simulated mixes with dif- We have used 37 FASTQ files from the ARTIC sequenc- ferent filtering thresholds. The estimated weights at dif- ing experiments, conducted at Biomedical Research Cen- ferent filtration thresholds are shown in Figure 2. The ter of the Slovak Academy of Sciences, using protocols KL-divergence between the true and estimated weights are described in [1], where each SARS-CoV-2 sample was se- shown in Figure 3. quenced separately (using a different barcode). This heuristic works reasonably well on data sets with- The list of samples and their classification to variants is out reads from “other” variant (Figures 2a and 2b). How- shown in Table 2. Each file was mapped to the reference ever, it fails on data sets with “other” variant present and the resulting BAM files were merged to create four (Figures 2c and 2d) because reads from “other” variant mixed samples as follows: have generally lower 1-certainty (Figure 1), and are con- sequently filtered out. 1. 1x CZ, 1x EU1, 1x EU2, 1x UK (92 840 reads, aver- age base coverage 5 858) 6 Discussion 2. 4x CZ, 2x EU1, 1x EU2, 3x UK, 1x UKBA318 (264 113 reads, average base coverage 16 562) We have demonstrated that our method can predict the 3. 2x EU1, 1x EU2, 1x other (99 711 reads, average base proportions of individual SARS-CoV-2 variants in mixed coverage 6 273) samples when all variants in a sample are known to the model (i.e. there are no reads from “other” variant). 4. 3x CZ, 2x EU1, 9x UK, 1x other (335 601 reads, av- In our future work on this topic, we would like to ad- erage base coverage 21 044) dress the following issues: variant Mix #1 Mix #2 Mix #3 Mix #4 true estimated true estimated true estimated true estimated Brazil 0.000 0.000 0.000 0.000 0.000 0.001 0.000 0.000 CZ 0.327 0.245 0.322 0.249 0.000 0.004 0.163 0.128 EU1 0.225 0.230 0.189 0.179 0.506 0.434 0.149 0.142 EU2 0.252 0.200 0.088 0.070 0.235 0.178 0.000 0.000 South.Afr 0.000 0.000 0.000 0.000 0.000 0.003 0.000 0.000 UK 0.196 0.158 0.246 0.209 0.000 0.002 0.612 0.546 UKBA318 0.000 0.000 0.155 0.121 0.000 0.005 0.000 0.000 other 0.000 0.168 0.000 0.173 0.259 0.372 0.076 0.183 Table 3: Estimated weights for the simulated samples without any filtration. Note that the “other” variant is significantly overestimated in all samples. Aside from that, the method is able to distinguish between present and absent variants. Figure 1: The base coverage of a sequence by reads classified into individual variants by maximum a posteriori predictor (MAP). The first simulated mix is shown in the left, the third in the right. The blue line represents the coverage by all reads classified into a particular variant, the orange and red lines only reads with 1-certainty above 50% and 90%, respectively. Reads that have been, both erroneously (first mix) and correctly (third mix) classified into “other” variant, have mostly low 1-certainty. Even for correct variants, there are regions in the reference where the model cannot capture any reads. These regions correspond well with the regions without characteristic mutations (see Table 1). (a) Simulated mix #1 (b) Simulated mix #2 (c) Simulated mix #3 (d) Simulated mix #4 Figure 2: The estimated weights for each simulated mix at different thresholds of filtration of uncertain reads. The ver- tical axis shows the filtration threshold. Lengths of individual color stripes represent their estimated weight for a given threshold of filtration. The last row (denoted “real”) shows the true weights for each simulated mix. The lowest filtration threshold does not filter out any reads (because it’s equal to 1/K, which is the lowest possible value for the 1-certainty). • We would like to be able to evaluate confidence in- tervals on our estimations. That would give us, for example, the ability to rule out the presence of a vari- ant with a low estimated weight. • There are many non-informative reads coming from the areas of the genome with no significant alter- ations between individual variants. The proportion of such reads is even higher when the reads are shorter (e.g. obtained by Illumina sequencing). The goal is to extend the model so that it could handle such data. • The sequencing data shows severe non-uniformity of the coverage. This issue may be addressed by a selec- tive weighting of individual reads based on the cov- erage at their position. Figure 3: The KL-divergence between the true and esti- mated weights for different read filtration thresholds. The • The current model does not take into account inser- horizontal axis shows the filtration threshold. The vertical tions and deletions, both in the individual variants and axis shows the resulting KL-divergence (lower is better). during the sequencing process. It can be seen that the filtration method works reasonably well for the mixes without “other” variant present (the first • The way we characterize the individual variants using and second mixes). the averaging of all available data from the GISAID database may incur a bias toward specific subvariants (e.g. some countries, such as the UK, submit a much • As we discussed in the previous section, our model higher number of sequence samples). tends to overestimate the “other” variant, and our cur- rent filtering heuristics is not a robust solution. The • Our approach splits all reads into individual bases, filtering may introduce bias into the estimation. This thus potentially losing all long-range information. issue may be addressed by a more systematic ap- In the future, we would like to adjust the model, so proach to defining the “other” variant. that it would keep the reads intact. • In our model we assume that the individual positions [8] E. Jones, T. Oliphant, P. Peterson, et al. SciPy: Open source are independent. But, the way the variant profiles scientific tools for Python, 2001–. are created (by demanding at least n mutations to be [9] K. Kupferschmidt. The pandemic virus is slowly mutating. present in a sequence) introduces some dependency but does it matter?, 2020. between them. In the future, we would like to resolve [10] H. Li. Minimap2: pairwise alignment for nucleotide se- that inconsistency either by changing of the way the quences. Bioinformatics, 34(18):3094–3100, 2018. variant profiles are estimated or by relaxing the as- [11] NCIRD. Science brief: Emerging SARS-CoV-2 variants, sumption in the model. 2021. [12] M. R. Olm, A. Crits-Christoph, K. Bouma-Gregson, B. A. Code availability. The code is available at Firek, M. J. Morowitz, and J. F. Banfield. inStrain profiles https://github.com/fmfi-compbio/covid-pooling. population microdiversity from metagenomic data and sen- sitively detects shared microbial strains. Nature Biotech- nology, pages 1–10, 2021. Acknowledgements. We would like to thank the reviewers [13] A. Pérez Cataluña, Á. Chiner-Oms, E. Cuevas Ferrando, for their thorough reviews and helpful suggestions. A. Díaz-Reolid, I. Falcó, W. Randazzo, I. Girón-Guzmán, A. Allende, M. A. Bracho, I. Comas, et al. Detection of Funding. This research was supported by the Operational genomic variants of SARS-CoV-2 circulating in wastewater Program Integrated Infrastructure within project: Pange- by high-throughput sequencing. 2021. nomics for personalized clinical management of infected [14] A. Rambaut et al. Preliminary genomic characterisation of persons based on identified viral genome and human ex- an emergent SARS-CoV-2 lineage in the UK defined by a ome (Code ITMS:313011ATL7, 80%) co-financed by the novel set of spike mutations, 2020. European Regional Development Fund. It was also sup- [15] A. Rambaut, E. C. Holmes, Á. O’Toole, V. Hill, J. T. Mc- ported by VEGA 1/0458/18 (10%) and Comenius Univer- Crone, C. Ruis, L. du Plessis, and O. G. Pybus. A dynamic sity Grant UK/336/2021 (10%). nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nature microbiology, 5(11):1403– 1407, 2020. References [16] The World Health Organization. Tracking SARS- CoV-2 variants, 2021. https://www.who.int/en/ activities/tracking-SARS-CoV-2-variants/. [1] B. Brejova, K. Borsova, V. Hodorova, V. Cabanova, A. Ga- [17] J. R. Tyson, P. James, D. Stoddart, N. Sparks, A. Wick- furov, D. Fricova, M. Nebohacova, T. Vinar, B. Klempa, enhagen, G. Hall, J. H. Choi, H. Lapointe, K. Kamelian, and J. Nosek. Nanopore sequencing of SARS-CoV-2: A. D. Smith, et al. Improvements to the ARTIC multiplex Comparison of short and long PCR-tiling amplicon proto- PCR method for SARS-CoV-2 genome sequencing using cols. medRxiv, 2021. nanopore. bioRxiv, 2020. [2] D. A. Coil, T. Albertson, S. Banerjee, G. Brennan, A. J. [18] M. Wasiuta. Worldometers, 2009. Campbell, S. H. Cohen, S. Dandekar, S. L. Díaz-Muñoz, J. A. Eisen, T. Goldstein, et al. SARS-CoV-2 detection [19] C. Zhu, R. H. Byrd, P. Lu, and J. Nocedal. Algorithm 778: and genomic sequencing from hospital surface samples col- L-BFGS-B: Fortran Subroutines for Large-Scale Bound- lected at UC Davis. medRxiv, 2021. Constrained Optimization. ACM Transactions on Mathe- matical Software, 23(4):550–560, 1997. [3] A. Crits-Christoph, R. S. Kantor, M. R. Olm, O. N. Whit- ney, B. Al-Shayeb, Y. C. Lou, A. Flamholz, L. C. Kennedy, H. Greenwald, A. Hinkle, et al. Genome sequencing of sewage detects regionally prevalent SARS-CoV-2 variants. medRxiv, 2020. [4] ECDC. Rapid increase of a SARS-CoV-2 variant with mul- tiple spike protein mutations observed in the United King- dom—20 december 2020. ECDC: Stockholm, 2020. [5] S. Elbe and G. Buckland-Merrett. Data, disease and diplo- macy: GISAID’s innovative contribution to global health. Global Challenges, 1(1):33–46, 2017. [6] R. S. Fontenele, S. Kraberger, J. Hadfield, E. M. Driver, D. Bowes, L. A. Holland, T. O. Faleye, S. Adhikari, R. Ku- mar, R. Inchausti, et al. High-throughput sequencing of SARS-CoV-2 in wastewater provides insights into circulat- ing variants. medRxiv, 2021. [7] K. Jahn, D. Dreifuss, I. Topolsky, A. Kull, P. Ganesanan- damoorthy, X. Fernandez-Cassi, C. Bänziger, E. Stach- ler, L. Fuhrmann, K. P. Jablonski, et al. Detection of SARS-CoV-2 variants in Switzerland by genomic analysis of wastewater samples. medRxiv, 2021.