Introduction
Here we compare, as of April 2024, the contiguity and accuracy of assemblies obtained with the Oxford Nanopore Technologies (ONT) Native (NBK), Rapid (RBK), and Rapid PCR (RPBK) Barcoding Kits.
Methods
Seven culture collection strains with finished NCBI RefSeq genomes were re-sequenced using Illumina DNA Prep, PacBio HiFi v3, and ONT NBK, RBK, and RPBK (with NEB LongAmp Taq) library preparations, and the libraries were then loaded on a MiSeq, a Sequel IIe, or a MinION R10.4.1 flowcell, respectively. These RefSeq isolates spanned a GC range from 32.1 to 73.1%. Additionally, a few isolates from a ring trial organized by Gabriel Wagner (Univ. Graz, Austria), whose methylation profiles cause strand-specific errors with ONT sequencing, were sequenced with PacBio and ONT. DNA was extracted with the Zymo Biomics DNA Miniprep kit (Zymo Research; Freiburg, Germany). Basecalling was done with Dorado (v0.5.3) using the model dna_r10.4.1_e8.2_400bps_sup@v4.3.0 in either simplex or duplex mode. Reads were filtered with chopper (v0.7.0; minimum quality 10, minimum length 500) and then assembled de novo with Flye (v2.9.3; option --nano-hq). Consensus sequences were polished with Medaka (v1.11.3) using either the r1041_e82_400bps_sup_v4.3.0 (‘Dorado’) or the r103_min_high_g360 (‘Guppy’) model. Plasmids were reconstructed with MOB-recon. Seed-only cgMLST task templates, defined from the downloaded NCBI genomes or from the PacBio genomes of the same strains, were used with SeqSphere+. Whole genomes were compared and analyzed with Quast (v5.2.0). Please note that the InDel and substitution rates between the assemblies and the NCBI genomes reported by Quast (smaller values are better) will never be zero, owing to sequencing errors in the finished NCBI genomes and/or micro-evolutionary changes in the re-sequenced culture collection strains. NanoComp (v1.23.1) was used to compare sample-specific quality metrics at the read level. BRIG (v0.95) was used to generate the genome comparison figure.
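The read filtering step (minimum quality 10, minimum length 500) can be illustrated with a minimal Python sketch. It is not chopper's code: it assumes, as we understand chopper (like NanoFilt) to work, that the mean read quality is obtained by converting each Phred score into an error probability, averaging those probabilities, and converting back, rather than averaging the Phred scores directly. The function names `mean_read_quality`, `read_fastq`, and `passes_filter` are hypothetical, and chopper's own implementation may differ in details.

```python
import math
from typing import Iterable, Iterator, Tuple

def mean_read_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred quality, averaged in error-probability space."""
    if not qual:
        return 0.0
    # Phred Q -> error probability 10^(-Q/10) for each base.
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    # Average the probabilities, then convert back to a Phred score.
    return -10 * math.log10(sum(probs) / len(probs))

def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from unwrapped 4-line FASTQ records
    (assumes no blank lines between records)."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual

def passes_filter(seq: str, qual: str,
                  min_quality: float = 10.0, min_length: int = 500) -> bool:
    """Keep reads that are long enough and have a sufficient mean quality."""
    return len(seq) >= min_length and mean_read_quality(qual) >= min_quality

# Hypothetical two-record input: one good read, one read below 500 bases.
records = ["@good\n", "A" * 600 + "\n", "+\n", "I" * 600 + "\n",
           "@short\n", "A" * 400 + "\n", "+\n", "I" * 400 + "\n"]
kept = [h for h, s, q in read_fastq(records) if passes_filter(s, q)]
print(kept)  # ['@good']
```

Averaging in probability space is stricter than an arithmetic mean: a read whose bases are half Q40 and half Q0 has an arithmetic mean of Q20 but a probability-based mean of only about Q3, so it would be discarded.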
NCBI RefSeq Strains
As can be seen, no assemblies could be obtained from the RPBK data for the two high-GC isolates, P. aeruginosa (66.6% GC) and M. luteus (73.1% GC). We therefore also took a closer look at some missing cgMLST target blocks in the RPBK data of K. quasipneumoniae, whose genome has an average GC content of 57.6%. The seven blocks inspected ranged in size from 1.5 to 13.3 kb and had GC contents from 62.1 to 65.8%, i.e., clearly above the genome average.
Table: Illumina DNA Prep, PacBio HiFi, and ONT NBK with ‘Dorado’ and ‘Guppy’ models for Medaka polishing.
Figure: NBK log-transformed read lengths of the seven RefSeq samples; average read length N50 9.96 kb (mean read quality 18.3).
Figure: RBK log-transformed read lengths of the seven RefSeq samples; average read length N50 8.62 kb (mean read quality 16.7).
Figure: RPBK log-transformed read lengths of the seven RefSeq samples; average read length N50 3.69 kb (mean read quality 17).
Table: ONT NBK, RBK, and RPBK, all with the ‘Guppy’ model for Medaka polishing.
Figure: Genome comparison of E. coli ATCC 8739 re-sequenced with ONT RBK against the NCBI RefSeq genome (GCF_000019385.1) of the very same isolate. Flye generated a circular chromosome, and according to the Quast analysis there was not a single misassembly.
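The read-length N50 values in the figure captions and the GC contents quoted above can be illustrated with a short Python sketch. It uses the standard N50 definition (sort lengths in descending order and report the length at which the running total first reaches half of all bases); how NanoComp averages N50 values across samples is not reproduced here. `gc_content` is a hypothetical helper that ignores ambiguous bases such as N in the denominator, an assumption that some tools handle differently.

```python
from typing import Iterable

def n50(lengths: Iterable[int]) -> int:
    """Length at which the running total (longest reads first) reaches half of all bases."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare against half the total without floating-point division.
        if 2 * running >= total:
            return length
    return 0  # only reached for an empty input

def gc_content(seq: str) -> float:
    """GC percentage over unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 100.0 * gc / (gc + at) if gc + at else 0.0

read_lengths = [2_000, 3_000, 4_000, 5_000, 6_000]
print(n50(read_lengths))                 # 5000
print(round(gc_content("GGCCATn"), 1))   # 66.7
```

Applied to a cgMLST target block and to the whole genome, `gc_content` makes the comparison above explicit, e.g., a block at 62 to 66% GC against a genome average of 57.6%.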
Strains with Known Modification-mediated Errors
In April 2023, ONT published a note informing customers of possible strand-specific errors caused by methylation. In a preprint, Mara Lohde et al. reported that over 40% of 264 screened isolates, covering 32 different species, had more than 50 problematic positions in their NBK genomes; these isolates belonged to 25 of the 32 species [bioRxiv]. The authors recommended using RPBK to eliminate these errors but evaluated only eight K. pneumoniae isolates with this kit. Finally, Lohde et al. suggested preparing libraries with both the NBK and RPBK kits and pooling them in an approximately 30/70 ratio before loading the flowcell. In the following, we show results for only two exemplary strains with these issues, as they summarize all our observations.
Table: PacBio HiFi (used for comparison because no NCBI RefSeq genomes were available), ONT NBK, RBK, and RPBK, all with the ‘Dorado’ model, and ONT NBK with the ‘Guppy’ model for Medaka polishing.
Table: ONT RBK with simplex basecalling, ONT RBK and duplex-optimized ONT RBK with duplex basecalling, and pooled ONT RBK/RPBK libraries (30/70 ratio) with simplex Dorado basecalling. All data were polished with Medaka using the ‘Dorado’ model.
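The approximately 30/70 pooling suggested by Lohde et al., and tested here with RBK/RPBK libraries, reduces to simple arithmetic. The sketch below assumes that the ratio refers to the amount of each library (for example fmol or ng) and that both library concentrations are known in that same unit per µl; the function name `pooling_volumes` and the example values are hypothetical and are not a protocol recommendation.

```python
from typing import Tuple

def pooling_volumes(
    conc_a: float,            # library A concentration (amount per µl)
    conc_b: float,            # library B concentration (same unit per µl)
    total_amount: float,      # desired total amount in the pool (same unit)
    fraction_a: float = 0.3,  # share of library A, e.g. 0.3 for a 30/70 pool
) -> Tuple[float, float]:
    """Return the volumes (µl) of libraries A and B to combine."""
    if not 0.0 <= fraction_a <= 1.0:
        raise ValueError("fraction_a must lie between 0 and 1")
    if conc_a <= 0 or conc_b <= 0:
        raise ValueError("concentrations must be positive")
    amount_a = total_amount * fraction_a
    amount_b = total_amount - amount_a
    return amount_a / conc_a, amount_b / conc_b

# Hypothetical example: A at 10 units/µl, B at 20 units/µl, 100 units in total.
vol_a, vol_b = pooling_volumes(10.0, 20.0, 100.0)
print(f"library A: {vol_a:.2f} µl, library B: {vol_b:.2f} µl")
# prints: library A: 3.00 µl, library B: 3.50 µl
```

Working in amounts rather than volumes keeps the ratio independent of how concentrated each library happens to be.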
Finally, we performed a preliminary evaluation of the Modpolish tool, which corrects modification-mediated errors in ONT sequencing data by nucleotide demodification and reference-based correction. For example, the tool corrected all but one error in our ‘problematic’ Listeria isolate from above. However, it is very compute-intensive, and we also observed instances where it introduced errors into previously correct genomes.
In summary, the Medaka ‘Guppy’ model reduced but did not eliminate all cgMLST errors in the ‘problematic’ strains. The amplification step of RPBK eliminated (nearly) all methylation-related errors. Basecalling with Dorado in duplex mode had almost no beneficial effect, and the same was true for duplex-optimized runs. Pooling RBK and RPBK libraries at a ratio of about 30/70 eliminated all errors, but contiguity seemed slightly worse than with RBK alone. Furthermore, the gaps in high-GC regions of the RPBK data were not bridged by the RBK reads. Modpolish relies on high-quality reference genomes produced by others, is very compute-intensive, and did not always produce correct genome sequences.
Conclusion
For reliable fine-scale analyses (e.g., SNV or cgMLST genotyping) with ONT, the RPBK library preparation should be used. For structural analyses (e.g., plasmid characterization), ONT RBK-only assemblies suffice; NBK is less suitable here because smaller plasmids in particular are recovered less well. However, if RPBK runs into problems with high-GC regions, or if the best contiguity and accuracy are required, ONT RBK assemblies polished with Illumina reads remain the best solution. Similar conclusions were reached in a recent publication by Nicole Lerminiaux et al. [PubMed 38354391].