High-accuracy long-read amplicon sequences using unique molecular identifiers

2020-01-27 By Quick Biology

The major challenge of third-generation sequencing (i.e., PacBio and Nanopore sequencing) is its high error rate: 10-15% for PacBio and 5-20% for Nanopore. Illumina allows rapid, cheap, and accurate sequencing, but its short reads (usually < 300 bp) don't enable complete genome assembly or intact full-length mRNA transcriptome analysis. To improve de novo genome assembly, scientists have developed hybrid assembly pipelines that combine long reads for contig formation with short reads for nucleotide accuracy.

In Nature Methods, the Albertsen Lab from Denmark applied UMIs (unique molecular identifiers) to long amplicons (ref1) and, together with error-filtering strategies, obtained high-accuracy sequences using only third-generation sequencers. In the NGS field, UMIs are widely used to de-duplicate reads and to correct PCR errors. Here, the Albertsen Lab uses dual UMIs for chimera filtering and for profiling the errors of the generated consensus sequences (Fig. 1). To improve recognition of UMIs in error-prone reads, they designed the UMIs to contain recognizable internal patterns that avoid error-prone homopolymer stretches. Combined with filtering based on UMI length and pattern, this design allows for a mean error rate of 0.0042% (ONT R10.3), 0.0041% (ONT R9.4.1) and 0.0007% (Pacific Biosciences circular consensus sequencing) in ribosomal RNA operon amplicons of ~4,500 bp.
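To make the idea of filtering on UMI length and pattern concrete, here is a minimal Python sketch. The 18-base pattern NNNYRNNNYRNNNYRNNN (Y = C or T, R = A or G) is an illustrative assumption, not a design quoted in this post, and the function name is hypothetical; consult the paper for the authors' actual UMI design and thresholds.

```python
import re

# Assumed, illustrative UMI design: NNNYRNNNYRNNNYRNNN
# (N = any base, Y = C or T, R = A or G).
# Because Y and R never share a base, no homopolymer run can cross a
# Y/R boundary, so runs inside such a UMI are capped at 4 bases.
UMI_LENGTH = 18
UMI_PATTERN = re.compile(r"(?:[ACGT]{3}[CT][AG]){3}[ACGT]{3}")


def is_valid_umi(seq: str) -> bool:
    """Return True if seq has the expected length and internal pattern."""
    seq = seq.upper()
    if len(seq) != UMI_LENGTH:
        # An insertion or deletion in the UMI changes its length.
        return False
    # fullmatch requires the whole string to follow the pattern.
    return UMI_PATTERN.fullmatch(seq) is not None


candidates = [
    "ACGTAACGTGACGCAACG",  # follows the pattern
    "ACGTAACGTGACGCAAC",   # one base missing (length 17)
    "ACGAAACGTGACGCAACG",  # A where a pyrimidine (Y) is expected
]
kept = [umi for umi in candidates if is_valid_umi(umi)]
```

Only the first candidate survives the filter. A UMI that was damaged by a sequencing error tends to fail either the length check or the pattern check, which is what makes a patterned UMI easier to recognize in noisy reads.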

Figure 1: Dual UMI-tagging approach for long-read amplicon sequencing. a, A schematic overview of the dual-UMI-tagged molecule. b,c, Overview of laboratory (b) and bioinformatics workflow (c) for the UMI-tagged molecules. The two UMIs are detected and processed together in the bioinformatics pipeline.
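As a rough illustration of what processing the two UMIs together can look like, the sketch below bins reads by their UMI pair and drops bins that look chimeric. It is a simplified heuristic written for this post, not the authors' pipeline: the function names, the input format, and the rule "keep a pair only if it is the largest bin for both of its UMIs" are all assumptions.

```python
from collections import defaultdict


def bin_reads_by_umi_pair(reads):
    """Group read IDs by their (umi1, umi2) pair.

    reads: iterable of (read_id, umi1, umi2) tuples (assumed input format).
    """
    bins = defaultdict(list)
    for read_id, umi1, umi2 in reads:
        bins[(umi1, umi2)].append(read_id)
    return dict(bins)


def drop_chimeric_bins(bins):
    """Keep a bin only if it is the largest bin for both of its UMIs.

    Assumed heuristic: a PCR chimera joins UMIs from two different
    molecules, so it shows up as a small bin that shares a UMI with a
    larger bin. Bins of equal size are all kept.
    """
    best_left, best_right = {}, {}
    for (umi1, umi2), ids in bins.items():
        best_left[umi1] = max(best_left.get(umi1, 0), len(ids))
        best_right[umi2] = max(best_right.get(umi2, 0), len(ids))
    return {
        (umi1, umi2): ids
        for (umi1, umi2), ids in bins.items()
        if len(ids) == best_left[umi1] and len(ids) == best_right[umi2]
    }


# Toy data: UMIs shortened to single letters for readability.
reads = [
    ("r1", "A", "X"), ("r2", "A", "X"), ("r3", "A", "X"),
    ("r4", "B", "Y"), ("r5", "B", "Y"),
    ("r6", "A", "Y"),  # mixes UMIs from two molecules
]
clean_bins = drop_chimeric_bins(bin_reads_by_umi_pair(reads))
```

In this toy input, the bins for pairs (A, X) and (B, Y) are kept, while the single-read (A, Y) bin is discarded because both of its UMIs belong to larger bins. Each surviving bin would then feed a consensus step.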


Quick Biology can assist you with nanopore sequencing and data analysis for long amplicons. Find more at Quick Biology.

See resource:

  1. Karst, S. M. et al. High-accuracy long-read amplicon sequences using unique molecular identifiers with Nanopore or PacBio sequencing. Nature Methods (2021). https://www.nature.com/articles/s41592-020-01041-y