ONT Long-read sequencing uncovers uncharted regions in Brassica

2020-08-17 by Quick Biology Inc.

Decoding a species' genome is a fundamental goal in biology. Current short-read sequencing technology cannot resolve low-complexity, simple repetitive regions such as centromeres and telomeres, and it cannot reliably assemble large structural variants. In plants, frequent genomic rearrangements, abundant repeat expansion, and polyploidy make complete genome assembly even more challenging.

In a recent Nature Plants paper, Sampath Perumal and colleagues comprehensively explored the genome of the neglected oilseed Brassica nigra (ref1). Using Oxford Nanopore long-read DNA sequencing and Hi-C, they generated two high-quality de novo genome assemblies. Long reads and high sequencing depth gave them access to low-complexity regions (Figs. 2 and 3). Interestingly, access to pericentromeric regions, together with coinciding hypomethylation, enabled the localization of active centromeres (Fig. 4). Their work provides a vivid example of how Nanopore long-read sequencing improves assembly contiguity. Their pipeline could be used as a toolkit for generating genome sequences for any crop and may facilitate the rapid selection of agronomically important traits.
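Assembly contiguity is commonly summarized with the N50 statistic: the contig length at which contigs of that length or longer contain at least half of the assembled bases. The sketch below is only an illustration of that metric, not the authors' evaluation code; the function name `n50` and the toy contig lengths are assumptions made for this example and are not values from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy data (hypothetical): two assemblies with the same total length.
fragmented = [2, 2, 2, 3, 3, 4, 4]  # many short contigs
contiguous = [12, 5, 3]             # a few long contigs

print(n50(fragmented), n50(contiguous))
```

With the same total length, the assembly made of fewer, longer contigs has the higher N50, which is the sense in which long reads are said to improve contiguity.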

The sequencing workflow (Fig. 1) produced near-complete assemblies of two B. nigra genomes by combining ONT nanopore sequencing, Illumina error correction, Hi-C sequencing, and genetic mapping for scaffolding. The complete experimental and data analysis workflow is as follows: (1) plant material preparation and high-molecular-weight DNA extraction, (2) ONT genomic DNA library preparation, sequencing, and read processing, (3) Illumina WGS library preparation and sequencing, (4) nanopore sequence assembly and polishing with short reads, (5) contig scaffolding with Hi-C reads, (6) genome annotation using RNA-sequencing data, and finally (7) repeat annotation.
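One way to reason about the workflow is as a dependency graph in which each stage can start only after its inputs exist. The sketch below encodes the seven steps with Python's standard-library `graphlib` (Python 3.9+) and derives a valid execution order. The stage names and the specific dependencies between them are assumptions inferred from the order of the list above, not details confirmed by the paper, and no real tool is invoked.

```python
from graphlib import TopologicalSorter

# Hypothetical stage names; each maps to the stages it is assumed to depend on.
WORKFLOW = {
    "hmw_dna_extraction": set(),
    "ont_sequencing": {"hmw_dna_extraction"},
    "illumina_wgs": {"hmw_dna_extraction"},
    "assembly_and_polishing": {"ont_sequencing", "illumina_wgs"},
    "hic_scaffolding": {"assembly_and_polishing"},
    "gene_annotation": {"hic_scaffolding"},
    "repeat_annotation": {"hic_scaffolding"},
}

# static_order() yields every stage after all of its dependencies.
order = list(TopologicalSorter(WORKFLOW).static_order())

for step, stage in enumerate(order, start=1):
    print(step, stage)
```

Writing the stages this way makes it explicit that polishing needs both read types and that annotation is assumed to run on the scaffolded assembly; a cyclic or missing dependency would raise an error instead of silently running out of order.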

Fig. 1: ONT long-read, Illumina short-read, and Hi-C assembly schema for two B. nigra genomes (CN115125 (A) and Ni100 (B)).

Fig. 2: Genomic features of the B. nigra Ni100-LR assembly. Bands: (1) chromosomes with centromere positions (black band); (2) class I retrotransposons (nucleotides per 100-kb bins); (3) class II DNA repeats (nucleotides per 100-kb bins); (4) gene density (genes per 100-kb bins); (5) gene expression in leaf tissue (log10[average TPM] in 100-kb bins); (6) ONT CG methylation profile (ratio per 100 kb); (7) whole-genome bisulfite methylation profile (nucleotides per 100-kb bins). CG, blue; CHG, yellow; CHH, red. (from ref1)
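Several tracks in Fig. 2 are summarized per 100-kb bin. As a minimal sketch of that kind of binning, the code below counts genes per 100-kb window along one chromosome. The function name `gene_density`, the use of 0-based gene start coordinates, and the toy positions are all assumptions for illustration; they do not come from the paper's data or code.

```python
from collections import Counter

BIN_SIZE = 100_000  # 100-kb bins, as in the figure legend


def gene_density(gene_starts, chrom_length, bin_size=BIN_SIZE):
    """Count genes per fixed-size bin along one chromosome.

    gene_starts: 0-based start coordinates (an assumed convention).
    Returns a list with one count per bin, including empty bins.
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = Counter(
        start // bin_size
        for start in gene_starts
        if 0 <= start < chrom_length  # ignore out-of-range coordinates
    )
    return [counts.get(i, 0) for i in range(n_bins)]


# Toy coordinates (hypothetical) on a 320-kb chromosome.
starts = [1_500, 42_000, 99_999, 100_000, 250_300]
density = gene_density(starts, chrom_length=320_000)
print(density)  # [3, 1, 1, 0]
```

The same pattern extends to the other tracks by summing a per-feature value (for example, repeat nucleotides) within each bin instead of counting features.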

Fig. 3: Comparison of B. nigra assemblies. a, Chromosome-level genome alignment of the Ni100-SR (NS) assembly (centre) against the LR assemblies, C2-LR (bottom) and Ni100-LR (top). The plot was created using Synvisio (https://github.com/kiranbandi/synvisio). b, Circular map generated using Circos (ref. 89) showing the alignment of the SR and LR assemblies for chromosome B5 of Ni100. (from ref1)

Fig. 4: Characterization of centromeric region of chromosome B5 of Ni100-LR genome. a, Distribution of various genomic features on the 5-Mb centromere region, including genes, methylome (ONT and WGBS) and full-length LTRs (ALE-LTR and 13 other family LTRs); distribution of young (<1 Ma) and old LTRs (>1 Ma); and distribution of centromeric repeat sequences of B. nigra based on chromatin immunoprecipitation (ChIP) analysis of CENH3 (ref. 38). b, Nested insertion of full-length LTRs in the centromeric region. Age (in Ma) is shown above each element. (from ref1)

 

Quick Biology provides a complete end-to-end de novo assembly service with Nanopore sequencing, Illumina, and Hi-C. Find more at Quick Biology.

Ref: 

  1. Perumal, S. et al. Brassica genome. Nat. Plants 6, (2020). https://www.nature.com/articles/s41477-020-0735-y