The power of single molecule real-time sequencing technology in the de novo assembly of a eukaryotic genome.

Sakai H, Naito K, Ogiso-Tanaka E, Takahashi Y, Iseki K, Muto C, Satou K, Teruya K, Shiroma A, Shimoji M, Hirano T, Itoh T, Kaga A, Tomooka N - Sci Rep (2015)

Bottom Line: Second-generation sequencers (SGS) have been game-changing, achieving cost-effective whole genome sequencing in many non-model organisms. The SMRT-based assembly produced contigs 100 times longer, with a 100-fold smaller amount of gaps, than the SGS-based assemblies. We demonstrated that SMRT technology, though it still needed the support of SGS data, achieved a near-complete assembly of a eukaryotic genome.


Affiliation: Agrogenomics Research Center, National Institute of Agrobiological Sciences, 2-1-2 Kannondai, Tsukuba, Ibaraki, 305-8602, Japan.

ABSTRACT
Second-generation sequencers (SGS) have been game-changing, achieving cost-effective whole genome sequencing in many non-model organisms. However, large portions of these genomes still remain unassembled. We reconstructed the azuki bean (Vigna angularis) genome using single molecule real-time (SMRT) sequencing technology and achieved the best contiguity and coverage among currently assembled legume crops. The SMRT-based assembly produced contigs 100 times longer, with a 100-fold smaller amount of gaps, than the SGS-based assemblies. A detailed comparison between the assemblies revealed that the SMRT-based assembly enabled a more comprehensive gene annotation than the SGS-based assemblies, in which thousands of genes were missing or fragmented. A chromosome-scale assembly was generated based on a high-density genetic map, covering 86% of the azuki bean genome. We demonstrated that SMRT technology, though it still needed the support of SGS data, achieved a near-complete assembly of a eukaryotic genome.


Figure 3: NG graphs of legume genomes of (a) contigs and (b) pseudomolecules. The x-axis indicates NG integers, and the y-axis indicates the calculated NG length in each assembled genome. The vertical line indicates the NG50 contig/scaffold length. The labels are sorted according to the ranking of contig/scaffold NG50. The solid lines indicate the reference grade assemblies (total size of anchored scaffolds covering ~80% of genome), whereas broken and dotted lines indicate the draft assemblies (total size of anchored scaffolds covering ~50% and ~30%, respectively).

Mentions: Compared to those genome assemblies, our assembly (Assembly_3) had the best coverage and the least amount of gaps, in both the total assembly and the pseudomolecules (anchored scaffolds) (Table 2). As shown in Fig. 3, the NG graph revealed great variance between the assemblies. The scaffold (pseudomolecule) NG graph showed almost no difference between reference-grade assemblies, including soybean, alfalfa, and common bean, in which Sanger sequencing, BAC libraries, optical mapping, or high-density genetic maps had been integrated. A similar graph was also obtained for Assembly_3, whereas SGS-only draft assemblies, including mungbean, chick pea, pigeon pea, and azuki bean, had a shorter length and coverage of only 20–60% (Fig. 3a). For contigs, the NG graph revealed that Assembly_3 had the highest contiguity, whereas the SGS-only assemblies were much shorter and were similar to each other (Fig. 3b).
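
To make the NG graph concrete, here is a minimal Python sketch of how an NG(x) curve can be computed. It assumes the commonly used definition of NG(x): the length of the shortest sequence in the smallest set of longest sequences whose combined length reaches x% of the estimated genome size. The source does not say which tool or exact definition the authors used, so the function names (`ng_length`, `ng_curve`) and the toy lengths below are illustrative assumptions, not the authors' method or data.

```python
from typing import Dict, List


def ng_length(lengths: List[int], genome_size: int, x: int) -> int:
    """Return NG(x) for a set of contig or scaffold lengths.

    Assumed definition: sort sequences from longest to shortest and
    accumulate their lengths; NG(x) is the length of the sequence at
    which the running total first reaches x% of the estimated genome
    size. Returns 0 when the assembly never reaches that fraction,
    which is where a curve on an NG graph would stop.
    """
    if not 1 <= x <= 100:
        raise ValueError("x must be an integer between 1 and 100")
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")

    running_total = 0
    for length in sorted(lengths, reverse=True):
        running_total += length
        # Integer comparison avoids floating-point rounding issues.
        if running_total * 100 >= genome_size * x:
            return length
    return 0


def ng_curve(lengths: List[int], genome_size: int) -> Dict[int, int]:
    """Return NG(x) for every NG integer x = 1..100 (one NG-graph line)."""
    return {x: ng_length(lengths, genome_size, x) for x in range(1, 101)}


# Hypothetical toy assembly: five sequences against a genome of size 200.
toy_lengths = [50, 30, 10, 5, 5]
toy_genome_size = 200
toy_curve = ng_curve(toy_lengths, toy_genome_size)
print(toy_curve[50])  # NG50 of the toy assembly
```

Because NG(x) is normalised by the estimated genome size rather than by the assembly size, an assembly that covers only part of the genome has a curve that drops to zero early. Under this definition, that is why low-coverage draft assemblies would appear shorter on such a graph.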

