The power of single molecule real-time sequencing technology in the de novo assembly of a eukaryotic genome.

Sakai H, Naito K, Ogiso-Tanaka E, Takahashi Y, Iseki K, Muto C, Satou K, Teruya K, Shiroma A, Shimoji M, Hirano T, Itoh T, Kaga A, Tomooka N - Sci Rep (2015)

Bottom Line: Second-generation sequencers (SGS) have been game-changing, achieving cost-effective whole genome sequencing in many non-model organisms. The SMRT-based assembly produced contigs 100 times longer, with a 100-fold smaller amount of gaps, than the SGS-based assemblies. We demonstrated that SMRT technology, though it still needed the support of SGS data, achieved a near-complete assembly of a eukaryotic genome.


Affiliation: Agrogenomics Research Center, National Institute of Agrobiological Sciences, 2-1-2 Kannondai, Tsukuba, Ibaraki, 305-8602, Japan.

ABSTRACT
Second-generation sequencers (SGS) have been game-changing, achieving cost-effective whole genome sequencing in many non-model organisms. However, large portions of these genomes still remain unassembled. We reconstructed the azuki bean (Vigna angularis) genome using single molecule real-time (SMRT) sequencing technology and achieved the best contiguity and coverage among currently assembled legume crops. The SMRT-based assembly produced contigs 100 times longer, with a 100-fold smaller amount of gaps, than the SGS-based assemblies. A detailed comparison between the assemblies revealed that the SMRT-based assembly enabled a more comprehensive gene annotation than the SGS-based assemblies, in which thousands of genes were missing or fragmented. A chromosome-scale assembly was generated based on the high-density genetic map, covering 86% of the azuki bean genome. We demonstrated that SMRT technology, though it still needed the support of SGS data, achieved a near-complete assembly of a eukaryotic genome.


Figure 2: Summary of annotations. (a) The amounts of unique sequences, repetitive sequences, gaps, and unassembled sequences in each assembly. (b) Examples of wrong annotations in Assembly_2. At the locus of Vigan.02G030200 (top) in Assembly_3, sequence from the 2nd to the 3rd intron was left as a gap in Assembly_2, leading to fragmentations of this locus. The 23 kb region of the locus Vigan.03G124500 (bottom) was assembled into only a 13 kb contig in Assembly_2, in which both ends of this region were totally unassembled, and a 2 kb region in the 9th intron was missing. In this case, two genes were also annotated, one of which was mostly comprised of intronic sequences. (c) Number of gene families with size differences. ++ and −− indicate gene families with differences of more than +4 and −4 in size, respectively. (d) Difference in total gene numbers in gene families with size differences.

Mentions: Before gene annotation, we identified repeat elements to mask the assembled sequences. As expected, the amounts of repeats were the largest in Assembly_3 and the smallest in Assembly_2 (Fig. 2a). Of the estimated genome size, Assembly_3 had 273 Mb (50.6%) as repeat-masked, whereas Assembly_1 and Assembly_2 had 232 Mb (43.0%) and 189 Mb (35.1%) of repeat-masked sequences, respectively (Fig. 2a, Supplementary Table 4). Interestingly, repeat masking also revealed that the amount of unique (unmasked) sequences greatly varied between assemblies. It was 222 Mb, 200 Mb and 240 Mb in Assembly_1, Assembly_2, and Assembly_3, respectively (Fig. 2a, Supplementary Table 4).
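The figures quoted above can be cross-checked with a little arithmetic. The following is only a minimal sketch: the estimated genome size is not stated in this excerpt, so it is back-calculated here from each reported repeat amount and its percentage, and the repeat and unique amounts are summed for each assembly. The names `REPORTED`, `implied_genome_size_mb`, and `summarize` are hypothetical helpers introduced for illustration, not part of the authors' pipeline.

```python
# Values as quoted in the text (Fig. 2a, Supplementary Table 4), in Mb.
REPORTED = {
    "Assembly_1": {"repeat_mb": 232, "repeat_pct": 43.0, "unique_mb": 222},
    "Assembly_2": {"repeat_mb": 189, "repeat_pct": 35.1, "unique_mb": 200},
    "Assembly_3": {"repeat_mb": 273, "repeat_pct": 50.6, "unique_mb": 240},
}


def implied_genome_size_mb(repeat_mb, repeat_pct):
    """Genome size implied by 'repeat_mb is repeat_pct percent of the genome'.

    This is an inference from the quoted numbers, not a value from the paper.
    """
    return repeat_mb / (repeat_pct / 100.0)


def summarize(reported):
    """Return, per assembly, the implied genome size and repeat + unique total."""
    rows = {}
    for name, v in reported.items():
        rows[name] = {
            "implied_genome_mb": round(
                implied_genome_size_mb(v["repeat_mb"], v["repeat_pct"]), 1
            ),
            "repeat_plus_unique_mb": v["repeat_mb"] + v["unique_mb"],
        }
    return rows


if __name__ == "__main__":
    for name, row in summarize(REPORTED).items():
        print(name, row)
```

Under these assumptions, all three percentages point to roughly the same genome size (about 540 Mb), and the repeat plus unique totals come to 454 Mb, 389 Mb, and 513 Mb for Assembly_1, Assembly_2, and Assembly_3, respectively.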

