Fine de novo sequencing of a fungal genome using only SOLiD short read data: verification on Aspergillus oryzae RIB40.

Umemura M, Koyama Y, Takeda I, Hagiwara H, Ikegami T, Koike H, Machida M - PLoS ONE (2013)

Bottom Line: The sequences of secondary metabolite biosynthetic genes and clusters, whose products are of considerable interest in fungal studies due to their potential medicinal, agricultural, and cosmetic properties, were also well reconstructed in the assembled scaffolds. Based on these findings, we concluded that de novo genome sequencing using only SOLiD short reads is feasible and practical for molecular biological study of fungi. We also investigated the effect of filtering low-quality data, library insert size, and k-mer size on assembly performance. For assembly, we recommend mildly filtered read data (filtering that does not greatly degrade the N50), a library insert size of ∼2.0 kb, and a k-mer size of 33.


Affiliation: National Institute of Advanced Industrial Science and Technology (AIST), Sapporo, Hokkaido, Japan.

ABSTRACT
The development of next-generation sequencing (NGS) technologies has dramatically increased the throughput, speed, and efficiency of genome sequencing. The short read data generated by NGS platforms, such as SOLiD and Illumina, are quite useful for mapping analysis. However, SOLiD reads, with lengths of <60 bp, have been considered too short for de novo genome sequencing. Here, to investigate whether de novo sequencing of fungal genomes is possible using only SOLiD short read sequence data, we performed de novo assembly of the Aspergillus oryzae RIB40 genome using only 50-bp SOLiD reads generated from mate-paired libraries with 2.8- or 1.9-kb insert sizes. The assembled scaffolds showed an N50 value of 1.6 Mb, 22-fold higher than those obtained using only SOLiD short reads in other published reports. In addition, almost 99% of the reference genome was accurately covered by long aligned fragments of the assembled scaffolds. The sequences of secondary metabolite biosynthetic genes and clusters, whose products are of considerable interest in fungal studies due to their potential medicinal, agricultural, and cosmetic properties, were also well reconstructed in the assembled scaffolds. Based on these findings, we concluded that de novo genome sequencing using only SOLiD short reads is feasible and practical for molecular biological study of fungi. We also investigated the effect of filtering low-quality data, library insert size, and k-mer size on assembly performance. For assembly, we recommend mildly filtered read data (filtering that does not greatly degrade the N50), a library insert size of ∼2.0 kb, and a k-mer size of 33.
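The N50 value quoted in the abstract is a standard contiguity statistic: the length L such that scaffolds of length at least L together contain at least half of the assembled bases. A minimal Python sketch of that calculation follows. The function name `n50` and the example lengths are illustrative assumptions, not values or code from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half is the N50.
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in bp (not taken from the paper).
example_lengths = [2_000_000, 1_600_000, 900_000, 400_000, 100_000]
example_n50 = n50(example_lengths)
```

Because the statistic depends only on the sorted lengths, it can be compared across assemblies built with different filters or k-mer sizes, which is how the abstract uses it.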


Figure 4 (pone-0063673-g004): Proportion of assembled scaffold fragments aligned to the Aspergillus oryzae RIB40 reference genome. The lengths of aligned fragments are indicated by color (blue-green, >50 kb; purple, >10 kb; gray, ≤10 kb; yellow, 0 or none). The graph includes the results of assemblies using lib2.8 and lib1.9 with unfiltered data (nofilter), data with no undetermined bases (nodot), or QV >10 data. For lib2.8.qv10 and lib1.9.qv10, the results using k-mers of 25 to 35 are included.
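The length classes in this figure can be illustrated with a short Python sketch that sums aligned bases per class and divides by the reference size. It rests on assumptions not stated in the caption: the classes are treated as mutually exclusive (so ">10 kb" here means 10 to 50 kb), aligned fragments are assumed not to overlap on the reference, and the function name and example numbers are hypothetical.

```python
KB = 1_000


def coverage_by_length_class(fragment_lengths, genome_size):
    """Fraction of the reference covered by each fragment-length class.

    fragment_lengths: aligned lengths (bp) of fragments that are assumed
    not to overlap on the reference, so their bases can simply be summed.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    covered = {"gt50kb": 0, "gt10kb": 0, "le10kb": 0}
    for length in fragment_lengths:
        if length > 50 * KB:
            covered["gt50kb"] += length
        elif length > 10 * KB:
            covered["gt10kb"] += length
        else:
            covered["le10kb"] += length
    total_aligned = sum(covered.values())
    if total_aligned > genome_size:
        raise ValueError("aligned bases exceed genome size; fragments overlap?")
    fractions = {name: bases / genome_size for name, bases in covered.items()}
    # Whatever is left over corresponds to the "0 or none" class.
    fractions["none"] = (genome_size - total_aligned) / genome_size
    return fractions


# Hypothetical toy input: a 1-Mb reference and six aligned fragments.
example = coverage_by_length_class(
    [300_000, 60_000, 40_000, 20_000, 5_000, 5_000], 1_000_000
)
```

A cumulative figure such as "covered by fragments longer than 10 kb" can be obtained from this output by adding the `gt50kb` and `gt10kb` fractions.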

Mentions: Genes must be assembled continuously and correctly for secondary metabolite biosynthetic (SMB) gene clusters to be identified. As shown in Figure 4, the assembled scaffolds composed of >50- and >10-kb fragments covered ∼27% and ∼85% of the reference genome, respectively, in lib2.8.nofilter.k31. We also examined gene continuity in the assembled scaffolds using three gene clusters, AO090026000008−AO090026000036 (29 genes, ∼73 kb), AO090001000018−AO090001000055 (38 genes, ∼75 kb), and AO090113000136−AO090113000138 (3 genes, ∼6 kb), which correspond to the SMB gene clusters of aflatoxin (A. flavus), gliotoxin (A. flavus, hypothetical), and kojic acid (KA; A. oryzae), respectively [22]. A. oryzae does not produce aflatoxin because of several gene deficiencies and mutations [35]–[38], but it does have a genomic region corresponding to the aflatoxin biosynthetic gene cluster of A. flavus [21], [39]–[41]. Similarly, A. oryzae is not reported to produce gliotoxin, but a homology search identified a region with homology to a gliotoxin biosynthetic gene cluster from A. flavus. As summarized in Table 3, complete gene continuity was preserved in lib2.8.nofilter.k31 and other assemblies. These results suggest that our de novo assembly approach can reconstruct SMB gene clusters in a fungal genome sequence.
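One way to make the gene continuity check concrete is sketched below in Python. The criterion used here is an assumption for illustration, not necessarily the one behind Table 3: a cluster counts as continuous if every gene maps to the same scaffold and the genes appear there in reference order, either forward or reversed. The function name, scaffold names, and coordinates are hypothetical; only the KA cluster gene IDs come from the text.

```python
def cluster_is_continuous(cluster_genes, placements):
    """Check whether a gene cluster is preserved on a single scaffold.

    cluster_genes: gene IDs listed in reference-genome order.
    placements: dict mapping gene ID -> (scaffold name, start on scaffold).
    """
    # Every gene of the cluster must have been located in the assembly.
    if any(gene not in placements for gene in cluster_genes):
        return False
    # All genes must sit on one and the same scaffold.
    scaffolds = {placements[gene][0] for gene in cluster_genes}
    if len(scaffolds) != 1:
        return False
    # Order must match the reference; a reversed scaffold is also accepted.
    starts = [placements[gene][1] for gene in cluster_genes]
    ascending = all(a < b for a, b in zip(starts, starts[1:]))
    descending = all(a > b for a, b in zip(starts, starts[1:]))
    return ascending or descending


# KA cluster gene IDs from the text; scaffold names and coordinates are made up.
ka_cluster = ["AO090113000136", "AO090113000137", "AO090113000138"]
intact = {
    "AO090113000136": ("scaffold_7", 12_000),
    "AO090113000137": ("scaffold_7", 14_500),
    "AO090113000138": ("scaffold_7", 16_800),
}
split = dict(intact, AO090113000138=("scaffold_9", 300))

ka_intact = cluster_is_continuous(ka_cluster, intact)
ka_split = cluster_is_continuous(ka_cluster, split)
```

Accepting the reversed order reflects that a scaffold may be assembled on either strand relative to the reference.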

