Enhancing de novo transcriptome assembly by incorporating multiple overlap sizes.

Chen CC, Lin WD, Chang YJ, Chen CL, Ho JM - ISRN Bioinform (2012)

Bottom Line: The experimental results showed that Euler-mix improves the performance of de novo transcriptome assembly.


Affiliation: Department of Computer Science and Information Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei 10617, Taiwan.

ABSTRACT
Background. The emergence of next-generation sequencing platforms has given rise to a new generation of assembly algorithms. Compared with Sanger sequencing data, next-generation sequence data have shorter reads, higher coverage depth, and different error profiles. These features pose new challenges for de novo transcriptome assembly. Methodology. To explore how these features influence assembly algorithms, we studied the relationship between read overlap size, coverage depth, and error rate using simulated data. Based on this relationship, we propose a de novo transcriptome assembly procedure, called Euler-mix, and demonstrate its performance on a real mouse transcriptome dataset. The simulation tool and evaluation tool are freely available as open source. Significance. Euler-mix is a straightforward pipeline that focuses on handling the variation in coverage depth of short-read datasets. The experimental results show that Euler-mix improves the performance of de novo transcriptome assembly.



Figure 5: Illustration of overlap measures and consistent measures.

Mentions: Figure 5(a) shows how overlap precision and overlap recall are computed: the overlap precision is (a1 + a2 + a34)/(C1 + C2) and the overlap recall is (a12 + a3 + a4)/(R1 + R2). Because the true-positive area may include regions where alignments overlap, we call these the overlap measures. They may overestimate performance, since overlapping regions are counted more than once. Nevertheless, most works use them as the benchmark, for example, the “sequence coverage” used in Velvet and the “genome coverage” used in ABySS. Accordingly, we treat the overlap measures as an upper bound on performance.
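The recounting effect described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' evaluation tool: the function names, the single reference of length 1000, and the two alignment intervals are all hypothetical, and the deduplicated ratio is shown purely for comparison (it is not claimed to be the paper's consistent measure).

```python
def summed_length(intervals):
    """Total length of half-open [start, end) intervals; overlaps are counted repeatedly."""
    return sum(end - start for start, end in intervals)


def union_length(intervals):
    """Length covered by at least one interval; each position is counted once."""
    covered = 0
    reach = None  # right edge of the region counted so far
    for start, end in sorted(intervals):
        if reach is None or start >= reach:
            # Interval starts beyond everything counted so far.
            covered += end - start
            reach = end
        elif end > reach:
            # Interval partly overlaps; count only the new part.
            covered += end - reach
            reach = end
    return covered


def ratio(aligned_length, sequence_lengths):
    """Aligned length divided by the total length of the sequences."""
    total = sum(sequence_lengths)
    if total <= 0:
        raise ValueError("total sequence length must be positive")
    return aligned_length / total


# Hypothetical data: one reference of 1000 bases with two alignments
# that overlap on positions 400..499.
reference_lengths = [1000]
alignments = [(0, 500), (400, 800)]

# Overlap-style recall: the overlapping 100 bases are counted twice.
overlap_recall = ratio(summed_length(alignments), reference_lengths)

# Deduplicated ratio, for comparison only.
deduplicated_recall = ratio(union_length(alignments), reference_lengths)

print(f"overlap recall:      {overlap_recall:.2f}")
print(f"deduplicated recall: {deduplicated_recall:.2f}")
```

With these made-up numbers the overlap-style ratio is higher than the deduplicated one, which is the sense in which the overlap measures act as an upper bound.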

