Enhancing de novo transcriptome assembly by incorporating multiple overlap sizes.

Chen CC, Lin WD, Chang YJ, Chen CL, Ho JM - ISRN Bioinform (2012)

Bottom Line: The experimental results showed that Euler-mix improves the performance of de novo transcriptome assembly.


Affiliation: Department of Computer Science and Information Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei 10617, Taiwan.

ABSTRACT
Background. The emergence of next-generation sequencing platforms has given rise to a new generation of assembly algorithms. Compared with Sanger sequencing data, next-generation sequencing data have shorter reads, higher coverage depth, and different error profiles. These features pose new challenges for de novo transcriptome assembly.

Methodology. To explore how these features influence assembly algorithms, we studied the relationship among read overlap size, coverage depth, and error rate using simulated data. Based on this relationship, we propose a de novo transcriptome assembly procedure, called Euler-mix, and demonstrate its performance on a real mouse transcriptome dataset. The simulation and evaluation tools are freely available as open source.

Significance. Euler-mix is a straightforward pipeline that focuses on handling the variation in coverage depth within short-read datasets. The experimental results showed that Euler-mix improves the performance of de novo transcriptome assembly.
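The relationship among overlap size (k), coverage depth, and error rate that the abstract describes studying can be made concrete with a common back-of-the-envelope model. This is an illustrative assumption, not the authors' analysis: for reads of length L sampled at base coverage C, a given k-mer is expected to be covered C(L - k + 1)/L times, and if substitution errors occur independently at per-base rate e, only a fraction (1 - e)^k of those occurrences are error-free. The Python sketch below computes this expected error-free k-mer coverage and lists which k values keep it above a threshold. The function names, the 36-bp read length, and the `min_cov` threshold are hypothetical choices for illustration.

```python
def expected_kmer_coverage(coverage, read_len, k, error_rate):
    """Expected number of error-free occurrences of a given k-mer.

    Assumes uniformly sampled reads of length `read_len` at base coverage
    `coverage` and independent substitution errors at `error_rate` per base.
    """
    if not 1 <= k <= read_len:
        raise ValueError("k must satisfy 1 <= k <= read_len")
    # Each read of length L contains L - k + 1 k-mers, so k-mer coverage
    # is the base coverage scaled by (L - k + 1) / L.
    kmer_cov = coverage * (read_len - k + 1) / read_len
    # Probability that all k bases of one k-mer occurrence are correct.
    p_error_free = (1.0 - error_rate) ** k
    return kmer_cov * p_error_free


def usable_ks(coverage, read_len, error_rate, ks, min_cov=3.0):
    """Return the k values whose expected error-free k-mer coverage is >= min_cov.

    `min_cov` is a hypothetical threshold standing in for "enough coverage to
    keep the assembly graph connected"; it is not a parameter from the paper.
    """
    return [k for k in ks
            if expected_kmer_coverage(coverage, read_len, k, error_rate) >= min_cov]


if __name__ == "__main__":
    candidate_ks = range(11, 36, 2)  # odd k values up to the read length
    for coverage in (2, 8, 32):
        for error_rate in (0.0, 0.024):
            ks = usable_ks(coverage, 36, error_rate, candidate_ks)
            print(f"{coverage:>3}x, error {error_rate:.1%}: usable k = {ks}")
```

In this simple model, larger k values are tolerated only at higher coverage, and a higher error rate lowers that ceiling, which is qualitatively in line with the trend reported for Figure 2. The model captures only the upper limit on k; the lower limit (short k-mers collapsing repeats into ambiguous paths) is not represented.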



Figure 2: The relationship between optimum k and coverage depth for one transcript sequence at different error rates.

Mentions: Because a sequencing error rate of 0.3% is commonly observed in the control lane of the Illumina Solexa sequencer [17], the error rate may be higher in noncontrol lanes. To examine the interplay among coverage depth, sequencing error rate, and optimum k, we arbitrarily picked five mouse transcripts and generated simulated datasets with coverage depths of 2x, 4x, 8x, 16x,…, and 16384x. Additionally, errors were simulated at average rates of 0%, 0.3%, 0.6%, 0.9%,…, and 2.4% for every coverage depth (Section 4). Figure 2 shows the results for one simulated transcript (see Figures S1–S4 in the Supplementary Material, available online at doi:10.5402/2012/816402, for the results of the other four transcripts); these results show a trend consistent with Figure 1(b). As the error rate increases, the range of optimum k values at each coverage depth narrows, and a positive correlation between coverage depth and optimum k becomes noticeable. Note that, for all datasets with sequencing errors, no single k remains optimal across most of the tested coverage depths.
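The simulation design described in the paragraph above (one transcript sampled at a chosen coverage depth, with errors injected at a chosen average rate) can be sketched as follows. The paper's own simulation tool is not described in this excerpt, so everything here is an assumption: reads have a fixed length, start positions are uniform, and errors are independent substitutions to one of the other three bases (no indels and no position-dependent error profile, unlike real Illumina data). The names `simulate_reads` and `observed_error_rate`, the 36-bp read length, and the random stand-in transcript are hypothetical.

```python
import math
import random

BASES = "ACGT"


def simulate_reads(transcript, coverage, read_len, error_rate, seed=0):
    """Sample fixed-length reads from `transcript` and inject substitution errors.

    Returns a list of (start, read) pairs. The number of reads is chosen so that
    reads * read_len is at least coverage * len(transcript).
    """
    if read_len > len(transcript):
        raise ValueError("read_len must not exceed the transcript length")
    rng = random.Random(seed)
    n_reads = math.ceil(coverage * len(transcript) / read_len)
    last_start = len(transcript) - read_len
    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, last_start)  # randint bounds are inclusive
        bases = list(transcript[start:start + read_len])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                # Replace with a different base so every error is a real mismatch.
                bases[i] = rng.choice([b for b in BASES if b != base])
        reads.append((start, "".join(bases)))
    return reads


def observed_error_rate(transcript, reads):
    """Fraction of read bases that differ from the transcript at the sampled position."""
    mismatches = total = 0
    for start, read in reads:
        ref = transcript[start:start + len(read)]
        mismatches += sum(a != b for a, b in zip(read, ref))
        total += len(read)
    return mismatches / total


if __name__ == "__main__":
    rng = random.Random(42)
    # Random stand-in sequence, not one of the authors' mouse transcripts.
    transcript = "".join(rng.choice(BASES) for _ in range(2000))
    reads = simulate_reads(transcript, coverage=16, read_len=36, error_rate=0.003)
    print(len(reads), "reads; observed error rate:",
          round(observed_error_rate(transcript, reads), 4))
```

Sweeping `coverage` and `error_rate` over a grid like the one described and assembling each read set at several k values would mirror the shape of the experiment, though not the authors' exact tool or results.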

