Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm.

Lomsadze A, Burns PD, Borodovsky M - Nucleic Acids Res. (2014)

Bottom Line: Use of 'assembled' RNA-Seq transcripts is far from trivial; recent assessments revealed a significant assembly error rate. We demonstrated in computational experiments that the proposed method of incorporating 'unassembled' RNA-Seq reads improves the accuracy of gene prediction; in particular, for the 1.3 GB genome of Aedes aegypti, the mean value of prediction Sensitivity and Specificity at the gene level increased over GeneMark-ES by 24.5%. In the current surge of genomic data, when the need for accurate sequence annotation is higher than ever, GeneMark-ET will be a valuable addition to the narrow arsenal of automatic gene prediction tools.


Affiliation: Joint Georgia Tech and Emory Wallace H. Coulter Department of Biomedical Engineering, Atlanta, GA, USA 30332.



Figure 4: Observed change, over iterations, in the mean of the Sn and Sp internal exon prediction values for the GeneMark-ET and GeneMark-ES algorithms for the Drosophila melanogaster (A) and Aedes aegypti (B) genomes.

Mentions: We analyzed the dependence of the mean values of internal exon Sn and Sp on the iteration index for the D. melanogaster and A. aegypti genomes, for both GeneMark-ES and GeneMark-ET (Figure 4). The GeneMark-ET initial parameterization, which integrates information from mapped RNA-Seq reads, improved the accuracy of predictions in the first iteration by 55–60% in comparison with GeneMark-ES. For D. melanogaster, further iterations reduced this large initial gap in accuracy to 4%. In contrast, for the large A. aegypti genome, although the gap narrowed with iterations, the accuracy of GeneMark-ET at convergence remained almost 20% higher than that of GeneMark-ES. GeneMark-ET also reached convergence 2–3 iterations earlier (Figure 4). A reduction in the number of iterations was observed for the other three genomes as well (data not shown).
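
The quantity tracked across iterations, the mean of exon-level Sn and Sp, can be illustrated with a minimal sketch. It assumes the convention common in gene-finding evaluations, which the text above does not spell out: Sn = TP / (TP + FN) and Sp = TP / (TP + FP), with an exon counted as a true positive only when its coordinates match the annotation exactly. The function name, the exon tuple layout, and the toy coordinates are hypothetical and are not taken from the GeneMark software.

```python
from typing import Set, Tuple

# Hypothetical exon representation: (sequence id, start, end, strand).
Exon = Tuple[str, int, int, str]


def exon_accuracy(predicted: Set[Exon], annotated: Set[Exon]) -> Tuple[float, float, float]:
    """Return (Sn, Sp, mean of Sn and Sp) for exact-match exon predictions.

    Assumed convention: Sn = TP / (TP + FN), Sp = TP / (TP + FP).
    """
    true_pos = len(predicted & annotated)  # exons with identical coordinates
    sn = true_pos / len(annotated) if annotated else 0.0
    sp = true_pos / len(predicted) if predicted else 0.0
    return sn, sp, (sn + sp) / 2


# Toy data (made up for illustration): four annotated exons, two predictions,
# one of which matches the annotation exactly.
annotated = {
    ("chr1", 100, 200, "+"),
    ("chr1", 300, 400, "+"),
    ("chr1", 500, 600, "+"),
    ("chr1", 700, 800, "+"),
}
predicted = {
    ("chr1", 100, 200, "+"),  # exact match: true positive
    ("chr1", 310, 400, "+"),  # shifted start: false positive
}

sn, sp, mean_acc = exon_accuracy(predicted, annotated)
print(sn, sp, mean_acc)  # 0.25 0.5 0.375
```

Evaluating this function on the predictions produced after each training iteration would give one point per iteration of a curve like those in Figure 4.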

