Comparative evaluation of intron prediction methods and detection of plant genome annotation using intron length distributions.

Yang L, Cho HG - Genomics Inform (2012)

Bottom Line: The results showed that each of the tools had its own advantages and disadvantages. This discrepancy could have been the result of errors in intron prediction. It is suggested that a three-phase evaluation is a fast and effective method of detecting intron annotation problems.


Affiliation: Tobacco Laboratory, Shandong Agricultural University, Shandong 271-018, China.

ABSTRACT
Intron prediction is an important problem in genome annotation, which is constantly being updated. Using the genomes of two model plants (rice and Arabidopsis), we compared two well-known intron prediction tools: the Blast-Like Alignment Tool (BLAT) and Sim4cc. The results showed that each of the tools had its own advantages and disadvantages. BLAT predicted more than 99% of all genomic introns, with a small number of false-positive introns. Sim4cc was successful at finding the correct introns, with a false-negative rate of 1.02% to 4.85%, but it needed a longer run time than BLAT. Further, we evaluated the intron information of 10 complete plant genomes. As non-coding sequences, introns are not constrained in length by the triplet codon frame, so intron lengths fall into three phases: a multiple of three bases (3n), a multiple of three bases plus one (3n + 1), and a multiple of three bases plus two (3n + 2). It is widely accepted that the percentages of 3n, 3n + 1, and 3n + 2 introns are quite similar within a genome. Our studies showed that in 80% (8/10) of the species, the three phases occurred in similar numbers. The percentage of 3n introns in Ostreococcus lucimarinus was excessive (47.7%), while in Ostreococcus tauri it was deficient (29.1%). This discrepancy could have been the result of errors in intron prediction. It is suggested that a three-phase evaluation is a fast and effective method of detecting intron annotation problems.
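The three-phase evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the example lengths, and the 0.04 tolerance are assumptions made for illustration. The tolerance is chosen only so that the two Ostreococcus percentages reported above would both be flagged; the abstract does not state a threshold.

```python
from collections import Counter


def length_phase_fractions(intron_lengths):
    """Return the fraction of introns whose length is 3n, 3n + 1 and 3n + 2."""
    counts = Counter(length % 3 for length in intron_lengths)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no intron lengths supplied")
    return {
        "3n": counts[0] / total,
        "3n+1": counts[1] / total,
        "3n+2": counts[2] / total,
    }


def flag_skewed(fractions, tolerance=0.04):
    """Flag an annotation whose 3n fraction differs from 1/3 by more than
    `tolerance` (a hypothetical, illustrative threshold)."""
    return abs(fractions["3n"] - 1 / 3) > tolerance


# Made-up intron lengths, used only to show the calculation.
example_lengths = [99, 100, 74, 120, 85, 91]
fractions = length_phase_fractions(example_lengths)
print(fractions)
print("possible annotation problem:", flag_skewed(fractions))
```

In practice the lengths would be read from a genome annotation file, and a formal goodness-of-fit test could replace the fixed tolerance.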



Figure 2: An example of the three intron length phases from an Arabidopsis gene, AT1G17600.1. Uppercase sequence indicates exons and lowercase sequence indicates introns. Asterisks indicate frameshifts introduced by non-3n introns; intronic in-frame stop codons are underlined. Intron 1 is a 99-bp intron (3n) with one in-frame stop codon. Intron 2 is a 100-bp intron (3n + 1), which has two in-frame stop codons and thus does not interrupt the open reading frame. Intron 3 is a 74-bp intron (3n + 2) with three stop codons.

Mentions: According to Roy's method, many predicted introns in the plant genomes had in-frame stop codons, and the predicted introns in these genomes were as likely to have a length that is a multiple of 3 bp (3n) as one with an extra one (3n + 1) or two (3n + 2) bp. An example of the three phases from an Arabidopsis thaliana gene, AT1G17600.1, is shown in Fig. 2.
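As a rough illustration of what an in-frame stop codon check involves, the sketch below scans an intron sequence codon by codon. It is not an implementation of Roy's method. The function name `in_frame_stops`, the `codon_offset` parameter, and the example sequence are hypothetical, and the sequence is not taken from AT1G17600.1. The sketch also ignores any codon that spans the exon-intron boundary.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(intron_seq, codon_offset=0):
    """Return 0-based start positions of stop codons met when the intron is
    read in the frame of the upstream exon.

    `codon_offset` (0, 1 or 2) is assumed to be the number of intron bases
    that complete the codon left unfinished by the upstream exon.
    """
    if codon_offset not in (0, 1, 2):
        raise ValueError("codon_offset must be 0, 1 or 2")
    seq = intron_seq.upper()
    return [
        i
        for i in range(codon_offset, len(seq) - 2, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]


# Made-up 18-bp sequence, used only to show the scan.
toy_intron = "gtaagttgaccatagcag"
for offset in (0, 1, 2):
    print(offset, in_frame_stops(toy_intron, offset))
```

Combining this scan with the length phase (length modulo 3) gives the two properties that the figure annotates for each intron.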

