MetaProm: a neural network based meta-predictor for alternative human promoter prediction.

Wang J, Ungar LH, Tseng H, Hannenhalli S - BMC Genomics (2007)

Bottom Line: We describe an artificial neural network (ANN) based meta-predictor program that integrates predictions from the current PPPs and the predicted promoters' relation to CpG islands. Our meta-predictor outperforms any individual PPP in sensitivity and specificity. Furthermore, we discovered that the 5' alternative promoters are more likely to be associated with a CpG island.


Affiliation: Center for Bioinformatics, University of Pennsylvania, Philadelphia, PA 19104, USA. junwen2u@gmail.com

ABSTRACT

Background: De novo eukaryotic promoter prediction is important for discovering novel genes and understanding gene regulation. In spite of the great advances made in the past decade, recent studies revealed that the overall performance of the current promoter prediction programs (PPPs) is still poor, and that predictions made by individual PPPs do not overlap with one another. Furthermore, most PPPs are trained and tested on the most-upstream promoters; their performance on alternative promoters has not been assessed.

Results: In this paper, we evaluate the performance of current major promoter prediction programs (i.e., PSPA, FirstEF, McPromoter, DragonGSF, DragonPF, and FProm) using 42,536 distinct human gene promoters on a genome-wide scale, with emphasis on alternative promoters. We describe an artificial neural network (ANN) based meta-predictor program that integrates predictions from the current PPPs and the predicted promoters' relation to CpG islands. Our specific analysis of recently discovered alternative promoters reveals that although only 41% of the 3'-most promoters overlap a CpG island, 74% of the 5'-most promoters overlap a CpG island.

Conclusion: Our assessment of six PPPs on 1.06 × 10^9 bp of human genome sequence reveals the specific strengths and weaknesses of individual PPPs. Our meta-predictor outperforms any individual PPP in sensitivity and specificity. Furthermore, we discovered that the 5' alternative promoters are more likely to be associated with a CpG island.


Figure 2: Histogram of distances between Transcription Start Site (TSS) and Coding Start (CDS). ATSS: based on 30,964 alternative TSSs from the DBTSS database; RefSeq: based on 25,647 TSSs from the RefSeq database; DBTSS 5': based on 14,628 most upstream TSSs from the DBTSS database, a subset of ATSS. All data are binned in 1 kb intervals, and each bin is labeled on the x-axis by its midpoint. Positive values on the x-axis indicate that the TSS is upstream of the CDS. Note that there is no significant difference between RefSeq and DBTSS 5'. ATSSs from DBTSS are present both upstream and downstream of the CDS, with a symmetrical distribution around the bin centered at 500 bp.

Mentions: We used TSS annotations from DBTSS [33] and RefSeq [34] as our reference. Since DBTSS includes alternative TSSs (ATSSs), we extracted the most upstream TSS (MUTSS) of each gene as a subset, named DBTSS 5'. We compared the distance between the MUTSS and the corresponding coding sequence (CDS) start documented in these databases and found no significant difference (Figure 2). Both DBTSS 5' and RefSeq annotated TSSs were upstream of the CDS, and about 67% were within 1 kb upstream. However, when all DBTSS ATSSs were counted, only 30% of the ATSSs were within 1 kb upstream of the CDS, and the rest were distributed symmetrically around this region.
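
The distance analysis described here can be illustrated with a minimal sketch. It is not the authors' code: the function names, the strand-aware sign convention, and the half-open bin boundaries are all assumptions made for illustration. It shows how one might select the most upstream TSS of a gene, compute a signed TSS-to-CDS distance (positive when the TSS is upstream of the CDS, as in Figure 2), and label 1 kb bins by their midpoints.

```python
from collections import Counter


def most_upstream(tss_positions, strand):
    """Pick the most upstream TSS of one gene (hypothetical helper).

    On the + strand, upstream means the smallest coordinate;
    on the - strand, it means the largest coordinate.
    """
    return min(tss_positions) if strand == "+" else max(tss_positions)


def signed_distance(tss, cds_start, strand):
    """Distance from TSS to CDS start, positive when the TSS is upstream."""
    return cds_start - tss if strand == "+" else tss - cds_start


def bin_midpoint(distance, bin_size=1000):
    """Label a distance by the midpoint of its bin.

    Assumption: bins are half-open intervals [k*bin_size, (k+1)*bin_size),
    so distances 0..999 fall in the bin labeled 500.
    """
    return (distance // bin_size) * bin_size + bin_size // 2


def histogram(distances, bin_size=1000):
    """Count distances per bin, keyed by bin midpoint."""
    return Counter(bin_midpoint(d, bin_size) for d in distances)


def fraction_within_1kb_upstream(distances):
    """Fraction of TSSs located 0 to 999 bp upstream of the CDS start."""
    hits = sum(1 for d in distances if 0 <= d < 1000)
    return hits / len(distances)
```

Under these assumptions, a TSS located 200 bp upstream of the CDS start lands in the bin labeled 500, and a TSS located 500 bp downstream lands in the bin labeled -500, which matches the midpoint labeling described in the Figure 2 caption.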

