RBF-TSS: identification of transcription start site in human using radial basis functions network and oligonucleotide positional frequencies.

Mahdi RN, Rouchka EC - PLoS ONE (2009)

Bottom Line: Accurate identification of promoter regions and transcription start sites (TSS) in genomic DNA allows for a more complete understanding of the structure of genes and gene regulation within a given genome. However, models providing more accurate modeling of promoters and TSS are needed. A radial basis function neural network for identifying transcription start sites (RBF-TSS) is proposed and employed as a classification algorithm.


Affiliation: Department of Computer Engineering and Computer Science, University of Louisville, Louisville, KY, USA.

ABSTRACT
Accurate identification of promoter regions and transcription start sites (TSS) in genomic DNA allows for a more complete understanding of the structure of genes and gene regulation within a given genome. Many recently published methods have achieved high identification accuracy of TSS. However, models providing more accurate modeling of promoters and TSS are needed. A novel method for identifying transcription start sites is proposed that improves on the TSS recognition accuracy of recently published methods. This method incorporates a metric feature based on oligonucleotide positional frequencies, taking into account the nature of promoters. A radial basis function neural network for identifying transcription start sites (RBF-TSS) is proposed and employed as a classification algorithm. Using non-overlapping chunks (windows) of size 50 and 500 on the human genome, the proposed method achieves an area under the receiver operating characteristic curve (auROC) of 94.75% and 95.08%, respectively, providing increased performance over existing TSS prediction methods.
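
The abstract does not spell out how the oligonucleotide positional frequencies are computed, so the following is only a minimal sketch of one plausible reading: count how often each k-mer occurs at each start position across a set of equal-length sequences, and split a longer sequence into non-overlapping chunks. The function names, the choice of k, and the toy sequences are all assumptions made for illustration and are not taken from the RBF-TSS paper.

```python
from collections import Counter


def positional_kmer_frequencies(sequences, k=2):
    """Relative frequency of each k-mer at each start position.

    Assumes all sequences have the same length (for example, windows
    aligned around annotated TSS). Hypothetical helper, not the paper's code.
    """
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")

    n_positions = length - k + 1
    counts = [Counter() for _ in range(n_positions)]
    for seq in sequences:
        for pos in range(n_positions):
            counts[pos][seq[pos:pos + k].upper()] += 1

    total = len(sequences)
    # One dict per position: k-mer -> fraction of sequences showing it there.
    return [{kmer: c / total for kmer, c in counter.items()}
            for counter in counts]


def non_overlapping_chunks(sequence, size):
    """Split a sequence into consecutive windows of `size` bases.

    A trailing partial window is dropped (an assumption of this sketch).
    """
    return [sequence[i:i + size]
            for i in range(0, len(sequence) - size + 1, size)]


if __name__ == "__main__":
    toy = ["ACGT", "ACGA", "TCGT"]  # made-up sequences
    for pos, freqs in enumerate(positional_kmer_frequencies(toy, k=2)):
        print(pos, freqs)
    print(non_overlapping_chunks("ACGTACGTAC", 4))
```

A table of this kind could then be used to score how closely a candidate window matches the position-specific composition of known promoters, but the exact metric used by RBF-TSS is defined in the full article rather than here.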


pone-0004878-g004: ROC curve for chunk sizes 50 and 500. Both axes are scaled to logarithm base 10 to highlight the difference.

Mentions: The true positive rate (TPR) for TSS identification was calculated as the percentage of positive samples identified as such by RBF-TSS, while the false positive rate (FPR) was calculated as the percentage of true negative samples mistakenly labeled as positive. A comparison of these rates is shown in Figs. 4 and 5. The positive predictive value (PPV) is calculated as the ratio of the samples correctly classified as positive to the total number of samples classified as positive. As illustrated in Fig. 5, the area under the precision-recall curve is relatively low because the ratio of negative to positive samples is very high; it also varies widely between chunk sizes 50 and 500.
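
The three rates defined above can be sketched in a few lines of Python. This is an illustration of the standard definitions only, not the authors' evaluation code; the function name and the label counts in the example are hypothetical.

```python
def classification_rates(y_true, y_pred):
    """Return TPR, FPR and PPV for binary labels (1 = TSS, 0 = non-TSS).

    A rate whose denominator is zero is reported as NaN.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "tpr": ratio(tp, tp + fn),  # positives identified as positive
        "fpr": ratio(fp, fp + tn),  # negatives mislabeled as positive
        "ppv": ratio(tp, tp + fp),  # predicted positives that are truly positive
    }


if __name__ == "__main__":
    # Hypothetical imbalanced data: 10 positives, 1000 negatives.
    y_true = [1] * 10 + [0] * 1000
    # 9 of 10 positives found; 50 of 1000 negatives falsely flagged.
    y_pred = [1] * 9 + [0] * 1 + [1] * 50 + [0] * 950
    print(classification_rates(y_true, y_pred))
```

In this made-up case the TPR is 0.9 and the FPR is 0.05, yet the PPV is only 9/59, about 0.15, because the 50 false positives drawn from the much larger negative class outnumber the 9 true positives. This is the same mechanism by which a high negative-to-positive ratio keeps the precision-recall curve low even when the ROC curve looks strong.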

