RBF-TSS: identification of transcription start site in human using radial basis functions network and oligonucleotide positional frequencies.

Mahdi RN, Rouchka EC - PLoS ONE (2009)

Bottom Line: Accurate identification of promoter regions and transcription start sites (TSS) in genomic DNA allows for a more complete understanding of the structure of genes and gene regulation within a given genome. However, models providing more accurate modeling of promoters and TSS are needed. A radial basis function neural network for identifying transcription start sites (RBF-TSS) is proposed and employed as a classification algorithm.


Affiliation: Department of Computer Engineering and Computer Science, University of Louisville, Louisville, KY, USA.

ABSTRACT
Accurate identification of promoter regions and transcription start sites (TSS) in genomic DNA allows for a more complete understanding of the structure of genes and gene regulation within a given genome. Many recently published methods have achieved high accuracy in identifying TSS. However, models that capture promoters and TSS more accurately are still needed. A novel method for identifying transcription start sites is proposed that improves on the TSS recognition accuracy of recently published methods. The method incorporates a metric feature based on oligonucleotide positional frequencies, taking into account the nature of promoters. A radial basis function neural network for identifying transcription start sites (RBF-TSS) is employed as the classification algorithm. Using non-overlapping chunks (windows) of size 50 and 500 on the human genome, the proposed method achieves an area under the Receiver Operating Characteristic curve (auROC) of 94.75% and 95.08%, respectively, an improvement over existing TSS prediction methods.

Training sequences are divided around the TSS with overlapping regions. This specific subdivision shows feature 7 settings, as described in Table 1.

To capture the characteristics of the given promoter sequences, training sequences with known TSS are divided into overlapping regions (Fig. 1). Either 4-mer or 3-mer oligonucleotide frequencies are measured in every sub-region, and all of these sub-frequencies are combined into a single feature vector that represents the given sequence sample. This approach is a compromise between methods that use the frequencies of all oligonucleotides around the TSS regardless of their positions and methods that measure positional densities at every single base relative to the TSS. Knowing the region in which each oligonucleotide occurs yields approximate positional information about the motifs.
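The region-wise frequency idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the region width of 100, the step of 50, and the choice of relative (normalized) frequencies are all hypothetical, and the actual region boundaries used in the paper are those given in its Table 1.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_index(k):
    """Map every k-mer over ACGT to a fixed slot in the frequency vector."""
    return {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}


def region_frequencies(region, k, index):
    """Relative frequency of each k-mer within one sub-region."""
    counts = [0.0] * len(index)
    total = 0
    for i in range(len(region) - k + 1):
        slot = index.get(region[i:i + k])
        if slot is not None:  # skip k-mers containing N or other symbols
            counts[slot] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def overlapping_regions(length, width, step):
    """Yield (start, end) windows of `width` bases; step < width gives overlap."""
    for start in range(0, length - width + 1, step):
        yield start, start + width


def feature_vector(seq, k=3, width=100, step=50):
    """Combine per-region k-mer frequencies into one feature vector.

    width and step are illustrative values (an assumption), not the
    region settings used in the paper.
    """
    seq = seq.upper()
    index = kmer_index(k)
    features = []
    for start, end in overlapping_regions(len(seq), width, step):
        features.extend(region_frequencies(seq[start:end], k, index))
    return features
```

With these illustrative settings, a 200-base sequence yields three overlapping regions, so `feature_vector(seq, k=3)` has 3 × 64 entries. Because each block of 64 values belongs to one region, the vector records roughly where each oligonucleotide occurs without tracking every individual base position.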

