Rule-based knowledge acquisition method for promoter prediction in human and Drosophila species.

Huang WL, Tung CW, Liaw C, Huang HL, Ho SY - ScientificWorldJournal (2014)

Bottom Line: PromHD utilizes an effective feature-mining algorithm and a reference feature set of 167 DNA sequence descriptors (DNASDs), comprising three descriptors of physicochemical properties (absorption maxima, molecular weight, and molar absorption coefficient), 128 top-ranked descriptors of 4-mer motifs, and 36 global sequence descriptors. Based on the 99- and 74-dimensional feature vectors, PromHD generates several if-then rules by using the decision tree mechanism for promoter prediction. The top-ranked informative rules with high certainty grades reveal that the global sequence descriptor, the length of nucleotide A at the first position of the sequence, and two physicochemical properties, absorption maxima and molecular weight, are effective in distinguishing promoters from non-promoters in human and Drosophila species, respectively.


Affiliation: Department of Management Information System, Asia Pacific Institute of Creativity, Miaoli 351, Taiwan.

ABSTRACT
The rapid and reliable identification of promoter regions is important because the number of genomes being sequenced is increasing rapidly. Various methods have been developed, but few investigate the effectiveness of sequence-based features in promoter prediction. This study proposes a knowledge acquisition method (named PromHD) based on if-then rules for promoter prediction in human and Drosophila species. PromHD utilizes an effective feature-mining algorithm and a reference feature set of 167 DNA sequence descriptors (DNASDs), comprising three descriptors of physicochemical properties (absorption maxima, molecular weight, and molar absorption coefficient), 128 top-ranked descriptors of 4-mer motifs, and 36 global sequence descriptors. PromHD identifies two feature subsets with 99 and 74 DNASDs and yields test accuracies of 96.4% and 97.5% in human and Drosophila species, respectively. Based on the 99- and 74-dimensional feature vectors, PromHD generates several if-then rules by using the decision tree mechanism for promoter prediction. The top-ranked informative rules with high certainty grades reveal that the global sequence descriptor, the length of nucleotide A at the first position of the sequence, and two physicochemical properties, absorption maxima and molecular weight, are effective in distinguishing promoters from non-promoters in human and Drosophila species, respectively.
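To make the notion of a 4-mer motif descriptor concrete, the following is a minimal sketch, assuming each descriptor is the relative frequency of one 4-mer counted over overlapping windows of the sequence. The abstract does not state how PromHD computes or ranks these descriptors, so the function name `kmer_frequencies` and the normalization are illustrative assumptions, not the authors' implementation. There are 4^4 = 256 possible 4-mers over A, C, G, and T; the reference feature set retains 128 of them, and that ranking step is not reproduced here.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=4):
    """Relative frequency of every k-mer over the alphabet ACGT.

    Hypothetical helper for illustration only; windows that contain
    any other character (for example N) are ignored.
    """
    seq = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Enumerate all 4**k k-mers in a fixed lexicographic order.
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Normalize by the number of valid (pure ACGT) windows.
    total = sum(counts[m] for m in all_kmers)
    return {m: (counts[m] / total if total else 0.0) for m in all_kmers}


freqs = kmer_frequencies("ACGTACGT")
print(len(freqs))      # number of 4-mer descriptors before any ranking
print(freqs["ACGT"])   # share of windows equal to ACGT
```

The resulting dictionary has a fixed key order, so its values can be used directly as one block of a feature vector passed to a decision tree learner.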


Figure 1: The promoter of a DNA sequence, containing a transcription factor binding site and a TATA box, lies immediately upstream of the transcription start site.

Gene expression is often regulated by the transcription rate, which is largely controlled by the binding of RNA polymerase II (Pol II) to the regulatory regions of DNA sequences in eukaryotic cells [1]. These regulatory regions (called promoters) contain a transcription factor binding site and a TATA box and lie immediately upstream of transcription start sites, where transcription factors and Pol II accumulate to initiate transcription (Figure 1) [2, 3]. Promoters are extremely diverse and difficult to identify experimentally using specific sequence patterns or motifs [3, 4]. Therefore, the identification of promoters is very challenging, especially in the sequencing of eukaryotic genomes. Some methods for predicting promoters have been developed, and these methods may be categorized into the following four classes according to their types of sequence features (see Table 1).

