A multistep bioinformatic approach detects putative regulatory elements in gene promoters.

Bortoluzzi S, Coppe A, Bisognin A, Pizzi C, Danieli GA - BMC Bioinformatics (2005)

Bottom Line: Searching for approximate patterns in large promoter sequences frequently produces an exceedingly high number of results. Methodology and results were tested by analysing 1,000 groups of putatively unrelated sequences, randomly selected among 17,156 human gene promoters. The approach described in this paper seems effective for identifying a tractable number of sequence motifs with a putative regulatory role.


Affiliation: Department of Biology, University of Padova - Via Bassi 58/B, 35131, Padova, Italy. stefibo@bio.unipd.it

ABSTRACT

Background: Searching for approximate patterns in large promoter sequences frequently produces an exceedingly high number of results. Our aim was to exploit biological knowledge to define a sheltered search space and appropriate search parameters, in order to develop a method for identifying a tractable number of sequence motifs.

Results: Novel software (COOP) was developed for extraction of sequence motifs, based on clustering of exact or approximate patterns according to the frequency of their overlapping occurrences. Genomic sequences 1 kb upstream of 91 genes differentially expressed and/or encoding proteins with relevant function in the adult human retina were analyzed. Methodology and results were tested by analysing 1,000 groups of putatively unrelated sequences, randomly selected among 17,156 human gene promoters. When applied to a sample of human promoters, the method identified 279 putative motifs frequently occurring in retina promoter sequences. Most of them are localized in the proximal portion of promoters, are less variable in their central region than in their lateral regions, and are similar to known regulatory sequences. COOP software and reference manual are freely available upon request from the authors.

Conclusion: The approach described in this paper seems effective for identifying a tractable number of sequence motifs with a putative regulatory role.
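The negative-control design mentioned in the Results (1,000 groups of promoters drawn at random from 17,156) can be sketched as follows. This is only an illustration, not the authors' procedure: the function name `draw_control_groups`, the placeholder promoter identifiers, the fixed seed, and the choice to sample without replacement within each group are all assumptions. The group size of 91 mirrors the size of the retinal promoter set.

```python
import random


def draw_control_groups(promoter_ids, group_size, n_groups=1000, seed=0):
    """Draw n_groups random groups of promoters (hypothetical helper).

    Each group is sampled without replacement, so a promoter appears at
    most once per group; the same promoter may appear in several groups.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    return [rng.sample(promoter_ids, group_size) for _ in range(n_groups)]


# Placeholder identifiers standing in for the 17,156 human promoters.
promoter_ids = [f"promoter_{i}" for i in range(17156)]

# 1,000 control groups, each the same size as the 91-sequence retinal set.
groups = draw_control_groups(promoter_ids, group_size=91)
```

Matching the control group size to the size of the dataset of interest keeps the pattern counts comparable between the two.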

Figure 2: Comparison of pattern discovery results in retinal gene promoter sequences and in 1,000 negative control datasets. Plots of the number of patterns (12 bp long, with at most two variable positions) vs the number of sequences in which they were found, in retinal gene promoter sequences (open squares) and in 1,000 negative control datasets (filled diamonds). For negative control datasets, the average value over the 1,000 sets of sequences is given, with an interval of two standard deviations. Statistically significant differences (0.05 threshold) are marked by stars. (A) Comparison between the 1000M52 dataset (52 promoter sequences of genes overexpressed in the retina) and the RAN1000M52i dataset (1,000 groups of 52 randomly chosen human promoters); (B) Comparison between the 1000M91 dataset (91 retinal gene promoter sequences) and the RAN1000M91i dataset (1,000 groups of 91 randomly chosen human promoters).
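The control summary plotted in Figure 2 (a mean with a two standard deviation interval) can be sketched as below. The function names and the toy counts are hypothetical, and the use of the sample standard deviation is an assumption. The check shown only asks whether an observed count falls outside the plotted interval; it is not the statistical test behind the 0.05 threshold, which the caption does not specify.

```python
import statistics


def control_interval(control_counts):
    """Return (low, mean, high) for a mean +/- 2 SD interval (sample SD)."""
    mean = statistics.mean(control_counts)
    sd = statistics.stdev(control_counts)
    return mean - 2 * sd, mean, mean + 2 * sd


def outside_interval(observed, control_counts):
    """True if the observed count lies outside the mean +/- 2 SD interval."""
    low, _, high = control_interval(control_counts)
    return observed < low or observed > high


# Toy numbers: pattern counts in one support class across a few control groups.
toy_control_counts = [10, 12, 11, 9, 13, 10, 11, 12]
low, mean, high = control_interval(toy_control_counts)
```

In practice `control_counts` would hold one value per control group (1,000 values) for each class of patterns.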

Mentions: In each group of sequences, approximate patterns of length ranging from 10 to 14 nucleotides, with at most two variable positions (10-2, 12-2, 14-2 patterns), were searched for using SPEXS (Tables 3 and 4). For each dataset, patterns were ranked into classes according to the number of sequences in which they were represented (Tables 3 and 4, Figure 2).
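The ranking step described above can be illustrated with a minimal sketch. It does not reproduce SPEXS: it assumes that a "variable position" behaves as a wildcard (written `N` here) matching any base, and the function names and toy sequences are hypothetical. Given candidate patterns, it counts the sequences containing each one and groups patterns into classes by that count.

```python
def matches_at(pattern, sequence, start):
    """True if pattern matches sequence at start; 'N' matches any base."""
    window = sequence[start:start + len(pattern)]
    if len(window) < len(pattern):
        return False
    return all(p == "N" or p == s for p, s in zip(pattern, window))


def sequences_containing(pattern, sequences):
    """Number of sequences with at least one occurrence of the pattern."""
    return sum(
        any(matches_at(pattern, seq, i) for i in range(len(seq) - len(pattern) + 1))
        for seq in sequences
    )


def rank_by_support(patterns, sequences, max_variable=2):
    """Group patterns into classes keyed by the number of sequences hit."""
    classes = {}
    for pattern in patterns:
        if pattern.count("N") > max_variable:
            continue  # skip patterns with more variable positions than allowed
        support = sequences_containing(pattern, sequences)
        classes.setdefault(support, []).append(pattern)
    return dict(sorted(classes.items(), reverse=True))


# Toy sequences and 10-nt candidate patterns (not real promoter data).
toy_sequences = ["ACGTACGTACGTAA", "TTACGTTCGTACGG", "GGGGGGGGGGGGGG"]
toy_patterns = ["ACGTNCGTAC", "GGGGGGGGGG", "ACGTACGTAC", "NNNTACGTAC"]
ranked = rank_by_support(toy_patterns, toy_sequences)
```

Counting sequences rather than total occurrences means a pattern repeated many times in one promoter is not ranked above a pattern shared by several promoters.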

