MAPPER: a search engine for the computational identification of putative transcription factor binding sites in multiple genomes.

Marinescu VD, Kohane IS, Riva A - BMC Bioinformatics (2005)

Bottom Line: In several test cases, the method correctly identified experimentally characterized sites, with better specificity and sensitivity than other similar computational methods. Moreover, a large-scale comparison using synthetic data showed that in the majority of cases our method performed significantly better than a nucleotide weight matrix-based method. Due to its extensive database of models, powerful search engine and flexible interface, MAPPER represents an effective resource for the large-scale computational analysis of transcriptional regulation.


Affiliation: Children's Hospital Boston, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA. vdmarinescu@chip.org

ABSTRACT

Background: Cis-regulatory modules are combinations of regulatory elements occurring in close proximity to each other that control the spatial and temporal expression of genes. The ability to identify them in a genome-wide manner depends on the availability of accurate models and of search methods able to detect putative regulatory elements with enhanced sensitivity and specificity.

Results: We describe the implementation of a search method for putative transcription factor binding sites (TFBSs) based on hidden Markov models built from alignments of known sites. We built 1,079 models of TFBSs using experimentally determined sequence alignments of sites provided by the TRANSFAC and JASPAR databases and used them to scan sequences of the human, mouse, fly, worm and yeast genomes. In several test cases, the method correctly identified experimentally characterized sites, with better specificity and sensitivity than other similar computational methods. Moreover, a large-scale comparison using synthetic data showed that in the majority of cases our method performed significantly better than a nucleotide weight matrix-based method.

Conclusion: The search engine, available at http://mapper.chip.org, allows the identification, visualization and selection of putative TFBSs occurring in the promoter or other regions of a gene from the human, mouse, fly, worm and yeast genomes. In addition, it allows the user to upload a sequence to query and to build a model by supplying a multiple sequence alignment of binding sites for a transcription factor of interest. Due to its extensive database of models, powerful search engine and flexible interface, MAPPER represents an effective resource for the large-scale computational analysis of transcriptional regulation.


Figure 5: The page for model T05206 for E2F-4:DP-1. The model page displays detailed information regarding the model including the name and (if available) organism and classification of the factor, the model length, the number of sequences in the alignment used to train the model and the references used to select these sequences. The page also displays the HMM logo generated using the LogoMat-M software [89].

Mentions: Figures 3 and 4 present the output of MAPPER when the human MCM5 gene (Entrez Gene ID 4174) is used as an example. The promoter of the human MCM5 gene contains multiple experimentally characterized binding sites for the E2F transcription factor. These binding sites were retrieved by our search, and were found to be conserved across the human, mouse and Drosophila MCM5 orthologs. MCM5 genes code for proteins involved in the initiation of DNA replication [83], and are members of the MCM family of chromatin-binding proteins that participate in cell cycle regulation. The E2F family of transcription factors plays a critical role in the control of cell proliferation and consists of six factors, E2F-1 to E2F-6, that heterodimerize with two other subunits, DP-1 and DP-2; the activity of these complexes is modulated by the retinoblastoma tumor suppressor protein (pRB), which binds E2F [84]. TRANSFAC and our database contain multiple models describing the binding sites in target genes characterized for different combinations of E2F and DP proteins, complexed or not with pRB. Below, we refer to these models generically as "E2F" models because, while the transcriptional role of E2F family members differs depending on the identity of the E2F and DP moieties that form the complex [85], no specificity has been detected in vivo for the association of particular complexes with known E2F-regulated promoters [86,87]. Experimental evidence showed that the upregulation of the human MCM5 gene in response to growth stimulation is mediated by the binding of E2F to four sites within the MCM5 promoter, and that mutations in these sites abolish this response [88]. The four E2F binding sites consist of two sets of overlapping sequences running on opposite strands and were mapped by RNase protection assays to positions -194 to -183 and +2 to +13, respectively, relative to the start of the transcript [88].
In our search, three models for the E2F family (MA0024 for E2F, T05206 for E2F-4:DP-1 and T05208 for Rb:E2F-1:DP-1) retrieved all four E2F sites at the locations and in the orientations described in the literature (Figures 3 and 4). To simplify the display in these figures and to highlight the four E2F binding sites retrieved by the three models, a more stringent set of parameters was used for the query (500 bp upstream of the ATG, score > 3, E-value < 6.8). Figure 3 shows the list of all TFBSs retrieved given these input parameters, with the four E2F binding sites described by Ohtani et al. boxed, as well as the list of factors for which putative binding sites were also found in the other two MCM5 homologs selected (mouse and Drosophila MCM5). For each hit in the listing, the model identifier is displayed as a double link: to a pop-up window showing the match between the sequence and the model (Figure 3), and to a separate page giving detailed information regarding the model, including its length, the number of sequences in the training set, associated models, HMM logo [89], and the references used to build the alignment (Figure 5).
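
The query described above combines two steps: keeping only hits that pass the score and E-value thresholds, and then listing the factors that still have at least one hit in every selected ortholog. The following is a minimal sketch of that logic, not MAPPER's actual implementation or output format. The `Hit` record, the function names, and all scores, E-values, the mouse coordinates and the `hypothetical-1` model are invented for illustration; only the thresholds (score > 3, E-value < 6.8), the model identifiers MA0024 and T05206, and the human site positions come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One putative binding site (hypothetical record layout)."""
    factor: str    # transcription factor name
    model: str     # model identifier
    start: int     # start position relative to the transcript start
    end: int       # end position relative to the transcript start
    strand: str    # "+" or "-"
    score: float
    evalue: float


def filter_hits(hits, min_score=3.0, max_evalue=6.8):
    """Keep hits with score above min_score and E-value below max_evalue."""
    return [h for h in hits if h.score > min_score and h.evalue < max_evalue]


def conserved_factors(hits_by_species):
    """Return factors that have at least one hit in every species."""
    factor_sets = [{h.factor for h in hits} for hits in hits_by_species.values()]
    return set.intersection(*factor_sets) if factor_sets else set()


# Illustrative data only: scores and E-values are made up.
human_hits = [
    Hit("E2F", "MA0024", -194, -183, "+", 5.1, 1.2),
    Hit("E2F", "T05206", 2, 13, "-", 2.5, 3.0),          # fails the score cutoff
    Hit("Sp1", "hypothetical-1", -50, -41, "+", 4.0, 9.0),  # fails the E-value cutoff
]
mouse_hits = [
    Hit("E2F", "MA0024", -190, -179, "+", 4.4, 2.0),
    Hit("Sp1", "hypothetical-1", -60, -51, "+", 3.5, 1.0),
]

kept = {
    "human": filter_hits(human_hits),
    "mouse": filter_hits(mouse_hits),
}
shared = conserved_factors(kept)
```

With this toy input, only the first human hit passes both cutoffs, so E2F is the only factor shared between the two species. Filtering before intersecting mirrors the order implied by the text, where the stringent parameters are applied to the query and conservation is then reported on the retrieved hits.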

