MAPPER: a search engine for the computational identification of putative transcription factor binding sites in multiple genomes.

Marinescu VD, Kohane IS, Riva A - BMC Bioinformatics (2005)

Bottom Line: In several test cases the method correctly identified experimentally characterized sites, with better specificity and sensitivity than other similar computational methods. Moreover, a large-scale comparison using synthetic data showed that in the majority of cases our method performed significantly better than a nucleotide weight matrix-based method. Due to its extensive database of models, powerful search engine and flexible interface, MAPPER represents an effective resource for the large-scale computational analysis of transcriptional regulation.


Affiliation: Children's Hospital Boston, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA. vdmarinescu@chip.org

ABSTRACT

Background: Cis-regulatory modules are combinations of regulatory elements occurring in close proximity to each other that control the spatial and temporal expression of genes. The ability to identify them in a genome-wide manner depends on the availability of accurate models and of search methods able to detect putative regulatory elements with enhanced sensitivity and specificity.

Results: We describe the implementation of a search method for putative transcription factor binding sites (TFBSs) based on hidden Markov models (HMMs) built from alignments of known sites. We built 1,079 models of TFBSs using experimentally determined sequence alignments of sites provided by the TRANSFAC and JASPAR databases and used them to scan sequences of the human, mouse, fly, worm and yeast genomes. In several test cases the method correctly identified experimentally characterized sites, with better specificity and sensitivity than other similar computational methods. Moreover, a large-scale comparison using synthetic data showed that in the majority of cases our method performed significantly better than a nucleotide weight matrix-based method.

Conclusion: The search engine, available at http://mapper.chip.org, allows the identification, visualization and selection of putative TFBSs occurring in the promoter or other regions of a gene from the human, mouse, fly, worm and yeast genomes. In addition, it allows the user to upload a query sequence and to build a model by supplying a multiple sequence alignment of binding sites for a transcription factor of interest. Due to its extensive database of models, powerful search engine and flexible interface, MAPPER represents an effective resource for the large-scale computational analysis of transcriptional regulation.
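
For orientation, the nucleotide weight matrix approach used as the comparison baseline in the Results can be sketched as follows. This is a minimal illustration of a generic log-odds weight matrix built from ungapped aligned sites, not MAPPER's HMM-based method and not the specific matrix program used in the paper. The function names (`build_pwm`, `scan`), the pseudocount, the uniform background, the threshold and the toy sites are all assumptions made for the example.

```python
import math

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix from equal-length, ungapped aligned sites.

    pseudocount and the uniform background are illustrative assumptions.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all aligned sites must have the same length")
    pwm = []
    for pos in range(length):
        column = [s[pos].upper() for s in sites]
        total = len(column) + 4 * pseudocount
        # One log2 odds score per nucleotide at this alignment column.
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm

def scan(sequence, pwm, threshold):
    """Return (offset, window, score) for every window scoring >= threshold."""
    sequence = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous nucleotides
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, round(score, 2)))
    return hits

# Toy alignment of hypothetical binding sites (not taken from TRANSFAC or JASPAR).
toy_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
pwm = build_pwm(toy_sites)
hits = scan("GGCTGACTCAGGA", pwm, threshold=5.0)
```

A weight matrix scores each position independently and cannot represent insertions or deletions within a site. A profile HMM built from the same alignment can model them, which is one commonly cited reason to prefer HMMs for variable-length sites.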


Figure 2: The selection page of the search engine. The selection page for the MCM5 gene displays detailed information on the gene and its homologs available in our database, and allows the user to select the gene region to be scanned. The same region will be scanned for all homologs included in the search.

Mentions: We designed and implemented a web-based application, called MAPPER (Multi-genome Analysis of Positions and Patterns of Elements of Regulation), to facilitate the retrieval of putative TFBSs in a given sequence based on the library of 1,079 HMMs described above. The interface takes as input a gene identifier (e.g. NCBI Gene ID, RNA accession number) or a user-supplied sequence in FASTA format. The user then selects the models to be used (all, TRANSFAC only or JASPAR only) and has the option to build a custom model starting from a multiple sequence alignment of binding sites in FASTA format. The search can be performed on the entire gene region flanked by a user-specified distance upstream and downstream, on a specified gene region (promoter, introns, exons, 3'-UTR) or within a certain distance upstream of the ATG or the start of the transcript (Figure 2). If a gene identifier is provided, the program will also display the actual nucleotide sequence scanned (in FASTA and GenBank format), a useful option for genes whose annotations differ between sources.
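
The region selection described above depends on strand: "upstream" of a transcript start means lower coordinates on the forward strand and higher coordinates on the reverse strand. The sketch below shows one way such a scan interval could be computed. The function name `scan_window`, the 1-based inclusive coordinate convention and the parameter names are hypothetical and are not taken from MAPPER.

```python
def scan_window(start_site, strand, upstream, downstream=0):
    """Return a 1-based inclusive (start, end) genomic interval to scan.

    start_site : coordinate of the transcript start (or ATG) on the chromosome
    strand     : "+" or "-"
    upstream   : bases to include upstream of start_site (relative to the gene)
    downstream : bases to include downstream of start_site (relative to the gene)

    All names and the coordinate convention are assumptions for illustration.
    """
    if upstream < 0 or downstream < 0:
        raise ValueError("distances must be non-negative")
    if strand == "+":
        start, end = start_site - upstream, start_site + downstream
    elif strand == "-":
        # On the reverse strand, upstream of the gene lies at higher coordinates.
        start, end = start_site - downstream, start_site + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clamp to the first base of the chromosome.
    return max(1, start), end

forward = scan_window(1000, "+", upstream=500, downstream=100)
reverse = scan_window(1000, "-", upstream=500, downstream=100)
clamped = scan_window(200, "+", upstream=500)
```

Because the text states that the same region is scanned for all homologs included in the search, a helper like this would be applied once per homolog with that gene's own start coordinate and strand.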

