De novo prediction of cis-regulatory elements and modules through integrative analysis of a large number of ChIP datasets.

Niu M, Tabari ES, Su Z - BMC Genomics (2014)

Bottom Line: DePCRM predicts CREs and CRMs by identifying overrepresented combinatorial CRE motif patterns in multiple ChIP datasets in an effective way. We found that the putative CRMs as well as CREs as a whole in a CRM are more conserved than randomly selected sequences. Our results suggest that the CRMs predicted by DePCRM are highly likely to be functional.

Affiliation: Department of Bioinformatics and Genomics, College of Computing and Informatics, The University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC 28223, USA. zcsu@uncc.edu.

ABSTRACT

Background: In eukaryotes, transcriptional regulation is usually mediated by interactions of multiple transcription factors (TFs) with their respective cis-regulatory elements (CREs) within cis-regulatory modules (CRMs) in DNA. Although knowledge of the CREs and CRMs in a genome is crucial for elucidating gene regulatory networks and understanding many important biological phenomena, little is known about the CREs and CRMs in most eukaryotic genomes because they are difficult to characterize by either computational or traditional experimental methods. However, the exponentially increasing volume of TF binding location data produced by the recent wide adoption of chromatin immunoprecipitation coupled with microarray hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) has provided an unprecedented opportunity to identify CRMs and CREs in genomes. Nonetheless, effectively mining these large volumes of ChIP data to identify CREs and CRMs at nucleotide resolution remains highly challenging.

Results: We have developed a novel graph-theoretic algorithm, DePCRM, for genome-wide de novo prediction of CREs and CRMs using a large number of ChIP datasets. DePCRM predicts CREs and CRMs by identifying overrepresented combinatorial CRE motif patterns across multiple ChIP datasets. When applied to 168 ChIP datasets of 56 TFs from D. melanogaster, DePCRM identified 184 overrepresented CRE motifs and 746 combinatorial patterns of these motifs, and predicted a total of 115,932 CRMs in the genome. The predictions recover 77.9% of known CRMs in the datasets and 89.3% of known CRMs containing at least one predicted CRE. We found that the putative CRMs, as well as the CREs as a whole in a CRM, are more conserved than randomly selected sequences.

Conclusion: Our results suggest that the CRMs predicted by DePCRM are highly likely to be functional. Our algorithm is the first of its kind for de novo genome-wide prediction of CREs and CRMs using a large number of transcription factor ChIP datasets. The algorithm and predictions will hopefully facilitate the elucidation of gene regulatory networks in eukaryotes. All the predicted CREs, CRMs, and their target genes are available at http://bioinfo.uncc.edu/mniu/pcrms/www/.

Figure 1: A schematic view of our hypothesis. If a binding peak is shorter than 3,000 bp, we extend it equally from both ends to a length of up to 3,000 bp. We assume that, in addition to the CREs of the ChIP-ed TF (red circle), CREs of different cooperative TFs (the other shapes) are also enriched in the neighborhoods of at least some subsets of the binding peak dataset. Each line represents an extended binding peak sequence.

As TFs in eukaryotes tend to work together by binding to their CREs in CRMs with a typical size of 500 to 3,000 bp [52], we assume that, although a ChIP experiment is mainly aimed at identifying the binding locations of the ChIP-ed TF, extending shorter binding peaks toward both ends to reach the typical size of CRMs (e.g., 3,000 bp) makes the extended peaks more likely to contain the CREs of different cooperative TFs (TFs that co-act in a CRM) in addition to the CREs of the ChIP-ed TF, as illustrated in Figure 1. In other words, if two different TFs (e.g., the red circle and black circle TFs in Figure 1) cooperatively regulate the same regulons in certain cell types by binding to their respective CREs in CRMs, then their extended ChIP binding peaks from these cell types should overlap with one another to some extent. Hence, if we have a sufficient number of ChIP datasets for different TFs from the same and/or different cell types, the datasets are likely to include overlapping binding peaks for cooperative TFs. Accordingly, our algorithm predicts CRMs by identifying overrepresented co-occurring putative motif patterns in a large number of ChIP datasets, ideally for different TFs in different cell types and developmental stages.
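The peak-extension idea described above can be illustrated with a minimal Python sketch. It is not the DePCRM implementation: the `Peak` type, the function names, the 0-based half-open coordinate convention, and the clamping at the chromosome start are all assumptions made for illustration. Only the 3,000 bp target length and the equal extension from both ends come from the text.

```python
from typing import NamedTuple

TARGET_LEN = 3000  # upper end of the typical CRM size quoted in the text


class Peak(NamedTuple):
    """A binding peak; 0-based, half-open coordinates (an assumption)."""
    chrom: str
    start: int
    end: int


def extend_peak(peak: Peak, target_len: int = TARGET_LEN) -> Peak:
    """Pad a short peak equally on both sides up to target_len bp.

    Peaks already at least target_len long are returned unchanged.
    Chromosome ends are not handled because no chromosome lengths
    are supplied in this sketch.
    """
    length = peak.end - peak.start
    if length >= target_len:
        return peak
    pad = target_len - length
    left = pad // 2
    right = pad - left  # takes the extra base when pad is odd
    start = peak.start - left
    end = peak.end + right
    if start < 0:
        # Hypothetical policy: clamp at position 0 and shift the
        # missing bases to the right so the length stays target_len.
        end += -start
        start = 0
    return Peak(peak.chrom, start, end)


def overlap_bp(a: Peak, b: Peak) -> int:
    """Number of bases shared by two peaks (0 if on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


# Two made-up peaks for different TFs that do not overlap as called
# but do overlap once both are extended to 3,000 bp.
tf_a_peak = Peak("chr2L", 10_000, 10_400)
tf_b_peak = Peak("chr2L", 12_000, 12_200)

print(overlap_bp(tf_a_peak, tf_b_peak))                            # 0
print(overlap_bp(extend_peak(tf_a_peak), extend_peak(tf_b_peak)))  # 1100
```

In this toy case the two raw peaks are 1,600 bp apart, yet their extended versions share 1,100 bp, which is the kind of overlap the hypothesis relies on to surface CREs of cooperative TFs.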

