Identification of tissue-specific cis-regulatory modules based on interactions between transcription factors.

Yu X, Lin J, Zack DJ, Qian J - BMC Bioinformatics (2007)

Bottom Line: Motivated by increasing evidence that some DNA regulatory regions are not evolutionarily conserved, we have developed an approach for cis-regulatory region identification that does not rely upon evolutionary sequence conservation. Known regulatory regions are highly enriched in our predicted CRMs. In addition, DNase I hypersensitive sites, which tend to be associated with active regulatory regions, significantly overlap with the predicted CRMs, but not with more conserved regions. These results demonstrate that identifying relevant sets of binding motifs can help in the mapping of DNA regulatory regions, and suggest that non-conserved CRMs play an important role in gene regulation.

Affiliation: Wilmer Institute, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA. xyu15@jhmi.edu

ABSTRACT

Background: Evolutionary conservation has been used successfully to help identify cis-acting DNA regions that are important in regulating tissue-specific gene expression. Motivated by increasing evidence that some DNA regulatory regions are not evolutionarily conserved, we have developed an approach for cis-regulatory region identification that does not rely upon evolutionary sequence conservation.

Results: The conservation-independent approach is based on an empirical potential energy between interacting transcription factors (TFs). In this analysis, the potential energy is defined as a function of the number of TF interactions in a genomic region and the strength of the interactions. By identifying sets of interacting TFs, the analysis locates regions enriched with the binding sites of these interacting TFs. We applied this approach to 30 human tissues and identified 6232 putative cis-regulatory modules (CRMs) regulating 2130 tissue-specific genes. Interestingly, some genes appear to be regulated by different CRMs in different tissues. Known regulatory regions are highly enriched in our predicted CRMs. In addition, DNase I hypersensitive sites, which tend to be associated with active regulatory regions, significantly overlap with the predicted CRMs, but not with more conserved regions. We also find that conserved and non-conserved CRMs regulate distinct gene groups. Conserved CRMs control more essential genes and genes involved in fundamental cellular activities such as transcription. In contrast, non-conserved CRMs, in general, regulate more non-essential genes, such as genes related to neural activity.

Conclusion: These results demonstrate that identifying relevant sets of binding motifs can help in the mapping of DNA regulatory regions, and suggest that non-conserved CRMs play an important role in gene regulation.

Figure 1: Schematic of the module detection method based on TF interactions. Based on gene expression profiles across different tissues, we identified groups of genes that are preferentially expressed in particular tissues (e.g., genes C and D in the schematic). For each group of genes, we searched promoter regions for the binding sites of known TFs and determined the TF pairs whose binding sites tend to co-occur in close proximity. A tissue-specific TF interaction network was obtained from this analysis. We then scanned the genomic regions and identified cis-regulatory modules (CRMs), defined as regions enriched with TF interactions. Note that the first steps were implemented in our previous work [22], while this paper focuses on the last step.
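The pairing step in the caption (finding TF pairs whose binding sites co-occur in close proximity more often than expected) can be illustrated with a small sketch. This is not the statistic used in [22], which is not given here; a one-sided hypergeometric test is swapped in as a stand-in, and the names `close_pairs`, `hypergeom_tail` and the `max_dist` cutoff are hypothetical.

```python
from math import comb
from typing import List, Tuple

Site = Tuple[int, str]  # (position in promoter, TF name); layout is an assumption


def close_pairs(promoter_sites: List[List[Site]], tf_a: str, tf_b: str,
                max_dist: int = 50) -> Tuple[int, int, int]:
    """Count promoters with a tf_a site, with a tf_b site, and with
    at least one tf_a site within max_dist bp of a tf_b site."""
    n_a = n_b = n_ab = 0
    for sites in promoter_sites:
        pos_a = [p for p, tf in sites if tf == tf_a]
        pos_b = [p for p, tf in sites if tf == tf_b]
        n_a += bool(pos_a)
        n_b += bool(pos_b)
        n_ab += any(abs(a - b) <= max_dist for a in pos_a for b in pos_b)
    return n_a, n_b, n_ab


def hypergeom_tail(n_total: int, n_a: int, n_b: int, n_ab: int) -> float:
    """P(X >= n_ab) when n_b promoters are drawn at random from n_total,
    of which n_a carry a tf_a site (stand-in null model, an assumption)."""
    upper = min(n_a, n_b)
    hits = sum(comb(n_a, k) * comb(n_total - n_a, n_b - k)
               for k in range(n_ab, upper + 1))
    return hits / comb(n_total, n_b)
```

Because the proximity count can never exceed the plain co-occurrence count, this stand-in test is conservative; a small tail probability would flag the pair as a candidate interaction.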

Mentions: We utilized computationally predicted tissue-specific TF interactions to identify CRMs. In our previous work we identified 9060 putative tissue-specific TF interactions [22]. Two TFs were predicted as interacting if the relative positions and co-occurrence of their binding sites in promoters differed significantly from random expectation (Figure 1; Additional files 1 and 2). Since identifying the CRMs harboring these interactions in each individual promoter is not trivial, we developed an algorithm, CRM-PI, to detect CRMs by calculating an empirical "potential energy" between interacting TFBSs along the genomic sequence. A promoter region containing many interacting TFBSs will have low "potential energy" (see Methods). CRM-PI obtains an energy landscape along the regulatory regions and searches for regions with low "potential energy". For those locations at which the energy is below a given threshold, the region around the minimum is defined as a CRM.
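The energy-landscape idea described above can be sketched in a few lines. The actual CRM-PI energy function is defined in the paper's Methods and is not reproduced here; this sketch assumes the simplest form consistent with the description, namely that each interacting TFBS pair inside a sliding window lowers the energy by its interaction strength. The window size, step, threshold, and the names `energy_landscape` and `call_crms` are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Site = Tuple[int, str]  # (genomic position, TF name); layout is an assumption


def energy_landscape(sites: List[Site],
                     strengths: Dict[FrozenSet[str], float],
                     seq_len: int, window: int = 200,
                     step: int = 50) -> List[Tuple[int, float]]:
    """Slide a window along the sequence; each interacting TF pair found
    inside the window subtracts its strength from the window's energy."""
    landscape = []
    for start in range(0, max(seq_len - window, 0) + 1, step):
        inside = [s for s in sites if start <= s[0] < start + window]
        energy = 0.0
        for i in range(len(inside)):
            for j in range(i + 1, len(inside)):
                pair = frozenset((inside[i][1], inside[j][1]))
                energy -= strengths.get(pair, 0.0)  # non-interacting pairs add 0
        landscape.append((start, energy))
    return landscape


def call_crms(landscape: List[Tuple[int, float]], threshold: float,
              window: int = 200) -> List[Tuple[int, int]]:
    """Report the window at each local energy minimum below the threshold.
    On a flat minimum, only the last window of the plateau is reported."""
    crms = []
    for k, (start, energy) in enumerate(landscape):
        if energy >= threshold:
            continue
        left = landscape[k - 1][1] if k > 0 else float("inf")
        right = landscape[k + 1][1] if k + 1 < len(landscape) else float("inf")
        if energy <= left and energy < right:
            crms.append((start, start + window))
    return crms
```

With three clustered sites for TFs A, B and C, and known A-B and B-C interactions, the windows covering the cluster get the lowest energy and one of them is reported as a CRM; an isolated site elsewhere contributes nothing.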