iRegulon: from a gene list to a gene regulatory network using large motif and track collections.

Janky R, Verfaillie A, Imrichová H, Van de Sande B, Standaert L, Christiaens V, Hulselmans G, Herten K, Naval Sanchez M, Potier D, Svetlichnyy D, Kalender Atak Z, Fiers M, Marine JC, Aerts S - PLoS Comput. Biol. (2014)

Bottom Line: Identifying master regulators of biological processes and mapping their downstream gene networks are key challenges in systems biology. Next, we use iRegulon on more challenging types of gene lists, including microRNA target sets, protein-protein interaction networks, and genetic perturbation data. Similarly, we map a repressive network with no indication of direct p53 regulation but rather an indirect effect via E2F and NFY.


Affiliation: Laboratory of Computational Biology, KU Leuven Center for Human Genetics, Leuven, Belgium.

ABSTRACT
Identifying master regulators of biological processes and mapping their downstream gene networks are key challenges in systems biology. We developed a computational method, called iRegulon, to reverse-engineer the transcriptional regulatory network underlying a co-expressed gene set using cis-regulatory sequence analysis. iRegulon implements a genome-wide ranking-and-recovery approach to detect enriched transcription factor motifs and their optimal sets of direct targets. We increase the accuracy of network inference by using very large motif collections of up to ten thousand position weight matrices collected from various species, and linking these to candidate human TFs via a motif2TF procedure. We validate iRegulon on gene sets derived from ENCODE ChIP-seq data with increasing levels of noise, and we compare iRegulon with existing motif discovery methods. Next, we use iRegulon on more challenging types of gene lists, including microRNA target sets, protein-protein interaction networks, and genetic perturbation data. In particular, we over-activate p53 in breast cancer cells, followed by RNA-seq and ChIP-seq, and could identify an extensive up-regulated network controlled directly by p53. Similarly we map a repressive network with no indication of direct p53 regulation but rather an indirect effect via E2F and NFY. Finally, we generalize our computational framework to include regulatory tracks such as ChIP-seq data and show how motif and track discovery can be combined to map functional regulatory interactions among co-expressed genes. iRegulon is available as a Cytoscape plugin from http://iregulon.aertslab.org.


Figure 6. Combined analysis using 10K motifs and 1K ChIP-Seq tracks. A. Two ranking databases were made using 9713 motifs and 1118 ChIP-Seq tracks. The ChIP-Seq tracks consisted of all ENCODE and Taipale ChIP-Seq data against TFs, and the p53 ChIP-Seq track generated in this study. B. AUC distributions for ChIP-Seq and motif rankings, using the p53 signature as input. C. The actual recovery curve for the p53 motif and track. The shaded area indicates the AUC. D. Top enriched ChIP tracks and motifs on the up- and down-regulated gene sets (NES > 3, except for the RFX5 motif, which was detected with NES = 2.82 (b)). (a) Predicted targets are shown for both enriched tracks and motifs, respectively. E. Functional categories found enriched for predicted co-factors of p53. The annotation of p53-shared targets is shown in the inner circle, while the annotation of non-shared targets (for example, AP-1 targets but not p53) is shown on the outer circle. The co-factors shown here are those found by both motif and track enrichment (see also Table S7).

Mentions: We extended our motif discovery approach to allow the discovery of significantly enriched ChIP-Seq tracks in a set of co-expressed genes. We created a database with track-based gene rankings from a collection of 1118 ChIP-Seq experiments against 246 human sequence-specific TFs across 40 cell types and applied the same “ranking-and-recovery” enrichment calculation as employed earlier (see Materials and Methods). These and other recent resources further enlarged our motif collection to 9713 distinct PWMs (“10K collection”) (Table 1). To test whether motif and track discovery can be performed simultaneously, we combined the motif-based rankings and the track-based rankings into one enrichment analysis, although each AUC score distribution is kept separate for normalization (Fig. 6A–B). Applied to the set of 801 p53-dependent up-regulated genes, the combined approach still detects p53, AP-1, NFY, and FOX in the top motifs. For both p53 and AP-1, track discovery finds enriched ChIP-Seq tracks: our in-house p53 ChIP-Seq in MCF-7 after Nutlin-3a (ranked first of all tracks, NES = 5.18) and the FOSL2 ChIP-Seq tracks in MCF-7 from ENCODE (NES = 3.30) (Fig. 6C–D, Table S7). In addition, we found five more candidate TFs with a putative role in the network downstream of p53 that were not detectable using the 6K motif collection only (Fig. 3). Three of these additional candidates, namely RFX5, NR2F2, and NFI, have both their ChIP-Seq track and motif enriched, while two more candidates, namely p300 and TCF12, only show track enrichment (Fig. 6D). To our knowledge, no interaction of these TFs with p53 has been reported in the literature. Although the targetomes of the co-factors overlap to some extent (20–42%) with p53 targets, they have a considerably large set of target genes independent of p53.
Hence, with these additional TFs added downstream of p53, we can explain an additional fraction of the up-regulated gene set, with all the ChIP-Seq track-derived interactions together regulating 542 of the 801 genes. RFX5 is of particular interest since the gene itself is strongly up-regulated by p53 and is in fact among the core set of 801 up-regulated genes (log2FC = 1.9 and adjusted p-value = 1.05E-15). RFX5 is mainly known as a regulator of MHC-II genes, and indeed, among the top predicted RFX5 target genes downstream of p53 we find HLA-F, MR1, and other genes involved in antigen and interferon-related processes. Interestingly, RFX5 has recently also been shown to act as a DNA mismatch repair stimulatory factor [72], and several p53-shared RFX5 targets, such as DDB2 and BBC3, are in fact related to DNA damage response (adjusted p-value = 6.99E-5, WikiPathways ID: WP707) (Fig. 6E). Hence, RFX5 can be considered a new candidate co-factor that modulates certain aspects of the p53-regulated response, and may explain why MHC-II genes are up-regulated in a p53-dependent manner. This proof-of-principle of combined motif and track enrichment paves the way towards further integration of regulatory track data and enhancer prediction data to map gene regulatory networks.
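The combined analysis described above can be illustrated with a small sketch. It is not the iRegulon implementation: the function names, the toy rankings, and the rank threshold are hypothetical, and the exact AUC scaling is an assumption. It only shows the general idea of scoring each motif or track by the area under the recovery curve of the input genes, then converting AUCs to normalized enrichment scores (NES) within each collection separately before merging the results.

```python
from statistics import mean, pstdev


def recovery_auc(ranking, gene_set, rank_threshold):
    """Area under the recovery curve over the top ranks, scaled to [0, 1].

    `ranking` is a list of genes, best-scoring first, for one motif or track.
    The scaling by the best achievable area is an assumption of this sketch.
    """
    top = ranking[:rank_threshold]
    hits = 0
    area = 0
    for gene in top:
        if gene in gene_set:
            hits += 1
        area += hits  # height of the recovery curve at this rank
    max_area = sum(min(k, len(gene_set)) for k in range(1, len(top) + 1))
    return area / max_area if max_area else 0.0


def nes_scores(aucs):
    """Z-score each AUC against the AUC distribution of its own collection."""
    values = list(aucs.values())
    mu = mean(values)
    sd = pstdev(values)
    return {name: ((auc - mu) / sd if sd else 0.0) for name, auc in aucs.items()}


def combined_enrichment(motif_rankings, track_rankings, gene_set, rank_threshold):
    """Score motifs and tracks together, normalizing each collection separately."""
    results = []
    for kind, rankings in (("motif", motif_rankings), ("track", track_rankings)):
        aucs = {
            name: recovery_auc(ranking, gene_set, rank_threshold)
            for name, ranking in rankings.items()
        }
        for name, nes in nes_scores(aucs).items():
            results.append((name, kind, nes))
    # Highest NES first, regardless of whether it came from a motif or a track.
    return sorted(results, key=lambda item: item[2], reverse=True)


# Toy data (hypothetical): eight genes, two of which form the input gene set.
toy_gene_set = {"a", "b"}
toy_motifs = {
    "m1": list("abcdefgh"),
    "m2": list("cdefabgh"),
    "m3": list("cadbefgh"),
}
toy_tracks = {
    "t1": list("acbdefgh"),
    "t2": list("hgfedcba"),
    "t3": list("efgabcdh"),
}
toy_result = combined_enrichment(toy_motifs, toy_tracks, toy_gene_set, rank_threshold=4)
```

Normalizing per collection matters because motif-based and track-based AUC distributions can differ in shape; a z-score taken over the pooled AUCs would let one collection dominate the merged list.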

