iRegulon: from a gene list to a gene regulatory network using large motif and track collections.

Janky R, Verfaillie A, Imrichová H, Van de Sande B, Standaert L, Christiaens V, Hulselmans G, Herten K, Naval Sanchez M, Potier D, Svetlichnyy D, Kalender Atak Z, Fiers M, Marine JC, Aerts S - PLoS Comput. Biol. (2014)

Bottom Line: Identifying master regulators of biological processes and mapping their downstream gene networks are key challenges in systems biology. Next, we use iRegulon on more challenging types of gene lists, including microRNA target sets, protein-protein interaction networks, and genetic perturbation data. Similarly, we map a repressive network with no indication of direct p53 regulation but rather an indirect effect via E2F and NFY.

Affiliation: Laboratory of Computational Biology, KU Leuven Center for Human Genetics, Leuven, Belgium.

ABSTRACT
Identifying master regulators of biological processes and mapping their downstream gene networks are key challenges in systems biology. We developed a computational method, called iRegulon, to reverse-engineer the transcriptional regulatory network underlying a co-expressed gene set using cis-regulatory sequence analysis. iRegulon implements a genome-wide ranking-and-recovery approach to detect enriched transcription factor motifs and their optimal sets of direct targets. We increase the accuracy of network inference by using very large motif collections of up to ten thousand position weight matrices collected from various species, and linking these to candidate human TFs via a motif2TF procedure. We validate iRegulon on gene sets derived from ENCODE ChIP-seq data with increasing levels of noise, and we compare iRegulon with existing motif discovery methods. Next, we use iRegulon on more challenging types of gene lists, including microRNA target sets, protein-protein interaction networks, and genetic perturbation data. In particular, we over-activate p53 in breast cancer cells, followed by RNA-seq and ChIP-seq, and could identify an extensive up-regulated network controlled directly by p53. Similarly we map a repressive network with no indication of direct p53 regulation but rather an indirect effect via E2F and NFY. Finally, we generalize our computational framework to include regulatory tracks such as ChIP-seq data and show how motif and track discovery can be combined to map functional regulatory interactions among co-expressed genes. iRegulon is available as a Cytoscape plugin from http://iregulon.aertslab.org.
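
The ranking-and-recovery idea in the abstract can be illustrated with a toy sketch. It assumes that each motif provides a genome-wide gene ranking, that enrichment is scored as the area under the recovery curve of the input gene set within the top of that ranking, and that scores are normalised as z-scores across motifs. The function names, the `top_n` cutoff, and the toy rankings are illustrative assumptions and are not taken from the text above.

```python
from statistics import mean, pstdev


def recovery_auc(ranking, gene_set, top_n):
    """Area under the recovery curve of gene_set in the first top_n ranked genes.

    The area is divided by top_n * len(gene_set), so values fall in [0, 1].
    """
    targets = set(gene_set)
    if not targets or top_n <= 0:
        raise ValueError("gene_set must be non-empty and top_n positive")
    recovered = 0
    area = 0
    for gene in ranking[:top_n]:
        if gene in targets:
            recovered += 1
        area += recovered  # height of the recovery curve at this rank
    return area / (top_n * len(targets))


def normalized_enrichment_scores(rankings, gene_set, top_n):
    """Z-score of each motif's AUC against all motifs (an assumed normalisation)."""
    aucs = {motif: recovery_auc(r, gene_set, top_n) for motif, r in rankings.items()}
    mu = mean(aucs.values())
    sigma = pstdev(aucs.values())
    if sigma == 0:
        return {motif: 0.0 for motif in aucs}
    return {motif: (auc - mu) / sigma for motif, auc in aucs.items()}


# Toy data: ten hypothetical genes and three hypothetical motif rankings.
genes = [f"g{i}" for i in range(1, 11)]
rankings = {
    "motif_A": genes,        # input genes ranked first
    "motif_B": genes[::-1],  # input genes ranked last
    "motif_C": ["g4", "g1", "g5", "g6", "g2", "g3", "g7", "g8", "g9", "g10"],
}
input_set = {"g1", "g2", "g3"}
nes = normalized_enrichment_scores(rankings, input_set, top_n=5)
best_motif = max(nes, key=nes.get)
```

In this toy setting the motif whose ranking places the input genes at the top receives the highest score; a real analysis would use thousands of motifs and a genome-wide ranking per motif.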

Figure 4: Validation of the p53 regulon by ChIP-Seq.
A. Integrative Genomics Viewer (IGV) [131] screenshot for CDKN1A, a known p53 target gene, showing up-regulation by RNA-seq (red arrowhead) and ChIP peaks in the upstream region (green and blue arrowheads). IGV is free software under the GNU Lesser General Public License, version 2.1 (LGPL-2.1).
B. Gene Set Enrichment Analysis, with all genes in the genome ranked along the x-axis according to their maximum ChIP-Seq peak (20 kb around the TSS). The predicted p53 targets (green curve) show higher enrichment than the total set of up-regulated genes (blue curve), approaching the previously known curated targets (red curve), while the non-predicted p53 targets (magenta curve) and the set of down-regulated genes (cyan curve) show no enrichment. The initial two steps in the magenta curve represent two false negative predictions of iRegulon (they fall just below the optimal cutoff), namely PLK3 and DDB2, which are up-regulated and have a ChIP peak. P-values in the legend are calculated with the hypergeometric formula on the leading edge determined by GSEA.
C. Comparison between annotated up-regulated p53 targets and p53 targets predicted by iRegulon and ChIP-Seq, indicating the number of previously known p53 targets. See also Figure S6.

Mentions: To test the predicted p53 regulon we determined the genome-wide chromatin occupancy by p53 in Nutlin-3a stimulated MCF-7 cells using high-coverage ChIP-Seq (∼30 million uniquely mapped reads). Fig. 4A shows the raw ChIP-Seq data for the known p53 target CDKN1A, with a very strong peak overlapping the known p53 binding site in the promoter of CDKN1A [69]. To avoid arbitrary thresholds on peak calling, we used lenient peak calling settings to rank all genes in the genome according to their likelihood of being a p53 target based on ChIP peaks only (see Materials and Methods). To assess whether this ranking places true p53 targets at the top, we curated 223 bona fide p53 targets from the literature and public databases (Table S5), and indeed found these targets to be significantly enriched at the top of this ranking (Fig. 4B, p-value = 1.40E-24). Within the same ranking, the 307 p53 targets predicted by iRegulon are nearly as significantly enriched at the top as the curated targets (p-value = 2.60E-24), while the 494 remaining up-regulated genes are not significantly correlated with the ChIP peak data (p-value = 0.096). Importantly, this result shows that iRegulon is not only able to identify the master regulator, but is also able to correctly distinguish between direct and indirect targets within a set of co-expressed genes. Only two up-regulated genes with a high ChIP peak, namely PLK3 and DDB2, were missed by iRegulon. About 100 up-regulated genes have a small ChIP peak but were not predicted by iRegulon as target genes. These are likely false positive ChIP peaks because they do not show p53 motif enrichment when analyzed separately (Fig. S6A–C). Finally, to compare how many targets are missed by iRegulon and how many by ChIP-Seq, we again used the set of curated targets and found comparable numbers of false negatives, namely six for iRegulon and five for ChIP-Seq (Fig. 4C).
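
The figure legend states that the enrichment p-values are computed with the hypergeometric formula on the GSEA leading edge. A minimal sketch of such an upper-tail hypergeometric test is given here, assuming the leading edge has already been determined; the function name, argument names, and toy numbers are illustrative and do not reproduce the paper's values.

```python
from math import comb


def hypergeom_pvalue(population, successes, draws, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws).

    population: all ranked genes; successes: genes in the tested set;
    draws: genes in the top of the ranking (e.g. up to the leading edge);
    observed: tested genes found among the draws.
    """
    lower = max(observed, 0)
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(lower, upper + 1)
    )
    return tail / total


# Toy numbers (assumptions): 10 genes, 3 in the set, all 3 within the top 5.
p_toy = hypergeom_pvalue(population=10, successes=3, draws=5, observed=3)
```

Exact integer arithmetic via `math.comb` avoids floating-point error in the binomial terms, at the cost of speed for very large gene counts.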
In the previous section we had also found that gene repression downstream of p53 is indirect through E2F, which has been shown recently to be mediated by p21 and RB [63], [68]. If this is true, then the down-regulated genes should not contain p53 ChIP peaks. To test this, we plotted the recovery of the 790 down-regulated genes along the p53 ChIP-peak-based gene ranking generated above (Fig. 4B). Like the indirectly up-regulated genes, the down-regulated genes are completely depleted of p53 ChIP peaks (p-value = 1.0). On the other hand, the down-regulated genes are positively correlated with E2F1 ChIP-Seq data in MCF-7 from ENCODE (Fig. S6D). When combining all the small p53 ChIP-Seq peaks that are detected amongst the down-regulated genes, the p53 motif is not found by de novo motif discovery, while the ChIP peaks of direct up-regulated targets are strongly enriched for de novo p53 motifs (Fig. S6A–C). From the ChIP-Seq validation data, we conclude that iRegulon predicts the correct master regulators (p53 and E2F) and that the predicted target genes of these TFs significantly overlap with ChIP-Seq-derived targets. By combining iRegulon and ChIP-Seq data, we propose a set of 110 “top targets” of p53 in MCF-7 that are directly and positively regulated. When further comparing these predicted targets to recent reports of several p53 targetomes based on combining gene expression profiles with p53 ChIP-Seq data under different experimental conditions [58], [59], [68], we could confirm many common targets, but also uncovered 56 new direct p53 target genes with our analysis (Table S6).
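
The ChIP-peak-based gene ranking used in both paragraphs can be sketched as follows. This is a simplified illustration, assuming each gene has a single TSS and each peak a summit position and score, and reading "20 kb around TSS" as 10 kb on either side (one possible interpretation). The function name, the `flank` parameter, and all gene names and coordinates are hypothetical.

```python
def rank_genes_by_peak(tss, peaks, flank=10_000):
    """Rank genes by their highest-scoring ChIP peak near the TSS.

    tss: dict mapping gene -> (chromosome, TSS position).
    peaks: list of (chromosome, summit position, score).
    flank: distance in bp searched on each side of the TSS (assumed value).
    Genes without a peak in the window get score 0 and end up at the bottom.
    """
    best = {gene: 0.0 for gene in tss}
    for gene, (chrom, pos) in tss.items():
        for peak_chrom, summit, score in peaks:
            if peak_chrom == chrom and abs(summit - pos) <= flank:
                best[gene] = max(best[gene], score)
    return sorted(best, key=lambda gene: best[gene], reverse=True)


# Hypothetical genes, coordinates, and peak scores for illustration only.
toy_tss = {
    "GENE_A": ("chr6", 36_000_000),
    "GENE_B": ("chr6", 40_000_000),
    "GENE_C": ("chr1", 1_000_000),
}
toy_peaks = [
    ("chr6", 36_001_000, 250.0),  # strong peak 1 kb from GENE_A
    ("chr1", 1_005_000, 12.0),    # weak peak 5 kb from GENE_C
    ("chr6", 50_000_000, 99.0),   # far from every TSS, ignored
]
toy_ranking = rank_genes_by_peak(toy_tss, toy_peaks)
```

Ranking every gene rather than applying a peak-calling cutoff mirrors the stated goal of avoiding arbitrary thresholds; the resulting list can then be used to plot the recovery of any gene set along it.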

