Identification of active transcriptional regulatory elements from GRO-seq data.

Danko CG, Hyland SL, Core LJ, Martins AL, Waters CT, Lee HW, Cheung VG, Kraus WL, Lis JT, Siepel A - Nat. Methods (2015)

Bottom Line: Modifications to the global run-on and sequencing (GRO-seq) protocol that enrich for 5'-capped RNAs can be used to reveal active transcriptional regulatory elements (TREs) with high accuracy. Predicted TREs are more enriched for several marks of transcriptional activation, including expression quantitative trait loci, disease-associated polymorphisms, acetylated histone 3 lysine 27 (H3K27ac) and transcription factor binding, than those identified by alternative functional assays. Using dREG, we surveyed TREs in eight human cell types and provide new insights into global patterns of TRE function.


Affiliation: [1] Baker Institute for Animal Health, Cornell University, Ithaca, New York, USA. [2] Department of Biomedical Sciences, Cornell University, Ithaca, New York, USA. [3] Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA.

ABSTRACT
Modifications to the global run-on and sequencing (GRO-seq) protocol that enrich for 5'-capped RNAs can be used to reveal active transcriptional regulatory elements (TREs) with high accuracy. Here, we introduce discriminative regulatory-element detection from GRO-seq (dREG), a sensitive machine learning method that uses support vector regression to identify active TREs from GRO-seq data without requiring cap-based enrichment (https://github.com/Danko-Lab/dREG/). This approach allows TREs to be assayed together with gene expression levels and other transcriptional features in a single experiment. Predicted TREs are more enriched for several marks of transcriptional activation—including expression quantitative trait loci, disease-associated polymorphisms, acetylated histone 3 lysine 27 (H3K27ac) and transcription factor binding—than those identified by alternative functional assays. Using dREG, we surveyed TREs in eight human cell types and provide new insights into global patterns of TRE function.



Figure 1: dREG schematic and validation. (a) High PRO-seq signal intensity marks TREs (highlighted with pink background) and gene bodies (yellow background). dREG is a shape detector trained to recognize the characteristic pattern of TREs in PRO-seq data (#1). After training, dREG can be used to identify TREs using a new PRO-seq data set (red peaks) (#2). Browser shot compares dREG-predicted TREs to ChromHMM-predicted promoters (red), enhancers (yellow), and insulators (green) (#3). (b) Bar charts (left) represent the genome-wide sensitivity of dREG for various classes of TRE at a 5% (line) or 10% (bar) false discovery rate in K562 (pink) and GM12878 (blue) cells. Classes of regulatory elements represent GRO-cap transcribed DHS (Transcribed DHS), transcription start sites identified by CAGE (CAGE TSS), histone acetylation associated with DHS (Acetyl DHS), GRO-cap transcribed ChromHMM promoters (Promoters), GRO-cap transcribed ChromHMM enhancers (Enhancers), GRO-cap TSS inside annotated Gene Bodies (Gene Body), and GRO-cap pairs (GRO-cap Pairs). Pie charts (right) represent the fraction of sites aligning within RefSeq transcription start sites (TSS), introns, or intergenic regions in each validation set.

Mentions: We devised a machine-learning approach, called dREG, to identify TREs, including both promoters and enhancers, from standard GRO-seq or PRO-seq data (Fig. 1a and Supplementary Fig. 1). The key to our method is a feature vector that summarizes the patterns of aligned GRO-seq reads near each candidate element at multiple scales. This feature vector consists of read counts for windows ranging in size from 10 bp to 5 kbp, standardized using the logistic function (Supplementary Fig. 2a). The feature vector is passed to a support vector regression (SVR) model, which scores sites with high PRO-seq signal for similarity to a training set of TREs. To train our classifier, we used TREs identified from GRO-cap data (ref. 19) as positive examples and regions of matched PRO-seq signal intensity lacking additional marks associated with TREs as negative examples. After training and optimization of several tuning parameters (Supplementary Tables 1 and 2), the program displayed excellent performance when applied to PRO-seq data for K562 cells (AUC = 0.99; Supplementary Fig. 2b).
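
The multiscale feature vector described above can be illustrated with a minimal Python sketch. It is not the dREG implementation: the window sizes, the number of windows per side, the per-strand counting, and the logistic midpoint and steepness are all hypothetical choices made for illustration, since the text here gives only the 10 bp to 5 kbp range and the use of a logistic transform.

```python
import math

# Hypothetical window sizes (bp) and windows per side. The source states only
# the range 10 bp to 5 kbp, so these particular values are assumptions.
WINDOW_SIZES = (10, 50, 500, 5000)
WINDOWS_PER_SIDE = 2


def logistic(x, midpoint=1.0, steepness=1.0):
    """Squash a raw read count into (0, 1); parameters are illustrative."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))


def count_reads(positions, start, end):
    """Count read positions p with start <= p < end."""
    return sum(1 for p in positions if start <= p < end)


def feature_vector(plus_reads, minus_reads, center):
    """Logistic-scaled read counts in windows tiled around `center`.

    For each window size, windows are tiled on both sides of the candidate
    position, and reads are counted separately on each strand (an assumption).
    """
    features = []
    for size in WINDOW_SIZES:
        for k in range(-WINDOWS_PER_SIDE, WINDOWS_PER_SIDE):
            start = center + k * size
            end = start + size
            for reads in (plus_reads, minus_reads):
                features.append(logistic(count_reads(reads, start, end)))
    return features


# Toy read start positions on each strand (made-up numbers).
plus = [1000, 1003, 1012, 1040, 1500]
minus = [980, 960, 700]
vec = feature_vector(plus, minus, center=1000)
print(len(vec))
```

With these settings each candidate position yields 4 sizes x 4 windows x 2 strands = 32 features, each bounded between 0 and 1. A vector of this kind could then be passed to any support vector regression implementation (for example `sklearn.svm.SVR`, which is not necessarily what the authors used) for training on positive and negative examples.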

