Global analysis of CPSF2-mediated alternative splicing: Integration of global iCLIP and transcriptome profiling data.

Misra A, Ou J, Zhu LJ, Green MR - Genom Data (2015)

Affiliation: Howard Hughes Medical Institute, USA; Department of Molecular, Cell and Cancer Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA.

ABSTRACT
Alternative splicing is a key mechanism for generating proteome diversity; however, the mechanisms regulating alternative splicing are poorly understood. Using a genome-wide RNA interference screening strategy, we identified cleavage and polyadenylation specificity factor (CPSF) and symplekin (SYMPK) as cofactors of the well-known splicing regulator RBFOX2. To determine the role of CPSF in alternative splicing at the genome-wide level, we performed paired-end RNA sequencing (RNA-seq) to compare splicing events in control cells and in RBFOX2- or CPSF2-knockdown cells. We also performed individual-nucleotide resolution UV cross-linking and immunoprecipitation (iCLIP) to identify direct binding targets of RBFOX2 and CPSF2. Here, we describe the experimental design and the quality control and data analyses that were performed on the dataset. The raw sequencing data have been deposited in NCBI's Gene Expression Omnibus and are accessible through GEO Series accession number GSE60392.

Fig. 2. Quality assessment of RNA-seq data and alignment quality. A. Pie chart obtained with SAMStat depicting the distribution of sequence alignment quality. B. Scatter plots of RNA-seq data assessing the consistency of exon-level expression between biological replicates for all three knockdown samples. C. Multidimensional scaling (MDS) plot of RNA-seq data for all six samples. D. Boxplot of exon expression levels (log10-transformed FPKM) for all six samples. E. Dispersion plot generated using DEXSeq from randomly subsampled exons.
© Copyright Policy - CC BY-NC-ND

RNA-seq libraries were prepared using the TruSeq RNA Library Prep Kit v2 (Illumina) according to the manufacturer's instructions. Duplicate NS, RBFOX2-knockdown and CPSF2-knockdown samples were sequenced on an Illumina HiSeq 2000 with 100-bp paired-end reads. Read quality was assessed using FastQC (v0.10.1); the per-base quality (Phred) score was > 30 for all bases, indicating that the reads were of high quality. Reads were aligned against the human reference genome (GRCh37/hg19, Feb. 2009) using TopHat (v2.0.9, with Bowtie2 v2.1.0) [3] with the parameter setting “-G [ucsc_hg19_knownGene] --mate-inner-dist 50 --b2-very-sensitive”. The ucsc_hg19_knownGene annotation file was downloaded from the UCSC Table Browser, and alignment quality was assessed using SAMStat (v1.09) [5]. The pie chart of the alignment quality distribution showed that > 97% of the reads had MAPQ ≥ 30, indicating high mapping quality (Fig. 2A). Gene expression levels (FPKM) and differential gene expression were analyzed using Cufflinks (v2.1.1) [6]. Exon-level read counts were estimated with the Python script provided by the DEXSeq (v1.10.6) package [7], using the parameter setting “-p yes -r pos -s no” (paired-end reads, position-sorted alignments, unstranded library). Pearson correlation analysis of gene expression levels was performed to evaluate reproducibility between biological replicates (NS: r = 0.991, p < 2.2e-16; RBFOX2: r = 0.991, p < 2.2e-16; CPSF2: r = 0.992, p < 2.2e-16) (Fig. S2A in [4]). Pearson correlation analysis of exon-level expression also demonstrated high reproducibility between biological replicates (NS: r = 0.997, p < 2.2e-16; RBFOX2: r = 0.999, p < 2.2e-16; CPSF2: r = 0.989, p < 2.2e-16) (Fig. 2B).
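The read-quality, mapping-quality and replicate-correlation checks described above reduce to simple computations. The Python sketch below is illustrative only and is not the authors' pipeline (which used FastQC and SAMStat). It assumes Phred+33 encoded FASTQ quality strings; it counts every SAM record toward the MAPQ ≥ 30 fraction without inspecting FLAG, so unmapped and secondary records are included and MAPQ 255 ("unavailable") is treated as a high score; and it computes Pearson r on whatever values are passed in. All function names are hypothetical.

```python
import math
from collections.abc import Iterable, Sequence


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def fraction_mapq_at_least(sam_lines: Iterable[str], threshold: int = 30) -> float:
    """Fraction of SAM alignment records whose MAPQ (column 5) is >= threshold.

    Header lines (starting with '@') and blank lines are skipped. FLAG is not
    inspected, so unmapped and secondary records are counted as well.
    """
    total = passing = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        mapq = int(line.rstrip("\n").split("\t")[4])  # 5th tab-separated field
        total += 1
        if mapq >= threshold:
            passing += 1
    return passing / total if total else 0.0


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)
```

For example, phred_scores("II?5") returns [40, 40, 30, 20]. The text does not state whether the replicate correlations were computed on raw or log-transformed expression values, so that choice is left to the caller of pearson_r.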
A multidimensional scaling (MDS) plot was generated with the cummeRbund package (v2.8.2) [8] to visualize the similarity of gene expression between biological replicates and the dissimilarity among NS, RBFOX2-knockdown and CPSF2-knockdown samples. Biological replicates clustered closely, whereas NS, RBFOX2-knockdown and CPSF2-knockdown samples segregated clearly, indicating that the replicates were similar to each other and that the knockdown groups had distinct expression profiles (Fig. 2C). To visualize the distribution of gene expression levels, a boxplot of log10-transformed FPKM values from Cufflinks was generated for each sample (Fig. 2D). The quartiles and overall range were consistent between biological replicates, indicating that the data were reproducible and of high quality.
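The MDS and boxplot summaries can be sketched with NumPy as follows. This is an assumption-laden illustration, not cummeRbund's implementation: it adds a pseudocount of 1 before the log10 transform (the text does not say how zero FPKM values were handled), it applies classical (Torgerson) MDS to Euclidean distances between samples (cummeRbund's MDS plot may use a different distance measure), and it reports each sample's minimum, quartiles and maximum, the quantities compared across replicates in the boxplot. Function names are hypothetical.

```python
import numpy as np


def log10_fpkm(fpkm: np.ndarray, pseudocount: float = 1.0) -> np.ndarray:
    """log10-transform FPKM values; the pseudocount (an assumption) avoids log10(0)."""
    return np.log10(np.asarray(fpkm, dtype=float) + pseudocount)


def classical_mds(data: np.ndarray, k: int = 2) -> np.ndarray:
    """Classical MDS of the samples (columns) of a genes x samples matrix.

    Uses Euclidean distances between samples and returns an (n_samples, k)
    array of coordinates, one row per sample.
    """
    samples = np.asarray(data, dtype=float).T               # one row per sample
    diff = samples[:, None, :] - samples[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)                          # squared distances
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering                 # double-centred matrix
    eigvals, eigvecs = np.linalg.eigh(gram)                  # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]                    # k largest
    vals = np.clip(eigvals[order], 0.0, None)                # drop tiny negatives
    return eigvecs[:, order] * np.sqrt(vals)


def quartiles(data: np.ndarray) -> np.ndarray:
    """Per-sample (min, Q1, median, Q3, max) for a genes x samples matrix."""
    return np.percentile(np.asarray(data, dtype=float), [0, 25, 50, 75, 100], axis=0).T
```

Given a genes × samples FPKM matrix fpkm, classical_mds(log10_fpkm(fpkm)) yields one two-dimensional point per sample, and replicates that cluster closely in these coordinates correspond to the pattern described for Fig. 2C; quartiles(log10_fpkm(fpkm)) gives one row of summary statistics per sample for comparison across replicates.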