Identification of allele-specific alternative mRNA processing via transcriptome sequencing.

Li G, Bahn JH, Lee JH, Peng G, Chen Z, Nelson SF, Xiao X - Nucleic Acids Res. (2012)

Bottom Line: Establishing the functional roles of genetic variants remains a significant challenge in the post-genomic era. Finally, many genes identified in our study were also reported as disease/phenotype-associated genes in genome-wide association studies. Future applications of our approach may provide ample insights for a better understanding of the genetic basis of gene regulation underlying phenotypic diversity and disease mechanisms.


Affiliation: Department of Integrative Biology and Physiology, David Geffen School of Medicine and Molecular Biology Institute, University of California Los Angeles, Los Angeles, CA 90095, USA.

ABSTRACT
Establishing the functional roles of genetic variants remains a significant challenge in the post-genomic era. Here, we present a method, allele-specific alternative mRNA processing (ASARP), to identify genetically influenced mRNA processing events using transcriptome sequencing (RNA-Seq) data. The method examines RNA-Seq data at both the single-nucleotide and the whole-gene/isoform levels to identify allele-specific expression (ASE) and the existence of allele-specific regulation of mRNA processing. We applied the method to data obtained from the human glioblastoma cell line U87MG and from primary breast cancer tissues and found that 26-45% of all genes with sufficient read coverage demonstrated ASE, with significant overlap between the two cell types. Our method predicted potential mechanisms underlying ASE due to regulation affecting either whole-gene-level expression or alternative mRNA processing, including alternative splicing, alternative polyadenylation and alternative transcriptional initiation. Allele-specific alternative splicing and alternative polyadenylation may explain ASE in hundreds of genes in each cell type. Reporter studies following these predictions identified the causal single nucleotide variants (SNVs) for several allele-specific alternative splicing events. Finally, many genes identified in our study were also reported as disease/phenotype-associated genes in genome-wide association studies. Future applications of our approach may provide ample insights for a better understanding of the genetic basis of gene regulation underlying phenotypic diversity and disease mechanisms.


Figure 2. Statistical power and read coverage for ASE analysis. (A) Number of reads per SNV required to reach a given level of statistical power (chi-square goodness-of-fit test, q ≤ 0.05) when detecting allelic ratios of 0.7:0.3, 0.8:0.2 and 0.9:0.1 in the RNA-Seq reads. (B) Simulated percentage of SNVs with adequate power (N ≥ 10, 20 or 30 reads) as a function of total mapped reads. Percentages were calculated against all exonic heterozygous SNVs with average coverage located in genes expressed at ≥1 RPKM in the original RNA-Seq data.
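The test behind panel (A) can be sketched as follows. This is a minimal illustration and not the ASARP implementation. It rests on three assumptions:

- the null hypothesis is a 0.5:0.5 allelic ratio at each SNV;
- power is computed exactly from the binomial distribution at a nominal per-SNV α;
- q-values use the Benjamini-Hochberg procedure, because the text does not say which FDR procedure was used.

The function names `chi2_gof_pvalue`, `power` and `bh_qvalues` are hypothetical.

```python
from __future__ import annotations

import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def chi2_gof_pvalue(ref_reads: int, alt_reads: int) -> float:
    """Chi-square goodness-of-fit p-value against equal allelic expression (0.5:0.5).

    With two categories the statistic has 1 degree of freedom and equals Z**2 for a
    standard normal Z, so P(X2 > x) = 2 * (1 - Phi(sqrt(x))). The test is two-sided:
    an excess of either allele counts as evidence against the null.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0
    # sum over both alleles of (observed - n/2)**2 / (n/2) simplifies to (ref - alt)**2 / n
    stat = (ref_reads - alt_reads) ** 2 / n
    return 2.0 * (1.0 - _STD_NORMAL.cdf(math.sqrt(stat)))


def power(n_reads: int, major_fraction: float, alpha: float = 0.05) -> float:
    """Exact power to reject equal expression at nominal alpha when one allele
    truly accounts for major_fraction of the reads covering the SNV."""
    total = 0.0
    for k in range(n_reads + 1):  # k = number of reads carrying the major allele
        if chi2_gof_pvalue(k, n_reads - k) <= alpha:
            total += (
                math.comb(n_reads, k)
                * major_fraction**k
                * (1.0 - major_fraction) ** (n_reads - k)
            )
    return total


def bh_qvalues(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0  # enforces monotonicity and caps q-values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


for n in (10, 20, 30):
    print(n, round(power(n, 0.8), 3))
```

Under these assumptions, `power(20, 0.8)` comes out at about 0.80, a little above the ∼75% quoted for Figure 2A. A q-value cutoff of 0.05 is at least as strict as a nominal α of 0.05 on the raw p-value, which may account for part of that gap.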
To identify ASE events, we tested the null hypothesis that the two alleles of a heterozygous SNV are expressed equally. SNVs were excluded if they potentially lay in regions with copy number variants, as determined from the read depth of the genome sequencing data (14,15). The power to detect a significant ASE event depends on the number of reads covering an SNV (Figure 2A). For example, to identify an allelic ratio of 0.8:0.2 (either reference/variant or variant/reference, two-sided test) with ∼75% power at an FDR of 5%, at least 20 reads are needed for each SNV (Figure 2A). Deeper RNA-Seq coverage therefore increases the power to detect ASE. To illustrate how this power requirement depends on the number of available reads, we randomly sampled (with replacement) from all the mapped reads and examined the read coverage of heterozygous SNVs (Figure 2B). This simulation offers a reasonable estimate of the required sequencing depth, since the mapped reads available in this study already covered most SNVs in expressed genes (≥1 RPKM) (Supplementary Figure S2B). As the number of reads increases, the number of SNVs meeting the power requirement approaches a plateau (at ∼200 million mapped reads for N ≥ 20), because the number of expressed genes is limited.
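The resampling described above might look like the sketch below. It rests on these assumptions:

- An upstream step (not shown) has already reduced each mapped read to the tuple of heterozygous SNV IDs it overlaps.
- `all_snvs` is the denominator set given in the Figure 2 legend: exonic heterozygous SNVs in genes at ≥1 RPKM, with CNV regions already excluded.

`fraction_powered` and `coverage_curve` are hypothetical names, not part of ASARP.

```python
from __future__ import annotations

import random
from collections import Counter
from collections.abc import Hashable, Sequence


def fraction_powered(
    read_snvs: Sequence[tuple[Hashable, ...]],
    all_snvs: set[Hashable],
    n_sampled: int,
    min_reads: int,
    rng: random.Random,
) -> float:
    """Draw n_sampled reads with replacement and return the fraction of SNVs in
    all_snvs covered by at least min_reads of the drawn reads."""
    counts: Counter = Counter()
    for read in rng.choices(read_snvs, k=n_sampled):
        # each read is the (possibly empty) tuple of heterozygous SNV IDs it overlaps;
        # SNVs outside the denominator set are ignored
        counts.update(snv for snv in read if snv in all_snvs)
    powered = sum(1 for snv in all_snvs if counts[snv] >= min_reads)
    return powered / len(all_snvs)


def coverage_curve(
    read_snvs: Sequence[tuple[Hashable, ...]],
    all_snvs: set[Hashable],
    depths: Sequence[int],
    min_reads: int,
    seed: int = 0,
) -> list[tuple[int, float]]:
    """One simulated (total depth, fraction of SNVs with >= min_reads) point per depth."""
    rng = random.Random(seed)  # fixed seed so the curve is reproducible
    return [(d, fraction_powered(read_snvs, all_snvs, d, min_reads, rng)) for d in depths]


# Toy usage: snv3 is never covered, so the fraction cannot exceed 2/3 at any depth.
toy_reads = [("snv1",)] * 60 + [("snv1", "snv2")] * 30 + [()] * 910
print(coverage_curve(toy_reads, {"snv1", "snv2", "snv3"}, [100, 1_000, 10_000], 20))
```

Running this for N = 10, 20 and 30 over a range of depths gives curves of the kind shown in Figure 2B. Each curve levels off once every SNV that the expressed genes can contribute has crossed the read threshold.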

