Identification of allele-specific alternative mRNA processing via transcriptome sequencing.

Li G, Bahn JH, Lee JH, Peng G, Chen Z, Nelson SF, Xiao X - Nucleic Acids Res. (2012)

Bottom Line: Establishing the functional roles of genetic variants remains a significant challenge in the post-genomic era. Finally, many genes identified in our study were also reported as disease/phenotype-associated genes in genome-wide association studies. Future applications of our approach may provide ample insights for a better understanding of the genetic basis of gene regulation underlying phenotypic diversity and disease mechanisms.


Affiliation: Department of Integrative Biology and Physiology, David Geffen School of Medicine and Molecular Biology Institute, University of California Los Angeles, Los Angeles, CA 90095, USA.

ABSTRACT
Establishing the functional roles of genetic variants remains a significant challenge in the post-genomic era. Here, we present a method, allele-specific alternative mRNA processing (ASARP), to identify genetically influenced mRNA processing events using transcriptome sequencing (RNA-Seq) data. The method examines RNA-Seq data at both single-nucleotide and whole-gene/isoform levels to identify allele-specific expression (ASE) and existence of allele-specific regulation of mRNA processing. We applied the methods to data obtained from the human glioblastoma cell line U87MG and primary breast cancer tissues and found that 26-45% of all genes with sufficient read coverage demonstrated ASE, with significant overlap between the two cell types. Our methods predicted potential mechanisms underlying ASE due to regulations affecting either whole-gene-level expression or alternative mRNA processing, including alternative splicing, alternative polyadenylation and alternative transcriptional initiation. Allele-specific alternative splicing and alternative polyadenylation may explain ASE in hundreds of genes in each cell type. Reporter studies following these predictions identified the causal single nucleotide variants (SNVs) for several allele-specific alternative splicing events. Finally, many genes identified in our study were also reported as disease/phenotype-associated genes in genome-wide association studies. Future applications of our approach may provide ample insights for a better understanding of the genetic basis of gene regulation underlying phenotypic diversity and disease mechanisms.


Figure 1. Evaluation of the allelic ratios calculated from RNA-Seq. (A) Distribution of allelic ratios (no. of reads containing the reference allele/total no. of reads) at heterozygous SNVs with reads from both alleles (mean: 0.500, median: 0.5, P = 0.11, binomial test). (B) Scatter plot of the allelic ratios of pairs of heterozygous SNVs (with ≥20 reads) located in the same constitutive exons (502 pairs with many overlaps in the plot). Only SNV pairs whose phase can be inferred from the RNA-Seq reads were included in this analysis. Pearson correlation coefficient and P-values are shown. The solid line shows the linear regression of the data points and the dashed line denotes the diagonal line. (C) Similar to (B), but for all heterozygous SNVs with ≥20 reads in both biological replicates.
© Copyright Policy - creative-commons


Mentions: In analyzing ASE of genetic variants in RNA-Seq reads, previous work observed that significant bias exists in the read-mapping results that favors reads harboring the reference allele of heterozygous SNVs (4,20–22). To evaluate whether such bias exists in our mapping, we examined the allelic ratios (defined as the number of reads with the reference allele divided by the total number of reads per SNV) of heterozygous SNVs (Figure 1A). In the absence of mapping bias, the average allelic ratio is expected to be 0.5 assuming ASE is only present in a small fraction of SNVs. As shown in Figure 1A, our results confirmed an average allelic ratio of 0.5, supporting the effectiveness of the mapping strategy. In contrast, if read mapping were carried out by allowing two mismatches on each read, as in traditional methods, a statistically significant bias toward the reference allele was detected (Supplementary Figure S3A). Note that the local peaks in Figure 1A at allelic ratios of about 0.33 and 0.66 were due to the prevalence of SNVs with low read coverage (specifically, with 1:2 or 2:1 read counts for the two alleles). The corresponding peaks were not observed if SNVs with three reads in total were excluded (Supplementary Figure S3B).
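
The allelic ratio defined above, and a check for reference-allele bias, can be illustrated with a minimal Python sketch. The read counts are hypothetical, the function names are invented for illustration, and the pooled two-sided exact binomial test is only one plausible way to test for bias; the excerpt does not specify how the reported binomial test was set up, so this should not be read as the authors' procedure.

```python
from math import comb


def allelic_ratio(ref_reads, alt_reads):
    """Reads carrying the reference allele divided by total reads at one SNV."""
    return ref_reads / (ref_reads + alt_reads)


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial P-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k under Binomial(n, p).
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small tolerance so ties in probability are counted despite rounding.
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))


# Hypothetical (reference, alternative) read counts at heterozygous SNVs.
snvs = [(12, 10), (1, 2), (2, 1), (30, 28), (7, 9)]

ratios = [allelic_ratio(ref, alt) for ref, alt in snvs]
mean_ratio = sum(ratios) / len(ratios)

# Assumption: pool reads over all SNVs and test against an expected ratio of 0.5.
ref_total = sum(ref for ref, _ in snvs)
all_total = sum(ref + alt for ref, alt in snvs)
p_value = binomial_two_sided_p(ref_total, all_total)

# With three reads in total and both alleles observed, only two ratios are possible.
low_coverage_ratios = sorted({round(allelic_ratio(ref, 3 - ref), 2) for ref in (1, 2)})

print(f"mean allelic ratio = {mean_ratio:.3f}, pooled binomial P = {p_value:.3f}")
print("possible ratios at 3 reads:", low_coverage_ratios)
```

The last step shows why SNVs with three reads pile up near one third and two thirds: with both alleles observed, 1:2 and 2:1 are the only possible splits, which is consistent with the local peaks described for Figure 1A.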

