IUTA: a tool for effectively detecting differential isoform usage from RNA-Seq data.

Niu L, Huang W, Umbach DM, Li L - BMC Genomics (2014)

Bottom Line: Such alternative splicing can be tissue- and developmental stage-specific, and such specificity is sometimes associated with disease. Evaluation using simulated data showed that IUTA was able to provide test results for many more genes than was Cuffdiff2 (version 2.2.0, released in Mar. 2014), and IUTA performed better than Cuffdiff2 for the limited number of genes that Cuffdiff2 did analyze. When applied to actual mouse RNA-Seq datasets from six tissues, IUTA identified 2,073 significant genes with clear patterns of differential isoform usage between a pair of tissues.


Affiliation: Biostatistics Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA. li3@niehs.nih.gov.

ABSTRACT

Background: Most genes in mammals generate several transcript isoforms that differ in stability and translational efficiency through alternative splicing. Such alternative splicing can be tissue- and developmental stage-specific, and such specificity is sometimes associated with disease. Thus, detecting differential isoform usage for a gene between tissues or cell lines/types (differences in the fraction of total expression of a gene represented by the expression of each of its isoforms) is potentially important for cell and developmental biology.
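The definition of isoform usage given in the Background (the fraction of a gene's total expression contributed by each isoform) can be illustrated with a minimal Python sketch. The function name `isoform_usage`, the isoform labels, and the expression values are all hypothetical and chosen only for illustration; this is not code from the IUTA package.

```python
def isoform_usage(expression):
    """Return each isoform's share of the gene's total expression.

    `expression` maps an isoform label to a non-negative expression value
    (hypothetical units; any consistent abundance measure would do).
    """
    total = sum(expression.values())
    if total <= 0:
        raise ValueError("gene has no expression; usage is undefined")
    return {isoform: value / total for isoform, value in expression.items()}


# Made-up expression values for a two-isoform gene in two tissues.
liver = isoform_usage({"isoform_A": 30.0, "isoform_B": 90.0})
spleen = isoform_usage({"isoform_A": 10.0, "isoform_B": 90.0})

print(liver)
print(spleen)
```

Because each usage vector sums to one, the two tissues can differ in usage even when the expression of one isoform (here `isoform_B`) is identical in both.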

Results: We present a new method IUTA that is designed to test each gene in the genome for differential isoform usage between two groups of samples. IUTA also estimates isoform usage for each gene in each sample as well as averaged across samples within each group. IUTA is the first method to formulate the testing problem as testing for equal means of two probability distributions under the Aitchison geometry, which is widely recognized as the most appropriate geometry for compositional data (vectors that contain the relative amount of each component comprising the whole). Evaluation using simulated data showed that IUTA was able to provide test results for many more genes than was Cuffdiff2 (version 2.2.0, released in Mar. 2014), and IUTA performed better than Cuffdiff2 for the limited number of genes that Cuffdiff2 did analyze. When applied to actual mouse RNA-Seq datasets from six tissues, IUTA identified 2,073 significant genes with clear patterns of differential isoform usage between a pair of tissues. IUTA is implemented as an R package and is available at http://www.niehs.nih.gov/research/resources/software/biostatistics/iuta/index.cfm.
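To make the reference to the Aitchison geometry more concrete, the sketch below shows the centered log-ratio (clr) transform, the Aitchison distance, and the compositional mean for strictly positive compositions. It illustrates the geometry only. The abstract does not describe IUTA's test statistic, so nothing here should be read as the package's implementation, and the function names are assumptions made for this example.

```python
import math


def clr(composition):
    """Centered log-ratio transform of a strictly positive composition."""
    if any(part <= 0 for part in composition):
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the clr-transformed compositions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


def aitchison_mean(compositions):
    """Component-wise geometric mean, rescaled to sum to one."""
    n = len(compositions)
    parts = len(compositions[0])
    geo = [
        math.exp(sum(math.log(c[i]) for c in compositions) / n)
        for i in range(parts)
    ]
    total = sum(geo)
    return [g / total for g in geo]


# Made-up usage vectors for one gene in two samples.
sample_1 = [0.2, 0.8]
sample_2 = [0.8, 0.2]
print(aitchison_mean([sample_1, sample_2]))
print(aitchison_distance(sample_1, sample_2))
```

One property worth noting is scale invariance: `aitchison_distance([1, 3], [0.25, 0.75])` is zero (up to rounding), because only the ratios between parts matter. Zero-valued parts need separate handling, which this sketch does not attempt.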

Conclusions: Both simulation and real-data results suggest that IUTA accurately detects differential isoform usage. We believe that our analysis of RNA-Seq data from six mouse tissues represents the first comprehensive characterization of isoform usage in these tissues. IUTA will be a valuable resource for those who study the roles of alternative transcripts in cell development and disease.

Fig4: Visualization of differential isoform usage for Cd74 in the mouse liver and spleen tissues. (Top) pie plot representations of tissue-specific isoform usage; (bottom) observed RNA-Seq read coverage (in each sample of each tissue).

Mentions: Among the 2385 mouse genes with more than one isoform and the same transcription start site (in the mouse RefSeq gene model) for all isoforms, IUTA was able to assess the statistical significance of isoform usage differences for 1482 genes, whereas Cuffdiff2 was able to do so for 1478 genes given the RNA-Seq data; 1268 genes were common to the two tools. IUTA detected more significant genes than Cuffdiff2 did under the same nominal false discovery rate (0.05): Cuffdiff2 reported 122 significant genes whereas IUTA identified 297, of which 83 were in common. Visual examination of the isoform usage plots of the genes declared significant by IUTA but not by Cuffdiff2 suggests that the IUTA significant genes are credible. A good example is the Cd74 gene (Figure 4). The mouse Cd74 gene has two isoforms. The relative proportion of isoform NM_001042605 is higher in liver than in spleen, as supported by the read coverage plot (Figure 4). The unique (middle) exon in NM_001042605 has higher relative read coverage in liver than in spleen, whereas the read coverage on the common exons appears to be similar among the samples. Although Cuffdiff2 also gave a small p-value (0.020) for this gene, the result failed to reach significance after adjustment for false discovery rate (q-value = 0.14). It is worth pointing out that IUTA and Cuffdiff2 gave similar isoform usage estimates for this gene. The estimates from IUTA are (0.25, 0.75) in liver and (0.11, 0.89) in spleen, whereas those from Cuffdiff2 are (0.22, 0.78) in liver and (0.11, 0.89) in spleen. Estimates of isoform usage were often similar between IUTA and Cuffdiff2 even when the two tools differed in declaring a gene to have differential isoform usage between tissues.
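The Cd74 result, where a raw p-value below 0.05 still fails after false discovery rate adjustment, can be illustrated with a generic multiple-testing correction. The passage does not say which FDR procedure either tool uses, so the Benjamini-Hochberg procedure below is an assumption made for illustration, and the p-values are made up rather than taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for six genes; not data from the paper.
p_values = [0.001, 0.020, 0.030, 0.200, 0.500, 0.900]
q_values = benjamini_hochberg(p_values)

for p, q in zip(p_values, q_values):
    print(f"p = {p:.3f}  adjusted = {q:.3f}  significant at 0.05: {q <= 0.05}")
```

In this toy input the gene with p = 0.020 receives an adjusted value of about 0.06 and so is not declared significant at 0.05, which is the same kind of outcome the text reports for Cd74 under Cuffdiff2.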

