NSMAP: a method for spliced isoforms identification and quantification from RNA-Seq.

Xia Z, Wen J, Chang CC, Zhou X - BMC Bioinformatics (2011)

Bottom Line: In contrast to previous methods, NSMAP identifies the structures of expressed isoforms and estimates their expression levels simultaneously, which enables better identification of isoforms. In simulations parameterized by two real RNA-Seq data sets, more than 77% of expressed isoforms are correctly identified and quantified. NSMAP provides a good strategy for identifying and quantifying novel isoforms without knowledge of the annotated reference genome, which can further realize the potential of RNA-Seq in transcriptome analysis.


Affiliation: Department of Radiology, The Methodist Hospital Research Institute, Houston, TX 77030, USA.

ABSTRACT

Background: The development of techniques for sequencing messenger RNA (RNA-Seq) makes it possible to study biological mechanisms such as alternative splicing and gene expression regulation more deeply and accurately. Most existing methods use RNA-Seq to quantify the expression levels of isoforms already annotated in the reference genome. However, the current reference genome is very incomplete due to the complexity of the transcriptome, which hinders comprehensive investigation of the transcriptome using RNA-Seq. A method for isoform inference and estimation purely from RNA-Seq, without annotation information, is therefore desirable.

Results: A Nonnegativity and Sparsity constrained Maximum APosteriori (NSMAP) model is proposed to estimate the expression levels of isoforms from RNA-Seq data without annotation information. In contrast to previous methods, NSMAP identifies the structures of expressed isoforms and estimates their expression levels simultaneously, which enables better identification of isoforms. In simulations parameterized by two real RNA-Seq data sets, more than 77% of expressed isoforms are correctly identified and quantified. We then apply NSMAP to two RNA-Seq data sets of myelodysplastic syndromes (MDS) samples and one normal sample in order to identify differentially expressed known and novel isoforms in MDS.

Conclusions: NSMAP provides a good strategy for identifying and quantifying novel isoforms without knowledge of the annotated reference genome, which can further realize the potential of RNA-Seq in transcriptome analysis. The NSMAP package is freely available at https://sites.google.com/site/nsmapforrnaseq.


© Copyright Policy - open-access

Figure 6: An example of a solution path {Φ(k) | k = 0,1,...,K}. Each curve represents the expression level of one candidate isoform over the iterations. The solution path is plotted from left to right along the X axis. The figure shows that the number of expressed isoforms varies along the solution path.

Mentions: A sequence of solutions {Φ(k) | k = 0,1,...,K} corresponds to the decreasing values of t during the iterations, where K is the number of iterations. Each regularization parameter t(k) has a solution Φ(k) with several isoforms selected as active. Only one isoform is selected in Φ(0), when t is at its largest value t(0). As t decreases, more isoforms enter the active set, and the number of expressed isoforms varies as shown in Figure 6. The best model should be selected from these solutions: one that explains as many observations as possible using as few expressed isoforms as possible. The number of expressed isoforms of a solution equals the number of positive elements in Φ(k). To control the model size, the sequential solutions are grouped into subsets Hh according to the number of positive elements in each solution Φ(k), where a counting function gives the number of positive elements in a solution and each Hh is a subset of the solution path. The number of expressed isoforms of each solution in Hh equals h.
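
The grouping step described above can be illustrated with a minimal Python sketch. It is not taken from the NSMAP package: the function names `count_positive` and `group_by_model_size`, the tolerance `tol`, and the toy solution path are all assumptions made for illustration. It only shows how solutions along a path might be binned by their number of positive elements, so that group h holds the iterations whose solutions have exactly h expressed isoforms.

```python
from collections import defaultdict


def count_positive(solution, tol=0.0):
    """Count entries strictly greater than tol (hypothetical helper)."""
    return sum(1 for x in solution if x > tol)


def group_by_model_size(path, tol=0.0):
    """Map h to the list of iteration indices k whose solution has
    exactly h positive entries (an illustrative stand-in for Hh)."""
    groups = defaultdict(list)
    for k, solution in enumerate(path):
        groups[count_positive(solution, tol)].append(k)
    return dict(groups)


# Toy solution path (made-up numbers): rows are iterations k = 0..4,
# columns are expression levels of three candidate isoforms.
path = [
    [1.0, 0.0, 0.0],
    [0.9, 0.2, 0.0],
    [0.7, 0.4, 0.0],
    [0.6, 0.0, 0.3],
    [0.5, 0.2, 0.3],
]

groups = group_by_model_size(path)
print(groups)  # {1: [0], 2: [1, 2, 3], 3: [4]}
```

Note that iterations 2 and 3 fall in the same group even though they select different isoforms, since the grouping depends only on model size. How the best solution is then chosen within or across groups is not specified in this excerpt and is left out of the sketch.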

