Spatially Enhanced Differential RNA Methylation Analysis from Affinity-Based Sequencing Data with Hidden Markov Model.

Zhang YC, Zhang SW, Liu L, Liu H, Zhang L, Cui X, Huang Y, Meng J - Biomed Res Int (2015)

Bottom Line: Since existing peak-based methods cannot effectively differentiate multiple methylation residues located within a single methylation site, we propose a hidden Markov model (HMM) based approach to address this issue. Specifically, each detected RNA methylation site is further divided into multiple adjacent small bins, which are then scanned at higher resolution with an HMM that models the dependency between spatially adjacent bins for improved accuracy. Results suggest that the proposed algorithm clearly outperforms the existing peak-based approach on simulated data and detects differential methylation regions with higher statistical significance on a real dataset.


Affiliation: Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China.

ABSTRACT
With the development of new sequencing technology, the entire N6-methyladenosine (m6A) RNA methylome can now be profiled in an unbiased manner with methylated RNA immunoprecipitation sequencing (MeRIP-Seq), making it possible to detect differential methylation states of RNA between two conditions, for example, between normal and cancerous tissue. However, as an affinity-based method, MeRIP-Seq does not yet provide base-pair resolution; that is, a single methylation site determined from MeRIP-Seq data can in practice contain multiple RNA methylation residues, some of which may be regulated by different enzymes and thus be differentially methylated between the two conditions. Since existing peak-based methods cannot effectively differentiate multiple methylation residues located within a single methylation site, we propose a hidden Markov model (HMM) based approach to address this issue. Specifically, each detected RNA methylation site is further divided into multiple adjacent small bins, which are then scanned at higher resolution with an HMM that models the dependency between spatially adjacent bins for improved accuracy. We tested the proposed algorithm on both simulated and real data. Results suggest that the proposed algorithm clearly outperforms the existing peak-based approach on simulated data and detects differential methylation regions with higher statistical significance on a real dataset.
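The binning step described above can be illustrated with a minimal Python sketch. The function name `split_site_into_bins`, the half-open coordinate convention, and the 25 bp default bin width are all hypothetical choices for illustration; the bin size used in the study is not stated here.

```python
from typing import List, Tuple


def split_site_into_bins(start: int, end: int, bin_width: int = 25) -> List[Tuple[int, int]]:
    """Divide the half-open site [start, end) into adjacent, non-overlapping bins.

    bin_width=25 is a hypothetical default, not a value taken from the paper.
    The last bin is shorter when the site length is not a multiple of bin_width.
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    return [(s, min(s + bin_width, end)) for s in range(start, end, bin_width)]
```

Each bin then becomes one position (one hidden state) in the site's Markov chain, so a short site yields a short chain.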


Figure 4: Comparison of the three strategies. The FHB strategy is the most naïve and straightforward; FHC is the most time-consuming and performs better than FHB but is less robust. With FastFHC, the algorithm can be applied to genome-scale datasets in a timely and robust manner.

Mentions: When the proposed method is applied to a real MeRIP-Seq dataset, two problems arise. First, some reads map to very short genes, so the number of bins is small; in other words, some Markov chains are too short for accurate parameter estimation, which ultimately degrades the detection of differentially methylated regions (DMRs). Second, computation time is an important consideration. Taking the human hg19 data tested here as an example, with more than 30,000 detected RNA methylation sites the Baum-Welch algorithm would have to be run more than 30,000 times, and the execution time could become prohibitive. To address both limitations, we combine the two strategies, FHB and FHC. First, the threshold used in FHB is applied again to convert the FDR into a binary differential methylation state (DMS). The initial probability $\pi^{\mathrm{III}}$ and the transition matrix $A^{\mathrm{III}}$ can then be estimated directly from this DMS information, as shown in (16):

$$\pi^{\mathrm{III}} = \left[\, 1 - \frac{\sum_{i=1}^{N} \mathrm{DMS}_i}{N},\; \frac{\sum_{i=1}^{N} \mathrm{DMS}_i}{N} \,\right], \qquad A^{\mathrm{III}} = \begin{bmatrix} P(S_{n+1}=0 \mid S_n=0) & P(S_{n+1}=1 \mid S_n=0) \\ P(S_{n+1}=0 \mid S_n=1) & P(S_{n+1}=1 \mid S_n=1) \end{bmatrix} \tag{16}$$

where $P(S_{n+1} \mid S_n)$ denotes the conditional probability of a transition from $S_n$ to $S_{n+1}$, which can be conveniently estimated by scanning all differential methylation states $S = \{s_1, s_2, \ldots, s_N\}$ on all RNA methylation sites. For every gene, the emission probability $B^{\mathrm{III}}$ has the same form as $B^{\mathrm{II}}$ in the FHC strategy. In this way, $A^{\mathrm{III}}$ is estimated in a single step rather than iteratively, which reduces the computational load, and the estimate should also be more robust than the previous strategy on short RNA methylation sites with few bins. Second, we use the E-step of the FHB strategy to compute the final expectation defined in formula (14) for every bin on every RNA methylation site of the real RNA epigenetics data; that is, FastFHC applies the E-step after the transition matrix and initial probability have been estimated for all genes.
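The one-pass estimation in (16) amounts to counting. The Python sketch below is a minimal illustration under stated assumptions: a bin is called differentially methylated (DMS = 1) when its FDR falls below the FHB threshold, the threshold value 0.05 is hypothetical, and transitions are counted only within a site, never across site boundaries. The helper names `fdr_to_dms` and `estimate_pi_and_a` are illustrative, not taken from the authors' code.

```python
from typing import List, Sequence, Tuple


def fdr_to_dms(fdrs: Sequence[float], threshold: float = 0.05) -> List[int]:
    """Binarise per-bin FDRs; 1 = differentially methylated (hypothetical threshold)."""
    return [1 if q < threshold else 0 for q in fdrs]


def estimate_pi_and_a(sites: Sequence[Sequence[int]]) -> Tuple[List[float], List[List[float]]]:
    """Pool binary DMS sequences from all sites and estimate pi and A in one pass."""
    total = sum(len(seq) for seq in sites)
    if total == 0:
        raise ValueError("no bins to estimate from")
    ones = sum(sum(seq) for seq in sites)
    # Initial probability: fraction of non-differential vs differential bins.
    pi = [1 - ones / total, ones / total]
    # Count transitions between adjacent bins inside each site only.
    counts = [[0, 0], [0, 0]]
    for seq in sites:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    a: List[List[float]] = []
    for row in counts:
        row_total = sum(row)
        # A row with no observed transitions falls back to uniform (an assumption).
        a.append([c / row_total for c in row] if row_total else [0.5, 0.5])
    return pi, a
```

Because no iteration is involved, the cost is linear in the total number of bins, which is what makes pooling tens of thousands of sites practical.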
$\pi^{\mathrm{III}}$ and $A^{\mathrm{III}}$ are assumed to be identical across RNA methylation sites and, as in FHB, are estimated from the binary-converted observations. Although some information is lost in the conversion step, the estimates should still be relatively accurate because tens of thousands of RNA methylation sites are pooled together. The three strategies are summarized in Figure 4.
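The E-step that FastFHC applies once the shared parameters are fixed can be sketched as a scaled forward-backward pass over the bins of one site. This is a minimal Python illustration under stated assumptions: the emission term $B^{\mathrm{III}}$ is abstracted as precomputed, positive per-bin likelihoods `emis[n][k]` (its actual form, shared with $B^{\mathrm{II}}$ in FHC, is not reproduced here), and the quantity returned is assumed to be the posterior probability that each bin is differentially methylated, which may differ in detail from the expectation defined in formula (14). The function name `posterior_dm` is hypothetical.

```python
from typing import List, Sequence


def posterior_dm(pi: Sequence[float],
                 a: Sequence[Sequence[float]],
                 emis: Sequence[Sequence[float]]) -> List[float]:
    """Scaled forward-backward E-step for a two-state HMM with fixed pi and A.

    emis[n][k] is an assumed, precomputed likelihood of bin n's data under
    state k (0 = not differential, 1 = differential); values must be positive.
    Returns P(S_n = 1 | all bins of the site) for every bin n.
    """
    n_bins = len(emis)
    if n_bins == 0:
        return []
    # Forward pass, normalised at every bin to avoid numerical underflow.
    alpha: List[List[float]] = []
    scale: List[float] = []
    for n in range(n_bins):
        if n == 0:
            unnorm = [pi[k] * emis[0][k] for k in range(2)]
        else:
            unnorm = [sum(alpha[n - 1][j] * a[j][k] for j in range(2)) * emis[n][k]
                      for k in range(2)]
        c = sum(unnorm)
        scale.append(c)
        alpha.append([v / c for v in unnorm])
    # Backward pass, reusing the forward scaling constants.
    beta: List[List[float]] = [[1.0, 1.0] for _ in range(n_bins)]
    for n in range(n_bins - 2, -1, -1):
        beta[n] = [sum(a[j][k] * emis[n + 1][k] * beta[n + 1][k] for k in range(2))
                   / scale[n + 1]
                   for j in range(2)]
    # Posterior state probabilities; keep the probability of the differential state.
    post: List[float] = []
    for n in range(n_bins):
        gamma = [alpha[n][k] * beta[n][k] for k in range(2)]
        post.append(gamma[1] / sum(gamma))
    return post
```

For example, with sticky transitions `a = [[0.9, 0.1], [0.1, 0.9]]`, a single bin whose likelihood strongly favours the differential state also raises the posterior of its uninformative neighbours above 0.5, which is the spatial dependency between adjacent bins that the HMM is meant to capture.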

