Non-parametric change-point method for differential gene expression detection.

Wang Y, Wu C, Ji Z, Wang B, Liang Y - PLoS ONE (2011)

Bottom Line: NPCPS builds on change-point theory to detect DGE effectively. Its estimate of the change-point position identifies the samples that contain DGE. Experimental results showed that NPCPS is both accurate and reliable.

Affiliation: Key Laboratory for Symbol Computation and Knowledge Engineering of National Education Ministry, College of Computer Science and Technology, Jilin University, Jilin, China.

ABSTRACT

Background: We propose a non-parametric method, the Non-Parametric Change Point Statistic (NPCPS), which uses a single equation to detect differential gene expression (DGE) in microarray data. NPCPS builds on change-point theory to detect DGE effectively.

Methodology: NPCPS takes the data distribution of the normal samples as input and detects DGE in the cancer samples by locating the change point of the gene expression profile. The estimate of the change-point position generated by NPCPS identifies the samples that contain DGE. Monte Carlo simulation and an ROC study were used to examine the detection accuracy of NPCPS, and an experiment on real breast cancer microarray data was carried out to compare NPCPS with other methods.

Conclusions: The simulation study indicated that NPCPS was more effective at detecting DGE in a subset of cancer samples than five parametric methods and one non-parametric method. When more than 8 cancer samples contained DGE, the type I error of NPCPS was below 0.01. Experimental results showed that NPCPS is both accurate and reliable. Of the 30 top-ranked genes identified by NPCPS, 16 have been reported as relevant to cancer. Correlations between the detection results of NPCPS and those of the compared methods were less than 0.05, whereas correlations among the other methods ranged from 0.20 to 0.84. This indicates that NPCPS works on different features and thus identifies DGE from a perspective distinct from that of the other mean- or median-based methods.

pone-0020060-g001: FPR and estimate of change-point position. (A) Monte Carlo simulation results for the dataset with size n1 = n2 = 25 and significance level α = 0.01. (B) Monte Carlo simulation results for the dataset with size n1 = n2 = 50 and significance level α = 0.01. The x-axis is k, the number of samples in the simulated dataset that contained DGE. The curves in (A) and (B) followed similar trends. Both the FPR and the estimate of the change point improved as k increased. When k>9, the difference between the true and the estimated change point was very small, and the FPR of NPCPS fell below the significance level α, which indicates that the hypothesis test of NPCPS passed the Monte Carlo simulation.

Mentions: Monte Carlo simulation can be used to evaluate the performance of a hypothesis test in terms of its Type I error rate, i.e. the false positive rate (FPR). For each Monte Carlo simulation, NPCPS was applied to an artificial 7000-gene dataset (normal random numbers with mean = 0 and standard deviation sd = 1) composed of n1 normal samples and n2 cancer samples, of which k (0<k<n2) cancer samples contained DGE, simulated by adding a constant μ to the original normal random numbers. Multiple simulations were carried out for different values of the sample size n, the DGE sample size k, and the significance level α. The FPR (Table 1) and the average estimate of the change point (Table 2) were computed, and the simulation results for α = 0.01 are illustrated in Fig. 1. For the dataset with n1 = n2 = 25 (Fig. 1A), the FPR was larger when k was smaller and decreased as k increased; when k was 9 or larger, the detection accuracy of NPCPS was sufficient to satisfy the significance level. For the dataset with n1 = n2 = 50 (Fig. 1B), k likewise had to be at least 9 to satisfy the significance level. The estimate of the change point improved markedly as k increased; the estimated position became very close to the actual position at the same point at which the FPR dropped below the significance level. This indicates that NPCPS is highly sensitive to the left boundary and less sensitive to the right boundary when the F2 information is not sufficient [21].
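
The simulation design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the NPCPS statistic itself is not given in this excerpt, so `estimate_change_point` is a stand-in (a least-squares split of the sorted cancer values), not the authors' method, and it ignores the normal samples that NPCPS takes as input. The function names, the choice to shift the first k cancer samples of every gene, and the value of μ are all hypothetical.

```python
import random


def simulate_gene(n1, n2, k, mu, rng):
    """Draw one gene: n1 normal and n2 cancer values from N(0, 1).

    Assumption: the constant mu is added to the first k cancer samples.
    """
    normal = [rng.gauss(0.0, 1.0) for _ in range(n1)]
    cancer = [rng.gauss(0.0, 1.0) + (mu if j < k else 0.0) for j in range(n2)]
    return normal, cancer


def estimate_change_point(values):
    """Stand-in estimator (NOT NPCPS).

    Sorts the values in descending order and returns the size of the upper
    group for the split that minimises the within-group squared error.
    """
    xs = sorted(values, reverse=True)
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    best_split, best_cost = 1, float("inf")
    left_sum = left_sq = 0.0
    for i in range(1, n):  # i = number of values in the upper group
        left_sum += xs[i - 1]
        left_sq += xs[i - 1] ** 2
        right_sum = total - left_sum
        right_sq = total_sq - left_sq
        cost = (left_sq - left_sum ** 2 / i) + (right_sq - right_sum ** 2 / (n - i))
        if cost < best_cost:
            best_split, best_cost = i, cost
    return best_split


def mean_change_point(n_genes, n1, n2, k, mu, seed=0):
    """Average the stand-in change-point estimate over n_genes simulated genes."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_genes):
        _normal, cancer = simulate_gene(n1, n2, k, mu, rng)  # normal unused by the stand-in
        estimates.append(estimate_change_point(cancer))
    return sum(estimates) / n_genes


if __name__ == "__main__":
    # Hypothetical settings mirroring Fig. 1A: n1 = n2 = 25, varying k.
    for k in (3, 6, 9, 12):
        print(k, mean_change_point(n_genes=7000, n1=25, n2=25, k=k, mu=2.0))
```

Comparing the printed averages with the true k shows how such an estimate can be tabulated against the number of DGE samples, in the spirit of Table 2; the actual figures will differ from the paper's because the estimator is a substitute.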

