Non-parametric change-point method for differential gene expression detection.

Wang Y, Wu C, Ji Z, Wang B, Liang Y - PLoS ONE (2011)

Bottom Line: NPCPS uses change-point theory to detect DGE effectively. The change-point position estimated by NPCPS identifies the samples containing DGE. Experimental results showed that NPCPS is both accurate and reliable.

Affiliation: Key Laboratory for Symbol Computation and Knowledge Engineering of National Education Ministry, College of Computer Science and Technology, Jilin University, Jilin, China.

ABSTRACT

Background: We propose a non-parametric method, the Non-Parametric Change Point Statistic (NPCPS), which uses a single equation to detect differential gene expression (DGE) in microarray data. NPCPS builds on change-point theory to detect DGE effectively.

Methodology: NPCPS takes the data distribution of the normal samples as input and detects DGE in the cancer samples by locating the change point of the gene expression profile. The estimated change-point position identifies the samples containing DGE. Monte Carlo simulation and ROC analysis were used to examine the detection accuracy of NPCPS, and an experiment on real breast cancer microarray data compared NPCPS with other methods.

Conclusions: The simulation study indicated that NPCPS was more effective at detecting DGE in a subset of cancer samples than five parametric methods and one non-parametric method. When more than 8 cancer samples contained DGE, the type I error of NPCPS was below 0.01. Experimental results showed that NPCPS is both accurate and reliable. Of the 30 top genes ranked by NPCPS, 16 have been reported as relevant to cancer. Correlations between the detection results of NPCPS and those of the compared methods were below 0.05, whereas correlations among the other methods ranged from 0.20 to 0.84. This indicates that NPCPS works on different features and thus identifies DGE from a perspective distinct from the other mean- or median-based methods.

Figure 8 (pone-0020060-g008): ROC curves of NPCPS and other methods when an inappropriate formula is applied. (A) Over-expression formula applied to a simulated under-expressed gene, n1 = n2 = 25, μ = −2, k = 8. (B) Under-expression formula applied to a simulated over-expressed gene, n1 = n2 = 25, μ = 2, k = 8. The x-axis is FPR and the y-axis is TPR. ▽ is T, × is COPA, ○ is OS, • is ORT, ◊ is MOST, the dotted line is LRS, the dashed line is PPST, and the solid line is NPCPS. The significance level is α = 0.01 for NPCPS. NPCPS maintained the same level of sensitivity for both types of simulated differential expression. The other methods did not perform as well as when the appropriate formulas were applied, as in Fig. 2 and Fig. 3.

Mentions: A gene expression profile generated from microarray data usually contains samples of thousands of genes. Genes in the cancer samples might be over- or under-expressed. The majority of DGE detection methods use different formulas for under-expressed and over-expressed genes. For example, OS and ORT use different percentile values for over-expression and under-expression, and apply both formulas to the same microarray data. If the over-expression formula is applied to under-expressed data, the DGE cannot be correctly recognized. Applying both formulas to the same gene, however, can produce false alarms, because a gene might be flagged as significant twice. Unlike the other methods, NPCPS handles both types of DGE with the same formula, which would reduce the false discovery rate (FDR) and does not require further analysis and computation to clean up the false alarms. When the over-expression formula was applied to under-expressed gene data (Fig. 8A), and vice versa (Fig. 8B), NPCPS performed stably in both situations, while the other compared methods gave inferior ROC curves. By the properties of the ROC curve, T and MOST could achieve a good ROC if their predictions were inverted. The ROC curves of LRS lay in the zone of random guessing, close to the line of no discrimination. To use LRS for under-expression, the user can turn under-expression into over-expression by inverting the dataset. Applied directly to under-expressed data, the over-expression formula of LRS gave essentially random detection results.
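The remark that some methods "could achieve a good ROC if their predictions were inverted" follows from a general property of the ROC curve: negating a score turns an area under the curve (AUC) of A into 1 − A. The following is a minimal Python sketch of that property, not the paper's simulation code. The data model (standard-normal expression values, with the first k cancer samples shifted by μ) and the helper names `auc`, `mean_shift_score` and `simulate_gene` are assumptions made for illustration, and the one-sided mean-difference score is a simplified stand-in for an over-expression formula rather than any of the compared methods.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def mean_shift_score(normal, cancer):
    """Hypothetical one-sided 'over-expression' score: mean(cancer) - mean(normal)."""
    return sum(cancer) / len(cancer) - sum(normal) / len(normal)

def simulate_gene(rng, n1, n2, mu, k):
    """Assumed data model: N(0, 1) values, first k cancer samples shifted by mu."""
    normal = [rng.gauss(0.0, 1.0) for _ in range(n1)]
    cancer = [rng.gauss(0.0, 1.0) + (mu if i < k else 0.0) for i in range(n2)]
    return normal, cancer

rng = random.Random(0)
n1 = n2 = 25   # sample sizes from the Fig. 8 caption
k = 8          # number of cancer samples carrying DGE
mu = -2.0      # under-expression, as in panel A

scores, labels = [], []
for _ in range(200):
    # label 1: gene with DGE; label 0: gene without DGE
    for label, shift in ((1, mu), (0, 0.0)):
        normal, cancer = simulate_gene(rng, n1, n2, shift, k)
        scores.append(mean_shift_score(normal, cancer))
        labels.append(label)

auc_wrong = auc(scores, labels)                       # over-expression score on under-expressed data
auc_inverted = auc([-s for s in scores], labels)      # same score with predictions inverted
auc_two_sided = auc([abs(s) for s in scores], labels) # direction-agnostic variant

print(f"wrong direction: {auc_wrong:.3f}")
print(f"inverted:        {auc_inverted:.3f}")
print(f"two-sided:       {auc_two_sided:.3f}")
```

Under these assumptions the wrong-direction score should fall well below the line of no discrimination, the inverted score is its mirror image, and the absolute-value variant works without knowing the direction in advance. That last case is only an analogy for why a single direction-agnostic formula is convenient; it says nothing about how NPCPS itself is computed.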

