Non-parametric change-point method for differential gene expression detection.

Wang Y, Wu C, Ji Z, Wang B, Liang Y - PLoS ONE (2011)

Bottom Line: NPCPS builds on change-point theory to provide effective DGE detection. The change-point position estimated by NPCPS identifies the samples that contain DGE. Experimental results showed that NPCPS is both accurate and reliable.

Affiliation: Key Laboratory for Symbol Computation and Knowledge Engineering of National Education Ministry, College of Computer Science and Technology, Jilin University, Jilin, China.

ABSTRACT

Background: We propose a non-parametric method, the Non-Parametric Change Point Statistic (NPCPS), which uses a single equation to detect differential gene expression (DGE) in microarray data. NPCPS builds on change-point theory to provide effective DGE detection.

Methodology: NPCPS takes the data distribution of the normal samples as input and detects DGE in the cancer samples by locating the change point of the gene expression profile. The change-point position estimated by NPCPS identifies the samples that contain DGE. Monte Carlo simulation and ROC analysis were used to examine the detection accuracy of NPCPS, and an experiment on real breast cancer microarray data was carried out to compare NPCPS with other methods.
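
The abstract does not reproduce the single NPCPS equation, so the following Python sketch only illustrates the general idea of using the normal group as a reference and locating a change point among the cancer samples of one gene. It is not the authors' statistic: the helper names (`percentile_scores`, `sse`, `estimate_change_point`), the percentile scoring against the normal group, and the least-squares single change-point split are all assumptions chosen for illustration, and only up-regulation is handled.

```python
from bisect import bisect_right


def percentile_scores(normal, cancer):
    """Score each cancer value by the fraction of normal samples <= it,
    i.e. the empirical CDF of the normal group evaluated at that value."""
    ref = sorted(normal)
    return [bisect_right(ref, y) / len(ref) for y in cancer]


def sse(values):
    """Sum of squared deviations from the segment mean (0 for an empty segment)."""
    if not values:
        return 0.0
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values)


def estimate_change_point(normal, cancer):
    """Hypothetical helper: return (k, cost), where k is the number of cancer
    samples placed before the change point and cost is the total
    within-segment squared error of that split.

    Scores are sorted in decreasing order and split once at the position that
    minimises within-segment squared error (least-squares single change point).
    Only up-regulation is considered.
    """
    u = sorted(percentile_scores(normal, cancer), reverse=True)
    best_k, best_cost = None, float("inf")
    for k in range(1, len(u)):  # keep both segments non-empty
        cost = sse(u[:k]) + sse(u[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost


if __name__ == "__main__":
    # Made-up expression values for one gene.
    normal = [1.0, 1.2, 0.9, 1.1, 1.05, 0.95, 1.15, 0.85, 1.0, 1.1]
    cancer = [3.0, 3.2, 2.9, 1.0, 0.95, 1.05, 0.9, 1.0]  # first three up-regulated
    k, cost = estimate_change_point(normal, cancer)
    print(k)  # 3
```

With three strongly up-regulated cancer samples, the split falls after the third score, so the estimated change point also indicates how many samples carry the expression change. Scoring against the normal group's empirical CDF keeps every score in [0, 1], which limits how far any single extreme value can pull the split.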

Conclusions: The simulation study indicated that NPCPS was more effective at detecting DGE in a cancer subset than five parametric methods and one non-parametric method. When more than 8 cancer samples contained DGE, the type I error of NPCPS was below 0.01. Experimental results showed that NPCPS is both accurate and reliable. Of the top 30 genes ranked by NPCPS, 16 had been reported as relevant to cancer. Correlations between the detection results of NPCPS and those of the compared methods were below 0.05, whereas correlations among the other methods ranged from 0.20 to 0.84. This indicates that NPCPS captures different features and thus identifies DGE from a perspective distinct from that of the other mean- or median-based methods.
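
The abstract reports correlations between the detection results of different methods without stating which correlation measure was used. As a hedged sketch, assuming each method yields one score per gene and that a rank (Spearman) correlation is wanted, the comparison could be computed as follows; the helper names are illustrative and the score lists are made up.

```python
def average_ranks(xs):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1  # 0-based positions -> 1-based rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (va * vb) ** 0.5


def spearman(a, b):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical per-gene scores from two methods for the same five genes.
scores_method_a = [0.9, 0.1, 0.5, 0.3, 0.7]
scores_method_b = [0.2, 0.8, 0.4, 0.6, 0.5]
print(spearman(scores_method_a, scores_method_b))
```

A value near 0, as reported for NPCPS against the other methods, means the two methods order the genes almost independently of each other.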

Fig. 12. Data distributions of genes top-ranked by two non-parametric methods. (A) Gene MGP, LRS rank: 11, NPCPS rank: 7049. (B) Gene IGF2, LRS rank: 21, NPCPS rank: 7094. (C) Gene TNNC1, PPST rank: 21, NPCPS rank: 3697. (D) Gene E4F1, PPST rank: 47, NPCPS rank: 2595. The genes top-ranked by the two non-parametric methods showed no significant difference between the data distributions of the cancer and normal groups.

Mentions: For comparison, Fig. 11A–11J shows the data distributions of the genes top-ranked by the parametric methods, and Fig. 12A–12D those of the genes top-ranked by LRS and PPST. These distributions were more similar to those of genes bottom-ranked by NPCPS, in that a small percentage of the samples substantially widened the data range. These few samples strongly affect the mean or median of their group, which in turn yields a high test statistic for the compared methods. For example, in Fig. 11B and 12B, 96% of the points on the two curves lay close to each other, while 4% of the data points in the normal group had much greater values; this corresponds to one outlier sample out of the 25 normal samples. Because the outliers were in the normal group, it is reasonable to assume that they may have been caused by microarray noise. In the remaining panels of Fig. 11 and 12, except those for the T-statistic, the cancer group contained one outlier. Fig. 11 and 12 therefore indicate that the compared methods are sensitive to large changes in the mean or median, even when such a change is introduced by a single sample that may be an outlier. NPCPS is less prone to report DGE in these cases, because a few outliers are not sufficient to produce a large Dn.
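
The outlier argument above can be made concrete with a small Python sketch. The exact definition of Dn is not given in this excerpt, so a two-sample Kolmogorov-Smirnov distance is used here only as an assumed stand-in for a distribution-based statistic; `ks_distance` is a hypothetical helper, and the data are invented to mirror the 1-in-25 normal-group outlier described for Fig. 11B and 12B.

```python
from statistics import mean


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max over x of |F_a(x) - F_b(x)|,
    where F is the empirical CDF. Evaluating at the pooled sample values
    suffices because both ECDFs are step functions that jump only there."""
    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)

    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in set(a) | set(b))


cancer = [float(i) for i in range(25)]            # hypothetical values 0..24
normal = [float(i) for i in range(24)] + [240.0]  # identical except one outlier

print(round(mean(normal) - mean(cancer), 2))  # 8.64: one sample moves the mean a lot
print(round(ks_distance(normal, cancer), 2))  # 0.04 = 1/25: one sample moves it little
```

A single extreme value shifts the normal-group mean by several units, which is what drives a mean-based test statistic, whereas replacing one of n samples can change an empirical-CDF distance by at most 1/n.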

