Joint L1/2-Norm Constraint and Graph-Laplacian PCA Method for Feature Extraction


ABSTRACT

Principal Component Analysis (PCA) is widely used as a tool for dimensionality reduction in many areas. In bioinformatics, each involved variable corresponds to a specific gene. In order to improve the robustness of PCA-based methods, this paper proposes a novel graph-Laplacian PCA algorithm that adopts an L1/2-norm constraint on the error function (L1/2 gLPCA) for feature (gene) extraction. The L1/2-norm-based error function helps to reduce the influence of outliers and noise. The Augmented Lagrange Multipliers (ALM) method is applied to solve the resulting subproblem. The proposed method achieves better feature-extraction results than other state-of-the-art PCA-based methods. Extensive experiments on simulation data and gene expression data sets demonstrate that our method attains higher identification accuracies than the others.
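To illustrate the robustness claim above, the following is a minimal, hypothetical Python sketch (not the paper's implementation) comparing how a squared-Frobenius error and an L1/2-quasi-norm error weight a reconstruction residual that contains a single outlier; the toy data, sizes, and variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
residual = rng.normal(scale=0.1, size=100)  # residual of a well-fit model
residual[0] = 10.0                          # one corrupted (outlying) entry

# Squared L2 / Frobenius error: the single outlier dominates the total cost,
# so minimizing it drags the fit toward the outlier.
l2_cost = np.sum(residual ** 2)

# L1/2 penalty, sum_i |e_i|**0.5: the outlier's contribution grows only like
# sqrt(|e_i|), so its influence on the total cost is much smaller.
l_half_cost = np.sum(np.abs(residual) ** 0.5)

print(f"outlier share of squared-L2 cost: {residual[0]**2 / l2_cost:.1%}")
print(f"outlier share of L1/2 cost:       {abs(residual[0])**0.5 / l_half_cost:.1%}")
```

Running this toy example, the outlier accounts for nearly all of the squared-L2 cost but only a small fraction of the L1/2 cost, which is the behavior the abstract attributes to the L1/2-norm error function.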



fig2: The accuracy of different methods on simulation data with different numbers of samples.

Mentions: In order to give more accurate experimental results, the average values over 30 runs are adopted. For fairness and uniformity, 200 genes are selected by the five methods with their unique parameters. Here, we show the accuracy (%) of these methods. In Figure 1, the two factors are shown on the two axes, with the x-axis giving the value of the parameter μ; in Figure 2, the x-axis is the number of samples. The accuracy is defined as follows:

(28) \mathrm{Accuracy} = \frac{1}{t} \sum_{i=1}^{t} \mathrm{Acc}_i \times 100\%,

where t is the number of runs and Acc_i is the identification accuracy of the i-th run. We define Acc as follows:

(29) \mathrm{Acc} = \frac{1}{r} \sum_{j=1}^{r} \delta\bigl(I_j, \mathrm{map}(I_j)\bigr),

where r denotes the number of genes, δ(m, n) equals 1 if m = n and 0 otherwise, and the function map(I) maps the identified labels. In Figure 1, we show the average accuracies of the seven methods under different sparse parameters on the 2000 × 10 simulation data; the average accuracy over all parameters is listed in Table 1. In general, the more sensitive an algorithm is to noise and outliers, the larger its deviation and the lower its accuracy. It is worth noting that L1/2 gLPCA outperforms the other six methods, achieving higher identification accuracies, which indicates that our algorithm is less sensitive to noise and outliers. Table 1 shows the identification accuracies in detail for the different sparse parameters; our method is superior when the parameter is larger than 0.4, and its curve is more stable. The accuracy of L0 PCA and L1 PCA declines precipitously when the parameter exceeds 0.7 and 0.8, respectively. Compared with L0 PCA and L1 PCA, the methods L1/2 gLPCA, RgLPCA, gLPCA, PCA, and LE are not sensitive to this parameter, so their accuracy shows no substantial change. The stability and average accuracy of the various methods can be seen in Table 1.
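The following is a minimal Python sketch (not the authors' code) of how the accuracy measure in Eqs. (28)-(29) could be computed. The excerpt does not specify how map(·) aligns identified labels with the reference labels; the sketch follows the common clustering-accuracy reading, and the identity mapping and toy labels used below are assumptions for illustration only.

```python
import numpy as np

def acc(identified, reference, map_labels):
    """Eq. (29) under the common reading: Acc = (1/r) * sum_j delta(ref_j, map(I_j)),
    with delta(m, n) = 1 if m == n and 0 otherwise. How map(.) aligns labels
    is not specified in the excerpt and is assumed here."""
    mapped = np.asarray([map_labels(i) for i in identified])
    return float(np.mean(mapped == np.asarray(reference)))

def accuracy(per_run_acc):
    """Eq. (28): average of the per-run accuracies Acc_i over t runs, in percent."""
    return 100.0 * float(np.mean(per_run_acc))

# Toy usage with r = 5 genes, t = 30 runs, and an assumed identity mapping:
reference  = np.array([1, 0, 1, 1, 0])
identified = np.array([1, 0, 0, 1, 0])
runs = [acc(identified, reference, lambda label: label) for _ in range(30)]
print(f"Accuracy = {accuracy(runs):.1f}%")  # 80.0% in this toy case
```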

