An extension of PPLS-DA for classification and comparison to ordinary PLS-DA.

Telaar A, Liland KH, Repsilber D, Nürnberg G - PLoS ONE (2013)

Bottom Line: For the investigated data sets with weak linear dependency between features/variables, no improvement is shown for PPLS-DA or for the extensions compared to PLS-DA. A very weak linear dependency and a low proportion of differentially expressed genes (for simulated data) do not lead to an improvement of PPLS-DA over PLS-DA, but our extension shows a lower prediction error. Moreover, we compare these prediction results with results of support vector machines with a linear kernel and of linear discriminant analysis.


Affiliation: Institute for Genetics and Biometry, Department of Bioinformatics and Biomathematics, Leibniz Institute for Farm Animal Biology, Dummerstorf, Germany.

ABSTRACT
Classification studies are widely applied, e.g. in biomedical research, to classify objects/patients into predefined groups. The goal is to find a classification function/rule which assigns each object/patient to a unique group with the greatest possible accuracy, i.e. the smallest classification error. Especially in gene expression experiments, many variables (genes) are often measured for only a few objects/patients. A suitable approach is the well-known method partial least squares discriminant analysis (PLS-DA), which searches for a transformation to a lower-dimensional space; the resulting new components are linear combinations of the original variables. An advancement of PLS-DA leads to powered PLS-DA (PPLS-DA), which introduces a so-called 'power parameter' that is optimized to maximize the correlation between the components and the group membership. We introduce an extension of PPLS-DA that optimizes this power parameter towards the final aim, namely a minimal classification error. We compare this new extension with the original PPLS-DA and with ordinary PLS-DA using simulated and experimental data sets. For the investigated data sets with weak linear dependency between features/variables, no improvement is shown for PPLS-DA or for the extensions compared to PLS-DA. A very weak linear dependency and a low proportion of differentially expressed genes (for simulated data) do not lead to an improvement of PPLS-DA over PLS-DA, but our extension shows a lower prediction error. By contrast, for the data set with strong between-feature collinearity, a low proportion of differentially expressed genes and a large total number of genes, the prediction error of PPLS-DA and the extensions is clearly lower than for PLS-DA. Moreover, we compare these prediction results with results of support vector machines with a linear kernel and of linear discriminant analysis.
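As background for the methods compared in the abstract, PLS-DA can be realized by dummy-coding the group membership and regressing it on the variables with PLS; each sample is then assigned to the group with the largest predicted indicator value. The following minimal sketch illustrates this scheme in Python with scikit-learn (an assumed library choice; this is not the authors' implementation, and the toy data are invented purely for illustration).

import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)

# Toy data: 40 samples, 500 variables (many genes, few samples), two
# groups separated by a small shift in the first 20 variables.
X = rng.normal(size=(40, 500))
y = np.repeat([0, 1], 20)
X[y == 1, :20] += 1.0

# Dummy-code the group membership: one indicator column per group.
Y = np.eye(2)[y]

# PLS searches for components (linear combinations of the original
# variables) spanning a lower-dimensional space; here we keep two.
pls = PLSRegression(n_components=2).fit(X, Y)

# Classify each sample into the group with the largest fitted value.
y_hat = pls.predict(X).argmax(axis=1)
print("training classification error:", np.mean(y_hat != y))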


pone-0055267-g002: Extension of PPLS-DA, for stepsize 0.1 and γ ∈ [0, 1]. The power parameter is denoted by γ; the prediction error (the number of wrongly classified samples of the inner test set) is abbreviated PE. γ is varied in 11 steps (γ = 0, 0.1, …, 1). Cj, j = 1, …, 5, is short for the jth component. The function min(f) takes the minimum of the function f. The cross-validation procedure consists of random splits of the outer training set in the proportions 0.7 (training set) and 0.3 (test set); across repeats, these splits are drawn with replacement. The optimal γ-value and the optimal number of components are determined after 50 repeats.

Mentions: In the optimization, for each fixed value of γ the optimal number of components of PPLS-DA is determined as follows (see Figure 2): for the different inner test sets, the PE for one up to five components Cj, j = 1, …, 5, is calculated, all using the same γ, resulting in a matrix of prediction errors (inner test sets × components). Then the average inner PE is calculated for each component. Therewith we select, for each γ, the smallest mean PE over the inner test sets, together with the corresponding optimal number of components. Over all γ we then search for the minimal PE, which yields the optimal γ and its optimal number of components. For these components and this power parameter, we calculate the corresponding loading-weight vectors on the outer training set, and we finally determine the PE of the outer test set.
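The selection procedure described in this paragraph can be summarized in code. The sketch below abstracts the PPLS-DA fit and prediction into hypothetical callables (fit_fn, predict_fn), since the point here is the grid search over γ and the number of components, not the PPLS-DA internals; the 11-step γ grid and the 50 random 0.7/0.3 splits follow the figure caption above. This is a sketch under those assumptions, not the authors' code.

import numpy as np

def select_gamma(X, y, fit_fn, predict_fn, n_splits=50, max_comp=5, seed=0):
    """Inner cross-validation: pick the power parameter gamma and the
    number of components with the smallest mean prediction error (PE).
    fit_fn and predict_fn are hypothetical PPLS-DA routines."""
    rng = np.random.default_rng(seed)
    gammas = np.linspace(0.0, 1.0, 11)        # gamma varied in 11 steps
    mean_pe = np.empty((len(gammas), max_comp))
    n_train = int(round(0.7 * len(y)))        # 0.7/0.3 inner split
    for g, gamma in enumerate(gammas):
        pe = np.empty((n_splits, max_comp))   # PE matrix: splits x components
        for s in range(n_splits):
            # New random split per repeat (splits drawn independently,
            # so samples recur across repeats).
            idx = rng.permutation(len(y))
            tr, te = idx[:n_train], idx[n_train:]
            model = fit_fn(X[tr], y[tr], gamma=gamma, ncomp=max_comp)
            for c in range(1, max_comp + 1):  # PE for one up to five components
                y_hat = predict_fn(model, X[te], ncomp=c)
                pe[s, c - 1] = np.sum(y_hat != y[te])
        mean_pe[g] = pe.mean(axis=0)          # average inner PE per component
    # Smallest mean PE over the grid yields the optimal gamma and the
    # optimal number of components.
    g_opt, c_opt = np.unravel_index(mean_pe.argmin(), mean_pe.shape)
    return gammas[g_opt], c_opt + 1

The selected (γ, number of components) pair would then be used to compute the loading-weight vectors on the full outer training set and to evaluate the PE on the outer test set, as described above.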

