The application of sparse estimation of covariance matrix to quadratic discriminant analysis.

Sun J, Zhao H - BMC Bioinformatics (2015)

Bottom Line: Different versions of sparse LDA have been proposed to address this significant challenge. SQDA provides more accurate classification results than other methods for both simulated and real data. Our method should prove useful for classification in genomics studies and other research settings, where covariances differ among classes.


Affiliation: Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, 06511, CT, USA. jiehuan.sun@yale.edu.

ABSTRACT

Background: Although Linear Discriminant Analysis (LDA) is commonly used for classification, it may not be directly applicable in genomics studies because of the large p, small n problem in these studies. Different versions of sparse LDA have been proposed to address this significant challenge. One implicit assumption of the various LDA-based methods is that the covariance matrices are the same across different classes. However, rewiring of genetic networks (and therefore different covariance matrices) across different diseases has been observed in many genomics studies, which suggests that LDA and its variations may be suboptimal for disease classification. It remains unclear whether accounting for differing genetic networks across diseases can improve classification in genomics studies.

Results: We propose a sparse version of Quadratic Discriminant Analysis (SQDA) to explicitly consider the differences of the genetic networks across diseases. Both simulation and real data analysis are performed to compare the performance of SQDA with six commonly used classification methods.

Conclusions: SQDA provides more accurate classification results than other methods for both simulated and real data. Our method should prove useful for classification in genomics studies and other research settings, where covariances differ among classes.
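
The contrast the abstract draws between a shared covariance matrix (LDA) and class-specific covariance matrices (QDA) can be written out with the standard textbook discriminant scores. The notation here is an assumption and does not come from the excerpt: $\mu_k$, $\Sigma_k$ and $\pi_k$ denote the mean, covariance matrix and prior probability of class $k$, and $\Sigma$ is the pooled covariance matrix. A sample $x$ is assigned to the class with the largest score.

```latex
% QDA: each class k keeps its own covariance matrix \Sigma_k
\delta_k^{\mathrm{QDA}}(x) = -\tfrac{1}{2}\log\lvert\Sigma_k\rvert
  - \tfrac{1}{2}(x-\mu_k)^{\top}\Sigma_k^{-1}(x-\mu_k) + \log\pi_k

% LDA: all classes share one covariance matrix \Sigma, so the score is linear in x
\delta_k^{\mathrm{LDA}}(x) = x^{\top}\Sigma^{-1}\mu_k
  - \tfrac{1}{2}\mu_k^{\top}\Sigma^{-1}\mu_k + \log\pi_k
```

When p is much larger than n, the sample estimate of each $\Sigma_k$ is singular, which is why some form of regularized or sparse estimate is needed before the QDA score can be evaluated.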


Fig2: The effect of sample size on the seven classification methods. The effect of sample size on SQDA and six other classification methods is shown in this figure, based on the simulated data.

Mentions: The average misclassification rates are shown in Figure 2 for all methods. For DLDA2 and DQDA2, we apply variable selection by blocks in the same way as in our method, except that the sparse estimate of the covariance matrix is replaced with a diagonalized estimator of the covariance matrix. Comparing the performance of SQDA with that of DLDA2 and DQDA2 therefore shows the benefit of sparse estimation of different covariance matrices for different classes, over and above the benefit of the variable selection by blocks procedure. All methods perform about equally poorly when the sample size is small, whereas our method improves the most as the sample size increases.
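
To make the diagonalized estimators in this comparison concrete, the following is a minimal sketch of diagonal discriminant analysis in plain Python. It is not the authors' code and it omits both the variable selection by blocks step and the sparse covariance estimate used by SQDA. The function names `fit_diagonal` and `predict` and the toy data are hypothetical. With `pooled=True` every class shares one variance per feature (a DLDA-style rule); with `pooled=False` each class keeps its own variances (a DQDA-style rule).

```python
import math


def fit_diagonal(X, y, pooled):
    """Estimate class means, per-feature variances and priors.

    Only the diagonal of each covariance matrix is estimated.
    pooled=True shares the variances across classes (DLDA-style);
    pooled=False keeps them class-specific (DQDA-style).
    """
    classes = sorted(set(y))
    n, p = len(X), len(X[0])
    means, variances, priors, counts = {}, {}, {}, {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        counts[c] = len(rows)
        priors[c] = len(rows) / n
        means[c] = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
        variances[c] = [
            sum((r[j] - means[c][j]) ** 2 for r in rows) / (len(rows) - 1)
            for j in range(p)
        ]
    if pooled:
        # Weighted average of the class variances, one value per feature.
        shared = [
            sum((counts[c] - 1) * variances[c][j] for c in classes)
            / (n - len(classes))
            for j in range(p)
        ]
        variances = {c: shared for c in classes}
    return means, variances, priors


def predict(x, model):
    """Return the class with the largest diagonal Gaussian score."""
    means, variances, priors = model

    def score(c):
        return math.log(priors[c]) - 0.5 * sum(
            math.log(v) + (xj - m) ** 2 / v
            for xj, m, v in zip(x, means[c], variances[c])
        )

    return max(means, key=score)


# Toy data (made up): both classes are centred at 0 and differ only in spread.
X = [[-0.1], [0.1], [-0.2], [0.2], [-3.0], [3.0], [-4.0], [4.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

dqda = fit_diagonal(X, y, pooled=False)
dlda = fit_diagonal(X, y, pooled=True)

print(predict([0.05], dqda), predict([2.5], dqda))
print(predict([0.05], dlda), predict([2.5], dlda))
```

In this toy setting the class-specific rule labels the point near the centre and the point far from it differently, while the pooled rule cannot tell them apart because the class means coincide. This is the kind of gap that arises when covariances, rather than means, differ among classes.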

