Limits...
Lung cancer gene expression database analysis incorporating prior knowledge with support vector machine-based classification method.

Guan P, Huang D, He M, Zhou B - J. Exp. Clin. Cancer Res. (2009)

Bottom Line: Rational use of the available bioinformation can not only effectively remove or suppress noise in gene chips, but also avoid one-sided results of separate experiment.The standard deviations of the modified method decreased from 0.26% to 0 in training set and from 3.04% to 2.10% in test set.The method that incorporates prior knowledge into discriminant analysis could effectively improve the capacity and reduce the impact of noise.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Epidemiology, School of Public Health, China Medical University, Shenyang 110001, PR China. pguan@mail.cmu.edu.cn

ABSTRACT

Background: A reliable and precise classification is essential for successful diagnosis and treatment of cancer. Gene expression microarrays have provided the high-throughput platform to discover genomic biomarkers for cancer diagnosis and prognosis. Rational use of the available bioinformation can not only effectively remove or suppress noise in gene chips, but also avoid one-sided results of separate experiment. However, only some studies have been aware of the importance of prior information in cancer classification.

Methods: Together with the application of support vector machine as the discriminant approach, we proposed one modified method that incorporated prior knowledge into cancer classification based on gene expression data to improve accuracy. A public well-known dataset, Malignant pleural mesothelioma and lung adenocarcinoma gene expression database, was used in this study. Prior knowledge is viewed here as a means of directing the classifier using known lung adenocarcinoma related genes. The procedures were performed by software R 2.80.

Results: The modified method performed better after incorporating prior knowledge. Accuracy of the modified method improved from 98.86% to 100% in training set and from 98.51% to 99.06% in test set. The standard deviations of the modified method decreased from 0.26% to 0 in training set and from 3.04% to 2.10% in test set.

Conclusion: The method that incorporates prior knowledge into discriminant analysis could effectively improve the capacity and reduce the impact of noise. This idea may have good future not only in practice but also in methodology.

Show MeSH

Related in: MedlinePlus

Accuracy comparisons, no prior knowledge vs. with prior knowledge. Note: * Accuracy is significantly higher when compared to no prior knowledge at the 0.05 level (2-tailed). ** Accuracy is significantly higher when compared to no prior knowledge at the 0.01 level (2-tailed).
© Copyright Policy - open-access
Related In: Results  -  Collection

License
getmorefigures.php?uid=PMC2719616&req=5

Figure 3: Accuracy comparisons, no prior knowledge vs. with prior knowledge. Note: * Accuracy is significantly higher when compared to no prior knowledge at the 0.05 level (2-tailed). ** Accuracy is significantly higher when compared to no prior knowledge at the 0.01 level (2-tailed).

Mentions: Our proposed method performed better after incorporating prior knowledge (Figure 3). Accuracy of the modified method improved from 98.86% to 100% in training set and from 98.51% to 99.06% in test set. The standard deviation of the modified method decreased from 0.26% to 0 in training set and from 3.04% to 2.10% in test set.


Lung cancer gene expression database analysis incorporating prior knowledge with support vector machine-based classification method.

Guan P, Huang D, He M, Zhou B - J. Exp. Clin. Cancer Res. (2009)

Accuracy comparisons, no prior knowledge vs. with prior knowledge. Note: * Accuracy is significantly higher when compared to no prior knowledge at the 0.05 level (2-tailed). ** Accuracy is significantly higher when compared to no prior knowledge at the 0.01 level (2-tailed).
© Copyright Policy - open-access
Related In: Results  -  Collection

License
Show All Figures
getmorefigures.php?uid=PMC2719616&req=5

Figure 3: Accuracy comparisons, no prior knowledge vs. with prior knowledge. Note: * Accuracy is significantly higher when compared to no prior knowledge at the 0.05 level (2-tailed). ** Accuracy is significantly higher when compared to no prior knowledge at the 0.01 level (2-tailed).
Mentions: Our proposed method performed better after incorporating prior knowledge (Figure 3). Accuracy of the modified method improved from 98.86% to 100% in training set and from 98.51% to 99.06% in test set. The standard deviation of the modified method decreased from 0.26% to 0 in training set and from 3.04% to 2.10% in test set.

Bottom Line: Rational use of the available bioinformation can not only effectively remove or suppress noise in gene chips, but also avoid one-sided results of separate experiment.The standard deviations of the modified method decreased from 0.26% to 0 in training set and from 3.04% to 2.10% in test set.The method that incorporates prior knowledge into discriminant analysis could effectively improve the capacity and reduce the impact of noise.

View Article: PubMed Central - HTML - PubMed

Affiliation: Department of Epidemiology, School of Public Health, China Medical University, Shenyang 110001, PR China. pguan@mail.cmu.edu.cn

ABSTRACT

Background: A reliable and precise classification is essential for successful diagnosis and treatment of cancer. Gene expression microarrays have provided the high-throughput platform to discover genomic biomarkers for cancer diagnosis and prognosis. Rational use of the available bioinformation can not only effectively remove or suppress noise in gene chips, but also avoid one-sided results of separate experiment. However, only some studies have been aware of the importance of prior information in cancer classification.

Methods: Together with the application of support vector machine as the discriminant approach, we proposed one modified method that incorporated prior knowledge into cancer classification based on gene expression data to improve accuracy. A public well-known dataset, Malignant pleural mesothelioma and lung adenocarcinoma gene expression database, was used in this study. Prior knowledge is viewed here as a means of directing the classifier using known lung adenocarcinoma related genes. The procedures were performed by software R 2.80.

Results: The modified method performed better after incorporating prior knowledge. Accuracy of the modified method improved from 98.86% to 100% in training set and from 98.51% to 99.06% in test set. The standard deviations of the modified method decreased from 0.26% to 0 in training set and from 3.04% to 2.10% in test set.

Conclusion: The method that incorporates prior knowledge into discriminant analysis could effectively improve the capacity and reduce the impact of noise. This idea may have good future not only in practice but also in methodology.

Show MeSH
Related in: MedlinePlus