Support Vector Machine classifier for estrogen receptor positive and negative early-onset breast cancer.

Upstill-Goddard R, Eccles D, Ennis S, Rafiq S, Tapper W, Fliege J, Collins A - PLoS ONE (2013)

Bottom Line: Using a linear kernel Support Vector Machine, we achieved classification accuracy exceeding 93%. The model indicates that polygenic variation in more than 100 genes is likely to underlie the estrogen receptor phenotype in early-onset breast cancer. Functional classification of the genes involved identifies enrichment of functions linked to the immune system, which is consistent with the current understanding of the biological role of estrogen receptors in breast cancer.


Affiliation: Human Genetics and Cancer Sciences, Faculty of Medicine, University of Southampton, Southampton, United Kingdom.

ABSTRACT
Two major breast cancer sub-types are defined by the expression of estrogen receptors on tumour cells. Cancers with large numbers of receptors are termed estrogen receptor positive and those with few are estrogen receptor negative. Using genome-wide single nucleotide polymorphism genotype data for a sample of early-onset breast cancer patients we developed a Support Vector Machine (SVM) classifier from 200 germline variants associated with estrogen receptor status (p<0.0005). Using a linear kernel Support Vector Machine, we achieved classification accuracy exceeding 93%. The model indicates that polygenic variation in more than 100 genes is likely to underlie the estrogen receptor phenotype in early-onset breast cancer. Functional classification of the genes involved identifies enrichment of functions linked to the immune system, which is consistent with the current understanding of the biological role of estrogen receptors in breast cancer.
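The feature-selection step described in the abstract (ranking variants by their association with estrogen receptor status) can be illustrated with a small sketch. It assumes a basic 1-df allelic chi-square test on minor-allele counts and a 0/1 encoding of ER status; the function names and the encoding are hypothetical, and the excerpt does not say which association test or options the authors actually ran.

```python
from typing import List, Sequence


def allelic_chi_square(genotypes: Sequence[int], labels: Sequence[int]) -> float:
    """1-df chi-square on the 2x2 table of allele counts by ER status.

    genotypes: minor-allele count (0, 1 or 2) for each sample.
    labels: 1 for ER-positive, 0 for ER-negative (an assumed encoding).
    """
    # table[status][allele]; allele 0 = major, allele 1 = minor
    table = [[0, 0], [0, 0]]
    for g, y in zip(genotypes, labels):
        table[y][1] += g
        table[y][0] += 2 - g

    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total if total else 0.0
            if expected > 0:  # skip empty margins instead of dividing by zero
                chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


def select_top_snps(matrix: Sequence[Sequence[int]],
                    labels: Sequence[int], k: int) -> List[int]:
    """Return the column indices of the k SNPs with the largest chi-square."""
    n_snps = len(matrix[0])
    scored = [
        (allelic_chi_square([sample[j] for sample in matrix], labels), j)
        for j in range(n_snps)
    ]
    # Largest statistic first; ties are broken by column index
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [j for _, j in scored[:k]]
```

With a samples-by-SNPs genotype matrix, `select_top_snps(matrix, labels, 200)` would return a 200-variant subset analogous to the one the classifier was built on, although a p-value threshold such as the one quoted in the abstract could be applied instead of a fixed count.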


Relationship between weights under a linear classifier and chi-square values used in feature selection. SVM models were constructed on 542 study samples with genotype data for a subset of 200 SNPs chosen based on ER+/− association, determined from the chi-square statistic. SNP feature weights were obtained from the linear SVM model and used as an indicator of the importance of each feature for classification; SNPs with the largest absolute weight values are the most important for classification. Chi-square values used in feature selection and SVM classifier weight values are uncorrelated; Pearson’s correlation coefficient r = −0.026. SNPs with absolute weight values > 0.5 are annotated with the name of the gene in which they reside or to which they are closest.
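The comparison in this caption can be reproduced in outline: given one chi-square value and one linear SVM weight per SNP, compute Pearson's r between the two lists and pick out the SNPs whose absolute weight exceeds 0.5. The sketch below uses only the Python standard library; the function names and the example inputs are hypothetical and are not taken from the paper's data.

```python
import math
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def snps_above_weight(weights: Sequence[float], names: Sequence[str],
                      threshold: float = 0.5) -> List[str]:
    """Names of SNPs with |weight| > threshold, largest absolute weight first."""
    kept = [(abs(w), name) for w, name in zip(weights, names) if abs(w) > threshold]
    kept.sort(key=lambda t: -t[0])
    return [name for _, name in kept]
```

A value of r close to zero, as reported in the caption, would mean that a SNP's univariate association statistic says little about how heavily the multivariate classifier relies on it.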

Mentions: Classifier performance was further evaluated using receiver operating characteristic (ROC) area under the curve (AUC) values, all of which exceed 0.9 (Table 1), indicating that these models have excellent accuracy. ROC curves were produced for the linear model and the RBF kernel model for both ER+ and ER− cases (Figure 1), based on true and false positive/negative values. Figure 2 shows the relationship between chi-squares for individual SNPs derived from PLINK [18], [19] and weights from the linear classification model. Variants with the largest (absolute value) weights are the most discriminating in the classifier. The input chi-squares used in feature selection (see methods) are uncorrelated with the linear SVM model weights (r = −0.026).
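As a rough illustration of the evaluation described here, the sketch below scores samples with a linear decision function (weights times genotype counts plus a bias) and computes the ROC AUC as the fraction of positive/negative pairs ranked correctly, which is equivalent to the area under the ROC curve. The names, the 0/1 label encoding, and the pairwise formulation are assumptions made for the example, not the authors' procedure.

```python
from typing import Sequence


def decision_value(weights: Sequence[float], features: Sequence[float],
                   bias: float = 0.0) -> float:
    """Linear classifier score w.x + b; larger values favour the positive class."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a positive sample outscores a negative one.

    labels: 1 for the positive class, 0 for the negative class (assumed encoding).
    Tied scores count as half a correctly ranked pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    ranked_correctly = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                ranked_correctly += 1.0
            elif p == q:
                ranked_correctly += 0.5
    return ranked_correctly / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values above 0.9 indicate that nearly all ER+/ER− pairs are ordered correctly by the classifier score.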

