A Bayesian method for comparing and combining binary classifiers in the absence of a gold standard.

Keith JM, Davey CM, Boyd SE - BMC Bioinformatics (2012)

Bottom Line: In all cases, run times were feasible, and results precise. In all three test cases, the globally optimal logical combination of the classifiers was found to be their union, according to three out of four ranking criteria. We propose as a general rule of thumb that the union of classifiers will be close to optimal.


Affiliation: School of Mathematical Sciences, Monash University, Victoria 3800, Australia. jonathan.keith@monash.edu

ABSTRACT

Background: Many problems in bioinformatics involve classification based on features such as sequence, structure or morphology. Given multiple classifiers, two crucial questions arise: how does their performance compare, and how can they best be combined to produce a better classifier? A classifier can be evaluated in terms of sensitivity and specificity using benchmark, or gold standard, data, that is, data for which the true classification is known. However, a gold standard is not always available. Here we demonstrate that a Bayesian model for comparing medical diagnostics without a gold standard can be successfully applied in the bioinformatics domain, to genomic scale data sets. We present a new implementation, which unlike previous implementations is applicable to any number of classifiers. We apply this model, for the first time, to the problem of finding the globally optimal logical combination of classifiers.
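As a point of reference for the gold-standard case described above, the following minimal sketch computes sensitivity and specificity when the true classification is known. It is an illustration only, not the authors' code: the function name `sensitivity_specificity` and the toy labels are assumptions introduced here.

```python
def sensitivity_specificity(predicted, truth):
    """Return (sensitivity, specificity) for 0/1 labels of equal length.

    Hypothetical helper for illustration; not part of the paper's software.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")

    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)  # true positives
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)  # false negatives
    tn = sum(1 for p, t in pairs if p == 0 and t == 0)  # true negatives
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)  # false positives

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one true positive and one true negative case")

    sensitivity = tp / (tp + fn)  # fraction of true cases that are detected
    specificity = tn / (tn + fp)  # fraction of true non-cases that are rejected
    return sensitivity, specificity


# Toy data (made up for this example): 4 true cases, 6 true non-cases.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(predicted, truth)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Without a gold standard the `truth` vector is unavailable, which is the situation the Bayesian model is designed to handle.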

Results: We compared three classifiers of protein subcellular localisation, and evaluated our estimates of sensitivity and specificity against estimates obtained using a gold standard. The method overestimated sensitivity and specificity with only a small discrepancy, and correctly ranked the classifiers. Diagnostic tests for swine flu were then compared on a small data set. Lastly, classifiers for a genome-wide association study of macular degeneration with 541094 SNPs were analysed. In all cases, run times were feasible, and results precise. The optimal logical combination of classifiers was also determined for all three data sets. Code and data are available from http://bioinformatics.monash.edu.au/downloads/.

Conclusions: The examples demonstrate the methods are suitable for both small and large data sets, applicable to the wide range of bioinformatics classification problems, and robust to dependence between classifiers. In all three test cases, the globally optimal logical combination of the classifiers was found to be their union, according to three out of four ranking criteria. We propose as a general rule of thumb that the union of classifiers will be close to optimal.
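To make the notion of a logical combination concrete, the sketch below forms the union (logical OR) and, for contrast, the intersection (logical AND) of several binary classifiers. The function names and the toy calls are assumptions made for this example; the paper's own search over combinations is not reproduced here.

```python
def union(*classifiers):
    """Call an item positive if at least one classifier calls it positive."""
    return [int(any(calls)) for calls in zip(*classifiers)]


def intersection(*classifiers):
    """Call an item positive only if every classifier calls it positive."""
    return [int(all(calls)) for calls in zip(*classifiers)]


# Toy 0/1 calls from three hypothetical classifiers on five items.
c1 = [1, 0, 1, 0, 0]
c2 = [0, 0, 1, 1, 0]
c3 = [0, 0, 0, 1, 0]

union_calls = union(c1, c2, c3)
intersection_calls = intersection(c1, c2, c3)
print(union_calls, intersection_calls)
```

The union can never miss an item that any single classifier detects, so its sensitivity is at least that of each component; the price is that every component's false positives are inherited as well.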

Figure 2: Swine flu results. Density plots of model variables for the swine flu data. A: Sensitivity of the NPA classifier. B: Sensitivity of the NS classifier. C: Specificity of the NPA classifier. D: Specificity of the NS classifier. E: Prevalence of the disease.

Mentions: Density plots were produced using the last 5000 iterations of the time-series, as shown in Figure 2. The inferred densities exhibited low standard deviations, with an average standard deviation of 0.099 and a maximum of 0.1568, indicating surprisingly good confidence in determining the parameters with a small data set (see Additional file 1: Section S4.1 for means and standard deviations of all parameters). Notice in Figure 2 that the sensitivity of the NPA test (A) is substantially higher than the sensitivity of the NS test (B). However, the specificity of NPA (C) is marginally lower than the specificity of NS (D). On balance, the NPA appears to be the better test, and this conclusion is supported by the ranking criteria that we introduce below (“Inference of the best combination of classifiers”). Note that, in Additional file 1: Section S7.2.3, the NPA test (C1) scores higher than NS (C2) according to all four ranking criteria.
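The summary step described above (keep the last 5000 iterations of a sampled time-series, then report a mean and standard deviation) can be sketched as follows. This is a hedged illustration: the chain is synthetic, drawn from an arbitrary Beta distribution with a fixed seed, and `summarise_tail` is a hypothetical helper rather than part of the published code.

```python
import random
import statistics


def summarise_tail(chain, n_keep=5000):
    """Return (mean, standard deviation) of the last n_keep samples."""
    if len(chain) < n_keep:
        raise ValueError("chain is shorter than the number of samples to keep")
    tail = chain[-n_keep:]  # discard earlier iterations as burn-in
    return statistics.mean(tail), statistics.stdev(tail)


# Synthetic stand-in for a sampled sensitivity parameter (values in [0, 1]).
# The Beta(8, 2) shape is an arbitrary assumption, not taken from the paper.
rng = random.Random(0)
chain = [rng.betavariate(8, 2) for _ in range(20000)]

mean, sd = summarise_tail(chain)
print(f"posterior mean={mean:.3f} sd={sd:.3f}")
```

A small standard deviation in such a summary corresponds to a narrow density plot, which is how the tight estimates reported for the swine flu parameters would appear.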