Statistical analysis of a Bayesian classifier based on the expression of miRNAs.

Ricci L, Del Vescovo V, Cantaloni C, Grasso M, Barbareschi M, Denti MA - BMC Bioinformatics (2015)

Bottom Line: We provide a method to enhance a classifier's performance by exploiting the correlations between the class-discriminating miRNA and the expression of an additional normalized miRNA. By exploiting the normal behavior of triplicate variances and averages, invalid samples (outliers) can be identified by checking their variability via a chi-square test or their displacement from the respective population mean via Student's t-test. Finally, the normal behavior allows the Bayesian classifier to be set optimally and its performance and the related uncertainty to be determined.


Affiliation: Department of Physics, University of Trento, Trento, I-38123, Italy. leonardo.ricci@unitn.it.

ABSTRACT

Background: During the last decade, many scientific works have addressed the possible use of miRNA levels as diagnostic and prognostic tools for different kinds of cancer. The development of reliable classifiers requires tackling several crucial aspects, some of which have been widely overlooked in the scientific literature: the distribution of the measured miRNA expressions and the statistical uncertainty that affects the parameters that characterize a classifier. In this paper, these topics are analysed in detail by discussing a model problem, i.e. the development of a Bayesian classifier that, on the basis of the expression of miR-205, miR-21 and snRNA U6, assigns samples to one of two classes of pulmonary tumors: adenocarcinomas and squamous cell carcinomas.

Results: We proved that the variance of miRNA expression triplicates is well described by a normal distribution and that triplicate averages also follow normal distributions. We provide a method to enhance a classifier's performance by exploiting the correlations between the class-discriminating miRNA and the expression of an additional normalized miRNA.

Conclusions: By exploiting the normal behavior of triplicate variances and averages, invalid samples (outliers) can be identified by checking their variability via a chi-square test or their displacement from the respective population mean via Student's t-test. Finally, the normal behavior allows the Bayesian classifier to be set optimally and its performance and the related uncertainty to be determined.
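The variability check mentioned in the conclusions can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that the population variance of a single measurement (`sigma2`) is already known, that each sample consists of exactly three replicates, and that a significance level `alpha` of 0.01 is acceptable. The function names and the numeric values are hypothetical.

```python
import math
from statistics import variance


def triplicate_variance_pvalue(triplicate, sigma2):
    """Upper-tail chi-square p-value for the sample variance of one triplicate.

    sigma2 is an ASSUMED, known population variance of a single measurement.
    """
    if len(triplicate) != 3:
        raise ValueError("expected exactly three replicate measurements")
    # For normally distributed replicates, (n - 1) * s^2 / sigma^2 follows a
    # chi-square distribution with n - 1 = 2 degrees of freedom.
    stat = 2 * variance(triplicate) / sigma2
    # With 2 degrees of freedom the survival function is exactly exp(-x / 2).
    return math.exp(-stat / 2)


def is_outlier(triplicate, sigma2, alpha=0.01):
    """Flag a triplicate whose spread is implausibly large (alpha is assumed)."""
    return triplicate_variance_pvalue(triplicate, sigma2) < alpha


# Made-up measurements, for illustration only.
print(is_outlier([24.1, 24.3, 24.2], sigma2=0.04))
print(is_outlier([24.1, 26.0, 22.5], sigma2=0.04))
```

The closed-form p-value holds only for triplicates; for a different number of replicates a general chi-square survival function would be needed.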


Fig. 3: Histograms (left) of yopt=Δx205−0.8·Δx21 for samples belonging to the target class ADC (blue) and to the versus class SQC (red). Overlapping regions are in magenta. The bin width is equal to 1. Each histogram is normalized to the respective set size. The green bold line represents the discrimination threshold χ=3.6, whereas the green dashed lines represent the threshold displaced by its uncertainty, i.e. χ±dχ, with dχ=0.4 (see Table 4). ROC curves (right) of the classifier based on Δx205 (green line) and of the classifier based on yopt (red line) [20]. The increase in the AUC (area under the curve) from 0.9815 for Δx205 to 0.9926 for yopt is another indicator of the improvement of the classifier.
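The two quantities shown in the figure, a discrimination threshold and an AUC, can be illustrated with a short sketch. It is not the procedure used in the paper: it assumes that both classes are normally distributed with known means and standard deviations, places the threshold where the prior-weighted class densities are equal, and estimates the AUC empirically by pairwise comparison of scores. All function names and numbers are hypothetical.

```python
import math


def bayes_threshold(mu_a, sd_a, mu_b, sd_b, prior_a=0.5):
    """Point between the class means where the prior-weighted normal densities match."""
    prior_b = 1.0 - prior_a
    # Equating the log densities gives a * x^2 + b * x + c = 0.
    a = 1 / (2 * sd_b**2) - 1 / (2 * sd_a**2)
    b = mu_a / sd_a**2 - mu_b / sd_b**2
    c = (mu_b**2 / (2 * sd_b**2) - mu_a**2 / (2 * sd_a**2)
         + math.log(prior_a * sd_b / (prior_b * sd_a)))
    if a == 0:
        if b == 0:
            raise ValueError("the two classes are identical")
        roots = [-c / b]  # equal variances: the equation is linear
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            raise ValueError("the weighted densities never cross")
        r = math.sqrt(disc)
        roots = [(-b + r) / (2 * a), (-b - r) / (2 * a)]
    lo, hi = sorted((mu_a, mu_b))
    inside = [x for x in roots if lo <= x <= hi]
    if not inside:
        raise ValueError("no crossing point between the two class means")
    return inside[0]


def empirical_auc(high_scores, low_scores):
    """Fraction of pairs ranked correctly; ties count as one half."""
    wins = sum((h > l) + 0.5 * (h == l) for h in high_scores for l in low_scores)
    return wins / (len(high_scores) * len(low_scores))


# Made-up class parameters and scores, for illustration only.
print(bayes_threshold(0.0, 1.0, 4.0, 1.0))
print(empirical_auc([3, 4, 5], [1, 2, 3]))
```

With equal priors and equal variances the threshold falls midway between the class means; unequal variances or priors shift it toward one of the classes.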

Mentions: Testing the same-parent-distribution hypothesis via Student’s t statistic yields p = 2.6·10⁻¹¹, half of the value obtained by applying the t-test to the histograms generated with the linear combination yDV. Figure 3 shows the histograms of yopt for samples belonging either to the target class ADC or to the versus class SQC. The Shapiro-Wilk test of normality yielded p-values of 0.78 (target class) and 0.24 (versus class).
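As a rough illustration of the same-parent-distribution test, the sketch below computes the pooled-variance (Student's) two-sample t statistic and its degrees of freedom. It assumes the equal-variance form of the test, which the text does not specify, and the function name and input values are hypothetical. The p-value is left out because the Python standard library has no t-distribution function.

```python
import math
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two values")
    dof = n_a + n_b - 2
    # Pooled variance: the two sample variances weighted by their own
    # degrees of freedom (ASSUMES equal population variances).
    pooled = ((n_a - 1) * variance(sample_a)
              + (n_b - 1) * variance(sample_b)) / dof
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, dof


# Made-up scores for two classes, for illustration only.
t, dof = student_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(t, dof)
```

If SciPy is available, `scipy.stats.ttest_ind` with its default `equal_var=True` returns this statistic together with the two-sided p-value, and `scipy.stats.shapiro` performs the Shapiro-Wilk normality test.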

