Statistical analysis of a Bayesian classifier based on the expression of miRNAs.

Ricci L, Del Vescovo V, Cantaloni C, Grasso M, Barbareschi M, Denti MA - BMC Bioinformatics (2015)

Bottom Line: We provide a method to enhance a classifier's performance by exploiting the correlations between the class-discriminating miRNA and the expression of an additional normalized miRNA. By exploiting the normal behavior of triplicate variances and averages, invalid samples (outliers) can be identified by checking their variability via a chi-square test or their displacement from the respective population mean via Student's t-test. Finally, the normal behavior makes it possible to set the Bayesian classifier optimally and to determine its performance and the related uncertainty.


Affiliation: Department of Physics, University of Trento, Trento, I-38123, Italy. leonardo.ricci@unitn.it.

ABSTRACT

Background: During the last decade, many studies have addressed the possible use of miRNA levels as diagnostic and prognostic tools for different kinds of cancer. The development of reliable classifiers requires tackling several crucial aspects, some of which have been widely overlooked in the scientific literature: the distribution of the measured miRNA expressions and the statistical uncertainty that affects the parameters that characterize a classifier. In this paper, these topics are analysed in detail by discussing a model problem, i.e. the development of a Bayesian classifier that, on the basis of the expression of miR-205, miR-21 and snRNA U6, assigns samples to one of two classes of pulmonary tumors: adenocarcinomas and squamous cell carcinomas.

Results: We proved that the variance of miRNA expression triplicates is well described by a normal distribution and that triplicate averages also follow normal distributions. We provide a method to enhance a classifier's performance by exploiting the correlations between the class-discriminating miRNA and the expression of an additional normalized miRNA.

Conclusions: By exploiting the normal behavior of triplicate variances and averages, invalid samples (outliers) can be identified by checking their variability via a chi-square test or their displacement from the respective population mean via Student's t-test. Finally, the normal behavior makes it possible to set the Bayesian classifier optimally and to determine its performance and the related uncertainty.
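
The variability check named in the Conclusions can be illustrated with a minimal sketch. It assumes that, for normally distributed replicates, (n−1)·s²/σ² follows a chi-square distribution with n−1 degrees of freedom and that the test is one-sided. For triplicates (n = 3) the upper-tail critical value has the closed form −2·ln α, so no statistics library is needed. The function names and the value `sigma_pop = 0.25` are hypothetical and are not taken from the paper, whose exact procedure may differ.

```python
import math
import statistics


def sigma_max(sigma_pop: float, alpha: float = 0.05) -> float:
    """Largest triplicate standard deviation accepted at significance alpha.

    Assumption: 2 * s**2 / sigma_pop**2 is chi-square distributed with
    2 degrees of freedom, whose survival function is exp(-x / 2).
    Solving exp(-x / 2) = alpha gives the critical value x = -2 * ln(alpha),
    hence s_max = sigma_pop * sqrt(-ln(alpha)).
    """
    return sigma_pop * math.sqrt(-math.log(alpha))


def is_variability_outlier(triplicate, sigma_pop: float, alpha: float = 0.05) -> bool:
    """Return True if the sample standard deviation exceeds sigma_max."""
    if len(triplicate) != 3:
        raise ValueError("this sketch only handles triplicates (n = 3)")
    s = statistics.stdev(triplicate)  # sample standard deviation, n - 1 in the denominator
    return s > sigma_max(sigma_pop, alpha)


# Hypothetical population standard deviation and hypothetical triplicates.
sigma_pop = 0.25
print(sigma_max(sigma_pop))
print(is_variability_outlier([20.0, 20.1, 19.9], sigma_pop))
print(is_variability_outlier([20.0, 21.0, 19.0], sigma_pop))
```

Under these assumptions the acceptance limit scales linearly with the population standard deviation, so a single factor, sqrt(−ln α), converts one into the other for any triplicate.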


Fig. 5: Scatter plot of y_opt = Δx_205 − 0.8·Δx_21 applied to an independent set of data. See Section “A classifier for ADC vs. SQC” and the caption of Fig. 2 for the color code of dots, lines and shaded areas. The values of the thresholds are reported in Table 4. The empty, red dot and the square, blue dot refer to a standard variability outlier and a “bias” outlier, respectively (see main text).

Mentions: Figure 5 shows the results of applying the classifier based on y_opt to a set of 9 additional samples, using the population means, population standard deviations, and thresholds given in Tables 3 and 4. With a single exception, all triplicate standard deviations comply with the respective σ_max requirements explained above. The single outlier is a miR-21 triplicate whose standard deviation of 0.52 slightly exceeds the maximum value of 0.46 (see Table 1) set by the significance level α=0.05. For 7 of the remaining 8 samples, the classification provided by the classifier of Eq. (1) coincides with the immunohistochemical diagnosis; in all these cases, the odds are at least 90:10 (the same would hold for the sample containing the miR-21 outlier).
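
To make the "odds of at least 90:10" statement concrete, the following is a minimal sketch of a two-class Bayesian classifier on a single score. It assumes that the score is normally distributed within each class and that Bayes' rule is applied with a chosen prior; this is not necessarily the form of Eq. (1), which is not reproduced here. The class parameters `ADC_PARAMS` and `SQC_PARAMS`, the equal prior, and all function names are hypothetical; only the weight 0.8 in the score comes from the figure caption.

```python
import math


def normal_pdf(y: float, mean: float, sd: float) -> float:
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def y_opt(dx205: float, dx21: float, weight: float = 0.8) -> float:
    """Combined score from the caption: dx205 - 0.8 * dx21."""
    return dx205 - weight * dx21


def posterior_adc(y: float, adc, sqc, prior_adc: float = 0.5) -> float:
    """Posterior probability of class ADC given the score y (Bayes' rule).

    adc and sqc are (mean, standard deviation) pairs. Scores extremely far
    from both class means would underflow to 0/0; a production version
    would work with log-densities instead.
    """
    weight_adc = normal_pdf(y, *adc) * prior_adc
    weight_sqc = normal_pdf(y, *sqc) * (1.0 - prior_adc)
    return weight_adc / (weight_adc + weight_sqc)


# Hypothetical class parameters (mean, standard deviation); not from Tables 3 and 4.
ADC_PARAMS = (2.0, 1.0)
SQC_PARAMS = (-2.0, 1.0)

score = y_opt(dx205=3.0, dx21=1.0)
p_adc = posterior_adc(score, ADC_PARAMS, SQC_PARAMS)
# Odds of 90:10 or better correspond to a posterior of at least 0.9.
print(score, p_adc, p_adc >= 0.9)
```

A posterior between 0.1 and 0.9 would correspond to a sample for which neither class reaches 90:10 odds; how such samples are handled depends on the thresholds chosen for the classifier.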

