Inter-species inference of gene set enrichment in lung epithelial cells from proteomic and large transcriptomic datasets.

Hormoz S, Bhanot G, Biehl M, Bilal E, Meyer P, Norel R, Rhrissorrakrai K, Dayarian A - Bioinformatics (2014)

Bottom Line: Translating findings in rodent models to human models has been a cornerstone of modern biology and drug development. However, in many cases, a naive 'extrapolation' between the two species has not succeeded. In spite of this difference, we were able to develop a robust algorithm to predict gene set activation in NHBE with high accuracy using simple analytical methods.


Affiliation: Kavli Institute for Theoretical Physics, Kohn Hall, University of California, Santa Barbara, CA 93106, USA, Department of Physics, Department of Molecular Biology and Biochemistry, Busch Campus, Rutgers University, Piscataway, NJ 08854, USA, Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen, 9700 AK Groningen, The Netherlands and IBM T.J. Watson Research Center, Computational Biology, Yorktown Heights, NY 10003, USA.


Figure 9. Optimizing the number of principal components used for prediction. A naive Bayes classifier was trained on 25 of the 26 known stimuli (set A) to predict the 26th. All metrics, namely the area under the ROC curve (A), the Pearson correlation coefficient (B) and the Matthews correlation coefficient (not shown), peak near N = 8, suggesting that eight leading principal components are optimal. In (B), the blue line shows the correlation of the prediction with the binarized known FDR values (ON when FDR < 0.25), and the red line shows the correlation with the FDR values converted to a continuous scale (1 - FDR).
© Copyright Policy - creative-commons
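
The three metrics named in the caption, and the two ways of turning FDR values into a reference signal, can be illustrated with a minimal Python sketch. The function names, the toy FDR values and the toy prediction scores are all hypothetical and are not taken from the paper; only the FDR < 0.25 cutoff and the 1 - FDR conversion come from the caption. The 0.5 cutoff applied to the prediction scores for the Matthews coefficient is likewise an assumption.

```python
import math

def binarize_fdr(fdr_values, threshold=0.25):
    """Label a gene set ON (1) when its FDR is below the threshold, else OFF (0)."""
    return [1 if f < threshold else 0 for f in fdr_values]

def continuous_from_fdr(fdr_values):
    """Convert FDR values to a continuous activation scale, 1 - FDR."""
    return [1.0 - f for f in fdr_values]

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def matthews(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0.0 if undefined)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: measured FDR per gene set and a classifier's ON-probability.
fdr = [0.01, 0.40, 0.10, 0.90, 0.20, 0.60]
scores = [0.9, 0.3, 0.8, 0.1, 0.4, 0.6]

labels = binarize_fdr(fdr)                       # reference for AUC, MCC, "blue line"
predicted = [1 if s >= 0.5 else 0 for s in scores]

print("AUC:", roc_auc(labels, scores))
print("MCC:", matthews(labels, predicted))
print("Pearson vs binarized FDR:", pearson(scores, labels))
print("Pearson vs 1 - FDR:", pearson(scores, continuous_from_fdr(fdr)))
```

The two Pearson values correspond to the two curves described for panel (B): one correlates the prediction with the ON/OFF labels, the other with the continuous 1 - FDR scale.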

Mentions: An important free parameter in the above algorithm is the number of principal components used in the prediction. To optimize it, we applied the algorithm to the 26 known stimuli (set A), predicting each stimulus after training on the remaining 25. The output of the classifier was then compared with the actual measurement. The area under the receiver-operating characteristic (ROC) curve (Davis and Goadrich, 2006; Fawcett, 2006), the Pearson correlation coefficient and the Matthews correlation coefficient were used to quantify the performance of the classifier (see Fig. 9). Performance peaked when eight leading principal components were used. We also repeated the analysis using a linear discriminant analysis algorithm (Methods); the performance did not improve significantly. For the final prediction, we used N = 8 and applied a leave-one-out procedure to the naive Bayes classifier.
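
The leave-one-out scan over the number of components described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the data are synthetic, the columns of `X` are hypothetical stand-ins for principal component scores (so "N leading components" means the first N columns), the classifier is a hand-written Gaussian naive Bayes, and plain accuracy replaces the AUC, Pearson and Matthews metrics for brevity. A stricter version would also recompute the principal components inside each fold.

```python
import math
import random

def fit_gaussian_nb(X, y):
    """Estimate per-class prior, feature means and variances (Gaussian naive Bayes)."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        cols = list(zip(*rows))
        means = [sum(col) / n for col in cols]
        # A small variance floor keeps the likelihood finite for near-constant features.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-6
                     for col, m in zip(cols, means)]
        model[label] = (n / len(y), means, variances)
    return model

def log_posterior(model, x, label):
    """Unnormalized log posterior of `label` given the feature vector x."""
    prior, means, variances = model[label]
    total = math.log(prior)
    for v, m, s2 in zip(x, means, variances):
        total += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
    return total

def predict(model, x):
    """Return the class with the highest log posterior."""
    return max(model, key=lambda label: log_posterior(model, x, label))

def loo_accuracy(X, y, n_components):
    """Hold out each sample once; train on the rest using the first n_components columns."""
    hits = 0
    for i in range(len(X)):
        train_X = [row[:n_components] for j, row in enumerate(X) if j != i]
        train_y = [t for j, t in enumerate(y) if j != i]
        model = fit_gaussian_nb(train_X, train_y)
        if predict(model, X[i][:n_components]) == y[i]:
            hits += 1
    return hits / len(X)

# Synthetic stand-in data (assumption): 26 samples, 12 component scores,
# of which only the first three carry class information.
random.seed(0)
y = [i % 2 for i in range(26)]
X = [[random.gauss(2.0 if t else -2.0, 1.0) if k < 3 else random.gauss(0.0, 1.0)
      for k in range(12)] for t in y]

# Scan N and keep the smallest N that reaches the best leave-one-out accuracy.
accuracy_by_n = {n: loo_accuracy(X, y, n) for n in range(1, 13)}
best_n = max(accuracy_by_n, key=accuracy_by_n.get)
print("best N:", best_n, "accuracy:", accuracy_by_n[best_n])
```

Each held-out sample is scored by a model that never saw it, which is what makes the scan a fair way to compare values of N on a set as small as 26 stimuli.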

