Inter-species inference of gene set enrichment in lung epithelial cells from proteomic and large transcriptomic datasets.

Hormoz S, Bhanot G, Biehl M, Bilal E, Meyer P, Norel R, Rhrissorrakrai K, Dayarian A - Bioinformatics (2014)

Bottom Line: Translating findings in rodent models to human models has been a cornerstone of modern biology and drug development. However, in many cases, a naive 'extrapolation' between the two species has not succeeded. In spite of this difference, we were able to develop a robust algorithm to predict gene set activation in NHBE with high accuracy using simple analytical methods.


Affiliation: Kavli Institute for Theoretical Physics, Kohn Hall, University of California, Santa Barbara, CA 93106, USA, Department of Physics, Department of Molecular Biology and Biochemistry, Busch Campus, Rutgers University, Piscataway, NJ 08854, USA, Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen, 9700 AK Groningen, The Netherlands and IBM T.J. Watson Research Center, Computational Biology, Yorktown Heights, NY 10003, USA.


Fig. 8. Schematic of the classification protocol. (Top) Training the algorithm to predict gene set g in human under stimulus s. First, represent the rat NES scores under the 26 stimuli of set A as 26 points in N-dimensional space, where N is the number of principal components used. The figure shows a diagram for N = 2; N = 8 was used for the actual prediction. Label each of the 26 points as either 0 (off) or 1 (on) based on the human FDR value of gene set g under the same 26 stimuli. Next, identify the hyperplane that best separates the two types of labels. (Bottom) Predicting gene set g under stimulus s of set B. Introduce a new point corresponding to the reduced representation of the rat NES score under stimulus s. Depending on which side of the hyperplane the point falls on and its separation distance from the plane, classify it as either 0 or 1 and associate a statistical significance with the prediction.
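
The text does not state which naive Bayes variant yields the separating hyperplane. As one illustration only, assume Gaussian class-conditional densities whose variance along each principal component is shared by the "on" and "off" classes (an assumption, not taken from the paper). Under that assumption the log-odds of "on" versus "off" is linear in the reduced coordinates x, so the decision boundary is a hyperplane, and the signed distance d(x) gives the separation from it. The symbols \pi_c (class priors), \mu_{cj} (class means), \sigma_j^2 (shared variances), w and b are introduced here for the sketch.

```latex
\begin{align}
\log\frac{P(1\mid x)}{P(0\mid x)}
  &= \log\frac{\pi_1}{\pi_0}
     + \sum_{j=1}^{N}\frac{(x_j-\mu_{0j})^2-(x_j-\mu_{1j})^2}{2\sigma_j^2} \\
  &= \sum_{j=1}^{N}\frac{\mu_{1j}-\mu_{0j}}{\sigma_j^2}\,x_j
     + \log\frac{\pi_1}{\pi_0}
     - \sum_{j=1}^{N}\frac{\mu_{1j}^2-\mu_{0j}^2}{2\sigma_j^2}
   \;=\; w\cdot x + b, \\
w_j &= \frac{\mu_{1j}-\mu_{0j}}{\sigma_j^2}, \qquad
b = \log\frac{\pi_1}{\pi_0}-\sum_{j=1}^{N}\frac{\mu_{1j}^2-\mu_{0j}^2}{2\sigma_j^2}, \qquad
d(x) = \frac{w\cdot x+b}{\lVert w\rVert}.
\end{align}
```

A point is labeled 1 when w·x + b > 0 and 0 otherwise. If the two classes were given different variances, the quadratic terms in x would not cancel and the boundary would no longer be a plane.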

Mentions: Figure 8 is a diagram of the classifier algorithm. The rat NES scores in sets A and B were combined into a 246 × 52 matrix, corresponding to the 246 gene sets and the 52 stimuli in sets A and B. PCA was performed to find linear combinations of gene sets that exhibit the largest variation over the 52 stimuli (Fig. 3). For the classification, the N leading principal components (which are linear combinations of gene sets) were used. To predict gene set g under stimulus s (from set B) in human, the following protocol was used. After PCA, the rat NES data for set A were reduced to 26 points in an N-dimensional space. To each point, we associated the label 1 if gene set g was on in human (FDR < 0.25) and the label 0 otherwise. The naive Bayes classifier was used to find a hyperplane that separated the 0s from the 1s. In general, an error-free linear separation cannot be achieved. To make the prediction, the rat NES score under stimulus s was expressed in terms of the principal components and added as a new point in the N-dimensional space. Depending on which side of the hyperplane the point falls on, a classification of 0 or 1 was assigned.
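
The protocol described here can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`pca_project`, `fit_linear_naive_bayes`, `classify`) are hypothetical, the input data are synthetic stand-ins, the columns of the NES matrix are assumed to be ordered with set A first and set B second, and the naive Bayes variant is assumed to be Gaussian with a variance per component shared by both classes, which is one variant that produces a hyperplane. The source does not say how the statistical significance is computed, so the sketch only returns the signed distance from the plane.

```python
import numpy as np


def pca_project(nes, n_components):
    """Project stimuli onto the leading principal components.

    nes: array of shape (n_gene_sets, n_stimuli), e.g. 246 x 52.
    Returns an array of shape (n_stimuli, n_components).
    """
    x = np.asarray(nes, dtype=float).T   # rows = stimuli, columns = gene sets
    xc = x - x.mean(axis=0)              # center each gene set across stimuli
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:n_components].T      # coordinates along the leading PCs


def fit_linear_naive_bayes(points, labels, eps=1e-9):
    """Gaussian naive Bayes with shared per-component variance (assumption).

    Returns (w, b) such that the predicted label is 1 when w.x + b > 0.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    on, off = points[labels == 1], points[labels == 0]
    if len(on) == 0 or len(off) == 0:
        raise ValueError("need at least one 'on' and one 'off' training stimulus")
    mu_on, mu_off = on.mean(axis=0), off.mean(axis=0)
    # Pool residuals from both classes to estimate one variance per component.
    resid = np.vstack([on - mu_on, off - mu_off])
    var = (resid ** 2).mean(axis=0) + eps
    w = (mu_on - mu_off) / var
    b = np.log(len(on) / len(off)) - np.sum((mu_on ** 2 - mu_off ** 2) / (2 * var))
    return w, b


def classify(w, b, point):
    """Return (label, signed distance of the point from the hyperplane)."""
    score = float(np.dot(point, w) + b)
    return int(score > 0), score / float(np.linalg.norm(w))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    nes = rng.normal(size=(246, 52))          # synthetic stand-in for rat NES scores
    scores = pca_project(nes, n_components=8)
    train, test = scores[:26], scores[26:]    # assumed order: set A, then set B
    human_fdr = np.linspace(0.01, 0.99, 26)   # synthetic stand-in for human FDR of gene set g
    labels = (human_fdr < 0.25).astype(int)   # 1 = on (FDR < 0.25), 0 = off
    w, b = fit_linear_naive_bayes(train, labels)
    print(classify(w, b, test[0]))            # prediction for one set B stimulus
```

In this sketch the PCA is fitted on all 52 stimuli, as in the text, while only the 26 set A points carry labels and enter the classifier. One classifier is fitted per gene set g, so predicting all 246 gene sets would repeat the last three steps with a different label vector each time.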

