Learning pair-wise gene functional similarity by multiplex gene expression maps.

An L, Ling H, Obradovic Z, Smith DJ, Megalooikonomou V - BMC Bioinformatics (2012)

Bottom Line: We also detect the most significant single voxels and pairs of neighboring voxels and visualize them in the expression map image of a mouse brain. This work is very important for predicting functions of unknown genes. It also has broader applicability since the methodology can be applied to analyze any large-scale dataset without a target attribute and is not restricted to gene expressions.

Affiliation: Data Engineering Laboratory, Department of Computer and Information Sciences, Temple University, PA, USA. anli@temple.edu

ABSTRACT

Background: The relationships between gene functional similarity and gene expression profiles, and between gene function annotation and gene sequence, have been studied extensively. However, little work has considered the connection between gene function and the location of a gene's expression in mammalian tissues. Moreover, although unsupervised learning methods are commonly used in functional genomics, supervised learning cannot be applied directly to a set of normal genes that lacks a target (class) attribute.

Results: Here, we propose a supervised learning methodology to predict pair-wise gene functional similarity from multiplex gene expression maps that provide information about the location of gene expression. The features are extracted from expression maps, and the labels denote the functional similarities of pairs of genes. We use wavelet features, original expression values, difference and average values of neighboring voxels, and other features to perform boosting analysis. The experimental results show that as the similarity of gene expression maps increases, functional similarity increases as well. The model predicts the functional similarities between genes to a certain degree. The weights of the features in the model indicate which features are more significant for this prediction.

Conclusions: By considering pairs of genes, we propose a supervised learning methodology to predict pair-wise gene functional similarity from multiplex gene expression maps. We also explore the relationship between similarities of gene maps and gene functions. By using AdaBoost coupled with our proposed weak classifier, we analyze a large-scale gene expression dataset and predict gene functional similarities. We also detect the most significant single voxels and pairs of neighboring voxels and visualize them in the expression map image of a mouse brain. This work is very important for predicting functions of unknown genes. It also has broader applicability, since the methodology can be applied to analyze any large-scale dataset without a target attribute and is not restricted to gene expression data.

Figure 11: Cumulated weight of selected features on the restricted subset - Molecular Function. 1 - 42: wavelet features; 43 - 110: original voxels; 111: the correlation coefficient; 112: the p-value of the correlation coefficients; 113: the Euclidean distance between pair-wise gene maps; 114 - 284: average of neighbouring voxels; 285 - 455: absolute value of difference between neighbouring voxels.
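The index ranges in the caption can be written down as a small lookup table, which also makes it easy to check that the groups add up to the 455 features reported for each sample. The following is only an illustrative sketch: the names `FEATURE_GROUPS`, `feature_group`, and `group_sizes` are hypothetical and do not come from the paper.

```python
# Feature index ranges (1-based, inclusive) as listed in the Figure 11 caption.
# All identifiers here are illustrative, not taken from the paper.
FEATURE_GROUPS = [
    (1, 42, "wavelet features"),
    (43, 110, "original voxels"),
    (111, 111, "correlation coefficient"),
    (112, 112, "p-value of the correlation coefficient"),
    (113, 113, "Euclidean distance between pair-wise gene maps"),
    (114, 284, "average of neighbouring voxels"),
    (285, 455, "absolute difference between neighbouring voxels"),
]


def feature_group(index):
    """Return the caption's group name for a 1-based feature index."""
    for first, last, name in FEATURE_GROUPS:
        if first <= index <= last:
            return name
    raise ValueError(f"feature index {index} is outside 1..455")


def group_sizes():
    """Number of features in each group, keyed by group name."""
    return {name: last - first + 1 for first, last, name in FEATURE_GROUPS}
```

Summing the group sizes gives 42 + 68 + 1 + 1 + 1 + 171 + 171 = 455, which matches the total number of features per sample.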

Mentions: There were a total of 455 features for each sample in the experiment. For the weak classifier, we set the number of bins to 20. For the AdaBoost algorithm, we ran 5000 iterations to reach the best prediction performance. Using these settings, we achieved the minimum error on training and control data for all three gene ontologies (Table 1). The results show that the accuracy of predicting gene functional similarities is better on the restricted subset. Figures 10, 11, and 12 show the cumulated weight of selected features for Cellular Component, Molecular Function, and Biological Process, respectively. The histogram bar for a given feature is the sum of the weights that feature received in the iterations in which it was selected, over all 5000 iterations. Additional file 1 shows the top selected original voxels and the features extracted from neighbouring voxels. Wavelet features were not among the top 10 selected significant features.
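The cumulated weight described above can be illustrated with a minimal discrete AdaBoost loop in which each round selects one feature and adds that round's weight (alpha) to the feature's running total. This is a sketch under stated assumptions, not the authors' implementation: the excerpt says only that the weak classifier uses 20 bins, so the equal-width binning and the per-bin weighted-majority vote below are assumptions, and all function names are hypothetical.

```python
import math


def bin_index(value, lo, hi, n_bins):
    """Map a value to one of n_bins equal-width bins over [lo, hi]."""
    if hi == lo:
        return 0
    k = int((value - lo) / (hi - lo) * n_bins)
    return min(max(k, 0), n_bins - 1)


def fit_binned_stump(column, y, w, n_bins):
    """Assumed weak learner: each bin votes the sign of its weighted label sum."""
    lo, hi = min(column), max(column)
    sums = [0.0] * n_bins
    for v, label, weight in zip(column, y, w):
        sums[bin_index(v, lo, hi, n_bins)] += weight * label
    votes = [1 if s >= 0 else -1 for s in sums]
    preds = [votes[bin_index(v, lo, hi, n_bins)] for v in column]
    err = sum(weight for p, label, weight in zip(preds, y, w) if p != label)
    return err, preds


def adaboost_cumulated_weights(X, y, n_rounds=50, n_bins=20):
    """Run discrete AdaBoost; add each round's alpha to the selected feature."""
    n, d = len(X), len(X[0])
    columns = [[row[j] for row in X] for j in range(d)]
    w = [1.0 / n] * n          # uniform sample weights to start
    cumulated = [0.0] * d      # one histogram bar per feature
    for _ in range(n_rounds):
        # Pick the feature whose binned classifier has the lowest weighted error.
        best_j, (err, preds) = min(
            ((j, fit_binned_stump(columns[j], y, w, n_bins)) for j in range(d)),
            key=lambda item: item[1][0],
        )
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        if err >= 0.5:
            break  # no feature beats chance under the current weights
        alpha = 0.5 * math.log((1 - err) / err)
        cumulated[best_j] += alpha
        # Re-weight samples: misclassified ones gain weight, then normalise.
        w = [wi * math.exp(-alpha * yi * pi) for wi, yi, pi in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return cumulated
```

With labels in {-1, +1} and one row per gene pair, the returned list has one entry per feature, and plotting it as a bar chart would give a histogram of the kind shown in Figures 10 to 12. A feature that is never selected keeps a cumulated weight of zero.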

