Semi-supervised learning for the identification of syn-expressed genes from fused microarray and in situ image data.

Costa IG, Krause R, Opitz L, Schliep A - BMC Bioinformatics (2007)

Bottom Line: We investigate the influence of these pairwise constraints on the clustering and discuss the biological relevance of our results. Spatial information contributes to a detailed, biologically meaningful analysis of temporal gene expression data. Semi-supervised learning provides a flexible, robust and efficient framework for integrating data sources of differing quality and abundance.

Affiliation: Department Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany. ivan.filho@molgen.mpg.de

ABSTRACT

Background: Gene expression measurements during the development of the fly Drosophila melanogaster are routinely used to find functional modules of temporally co-expressed genes. Complementary large data sets of in situ RNA hybridization images for different stages of the fly embryo elucidate the spatial expression patterns.

Results: Using a semi-supervised approach, constrained clustering with mixture models, we can find clusters of genes exhibiting spatio-temporal similarities in expression, or syn-expression. The temporal gene expression measurements are taken as primary data, for which pairwise constraints are computed in an automated fashion from raw in situ images without the need for manual annotation. We investigate the influence of these pairwise constraints on the clustering and discuss the biological relevance of our results.

Conclusion: Spatial information contributes to a detailed, biologically meaningful analysis of temporal gene expression data. Semi-supervised learning provides a flexible, robust and efficient framework for integrating data sources of differing quality and abundance.

Figure 6: Proportion of ImaGO term enrichment. For each threshold τ (x-axis), we depict the proportion of ImaGO terms for which we observe a smaller p-value in cMoG than in MoG (y-axis). The threshold τ discards ImaGO terms for which the difference between the log p-values of cMoG and MoG is smaller than τ. As can be observed, the proportion is higher than 0.5 for all τ values, which indicates an advantage of cMoG. Furthermore, the proportion tends to increase for higher τ values.
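
The thresholded proportion described in the caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the use of base-10 logarithms, the use of the absolute log difference, and the toy p-values are all assumptions made for the example, since the caption does not specify them.

```python
import math


def proportion_favoring_cmog(p_cmog, p_mog, tau):
    """Proportion of terms with a smaller cMoG p-value, among the terms
    whose absolute log10 p-value difference is at least tau.

    Assumption: log base 10 and an absolute difference; the caption
    only says "difference in the log of the p-value".
    """
    kept = 0
    wins = 0
    for pc, pm in zip(p_cmog, p_mog):
        # Positive when the cMoG p-value is the smaller of the two.
        diff = math.log10(pm) - math.log10(pc)
        if abs(diff) < tau:
            continue  # term discarded by the threshold
        kept += 1
        if diff > 0:
            wins += 1
    return wins / kept if kept else float("nan")


# Toy p-values invented purely for illustration (not from the paper).
toy_cmog = [1e-6, 1e-3, 5e-3, 1e-4, 2e-2]
toy_mog = [1e-3, 2e-3, 1e-3, 1e-2, 1e-2]

for tau in (0.0, 0.5, 1.0):
    print(tau, proportion_favoring_cmog(toy_cmog, toy_mog, tau))
```

Plotting this proportion against τ would give a curve of the kind the caption describes; a value above 0.5 at a given τ means cMoG has the smaller p-value for most of the terms that survive the threshold.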

Mentions: Another helpful analysis is the comparison of enrichment of in situ image annotations (ImaGO), as described in Section Evaluation (see [35] for complete results). Fig. 5 displays a scatter plot of the p-values of all ImaGO terms that had an enrichment p-value below 0.01 in either the cMoG or the MoG clusters. In summary, cMoG has a higher enrichment in 67 out of 112 relevant ImaGO terms. A binomial test of the hypothesis that 67 successes in 112 trials arise by chance rejects it with a p-value of 0.0232, which indicates that the count of ImaGO terms with higher enrichment for cMoG is significantly higher than expected by chance. Furthermore, if we take into account only ImaGO terms with a higher enrichment gain for one of the methods (points distant from the diagonal line in Fig. 5), the advantage of cMoG is even greater (see Fig. 6 and Fig. 7). This indicates that even without direct use of the annotation information from ImaGO, cMoG has a greater sensitivity in grouping syn-expressed genes.
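
The binomial test mentioned above can be checked with a short exact computation. The text does not say whether the test is one- or two-sided, so this sketch assumes a one-sided test with a null success probability of 0.5 (each method equally likely to have the higher enrichment for a term); the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(successes, trials):
    """Exact P(X >= successes) for X ~ Binomial(trials, 0.5).

    Assumption: one-sided test with null probability 0.5.
    """
    # Count the outcomes with at least `successes` successes,
    # then divide by the total number of equally likely outcomes.
    favorable = sum(comb(trials, k) for k in range(successes, trials + 1))
    return favorable / 2 ** trials


p_value = binomial_upper_tail(67, 112)
print(f"one-sided p-value for 67 of 112: {p_value:.4f}")
```

If the one-sided assumption matches the authors' setup, the printed value should be close to the reported 0.0232; a two-sided test would give roughly twice that.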

