Semi-supervised learning for the identification of syn-expressed genes from fused microarray and in situ image data.

Costa IG, Krause R, Opitz L, Schliep A - BMC Bioinformatics (2007)

Bottom Line: We investigate the influence of these pairwise constraints on the clustering and discuss the biological relevance of our results. Spatial information contributes to a detailed, biologically meaningful analysis of temporal gene expression data. Semi-supervised learning provides a flexible, robust and efficient framework for integrating data sources of differing quality and abundance.


Affiliation: Department Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany. ivan.filho@molgen.mpg.de

ABSTRACT

Background: Gene expression measurements during the development of the fly Drosophila melanogaster are routinely used to find functional modules of temporally co-expressed genes. Complementary large data sets of in situ RNA hybridization images for different stages of the fly embryo elucidate the spatial expression patterns.

Results: Using a semi-supervised approach, constrained clustering with mixture models, we can find clusters of genes exhibiting spatio-temporal similarities in expression, or syn-expression. The temporal gene expression measurements are taken as primary data, for which pairwise constraints are computed in an automated fashion from raw in situ images without the need for manual annotation. We investigate the influence of these pairwise constraints on the clustering and discuss the biological relevance of our results.

Conclusion: Spatial information contributes to a detailed, biologically meaningful analysis of temporal gene expression data. Semi-supervised learning provides a flexible, robust and efficient framework for integrating data sources of differing quality and abundance.

© Copyright Policy - open-access

Figure 2: Obtaining constraints from images. We depict the time-course expression (top) and registered in situ images (middle) of the genes twi, CG12177, Ef2 and RhoGAP71E, which indicate their temporal and spatial expression patterns. From left to right, the embryo images are categorized into the time periods 0–3 h, 3–6 h, 6–9 h, 9–12 h, 12–15 h and 15–18 h. The microarray data show a similar expression pattern for all genes, with maximal expression after 3 hours and weak divergence at later time points. The in situ images indicate that twi and CG12177 are syn-expressed in periods 3–6, 6–9 and 9–12, while Ef2 and RhoGAP71E are syn-expressed in periods 0–3, 3–6, 6–9, 9–12 and 15–18. At the bottom, we display how positive constraints are derived from in situ hybridization patterns. A heat map displays the correlation coefficients between all pairs of in situ images of the corresponding time period (red values indicate positive correlations). After thresholding the correlation matrices, a constraint matrix is obtained for each time period. For example, the constraint matrices from periods 3–6 and 6–9 indicate syn-expression of the pairs (twi, CG12177) and (Ef2, RhoGAP71E), while the constraint matrix from period 9–12 also indicates that (CG12177, RhoGAP71E) are syn-expressed. The matrices are combined into a single matrix that constrains the gene pairs displaying syn-expression in at least three periods, as shown in the matrix at the bottom.
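
The constraint derivation described in the caption (correlate the images of one time period, threshold, then keep pairs that recur in at least three periods) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the use of Pearson correlation on flattened pixel vectors, and the threshold value of 0.8 are all assumptions, since the caption gives neither the correlation measure nor the cutoff.

```python
import numpy as np


def period_constraints(images, threshold=0.8):
    """Positive constraints for one time period.

    images: (n_genes, n_pixels) array, one flattened registered image per gene.
    threshold: hypothetical cutoff; the source does not state the value used.
    """
    corr = np.corrcoef(images)            # pairwise Pearson correlation between genes
    constraints = corr >= threshold       # boolean constraint matrix for this period
    np.fill_diagonal(constraints, False)  # a gene is never constrained with itself
    return constraints


def combine_constraints(per_period, min_periods=3):
    """Keep only the pairs that are constrained in at least `min_periods` periods."""
    counts = np.sum(per_period, axis=0)   # number of periods in which each pair co-occurs
    return counts >= min_periods
```

Calling `period_constraints` once per time period and passing the resulting list to `combine_constraints` yields a single symmetric boolean matrix, analogous to the combined matrix at the bottom of the figure.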

Mentions: We obtain clusters of syn-expressed genes during the development of Drosophila. We propose to automatically infer positive constraints (spatial co-expression) and negative constraints (expression in distinct tissues) from the in situ image data and to use them in a mixture model for the complementary, higher-quality DNA microarray time-course data, as shown in Fig. 2.
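
To illustrate how positive and negative pairwise constraints might enter a mixture model, the sketch below applies one common soft-constraint scheme: a mean-field style update in which a gene's posterior for a component is penalized when positively linked genes are assigned elsewhere, or when negatively linked genes share that component. The text here does not give the update rule, so the penalty form, the weights `lam_pos` and `lam_neg`, and the function name are assumptions and should not be read as the authors' exact formulation.

```python
import numpy as np


def constrained_posteriors(log_lik, log_prior, w_pos, w_neg,
                           lam_pos=1.0, lam_neg=1.0, n_iter=20):
    """Soft-constrained component posteriors (hypothetical sketch).

    log_lik:   (n, K) log-likelihood of each gene under each mixture component
    log_prior: (K,) log mixing proportions
    w_pos:     (n, n) symmetric 0/1 matrix of positive constraints
    w_neg:     (n, n) symmetric 0/1 matrix of negative constraints
    """
    def normalize(logits):
        # Numerically stable softmax over components
        logits = logits - logits.max(axis=1, keepdims=True)
        r = np.exp(logits)
        return r / r.sum(axis=1, keepdims=True)

    base = log_lik + log_prior
    r = normalize(base)  # unconstrained posteriors as the starting point
    for _ in range(n_iter):
        # Cost of putting gene i in component k: positively linked partners
        # that are elsewhere, plus negatively linked partners that are in k.
        penalty = lam_pos * (w_pos @ (1.0 - r)) + lam_neg * (w_neg @ r)
        r = normalize(base - penalty)
    return r
```

With all-zero constraint matrices the function returns the ordinary mixture posteriors, so the constraints act purely as a correction on top of the microarray-based model. In a full EM procedure these posteriors would replace the standard E-step before the component parameters are re-estimated.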

