Semi-supervised learning for the identification of syn-expressed genes from fused microarray and in situ image data.

Costa IG, Krause R, Opitz L, Schliep A - BMC Bioinformatics (2007)

Bottom Line: We investigate the influence of pairwise constraints on the clustering and discuss the biological relevance of our results. Spatial information contributes to a detailed, biologically meaningful analysis of temporal gene expression data. Semi-supervised learning provides a flexible, robust and efficient framework for integrating data sources of differing quality and abundance.

Affiliation: Department Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany. ivan.filho@molgen.mpg.de

ABSTRACT

Background: Gene expression measurements taken during the development of the fly Drosophila melanogaster are routinely used to find functional modules of temporally co-expressed genes. Complementary large data sets of in situ RNA hybridization images for different stages of the fly embryo elucidate the spatial expression patterns.

Results: Using a semi-supervised approach, constrained clustering with mixture models, we can find clusters of genes exhibiting spatio-temporal similarities in expression, or syn-expression. The temporal gene expression measurements are taken as primary data, for which pairwise constraints are computed in an automated fashion from raw in situ images without the need for manual annotation. We investigate the influence of these pairwise constraints on the clustering and discuss the biological relevance of our results.
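The abstract states that pairwise constraints are computed automatically from raw in situ images but does not give the procedure. The following is a minimal sketch of one simple way such constraints could be derived, not the authors' method. It assumes each image is already registered to a common embryo outline and stored as a grid of intensities in [0, 1], and it swaps in a binarize-then-Jaccard similarity. The function names, the toy data and the thresholds `threshold`, `must_link_min` and `cannot_link_max` are hypothetical.

```python
from itertools import combinations


def expressed_pixels(image, threshold):
    """Return the set of (row, col) pixels whose intensity reaches the threshold."""
    return {
        (r, c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value >= threshold
    }


def jaccard(a, b):
    """Jaccard overlap |a & b| / |a | b| of two non-empty pixel sets."""
    return len(a & b) / len(a | b)


def pairwise_constraints(images, threshold=0.5, must_link_min=0.7, cannot_link_max=0.1):
    """Derive must-link / cannot-link gene pairs from registered expression images.

    images: dict mapping gene name -> 2D list of intensities in [0, 1],
    all assumed (hypothetically) to be aligned to the same embryo outline and stage.
    Pairs with intermediate overlap get no constraint at all.
    """
    masks = {gene: expressed_pixels(img, threshold) for gene, img in images.items()}
    must_link, cannot_link = [], []
    for g1, g2 in combinations(sorted(masks), 2):
        if not masks[g1] or not masks[g2]:
            continue  # no visible expression: the image says nothing about this pair
        overlap = jaccard(masks[g1], masks[g2])
        if overlap >= must_link_min:
            must_link.append((g1, g2))
        elif overlap <= cannot_link_max:
            cannot_link.append((g1, g2))
    return must_link, cannot_link


# Toy 2x3 "images" (hypothetical data, for illustration only).
images = {
    "geneA": [[0.9, 0.8, 0.0], [0.7, 0.0, 0.0]],
    "geneB": [[0.9, 0.6, 0.0], [0.8, 0.0, 0.0]],
    "geneC": [[0.0, 0.0, 0.9], [0.0, 0.8, 0.7]],
}
must, cannot = pairwise_constraints(images)
print(must)    # [('geneA', 'geneB')]
print(cannot)  # [('geneA', 'geneC'), ('geneB', 'geneC')]
```

Pairs with intermediate overlap receive no constraint, so in this sketch only confident image comparisons are allowed to steer the clustering of the primary time-course data.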

Conclusion: Spatial information contributes to a detailed, biologically meaningful analysis of temporal gene expression data. Semi-supervised learning provides a flexible, robust and efficient framework for integrating data sources of differing quality and abundance.

Figure 3: Clustering result: Mixture of Gaussians. The overall similarity of patterns in the MoG clustering result is explained by the developmental stages investigated. The major phenomena are the depletion of maternal mRNA (maternal genes) and the start of the embryonic transcriptional machinery at time point 3 h (zygotically expressed genes). In the clusters with zygotically expressed genes, we observe two main periods of activation: 3–4 h for clusters U1 to U5, and 7–8 h for clusters U8 to U11. In the clusters with maternal genes, we observe under-expression at several time periods: 3–4 h in clusters U21 to U28; 4–5 h in clusters U17 to U20; 6–7 h in cluster U16; 7–8 h in clusters U12 and U13; and 9–10 h in cluster U15.
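Figure 3 shows a plain mixture-of-Gaussians (MoG) clustering. To illustrate how pairwise constraints can enter such a model, the following is a minimal sketch of EM for a diagonal-covariance Gaussian mixture whose E-step adds a soft reward for must-link partners and a soft penalty for cannot-link partners, in the spirit of mean-field approximations for constrained mixtures. It is an illustration under stated assumptions, not the authors' formulation: the penalty form, the weight `lam`, the farthest-point initialization, the toy profiles and all function names are hypothetical. Setting `lam = 0` recovers ordinary MoG EM.

```python
import math


def log_gauss_diag(x, mu, var):
    """Log density of x under a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mu, var)
    )


def normalize_log(logs):
    """Convert unnormalized log-probabilities to probabilities (log-sum-exp trick)."""
    top = max(logs)
    weights = [math.exp(l - top) for l in logs]
    total = sum(weights)
    return [w / total for w in weights]


def farthest_point_init(X, K):
    """Deterministic seeding: start at X[0], then repeatedly add the farthest point."""
    chosen = [0]
    while len(chosen) < K:
        dist = lambda i: min(sum((a - b) ** 2 for a, b in zip(X[i], X[c])) for c in chosen)
        chosen.append(max(range(len(X)), key=dist))
    return [list(X[c]) for c in chosen]


def constrained_mog(X, K, must_link=(), cannot_link=(), lam=1.0, iters=50, min_var=1e-3):
    """EM for a diagonal Gaussian mixture with soft pairwise constraints.

    X: list of expression profiles (equal-length lists of log-ratios).
    must_link / cannot_link: pairs of row indices into X.
    lam: constraint weight (hypothetical); lam = 0 gives an ordinary MoG.
    """
    n, d = len(X), len(X[0])
    mu = farthest_point_init(X, K)
    var = [[1.0] * d for _ in range(K)]
    pi = [1.0 / K] * K
    q = [[1.0 / K] * K for _ in range(n)]  # posterior P(component k | gene i)
    partners = [[] for _ in range(n)]  # (partner index, +1 must-link / -1 cannot-link)
    for i, j in must_link:
        partners[i].append((j, 1.0))
        partners[j].append((i, 1.0))
    for i, j in cannot_link:
        partners[i].append((j, -1.0))
        partners[j].append((i, -1.0))

    for _ in range(iters):
        # E-step: data log-likelihood plus a reward (must-link) or penalty (cannot-link)
        # proportional to each partner's current posterior for the same component.
        for i in range(n):
            logs = [
                math.log(pi[k])
                + log_gauss_diag(X[i], mu[k], var[k])
                + lam * sum(sign * q[j][k] for j, sign in partners[i])
                for k in range(K)
            ]
            q[i] = normalize_log(logs)
        # M-step: weighted maximum-likelihood updates of weights, means, variances.
        for k in range(K):
            nk = sum(q[i][k] for i in range(n))
            pi[k] = max(nk / n, 1e-12)  # floor keeps log(pi) finite
            if nk < 1e-12:
                continue  # empty component: keep its previous mean and variance
            mu[k] = [sum(q[i][k] * X[i][t] for i in range(n)) / nk for t in range(d)]
            var[k] = [
                max(sum(q[i][k] * (X[i][t] - mu[k][t]) ** 2 for i in range(n)) / nk, min_var)
                for t in range(d)
            ]

    labels = [max(range(K), key=lambda k: q[i][k]) for i in range(n)]
    return labels, q


profiles = [
    [0.0, 0.0, 0.0], [0.1, 0.0, 0.1], [0.0, 0.1, 0.0],  # hypothetical low profiles
    [5.0, 5.0, 5.0], [5.1, 5.0, 4.9], [4.9, 5.1, 5.0],  # hypothetical high profiles
]
plain_labels, _ = constrained_mog(profiles, K=2, lam=0.0)
cons_labels, cons_q = constrained_mog(
    profiles, K=2, must_link=[(0, 1), (3, 4)], cannot_link=[(0, 3)], lam=1.0
)
print(plain_labels)  # [0, 0, 0, 1, 1, 1]
```

Because the toy constraints agree with the data, both runs give the same partition here; in this kind of model the constraint term would be expected to matter mainly for genes whose time courses are ambiguous.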

Mentions: The gene expression time courses cover the period from 1 to 12 hours of embryonic development, and expression values are given as log-ratios (see Section Data for details). Overall, our clustering results reflect two typical classes (see Fig. 3): the maternal and the zygotic transcripts [33]. Maternal genes appear strongly expressed in the first three hours, usually followed by a decline. Clusters 18 to 28 clearly follow a maternal pattern. These transcripts are deposited in the oocyte; typically, the embryo does not transcribe these genes in early development. They are responsible for the determination of the body axes, the first phases of the cell cycle and other functions. The period from 2 to 3 hours coincides with cellularization and with the formation of the three germ layers during gastrulation, when primary tissues start to develop [34]. Conversely, genes actively transcribed by the embryo are not expressed at the early time points, and their expression rises to significant levels only in later stages (3 hours and later). Many of these genes are important for organogenesis. Transcripts in clusters 1 to 4 and 8 to 11 unambiguously follow the pattern of embryonic activation. This functional association is reflected in the overrepresented Gene Ontology terms (see Supplementary Material [35]). The profiles of the other clusters cannot be matched to such simple schemes; several have maximal expression midway through embryonic development. Note that the clusters with varying levels contain fewer genes than those in the maternal and the activated classes.
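The maternal/zygotic distinction above is made by inspecting the shape of each cluster's expression profile. A crude way to make that reading explicit is to compare mean log-ratios before and after the onset of zygotic transcription. The sketch below is a hypothetical heuristic, not part of the paper's analysis; the 3 h onset follows the text, while `min_change`, the hourly sampling grid and the toy profiles are assumptions.

```python
def classify_profile(times, values, onset=3.0, min_change=1.0):
    """Label a cluster's mean log-ratio profile as 'maternal', 'zygotic' or 'other'.

    onset: boundary (hours) between the early and late windows.
    min_change: hypothetical minimum difference in mean log-ratio between windows.
    """
    early = [v for t, v in zip(times, values) if t < onset]
    late = [v for t, v in zip(times, values) if t >= onset]
    if not early or not late:
        raise ValueError("profile must contain time points on both sides of the onset")
    early_mean = sum(early) / len(early)
    late_mean = sum(late) / len(late)
    if early_mean - late_mean >= min_change:
        return "maternal"  # high early, then declining
    if late_mean - early_mean >= min_change:
        return "zygotic"  # low early, rising after the onset
    return "other"  # e.g. a transient peak in mid-development


hours = list(range(1, 13))  # hypothetical hourly sampling from 1 to 12 h
maternal_like = [2.0, 2.0] + [-1.0] * 10
zygotic_like = [-1.0, -1.0] + [1.5] * 10
transient = [0.0] * 5 + [2.0] * 2 + [0.0] * 5

print(classify_profile(hours, maternal_like))  # maternal
print(classify_profile(hours, zygotic_like))  # zygotic
print(classify_profile(hours, transient))  # other
```

A window-mean rule like this deliberately lumps every non-monotone shape into "other", which mirrors the observation that the remaining clusters do not fit the two simple schemes.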
