Automatic annotation of spatial expression patterns via sparse Bayesian factor models.

Pruteanu-Malinici I, Mace DL, Ohler U - PLoS Comput. Biol. (2011)

Bottom Line: We apply this approach on embryonic expression patterns from a Drosophila RNA in situ image database, and show that the automatically inferred factors provide for a meaningful decomposition and represent common co-regulation or biological functions. On human-curated annotations, our sparse approach reaches similar or better classification of expression patterns at different developmental stages, when compared to other automatic image annotation methods using thousands of hard-to-interpret features. Our study therefore outlines a general framework for large microscopy data sets, in which both the generative model itself, as well as its application for analysis tasks such as automated annotation, can provide insight into biological questions.


Affiliation: Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina, United States of America.

ABSTRACT
Advances in reporters for gene expression have made it possible to document and quantify expression patterns in 2D-4D. In contrast to microarrays, which provide data for many genes but averaged and/or at low resolution, images reveal the high spatial dynamics of gene expression. Developing computational methods to compare, annotate, and model gene expression based on images is imperative, considering that available data are rapidly increasing. We have developed a sparse Bayesian factor analysis model in which the observed expression diversity of a large set of high-dimensional images is modeled by a small number of hidden common factors. We apply this approach on embryonic expression patterns from a Drosophila RNA in situ image database, and show that the automatically inferred factors provide for a meaningful decomposition and represent common co-regulation or biological functions. The low-dimensional set of factor mixing weights is further used as features by a classifier to annotate expression patterns with functional categories. On human-curated annotations, our sparse approach reaches similar or better classification of expression patterns at different developmental stages, when compared to other automatic image annotation methods using thousands of hard-to-interpret features. Our study therefore outlines a general framework for large microscopy data sets, in which both the generative model itself, as well as its application for analysis tasks such as automated annotation, can provide insight into biological questions.
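The factorization-then-classification pipeline described in the abstract can be illustrated, very loosely, with off-the-shelf tools. The following Python snippet is a minimal sketch only: it uses scikit-learn's SparsePCA as a point-estimate stand-in for the paper's sparse Bayesian factor model (which places sparsity priors on the factors and infers them probabilistically), and the image matrix, number of factors, and penalty value are invented for illustration.

# Hypothetical sketch: sparse factorization of vectorized expression images.
# SparsePCA is a frequentist surrogate, not the paper's Bayesian model;
# all data and parameter values below are placeholders.
import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(0)
images = rng.random((200, 1024))      # stands in for registered, vectorized in situ images

n_factors = 20                        # "small number of hidden common factors"
fa = SparsePCA(n_components=n_factors, alpha=1.0, random_state=0)
weights = fa.fit_transform(images)    # (n_images, n_factors) factor mixing weights
factors = fa.components_              # (n_factors, n_pixels) sparse spatial factors

# 'weights' is the low-dimensional feature set a classifier would use to
# predict annotation terms, as described in the abstract.

In this surrogate the sparsity falls on the spatial components; where exactly the sparsity prior sits in the authors' model is fixed by their formulation, not by this sketch.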

pcbi-1002098-g008: SMLR and SVM comparison on (A) data set  and (B) data set : the AUC of individual annotation terms from the time window of developmental stages 4–6. (A) We consider two different scenarios: using the factors corresponding to the highest resolution,  (SVM- and SMLR-), or using the entire set of factors available (SVM- and SMLR-). The last  annotation terms correspond to  genes or fewer, too few to allow for a strong statistical evaluation (shaded area). (B) We consider two different scenarios: using the factors corresponding to the highest resolution,  (SVM- and SMLR-), or using the entire set of factors available (SVM- and SMLR-). The last  annotation terms correspond to  genes or fewer, and results are less reliable due to the stronger variance and the impact of individual samples on the results (shaded area).

Mentions: To evaluate the success of annotation prediction, we computed AUC values achieved by the SMLR framework on data set  using LOO-CV (Figure 8A). To assess the influence of a particular classifier, we compared the SMLR results to those achieved by polynomial SVMs. The AUC value for each annotation term was computed using majority voting across all genes (see ‘Materials and Methods’). We see that on average, the annotation process reached similar performance with both classifiers, above  across all terms (exceptions are the ‘pole cell’ and ‘ventral ectoderm anlage’ annotation terms; the lower ‘pole cell’ performance can be explained by the fact that these germline precursor cells migrate and may have little overlapping spatial expression during stage ).
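The evaluation protocol above (per-term AUC under leave-one-out cross-validation, comparing SMLR against polynomial SVMs) can be sketched as follows. This is an assumption-laden illustration: scikit-learn does not ship SMLR, so an L1-penalized logistic regression is used as a rough surrogate, per-gene majority voting over replicate images is omitted, and the data are synthetic placeholders for the factor mixing weights.

# Hypothetical sketch of per-term AUC under leave-one-out cross-validation.
# L1-penalized logistic regression stands in for SMLR; data are placeholders.
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def loo_auc(clf, X, y):
    """Collect leave-one-out decision scores and summarise them as one AUC."""
    scores = np.empty(len(y))
    for train, test in LeaveOneOut().split(X):
        clf.fit(X[train], y[train])
        scores[test] = clf.decision_function(X[test])
    return roc_auc_score(y, scores)

rng = np.random.default_rng(0)
X = rng.random((120, 20))             # factor mixing weights (placeholder)
y = rng.integers(0, 2, size=120)      # one binary annotation term (placeholder)

auc_svm = loo_auc(SVC(kernel="poly", degree=3), X, y)
auc_lr = loo_auc(LogisticRegression(penalty="l1", solver="liblinear"), X, y)
print(f"poly-SVM AUC: {auc_svm:.3f}   sparse-LR AUC: {auc_lr:.3f}")

On real annotation data one such AUC would be computed per annotation term, which is what panels (A) and (B) of Figure 8 report.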

