Automatic annotation of spatial expression patterns via sparse Bayesian factor models.

Pruteanu-Malinici I, Mace DL, Ohler U - PLoS Comput. Biol. (2011)

Bottom Line: We apply this approach on embryonic expression patterns from a Drosophila RNA in situ image database, and show that the automatically inferred factors provide for a meaningful decomposition and represent common co-regulation or biological functions. On human-curated annotations, our sparse approach reaches similar or better classification of expression patterns at different developmental stages, when compared to other automatic image annotation methods using thousands of hard-to-interpret features. Our study therefore outlines a general framework for large microscopy data sets, in which both the generative model itself, as well as its application for analysis tasks such as automated annotation, can provide insight into biological questions.


Affiliation: Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina, United States of America.

ABSTRACT
Advances in reporters for gene expression have made it possible to document and quantify expression patterns in 2D-4D. In contrast to microarrays, which provide data for many genes but averaged and/or at low resolution, images reveal the high spatial dynamics of gene expression. Developing computational methods to compare, annotate, and model gene expression based on images is imperative, considering that available data are rapidly increasing. We have developed a sparse Bayesian factor analysis model in which the observed expression diversity among a large set of high-dimensional images is modeled by a small number of hidden common factors. We apply this approach to embryonic expression patterns from a Drosophila RNA in situ image database, and show that the automatically inferred factors provide a meaningful decomposition and represent common co-regulation or biological functions. The low-dimensional set of factor mixing weights is further used as features by a classifier to annotate expression patterns with functional categories. On human-curated annotations, our sparse approach reaches similar or better classification of expression patterns at different developmental stages, when compared to other automatic image annotation methods using thousands of hard-to-interpret features. Our study therefore outlines a general framework for large microscopy data sets, in which both the generative model itself and its application to analysis tasks such as automated annotation can provide insight into biological questions.
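
For orientation, the kind of decomposition the abstract describes can be written as a generic linear factor model. This is only a sketch: the symbols below are assumptions introduced for illustration, and the specific sparsity priors used in the paper are not reproduced here.

```latex
% Generic linear factor model (illustrative notation, not the paper's own):
%   x_n   : P-dimensional vector of pixel or grid intensities for image n
%   A     : P x K matrix whose columns are the K hidden common factors, K << P
%   s_n   : K-dimensional vector of mixing weights for image n
%   eps_n : residual noise for image n
\[
  \mathbf{x}_n = \mathbf{A}\,\mathbf{s}_n + \boldsymbol{\epsilon}_n,
  \qquad n = 1, \dots, N, \qquad K \ll P .
\]
% A sparsity-inducing prior drives many entries to zero, so that each image
% is explained by only a few factors. The weights s_n then serve as a
% low-dimensional feature vector for a downstream classifier.
```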


pcbi-1002098-g007: SMLR analysis on the estimated sBFA factors on data set , for two randomly selected annotation terms. The top row shows the SMLR mixing weights on the factors, for a regularization parameter ; the x-axis represents the FA factors: the first factors for a grid size of ×, the next factors for a grid size of × and the last factors for a grid size of ×. The bottom row contains histograms with the number of factors selected as relevant over LOO-CV trials, with a cut-off value at . Each feature appears once in the graph. The more mass concentrated at the two ends, the more consistent the classifier is in identifying relevant factors.
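
The reading of the bottom-row histograms can be made concrete with a small sketch. It assumes a list of per-factor selection counts is already available; the function names `selection_histogram` and `end_mass` are hypothetical and do not come from the paper.

```python
def selection_histogram(counts, n_trials):
    """Return hist where hist[c] is the number of factors selected exactly c times.

    counts[k] is how many of the n_trials cross-validation trials selected
    factor k, so every factor contributes to exactly one bin.
    """
    hist = [0] * (n_trials + 1)
    for c in counts:
        if not 0 <= c <= n_trials:
            raise ValueError("selection count outside [0, n_trials]")
        hist[c] += 1
    return hist


def end_mass(counts, n_trials):
    """Fraction of factors that were selected in no trial or in every trial."""
    if not counts:
        raise ValueError("counts must not be empty")
    extreme = sum(1 for c in counts if c == 0 or c == n_trials)
    return extreme / len(counts)


# Toy example with made-up counts for five factors over five trials.
toy_counts = [0, 0, 5, 5, 3]
toy_hist = selection_histogram(toy_counts, 5)
toy_consistency = end_mass(toy_counts, 5)
```

A value of `end_mass` close to 1 corresponds to a histogram with most of its mass in the first and last bins, which is the situation the caption describes as consistent factor selection.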

Mentions: We started with data set , which contained 1,231 genes annotated with a total of terms, and the SMLR classifier, which allows one to assess the importance of features for a classification task by the weights assigned to each feature. We first analyzed the SMLR weights on the entire set of features (three different resolutions with corresponding number of factors of , and leading to a combined factors), and examined the number of times factors were selected as relevant by the SMLR algorithm during leave-one-out cross-validation (LOO-CV). During cross-validation, all images corresponding to a single gene were left out and the model was trained on the remaining set of images. A few common factors were not selected as relevant by any annotation term model, which confirmed our initial belief that some factors were uninformative for at least some annotations. In addition, factor selection was strongly consistent: most factors were either always or never included. Figure 7 shows the mixing weights on the factors for two randomly selected annotation terms, as well as a histogram of the number of times each factor is selected as relevant over the entire set of trials, with a cut-off value for feature selection at . Specifically, for the ‘amnioserosa anlage in statu nascendi’ annotation term, factors were never selected while were always selected.
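
The gene-wise cross-validation and selection counting described above could be sketched as follows. This is a minimal illustration, not the authors' implementation: `fit_weights` is a hypothetical callable standing in for the SMLR classifier, and the toy data, the cut-off of 0.1, and `toy_fit_weights` (a plain difference of class means, not a sparse classifier) are assumptions chosen so the example runs.

```python
from collections import defaultdict


def leave_one_gene_out(gene_ids):
    """Yield (gene, train_idx, test_idx), holding out all images of one gene per trial."""
    by_gene = defaultdict(list)
    for idx, gene in enumerate(gene_ids):
        by_gene[gene].append(idx)
    for gene, test_idx in by_gene.items():
        held_out = set(test_idx)
        train_idx = [i for i in range(len(gene_ids)) if i not in held_out]
        yield gene, train_idx, test_idx


def selection_counts(features, labels, gene_ids, fit_weights, cutoff):
    """Count, per factor, the trials in which its absolute weight exceeds the cut-off."""
    counts = [0] * len(features[0])
    n_trials = 0
    for _gene, train_idx, _test_idx in leave_one_gene_out(gene_ids):
        weights = fit_weights([features[i] for i in train_idx],
                              [labels[i] for i in train_idx])
        for k, w in enumerate(weights):
            if abs(w) > cutoff:
                counts[k] += 1
        n_trials += 1
    return counts, n_trials


def toy_fit_weights(X, y):
    """Hypothetical stand-in for SMLR: per-factor difference of class means."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    return [sum(p[k] for p in pos) / len(pos) - sum(n[k] for n in neg) / len(neg)
            for k in range(len(X[0]))]


# Made-up data: five images from four genes, two factors per image.
gene_ids = ["g1", "g1", "g2", "g3", "g4"]
labels = [1, 1, 0, 1, 0]
features = [[1.0, 0.5], [0.9, 0.5], [0.1, 0.5], [0.8, 0.5], [0.0, 0.5]]
counts, n_trials = selection_counts(features, labels, gene_ids,
                                    toy_fit_weights, cutoff=0.1)
```

Splitting by gene rather than by image keeps every image of the held-out gene out of training, which matches the protocol in the text. In this toy data the first factor separates the classes in every trial and the second never does, so the counts fall at the two extremes.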

