Adaptive informatics for multifactorial and high-content biological data.

Millard BL, Niepel M, Menden MP, Muhlich JL, Sorger PK - Nat. Methods (2011)

Bottom Line: Here we describe an adaptive approach to managing experimental data based on semantically typed data hypercubes (SDCubes) that combine hierarchical data format 5 (HDF5) and extensible markup language (XML) file types. We demonstrate the application of SDCube-based storage using ImageRail, a software package for high-throughput microscopy. We applied ImageRail to collect and analyze drug dose-response landscapes in human cell lines at single-cell resolution.


Affiliation: Center for Cell Decision Processes, Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, USA.

ABSTRACT
Whereas genomic data are universally machine-readable, data from imaging, multiplex biochemistry, flow cytometry and other cell- and tissue-based assays usually reside in loosely organized files of poorly documented provenance. This arises because the relational databases used in genomic research are difficult to adapt to rapidly evolving experimental designs, data formats and analytic algorithms. Here we describe an adaptive approach to managing experimental data based on semantically typed data hypercubes (SDCubes) that combine hierarchical data format 5 (HDF5) and extensible markup language (XML) file types. We demonstrate the application of SDCube-based storage using ImageRail, a software package for high-throughput microscopy. Experimental design and its day-to-day evolution, not rigid standards, determine how ImageRail data are organized in SDCubes. We applied ImageRail to collect and analyze drug dose-response landscapes in human cell lines at single-cell resolution.
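The abstract describes SDCubes as numeric data stored in HDF5 and paired with XML that gives each dimension its meaning. The sketch below is a minimal, standard-library-only illustration of that pairing, not the SDCube format itself. A flat Python list stands in for an HDF5 dataset, and the XML element names (`cube`, `dimension`, `level`), the function names and the example axes (cell line, drug, dose) are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical axes and levels for a small dose-response cube (illustration only).
DIMENSIONS = {
    "cell_line": ["A", "B"],
    "drug": ["drug1", "drug2"],
    "dose_uM": ["0.1", "1", "10"],
}

def build_annotation(dims):
    """Describe each axis of the cube, in storage order, as an XML element."""
    root = ET.Element("cube")
    for axis, (name, levels) in enumerate(dims.items()):
        dim = ET.SubElement(root, "dimension", name=name, axis=str(axis))
        for idx, level in enumerate(levels):
            ET.SubElement(dim, "level", index=str(idx)).text = level
    return root

def index_of(annotation, **coords):
    """Map semantic coordinates to a row-major offset using only the XML annotation."""
    offset = 0
    for dim in annotation.findall("dimension"):
        levels = [lv.text for lv in dim.findall("level")]
        offset = offset * len(levels) + levels.index(coords[dim.get("name")])
    return offset

annotation = build_annotation(DIMENSIONS)
size = 1
for levels in DIMENSIONS.values():
    size *= len(levels)
# A flat list stands in for the numeric HDF5 dataset; values are placeholders.
values = [0.0] * size
values[index_of(annotation, cell_line="B", drug="drug1", dose_uM="10")] = 0.42
print(index_of(annotation, cell_line="B", drug="drug1", dose_uM="10"))  # → 8
```

Because the lookup reads axis order and level names from the annotation rather than from hard-coded indices, adding a dose or a cell line in this sketch changes only the XML and the array size, which loosely mirrors the adaptability to evolving experimental designs that the abstract emphasizes.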



Figure 3: Annotated and simplified screen shots from ImageRail software (also see Supplementary Note 2). (a) (1) General experiment metadata and (2) computable information derived from image analysis across perturbations and measurements are associated with (3) selected wells of a microtiter plate. (4) White document icons represent the number of image fields that have single-cell data stored in the HDF5 file available for analysis, and (5) numbers represent imaged fields and wavelengths. (b) Dynamic linking of extracted data to the source images shows which cells gave rise to which measurements and is implemented using an image viewer and scatter plot (red box in c). (c) Data visualization includes single-cell scatter plots with flow cytometry-style gating (left) and plate heat maps of population averages along with a representation of the underlying single-cell distributions (right). (d) Results of image segmentation can be stored in different ways, including centroid, outline and bounding box. Scale bar, 100 µm.
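Panel (d) notes that a segmentation result can be stored as a centroid, an outline or a bounding box rather than as the full mask. As a rough illustration of how much a per-cell mask shrinks under two of those options, the sketch below reduces one labeled cell in a 2D label image to its centroid and bounding box. The label-image representation, the function name `summarize_mask` and the toy data are assumptions for illustration, not ImageRail's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BoundingBox:
    min_row: int
    min_col: int
    max_row: int  # inclusive
    max_col: int  # inclusive

def summarize_mask(label_image, label):
    """Reduce one labeled cell in a 2D label image to its centroid and bounding box."""
    pixels = [(r, c) for r, row in enumerate(label_image)
              for c, value in enumerate(row) if value == label]
    if not pixels:
        raise ValueError(f"label {label} not present in image")
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    # Centroid is the mean pixel position (row, col); bbox is the inclusive extent.
    centroid = (sum(rows) / len(rows), sum(cols) / len(cols))
    bbox = BoundingBox(min(rows), min(cols), max(rows), max(cols))
    return centroid, bbox

# Toy 4x5 label image: 0 is background, 1 and 2 are two segmented cells.
labels = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 0, 2],
    [0, 0, 0, 0, 2],
]
print(summarize_mask(labels, 2))
```

Each summary is a fixed handful of numbers per cell regardless of cell size, whereas a pixel-level mask grows with cell area; an outline would sit between the two.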

Mentions: ImageRail is a standalone program for high-throughput image analysis that creates and manipulates SDCubes and serves as a test of the concepts outlined above. ImageRail has four software components. First, formatting tools create and modify SDCubes so that the Children group is formatted to create a five-level data hierarchy comprising project, plate, well, (image) field and cell and (cellular) compartment (conforming to the entity-relationship model in Figs. 1b and 2e). Drop-down lists and a GUI for highlighting wells make it possible to specify which experimental conditions map to which wells, thereby specifying the experimental design and SDCube dimensionality and creating XML annotation (Fig. 3a). Second, image analysis tools create and store segmentation masks based on standard algorithms for cell monolayers, which can be extended using existing software such as ImageJ (ref. 13; Fig. 3b). Third, data viewers display raw data and computed features as images, line plots, histograms, scatter plots and multi-well plate views. Scatter plotting includes multi-dimensional gating similar to that used for analysis of flow cytometry data (Fig. 3c). Finally, embedded routines enable dynamic linking of data points to specific image features. Dynamic linking allows users to highlight cells in an image that correspond to selected data points in a scatter plot (Fig. 3b,c), facilitating the identification of outliers and experimental artifacts such as bubbles, tissue culture debris or edge effects (Supplementary Fig. 3). Users choose the level of detail at which to store the link between segmentation and data; at one extreme, pixel-by-pixel information can be stored, but we generally find it more useful to store either the centroid of each cell or a bounding box (Fig. 3d).
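The combination of hierarchical cell addresses, gating and dynamic linking described above can be sketched in a few lines. In this minimal illustration, each cell is keyed by its plate, well, field and cell index; a two-feature rectangular gate (a simplification of the multi-dimensional gating mentioned in the text) selects cells; and the selection is linked back to stored centroids so the matching cells in one image field could be highlighted. The class `CellKey`, the feature names and all values are hypothetical and are not taken from ImageRail.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CellKey:
    """Hypothetical address of one cell in the plate/well/field/cell hierarchy."""
    plate: str
    well: str
    field: int
    cell: int

# Hypothetical per-cell features and stored centroids (row, col) in each field image.
features = {
    CellKey("P1", "A01", 0, 0): {"dna_intensity": 120.0, "marker_intensity": 15.0},
    CellKey("P1", "A01", 0, 1): {"dna_intensity": 300.0, "marker_intensity": 80.0},
    CellKey("P1", "A01", 1, 0): {"dna_intensity": 310.0, "marker_intensity": 95.0},
    CellKey("P1", "B02", 0, 0): {"dna_intensity": 290.0, "marker_intensity": 10.0},
}
centroids = {
    CellKey("P1", "A01", 0, 0): (12.0, 40.5),
    CellKey("P1", "A01", 0, 1): (88.0, 17.0),
    CellKey("P1", "A01", 1, 0): (50.0, 50.0),
    CellKey("P1", "B02", 0, 0): (5.0, 9.0),
}

def rectangular_gate(data, x, y, x_range, y_range):
    """Return keys of cells whose (x, y) feature values fall inside a rectangular gate."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    return [key for key, f in data.items()
            if x_lo <= f[x] <= x_hi and y_lo <= f[y] <= y_hi]

def cells_to_highlight(selected, plate, well, field):
    """Link gated data points back to stored centroids within one image field."""
    return [centroids[k] for k in selected
            if (k.plate, k.well, k.field) == (plate, well, field)]

gated = rectangular_gate(features, "dna_intensity", "marker_intensity",
                         (250, 400), (50, 200))
print(cells_to_highlight(gated, "P1", "A01", 0))  # → [(88.0, 17.0)]
```

Storing only a centroid per cell, as the text recommends, is enough for this kind of link: the viewer needs a position to mark, not the full mask.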