Adaptive informatics for multifactorial and high-content biological data.

Millard BL, Niepel M, Menden MP, Muhlich JL, Sorger PK - Nat. Methods (2011)

Bottom Line: Here we describe an adaptive approach to managing experimental data based on semantically typed data hypercubes (SDCubes) that combine hierarchical data format 5 (HDF5) and extensible markup language (XML) file types. We demonstrate the application of SDCube-based storage using ImageRail, a software package for high-throughput microscopy. Experimental design and its day-to-day evolution, not rigid standards, determine how ImageRail data are organized in SDCubes.


Affiliation: Center for Cell Decision Processes, Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, USA.

ABSTRACT
Whereas genomic data are universally machine-readable, data from imaging, multiplex biochemistry, flow cytometry and other cell- and tissue-based assays usually reside in loosely organized files of poorly documented provenance. This arises because the relational databases used in genomic research are difficult to adapt to rapidly evolving experimental designs, data formats and analytic algorithms. Here we describe an adaptive approach to managing experimental data based on semantically typed data hypercubes (SDCubes) that combine hierarchical data format 5 (HDF5) and extensible markup language (XML) file types. We demonstrate the application of SDCube-based storage using ImageRail, a software package for high-throughput microscopy. Experimental design and its day-to-day evolution, not rigid standards, determine how ImageRail data are organized in SDCubes. We applied ImageRail to collect and analyze drug dose-response landscapes in human cell lines at single-cell resolution.



Figure 2: SDCubes are built from a collection of linked data modules that can encode diverse experimental data with varying requirements. (a) The XML metadata maps the experimental sampling procedure onto the HDF5 data space; data from each cell are represented by colored boxes and different numbers of cells are collected for each condition. (b) The SDCube data module is composed of four HDF5 groups, each storing a different type of data. (c) The Children group in each module can contain additional data modules, generating an arbitrarily complex data tree. (d) A previously defined SDCube can be modified to append a new piece of data to the end of an existing series (orange), insert data into the middle of a series (blue) or add a new type of data that requires addition of a new dimension (red) (in this case, use of lapatinib rather than gefitinib). All three operations are performed by modifying the XML file while recording the data in the appropriate place in the HDF5 file hierarchy. (e) ImageRail uses a five-level SDCube encoding high-throughput fixed-cell imaging data and progressively increasing levels of detail (project, well, field, cell and compartment).
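Panel (d) can be illustrated with a minimal sketch of how append, insert and add-dimension operations might be expressed purely as edits to an XML index while stored data stay where they are. This is not the SDCube Programming Library's API or file layout: the element names (`sdcube`, `axis`, `entry`), the `slot` attribute, the helper functions and the dose labels are all hypothetical, and the idea that each label points to a fixed storage slot is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET


def new_axis(root, name, labels):
    """Add an axis; each label gets a storage slot in creation order."""
    axis = ET.SubElement(root, "axis", name=name)
    for slot, label in enumerate(labels):
        ET.SubElement(axis, "entry", label=label, slot=str(slot))
    return axis


def append_label(axis, label):
    """Append: the new entry goes last and takes the next free slot."""
    ET.SubElement(axis, "entry", label=label, slot=str(len(axis)))


def insert_label(axis, position, label):
    """Insert: the logical order changes, but existing slots are untouched;
    the new entry simply takes the next free slot."""
    entry = ET.Element("entry", label=label, slot=str(len(axis)))
    axis.insert(position, entry)


def slots_in_order(axis):
    """Return (label, slot) pairs in logical (XML) order."""
    return [(e.get("label"), int(e.get("slot"))) for e in axis]


# Hypothetical cube with one dose axis (labels are illustrative only).
cube = ET.Element("sdcube")
dose = new_axis(cube, "dose_uM", ["0.1", "1", "10"])

append_label(dose, "100")     # append to the end of the series
insert_label(dose, 1, "0.3")  # insert into the middle of the series
new_axis(cube, "drug", ["gefitinib", "lapatinib"])  # add a new dimension

print(ET.tostring(cube, encoding="unicode"))
```

In this sketch the logical order of a series lives only in the XML, so inserting a value in the middle never requires rewriting data that have already been stored.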

Mentions: In this paper we propose a potential solution to the challenge of managing high-dimensionality biomedical data based on the use of semantically typed data hypercubes (SDCubes) in which binary data are stored in Hierarchical Data Format 5 (HDF5; http://www.hdfgroup.org/HDF5/) and metadata and data ontologies are stored in Extensible Markup Language (XML; http://www.w3.org/standards/xml). We have created a new open-source Java library, the SDCube Programming Library (Supplementary Software 1, http://www.semanticbiology.com/software/sdcube), that can create SDCubes with appropriate dimensionality, encode the data model in a machine-readable XML ontology, and reformat SDCubes as needed when experiments change (Fig. 2a). To illustrate the use of SDCubes, we have created a second program, ImageRail (Supplementary Software 2, http://www.semanticbiology.com/software/imagerail), for high-content microscopy that (i) segments images of cells grown in 96- and 384-well plates to extract features such as cell shape or nuclear fluorescence, (ii) stores experimental metadata and results of image analysis in SDCubes, (iii) computes sets of cellular features from the image (e.g., fluorescence and localization metrics), and (iv) displays metadata, images and analysis in various formats (ref. 9). By using SDCubes, ImageRail is able to organize data according to the design of an experiment and its day-to-day evolution rather than an inflexible, predetermined schema. We use these tools to characterize the responses of tumor cells to therapeutic small molecules and show that the apparent IC50 for receptor inhibitors varies with ligand dose, that cell-to-cell variability is maximal as ligands and drugs approach concentrations likely to be encountered in vivo, and that variance impacts the shape of dose-response curves. Our results suggest that monitoring variance will be broadly useful in pre-clinical pharmacology. Moreover, because flow cytometry and multiplex biochemistry have similar workflows to imaging (refs. 4, 10), ImageRail and the SDCube Programming Library represent starting points for managing diverse experimental data.
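As a rough illustration of what monitoring cell-to-cell variability per condition could look like, the sketch below summarizes single-cell measurements when each condition has a different number of cells. It is not ImageRail's analysis code. The `summarize` function, the use of the coefficient of variation (population standard deviation divided by the mean) as the variability measure, and the intensity values are assumptions made up for this example.

```python
from statistics import mean, pstdev


def summarize(cells_by_condition):
    """Return {condition: (n_cells, mean, cv)} for ragged single-cell data.

    cv is the coefficient of variation: population standard deviation
    divided by the mean (NaN when the mean is zero).
    """
    summary = {}
    for condition, values in cells_by_condition.items():
        m = mean(values)
        cv = pstdev(values) / m if m else float("nan")
        summary[condition] = (len(values), m, cv)
    return summary


# Made-up single-cell intensities keyed by (drug, dose); illustration only.
cells = {
    ("gefitinib", 0.1): [10.0, 12.0, 14.0],
    ("gefitinib", 1.0): [4.0, 8.0, 6.0, 6.0],
}

for condition, (n, m, cv) in summarize(cells).items():
    print(condition, n, round(m, 2), round(cv, 3))
```

Keeping the cell count alongside the mean and the variability measure matters here because different numbers of cells are collected for each condition.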

