Adaptive informatics for multifactorial and high-content biological data.

Millard BL, Niepel M, Menden MP, Muhlich JL, Sorger PK - Nat. Methods (2011)

Bottom Line: Here we describe an adaptive approach to managing experimental data based on semantically typed data hypercubes (SDCubes) that combine hierarchical data format 5 (HDF5) and extensible markup language (XML) file types. We demonstrate the application of SDCube-based storage using ImageRail, a software package for high-throughput microscopy. We applied ImageRail to collect and analyze drug dose-response landscapes in human cell lines at single-cell resolution.

Affiliation: Center for Cell Decision Processes, Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, USA.

ABSTRACT
Whereas genomic data are universally machine-readable, data from imaging, multiplex biochemistry, flow cytometry and other cell- and tissue-based assays usually reside in loosely organized files of poorly documented provenance. This arises because the relational databases used in genomic research are difficult to adapt to rapidly evolving experimental designs, data formats and analytic algorithms. Here we describe an adaptive approach to managing experimental data based on semantically typed data hypercubes (SDCubes) that combine hierarchical data format 5 (HDF5) and extensible markup language (XML) file types. We demonstrate the application of SDCube-based storage using ImageRail, a software package for high-throughput microscopy. Experimental design and its day-to-day evolution, not rigid standards, determine how ImageRail data are organized in SDCubes. We applied ImageRail to collect and analyze drug dose-response landscapes in human cell lines at single-cell resolution.

Figure 4: Exploring different dimensions of a multivariate drug and ligand dose-response series using SDCubes. (a) Well-mean values are computed from single-cell data recorded from cultured SKBR3 cells exposed to exogenous EGF for 10 min over a range of concentrations and then stained with antibodies specific for ppERK. Data are plotted to show a series of conventional drug dose-response relationships at different ligand concentrations (top). Inverting the axes allows the same data to be plotted as a ligand dose-response curve at different drug doses (middle). For each mean value in either plot, the underlying single-cell distribution can be visualized as a series of dot-plots (bottom panel shows gefitinib dose-response at 1 ng/mL EGF). (b) The ppERK response surface for SKBR3 cells treated as in (a) and colored according to the degree of cell-to-cell variation; darker blue represents a high coefficient of variation. (c) Whisker plots of gefitinib IC10, IC50 and IC90 values for the inhibition of ERK phosphorylation by gefitinib in SKBR3, T47D and MCF7 cells treated for 10 min with a range of EGF concentrations.

Mentions: Here, we exposed cells to EGF at 10 doses over a 10⁴ range in combination with gefitinib at 8 doses over a 10³ range using a simple adaptive design in which each 96-well plate was subjected to a different and changeable set of treatments and measurements. To enable image segmentation with a standard watershed algorithm, we treated cells with nuclear and cytoplasmic stains (Supplementary Fig. 4). The dataset comprised 160 conditions, 1.4×10⁶ individual cells and an SDCube with 2.8×10⁶ entries (data are available in the supplemental materials in SDCube and CSV formats; a 10-fold larger dataset involving more proteins is shown in Supplementary Fig. 5). By accessing different slices of the cube, we can view data as a series of IC50 curves at differing EGF concentrations ([EGF]), or as a set of EGF dose-response curves at different drug concentrations ([drug]); cell-to-cell variability can also be visualized at any point (Fig. 4a). We observed that average levels of ppERK increased with [EGF] and decreased with [gefitinib], and that the apparent IC50 was sensitive to EGF concentration, varying ~20-fold as exogenous EGF varied from 0 to 100 ng/mL (Fig. 4b). Well-average data computed from images closely matched dose-response data obtained using conventional biochemical assays (Supplementary Fig. 6). The relationship between IC50 and [EGF] varied substantially with cell type (Fig. 4c): whereas IC50 was strongly sensitive to [EGF] in SKBR3 and T47D cells, it was less so in MCF7 cells (Supplementary Fig. 7). Data exploration of this type is intuitively simple, but involves the manipulation of many data entries; because HDF5 successively loads data, there is no limit a priori to the number of entries, and ImageRail has been validated with ~10⁸–10⁹ data points.
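
The idea of viewing one dataset along different axes can be illustrated with a minimal sketch. It is not ImageRail or the SDCube format: the dose grids, the toy response model and every function name below are assumptions made for illustration, and the data are synthetic. The cube is a plain nested list indexed by EGF dose, drug dose and cell. The IC50 here is a simple stand-in, estimated as the dose at which the well mean falls to half of its value at the lowest drug dose, found by log-linear interpolation rather than by curve fitting.

```python
import math
import statistics

# Hypothetical dose grids (EGF in ng/mL, drug in uM); not the values used in the paper.
EGF_DOSES = [0.01, 0.1, 1.0, 10.0, 100.0]
DRUG_DOSES = [0.001, 0.01, 0.1, 1.0, 10.0]


def synthetic_cells(egf, drug, n=5):
    """Return n made-up single-cell intensities for one well (toy model)."""
    # Assumed toy behaviour: signal rises with EGF, falls with drug,
    # and the half-maximal drug dose shifts upward with EGF.
    ic50 = 0.05 * math.sqrt(1.0 + egf)
    mean = (egf / (egf + 1.0)) / (1.0 + drug / ic50)
    # Deterministic +/-10% spread standing in for cell-to-cell variation.
    return [mean * (0.9 + 0.2 * k / (n - 1)) for k in range(n)]


# The "cube": axis 0 = EGF dose, axis 1 = drug dose, axis 2 = single cells.
cube = [[synthetic_cells(e, d) for d in DRUG_DOSES] for e in EGF_DOSES]


def well_means(cube):
    """Collapse the single-cell axis to one mean per well."""
    return [[statistics.fmean(cells) for cells in row] for row in cube]


def drug_curve(means, egf_index):
    """Slice at a fixed EGF dose: response versus drug dose."""
    return list(means[egf_index])


def ligand_curve(means, drug_index):
    """Slice at a fixed drug dose: response versus EGF dose."""
    return [row[drug_index] for row in means]


def coefficient_of_variation(cells):
    """Population standard deviation divided by the mean for one well."""
    return statistics.pstdev(cells) / statistics.fmean(cells)


def ic50_by_interpolation(doses, responses):
    """Dose at which the response drops to half its lowest-dose value.

    Interpolates linearly in log10(dose) between the two bracketing
    points; returns None if the half-response is never crossed.
    """
    half = responses[0] / 2.0
    for i in range(1, len(doses)):
        upper, lower = responses[i - 1], responses[i]
        if upper >= half >= lower and upper != lower:
            frac = (upper - half) / (upper - lower)
            log_lo, log_hi = math.log10(doses[i - 1]), math.log10(doses[i])
            return 10 ** (log_lo + frac * (log_hi - log_lo))
    return None


means = well_means(cube)
# One apparent IC50 per EGF dose, each computed from a different slice.
ic50_per_egf = [
    ic50_by_interpolation(DRUG_DOSES, drug_curve(means, i))
    for i in range(len(EGF_DOSES))
]
```

Both kinds of curve come from the same `means` table, so switching between a drug dose-response view and a ligand dose-response view only changes which index is held fixed. The single-cell values stay available in `cube` for per-well variability measures such as `coefficient_of_variation`.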

