Traits and types of health data repositories.

Wade TD - Health Inf Sci Syst (2014)

Bottom Line: One imagined ideal structure for research progress has been called an "Information Commons". It would have longitudinal, multi-leveled (environmental through molecular) data on a large population of identified, consenting individuals. These are qualities whose achievement would require long-term commitment on the part of many data donors, including a willingness to make their data public.


Affiliation: Division of Biostatistics and Bioinformatics, National Jewish Health, Denver, CO 80206-2761 USA.

ABSTRACT
We review traits of reusable clinical data and offer a typology of clinical repositories with a range of known examples. Sources of clinical data suitable for research can be classified into types reflecting the data's institutional origin, original purpose, level of integration and governance. Primary data nearly always come from research studies and electronic medical records. Registries collect data on focused populations primarily to track outcomes, often using observational research methods. Warehouses are institutional information utilities repackaging clinical care data. Collections organize data from more organizations than a data warehouse, and more original data sources than a registry. Therefore even if they are heavily curated, their level of internal integration, and thus ease of use, can be less than other types. Federations are like collections except that physical control over data is distributed among donor organizations. Federations sometimes federate, giving a second level of organization. While the size, in number of patients, varies widely within each type of data source, populations over 10 K are relatively numerous, and much larger populations can be seen in warehouses and federations. One imagined ideal structure for research progress has been called an "Information Commons". It would have longitudinal, multi-leveled (environmental through molecular) data on a large population of identified, consenting individuals. These are qualities whose achievement would require long term commitment on the part of many data donors, including a willingness to make their data public.




Fig. 1: Biomedical repository types and sizes. Each type has exemplars with size or range of sizes shown as the log10 of the number of distinct patients represented. When a cell has a number, it is the coefficient multiplying that column's power of ten: e.g., a 2.7 in the 2 column means 2.7 × 10^2. A filled cell with no number is either part of a known range, or part of a range estimated to an order of magnitude. Generations from source refers to generations of modification of data or access methods, where the original source data is generation 1. Types and exemplars are discussed in the text. Specific exemplars only appear if data for estimating their size are available.
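The cell encoding described in the caption can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not code from the article; the function names `cell_to_patients` and `patients_to_cell` are hypothetical.

```python
import math


def cell_to_patients(coefficient: float, column_exponent: int) -> float:
    """Convert a figure cell (a coefficient in a power-of-ten column) to a patient count."""
    return coefficient * 10 ** column_exponent


def patients_to_cell(n_patients: float) -> tuple[int, float]:
    """Return (column_exponent, coefficient) for a positive patient count."""
    if n_patients <= 0:
        raise ValueError("patient count must be positive")
    # The column is the integer part of log10; the coefficient is what remains.
    column_exponent = math.floor(math.log10(n_patients))
    coefficient = n_patients / 10 ** column_exponent
    return column_exponent, coefficient


# The caption's own example: a 2.7 in the 2 column.
print(cell_to_patients(2.7, 2))

# Going the other way for the same count.
column, coefficient = patients_to_cell(270)
print(column, coefficient)
```

Reading the figure this way makes it easier to compare exemplars that sit in different columns, since each cell maps back to an approximate count of distinct patients.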

Mentions: Descriptions of the sources will refer to repository traits (Table 1) that make them more or less useful and available for research. The first two traits are quantitative ones that we use later (Figure 1) to compare all the repository types. The first trait is the number of patients or research subjects observed. Users of our own data warehouse have made it clear that the number of potential patients is a primary concern for researchers when judging whether a data source is useful. Though in Figure 1 these numbers vary across eight orders of magnitude, they still fail to capture much of the variability of data volume because the numbers of observations made on each patient also vary widely. For strictly clinical databases, the number of observations per patient might vary from the 10s to the low 1000s. For biomolecular data, typical data points per patient are often in the 10 Ks to 100 Ks, but the full range is much wider: from a low end of highly focused assays to a high end of sequencing entire genomes. Even being conservative, the number of data objects one might find in a re-usable clinical database right now could vary from roughly 10^2 for a clinical pilot study to 10^11 (our estimate for dbGaP [11]), a staggering range.
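As a rough sketch of the arithmetic in this passage, total data volume can be approximated as patients times observations per patient. The helper names below are hypothetical, and the pilot-study numbers (10 patients with 10 observations each) are an assumption chosen only to reach the order of magnitude the text cites for a clinical pilot study; the upper endpoint is the text's own estimate and is not decomposed here.

```python
import math


def estimate_data_objects(n_patients: int, obs_per_patient: int) -> int:
    """Approximate the number of data objects as patients x observations per patient."""
    return n_patients * obs_per_patient


def orders_of_magnitude(low: float, high: float) -> float:
    """Number of powers of ten separating two positive quantities."""
    return math.log10(high) - math.log10(low)


# Assumed, illustrative pilot study: 10 patients, 10 observations each.
pilot = estimate_data_objects(10, 10)
print(pilot)

# Span between the two endpoints quoted in the text (roughly 10^2 and 10^11).
span = orders_of_magnitude(10**2, 10**11)
print(span)
```

The point of the comparison is that patient counts alone understate the spread: the per-patient observation count multiplies it.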

