The database for aggregate analysis of ClinicalTrials.gov (AACT) and subsequent regrouping by clinical specialty.

Tasneem A, Aberle L, Ananth H, Chakraborty S, Chiswell K, McCourt BJ, Pietrobon R - PLoS ONE (2012)

Bottom Line: Clinical specialists reviewed and annotated MeSH and non-MeSH disease condition terms, and an algorithm was created to classify studies into clinical specialties based on both MeSH and non-MeSH annotations. False positives and false negatives were evaluated by comparing algorithmic classification with manual classification for three specialties. The resulting AACT database features study design attributes parsed into discrete fields, integrated metadata, and an integrated MeSH thesaurus, and is available for download as Oracle extracts (.dmp file and text format).


Affiliation: Duke Clinical Research Institute, Durham, North Carolina, United States of America. asba.tasneem@duke.edu

ABSTRACT

Background: The ClinicalTrials.gov registry provides information regarding characteristics of past, current, and planned clinical studies to patients, clinicians, and researchers; in addition, registry data are available for bulk download. However, issues related to data structure, nomenclature, and changes in data collection over time present challenges to the aggregate analysis and interpretation of these data in general and to the analysis of trials according to clinical specialty in particular. Improving usability of these data could enhance the utility of ClinicalTrials.gov as a research resource.

Methods/principal results: The purpose of our project was twofold. First, we sought to extend the usability of ClinicalTrials.gov for research purposes by developing a database for aggregate analysis of ClinicalTrials.gov (AACT) that contains data from the 96,346 clinical trials registered as of September 27, 2010. Second, we developed and validated a methodology for annotating studies by clinical specialty, using a custom taxonomy employing Medical Subject Heading (MeSH) terms applied by an NLM algorithm, as well as MeSH terms and other disease condition terms provided by study sponsors. Clinical specialists reviewed and annotated MeSH and non-MeSH disease condition terms, and an algorithm was created to classify studies into clinical specialties based on both MeSH and non-MeSH annotations. False positives and false negatives were evaluated by comparing algorithmic classification with manual classification for three specialties.

Conclusions/significance: The resulting AACT database features study design attributes parsed into discrete fields, integrated metadata, and an integrated MeSH thesaurus, and is available for download as Oracle extracts (.dmp file and text format). This publicly-accessible dataset will facilitate analysis of studies and permit detailed characterization and analysis of the U.S. clinical trials enterprise as a whole. In addition, the methodology we present for creating specialty datasets may facilitate other efforts to analyze studies by specialty groups.
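The classification and validation steps described in the abstract can be illustrated with a minimal sketch. It assumes a simple rule (a study belongs to a specialty if any of its condition terms was annotated with that specialty), which may differ from the authors' actual algorithm. The term annotations, study identifiers, specialty names, and function names below are all hypothetical and exist only for illustration.

```python
# Hypothetical annotations: condition term -> specialties a reviewer tagged it with.
# These are illustrative values, not the annotations used in AACT.
TERM_ANNOTATIONS = {
    "myocardial infarction": {"cardiology"},
    "heart failure": {"cardiology"},
    "breast neoplasms": {"oncology"},
    "depression": {"mental health"},
}


def classify_study(terms, annotations=TERM_ANNOTATIONS):
    """Return every specialty attached to at least one of the study's terms."""
    specialties = set()
    for term in terms:
        specialties |= annotations.get(term.lower(), set())
    return specialties


def count_errors(studies, manual_labels, specialty):
    """Compare algorithmic and manual classification for one specialty.

    Returns (false_positives, false_negatives).
    """
    false_positives = 0
    false_negatives = 0
    for study_id, terms in studies.items():
        predicted = specialty in classify_study(terms)
        actual = specialty in manual_labels.get(study_id, set())
        if predicted and not actual:
            false_positives += 1
        elif actual and not predicted:
            false_negatives += 1
    return false_positives, false_negatives


# Hypothetical studies and manual (reviewer) classifications.
studies = {
    "S1": ["Heart Failure"],
    "S2": ["Breast Neoplasms", "Depression"],
    "S3": ["Chest Pain"],  # term with no annotation, so the rule misses it
}
manual = {"S1": {"cardiology"}, "S2": {"oncology"}, "S3": {"cardiology"}}

cardiology_errors = count_errors(studies, manual, "cardiology")
mental_health_errors = count_errors(studies, manual, "mental health")
print(cardiology_errors, mental_health_errors)
```

In this toy data, the unannotated term in S3 produces a false negative for cardiology, and the secondary term in S2 produces a false positive for mental health, which are the two kinds of error the manual comparison is meant to surface.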


Figure 2 (pone-0033677-g002): High-level Entity-Relationship Diagram (ERD) for AACT.

Mentions: ClinicalTrials.gov data element definitions, xsd specifications for registry data submission, and downloaded study XML files were used to represent data specifications for the downloaded data. A physical data model was designed using Enterprise Architect (Sparx Systems Pty Ltd, Creswick, Victoria, Australia); this model depicted data tables and their data columns, as well as relationships between and among tables. An optimal structure was achieved through normalization, which was used to organize data efficiently, eliminate redundancy, and ensure logical data dependencies by storing only related data within a given table [11]. The database (Figure 2) was normalized to the Second Normal Form (2NF), a set of criteria designed to prevent logical inconsistencies while reducing data redundancy [12].
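As a rough illustration of what normalizing to 2NF involves, the sketch below splits a flat record into two tables so that no non-key column depends on only part of a composite key. The table names, column names, and sample rows are hypothetical and are not taken from the AACT schema; SQLite is used here only for convenience, whereas AACT itself is distributed as Oracle extracts.

```python
import sqlite3

# Hypothetical flat rows: (nct_id, brief_title, condition).
# brief_title depends only on nct_id, not on the full (nct_id, condition) key.
# That partial dependency is what 2NF removes.
FLAT_ROWS = [
    ("NCT00000001", "Example Study A", "Hypertension"),
    ("NCT00000001", "Example Study A", "Diabetes Mellitus"),
    ("NCT00000002", "Example Study B", "Asthma"),
]


def build_normalized_db(rows):
    """Load flat rows into two tables: one row per study, one per study-condition pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE studies ("
        " nct_id TEXT PRIMARY KEY,"
        " brief_title TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE conditions ("
        " nct_id TEXT NOT NULL REFERENCES studies(nct_id),"
        " condition TEXT NOT NULL,"
        " PRIMARY KEY (nct_id, condition))"
    )
    for nct_id, title, condition in rows:
        # The study title is stored once, however many conditions the study lists.
        conn.execute("INSERT OR IGNORE INTO studies VALUES (?, ?)", (nct_id, title))
        conn.execute("INSERT OR IGNORE INTO conditions VALUES (?, ?)", (nct_id, condition))
    conn.commit()
    return conn


conn = build_normalized_db(FLAT_ROWS)
study_count = conn.execute("SELECT COUNT(*) FROM studies").fetchone()[0]
condition_count = conn.execute("SELECT COUNT(*) FROM conditions").fetchone()[0]
print(study_count, condition_count)
```

Because each title is stored in exactly one row, correcting it cannot leave inconsistent copies behind, which is the kind of logical inconsistency 2NF is meant to prevent.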

