NeuroTransDB: highly curated and structured transcriptomic metadata for neurodegenerative diseases.

Bagewadi S, Adhikari S, Dhrangadhariya A, Irin AK, Ebeling C, Namasivayam AA, Page M, Hofmann-Apitius M, Senger P - Database (Oxford) (2015)

Bottom Line: Much of the information needed to complete or refine meta-annotations is distributed across the associated publications. Curated metadata for Alzheimer's disease gene expression studies are available for download. Database URL: www.scai.fraunhofer.de/NeuroTransDB.html.


Affiliation: Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, 53754 Sankt Augustin, Germany, Rheinische Friedrich-Wilhelms-Universitaet Bonn, Bonn-Aachen International Center for Information Technology, 53113, Bonn, Germany, shweta.bagewadi@scai.fraunhofer.de.


Figure 7. Coverage of basic metadata annotation fields for human AD priority 1 samples with automated retrieval and manual curation. Automated retrieval involved programmatically downloading the metadata from ArrayExpress and GEO. For missing meta-annotations, we applied a manual curation step to harvest information from the published articles and their associated Supplementary materials. These statistics show that manual curation accuracy for basic annotations, such as the patient’s clinical manifestations and raw file information, is highly dependent on data availability.

Mentions: The underlying metadata for gene expression studies have been underrepresented and are therefore largely under-utilized. For large-scale analysis, the associated annotations are of utmost importance. With detailed annotation information available, one can select studies that focus on a particular attribute, such as stage or gender. Each priority class has a specific set of fields for curation; some fields are organism dependent. After prioritization of experiments (cf. Experiment Prioritization section), we expect to have ∼100% coverage of essential clinical and relational parameters during manual metadata curation for priority 1 studies. For example, age, gender, phenotype and stage are basic experimental variables for human studies. Additionally, in the case of animal models, mouse and rat strain names are important for translational pipelines, as some strains are highly specific models for human NDD while others are not (38). Irrespective of the organism, samples mapped to their corresponding raw file identifiers are vital for running large-scale analysis. However, as shown in Figure 7, this expectation does not hold for human studies. Figure 7 shows that even after thorough curation, we cannot achieve 100% coverage for these five basic metadata fields, largely because of patient data privacy regulations. The same is true for mouse and rat information (see Supplementary Figure S6). Moreover, information related to animal models is much more scarce, which obstructs automated retrieval. Hence, manual curation accuracy is highly dependent on information availability, as curators cannot harvest information for annotation fields that are not available. In addition, the level of detail depends on the type or aim of the experiment carried out.
Authors and database owners need to focus on the qualitative aspects of the experimental information, especially the phenotype of the sample, and describe them in standard prose that gives even beginners normalized access, in order to support robust computational analysis across all studies in ArrayExpress and GEO.
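The coverage statistic discussed above can be illustrated with a minimal sketch. It is not the authors' implementation: the field names, the record layout (one dictionary per sample) and the helper names `field_coverage` and `merge_curation` are assumptions made for illustration. Coverage is taken here to mean the percentage of samples with a non-empty value for a field, and manual curation is modelled as filling only the gaps left by automated retrieval.

```python
from typing import Dict, List, Optional

# Hypothetical field names: the text names age, gender, phenotype and stage,
# plus the sample-to-raw-file mapping, but not the exact column labels.
BASIC_FIELDS = ["age", "gender", "phenotype", "stage", "raw_file_id"]

Sample = Dict[str, Optional[str]]


def field_coverage(samples: List[Sample],
                   fields: List[str] = BASIC_FIELDS) -> Dict[str, float]:
    """Return, per field, the percentage of samples with a non-empty value."""
    if not samples:
        return {field: 0.0 for field in fields}
    coverage = {}
    for field in fields:
        # A value counts as present only if it is neither None nor blank.
        filled = sum(1 for s in samples if (s.get(field) or "").strip())
        coverage[field] = 100.0 * filled / len(samples)
    return coverage


def merge_curation(automated: Sample, curated: Sample) -> Sample:
    """Fill gaps in an automatically retrieved record with curated values.

    Values already present in the automated record are kept unchanged.
    """
    merged = dict(automated)
    for field, value in curated.items():
        if not (merged.get(field) or "").strip() and value:
            merged[field] = value
    return merged
```

Calling `field_coverage` once on the automatically retrieved records and once on the merged records gives the two values per field that a comparison such as Figure 7 requires. A field that no source reports stays at 0% after merging, which is the dependence on information availability described above.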

