Database citation in supplementary data linked to Europe PubMed Central full text biomedical articles.

Kafkas Ş, Kim JH, Pi X, McEntyre JR - J Biomed Semantics (2015)

Bottom Line: In this study, we present an analysis of data citation practices in full text research articles and their corresponding supplementary data files, made available in the Open Access set of articles from Europe PubMed Central. Moreover, citation of value-added databases derived from submission databases (such as Pfam, UniProt or Ensembl) is common, demonstrating the reuse of these resources as datasets in themselves. These observations highlight the need to improve the management of supplemental data in general, in order to make this information more discoverable and useful.

Affiliation: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD UK.

ABSTRACT

Background: In this study, we present an analysis of data citation practices in full text research articles and their corresponding supplementary data files, made available in the Open Access set of articles from Europe PubMed Central. Our aim is to investigate whether supplementary data files should be considered as a source of information for integrating the literature with biomolecular databases.

Results: Using text-mining methods to identify and extract a variety of core biological database accession numbers, we found that the supplemental data files contain many more database citations than the body of the article, and that those citations often take the form of a relatively small number of articles citing large collections of accession numbers in text-based files. Moreover, citation of value-added databases derived from submission databases (such as Pfam, UniProt or Ensembl) is common, demonstrating the reuse of these resources as datasets in themselves. All the database accession numbers extracted from the supplementary data are publicly accessible from http://dx.doi.org/10.5281/zenodo.11771.

Conclusions: Our study suggests that supplementary data should be considered when linking articles with data, in curation pipelines, and in information retrieval tasks in order to make full use of the entire research article. These observations highlight the need to improve the management of supplemental data in general, in order to make this information more discoverable and useful.
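
The text-mining step mentioned in the Results (identifying database accession numbers in article text or supplementary files) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `extract_accessions` and the three regular expressions are assumptions chosen for illustration, covering only the UniProt, Pfam and Ensembl identifier shapes named in the abstract. A production system would likely need context checks to reject false positives.

```python
import re

# Illustrative patterns only (assumptions, not the patterns used in the study).
ACCESSION_PATTERNS = {
    # UniProt accession shape, e.g. P04637 or A0A024R161
    "UniProt": re.compile(
        r"\b(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\b"
    ),
    # Pfam family accession, e.g. PF00870
    "Pfam": re.compile(r"\bPF\d{5}\b"),
    # Ensembl gene/transcript/protein stable ID, e.g. ENSG00000141510
    "Ensembl": re.compile(r"\bENS[A-Z]*[GTP]\d{11}\b"),
}


def extract_accessions(text):
    """Return the unique accession-like strings found in text, keyed by database."""
    return {
        database: sorted(set(pattern.findall(text)))
        for database, pattern in ACCESSION_PATTERNS.items()
    }


if __name__ == "__main__":
    sample = "TP53 (P04637) contains domain PF00870; gene ENSG00000141510."
    print(extract_accessions(sample))
```

The same function could be applied to the article body and to each text-based supplementary file separately, which is the comparison the study reports.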

Fig 3: Distribution of the average number of database citations over the years. Articles with supplementary data (left axis), supplementary data (right axis). The figure shows, for each year, the average number of database citations in supplementary data and in the bodies of the articles that have supplementary data.

Mentions: Figure 3 shows the distribution of the average number of accession numbers identified in the bodies and supplementary files of these articles. On average, the number of citations in the supplementary data is significantly higher than the number of citations in the article bodies. This is perhaps not surprising since authors tend to cite only the key data in the article bodies and provide a larger set of data in supplementary files.
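
As a rough sketch of how per-year averages like those plotted in Figure 3 could be computed, the snippet below groups per-article counts by publication year. It assumes the average is taken per article, which the text does not state explicitly; the function name `average_citations_per_year` and the record layout are hypothetical, and no values from the study are reproduced.

```python
from collections import defaultdict
from statistics import mean


def average_citations_per_year(records):
    """Average accession counts per article, grouped by publication year.

    records: iterable of (year, body_count, supplementary_count) tuples,
    one tuple per article (a hypothetical layout for illustration).
    Returns {year: (mean_body_count, mean_supplementary_count)}.
    """
    by_year = defaultdict(lambda: {"body": [], "supp": []})
    for year, body_count, supp_count in records:
        by_year[year]["body"].append(body_count)
        by_year[year]["supp"].append(supp_count)
    return {
        year: (mean(counts["body"]), mean(counts["supp"]))
        for year, counts in sorted(by_year.items())
    }
```

Plotting the two means on separate axes, as the figure does, keeps the much larger supplementary-data counts from flattening the article-body series.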

