Database citation in supplementary data linked to Europe PubMed Central full text biomedical articles.

Kafkas Ş, Kim JH, Pi X, McEntyre JR - J Biomed Semantics (2015)

Bottom Line: Using text-mining methods to identify and extract a variety of core biological database accession numbers, we found that the supplemental data files contain many more database citations than the body of the article, and that those citations often take the form of a relatively small number of articles citing large collections of accession numbers in text-based files. Moreover, citation of value-added databases derived from submission databases (such as Pfam, UniProt or Ensembl) is common, demonstrating the reuse of these resources as datasets in themselves. These observations highlight the need to improve the management of supplemental data in general, in order to make this information more discoverable and useful.


Affiliation: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD UK.

ABSTRACT

Background: In this study, we present an analysis of data citation practices in full text research articles and their corresponding supplementary data files, made available in the Open Access set of articles from Europe PubMed Central. Our aim is to investigate whether supplementary data files should be considered as a source of information for integrating the literature with biomolecular databases.

Results: Using text-mining methods to identify and extract a variety of core biological database accession numbers, we found that the supplemental data files contain many more database citations than the body of the article, and that those citations often take the form of a relatively small number of articles citing large collections of accession numbers in text-based files. Moreover, citation of value-added databases derived from submission databases (such as Pfam, UniProt or Ensembl) is common, demonstrating the reuse of these resources as datasets in themselves. All the database accession numbers extracted from the supplementary data are publicly accessible from http://dx.doi.org/10.5281/zenodo.11771.

Conclusions: Our study suggests that supplementary data should be considered when linking articles with data, in curation pipelines, and in information retrieval tasks in order to make full use of the entire research article. These observations highlight the need to improve the management of supplemental data in general, in order to make this information more discoverable and useful.
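As a rough illustration of the kind of pattern matching that accession-number extraction involves, the following minimal sketch scans a piece of text for Pfam, UniProt and Ensembl gene identifiers. It is not the pipeline used in the study: the pattern set, the function name `extract_accessions` and the sample sentence are assumptions made for this example, and a real system would need context checks to filter out false positives.

```python
import re

# Illustrative patterns only (assumed for this sketch, not taken from the paper).
ACCESSION_PATTERNS = {
    # Pfam family accessions: "PF" followed by five digits.
    "Pfam": re.compile(r"\bPF\d{5}\b"),
    # UniProtKB accession format (6 or 10 characters).
    "UniProt": re.compile(
        r"\b(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\b"
    ),
    # Ensembl gene identifiers: "ENS", optional species code, "G", 11 digits.
    "Ensembl gene": re.compile(r"\bENS[A-Z]*G\d{11}\b"),
}


def extract_accessions(text):
    """Return {database: sorted unique accession-like strings found in text}."""
    found = {}
    for database, pattern in ACCESSION_PATTERNS.items():
        matches = sorted(set(pattern.findall(text)))
        if matches:
            found[database] = matches
    return found


# Made-up sample line, standing in for a row of a supplementary text file.
sample = "See PF00634, P51587 and ENSG00000139618 in Table S1."
print(extract_accessions(sample))
```

Applied line by line to text-based supplementary files, a matcher of this kind would yield the per-article lists of accession numbers that the analysis counts.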




Figure 2: Distribution of database citations in the OA-ePMC articles. This figure describes the distribution of database citations in the Europe PMC open access full text articles.

Mentions: Figure 2 shows the distribution of database citations in the 410,364 OA-ePMC articles and their supplementary data. The analysis reveals that 16.8% of articles (68,995/410,364; Figure 2 (c)) have supplementary data in either text or text-convertible format. Only 3,365 of these 68,995 articles (3,365/410,364; 0.82%; Figure 2 (f)) contain database citations in both their body and supplementary data.
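The proportions quoted above can be checked with a short calculation. The sketch below uses only the three counts given in the text; the variable names are chosen for this example, and the last ratio (articles citing databases in both places, as a share of articles with supplementary data) is derived here and is not reported in the source.

```python
# Counts as stated in the text.
total_articles = 410_364      # OA-ePMC articles analysed
with_supplementary = 68_995   # articles with text or text-convertible supplementary data
cited_in_both = 3_365         # articles citing databases in both body and supplementary data


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


share_with_supp = percent(with_supplementary, total_articles)
share_both_of_all = percent(cited_in_both, total_articles)
# Derived for illustration; this ratio is not given in the source.
share_both_of_supp = percent(cited_in_both, with_supplementary)

print(f"With supplementary data: {share_with_supp:.1f}% of all articles")
print(f"Citations in both places: {share_both_of_all:.2f}% of all articles")
print(f"Citations in both places: {share_both_of_supp:.1f}% of articles with supplementary data")
```

The first two values reproduce the 16.8% and 0.82% figures. The derived third ratio comes out just under 5%, which shows that the 0.82% is measured against all 410,364 articles, not against the 68,995 that have supplementary data.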

