Database citation in supplementary data linked to Europe PubMed Central full text biomedical articles.

Kafkas Ş, Kim JH, Pi X, McEntyre JR - J Biomed Semantics (2015)

Bottom Line: Using text-mining methods to identify and extract a variety of core biological database accession numbers, we found that the supplemental data files contain many more database citations than the body of the article, and that those citations often take the form of a relatively small number of articles citing large collections of accession numbers in text-based files. Moreover, citation of value-added databases derived from submission databases (such as Pfam, UniProt or Ensembl) is common, demonstrating the reuse of these resources as datasets in themselves. These observations highlight the need to improve the management of supplemental data in general, in order to make this information more discoverable and useful.


Affiliation: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD UK.

ABSTRACT

Background: In this study, we present an analysis of data citation practices in full text research articles and their corresponding supplementary data files, made available in the Open Access set of articles from Europe PubMed Central. Our aim is to investigate whether supplementary data files should be considered as a source of information for integrating the literature with biomolecular databases.

Results: Using text-mining methods to identify and extract a variety of core biological database accession numbers, we found that the supplemental data files contain many more database citations than the body of the article, and that those citations often take the form of a relatively small number of articles citing large collections of accession numbers in text-based files. Moreover, citation of value-added databases derived from submission databases (such as Pfam, UniProt or Ensembl) is common, demonstrating the reuse of these resources as datasets in themselves. All the database accession numbers extracted from the supplementary data are publicly accessible from http://dx.doi.org/10.5281/zenodo.11771.

Conclusions: Our study suggests that supplementary data should be considered when linking articles with data, in curation pipelines, and in information retrieval tasks in order to make full use of the entire research article. These observations highlight the need to improve the management of supplemental data in general, in order to make this information more discoverable and useful.
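The text-mining step described in the Results can be illustrated with a minimal sketch. It is not the authors' pipeline: the regular expressions below are simplified, assumed patterns for three of the databases named above (Pfam, Ensembl, UniProt), and the names `ACCESSION_PATTERNS` and `extract_accessions` are hypothetical. ENA and other resources are left out to keep the sketch short, and a real system would also need context checks to reject false positives.

```python
import re

# Simplified, illustrative accession patterns (assumptions, not the
# rules used in the study). Keys are hypothetical database labels.
ACCESSION_PATTERNS = {
    "pfam": re.compile(r"\bPF\d{5}\b"),
    "ensembl": re.compile(r"\bENS[A-Z]*[GTP]\d{11}\b"),
    "uniprot": re.compile(
        r"\b(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\b"
    ),
}


def extract_accessions(text):
    """Return {database: sorted unique accession strings} found in text."""
    found = {}
    for db, pattern in ACCESSION_PATTERNS.items():
        hits = sorted(set(pattern.findall(text)))
        if hits:
            found[db] = hits
    return found


if __name__ == "__main__":
    sample = "Domain PF00069 was found in P12345 and in gene ENSG00000139618."
    print(extract_accessions(sample))
```

Running the same function over the text of an article body and over the text of its supplementary files would give the two sets of database citations that the study compares.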



Figure 4: Average number of database citations in article bodies, including and excluding ENA. The figure shows the distribution of the average number of database citations in article bodies with and without ENA citations.

Mentions: As can be seen from Figure 3, the average number of citations in article bodies peaks in 2007. Further analysis of these averages shows that ENA citations are the main source of this trend (see Figure 4).
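The comparison behind Figure 4 amounts to computing a per-year average twice, once over all databases and once with ENA left out. The sketch below shows that arithmetic only. The function name `average_citations_by_year`, the record layout, and the toy numbers are all invented for illustration and are not taken from the paper's data.

```python
from collections import defaultdict


def average_citations_by_year(articles, exclude=()):
    """Average database citations per article for each year.

    articles: iterable of (year, {database: citation_count}) pairs,
              one pair per article (hypothetical layout).
    exclude:  database labels to leave out of the totals.
    """
    totals = defaultdict(int)
    n_articles = defaultdict(int)
    for year, counts in articles:
        n_articles[year] += 1
        totals[year] += sum(n for db, n in counts.items() if db not in exclude)
    return {year: totals[year] / n_articles[year] for year in sorted(n_articles)}


# Toy records with invented numbers, NOT results from the study.
toy_articles = [
    (2006, {"ena": 2, "uniprot": 1}),
    (2006, {"pfam": 1}),
    (2007, {"ena": 40, "uniprot": 2}),
    (2007, {"ena": 1}),
]

with_ena = average_citations_by_year(toy_articles)
without_ena = average_citations_by_year(toy_articles, exclude={"ena"})
```

In this toy input a single article citing many ENA accessions raises the yearly average sharply, and the rise disappears once ENA is excluded, which is the kind of effect the passage attributes to ENA citations.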

