Calling International Rescue: knowledge lost in literature and data landslide!

Attwood TK, Kell DB, McDermott P, Marsh J, Pettifer SR, Thorne D - Biochem. J. (2009)

Bottom Line: With their promises to provide new ways of interacting with the literature, and new and more powerful tools to access and extract the knowledge sequestered within it, we ask what advances they make and what obstacles to progress still exist? We ask you, please, to read the instructions carefully. The time has come: you may turn over your papers...

Affiliation: School of Chemistry, The University of Manchester, Oxford Road, Manchester M13 9PL, UK. teresa.k.attwood@manchester.ac.uk

ABSTRACT
We live in interesting times. Portents of impending catastrophe pervade the literature, calling us to action in the face of unmanageable volumes of scientific data. But it isn't so much data generation per se, but the systematic burial of the knowledge embodied in those data that poses the problem: there is so much information available that we simply no longer know what we know, and finding what we want is hard - too hard. The knowledge we seek is often fragmentary and disconnected, spread thinly across thousands of databases and millions of articles in thousands of journals. The intellectual energy required to search this array of data-archives, and the time and money this wastes, has led several researchers to challenge the methods by which we traditionally commit newly acquired facts and knowledge to the scientific record. We present some of these initiatives here - a whirlwind tour of recent projects to transform scholarly publishing paradigms, culminating in Utopia and the Semantic Biochemical Journal experiment. With their promises to provide new ways of interacting with the literature, and new and more powerful tools to access and extract the knowledge sequestered within it, we ask what advances they make and what obstacles to progress still exist? We explore these questions, and, as you read on, we invite you to engage in an experiment with us, a real-time test of a new technology to rescue data from the dormant pages of published documents. We ask you, please, to read the instructions carefully. The time has come: you may turn over your papers...

Figure 13: Tools that could support the discovery of errors and inconsistencies could have profound consequences for the evolution of knowledge. In 2007, Liu et al. [71] reported in Science the discovery of a novel plant G protein-coupled receptor (GPCR), so-called GCR2 (a). Much of the supporting evidence rested on a ‘characteristic’ hydropathy profile (reported as a Supplementary Figure), which showed seven peaks, apparently consistent with known GPCR transmembrane (TM) domain topology (b). Illingworth et al. challenged this result, pointing to the clear similarity of GCR2 with LanC-like proteins and showing that the topology of the hydropathy profile was the result of the seven-fold symmetry of the inner helical toroid (the blue/green region in the centre of the structure) of this globular protein (c) [66]. It is interesting to compare a hydropathy plot (d) with that reported by Liu et al. (b), generated using the same DAS TM prediction server [72] – note the omission of the significance bars in the latter, which in the former show that only one of the seven peaks scores above the significance threshold for TM domains and hence argues strongly against this being a membrane protein. Compare the structure of a bona fide GPCR [bovine rhodopsin, PDB code 1F88 (e)] with the nisin cyclase structure shown in Illingworth's paper [PDB code 2G0D (c)]. Despite the obvious lack of sequence and structural similarity of GCR2 to genuine GPCRs, and its clear affiliation with the LanC-like proteins, this error has been propagated to the description line of its UniProt entry, even though the entry contains database cross-references to LanC-like proteins rather than GPCRs (f). For readers viewing this article using UD, click on the UD logos in the Figure to explore this scenario further. Reproduced from Illingworth, C.J.R., Parkes, K.E., Snell, C.R., Mullineaux, P.M. and Reynolds, C.A. (2008) Criteria for confirming sequence periodicity identified by Fourier transform analysis: application to GCR2, a candidate plant GPCR? Biophysical Chemistry 133, 28–35, Copyright (2008), with permission from Elsevier; and from Liu, X. G., Yue, Y. L., Li, B., Nie, Y. L., Li, W., Wu, W. H. and Ma, L. G. (2007) A G protein-coupled receptor is a plasma membrane receptor for the plant hormone abscisic acid. Science 315, 1712–1716 (http://www.sciencemag.org/cgi/content/abstract/315/5819/1712), with permission from AAAS.
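
The role of the significance threshold in the caption above can be illustrated with a minimal sketch. It does not reproduce the DAS server's method; instead it swaps in a plain sliding-window average of Kyte-Doolittle hydropathy values, and it assumes the commonly quoted heuristic of a 19-residue window with a cut-off of 1.6. The function names, the cut-off and the toy sequence are all illustrative assumptions and are not taken from the article or from GCR2.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along seq.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    scores = [KD[aa] for aa in seq.upper()]
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    profile = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile


def segments_above(profile, threshold=1.6):
    """(start, end) window indices of contiguous runs scoring above threshold."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(profile) - 1))
    return runs


# Hypothetical toy sequence (NOT GCR2): polar, hydrophobic, polar stretches.
toy = "DEKR" * 5 + "LIV" * 7 + "DEKR" * 5
profile = hydropathy_profile(toy)
candidates = segments_above(profile)
print(len(profile), candidates)
```

A profile drawn without its threshold shows only the peaks; filtering by the threshold, as `segments_above` does here, is what separates candidate TM segments from peaks that are merely local maxima.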

Mentions: What is clear is that new technologies will emerge (and indeed, are already emerging) to promote a fundamental shift away from how scholarly communication currently works [69]. A key driver of this change will be realization of the benefits that accrue from having more explicit links between articles and the data and concepts they describe [70]. Processes that will particularly profit from such links are peer review and the dissemination of (reliable) knowledge. Were a paper to become an interactive interface to its underlying data, it could, for example, facilitate further research across multiple articles and databases, and lead more easily to the discovery of errors; combined with suitable social technologies for community commentary, a published paper could at the same time act as its own self-correcting record. This would be an especially powerful development, as the extent to which peer review of an article extends to its underlying data is generally not at all clear, and current mechanisms for data correction, updating and maintenance are not synchronized with those for managing the literature [37]. Thus, as Antezana points out, reported ‘facts’ may be incomplete, incorrect or simply false, and new knowledge may refute ‘accepted’ information [10]. Unfortunately, however, we have no way of knowing what the error rates in the literature or in biological databases actually are, or indeed what are the rates of propagation of those errors between databases and papers, and vice versa. The ramifications of new tools and technologies that could support the discovery of errors and inconsistencies, which could allow us to track and to consistently record the evolution of the current state of our knowledge, are therefore potentially profound. Consider, for a moment, the example illustrated in Figure 13.

