Section level search functionality in Europe PMC.

Kafkas Ş, Pi X, Marinos N, Talo' F, Morrison A, McEntyre JR - J Biomed Semantics (2015)

Bottom Line: Users can now search particular parts of an article, reducing noise and allowing fine-tuning of searches. To the best of our knowledge, Europe PMC is the first service that provides a granular literature search by allowing users to target their search to particular sections of articles. The tagger's performance is measured against a manually curated dataset consisting of 100 full text articles with an F-score of 98.02%.


Affiliation: European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom.

ABSTRACT

Background: As the availability of open access full text research articles increases, so does the need for sophisticated search services that make the most of this new content. Here, we present a new feature available in Europe PMC that allows selected sections of full text articles to be searched, including figures and reference lists. Users can now search particular parts of an article, reducing noise and allowing fine-tuning of searches.

Results: To the best of our knowledge, Europe PMC is the first service that provides a granular literature search by allowing users to target their search to particular sections of articles. This new functionality is based on a heuristic algorithm that identifies and categorises article sections into 17 pre-defined categories based on the section heading. The tagger's performance is measured against a manually curated dataset consisting of 100 full text articles with an F-score of 98.02%.
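The heading-based categorisation described above can be illustrated with a minimal sketch. It is not the authors' tagger: the category labels and keyword patterns below are hypothetical stand-ins for the 17 pre-defined categories, which are not listed here, and the matching rules are an assumption made only to show the general idea of mapping a section heading to one or more categories.

```python
import re

# Hypothetical rules for illustration only. The real tagger uses 17
# pre-defined categories whose names and patterns are not given here.
RULES = [
    ("INTRO", re.compile(r"\b(introduction|background)\b", re.I)),
    ("METHODS", re.compile(r"\b(methods?|materials)\b", re.I)),
    ("RESULTS", re.compile(r"\b(results?|findings)\b", re.I)),
    ("DISCUSS", re.compile(r"\bdiscussion\b", re.I)),
    ("CONCL", re.compile(r"\bconclusions?\b", re.I)),
    ("REF", re.compile(r"\b(references|bibliography)\b", re.I)),
]


def tag_heading(heading):
    """Return every category whose pattern matches the heading, in rule order."""
    # Strip leading numbering such as "2." or "3.1 " before matching.
    normalized = re.sub(r"^[\d.\s]+", "", heading).strip()
    matches = [label for label, pattern in RULES if pattern.search(normalized)]
    # Headings that match no rule fall into a catch-all category.
    return matches or ["OTHER"]


if __name__ == "__main__":
    for h in ["1 Introduction", "2. Materials and Methods",
              "Results and Discussion", "Acknowledgements"]:
        print(h, "->", tag_heading(h))
```

In this sketch a combined heading such as "Results and Discussion" receives both labels, so a search restricted to either section would still reach it; whether the actual tagger handles combined headings this way is not stated here.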

Conclusions: The section search is available from the advanced search within Europe PMC (http://europepmc.org). The source code is freely available from http://europepmc.org/ftp/oa/SectionTagger/.




Figure 2: Distribution of XML to non-XML documents available in Europe PMC, including OA status, by publication year. The section tagger operates only on full text articles provided in XML format. XML-formatted documents make up close to 100% of the Europe PMC content published in the last 7 years, so only a small minority of recent articles are missed.

Mentions: The section tagger only operates on the full text articles that are available as XML, since OCR (scanned) content lacks parsable section headings. However, Figure 2 shows that XML-formatted documents make up close to 100% of Europe PMC content published in the last 7 years.
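To make concrete why XML input matters, the sketch below pulls section headings out of a small XML document. It assumes a JATS-style layout in which each section is a `<sec>` element with a `<title>` child; the sample document and the `section_titles` helper are hypothetical and are not taken from the SectionTagger source. Scanned (OCR) content has no such markup, so there is nothing comparable to parse.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified article in a JATS-style layout.
SAMPLE_XML = """<article>
  <body>
    <sec><title>Background</title><p>Text.</p></sec>
    <sec><title>Methods</title><p>Text.</p>
      <sec><title>Data collection</title><p>Text.</p></sec>
    </sec>
    <sec><p>Untitled section.</p></sec>
  </body>
</article>"""


def section_titles(xml_text):
    """Collect the heading of every titled <sec>, in document order."""
    root = ET.fromstring(xml_text)
    titles = []
    for sec in root.iter("sec"):
        # find() looks at direct children only, so a nested subsection's
        # title is not mistaken for its parent's title.
        title = sec.find("title")
        if title is not None:
            # itertext() also captures text inside inline markup.
            titles.append("".join(title.itertext()).strip())
    return titles


if __name__ == "__main__":
    print(section_titles(SAMPLE_XML))
```

Headings extracted this way are the kind of input a heading-based categoriser would work from; sections without a title, like the last one in the sample, yield no heading to classify.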

