Tough mining: the challenges of searching the scientific literature.

Dickman S - PLoS Biol. (2003)

Affiliation: sdickman@cbtadvisors.com

AUTOMATICALLY GENERATED EXCERPT

The standard “front end” for biomedical literature search is MEDLINE and its Entrez query system... And they would go beyond Entrez because they would search the entire medical literature in full-text format and not, as MEDLINE does, just the abstracts... Owing to lack of access, says Hirschman, “we miss a great deal by not having large corpora of full-text articles” included in the design of both the KDD Cup and the next challenge evaluation, called BioCreative, being held later this year... Many of the relevant biological data are found outside abstracts, but getting access to full text is complicated at best... Adapting tools to new domains has traditionally been one of the “critical stumbling blocks” for text-processing technology, she says... The dynamic growth of biological terminology does not help... Mueller, a nuclear physicist by background, called Textpresso “a search engine for full-text searches of abstracts and articles” that can help find answers to more challenging queries than simple keyword searches... Textpresso went up—unpublicized—on the Web in February this year and already receives a couple of hundred hits a day, a big number in a field of about 2,000 researchers... Textpresso needs full-text access to be as good as it is, says Mueller. “We noticed” that drawing on full text “greatly increased the chances of a true hit,” not a false positive... Like so many other early software products, its long-term success will hinge on demand as well as improvements made in the upgrades... Because of the ontology problem, improvements in searching in the next couple of years are likely to result from the application of ever-better techniques within existing domains... Collaborations among Wormbase, Flybase, and other model-organism database groups will help improve all their search tools... MEDLINE itself may benefit from more advanced search techniques, though these will be restricted to abstract searches... 
The big unknown for predicting further development of text-search tools is the path publishers will take... If each publisher or portal such as Reed-Elsevier or HighWire were to license or develop its own tool for searching its own content, the result might be better than the status quo, but would still be unsatisfying.


pbio.0000048-g001: Barely Getting below the Surface. The four levels of information retrieval: Google and MEDLINE both use keywords to direct a searcher to documents. But the next level has been tough to crack. Improved software would allow biologists to jump from the Web or MEDLINE to specifics with a single query. (Adapted with permission from the MITRE Corporation.)

Mentions: Language-processing software tools have been successfully applied in text-mining of nonscientific sources, especially to newswire content. Computer programs can already perform all three levels of text-mining (Figure 1) effectively: retrieving documents relevant to a given subject; extracting lists of entities or relationships among entities; and answering questions about the material, delivering specific facts in response to natural-language queries.

