Tough mining: the challenges of searching the scientific literature.

Dickman S - PLoS Biol. (2003)


Affiliation: sdickman@cbtadvisors.com

AUTOMATICALLY GENERATED EXCERPT

The standard “front end” for biomedical literature search is MEDLINE and its Entrez query system... And they would go beyond Entrez because they would search the entire medical literature in full-text format and not, as MEDLINE does, just the abstracts... Owing to lack of access, says Hirschman, “we miss a great deal by not having large corpora of full-text articles” included in the design of both the KDD Cup and the next challenge evaluation, called BioCreative, being held later this year... Many of the relevant biological data are found outside abstracts, but getting access to full text is complicated at best... Adapting tools to new domains has traditionally been one of the “critical stumbling blocks” for text-processing technology, she says... The dynamic growth of biological terminology does not help... Mueller, a nuclear physicist by background, called Textpresso “a search engine for full-text searches of abstracts and articles” that can help find answers to more challenging queries than simple keyword searches... Textpresso went up—unpublicized—on the Web in February this year and already receives a couple of hundred hits a day, a big number in a field of about 2,000 researchers... Textpresso needs full-text access to be as good as it is, says Mueller. “We noticed” that drawing on full text “greatly increased the chances of a true hit,” not a false positive... Like so many other early software products, its long-term success will hinge on demand as well as improvements made in the upgrades... Because of the ontology problem, improvements in searching in the next couple of years are likely to result from the application of ever-better techniques within existing domains... Collaborations among Wormbase, Flybase, and other model-organism database groups will help improve all their search tools... MEDLINE itself may benefit from more advanced search techniques, though these will be restricted to abstract searches... 
The big unknown for predicting further development of text-search tools is the path publishers will take... If each publisher or portal such as Reed-Elsevier or HighWire were to license or develop its own tool for searching its own content, the result might be better than the status quo, but would still be unsatisfying.

pbio.0000048-g002: The Impact of Challenge Evaluations (and Investment Dollars). Driven by investment and competition as well as the pressure of regular challenge evaluations, error rates in speech recognition have dropped steadily, to the point where the technology has become standard from directory assistance to travel to financial information. Error rates drop by a factor of two every two years as challenge evaluations attract wide participation. (Graph adapted with permission from the MITRE Corporation.) Source: Pallett D, Garofolo J, Fiscus J (2000) Measurements in support of research accomplishments. Communications of the ACM: Special section on broadcast news understanding.

Mentions: The good news from news-mining is that improvement seems to arrive in direct proportion to the time and energy expended by the research community. Similar improvement has occurred in speech recognition by computers, she adds (Figure 2). When people took successively harder problems and worked on them for four or five years, she explains, it caused error rates to drop, as a rule, by a factor of two every two years.

