Finding related sentence pairs in MEDLINE.

Smith LH, Wilbur WJ - Inf Retr Boston (2010)

Bottom Line: We compared traditional vector space models with machine learning methods for detecting relatedness, and found that machine learning was superior. The Huber method, a variant of Support Vector Machines which minimizes the modified Huber loss function, achieves 73% precision when the score cutoff is set high enough to identify about one related sentence per abstract on average. We illustrate how an abstract viewed in PubMed might be modified to present the related sentences found in other abstracts by this automatic procedure.


Affiliation: Computational Biology Branch, National Center for Biotechnology Information, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894 USA.

ABSTRACT
We explore the feasibility of automatically identifying sentences in different MEDLINE abstracts that are related in meaning. We compared traditional vector space models with machine learning methods for detecting relatedness, and found that machine learning was superior. The Huber method, a variant of Support Vector Machines which minimizes the modified Huber loss function, achieves 73% precision when the score cutoff is set high enough to identify about one related sentence per abstract on average. We illustrate how an abstract viewed in PubMed might be modified to present the related sentences found in other abstracts by this automatic procedure.
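The abstract names the modified Huber loss without defining it. As a minimal sketch, assuming the standard textbook definition (quadratic for margins of at least -1, linear below that), the per-example loss can be written as follows. The function name `modified_huber_loss` and the sample values are illustrative only; the authors' actual training setup, features, and regularization are not given in this excerpt.

```python
def modified_huber_loss(y: int, score: float) -> float:
    """Modified Huber loss for a label y in {-1, +1} and a real-valued score.

    Assumes the standard definition:
      max(0, 1 - y*score)^2   if y*score >= -1
      -4 * y*score            otherwise
    """
    margin = y * score
    if margin >= -1.0:
        # Quadratic region: zero loss once the margin reaches 1.
        return max(0.0, 1.0 - margin) ** 2
    # Linear region: badly misclassified examples grow only linearly,
    # which limits the influence of outliers.
    return -4.0 * margin


# Hypothetical (label, score) pairs, chosen only to exercise each region.
examples = [(1, 2.0), (1, 0.5), (1, -1.0), (1, -3.0), (-1, -3.0)]
losses = [modified_huber_loss(y, s) for y, s in examples]
print(losses)
```

The two pieces meet at a margin of -1, where both give a loss of 4, so the function is continuous; the linear tail is what distinguishes it from a plain squared hinge loss.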



Fig. 2: Distribution of highest H-scores for 1,000 random query sentences. The H-scores are grouped by their integer value.

Mentions: Next, we looked at the H-score to select pairs that are more likely to be related. The distribution of H-scores for the 1,000 pairs is shown in Fig. 2, grouped by the integer part of the score. The average H-score was 5.01, with quartiles 3.58, 4.55, and 5.92. The comparison of H-score and average rating is shown in Fig. 3, grouped in the same way as Fig. 2. The graph shows that as the H-score increases, the proportion of H-matches with a high rating also increases. The number of pairs that received an average rating >2 was 409/750 (54.5%) for H ≥ 3.58, 303/500 (60.6%) for H ≥ 4.55, and 174/250 (69.6%) for H ≥ 5.92.
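The counts quoted above are cumulative (each threshold includes all pairs above it), so they can also be read band by band. The sketch below rechecks the reported percentages and derives the proportion of highly rated pairs between consecutive quartile cutoffs. The per-band figures are simple differences of the reported counts, not numbers stated in the source, and the function names are illustrative. The lowest quartile (H < 3.58) is not reported in this excerpt, so it is left out.

```python
# (H threshold, pairs with H >= threshold, of those with average rating > 2),
# copied from the text above.
cumulative = [(3.58, 750, 409), (4.55, 500, 303), (5.92, 250, 174)]


def cumulative_precision(rows):
    """Percentage of highly rated pairs among all pairs at or above each threshold."""
    return [(t, round(100 * good / total, 1)) for t, total, good in rows]


def band_precision(rows):
    """Percentage of highly rated pairs between consecutive thresholds.

    Each band is [threshold, next threshold); the last band has no upper bound.
    """
    bands = []
    for i, (t, total, good) in enumerate(rows):
        if i + 1 < len(rows):
            upper, next_total, next_good = rows[i + 1]
        else:
            upper, next_total, next_good = None, 0, 0
        n = total - next_total   # pairs falling inside this band
        g = good - next_good     # highly rated pairs inside this band
        bands.append((t, upper, g, n, round(100 * g / n, 1)))
    return bands


cum = cumulative_precision(cumulative)
bands = band_precision(cumulative)
for lower, upper, g, n, pct in bands:
    print(f"H in [{lower}, {upper}): {g}/{n} = {pct}%")
```

Read this way, the trend is steeper than the cumulative figures suggest: 106/250 (42.4%) in the second quartile band, 129/250 (51.6%) in the third, and 174/250 (69.6%) in the top band.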

