#nowplaying Madonna: a large-scale evaluation on estimating similarities between music artists and between movies from microblogs.

Schedl M - Inf Retr Boston (2012)

Bottom Line: For the music collections, we present results of genre classification experiments using as benchmark genre information from allmusic.com. For the movie collection, we present results of multi-class classification experiments using as benchmark categories from IMDb. We further compare the results to those obtained when using Web pages as data source.


Affiliation: Department of Computational Perception, Johannes Kepler University, Altenberger Straße 69, 4040 Linz, Austria.

ABSTRACT
Different term weighting techniques such as [Formula: see text] or BM25 have been used extensively for a wide range of text-based information retrieval tasks. Their use for modeling term profiles of named entities, and for subsequently calculating similarities between these named entities, has been studied to a much smaller extent. The recent trend of microblogging has made available massive amounts of information about almost every topic around the world. Microblogs therefore represent a valuable source for text-based named entity modeling. In this paper, we present a systematic and comprehensive evaluation of different term weighting measures, normalization techniques, query schemes, index term sets, and similarity functions for the task of inferring similarities between named entities, based on data extracted from microblog posts. We analyze several thousand combinations of choices for the above-mentioned dimensions, which influence the similarity calculation process, and we investigate how they impact the quality of the similarity estimates. Evaluation is performed using three real-world data sets: two collections of microblogs related to music artists and one related to movies. For the music collections, we present results of genre classification experiments using as benchmark genre information from allmusic.com. For the movie collection, we present results of multi-class classification experiments using as benchmark categories from IMDb. We show that microblogs can indeed be exploited to model named entity similarity with remarkable accuracy, provided the correct settings for the analyzed aspects are used. We further compare the results to those obtained when using Web pages as data source.
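
The general idea of term-profile similarity described in the abstract can be illustrated with a minimal sketch. It assumes one particular combination, a plain TF-IDF weighting with cosine similarity and no length normalization, which is only one of the many variants the paper evaluates; the exact formulas used by the author are not given in this excerpt. The function names `tfidf_profiles` and `cosine`, the whitespace tokenization, and the toy posts are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_profiles(posts_by_entity: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Build one weighted term profile per entity (assumed scheme: tf * log(N / df))."""
    # Aggregate all posts of an entity into a single virtual document.
    term_counts = {
        entity: Counter(tok for post in posts for tok in post.lower().split())
        for entity, posts in posts_by_entity.items()
    }
    n_entities = len(term_counts)
    # Document frequency: in how many entity profiles does each term occur?
    df: Counter = Counter()
    for counts in term_counts.values():
        df.update(counts.keys())
    return {
        entity: {t: tf * math.log(n_entities / df[t]) for t, tf in counts.items()}
        for entity, counts in term_counts.items()
    }


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse term profiles."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy data (invented for illustration only).
posts = {
    "A": ["rock guitar", "rock live"],
    "B": ["rock guitar solo"],
    "C": ["pop dance"],
}
profiles = tfidf_profiles(posts)
sim_ab = cosine(profiles["A"], profiles["B"])
sim_ac = cosine(profiles["A"], profiles["C"])
```

In this toy setting, entities A and B share terms and receive a positive similarity, while A and C share none and receive zero. Swapping the weighting, normalization, or similarity function changes only the two helper functions, which is the kind of variation the evaluation explores.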

Fig3: Distribution of MAP scores among all 23,100 ranks on music set C224a

Mentions: Table 10 shows the 10 top-ranked and the 10 bottom-ranked variants with their MAP scores (considering up to 15 nearest neighbors) for set C224a. The MAP scores of the 23,100 evaluated variants span a wide range and are quite diverse (cf. Fig. 3), with a mean of μ = 37.89 and a standard deviation of σ = 17.16. From Table 10 it can be seen that the highest MAP scores can only be achieved when using QS_A, TS_A, and NORM_NO. At the other end of the ranking, QS_M and SIM_OVL dominate the most inferior variants.
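
As a rough illustration of how a MAP score over up to 15 nearest neighbors might be computed for one variant, the following sketch ranks all other entities by similarity to each seed entity and treats a neighbor as relevant when it shares the seed's genre. The excerpt does not state the exact MAP definition used, so the normalization here (dividing by the number of relevant items found within the top k) is an assumption, as are the function names and the toy data.

```python
from typing import List, Sequence


def average_precision_at_k(ranked_relevance: Sequence[bool], k: int = 15) -> float:
    """Average precision over the top-k ranks (assumed normalization: hits within top k)."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance[:k], start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(similarity: List[List[float]], genres: List[str], k: int = 15) -> float:
    """Mean of per-seed average precision, using genre equality as relevance."""
    n = len(genres)
    ap_scores = []
    for seed in range(n):
        # Rank every other entity by descending similarity to the seed.
        candidates = [j for j in range(n) if j != seed]
        candidates.sort(key=lambda j: similarity[seed][j], reverse=True)
        relevance = [genres[j] == genres[seed] for j in candidates]
        ap_scores.append(average_precision_at_k(relevance, k))
    return sum(ap_scores) / n if n else 0.0


# Toy example (invented): four entities, two genres, symmetric similarities.
toy_genres = ["rock", "rock", "pop", "pop"]
toy_similarity = [
    [1.0, 0.9, 0.5, 0.1],
    [0.9, 1.0, 0.2, 0.3],
    [0.5, 0.2, 1.0, 0.4],
    [0.1, 0.3, 0.4, 1.0],
]
toy_map = mean_average_precision(toy_similarity, toy_genres, k=15)
```

Running this once per variant (each combination of query scheme, term set, weighting, normalization, and similarity function yields its own similarity matrix) would give one MAP value per variant, which could then be ranked and summarized by mean and standard deviation as in the paragraph above.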

