Wikipedia information flow analysis reveals the scale-free architecture of the semantic space.

Masucci AP, Kalampokis A, Eguíluz VM, Hernández-García E - PLoS ONE (2011)

Bottom Line: In particular we find that the cluster size distribution, representing the size of different semantic areas, is scale-free. Moreover the topology of the resulting semantic space is scale-free in the connectivity distribution and displays small-world properties. However its statistical properties do not allow a classical interpretation via a generative model based on a simple multiplicative process.


Affiliation: Instituto de Física Interdisciplinar y Sistemas Complejos, Consejo Superior de Investigaciones Científicas - Universitat de les Illes Balears, Palma de Mallorca, Spain.

ABSTRACT
In this paper we extract the topology of the semantic space in its encyclopedic sense, measuring the semantic flow between the different entries of the largest modern encyclopedia, Wikipedia, and thus creating a directed complex network of semantic flows. Notably, at the percolation threshold the semantic space is characterised by scale-free behaviour at different levels of complexity, and this relates the semantic space to a wide range of biological, social and linguistic phenomena. In particular we find that the cluster size distribution, representing the size of different semantic areas, is scale-free. Moreover the topology of the resulting semantic space is scale-free in the connectivity distribution and displays small-world properties. However its statistical properties do not allow a classical interpretation via a generative model based on a simple multiplicative process. After giving a detailed description and interpretation of the topological properties of the semantic space, we introduce a stochastic model of a content-based network, based on a copy and mutation algorithm and on Heaps' law, that is able to capture the main statistical properties of the analysed semantic space, including Zipf's law for the word frequency distribution.


pone-0017333-g003: Connectivity distribution of the minimum spanning tree of the semantic space. Degree distribution for the undirected minimum spanning tree for the whole network representing the semantic space. In the insets the cumulative degree distribution is displayed.

Mentions: We compute the undirected minimum spanning tree (MST) of the complete Wikipedia network via Prim's algorithm [23]. The degree distribution of the MST is scale-free with exponent -2.4 and a fat tail (see Fig. 3). Again, the scale-free behaviour of the degree distribution reflects the hierarchical structure of the MST of the semantic space (SS). A glance at Fig. 4, which shows a small portion of the MST centred on the Wikipedia entry “nature”, gives a rough idea of how this hierarchy organises itself. A very general concept, such as “nature”, does not have many connections, but it is an important bridge for the semantic flow between less complex concepts. Those less complex concepts are in general more connected and eventually form taxonomies, which are hubs in the MST.
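
The procedure described above (Prim's algorithm followed by a degree count on the resulting tree) can be illustrated with a minimal Python sketch. It is not the authors' code: the toy graph, its node names and its edge weights are hypothetical, and the sketch assumes that lower weights are preferred, whereas the text does not state how semantic flow is converted into an edge cost. The function names `prim_mst`, `degree_distribution` and `cumulative_distribution` are likewise invented for this illustration.

```python
import heapq
from collections import Counter


def prim_mst(adj, start):
    """Prim's algorithm on the connected component containing `start`.

    `adj` maps each node to a list of (neighbour, weight) pairs and must
    list every undirected edge in both directions.
    Returns the tree as a list of (u, v, weight) edges.
    """
    visited = {start}
    heap = [(w, start, v) for v, w in adj[start]]
    heapq.heapify(heap)
    edges = []
    while heap:
        w, u, v = heapq.heappop(heap)
        if v in visited:
            continue  # this edge would close a cycle
        visited.add(v)
        edges.append((u, v, w))
        for nxt, w2 in adj[v]:
            if nxt not in visited:
                heapq.heappush(heap, (w2, v, nxt))
    return edges


def degree_distribution(edges):
    """Fraction of tree nodes having each degree k."""
    deg = Counter()
    for u, v, _ in edges:
        deg[u] += 1
        deg[v] += 1
    counts = Counter(deg.values())
    n = len(deg)
    return {k: c / n for k, c in sorted(counts.items())}


def cumulative_distribution(pk):
    """Fraction of nodes with degree >= k, as plotted in an inset."""
    out, total = {}, 0.0
    for k in sorted(pk, reverse=True):
        total += pk[k]
        out[k] = total
    return dict(sorted(out.items()))


# Hypothetical toy graph; the weights are arbitrary illustrative costs.
toy_edges = [
    ("nature", "biology", 1.0),
    ("nature", "physics", 2.0),
    ("biology", "animal", 1.0),
    ("biology", "plant", 1.5),
    ("animal", "plant", 3.0),
    ("physics", "plant", 4.0),
]
toy_adj = {}
for a, b, w in toy_edges:
    toy_adj.setdefault(a, []).append((b, w))
    toy_adj.setdefault(b, []).append((a, w))

mst = prim_mst(toy_adj, "nature")
pk = degree_distribution(mst)
ck = cumulative_distribution(pk)
print(mst)
print(pk)
print(ck)
```

On a network the size of Wikipedia one would plot `pk` and `ck` on log-log axes and estimate the exponent from the tail; the toy graph is far too small for that and only demonstrates the mechanics.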

