Wikipedia information flow analysis reveals the scale-free architecture of the semantic space.

Masucci AP, Kalampokis A, Eguíluz VM, Hernández-García E - PLoS ONE (2011)

Bottom Line: The cluster size distribution, representing the size of different semantic areas, is scale-free. Moreover, the topology of the resulting semantic space is scale-free in the connectivity distribution and displays small-world properties. However, its statistical properties do not allow a classical interpretation via a generative model based on a simple multiplicative process.


Affiliation: Instituto de Física Interdisciplinar y Sistemas Complejos, Consejo Superior de Investigaciones Científicas - Universitat de les Illes Balears, Palma de Mallorca, Spain.

ABSTRACT
In this paper we extract the topology of the semantic space in its encyclopedic sense, measuring the semantic flow between the different entries of the largest modern encyclopedia, Wikipedia, and thus creating a directed complex network of semantic flows. Notably, at the percolation threshold the semantic space is characterised by scale-free behaviour at different levels of complexity, and this relates the semantic space to a wide range of biological, social and linguistic phenomena. In particular, we find that the cluster size distribution, representing the size of different semantic areas, is scale-free. Moreover, the topology of the resulting semantic space is scale-free in the connectivity distribution and displays small-world properties. However, its statistical properties do not allow a classical interpretation via a generative model based on a simple multiplicative process. After giving a detailed description and interpretation of the topological properties of the semantic space, we introduce a stochastic model of a content-based network, based on a copy and mutation algorithm and on Heaps' law, that is able to capture the main statistical properties of the analysed semantic space, including Zipf's law for the word frequency distribution.
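The idea of reading cluster sizes off a weighted network at its percolation threshold can be illustrated with a small sketch. It assumes the semantic flows are available as `(source, target, weight)` triples and treats clusters as weakly connected components. The function names and the "second-largest cluster is maximal" criterion are assumptions made for illustration (a commonly used heuristic), not necessarily the procedure followed in the paper.

```python
from collections import Counter


def cluster_sizes(n_nodes, edges, threshold):
    """Sizes (largest first) of the weakly connected clusters that remain
    when only edges with weight >= threshold are kept."""
    parent = list(range(n_nodes))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v, w in edges:
        if w >= threshold:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv

    counts = Counter(find(i) for i in range(n_nodes))
    return sorted(counts.values(), reverse=True)


def estimate_percolation_threshold(n_nodes, edges):
    """Assumed heuristic: pick the weight threshold at which the
    second-largest cluster reaches its maximum size."""
    best_t, best_second = None, -1
    for t in sorted({w for _, _, w in edges}, reverse=True):
        sizes = cluster_sizes(n_nodes, edges, t)
        second = sizes[1] if len(sizes) > 1 else 0
        if second > best_second:
            best_t, best_second = t, second
    return best_t


# Toy data: two tight groups joined by one weak link.
toy_edges = [(0, 1, 0.9), (1, 2, 0.8), (3, 4, 0.7), (4, 5, 0.6), (2, 3, 0.2)]
threshold = estimate_percolation_threshold(6, toy_edges)
sizes_at_threshold = cluster_sizes(6, toy_edges, threshold)
```

On the toy data the heuristic selects the threshold 0.6, where the network splits into two clusters of three nodes each; lowering the threshold to 0.2 merges them into a single cluster.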

Figure 5 (pone-0017333-g005): The stochastic model representing the semantic space. Results for the simulation of the stochastic model representing the semantic space. This is a simulation of a toy model for an encyclopedia whose page sizes are log-normally distributed, for a fixed choice of the model parameters. In the top panels the out-degree distribution (left panel) and the in-degree distribution (right panel) of the semantic network at the percolation threshold are shown. The corresponding cumulative distributions are displayed in the insets. In the bottom left panel we show the cluster size distribution at the percolation threshold; the corresponding cumulative distribution is displayed in the inset. In the bottom right panel we show the frequency-rank distribution for the words in the model.
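The quantities plotted in the figure (degree distributions, their cumulative versions, and a frequency-rank distribution) can be computed from raw data along the following lines. This is a minimal sketch: the function names are hypothetical, the directed network is assumed to be a list of `(source, target)` pairs, and the cumulative distribution is taken to be the complementary one, P(X >= x), which is the usual choice for heavy-tailed data but is an assumption here.

```python
from collections import Counter


def degree_sequences(edges):
    """Out-degree and in-degree counters for a directed edge list.
    Nodes with degree zero simply do not appear in the counters."""
    out_deg, in_deg = Counter(), Counter()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    return out_deg, in_deg


def cumulative_distribution(values):
    """Map each observed value x to the fraction of observations >= x."""
    n = len(values)
    counts = Counter(values)
    ccdf, remaining = {}, n
    for x in sorted(counts):
        ccdf[x] = remaining / n
        remaining -= counts[x]
    return ccdf


def rank_frequency(tokens):
    """List of (rank, frequency) pairs, rank 1 being the most frequent token."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(freqs, start=1))


# Toy usage.
out_deg, in_deg = degree_sequences([("a", "b"), ("a", "c"), ("b", "c"), ("d", "c")])
in_ccdf = cumulative_distribution(list(in_deg.values()))
ranks = rank_frequency("a b a c a b".split())
```

Plotting `in_ccdf` or `ranks` on doubly logarithmic axes is the usual way to inspect such data for power-law behaviour; a straight line over several decades is suggestive of it but not a proof.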

Mentions: In Fig. 5 we show that, with a suitable choice of the parameters, this model can generate a system with the desired properties.
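To give a feel for what a copy-and-mutation process with log-normally distributed page sizes might look like, here is a deliberately simplified toy generator. It is not the authors' model: the rule used (each new page copies words from one randomly chosen earlier page, and each word is replaced by a brand-new word with probability `mutation_prob`) and all parameter names and default values are assumptions chosen only for illustration.

```python
import random


def generate_pages(n_pages, mutation_prob, mu=3.0, sigma=0.5, seed=0):
    """Toy copy-and-mutation encyclopedia (illustrative assumption only).
    Words are integers; page sizes are drawn from a log-normal distribution."""
    rng = random.Random(seed)
    next_word = 0
    pages = []
    for _ in range(n_pages):
        size = max(1, round(rng.lognormvariate(mu, sigma)))
        template = rng.choice(pages) if pages else []
        page = []
        for _ in range(size):
            if template and rng.random() >= mutation_prob:
                # Copy: reuse a word from the template page.
                page.append(rng.choice(template))
            else:
                # Mutation: introduce a word never seen before.
                page.append(next_word)
                next_word += 1
        pages.append(page)
    return pages


def vocabulary_growth(pages):
    """(tokens read so far, distinct words so far) after each page,
    the curve one would inspect for Heaps-like sublinear growth."""
    seen, total, curve = set(), 0, []
    for page in pages:
        total += len(page)
        seen.update(page)
        curve.append((total, len(seen)))
    return curve


pages = generate_pages(200, mutation_prob=0.1)
growth = vocabulary_growth(pages)
```

In this sketch `mutation_prob` controls how fast new vocabulary enters: at 1.0 every word is new and the vocabulary grows linearly with text length, while at 0.0 no word beyond those on the first page ever appears. Whether intermediate values reproduce the scale-free statistics reported in the paper would have to be checked empirically.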

