Universal entropy of word ordering across linguistic families.

Montemurro MA, Zanette DH - PLoS ONE (2011)

Bottom Line: We computed a relative entropy measure to quantify the degree of ordering in word sequences from languages belonging to several linguistic families. While a direct estimation of the overall entropy of language yielded values that varied for the different families considered, the relative entropy quantifying word ordering presented an almost constant value for all those families. Our results indicate that despite the differences in the structure and vocabulary of the languages analyzed, the impact of word ordering in the structure of language is a statistical linguistic universal.

Affiliation: The University of Manchester, Manchester, United Kingdom. M.Montemurro@manchester.ac.uk

ABSTRACT

Background: The language faculty is probably the most distinctive feature of our species, and endows us with a unique ability to exchange highly structured information. In written language, information is encoded by the concatenation of basic symbols under grammatical and semantic constraints. As is also the case in other natural information carriers, the resulting symbolic sequences show a delicate balance between order and disorder. That balance is determined by the interplay between the diversity of symbols and their specific ordering in the sequences. Here we used entropy to quantify the contribution of different organizational levels to the overall statistical structure of language.
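The distinction between the diversity of symbols and their ordering can be made concrete with a small example. The sketch below is an illustration, not the authors' code, and the function name `unigram_entropy` is hypothetical. It computes the plug-in Shannon entropy of the word-frequency distribution. This quantity depends only on how often each word occurs, so it is unchanged when the words are shuffled. Any contribution of ordering therefore has to be measured with an entropy that looks at sequences rather than at single-word frequencies.

```python
import math
import random
from collections import Counter
from typing import Sequence


def unigram_entropy(words: Sequence[str]) -> float:
    """Shannon entropy (bits per word) of the word-frequency distribution.

    Only the counts matter, so any permutation of `words` gives the same value.
    """
    n = len(words)
    counts = Counter(words)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


words = "the cat saw the dog and the dog saw the cat".split()
shuffled = list(words)
random.Random(0).shuffle(shuffled)
# Shuffling changes the order but not the counts, so both values agree.
print(unigram_entropy(words), unigram_entropy(shuffled))
```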

Methodology/principal findings: We computed a relative entropy measure to quantify the degree of ordering in word sequences from languages belonging to several linguistic families. While a direct estimation of the overall entropy of language yielded values that varied for the different families considered, the relative entropy quantifying word ordering presented an almost constant value for all those families.
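A relative entropy of word ordering can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that the ordering measure is the difference between two quantities: the average entropy rate of randomly shuffled versions of a text, which keep the word frequencies but destroy the order, and the entropy rate of the original text. It estimates entropy rates with a simple match-length (Lempel-Ziv-style) estimator, which was chosen here only for illustration. The estimator the authors actually used is described in their Materials and Methods, which this excerpt does not include. The names `match_length`, `entropy_rate`, and `word_order_entropy` are hypothetical.

```python
import math
import random
from typing import List, Sequence


def match_length(words: Sequence[str], i: int) -> int:
    """Return 1 + length of the longest prefix of words[i:] that also starts at some j < i.

    Matches may run past position i, as in match-length entropy estimators.
    """
    n = len(words)
    longest = 0
    for j in range(i):
        k = 0
        while i + k < n and words[j + k] == words[i + k]:
            k += 1
        longest = max(longest, k)
    return longest + 1


def entropy_rate(words: Sequence[str]) -> float:
    """Match-length estimate of the entropy rate, in bits per word.

    H is approximated as 1 / mean(L_i / log2(i)) over positions i >= 2,
    where L_i = match_length(words, i). Quadratic time; fine for short texts.
    """
    n = len(words)
    if n < 3:
        raise ValueError("need at least 3 words")
    total = sum(match_length(words, i) / math.log2(i) for i in range(2, n))
    return (n - 2) / total


def word_order_entropy(words: Sequence[str], n_shuffles: int = 10, seed: int = 0) -> float:
    """Assumed ordering measure: mean(H of shuffled texts) - H of original text."""
    rng = random.Random(seed)
    h_original = entropy_rate(words)
    h_shuffled: List[float] = []
    for _ in range(n_shuffles):
        shuffled = list(words)
        rng.shuffle(shuffled)  # keeps word counts, destroys word order
        h_shuffled.append(entropy_rate(shuffled))
    return sum(h_shuffled) / n_shuffles - h_original
```

For example, `word_order_entropy(("a b c d " * 20).split())` compares a perfectly periodic text with shuffled copies of it. The result should be positive, because the original order is highly predictable. On real corpora, a faster estimator (for example, one based on suffix arrays) would be needed, because this version is quadratic in text length.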

Conclusions/significance: Our results indicate that despite the differences in the structure and vocabulary of the languages analyzed, the impact of word ordering in the structure of language is a statistical linguistic universal.

Figure 4 (pone-0019875-g004). Word correlations and entropy in real languages. (A) Normalized histograms of the fluctuation exponent α, computed using Detrended Fluctuation Analysis (see Materials and Methods), for four languages. The medians of the distributions are statistically different (p < 10⁻⁵, Mann-Whitney U-test computed over all possible pairs). (B) Average fluctuation exponent, ⟨α⟩, as a function of the average entropy of the random texts, for the same languages as shown in panel A.
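The fluctuation exponent α comes from Detrended Fluctuation Analysis. The sketch below shows standard first-order DFA and is an illustration, not the authors' code. It integrates the mean-subtracted series into a profile, removes a least-squares line inside each non-overlapping box of size s, computes the RMS fluctuation F(s), and takes α as the slope of log F(s) against log s. The function accepts an arbitrary numeric series. How a text is mapped to such a series (for instance, by replacing each word with its frequency rank) is an assumption here, because the Materials and Methods section is not part of this excerpt. The names `dfa_exponent` and `_detrended_sq_sum` are hypothetical.

```python
import math
import random
from typing import List, Sequence


def _detrended_sq_sum(ys: Sequence[float]) -> float:
    """Sum of squared residuals after a least-squares linear fit of ys on 0..m-1."""
    m = len(ys)
    x_mean = (m - 1) / 2.0
    y_mean = sum(ys) / m
    sxx = sum((x - x_mean) ** 2 for x in range(m))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return sum((y - y_mean - slope * (x - x_mean)) ** 2 for x, y in enumerate(ys))


def dfa_exponent(series: Sequence[float], scales: Sequence[int]) -> float:
    """First-order DFA: least-squares slope of log F(s) versus log s."""
    n = len(series)
    mean = sum(series) / n
    # Step 1: integrate the mean-subtracted series into a profile.
    profile: List[float] = []
    running = 0.0
    for v in series:
        running += v - mean
        profile.append(running)
    log_s: List[float] = []
    log_f: List[float] = []
    for s in scales:
        n_boxes = n // s
        if s < 3 or n_boxes < 1:
            continue
        # Steps 2-3: detrend each box of size s, then take the RMS fluctuation F(s).
        sq = sum(_detrended_sq_sum(profile[b * s:(b + 1) * s]) for b in range(n_boxes))
        f = math.sqrt(sq / (n_boxes * s))
        if f > 0:
            log_s.append(math.log(s))
            log_f.append(math.log(f))
    if len(log_s) < 2:
        raise ValueError("need at least two usable scales")
    # Step 4: alpha is the slope of the log-log fit.
    xm = sum(log_s) / len(log_s)
    ym = sum(log_f) / len(log_f)
    num = sum((x - xm) * (y - ym) for x, y in zip(log_s, log_f))
    den = sum((x - xm) ** 2 for x in log_s)
    return num / den


rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
alpha = dfa_exponent(noise, [16, 32, 64, 128, 256])
```

For uncorrelated noise, DFA gives α close to 0.5. Values above 0.5 indicate persistent long-range correlations, which is the regime relevant for word sequences.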

Mentions: We calculated the fluctuation exponent α for all the texts in the corpora. Its distribution varied only slightly across languages, with large areas of overlap. Therefore, to test whether the differences were statistically significant, we computed p-values comparing the medians of each pair of distributions and kept only those for which the hypothesis of equal medians could be rejected (p < 10⁻⁵, Mann-Whitney U-test [31]). Figure 4A presents the distributions for the four languages that passed this test. Figure 4B shows the average fluctuation exponent α as a function of the average entropy of the random texts for each of the languages in Figure 4A.
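The pairwise screening described above can be sketched with a hand-written two-sided Mann-Whitney U test, applied to every pair of per-language samples of α. The sketch uses the normal approximation with the usual tie correction. The names `mann_whitney_u` and `distinguishable_pairs` and the `threshold` parameter are hypothetical. The exact test variant the authors used is not stated in this excerpt. In practice, a library routine such as SciPy's `scipy.stats.mannwhitneyu` would normally be used instead.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for sample x, p-value).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of a tied block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value


def distinguishable_pairs(
    samples: Dict[str, Sequence[float]], threshold: float = 1e-5
) -> List[Tuple[str, str]]:
    """Return every pair of labels whose samples differ at p < threshold."""
    kept: List[Tuple[str, str]] = []
    for a, b in itertools.combinations(sorted(samples), 2):
        _, p = mann_whitney_u(samples[a], samples[b])
        if p < threshold:
            kept.append((a, b))
    return kept
```

A language would then be retained only if it forms a distinguishable pair with every other retained language, which corresponds to the selection described for Figure 4A.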

