Zipf's Law: Balancing Signal Usage Cost and Communication Efficiency.
Bottom Line:
The model incorporates the principle of least effort in communications, minimizing a combination of the information-theoretic communication inefficiency and direct signal cost. We prove a general relationship, for all optimal languages, between the signal cost distribution and the resulting distribution of signals. Zipf's law then emerges for logarithmic signal cost distributions, which is the cost distribution expected for words constructed from letters or phonemes.
View Article:
PubMed Central - PubMed
Affiliation: Department of Computer Science, University of Hertfordshire, Hatfield, United Kingdom.
ABSTRACT
We propose a model that explains the reliable emergence of power laws (e.g., Zipf's law) during the development of different human languages. The model incorporates the principle of least effort in communications, minimizing a combination of the information-theoretic communication inefficiency and direct signal cost. We prove a general relationship, for all optimal languages, between the signal cost distribution and the resulting distribution of signals. Zipf's law then emerges for logarithmic signal cost distributions, which is the cost distribution expected for words constructed from letters or phonemes. No MeSH data available.
Mentions: Let's assume that each letter (or phoneme) has an inherent cost which is approximately equal to a unit letter cost. Furthermore, assume that the cost of a word roughly equals the sum of its letter costs. A language with an alphabet of size a then has a unique one-letter words with an approximate cost of one, a^2 two-letter words with an approximate cost of two, a^3 three-letter words with an approximate cost of three, and so on. If we rank these words by their cost, then their cost will increase approximately logarithmically with their cost rank. To illustrate, Fig 1 is a plot of the 1000 cheapest unique words formed with a ten-letter alphabet (with no word length restriction), where each letter has a random cost between 1.0 and 2.0. The first few words deviate from the logarithmic cost function, as their cost depends only on the letter cost itself, but the later words closely follow a logarithmic function. A similar derivation of the logarithmic cost function from first principles can be found in the model of Mandelbrot [9].
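The construction described above can be reproduced with a short Python sketch. It is not the authors' code: the function names cheapest_words and fit_log, the fixed random seed, and the least-squares fit against ln(rank) are all assumptions made for illustration. The sketch draws ten letter costs uniformly from 1.0 to 2.0, enumerates the 1000 cheapest words with a best-first search, and fits cost against the logarithm of the cost rank.

```python
import heapq
import math
import random


def cheapest_words(letter_costs, n_words):
    """Return the n_words cheapest (cost, word) pairs in ascending cost order.

    A word is a tuple of letter indices; its cost is the sum of its letter
    costs. Every word is reached only by extending its unique prefix, so no
    word is generated twice. Letter costs must be positive.
    """
    heap = [(cost, (i,)) for i, cost in enumerate(letter_costs)]
    heapq.heapify(heap)
    result = []
    while len(result) < n_words:
        cost, word = heapq.heappop(heap)
        result.append((cost, word))
        # Extending a word by any letter yields a strictly more expensive word.
        for i, letter_cost in enumerate(letter_costs):
            heapq.heappush(heap, (cost + letter_cost, word + (i,)))
    return result


def fit_log(costs):
    """Least-squares fit of cost = slope * ln(rank) + intercept (rank from 1)."""
    xs = [math.log(rank) for rank in range(1, len(costs) + 1)]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(costs) / len(costs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, costs))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed: an assumption, for repeatability only
    letter_costs = [rng.uniform(1.0, 2.0) for _ in range(10)]
    words = cheapest_words(letter_costs, 1000)
    slope, intercept = fit_log([cost for cost, _ in words])
    print(f"cost ~ {slope:.3f} * ln(rank) + {intercept:.3f}")
```

Plotting the returned costs against rank on a logarithmic rank axis should show the behaviour described for Fig 1: the single-letter words at the lowest ranks follow the raw letter costs, while higher ranks lie close to a straight line.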