Zipf's Law


George K. Zipf's Law says that the frequency with which a word occurs in a body of text (or a collection of documents) is approximately inversely proportional to that word's rank, where words are ranked by their number of occurrences.
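To apply the law, you first have to rank words by how often they occur. The Python sketch below shows one way to do that. It assumes a deliberately simple tokenizer (lowercase, runs of letters and apostrophes). The helper name `rank_words` and the sample sentence are hypothetical, for illustration only.

```python
import re
from collections import Counter


def rank_words(text):
    """Return (rank, word, count) tuples, most frequent word first.

    Hypothetical helper: tokenization is a simplifying assumption
    (lowercase the text, then keep runs of letters and apostrophes).
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # most_common() with no argument returns every word, sorted by count (descending).
    return [(rank, word, count)
            for rank, (word, count) in enumerate(counts.most_common(), start=1)]


# Usage with a made-up sample sentence.
for rank, word, count in rank_words("the cat and the dog and the bird"):
    print(rank, word, count)
```

On a real novel, you would pass in the full text instead of one sentence. Words with equal counts come out in the order they first appeared. Tie-breaking is arbitrary as far as the law is concerned.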

For example, let's say the word "the" is the most frequently occurring word in the novel "Moby Dick," occurring 1450 times. And let's say the word "with" is the second most frequently occurring word in that novel.

We would then expect "with" to occur about 725 times (1450/2: the number of occurrences of the top-ranked word divided by the rank of the second-ranked word, which is 2).

Similarly, we would expect the third most frequently occurring word (ranked 3rd) to occur about 483 times (1450/3), the 4th-ranked word about 362 times (1450/4 = 362.5), the 5th-ranked word about 290 times (1450/5), and so on.
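The arithmetic above follows a simple rule: the expected count for rank r is the top word's count divided by r. The short Python sketch below computes these expected counts. The function name `zipf_expected_counts` is a hypothetical label, and the 1450 figure is the same made-up "Moby Dick" count used in the example.

```python
def zipf_expected_counts(top_count, max_rank):
    """Expected occurrences for ranks 1..max_rank under the simple
    Zipf model: count(rank) = count(rank 1) / rank."""
    return [top_count / rank for rank in range(1, max_rank + 1)]


# Reproduce the example: "the" occurs 1450 times at rank 1.
for rank, expected in enumerate(zipf_expected_counts(1450, 5), start=1):
    print(f"rank {rank}: about {expected:.1f} occurrences")
```

This prints about 1450.0, 725.0, 483.3, 362.5 and 290.0 occurrences. Actual counts in a real text will only roughly match these predictions, because the law is approximate.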

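Many others have since extended Zipf's Law. Benoit Mandelbrot, for example, generalized it into what is now called the Zipf-Mandelbrot law, which adds an offset to the rank so that the formula fits the most frequent words better.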

