Zipf's Law

Source: Internet
Author: User
Tags: nltk

Let f(w) be the frequency of a word w in free text. Suppose that all the words of a text are ranked according to their frequency, with the most frequent word first. Zipf's law states that the frequency of a word type is inversely proportional to its rank (i.e., f × r = k, for some constant k). For example, the 50th most common word type should occur three times as frequently as the 150th most common word type.
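
As a quick check of the arithmetic in that example, here is a minimal sketch. The constant k and the function name zipf_frequency are made up for illustration; they do not come from NLTK or from the exercise.

```python
# Under Zipf's law f * r = k, so the predicted frequency at rank r is k / r.
K = 30000.0  # hypothetical constant, chosen only for illustration


def zipf_frequency(rank, k=K):
    """Frequency predicted by Zipf's law for the word type at a 1-based rank."""
    return k / rank


# Ratio between the 50th and the 150th most common word types.
ratio = zipf_frequency(50) / zipf_frequency(150)
print(ratio)
```

The ratio does not depend on the value of k, since k cancels out and leaves 150 / 50.
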
A. Write a function to process a large text and plot word frequency against word rank using pylab.plot. Do you confirm Zipf's law? (Hint: it helps to use a logarithmic scale.) What is going on at the extreme ends of the plotted line?
B. Generate random text, e.g., using random.choice("abcdefg "), taking care to include the space character. You will need to import random first. Use the string concatenation operator to accumulate characters into a (very) long string. Then tokenize this string, generate the Zipf plot as before, and compare the two plots. What do you make of Zipf's law in the light of this?
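
Part B could be approached along the following lines. This is only a sketch using the standard library: random_text and rank_frequencies are hypothetical helper names, the seed and text length are arbitrary, and "".join is used in place of repeated string concatenation because it builds the same string faster.

```python
import random
from collections import Counter


def random_text(n_chars, alphabet="abcdefg ", seed=0):
    """Build a string of n_chars characters drawn uniformly from alphabet."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    return "".join(rng.choice(alphabet) for _ in range(n_chars))


def rank_frequencies(tokens):
    """Return token counts sorted from most to least frequent (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


text = random_text(100_000)
tokens = text.split()  # whitespace tokenization; runs of spaces give no empty tokens
freqs = rank_frequencies(tokens)

print(len(tokens), "tokens,", len(freqs), "types")
print("top five counts:", freqs[:5])
```

The list freqs can then be plotted against ranks 1, 2, 3, ... in the same way as the frequencies from a real corpus, so that the two curves can be compared.
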

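import nltk
import pylab
from nltk.corpus import gutenberg as gb  # requires the NLTK gutenberg corpus to be downloaded

def validate_zipf(text, ranklimit):
    # Count alphabetic tokens only, so punctuation does not take up ranks.
    fdist = nltk.FreqDist([w for w in text if w.isalpha()])
    # Frequencies of the ranklimit most common word types, highest first.
    y = sorted(fdist.values(), reverse=True)[:ranklimit]
    # Ranks start at 1; len(y) covers texts with fewer word types than ranklimit.
    x = range(1, len(y) + 1)
    pylab.plot(x, y)
    pylab.show()

def test():
    text = gb.words(fileids=['shakespeare-hamlet.txt'])
    validate_zipf(text, 150)

test()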

The result of running the code is:

[Figure: Zipf's Law]
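
The hint about a logarithmic scale can be made concrete. If f × r = k, then log f = log k − log r, so on log-log axes (for example with pylab.loglog in place of pylab.plot) the points should lie close to a straight line of slope −1. The sketch below estimates that slope with an ordinary least-squares fit. The name loglog_slope is hypothetical, and the input is synthetic data that follows the law exactly, not a measurement from a corpus.

```python
import math


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank); ranks start at 1."""
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic frequencies that satisfy f = k / r exactly (k = 1000 is arbitrary).
ideal = [1000 / r for r in range(1, 151)]
slope = loglog_slope(ideal)
print(round(slope, 6))
```

Passing in the sorted frequency list from a real text gives a single number to compare against −1. A fit over all ranks is affected by the extreme ends of the curve, so it may be worth fitting the middle ranks separately.
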
