DIY Chatbot, Part 2: Getting to Know the NLTK Library



NLTK is an excellent natural language processing toolkit and an important tool for our chatbot. This section covers its installation and basic use.

Please respect the original: when reprinting, please credit the source website www.shareditor.com and link to the original article.

NLTK library installation

pip install nltk

Launch Python and download the book corpora:

[root@centos ~]# python
Python 2.7.11 (default, Jan, 08:29:18)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download()

In the downloader window that opens, select "book" and click Download to start downloading.
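If you are working on a server without a graphical display, the interactive downloader may not open. As a minimal sketch, the same "book" collection can also be fetched non-interactively by passing its identifier to nltk.download():

>>> import nltk
>>> nltk.download('book')   # downloads the "book" collection without opening the GUI window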

After the download is complete, enter:

>>> from nltk.book import *

You will see the books load normally, as follows:

*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G. K. Chesterton 1908

Each of these text* objects is a book; entering text1 directly prints the book's title:

>>> text1
<Text: Moby Dick by Herman Melville 1851>

Search Text

Run:

>>> text1.concordance("former")

The sentence contexts that contain "former" are displayed.


We can also search for related words, such as:

>>> text1.similar("ship")
whale boat sea captain the world to head time crew man and pequod line
deck body fishery air boats side voyage

Given "ship", it finds words such as "boat" that are used in similar contexts, i.e. rough synonyms.

We can also see where a word appears across a text (this draws a dispersion plot, which requires matplotlib):

>>> text4.dispersion_plot(["Citizens", "democracy", "freedom", "duties", "America"])

Word Statistics

len(text1): returns the total number of tokens in the text

set(text1): returns the set of distinct words in the text

len(set(text4)): returns the number of distinct words in the text

text4.count("is"): returns the number of occurrences of the word "is"

FreqDist(text1): computes the word frequencies of the text, kept sorted from most to least frequent

fdist1 = FreqDist(text1); fdist1.plot(cumulative=True): computes the word frequencies and plots the cumulative graph

In this plot, the vertical axis shows the running total of occurrences of the words along the horizontal axis, so together these words account for almost all the words in the text.

fdist1.hapaxes(): returns the words that appear only once

text4.collocations(): frequent two-word collocations
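Putting these together, here is a minimal sketch (it uses NLTK 3's FreqDist API and assumes the book collection downloaded earlier is installed):

>>> from nltk.book import text1, text4
>>> from nltk import FreqDist
>>> len(text1)                        # total number of tokens in Moby Dick
>>> len(set(text1))                   # number of distinct words
>>> text4.count("is")                 # occurrences of "is" in the inaugural corpus
>>> fdist1 = FreqDist(text1)          # frequency distribution over the words of text1
>>> fdist1.most_common(10)            # the ten most frequent words
>>> fdist1.hapaxes()[:10]             # a few of the words that occur only once
>>> fdist1.plot(50, cumulative=True)  # cumulative frequency plot (requires matplotlib)
>>> text4.collocations()              # frequent two-word collocations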

Key points of natural language processing

Automatic language understanding: consider "The Chinese team defeated the American team" and "The Chinese team beat the American team". In the original Chinese example these two sentences use the antonym pair "win" and "defeat", yet they express the same meaning: China won, the United States lost. The machine has to be able to work out automatically who won and who lost.

Automatic language generation: automatically generating language is based on automatically understanding it; without understanding, generation is impossible.

Machine translation: there are now many machine translation systems, but good results are still hard to achieve. For example, translate a Chinese sentence into English, back into Chinese, then into English again; after about ten rounds, the result differs greatly from the original.

Man-machine dialogue: this is the ultimate goal we want to achieve. One criterion is the "Turing test": if, in a five-minute conversation, the machine can fool at least 30% of human judges, it passes and can be considered intelligent.

Natural language processing has two schools. One is rule-based: language is analyzed and processed entirely according to grammar, syntax, and other hand-written rules. This approach went through many years of trial and failure in the last century, because there are simply too many rules and much real language does not play by them; it is like chasing your own shadow: however fast you run, you never catch it.

The other school is statistics-based: collect a large amount of corpus data and learn to understand language through statistical methods. This approach has attracted more and more attention and has become the mainstream, because with the development of hardware, storing and computing over big data is no longer a problem, and whatever the rules may be, language does follow statistical regularities. The statistical approach has its own shortcoming, of course: its implicit assumption that "small-probability events never happen" means some problems can never be fully solved.

In the next section we will tackle the corpus problem using the statistical approach.
