Python and Natural Language Processing (I): Setting Up the Environment

Source: Internet
Author: User

Reference book: "Python Natural Language Processing". The book uses Python 2 and NLTK 2; I am using Python 3 and NLTK 3.

Test environment: Windows 8.1 with Python 3.4, with NumPy and matplotlib already installed (see: http://blog.csdn.net/monkey131499/article/details/50734183).

Install NLTK 3 (Natural Language Toolkit), home page: http://www.nltk.org/

Install command: pip install nltk

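Example installation session. As its output shows, it was captured on macOS with the system Python 2.7; on Windows, run the same pip command without sudo: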

saintkings-mac-mini:~ saintking$ sudo pip install nltk

Password:

The directory '/Users/saintking/Library/Caches/pip/http' or its parent directory isn't owned by the current user and the cache has been disabled. Please check the permissions and owner for that directory. If executing pip with sudo, you may want sudo's -H flag.

The directory '/Users/saintking/Library/Caches/pip' or its parent directory isn't owned by the current user and caching wheels has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.

Collecting nltk

Requirement already satisfied: six in /Library/Python/2.7/site-packages (from nltk)

Installing collected packages: nltk

Successfully installed nltk-3.2.5

saintkings-mac-mini:~ saintking$

After installation completes, test it by importing NLTK:

saintkings-mac-mini:~ saintking$ python

Python 2.7.10 (default, Jul, 18:31:42)

[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin

Type "help", "copyright", "credits" or "license" for more information.

>>> import nltk

>>> nltk.download()

Showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml

If no error is raised, NLTK was installed successfully.

NLTK includes a large collection of software, data, and documentation for text analysis and linguistic structure analysis. The data resources are downloaded separately, as needed. Address: http://www.nltk.org/data.html; list of available data: http://www.nltk.org/nltk_data/

To download NLTK-Data, enter the following commands in Python:

>>> import nltk

>>> nltk.download()

A new window opens in which you can select the resources to download.

Double-click a row to install that resource.

>>> import nltk

>>> nltk.download()

Showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml

True

>>>

Click File to change the download directory. "all" downloads every data set, "all-corpora" downloads only the corpora (no grammars or trained models), and "book" downloads only the data used in the book's examples and exercises. Pay attention to where the data is saved: put it in a location NLTK searches, such as C:\nltk_data or an nltk_data folder in the Python installation directory. Otherwise, programs that load the data later will fail with an error because it cannot be found.
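NLTK looks for data by checking, in order, the directories listed in nltk.data.path. When the NLTK_DATA environment variable is set, its directories are placed at the front of that list. nltk.data.find() raises a LookupError when none of them contains the requested resource, which is the error described above. The sketch below imitates that ordered lookup using only the standard library, so it runs without NLTK. The helper find_resource and the demo directories are hypothetical and are not part of NLTK.

```python
import os
import tempfile


def find_resource(resource_name, search_dirs):
    """Return the path of the first directory in search_dirs that holds resource_name.

    Hypothetical helper that imitates the ordered search NLTK performs over
    nltk.data.path; it is an illustration, not NLTK's implementation.
    """
    # Resource names use "/" separators, e.g. "corpora/gutenberg".
    parts = resource_name.split("/")
    for base in search_dirs:
        candidate = os.path.join(base, *parts)
        if os.path.exists(candidate):
            return candidate
    # NLTK likewise raises LookupError when no directory has the resource.
    raise LookupError(f"Resource {resource_name!r} not found in {search_dirs}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as data_dir:
        # Simulate a data folder that contains only corpora/gutenberg.
        os.makedirs(os.path.join(data_dir, "corpora", "gutenberg"))
        missing_dir = os.path.join(data_dir, "does-not-exist")
        print(find_resource("corpora/gutenberg", [missing_dir, data_dir]))
        try:
            find_resource("corpora/brown", [missing_dir, data_dir])
        except LookupError as err:
            print("LookupError:", err)
```

With NLTK installed, print(nltk.data.path) shows the actual search list. Appending a directory to that list makes data saved elsewhere visible for the current session, for example nltk.data.path.append(r"D:\my_nltk_data"), where the folder is hypothetical.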

Note: software requirements: Python, NLTK, and NLTK-Data are required; NumPy and Matplotlib are recommended; NetworkX and Prover9 are optional.
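You can check which of these packages are present with a short standard-library script. This sketch assumes Python 3.8 or later, which provides importlib.metadata. The REQUIREMENTS grouping simply mirrors the note above, and check_packages is a hypothetical helper. Prover9 is a standalone theorem prover rather than a Python package, and NLTK-Data is downloaded data, so neither is checked here.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version

# Module names taken from the requirements note; the grouping mirrors it.
REQUIREMENTS = {
    "required": ["nltk"],
    "recommended": ["numpy", "matplotlib"],
    "optional": ["networkx"],
}


def check_packages(groups):
    """Map each module name to (group, installed version or None if missing)."""
    report = {}
    for level, modules in groups.items():
        for module in modules:
            if importlib.util.find_spec(module) is None:
                report[module] = (level, None)
                continue
            try:
                report[module] = (level, version(module))
            except PackageNotFoundError:
                # Importable, but no installed distribution metadata to read.
                report[module] = (level, "unknown")
    return report


if __name__ == "__main__":
    for module, (level, ver) in check_packages(REQUIREMENTS).items():
        print(f"{module:<12} {level:<12} {ver or 'NOT INSTALLED'}")
```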

A simple test of NLTK's word tokenizer:
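One NLTK tokenizer that needs no downloaded models is wordpunct_tokenize. It splits text on the regular expression \w+|[^\w\s]+, that is, runs of word characters or runs of punctuation. By contrast, word_tokenize relies on the Punkt models from NLTK-Data. The sketch below reproduces the wordpunct rule with Python's re module so it can be tried without NLTK; simple_wordpunct_tokenize is a hypothetical name.

```python
import re

# Same rule as NLTK's WordPunctTokenizer: runs of word chars or runs of punctuation.
WORD_PUNCT = re.compile(r"\w+|[^\w\s]+")


def simple_wordpunct_tokenize(text):
    """Split text into alphanumeric tokens and punctuation tokens."""
    return WORD_PUNCT.findall(text)


if __name__ == "__main__":
    print(simple_wordpunct_tokenize("Good muffins cost $3.88 in New York."))
```

With NLTK installed, from nltk.tokenize import wordpunct_tokenize should produce the same list for this sentence.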

---

Here are a few ways to explore the NLTK data:

1. Loading data

>>> from nltk.book import *

*** Introductory Examples for the NLTK Book ***

Loading text1, ..., text9 and sent1, ..., sent9

Type the name of the text or sentence to view it.

Type: 'texts()' or 'sents()' to list the materials.

text1: Moby Dick by Herman Melville 1851

text2: Sense and Sensibility by Jane Austen 1811

text3: The Book of Genesis

text4: Inaugural Address Corpus

text5: Chat Corpus

text6: Monty Python and the Holy Grail

text7: Wall Street Journal

text8: Personals Corpus

text9: The Man Who Was Thursday by G. K. Chesterton 1908

>>>

2. Searching text (concordance)

>>> text1.concordance('monstrous')

Displaying 11 of 11 matches:

ong the former, one was of a most monstrous size. ... This came towards us,

On the Psalms. "Touching that monstrous bulk of the whale or Ork we have r

ll over with a heathenish array of monstrous clubs and spears. Some were thick

d as you gazed, and wondered what monstrous cannibal and savage could ever hav

that has survived the flood; most monstrous and most mountainous! That Himmal

they might scout at Moby Dick as a monstrous fable, or still worse and more de

th of Radney.'" CHAPTER of the monstrous Pictures of whales. I shall ere l

ing Scenes. In connexion with the monstrous pictures of whales, I am strongly

ere to enter upon those still more monstrous stories of them which are to be fo

ght have been rummaged out of this monstrous cabinet there is no telling. But

of Whale-bones; for whales of a monstrous size are oftentimes cast up dead u

>>>
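concordance() produces a keyword-in-context (KWIC) view. It shows every occurrence of the word, matched case-insensitively, centred in a fixed-width window of surrounding text. Below is a rough standard-library sketch of the same idea. The kwic helper is hypothetical, and NLTK's own formatting differs in detail, so the output will not match it character for character.

```python
def kwic(tokens, word, width=30):
    """Return one keyword-in-context line per occurrence of word (case-insensitive)."""
    target = word.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() != target:
            continue
        # Assuming non-empty tokens, `width` tokens on either side always
        # supply at least `width` characters of context before trimming.
        left = " ".join(tokens[max(0, i - width):i])[-width:]
        right = " ".join(tokens[i + 1:i + 1 + width])[:width]
        # Right-align the left context so every keyword starts in the same column.
        lines.append(f"{left:>{width}} {token} {right}")
    return lines


if __name__ == "__main__":
    sample = "one was of a most monstrous size and they might scout at a monstrous fable".split()
    for line in kwic(sample, "monstrous", width=20):
        print(line)
```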

3. Words used in similar contexts

>>> text1.similar('monstrous')

imperial subtly impalpable pitiable curious abundant perilous

trustworthy untoward singular lamentable few determined maddens

horrible tyrannical lazy mystifying christian exasperate

>>>
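similar() ranks other words by how often they appear in the same contexts as the given word. Here a context is the pair of words immediately to its left and right, and NLTK lowercases the text and ignores non-alphabetic tokens for this. Below is a simplified standard-library sketch of that idea, with hypothetical helper names. It is not NLTK's code, and tie-breaking may differ.

```python
from collections import Counter, defaultdict


def contexts_by_word(tokens):
    """Map each lowercased alphabetic word to a Counter of its (left, right) neighbours."""
    words = [t.lower() for t in tokens if t.isalpha()]
    index = defaultdict(Counter)
    for i, w in enumerate(words):
        left = words[i - 1] if i > 0 else "*START*"
        right = words[i + 1] if i + 1 < len(words) else "*END*"
        index[w][(left, right)] += 1
    return index


def similar_words(tokens, word, num=20):
    """Rank other words by how often they occur in contexts shared with word."""
    index = contexts_by_word(tokens)
    target = word.lower()
    if target not in index:
        return []
    shared = set(index[target])
    scores = Counter()
    for other, ctx_counts in index.items():
        if other == target:
            continue
        for ctx, n in ctx_counts.items():
            if ctx in shared:
                scores[other] += n
    return [w for w, _ in scores.most_common(num)]


if __name__ == "__main__":
    text = "a very big dog and a monstrous big cat and a pretty big cow".split()
    print(similar_words(text, "monstrous"))
```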

4. Contexts shared by two words

>>> text2.common_contexts(['monstrous', 'very'])

a_pretty is_pretty a_lucky am_glad be_glad

>>>
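common_contexts() keeps only the contexts (left word, right word) shared by all the given words and prints each as left_right. So a_pretty means that both "a monstrous pretty" and "a very pretty" occur in the text. Below is a standalone standard-library sketch of the same computation. The helper names are hypothetical, and this is not NLTK's implementation.

```python
from collections import Counter
from functools import reduce


def word_contexts(tokens):
    """Map each lowercased alphabetic word to a Counter of its (left, right) neighbours."""
    words = [t.lower() for t in tokens if t.isalpha()]
    index = {}
    for i, w in enumerate(words):
        left = words[i - 1] if i > 0 else "*START*"
        right = words[i + 1] if i + 1 < len(words) else "*END*"
        index.setdefault(w, Counter())[(left, right)] += 1
    return index


def common_contexts(tokens, targets, num=20):
    """Return 'left_right' strings for contexts in which every target word occurs."""
    index = word_contexts(tokens)
    keys = [t.lower() for t in targets]
    # Intersect the context sets of all target words (an unknown word gives an empty set).
    shared = reduce(set.intersection, [set(index.get(k, {})) for k in keys])
    counts = Counter()
    for k in keys:
        for ctx, n in index.get(k, {}).items():
            if ctx in shared:
                counts[ctx] += n
    return [f"{left}_{right}" for (left, right), _ in counts.most_common(num)]


if __name__ == "__main__":
    text = "she is pretty happy and he is very happy and a very glad man".split()
    print(common_contexts(text, ["pretty", "very"]))
```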

5. Lexical dispersion plot (requires matplotlib)

>>> text4.dispersion_plot(['citizens', 'democracy', 'freedom', 'duties', 'America'])
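dispersion_plot() draws a mark for each word at every position (token offset) where that word occurs in the text. Matching is case-sensitive by default, which is why "America" is capitalized. The sketch below computes only those offsets with the standard library, assuming the text is a list of tokens. dispersion_offsets is a hypothetical helper, and plotting is left out.

```python
def dispersion_offsets(tokens, words, ignore_case=False):
    """Return {word: [token offsets where it occurs]}, the data a dispersion plot draws."""
    normalise = str.lower if ignore_case else (lambda s: s)
    targets = {normalise(w): w for w in words}
    offsets = {w: [] for w in words}
    for i, token in enumerate(tokens):
        key = normalise(token)
        if key in targets:
            offsets[targets[key]].append(i)
    return offsets


if __name__ == "__main__":
    text = "America and the citizens of America value freedom".split()
    print(dispersion_offsets(text, ["citizens", "America", "duties"]))
```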

6. Vocabulary statistics

# encoding=utf-8
import nltk
# Loads text1 ... text9 (needs the "book" data from NLTK-Data)
from nltk.book import *

print('~~~~~~~~~~~~~~~~~~~~~~~~~')
print('Length of text3 (tokens):', len(text3))
print('Sorted vocabulary (distinct words and punctuation) of text3:', sorted(set(text3)))
print('Number of distinct words and punctuation tokens in text3:', len(set(text3)))
print('Average number of times each word is used:', len(text3) * 1.0 / len(set(text3)))
print('Number of times "Abram" occurs in text3:', text3.count('Abram'))
print('Percentage of text3 taken up by "Abram":', text3.count('Abram') * 100 / len(text3))
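The same statistics can be wrapped in small reusable functions. They accept any sequence of tokens, such as text3 or a plain Python list, so they are easy to check without the book data. average_word_uses, percentage, and vocabulary_stats are hypothetical helper names for this sketch.

```python
def average_word_uses(tokens):
    """Average number of times each distinct token is used (total / distinct)."""
    return len(tokens) / len(set(tokens))


def percentage(count, total):
    """Share of total represented by count, as a percentage."""
    return 100 * count / total


def vocabulary_stats(tokens, word):
    """Bundle the statistics printed above for one token sequence and one word."""
    count = tokens.count(word)
    return {
        "length": len(tokens),
        "distinct": len(set(tokens)),
        "average_uses": average_word_uses(tokens),
        "count": count,
        "percent": percentage(count, len(tokens)),
    }


if __name__ == "__main__":
    genesis_start = "In the beginning God created the heaven and the earth".split()
    print(vocabulary_stats(genesis_start, "the"))
```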
