The reference book is "Python Natural Language Processing". The book uses Python 2 and NLTK 2; I use Python 3 and NLTK 3.
The test environment is Windows 8.1 with Python 3.4, NumPy, and matplotlib already installed; see: http://blog.csdn.net/monkey131499/article/details/50734183
Installing NLTK 3 (the Natural Language Toolkit), home page: http://www.nltk.org/
Install command: pip install nltk
Code:
saintkings-mac-mini:~ saintking$ sudo pip install nltk
Password:
The directory '/Users/saintking/Library/Caches/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/Users/saintking/Library/Caches/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Collecting nltk
Requirement already satisfied: six in /Library/Python/2.7/site-packages (from nltk)
Installing collected packages: nltk
Successfully installed nltk-3.2.5
saintkings-mac-mini:~ saintking$
After the installation completes, test it with import nltk:
saintkings-mac-mini:~ saintking$ python
Python 2.7.10 (default, Jul, 18:31:42)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download()
showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
No error indicates that the installation was successful.
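The same check can be scripted instead of typed at the prompt. This is only a minimal sketch using the standard library's importlib; the helper name nltk_version is made up for illustration and is not part of NLTK.

```python
import importlib
import importlib.util


def nltk_version():
    """Return the installed NLTK version string, or None if NLTK is missing.

    nltk_version is a hypothetical helper name, not an NLTK function.
    """
    # find_spec returns None when the package cannot be located on sys.path
    if importlib.util.find_spec("nltk") is None:
        return None
    nltk = importlib.import_module("nltk")
    # Fall back to a placeholder in case the attribute is absent
    return getattr(nltk, "__version__", "unknown")


if __name__ == "__main__":
    version = nltk_version()
    print("NLTK is not installed" if version is None else "NLTK " + version)
```

If the script reports a version, the import works in that interpreter. If it reports that NLTK is not installed, pip probably installed the package for a different Python.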
NLTK includes a large collection of software, data, and documentation for text analysis and linguistic structure analysis. The data resources are downloaded separately. Address: http://www.nltk.org/data.html, data list: http://www.nltk.org/nltk_data/
To download the NLTK data, enter these commands in Python:
>>> import nltk
>>> nltk.download()
A new window pops up in which you select the resources to download.
Double-click a row to install it.
>>> import nltk
>>> nltk.download()
showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
True
>>>
Click File to change the download directory. In the list, all means every data set, all-corpora means only the corpora with no grammars or trained models, and book means only the data needed for the book's examples and exercises. Pay attention to where the data is saved: put it either on the C: drive or in Python's root directory, otherwise programs will later fail because they cannot find the data.
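The download can also be done without the pop-up window. The sketch below assumes that nltk.download accepts a package id and a download_dir argument, and that nltk.data.path is the list of directories NLTK searches. The function name fetch_nltk_data and its defaults are made up for illustration.

```python
import os


def fetch_nltk_data(package="book", target_dir=None):
    """Download one NLTK data package, optionally into a chosen directory.

    fetch_nltk_data is a hypothetical helper; "book" is the collection
    that holds the data for the book's examples and exercises.
    """
    # Imported here so this file still loads before NLTK is installed
    import nltk

    if target_dir is None:
        # Let NLTK pick its default data directory
        return nltk.download(package)

    os.makedirs(target_dir, exist_ok=True)
    ok = nltk.download(package, download_dir=target_dir)
    # Make sure later lookups search the custom directory as well
    if target_dir not in nltk.data.path:
        nltk.data.path.append(target_dir)
    return ok
```

Calling fetch_nltk_data("book") uses the default location. Passing a directory of your own as target_dir keeps the data somewhere you control, but the append to nltk.data.path only lasts for the current session, so a script that uses a custom directory has to add it again on each run.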
Note: installation requirements: Python, NLTK, and the NLTK data are required; NumPy and Matplotlib are recommended; NetworkX and Prover9 are optional.
A simple test of NLTK's tokenizer:
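One way such a test might look, as a sketch rather than the exact test from the original post. It uses wordpunct_tokenize from nltk.tokenize, which is purely regex based and needs no extra data; nltk.word_tokenize is the more common choice but needs the punkt model from the NLTK data. The wrapper name simple_tokenize and the fallback regex are assumptions added so the snippet also runs before NLTK is installed.

```python
import re


def simple_tokenize(text):
    """Split text into word tokens and punctuation tokens.

    simple_tokenize is a hypothetical wrapper, not an NLTK function.
    """
    try:
        # Regex-based tokenizer; works without downloading any NLTK data
        from nltk.tokenize import wordpunct_tokenize
        return wordpunct_tokenize(text)
    except ImportError:
        # Fallback: runs of word characters, or runs of punctuation
        return re.findall(r"\w+|[^\w\s]+", text)


print(simple_tokenize("Hello, world! NLTK is installed."))
# ['Hello', ',', 'world', '!', 'NLTK', 'is', 'installed', '.']
```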
---
Here are a few ways to take a look at NLTK data:
1. Loading data
>>> from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
>>>
2. Search for text
>>> text1.concordance('monstrous')
Displaying 11 of 11 matches:
ong the former , one was of a most monstrous size . ... This came towards us ,
ON OF THE PSALMS . " Touching that monstrous bulk of the whale or ork we have r
ll over with a heathenish array of monstrous clubs and spears . Some were thick
d as you gazed , and wondered what monstrous cannibal and savage could ever hav
that has survived the flood ; most monstrous and most mountainous ! That Himmal
they might scout at Moby Dick as a monstrous fable , or still worse and more de
th of Radney .'" CHAPTER 55 Of the monstrous Pictures of Whales . I shall ere l
ing Scenes . In connexion with the monstrous pictures of whales , I am strongly
ere to enter upon those still more monstrous stories of them which are to be fo
ght have been rummaged out of this monstrous cabinet there is no telling . But
of Whale - Bones ; for Whales of a monstrous size are oftentimes cast up dead u
>>>
3. Similar text
>>> text1.similar('monstrous')
imperial subtly impalpable pitiable curious abundant perilous
trustworthy untoward singular lamentable few determined maddens
horrible tyrannical lazy mystifying christian exasperate
>>>
4. Common contexts of words
>>> text2.common_contexts(['monstrous', 'very'])
a_pretty is_pretty a_lucky am_glad be_glad
>>>
5. Lexical dispersion plot
>>> text4.dispersion_plot(['citizens', 'democracy', 'freedom', 'duties', 'America'])
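A dispersion plot marks where in the text each word occurs, counted in tokens from the start. The sketch below computes those positions in plain Python to show what data ends up on the plot. It is not NLTK's implementation: the function name word_offsets and the sample sentence are made up, and matching here is case-sensitive.

```python
def word_offsets(tokens, targets):
    """Map each target word to the token positions where it occurs.

    word_offsets is a hypothetical helper, not part of NLTK.
    """
    wanted = set(targets)
    offsets = {word: [] for word in targets}
    for position, token in enumerate(tokens):
        if token in wanted:
            offsets[token].append(position)
    return offsets


# Made-up sample sentence, split on whitespace
sample = "we the citizens value freedom and the duties of citizens".split()
print(word_offsets(sample, ["citizens", "freedom", "duties"]))
# {'citizens': [2, 9], 'freedom': [4], 'duties': [7]}
```

Because an NLTK text can be iterated over token by token, the same function should also work on text4 once nltk.book is loaded.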
6. Vocabulary statistics
# encoding=utf-8
import nltk
from nltk.book import *

print('~~~~~~~~~~~~~~~~~~~~~~~~~')
print('Length of document text3:', len(text3))
print('Vocabulary and tokens of document text3, sorted:', sorted(set(text3)))
print('Total number of distinct words and tokens in text3:', len(set(text3)))
print('Average number of times each word is used:', len(text3) * 1.0 / len(set(text3)))
print('Number of times the word Abram is used in text3:', text3.count('Abram'))
print('Percentage of words in text3 that are Abram:', text3.count('Abram') * 100 / len(text3))
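The two ratios computed above can be wrapped in small reusable functions. This is a sketch in plain Python 3 that works on any list of tokens; the function names lexical_diversity and percentage and the sample token list are chosen here for illustration.

```python
def lexical_diversity(tokens):
    """Average number of times each distinct token is used."""
    return len(tokens) / len(set(tokens))


def percentage(count, total):
    """Express count as a percentage of total."""
    return 100 * count / total


# Made-up sample: 10 tokens, 8 of them distinct
sample = ["In", "the", "beginning", "God", "created",
          "the", "heaven", "and", "the", "earth"]

print(lexical_diversity(sample))                       # 1.25
print(percentage(sample.count("the"), len(sample)))    # 30.0
```

With nltk.book loaded, lexical_diversity(text3) and percentage(text3.count('Abram'), len(text3)) should give the same figures as the script above, since text3 supports len, set, and count as that script shows.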
Python and Natural Language Processing (i) Building the environment