[Python + NLTK] A Brief Introduction to Natural Language Processing and NLTK Environment Configuration (I)
I. Introduction to Natural Language Processing
A "natural language" is a language used for everyday communication, such as English or Hindi. Because such languages evolve over time, they are difficult to describe with explicit rules.
In a broad sense, "Natural Language Processing" (NLP) covers any kind of computer manipulation of natural language, from something as simple as counting word frequencies to compare different writing styles, to the most complex task of "understanding" what people say.
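As a rough illustration of the simplest end of that range, word frequencies can be counted with the standard library alone. This is a minimal sketch: the function name `word_frequencies`, the crude regular-expression tokenizer, and the sample sentence are all assumptions made for the example, not something taken from NLTK.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count how often each lowercased word occurs in text."""
    # Crude tokenizer (an assumption for this sketch): runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

sample = "The whale is monstrous. The sea is wide, and the whale is old."
freq = word_frequencies(sample)

# Show the three most frequent words with their counts.
print(freq.most_common(3))
```

Comparing such counts across two texts is one simple way to start contrasting writing styles.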
Technologies based on NLP are becoming increasingly widespread. For example, mobile phones and handheld computers support predictive text and handwriting recognition, web search engines find information in unstructured text, and machine translation can translate Chinese text into Spanish.
This book offers practical experience in natural language processing using the Python programming language and the open-source Natural Language Toolkit (NLTK) library. It can be used for self-study, as a textbook for a course on natural language processing or computational linguistics, or as a supplement to courses in artificial intelligence, text mining, or corpus linguistics.
Why does this book use Python?
Python is a simple yet powerful programming language that is well suited to processing linguistic data.
As an interpreted language, Python facilitates interactive exploration. As an object-oriented language, Python allows data and methods to be encapsulated and reused. As a dynamic language, Python allows attributes to be added to objects while the program is running and allows variables to be typed dynamically, which speeds up development. Python comes with an extensive standard library, including components for graphical programming, numerical processing, and network connectivity.
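The dynamic features mentioned above can be seen in a few lines. This is only a sketch: the `Token` class and its `pos` attribute are hypothetical names invented for the illustration.

```python
class Token:
    """A word, to which annotations can be attached later."""

    def __init__(self, text):
        self.text = text

tok = Token("monstrous")
tok.pos = "ADJ"        # attribute added at run time; the class never declared it

count = 11             # the name `count` refers to an int here
count = str(count)     # the same name now refers to a str, with no declaration needed

print(tok.text, tok.pos, count)
```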
The book first shows how to analyze text of interest with very short Python programs (Chapters 1-3), then covers structured programming (Chapter 4) and the fundamental topics of language processing: tagging, classification, and information extraction (Chapters 5-7). The next chapters explore how to parse sentences, recognize their syntactic structure, and construct representations of sentence meaning (Chapters 8-10). The last chapter describes how to manage linguistic data effectively (Chapter 11).
II. NLTK Environment Configuration
First, install Python. One of the friendly things about Python is that you can run your programs in the interactive interpreter, which you can access through a simple graphical interface called the Interactive DeveLopment Environment (IDLE). NLTK is configured later from within IDLE.
Then download NLTK. The information is as follows:
Since my computer runs Windows, the installation steps are as follows:
Install NLTK 3.0
To test NLTK, enter the following code:
>>> import nltk
>>> nltk.download()
This opens the NLTK Downloader window.
Download the NLTK data: use nltk.download() to browse the available packages. The Collections tab in the downloader shows how the packages are grouped into sets. Select the row labeled book to obtain all the data required for the examples and exercises in this book. See the references.
After you click Download, the installation takes some time. When it finishes, the status of the book row changes to "Installed":
If the download fails, you can double-click the individual packages you are interested in to download them:
Once the data is on your machine, you can load some of it in the Python interpreter. Enter "from nltk.book import *" to tell the interpreter to load all the texts from the NLTK book module, then enter text1 to see the name of the corresponding text.
Now your NLTK is configured successfully.
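The same setup can also be checked from a script rather than by hand. The sketch below is an assumption-laden illustration: the helper names `nltk_is_installed` and `download_book_collection` are made up here, and the download call is left commented out because it needs a network connection and takes a while.

```python
import importlib.util

def nltk_is_installed():
    """Return True if the nltk package can be found in this environment."""
    return importlib.util.find_spec("nltk") is not None

def download_book_collection():
    """Fetch the 'book' collection without opening the graphical downloader."""
    import nltk  # imported here so the check above still works without NLTK
    return nltk.download("book")

print("NLTK installed:", nltk_is_installed())

# Uncomment to download the data used by the book's examples:
# download_book_collection()
```

Passing a collection name to nltk.download() skips the downloader window, which can be convenient on machines without a display.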
III. Common Natural Language Processing Methods
1. concordance Function
Function: searches the text. Call the concordance() function on text1 to search for the word monstrous in Moby Dick.
>>> text1.concordance("monstrous")
Tip: In IDLE, the shortcut Alt + P recalls the previously entered command. The search returns 11 matches.
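To make the idea of a concordance concrete, here is a small pure-Python sketch that lists every occurrence of a word together with a few neighboring tokens. It is not NLTK's implementation; the function below, its `width` parameter, and the sample tokens are assumptions chosen for illustration.

```python
def concordance(tokens, word, width=3):
    """Return each occurrence of word with up to `width` tokens on either side."""
    target = word.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Mark the matched word with brackets so it stands out.
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

tokens = "a most monstrous size and the monstrous pictures of whales".split()
for line in concordance(tokens, "monstrous", width=2):
    print(line)
```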
2. similar Function
Function: the similar() function finds other words that appear in contexts similar to those of the word in parentheses. A concordance shows a word in its context; for example, monstrous appears in contexts such as the_pictures and the_size.
>>> text1.similar("monstrous")
Most of the words found to be similar to monstrous are adjectives, such as curious, impalpable, perilous, and lazy.
I suspect this comes from the structure of the surrounding context: the function does not "understand" what the words mean. For example, in the monstrous pictures, more monstrous stories, and a monstrous size, monstrous clearly acts as an adjective modifying a noun, in the pattern preceding word + monstrous + noun.
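The observation that similarity comes from shared contexts rather than meaning can be sketched in plain Python. This is a simplified illustration, not how NLTK computes similar(): here a context is just the (previous word, next word) pair, and the helper names and sample tokens are assumptions.

```python
from collections import Counter, defaultdict

def contexts_by_word(tokens):
    """Map each word to the set of (previous, next) pairs it occurs between."""
    contexts = defaultdict(set)
    for prev, word, nxt in zip(tokens, tokens[1:], tokens[2:]):
        contexts[word].add((prev, nxt))
    return contexts

def similar(tokens, word):
    """Rank other words by how many contexts they share with word."""
    contexts = contexts_by_word(tokens)
    target = contexts.get(word, set())
    scores = Counter()
    for other, ctxs in contexts.items():
        shared = len(ctxs & target)
        if other != word and shared:
            scores[other] = shared
    return [w for w, _ in scores.most_common()]

tokens = ("the monstrous size the curious size a monstrous whale "
          "a lazy whale the old ship").split()
print(similar(tokens, "monstrous"))
```

In this toy text, curious and lazy are returned because each fills a slot that monstrous also fills, even though nothing about their meaning is consulted.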
3. common_contexts Function
Function: The common_contexts function allows us to study the context of two or more words, such as monstrous and very.
>>> text2.common_contexts(["monstrous", "very"])
a_pretty is_pretty a_lucky am_glad be_glad
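The words must be passed as a list: quoted strings enclosed in square brackets inside the parentheses and separated by commas. Personal understanding: similar() seems to find words that are related to the given word, while common_contexts() shows the structural contexts that the given words share.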
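A pure-Python sketch of the same idea may help: collect the (previous word, next word) pairs around each target word and keep only the pairs that all targets have in common. This is an illustrative approximation, not NLTK's code, and the sample tokens are invented for the example.

```python
def common_contexts(tokens, words):
    """Return the previous_next contexts shared by every word in words."""
    shared = None
    for target in words:
        found = {
            (prev, nxt)
            for prev, word, nxt in zip(tokens, tokens[1:], tokens[2:])
            if word == target
        }
        # Intersect with the contexts found for the earlier words.
        shared = found if shared is None else shared & found
    return sorted(f"{prev}_{nxt}" for prev, nxt in (shared or set()))

tokens = ("a monstrous pretty house is very pretty and a very pretty "
          "garden is monstrous pretty").split()
print(common_contexts(tokens, ["monstrous", "very"]))
```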
4. generate Function
Function: use the generate() function to automatically produce random text in the style of the source.
>>> text3.generate()
Note: The first time you run this command, it is slow because it gathers statistics about word sequences. The output text differs each time you run it. Although the text is random, it reuses words and phrases from the source text, which gives a sense of its style and content.
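The "word sequence statistics" idea can be sketched with a simple bigram table: record which words follow each word in the source, then repeatedly pick a random follower. Using bigrams is an assumption made for this sketch, and NLTK's own generate() works differently; the function names, the `seed` parameter, and the sample sentence are also illustrative.

```python
import random
from collections import defaultdict

def build_bigram_table(tokens):
    """Map each word to the list of words that follow it in the source."""
    followers = defaultdict(list)
    for current, nxt in zip(tokens, tokens[1:]):
        followers[current].append(nxt)
    return followers

def generate(tokens, length=10, seed=None):
    """Produce up to `length` words by sampling a follower of the last word."""
    rng = random.Random(seed)
    followers = build_bigram_table(tokens)
    word = rng.choice(tokens)
    output = [word]
    # Stop early if the current word never has a follower in the source.
    while len(output) < length and followers.get(word):
        word = rng.choice(followers[word])
        output.append(word)
    return " ".join(output)

tokens = "in the beginning god created the heaven and the earth".split()
print(generate(tokens, length=8))
```

Passing a fixed seed makes the output repeatable, which is handy when comparing runs.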
Error: calling text3.generate() may fail with "AttributeError: 'Text' object has no attribute 'generate'". For the reason, refer to Stack Overflow:
The expected output is as follows:
Conclusion: I hope this introductory article is helpful. If you find any errors or omissions, please point them out. Future articles will cover natural language processing and mining with Python in more depth, along with broader applications of NLTK. I recommend buying a legitimate copy of the book, an excellent work by Steven Bird, Ewan Klein, and Edward Loper.