This article summarizes what I have recently learned from papers and books, mainly about natural language processing (NLP) and about using Python to mine Wikipedia infoboxes and similar content.
It mainly refers to the book "Natural Language Processing with Python" (published in Chinese as "Python Natural Language Processing"). I hope it is helpful. Book:
Official Web Book: http://www.nltk.org/book/
I. A brief introduction to natural language processing
"Natural language" means the language people use for everyday communication, such as English or Hindi. Because such languages evolve continuously, they are difficult to pin down with explicit rules.
Broadly speaking, "natural language processing" (NLP) covers any kind of computer manipulation of natural language. At one extreme it can be as simple as counting word frequencies to compare different writing styles; at the other extreme it involves "understanding" complete human utterances.
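As a minimal sketch of the simple end of that range, the snippet below counts word frequencies in two short samples using only the Python standard library. The sample sentences and the helper name word_frequencies are invented for illustration; a real comparison of writing styles would need much larger texts.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Lowercase the text, split it into alphabetic words, and count them."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# Two invented samples standing in for texts by different authors.
sample_a = "The whale was large. The whale was white. The sea was cold."
sample_b = "I am very glad. I am very happy. I am so glad."

freq_a = word_frequencies(sample_a)
freq_b = word_frequencies(sample_b)

# Show the three most frequent words of each sample.
print(freq_a.most_common(3))
print(freq_b.most_common(3))
```

Comparing the two frequency tables already hints at a difference in vocabulary, which is the basic idea behind frequency-based style comparison.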
Technologies based on NLP are becoming increasingly widespread. For example, mobile phones and handheld computers support predictive text and handwriting recognition, web search engines can find information in unstructured text, and machine translation can turn Chinese text into Spanish.
The book offers a practical introduction to natural language processing using the Python programming language and its open-source library, the Natural Language Toolkit (NLTK). It is suitable for self-study, and it can serve as a textbook for courses on natural language processing or computational linguistics, or as supplementary reading for artificial intelligence, text mining, or corpus linguistics.
Why does this book use Python?
Python is a simple yet powerful language that is well suited to processing linguistic data.
As an interpreted language, Python supports interactive exploration. As an object-oriented language, it allows data and methods to be encapsulated and reused. As a dynamic language, it allows attributes to be added to objects at run time and variables to be typed dynamically, which speeds up development. Python also comes with an extensive standard library that includes components for graphical programming, numerical processing, and network connectivity.
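The properties listed above can be illustrated with a few lines of Python. This is only a sketch: the class name Token and the tag value "NNP" are hypothetical and are not taken from the book or from NLTK.

```python
class Token:
    """A word together with a method that operates on it (encapsulation)."""

    def __init__(self, text):
        self.text = text

    def is_capitalized(self):
        # True when the first character of the word is uppercase.
        return self.text[:1].isupper()

token = Token("Moby")
print(token.is_capitalized())

# Dynamic language: attach a new attribute to an existing object at run time.
token.tag = "NNP"
print(token.tag)

# Dynamic typing: the same name can be rebound to a value of another type.
value = 42
value = "forty-two"
print(type(value).__name__)
```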
The chapters cover: how to analyze interesting text with short Python programs (chapters 1-3); structured programming (chapter 4); the core topics of language processing, namely tagging, classification, and information extraction (chapters 5-7); analyzing sentences, recognizing syntactic structure, and constructing representations of meaning (chapters 8-10); and finally how to manage linguistic data effectively (chapter 11).
II. NLTK environment configuration
First install Python, which can be downloaded from https://www.python.org/.
One thing that makes Python user friendly is that you can type code directly into the interactive interpreter, the program that runs your Python programs. You can access the interpreter through a simple graphical interface called the Interactive Development Environment (IDLE). The NLTK configuration below is done in IDLE.
Then download NLTK using the following links:
Website Link: http://www.nltk.org/
Installation steps: http://www.nltk.org/install.html
PyPI: https://pypi.python.org/pypi/nltk
Because my computer runs Windows, the installation steps are as follows:
Install NLTK 3.0.
To test NLTK, enter the following code:
>>> import nltk
>>> nltk.download()
Running nltk.download() opens the NLTK Downloader window.
Download the NLTK book collection: use nltk.download() to browse the available packages. The Collections tab of the downloader shows how the packages are grouped into sets. Select the row labeled "book" to get all the data needed for the examples and exercises in the book.
Click "Download". The installation takes a while, and when it finishes the status of the book collection changes to "Installed".
If the whole collection cannot be downloaded, you can instead double-click the individual packages you are interested in.
Once the data has been downloaded to your machine, you can load some of it in the Python interpreter. At the Python prompt, enter "from nltk.book import *" to tell the interpreter to load all the texts from NLTK's book module. Then enter text1 to display the name of the corresponding text.
At this point NLTK is configured successfully.
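The configuration can also be checked from a script instead of the graphical downloader. The sketch below is an assumption-laden example rather than an official procedure: the helper name check_nltk is hypothetical, and "corpora/gutenberg" is used as a stand-in for the book data on the assumption that the book collection includes the Gutenberg corpus.

```python
import importlib.util

def check_nltk(resource="corpora/gutenberg"):
    """Report whether NLTK is importable and whether one data resource is present."""
    if importlib.util.find_spec("nltk") is None:
        return "nltk is not installed"
    import nltk
    try:
        # nltk.data.find raises LookupError when the resource is missing.
        nltk.data.find(resource)
    except LookupError:
        return "nltk %s is installed, but %s is missing" % (nltk.__version__, resource)
    return "nltk %s is installed and %s is available" % (nltk.__version__, resource)

print(check_nltk())
```

If the data is reported as missing, calling nltk.download('book') fetches the book collection without opening the downloader window.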
III. Common methods of natural language processing
1. concordance function
Function: search a text. Calling concordance() on text1 finds every occurrence of the word monstrous in Moby Dick, together with its surrounding context.
>>> text1.concordance("monstrous")
The search returns 11 matches. Tip: in IDLE, the shortcut Alt+P recalls the previously entered command.
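To make the idea of a concordance concrete, here is a minimal sketch written with plain Python lists. It is not NLTK's implementation: the function below, its width parameter, and the sample sentence are all invented for illustration.

```python
def concordance(tokens, word, width=3):
    """Return each occurrence of `word` with `width` tokens of context on both sides."""
    target = word.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines

# An invented sentence, split on whitespace to get a list of tokens.
tokens = "the monstrous pictures hung near a monstrous size of whale".split()

# Right-align the left context so the matched word lines up in a column.
for left, hit, right in concordance(tokens, "monstrous"):
    print("%25s  %s  %s" % (left, hit, right))
```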
2. similar function
Function: similar() finds other words that appear in contexts similar to those of the word in parentheses. A concordance lets us see a word in context. For example, monstrous appears in contexts such as the ___ pictures and a ___ size.
>>> text1.similar("monstrous")
Most of the words reported as similar to monstrous turn out to be adjectives: curious, impalpable, perilous, lazy, and so on.
My guess is that this reflects the structure of the surrounding context rather than any "understanding" of the word's meaning. Examples: the monstrous pictures, more monstrous stories, a monstrous size. In each case monstrous clearly acts as an adjective modifying the noun that follows it.
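The guess above can be tried out with a small sketch that ranks words by the number of contexts they share with a target word, where a context is the pair of neighboring words. This is a simplification for illustration and not how NLTK scores similarity; the function names and the toy sentences are assumptions.

```python
from collections import Counter, defaultdict

def build_contexts(tokens):
    """Map each word to the set of (previous word, next word) pairs around it."""
    contexts = defaultdict(set)
    for prev, word, nxt in zip(tokens, tokens[1:], tokens[2:]):
        contexts[word].add((prev, nxt))
    return contexts

def similar(tokens, word):
    """Rank other words by how many contexts they share with `word`."""
    contexts = build_contexts(tokens)
    target = contexts.get(word, set())
    scores = Counter()
    for other, ctx in contexts.items():
        shared = len(ctx & target)
        if other != word and shared:
            scores[other] = shared
    return [w for w, _ in scores.most_common()]

# Toy text in which curious and perilous occur in the same slots as monstrous.
tokens = ("the monstrous pictures . the curious pictures . "
          "a monstrous size . a perilous size . the whale swam").split()
print(similar(tokens, "monstrous"))
```

Only words that fill the same slot between the same neighbors are returned, which is why the result consists of adjectives even though the code knows nothing about meaning.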
3. common_contexts function
Function: common_contexts() lets us examine the contexts shared by two or more words, such as monstrous and very.
>>> text2.common_contexts(["monstrous", "very"])
a_pretty is_pretty a_lucky am_glad be_glad
The words must be passed as a list: enclosed in square brackets inside the parentheses and separated by commas. Personal understanding: similar() finds words that are used like the given word, while common_contexts() shows the contexts those words share.
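A minimal sketch of the same idea follows, using the left_right notation seen in the output above. The helper contexts_of and the toy sentences are hypothetical, and the code is a simplified stand-in for what NLTK does internally.

```python
def contexts_of(tokens, word):
    """Collect the 'left_right' context of every occurrence of `word`."""
    found = set()
    for i in range(1, len(tokens) - 1):
        if tokens[i].lower() == word.lower():
            found.add("%s_%s" % (tokens[i - 1].lower(), tokens[i + 1].lower()))
    return found

def common_contexts(tokens, words):
    """Return the contexts in which every word in `words` appears."""
    shared = None
    for word in words:
        ctx = contexts_of(tokens, word)
        shared = ctx if shared is None else shared & ctx
    return sorted(shared or [])

# Toy text: "very" and "monstrous" both occur between am/glad and is/pretty.
tokens = ("I am very glad . I am monstrous glad . "
          "She is very pretty . She is monstrous pretty . He is very tall").split()
print(common_contexts(tokens, ["monstrous", "very"]))
```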
4. generate function
Function: generate() produces random text in the style of the source text.
>>> text3.generate()
Note: the first time you run this command it is slow, because it gathers statistics about word sequences. The output text is different each time you run it. Although the text is random, it reuses words and phrases from the source text, so it gives a sense of the source's style and content.
Error: under NLTK 3.0 this call fails with "AttributeError: 'Text' object has no attribute 'generate'". For the reason, see the discussion on Stack Overflow.
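Because generate() may be unavailable, the idea behind it can still be illustrated with a small bigram sketch: record which words follow each word in the source, then repeatedly pick a random follower. This is not NLTK's implementation; the function names, the seed parameter, and the short sample text are assumptions made for the example.

```python
import random
from collections import defaultdict

def build_bigram_model(tokens):
    """Map each word to the list of words that follow it in the source."""
    followers = defaultdict(list)
    for current, nxt in zip(tokens, tokens[1:]):
        followers[current].append(nxt)
    return followers

def generate(tokens, length=12, seed=None):
    """Produce `length` words by repeatedly picking a random follower."""
    rng = random.Random(seed)
    followers = build_bigram_model(tokens)
    word = rng.choice(tokens)
    output = [word]
    while len(output) < length:
        choices = followers.get(word)
        # Restart from a random word if the current one has no follower.
        word = rng.choice(choices) if choices else rng.choice(tokens)
        output.append(word)
    return " ".join(output)

# A short sample standing in for a full source text.
source = ("in the beginning god created the heaven and the earth "
          "and the earth was without form and void").split()
print(generate(source, length=12, seed=0))
```

Passing a fixed seed makes the output repeatable. With seed=None the text changes on every run, which matches the behavior described in the note above.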
[Figure: the ideal output of text3.generate()]
Summary: I hope this introductory article is helpful. If it contains errors or shortcomings, please bear with me! Later articles will explain natural language processing and Python mining in more depth, including broader uses of NLTK. I recommend buying a genuine copy of the book, which is very good: "Python Natural Language Processing" by Steven Bird, Ewan Klein & Edward Loper.
(By: Eastmount, 2015-4-16, 8 p.m., http://blog.csdn.net/eastmount/)
[Python+NLTK] A brief introduction to natural language processing, NLTK environment configuration, and getting-started knowledge (I)