Common functions of natural language 2

Last Update:2016-11-13 Source: Internet

Author: User

Tags nltk

Fellow enthusiasts are welcome to add me on QQ: 231469242

SEO Keywords

Natural language, NLP, NLTK, Python, tokenization, normalization, linguistics, semantic

Reference book for study: http://nltk.googlecode.com/svn/trunk/doc/book/

http://blog.csdn.net/tanzhangwen/article/details/8469491

An NLP enthusiast's blog

http://blog.csdn.net/tanzhangwen/article/category/1297154

1. Downloading data using a proxy

nltk.set_proxy("**.com:80")

nltk.download()

2. If calling sents(fileid) fails with the error "Resource 'tokenizers/punkt/english.pickle' not found", use the NLTK Downloader to obtain the resource:

import nltk

nltk.download()

In the downloader window, open the 'Models' tab, find 'punkt' in the 'Identifier' column, and click Download to install the package.
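
The same check can be done from code instead of the downloader window. The following is only a minimal sketch: `ensure_resource` is a hypothetical helper name, not part of NLTK, and it relies on `nltk.data.find` raising `LookupError` when a resource is missing. The demo call passes a stand-in downloader and a deliberately nonexistent resource path so that nothing is fetched from the network.

```python
import nltk


def ensure_resource(resource_path, package_id, downloader=nltk.download):
    """Hypothetical helper: request a download only if the resource is missing.

    Returns True if the resource was already installed, False otherwise.
    """
    try:
        nltk.data.find(resource_path)
        return True
    except LookupError:
        # Same error that sents() reports when punkt is not installed.
        downloader(package_id)
        return False


# Demo with a stand-in downloader that only records what would be fetched.
requested = []
already_installed = ensure_resource(
    "tokenizers/no_such_resource_for_demo", "punkt", downloader=requested.append
)
print(already_installed, requested)
```

In real use the call would look like `ensure_resource("tokenizers/punkt", "punkt")`, which falls back to `nltk.download("punkt")`; the package identifier may differ between NLTK releases.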

3. Functions for accessing corpus elements

from nltk.corpus import webtext

webtext.fileids()  # get the IDs of all files in the corpus

webtext.raw(fileid)  # all characters (the raw text) of the given file

webtext.words(fileid)  # all words of the given file

webtext.sents(fileid)  # all sentences of the given file

Example	Description
fileids()	The files of the corpus
fileids([categories])	The files of the corpus corresponding to these categories
categories()	The categories of the corpus
categories([fileids])	The categories of the corpus corresponding to these files
raw()	The raw content of the corpus
raw(fileids=[f1,f2,f3])	The raw content of the specified files
raw(categories=[c1,c2])	The raw content of the specified categories
words()	The words of the whole corpus
words(fileids=[f1,f2,f3])	The words of the specified fileids
words(categories=[c1,c2])	The words of the specified categories
sents()	The sentences of the whole corpus
sents(fileids=[f1,f2,f3])	The sentences of the specified fileids
sents(categories=[c1,c2])	The sentences of the specified categories
abspath(fileid)	The location of the given file on disk
encoding(fileid)	The encoding of the file (if known)
open(fileid)	Open a stream for reading the given corpus file
root	The path to the root of the locally installed corpus
readme()	The contents of the README file of the corpus
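
To try these accessors without downloading anything, here is a small sketch that builds a throwaway two-file corpus in a temporary directory and reads it with `PlaintextCorpusReader`. The file names and sample sentences are made up for illustration, and the temporary corpus is a stand-in for `webtext`. `sents()` is left out on purpose because it needs the punkt model described in section 2.

```python
import os
import tempfile

from nltk.corpus.reader import PlaintextCorpusReader

# Made-up sample files standing in for a real corpus such as webtext.
samples = {
    "a.txt": "Hello world. NLTK is fun.",
    "b.txt": "Corpora are collections of text.",
}

with tempfile.TemporaryDirectory() as corpus_dir:
    for name, content in samples.items():
        with open(os.path.join(corpus_dir, name), "w", encoding="utf-8") as handle:
            handle.write(content)

    # Every file matching the regular expression becomes a fileid.
    reader = PlaintextCorpusReader(corpus_dir, r".*\.txt")

    file_ids = reader.fileids()           # IDs of all files in the corpus
    raw_a = reader.raw("a.txt")           # raw characters of one file
    words_a = list(reader.words("a.txt")) # words of one file
    all_words = list(reader.words())      # words of the whole corpus

print(file_ids)
print(words_a)
```

The results are copied into plain lists inside the `with` block because the corpus views read lazily from files that disappear once the temporary directory is cleaned up.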

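4. Some common text-processing functions

Assume text is a list of words (wrap it in nltk.Text to use the text.* methods such as collocations() and concordance()).

len(text)  # number of words

set(text)  # remove duplicates

sorted(text)  # sort

text.count('a')  # count the occurrences of the given word

text.index('a')  # position of the first occurrence of the given word

FreqDist(text)  # words and their frequencies; keys() gives the words, fdist[key] gives the count

FreqDist(text).plot(50, cumulative=True)  # draw a cumulative frequency plot

bigrams(text)  # all pairs of adjacent words

text.collocations()  # find pairs of adjacent words that occur frequently in the text

text.concordance("word")  # find each occurrence of the given word with its context

text.similar("word")  # find all words that appear in contexts similar to the given word

text.common_contexts(["A", "B"])  # find the contexts shared by the two words

text.dispersion_plot(['A', 'B', 'C', ...])  # plot comparing where the words occur across the text

text.generate()  # generate a random piece of text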
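
The non-plotting calls above can be tried on a tiny hand-made token list. This is only an illustrative sketch: the sentence is invented, and the calls that need downloaded data or a plotting backend (`collocations()`, `plot()`, `dispersion_plot()`) are left out.

```python
from nltk import FreqDist, bigrams
from nltk.text import Text

# Invented sample text, split into a plain list of words.
tokens = "the cat sat on the mat and the dog sat on the log".split()

n_tokens = len(tokens)           # number of words
vocab = sorted(set(tokens))      # unique words in sorted order
the_count = tokens.count("the")  # occurrences of "the"
first_sat = tokens.index("sat")  # position of the first "sat"

fdist = FreqDist(tokens)         # word -> frequency
top_word = fdist.most_common(1)  # most frequent word with its count
pairs = list(bigrams(tokens))    # all pairs of adjacent words

# Wrapping the list in Text gives access to concordance(), similar(), etc.
text = Text(tokens)
text.concordance("sat")          # prints each occurrence with its context
```

Note that `count` and `index` work both on the plain list and on the `Text` object, while `concordance` is available only after wrapping.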

NLTK's conditional frequency distributions: commonly used methods and idioms for defining, accessing, and visualizing a conditional frequency distribution of counters.

Example	Description
cfdist = ConditionalFreqDist(pairs)	Create a conditional frequency distribution from a list of pairs
cfdist.conditions()	Alphabetically sorted list of conditions
cfdist[condition]	The frequency distribution for this condition
cfdist[condition][sample]	Frequency for the given sample for this condition
cfdist.tabulate()	Tabulate the conditional frequency distribution
cfdist.tabulate(samples, conditions)	Tabulation limited to the specified samples and conditions
cfdist.plot()	Graphical plot of the conditional frequency distribution
cfdist.plot(samples, conditions)	Graphical plot limited to the specified samples and conditions
cfdist1 < cfdist2	Test if samples in cfdist1 occur less frequently than in cfdist2
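
A short sketch of the table in use, with invented (condition, sample) pairs in which the condition plays the role of a genre and the sample is a word. The conditions are sorted explicitly here, on the assumption that the installed NLTK release may return them in insertion order instead of alphabetically.

```python
from nltk import ConditionalFreqDist

# Invented (condition, sample) pairs: (genre, word).
pairs = [
    ("news", "the"), ("news", "said"), ("news", "the"),
    ("romance", "love"), ("romance", "the"), ("romance", "love"),
]

cfdist = ConditionalFreqDist(pairs)

conditions = sorted(cfdist.conditions())  # sorted explicitly, see note above
news_the = cfdist["news"]["the"]          # frequency of "the" under "news"
romance_top = cfdist["romance"].max()     # most frequent sample under "romance"

# Print a table restricted to the given conditions and samples.
cfdist.tabulate(conditions=["news", "romance"], samples=["the", "love", "said"])
```

Each `cfdist[condition]` is itself a frequency distribution, so everything that works on `FreqDist` in section 4 also works on it.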

To be continued
