How NLPIR Implements Big Data Semantic Analysis


Semantic analysis refers to translating a given natural language text (including discourse and individual sentences) into a formal representation that reflects its meaning; that is, converting the natural language humans understand into a formal language computers can process, so that the two can communicate. It is oriented toward the whole sentence: it captures not only the semantic relations between the main predicate and its arguments, but also the semantic information carried by non-predicate elements, such as quantity, attribute, and frequency.

Semantic analysis is the bottleneck that keeps natural language processing from reaching deep applications. At the concept and relation level there are two main approaches: statistics-based feature vector extraction, and semantic similarity computation based on semantic dictionaries (WordNet, HowNet, etc.). Both fall short in concrete applications: the former, because its statistical model captures relations only at coarse granularity, suits paragraph-, document-, or multi-document-level analysis rather than sentence- and word-level tasks; the latter has difficulty handling the relations between entity concepts.
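
As an illustration of the dictionary-based similarity approach, the sketch below computes word similarity from the WordNet hypernym hierarchy using the open-source NLTK toolkit. NLTK and path similarity are stand-ins chosen for availability; they are not mentioned in the article, and NLPIR's own similarity computation may differ.

    # Dictionary-based semantic similarity over WordNet (illustrative stand-in).
    # Requires: pip install nltk, then nltk.download("wordnet") once.
    from nltk.corpus import wordnet as wn

    def max_path_similarity(word_a, word_b):
        """Best path similarity over all sense pairs of the two words."""
        best = 0.0
        for s1 in wn.synsets(word_a):
            for s2 in wn.synsets(word_b):
                sim = s1.path_similarity(s2)  # in (0, 1]; 1.0 for the same synset
                if sim is not None and sim > best:
                    best = sim
        return best

    print(max_path_similarity("car", "automobile"))  # 1.0: shared synset
    print(max_path_similarity("car", "banana"))      # much smaller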

The NLPIR text search and mining system targets the needs of Internet content processing. It fuses natural language understanding, web search, and text mining technology, and provides a basic toolset for secondary development on top of these technologies. It offers a visual display of the middleware's processing results and can also be used as a processing tool for small-scale data.

First, Chinese word segmentation

1. The word segmentation method based on string matching. According to different scanning strategies (e.g., forward or reverse maximum matching), this method looks words up in a dictionary to segment the text (a minimal sketch appears after this list).

2. The full segmentation method. It first enumerates all candidate words that match the dictionary, then uses a statistical language model to decide the optimal segmentation.

3. The character-based tagging method. Segmentation is treated as a classification problem over individual characters, that is, as a sequence labeling problem in natural language processing: each character is tagged with its position in a word (e.g., begin, middle, end, or single-character word).

4. Dictionary- and rule-based segmentation

During segmentation, the string to be segmented is matched against entries in the dictionary; whenever a match succeeds, the matched substring is cut out as a word.

5. Statistical learning based on large-scale corpora

This kind of method segments Chinese strings using probability statistics obtained from large-scale corpora. It generally requires neither manually maintained rules nor complex linguistic knowledge, scales well, and is the more common practice among current segmentation algorithms.

6. Segmentation combining rules and statistics

Most current word segmentation algorithms combine rules with statistics, which reduces the statistical model's dependence on the corpus, makes full use of existing lexical information, and compensates for the shortcomings of purely rule-based methods.
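
To make the dictionary-based methods above concrete (items 1 and 4), here is a minimal forward maximum matching segmenter in Python. The toy dictionary is an illustrative assumption, not NLPIR's lexicon.

    # Forward maximum matching (FMM): at each position, greedily take the
    # longest dictionary word starting there; fall back to a single character.
    DICT = {"研究", "研究生", "生命", "命", "的", "起源"}
    MAX_LEN = max(len(w) for w in DICT)

    def fmm_segment(text):
        words, i = [], 0
        while i < len(text):
            for size in range(min(MAX_LEN, len(text) - i), 0, -1):
                cand = text[i:i + size]
                if size == 1 or cand in DICT:  # single character is the fallback
                    words.append(cand)
                    i += size
                    break
        return words

    print(fmm_segment("研究生命的起源"))
    # -> ['研究生', '命', '的', '起源']

The example string is the classic ambiguity case: reverse maximum matching would instead yield ['研究', '生命', '的', '起源'], which is why the full segmentation method (item 2) enumerates both paths and lets a language model choose.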

Second, lexical tagging

Besides word segmentation, a text string also needs part-of-speech tagging, named entity recognition, new word discovery, and so on. There are usually two schemes: one segments first and then tags parts of speech; the other uses a single model to complete these tasks jointly.
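
As a concrete illustration of the joint scheme, the sketch below uses the open-source jieba library, which produces words and their part-of-speech tags in one pass. jieba is an assumption chosen only because its API is freely available; NLPIR exposes comparable functionality through its own interfaces.

    # Joint word segmentation + part-of-speech tagging with jieba
    # (open-source stand-in, not NLPIR's API). Requires: pip install jieba
    import jieba.posseg as pseg

    for w in pseg.cut("我爱北京天安门"):  # "I love Beijing Tiananmen"
        print(w.word, w.flag)
    # Typical output: 我 r / 爱 v / 北京 ns / 天安门 ns
    # The "ns" tag marks place names, so named entities fall out of the same pass.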

Third, the language model

A language model is a probabilistic model for computing the probability that a sentence is generated, i.e., P(w_1, w_2, w_3, ..., w_m), where m is the total number of words in the sentence.
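
Written out with the chain rule, this probability factors as follows; the second step is the n-gram approximation discussed next, which truncates each word's history to the preceding n-1 words:

    P(w_1, \ldots, w_m) = \prod_{i=1}^{m} P(w_i \mid w_1, \ldots, w_{i-1}) \approx \prod_{i=1}^{m} P(w_i \mid w_{i-n+1}, \ldots, w_{i-1})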

The n-gram language model is simple and effective, but it considers only the positions of the preceding words, ignoring similarity between words in both form and meaning, and it suffers from data sparsity. More expressive language models were therefore proposed later.
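
A minimal counting implementation makes both the model and its sparsity problem visible. The tiny corpus and the add-one smoothing below are illustrative assumptions, not anything the article prescribes; the smoothing exists precisely to work around the sparse-data problem just mentioned.

    # Bigram language model from raw counts, with add-one (Laplace) smoothing.
    from collections import Counter

    corpus = [["<s>", "the", "cat", "sat", "</s>"],
              ["<s>", "the", "dog", "sat", "</s>"]]

    unigrams = Counter(w for sent in corpus for w in sent)
    bigrams = Counter((s[i], s[i + 1]) for s in corpus for i in range(len(s) - 1))
    V = len(unigrams)  # vocabulary size

    def p_bigram(prev, word):
        """P(word | prev); add-one smoothing gives unseen pairs nonzero mass."""
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

    def p_sentence(words):
        """P(w_1..w_m) approximated as a product of bigram probabilities."""
        p = 1.0
        for prev, word in zip(words, words[1:]):
            p *= p_bigram(prev, word)
        return p

    print(p_sentence(["<s>", "the", "cat", "sat", "</s>"]))  # seen sentence
    print(p_sentence(["<s>", "the", "cat", "ran", "</s>"]))  # unseen bigrams, still > 0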

The neural network language model builds on the n-gram idea: each context word w_{m-n+1}, w_{m-n+2}, ..., w_{m-1} is first mapped into a word vector space; the word vectors are then concatenated into one larger vector that serves as the neural network's input, and the output is P(w_m | context).
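
The sketch below is a minimal feed-forward neural network language model of the kind the paragraph describes (embed the n-1 context words, concatenate, hidden layer, softmax over the vocabulary). PyTorch and all the layer sizes are assumptions for illustration, not the article's specification.

    # Minimal feed-forward neural network language model (Bengio-style sketch).
    # Requires: pip install torch
    import torch
    import torch.nn as nn

    class NNLM(nn.Module):
        def __init__(self, vocab_size, embed_dim=32, context_size=3, hidden_dim=64):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)    # word -> vector
            self.hidden = nn.Linear(context_size * embed_dim, hidden_dim)
            self.out = nn.Linear(hidden_dim, vocab_size)        # scores for w_m

        def forward(self, context_ids):
            # context_ids: (batch, context_size) indices of w_{m-n+1} .. w_{m-1}
            e = self.embed(context_ids)            # (batch, context, embed_dim)
            x = e.view(e.size(0), -1)              # concatenate the word vectors
            h = torch.tanh(self.hidden(x))
            return torch.log_softmax(self.out(h), dim=-1)  # log P(w_m | context)

    model = NNLM(vocab_size=1000)
    context = torch.randint(0, 1000, (4, 3))       # a batch of 4 contexts
    print(model(context).shape)                    # torch.Size([4, 1000])

Because similar words land near each other in the shared vector space, they receive similar probabilities, which addresses the similarity and sparsity weaknesses noted for n-grams.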

