HanLP: an introduction to natural language processing technology

Source: Internet
Author: User

I have been learning Hadoop for a while now, and along the way I have picked up some understanding of natural language processing (NLP). Plenty of articles on NLP have already been shared online; today I would like to share some material on HanLP.
NLP covers all of the techniques by which computers process natural language. Its goal is to let a computer receive and understand instructions given in natural language, by translating human language into a form the computer can interpret without ambiguity. With big data and artificial intelligence developing rapidly, NLP can do a great deal to help AI advance.

(Figure: the Dakuai DKhadoop integrated development framework)
The HanLP I want to share here is the NLP toolkit I used while learning the Dakuai DKhadoop big data integration platform. It handles NLP tasks efficiently, for example summarizing articles, semantic discrimination, and improving the accuracy and effectiveness of content retrieval.
I had hoped to introduce HanLP through an accessible case study, but I could not think of a good one, so I will simply give a brief introduction starting from HanLP's data structures and its word segmentation.
First, let's look at the data structures in HanLP:
Binary trie (BinTrie): a trie is a prefix-compression structure that stores a large number of strings compactly and provides a get operation faster than a map. The trie in HanLP stores each node's children in an ordered array, so a binary search over that array can give faster lookups than a TreeMap.
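To make the idea concrete, here is a minimal sketch in Python of a trie whose nodes keep their children in a sorted array and find them by binary search. It is only an illustration of the principle; the class and method names (Node, BinTrie, put, get) are my own and are not HanLP's Java API.

```python
from bisect import bisect_left


class Node:
    """Trie node whose children are kept sorted by character."""

    def __init__(self):
        self.keys = []      # sorted child characters
        self.children = []  # child nodes, parallel to self.keys
        self.value = None   # payload if a key ends at this node

    def child(self, ch, create=False):
        i = bisect_left(self.keys, ch)  # binary search among the children
        if i < len(self.keys) and self.keys[i] == ch:
            return self.children[i]
        if not create:
            return None
        node = Node()
        self.keys.insert(i, ch)         # insert in place to keep the order
        self.children.insert(i, node)
        return node


class BinTrie:
    def __init__(self):
        self.root = Node()

    def put(self, key, value):
        node = self.root
        for ch in key:
            node = node.child(ch, create=True)
        node.value = value

    def get(self, key):
        node = self.root
        for ch in key:
            node = node.child(ch)
            if node is None:
                return None
        return node.value


trie = BinTrie()
trie.put("he", 1)
trie.put("hers", 2)
trie.put("his", 3)
print(trie.get("hers"), trie.get("her"))
```

Words that share a prefix share nodes, which is where the compression comes from, and each step of a lookup costs one binary search over that node's children.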
Double-array trie: unlike an ordinary trie, in which each parent node stores references to its children, the double-array trie encodes the parent-child relationship as an addition on character codes plus a check.
For a transition that moves from state s to state t on receiving character c, the conditions to satisfy are:
base[s] + c = t
check[t] = s
For example:
base[一号] + 店 = 一号店
check[一号店] = 一号
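The two conditions can be tried out with a small sketch. The builder below is a deliberately naive one written for illustration, not HanLP's construction algorithm: it stores base and check as Python dicts rather than real arrays, and the character code (code point plus one) and the "\0" end marker are assumptions of mine.

```python
END = "\0"  # hypothetical end-of-word marker


def code(ch):
    # Hypothetical character code: code point + 1, so END maps to 1.
    return ord(ch) + 1


def build(words):
    """Naively assign base/check values for a small word list."""
    root = {}
    for w in words:                      # ordinary trie as nested dicts
        node = root
        for ch in w + END:
            node = node.setdefault(ch, {})
    base, check = {}, {0: -1}            # state 0 is the root
    queue = [(0, root)]
    while queue:
        s, node = queue.pop(0)
        if not node:                     # leaf: no outgoing transitions
            continue
        b = 1
        # Find the smallest base such that every child slot is still free.
        while any(b + code(ch) in check for ch in node):
            b += 1
        base[s] = b
        for ch, child in node.items():
            t = b + code(ch)             # base[s] + c = t
            check[t] = s                 # check[t] = s
            queue.append((t, child))
    return base, check


def contains(base, check, word):
    s = 0
    for ch in word + END:
        t = base.get(s, 0) + code(ch)
        if check.get(t) != s:            # the slot is not owned by s
            return False
        s = t
    return True


base, check = build(["一号", "一号店", "一举"])
print(contains(base, check, "一号店"), contains(base, check, "号店"))
```

A lookup never follows a reference: each step is one addition and one comparison, which is why the structure is fast once it has been built.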
Aho-Corasick (AC) automaton: on top of the trie's prefix compression (the success table), the AC automaton also implements suffix compression (the output table).
When a match fails, the AC automaton jumps to the state most likely to succeed (the fail pointer).
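The three tables mentioned above can be sketched as follows. This is a plain dict-based Aho-Corasick automaton for illustration, not HanLP's double-array version, and the function names are my own.

```python
from collections import deque


def build_automaton(patterns):
    goto = [{}]    # success table: goto[state][char] -> next state
    fail = [0]     # fail pointer of each state
    output = [[]]  # output table: patterns recognized at each state
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        output[s].append(p)
    # Breadth-first pass: states at depth 1 keep fail = 0.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            # Inherit the patterns that end at the fail state (suffixes).
            output[t] = output[t] + output[fail[t]]
    return goto, fail, output


def search(text, automaton):
    goto, fail, output = automaton
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]                  # mismatch: follow the fail pointer
        s = goto[s].get(ch, 0)
        for p in output[s]:
            hits.append((i - len(p) + 1, p))
    return hits


ac = build_automaton(["he", "she", "his", "hers"])
print(search("ushers", ac))
```

The text is scanned once, left to right, and every dictionary word is reported wherever it occurs, including words nested inside other words.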
About HanLP word segmentation
1. Dictionary-based segmentation
Longest-match segmentation against a dictionary, built on the double-array trie or ACDAT (an Aho-Corasick automaton over a double-array trie): all candidate words are looked up in the dictionary, and the longest one is chosen first.

Output: [HanLP/noun, is not/null, especially/adverb, convenient/adjective, ?/null]
2. N-gram segmentation

Bigram statistics are collected from a corpus, and the most probable segmentation of the sentence is chosen according to the transition probabilities, which resolves ambiguity.
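Here is a small sketch of the idea, assuming add-one smoothing and a dynamic-programming search over all ways of cutting the sentence. The counts are invented toy numbers, not taken from any corpus, and HanLP's own implementation may differ in its details.

```python
import math


def bigram_segment(text, unigram, bigram):
    vocab = len(unigram)

    def log_p(prev, word):
        # Add-one smoothed transition probability P(word | prev).
        return math.log((bigram.get((prev, word), 0) + 1)
                        / (unigram.get(prev, 0) + vocab))

    # best[(i, w)] = (score, previous key) for the best segmentation
    # of text[:i] whose last word is w.
    best = {(0, "<s>"): (0.0, None)}
    for i in range(len(text)):
        states = [(k, v) for k, v in best.items() if k[0] == i]
        for (pos, prev), (score, _) in states:
            for j in range(i + 1, len(text) + 1):
                word = text[i:j]
                # Dictionary words, plus single characters as a fallback.
                if word in unigram or j == i + 1:
                    cand = score + log_p(prev, word)
                    key = (j, word)
                    if key not in best or cand > best[key][0]:
                        best[key] = (cand, (pos, prev))
    key = max((k for k in best if k[0] == len(text)),
              key=lambda k: best[k][0])
    words = []
    while key != (0, "<s>"):             # walk the back-pointers
        words.append(key[1])
        key = best[key][1]
    return words[::-1]


# Toy counts, invented for illustration only.
unigram = {"<s>": 10, "商品": 5, "和": 8, "和服": 2, "服务": 6, "务": 1}
bigram = {("<s>", "商品"): 4, ("商品", "和"): 3, ("和", "服务"): 4}
print(bigram_segment("商品和服务", unigram, bigram))
```

Pure longest matching would cut this sentence as 商品 / 和服 / 务, because 和服 is the longer dictionary word; the transition counts are what favor 商品 / 和 / 服务.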
3. HMM2 segmentation

This is a generative, character-based word-formation model that performs sequence labeling with a second-order hidden Markov model.

Also known as the TnT tagger, it is characterized by using lower-order events to smooth higher-order events, compensating for the data sparsity of higher-order models.
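The smoothing step can be sketched as a linear interpolation of unigram, bigram and trigram tag estimates. The B/M/E/S tag set (begin, middle, end, single-character word) is the usual choice for character tagging and is an assumption here; the lambda weights are fixed placeholders, whereas TnT itself estimates them from the training data by deleted interpolation.

```python
from collections import Counter


def train(tag_sequences):
    """Count tag unigrams, bigrams and trigrams."""
    uni, bi, tri = Counter(), Counter(), Counter()
    for tags in tag_sequences:
        for i, t in enumerate(tags):
            uni[t] += 1
            if i >= 1:
                bi[tags[i - 1], t] += 1
            if i >= 2:
                tri[tags[i - 2], tags[i - 1], t] += 1
    return uni, bi, tri


def smoothed(t1, t2, t3, counts, lambdas=(0.2, 0.3, 0.5)):
    """P(t3 | t1, t2) as a weighted mix of lower- and higher-order estimates.

    The lambdas are hypothetical placeholder weights that sum to 1.
    """
    uni, bi, tri = counts
    total = sum(uni.values())
    p1 = uni[t3] / total if total else 0.0                      # P(t3)
    p2 = bi[t2, t3] / uni[t2] if uni[t2] else 0.0               # P(t3 | t2)
    p3 = tri[t1, t2, t3] / bi[t1, t2] if bi[t1, t2] else 0.0    # P(t3 | t1, t2)
    l1, l2, l3 = lambdas
    return l1 * p1 + l2 * p2 + l3 * p3


counts = train([list("BES")])
print(smoothed("B", "E", "S", counts))  # trigram seen in training
print(smoothed("S", "B", "E", counts))  # trigram unseen in training
```

The second call shows the point of the smoothing: the trigram S, B, E never occurs in the training data, yet it still receives a non-zero probability from the bigram and unigram terms.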
4. CRF segmentation

This is a discriminative, character-based word-formation model that performs sequence labeling with a conditional random field (CRF).
Compared with the HMM, the CRF's advantage is that it can exploit more features and handles out-of-vocabulary (OOV) words well; its disadvantages are higher memory use and slower decoding.
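To show what "more features" means in practice, here is a sketch of the decoding side of a linear-chain model: each character is scored with several overlapping context features, and Viterbi search picks the best B/M/E/S tag sequence. Training is omitted entirely; the feature templates and all weights below are hand-set, hypothetical values, not anything produced by HanLP.

```python
TAGS = "BMES"  # begin, middle, end of a word, or single-character word


def features(chars, i):
    prev = chars[i - 1] if i > 0 else "<s>"
    nxt = chars[i + 1] if i + 1 < len(chars) else "</s>"
    return [("cur", chars[i]), ("prev", prev), ("next", nxt),
            ("prev+cur", prev + chars[i])]


def viterbi(chars, emit_w, trans_w):
    if not chars:
        return []

    def emit(i, tag):
        return sum(emit_w.get((f, tag), 0.0) for f in features(chars, i))

    best = [{t: (emit(0, t), None) for t in TAGS}]
    for i in range(1, len(chars)):
        row = {}
        for t in TAGS:
            e = emit(i, t)
            row[t] = max((best[-1][p][0] + trans_w.get((p, t), 0.0) + e, p)
                         for p in TAGS)
        best.append(row)
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(chars) - 1, 0, -1):   # follow the back-pointers
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


def tags_to_words(chars, tags):
    words, cur = [], ""
    for ch, t in zip(chars, tags):
        cur += ch
        if t in "ES":                        # a word ends here
            words.append(cur)
            cur = ""
    if cur:
        words.append(cur)
    return words


# Hand-set, hypothetical weights; a real CRF learns these from data.
bad = ["BB", "BS", "MB", "MS", "EM", "EE", "SM", "SE"]
trans_w = {(p, t): -10.0 for p, t in bad}
emit_w = {(("cur", "商"), "B"): 2.0, (("cur", "品"), "E"): 2.0,
          (("cur", "和"), "S"): 1.0, (("cur", "和"), "B"): 1.0,
          (("prev+cur", "品和"), "S"): 2.0,
          (("cur", "服"), "B"): 1.0, (("cur", "服"), "E"): 1.0,
          (("cur", "务"), "E"): 2.0}

text = "商品和服务"
tags = viterbi(text, emit_w, trans_w)
print(tags, tags_to_words(text, tags))
```

Because the tagger works on characters and their context rather than on a word list, it can propose words that are in no dictionary, which is where the OOV advantage comes from. Storing a weight for every feature and tag pair is also why the memory footprint grows.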
