Natural language processing (NLP) is a major branch of artificial intelligence. This post briefly introduces its basic content as a summary.
Communicating with computers in natural language is a long-cherished goal. It would bring two benefits. First, people would no longer need to learn various computer languages and could simply use the language they already know. Second, it would give us a deeper understanding of the human language faculty and of the mechanisms of intelligence. Achieving this requires the computer both to understand the meaning of what is expressed and to organize words into output of its own, so that a dialogue can be completed. Practical natural language processing systems already exist: natural language interfaces to databases and expert systems, machine translation systems, full-text retrieval systems, and automatic summarization systems. However, what these systems achieve is still very far from the "man-machine dialogue" people hope for; current implementations cover only some basic functions.
The central obstacle to natural language communication between humans and machines is that ambiguity is pervasive at every level of natural language text and dialogue. Resolving these ambiguities takes a great deal of knowledge and reasoning. At a deeper level, we need to work out how the human brain makes fuzzy and logical judgments about language.
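To make the idea of ambiguity concrete, here is a minimal Python sketch of ambiguity at the word level: it enumerates every way an unspaced string can be split into known words. The function name `segmentations` and the toy vocabulary are assumptions made up for this illustration, not part of any real system.

```python
def segmentations(text, vocab):
    """Return every way to split `text` into words drawn from `vocab`."""
    if not text:
        return [[]]  # the empty string has exactly one segmentation: no words
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in vocab:
            # Keep this prefix as a word and segment the remainder recursively.
            for rest in segmentations(text[end:], vocab):
                results.append([head] + rest)
    return results


# Toy vocabulary, chosen only for this illustration (an assumption).
VOCAB = {"the", "theta", "table", "bled", "down", "own", "there"}

for words in segmentations("thetabledownthere", VOCAB):
    print(" ".join(words))
```

Both splits use only valid words, so the vocabulary alone cannot decide between them. Choosing the sensible one takes the extra knowledge and reasoning described above.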
Basic theories of natural language processing: automata, formal logic, statistical machine learning, Chinese linguistics, formal grammar theory
Language resources: corpora, dictionaries
Key technologies: Chinese character encoding, lexical analysis, syntactic analysis, semantic analysis, text generation, speech recognition
Application systems: text classification and clustering, information retrieval and filtering, information extraction, question answering systems, pinyin-to-Chinese-character conversion, machine translation, new information retrieval, etc.
Controversy: there are two directions for getting past the constraints on the field's development: 1. Linguists tend to favor innovating on the basic theory. 2. Engineers tend to favor refining and optimizing existing methods.
Difficulties: 1. Word boundaries. In spoken language there is no deliberate pause between one word and the next; we understand the meaning because our brains divide what we hear into the most plausible combination of words. The same holds for written text in languages such as Chinese, which does not mark word boundaries.
2. Word sense disambiguation. Many words have more than one meaning, and we need to choose the reading that makes the most sense in context.
3. Syntactic ambiguity. The grammar of natural language is often ambiguous: a single sentence can be parsed into several structures with different logical meanings, and we must determine the most appropriate one from context.
4. Flawed or irregular input, for example when dialect turns up, hehe.
5. Speech acts and planning. This is mainly about understanding the intent behind an utterance and then acting on it. If someone asks "Can you help me get a book?", fetching the book is better than just answering "yes". Answering "no" or "it's too far away" is still better than answering "yes" and staying still. Likewise, if a course was not offered last year, then to the question "How many students failed the course last year?" it is better to answer "the course was not offered last year" than "no one failed".
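Difficulty 2 in the list above (choosing among word senses) can be sketched with a simplified version of the Lesk idea: pick the sense whose dictionary gloss shares the most words with the surrounding context. The glosses, the stopword list, and the function name `simplified_lesk` below are all assumptions invented for this illustration. A real system would draw on a proper dictionary and much richer context.

```python
# Small stopword list so that words like "the" do not count as evidence (an assumption).
STOPWORDS = {"the", "a", "an", "of", "at", "or", "and", "that", "from"}


def content_words(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return set(text.lower().split()) - STOPWORDS


def simplified_lesk(word, context, senses):
    """Return the sense label whose gloss overlaps most with `context`.

    `senses` maps each sense label to a short gloss (definition) string.
    Ties go to the sense listed first.
    """
    context_words = content_words(context) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in senses.items():
        overlap = len(context_words & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


# Toy glosses written for this example; not taken from any real dictionary.
BANK_SENSES = {
    "financial": "an institution that accepts deposits and lends money",
    "river": "the sloping land beside a river or lake",
}

print(simplified_lesk("bank", "she deposited money at the bank", BANK_SENSES))
print(simplified_lesk("bank", "they fished from the bank of the river", BANK_SENSES))
```

Word overlap is a crude signal: it works here only because each context happens to share a word with the matching gloss. That fragility is part of why disambiguation counts as a difficulty.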
With this introductory summary done, the next step is to study the milestones of natural language processing technology in detail.
Reference reading: [1] Huang Changning, Zhang Xiaofeng. Three Milestones of Natural Language Processing Technology. Microsoft Research Asia, 2002.5
[2] Baidu Encyclopedia, "Natural Language Processing". http://baike.baidu.com/view/18784.htm