Preface:
0.1 Differences between natural and artificial languages:
(1) Natural language is full of ambiguity, whereas the ambiguity of an artificial language can be controlled.
(2) The structure of natural language is complex and diverse, while the structure of an artificial language is relatively simple.
(3) The semantic expression of natural language is highly variable, and so far there is no simple, universal way to describe it, while the semantics of an artificial language can be defined directly by its designers.
(4) The structure and semantics of natural language are intricately intertwined, and there is generally no one-to-one isomorphic relationship between them. In an artificial language, structure and semantics can often be handled separately, and there is a neat one-to-one isomorphic relationship between its structure and its semantics.
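Point (4) can be made concrete with a small sketch. It assumes Python arithmetic expressions as the artificial language (my choice for illustration, not an example from the book) and uses the standard `ast` module. The expression has exactly one parse tree, and each kind of syntactic node maps to exactly one semantic operation, so the meaning is computed by following the structure.

```python
import ast
import operator

# Each syntactic operator node corresponds to exactly one semantic operation.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

def evaluate(node):
    """Compute the meaning of an expression by walking its syntax tree."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.BinOp):
        return OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
    raise ValueError("unsupported syntax in this sketch")

# The grammar fixes a single structure: (1 - 2) - 3, never 1 - (2 - 3).
tree = ast.parse("1 - 2 - 3", mode="eval")
result = evaluate(tree)
print(result)
```

A natural-language sentence such as "I saw the man with the telescope" has no such guarantee: it admits more than one structure, and each structure carries a different meaning.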
These unique properties make natural language processing a major challenge in the field of artificial intelligence.
0.2 The conceptual difference between natural language processing and computational linguistics:
The term "natural language processing" is mainly used to describe methods, while the term "computational linguistics" is mainly used to describe theory.
0.3 Current development trends in natural language processing:
(1) With the development of corpora and corpus linguistics, processing large-scale real text has become the main strategic goal of natural language processing, and probabilistic, data-driven methods have become almost the standard approach.
(2) Machine learning is increasingly used in natural language processing to acquire linguistic knowledge.
(3) Statistical and mathematical methods are receiving more and more attention.
(4) More and more emphasis is placed on the role of the lexicon in natural language processing.
(5) Multilingual online natural language processing technology is developing rapidly. As the network has grown, the Internet has gradually become a multilingual world, and problems such as machine translation, information retrieval, and information extraction on the Internet have become more urgent.
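As a minimal sketch of what "probabilistic and data-driven" means in trend (1), the following estimates bigram probabilities from counts. The three-sentence corpus and the names `history_counts`, `bigram_counts`, and `bigram_prob` are invented for illustration; real systems work on large-scale text and add smoothing, which is omitted here.

```python
from collections import Counter

# Toy corpus, invented for illustration only.
corpus = [
    "the cat sat".split(),
    "the cat ran".split(),
    "the dog sat".split(),
]

history_counts = Counter()  # how often a word appears as the preceding word
bigram_counts = Counter()   # how often a (previous word, word) pair appears

for sentence in corpus:
    tokens = ["<s>"] + sentence  # "<s>" marks the start of a sentence
    for prev, word in zip(tokens, tokens[1:]):
        history_counts[prev] += 1
        bigram_counts[(prev, word)] += 1

def bigram_prob(prev, word):
    """Maximum-likelihood estimate of P(word | prev), without smoothing."""
    if history_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / history_counts[prev]

print(bigram_prob("the", "cat"))
```

No linguistic rule is written by hand here: the knowledge that "cat" is likely after "the" comes entirely from the counted text.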
On page 18 of the book, Professor Feng cites several books introducing the principles of computational linguistics, all of which are his own works.
Chapter 1.
1.1 Summary of formal models in natural language processing (by Professor Feng Zhiwei)
(1) Formal models based on phrase structure grammar: mainly Chomsky's phrase structure grammar, recursive transition networks and augmented transition networks, bottom-up and top-down parsing, the general syntactic processor and chart parsing, the Earley algorithm, left-corner parsing, the CKY algorithm, the Tomita algorithm, Chomsky's government and binding theory and Minimalist Program, A. Joshi's tree-adjoining grammar, etc.
(2) Formal models based on unification: R. M. Kaplan's lexical functional grammar, Martin Kay's functional unification grammar, G. Gazdar's generalized phrase structure grammar, Shieber's PATR, C. Pollard's head-driven phrase structure grammar, F. Pereira's definite clause grammar, etc.
(3) Formal models based on dependency and valence: mainly L. Tesnière's dependency grammar, the valence grammar of German scholars, Hudson's word grammar, etc.
(4) Formal models based on case grammar: C. J. Fillmore's case grammar and FrameNet.
(5) Formal models based on lexicalism: mainly M. Gross's lexicon-grammar, Sleator and Temperley's link grammar, lexical semantics, WordNet, etc.
(6) Formal models based on probability and statistics: n-gram grammar, hidden Markov models, maximum entropy models, conditional random fields, Charniak's probabilistic context-free grammar and lexicalized probabilistic context-free grammar, Bayes' formula, dynamic programming algorithms, the noisy channel model, the minimum edit distance algorithm, decision tree models, weighted automata, the Viterbi algorithm, the forward algorithm, etc.
(7) Formal models for automatic semantic processing: mainly semantic analysis methods, semantic field theory, semantic network theory, Montague grammar, Y. A. Wilks's preference semantics, R. C. Schank's conceptual dependency theory, Mel'čuk's Meaning-Text Theory, etc.
(8) Formal models for automatic pragmatic processing: mainly Mann and Thompson's rhetorical structure theory, common-sense reasoning techniques for text coherence, etc.
1.2 Four influential logic grammars
(1) Definite clause grammar (DCG)
(2) Extraposition grammar (XG)
(3) Modifier structure grammar (MSG)
(4) Constraint logic grammar (PLG)
1.3 Lexical semantics *** (related to current work)
Lexical semantics is a product of the combination of modern semantics and modern lexicology. Its object of study is the meaning of words in language. It originates in linguistics and is closely related to areas of artificial intelligence and cognitive science such as the Semantic Web, ontologies, lexicography, and knowledge representation.
1.4 The important role of natural language processing in social progress
This section mainly introduces several applications of natural language processing, such as automatic generation of weather forecasts, automated essay scoring, and voice-based geographic navigation.
1.5 Characteristics of linguistic signs
Building on Saussure's Course in General Linguistics, Professor Feng summarizes seven characteristics of linguistic signs: hierarchy, non-unitary nature, discreteness, recursiveness, randomness, redundancy, and fuzziness.
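Two of these characteristics, hierarchy and recursiveness, can be illustrated with a toy phrase structure rule. The sketch below is only an illustration: the rule, the vocabulary, and the names `noun_phrase` and `leaves` are invented here and do not come from the book. A noun phrase may contain a relative clause that itself contains a noun phrase, so one finite rule yields phrases of unbounded length, and the result is a nested tree rather than a flat string.

```python
def noun_phrase(depth):
    """Build a tree for the hypothetical rule
    NP -> 'the cat' | 'the cat' RelClause,  RelClause -> 'that saw' NP.
    `depth` is the number of times the rule re-enters NP."""
    if depth == 0:
        return ("NP", "the", "cat")
    return ("NP", "the", "cat", ("RelClause", "that", "saw", noun_phrase(depth - 1)))

def leaves(tree):
    """Flatten a tree into its words, skipping the category label at index 0."""
    words = []
    for child in tree[1:]:
        if isinstance(child, tuple):
            words.extend(leaves(child))  # descend one level in the hierarchy
        else:
            words.append(child)
    return words

sentence = " ".join(leaves(noun_phrase(2)))
print(sentence)
```

Increasing `depth` never requires a new rule, which is the sense in which a finite set of rules can describe an unbounded set of expressions.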