During the summer vacation, I started studying NLP, beginning with Zong Chengqing's "Statistical Natural Language Processing".
1. Language: A language consists of speech sounds, vocabulary, and grammar. Speech and writing are the two basic attributes of a language: speech is the material shell of the language, and writing is the system of written symbols that records it.
2. Phonetics: 1) Articulatory phonetics
2) Acoustic phonetics
3) Auditory phonetics
4) Instrumental phonetics
3. Concept of natural language processing: the techniques for using computers to process natural language information, both written and spoken, which is unique to humans.
Interdisciplinary fields: speech recognition, speech synthesis
Speech applications: 1) Human-machine interaction systems
2) Speech translation
3) Spoken document summarization
4) Spoken document retrieval
4. NLP research topics: 1) machine translation; 2) automatic summarization; 3) information retrieval; 4) document classification; 5) question answering systems; 6) text editing and automatic proofreading; 7) information filtering; 8) language teaching; 9) character recognition;
10) automatic speech recognition (ASR); 11) text-to-speech conversion; 12) speaker recognition, authentication, and verification
5. Levels involved in natural language processing: 1) Morphology
2) Syntax
3) Semantics
4) Pragmatics
6. Difficulties: 1) Ambiguity resolution (disambiguation)
2) Handling unknown language phenomena (such as internet slang, "Martian language", and gaming jargon)
The number of possible analyses of an ambiguous sentence grows exponentially with the number of prepositional phrases; the counts follow the Catalan numbers.
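To get a feel for how fast this grows, here is a minimal sketch that computes Catalan numbers with the closed form C_n = C(2n, n) / (n + 1). The function name `catalan` is my own, and how n maps onto the number of prepositional phrases depends on the sentence construction, so treat the index as illustrative rather than exact.

```python
from math import comb


def catalan(n: int) -> int:
    """Return the n-th Catalan number, C_n = C(2n, n) / (n + 1)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # The division is always exact, so integer division is safe here.
    return comb(2 * n, n) // (n + 1)


if __name__ == "__main__":
    # Print the first few values to show the growth.
    for n in range(1, 9):
        print(n, catalan(n))
```

The sequence starts 1, 1, 2, 5, 14, 42, so even a handful of attachable phrases already gives dozens of candidate structures.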
Types of ambiguity: 1) syntactic structure ambiguity; 2) part-of-speech ambiguity; 3) word-sense ambiguity; 4) semantic ambiguity
7. Basic NLP methods and procedures
1) Corpus collection, which serves as the basis for building statistical models
2) Corpus filtering and processing
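As a rough illustration of how a collected corpus becomes the basis of a statistical model, the sketch below estimates bigram probabilities by maximum likelihood from a toy corpus. Everything here is an assumption for illustration: the three example sentences, the whitespace tokenizer, the `<s>` and `</s>` boundary markers, and the function name `bigram_mle` are mine, not from the book.

```python
from collections import Counter


def bigram_mle(sentences):
    """Estimate P(w2 | w1) by maximum likelihood from raw sentences."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # Simple processing step: lowercase and split on whitespace.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        # Every token except the last one acts as a bigram history.
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return {
        (w1, w2): count / history_counts[w1]
        for (w1, w2), count in bigram_counts.items()
    }


# Hypothetical toy corpus, for illustration only.
TOY_CORPUS = ["I saw the man", "I saw the telescope", "the man saw me"]

if __name__ == "__main__":
    probs = bigram_mle(TOY_CORPUS)
    print(probs[("saw", "the")])
```

A real corpus would need far more data and some smoothing for unseen word pairs; this only shows the counting step.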
8. Keywords to search on Baidu after class: context-free grammar (CFG), hidden Markov model (HMM), noisy channel model, the formalization and computation of semantics, syntactic parsing, resolution of condicate ambiguity, and recognition of unregistered (out-of-vocabulary) words in automatic Chinese word segmentation