Syntax parsing
- How a syntax tree is stored and expressed:
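1) The tree is written in bracketed form, where each node is (label child1 child2 ...); for example, the word "Seattle" appears as (NP (N Seattle)), followed by the closing brackets of its enclosing nodes.
2) S stands for sentence.
3) NP, VP and PP stand for noun phrase, verb phrase and prepositional phrase.
4) N, V and P stand for noun, verb and preposition, respectively.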
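The bracketed notation can be read back into a tree with a small recursive parser. The sketch below is only an illustration: it stores a node as a Python tuple (label, child1, child2, ...) and a word as a plain string. The example sentence "Boeing is in Seattle" and the helper names parse_bracketed and leaves are assumptions chosen for the demo.

```python
from typing import List, Tuple, Union

# A node is (label, child1, child2, ...); a leaf word is a plain string.
Tree = Union[str, Tuple]


def tokenize(s: str) -> List[str]:
    """Split a bracketed string into '(', ')' and symbol tokens."""
    return s.replace("(", " ( ").replace(")", " ) ").split()


def parse_bracketed(s: str) -> Tree:
    tokens = tokenize(s)
    pos = 0

    def parse_node() -> Tree:
        nonlocal pos
        if tokens[pos] != "(":
            word = tokens[pos]  # a leaf (terminal) word
            pos += 1
            return word
        pos += 1  # consume '('
        label = tokens[pos]  # node label such as S, NP, VP, N
        pos += 1
        children = []
        while tokens[pos] != ")":
            children.append(parse_node())
        pos += 1  # consume ')'
        return (label, *children)

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree


def leaves(tree: Tree) -> List[str]:
    """Collect the words at the leaves, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in leaves(child)]


example = "(S (NP (N Boeing)) (VP (V is) (PP (P in) (NP (N Seattle)))))"
tree = parse_bracketed(example)
print(tree[0])       # S
print(leaves(tree))  # ['Boeing', 'is', 'in', 'Seattle']
```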
- Syntax parsing algorithm:
To represent the syntax of a sentence, define the following sets and rules:
1) N is the set of non-terminal (non-leaf) symbols, e.g. {S, NP, VP, N, ...}.
2) Σ is the set of terminal (leaf) symbols, i.e. the words, e.g. {Boeing, is, ...}.
3) R is the set of rules; each rule has the form X → Y1 Y2 ... Yn, where X ∈ N and each Yi ∈ (N ∪ Σ).
4) S ∈ N is the start symbol, the label at the root of every syntax tree.
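The syntax tree above fits this definition: its internal nodes are labeled with symbols from N, its leaves are words from Σ, and its root is S.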
This is called a context-free grammar (CFG); from these definitions one can derive the syntactic structure of a sentence.
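As a minimal sketch, the four components N, Σ, R and S can be written down directly as Python sets, and a tree can be checked against the grammar by testing that every internal node matches some rule X → Y1 ... Yn in R. The toy rules, the tuple encoding of trees and the function names are hypothetical, chosen only for illustration.

```python
from typing import Set, Tuple, Union

Tree = Union[str, Tuple]            # (label, child1, ...) or a leaf word
Rule = Tuple[str, Tuple[str, ...]]  # (X, (Y1, ..., Yn)) means X -> Y1 ... Yn

# Hypothetical toy grammar G = (N, Sigma, R, S).
N: Set[str] = {"S", "NP", "VP", "PP", "N", "V", "P"}
SIGMA: Set[str] = {"Boeing", "is", "in", "Seattle"}
START = "S"
R: Set[Rule] = {
    ("S", ("NP", "VP")),
    ("VP", ("V", "PP")),
    ("PP", ("P", "NP")),
    ("NP", ("N",)),
    ("N", ("Boeing",)),
    ("N", ("Seattle",)),
    ("V", ("is",)),
    ("P", ("in",)),
}


def rules_are_well_formed(rules, nonterminals, terminals) -> bool:
    """Each rule X -> Y1..Yn needs X in N and every Yi in N ∪ Sigma."""
    return all(
        x in nonterminals and all(y in nonterminals | terminals for y in ys)
        for x, ys in rules
    )


def root(t: Tree) -> str:
    return t if isinstance(t, str) else t[0]


def licensed(t: Tree, rules) -> bool:
    """True if every internal node of t matches some rule in R."""
    if isinstance(t, str):
        return True  # a leaf word needs no rule of its own
    rule = (t[0], tuple(root(c) for c in t[1:]))
    return rule in rules and all(licensed(c, rules) for c in t[1:])


tree = ("S",
        ("NP", ("N", "Boeing")),
        ("VP", ("V", "is"), ("PP", ("P", "in"), ("NP", ("N", "Seattle")))))
print(rules_are_well_formed(R, N, SIGMA))
print(root(tree) == START and licensed(tree, R))
```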
But this definition has a problem: ambiguity. Some words have more than one part of speech, and the rules can combine in more than one way; for example, in a phrase of the form NP PP PP, it is unknown whether the second PP modifies the first PP or the NP.
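The remedy is a probabilistic context-free grammar (PCFG), which gives each rule a probability, with the probabilities of all rules sharing the same left-hand side summing to 1. The probability of a syntax tree is the product of the probabilities of the rules it uses, and the tree with the highest probability is taken as the best parse.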
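The scoring rule can be sketched as follows, assuming a hypothetical toy grammar whose rule probabilities are made up for the example (for each left-hand side they sum to 1). The two trees are the two readings of the phrase "saw dogs in parks", where the PP "in parks" attaches either to the verb or to the noun; each tree is scored as the product of its rule probabilities.

```python
from typing import Dict, Tuple, Union

Tree = Union[str, Tuple]

# Hypothetical toy PCFG: q[(X, (Y1, ..., Yn))] = P(X -> Y1 ... Yn | X).
q: Dict[Tuple[str, Tuple[str, ...]], float] = {
    ("VP", ("V", "NP")): 0.6,
    ("VP", ("V", "NP", "PP")): 0.4,
    ("NP", ("NP", "PP")): 0.2,
    ("NP", ("N",)): 0.8,
    ("PP", ("P", "NP")): 1.0,
    ("V", ("saw",)): 1.0,
    ("N", ("dogs",)): 0.5,
    ("N", ("parks",)): 0.5,
    ("P", ("in",)): 1.0,
}


def root(t: Tree) -> str:
    return t if isinstance(t, str) else t[0]


def tree_prob(t: Tree) -> float:
    """Product of the probabilities of all rules used in the tree."""
    if isinstance(t, str):
        return 1.0  # leaf word: no rule applied here
    rule = (t[0], tuple(root(c) for c in t[1:]))
    p = q.get(rule, 0.0)  # a rule missing from the grammar has probability 0
    for child in t[1:]:
        p *= tree_prob(child)
    return p


# Two parses of "saw dogs in parks": does "in parks" modify the verb or the noun?
verb_attach = ("VP", ("V", "saw"), ("NP", ("N", "dogs")),
               ("PP", ("P", "in"), ("NP", ("N", "parks"))))
noun_attach = ("VP", ("V", "saw"),
               ("NP", ("NP", ("N", "dogs")), ("PP", ("P", "in"), ("NP", ("N", "parks")))))

for name, t in [("verb_attach", verb_attach), ("noun_attach", noun_attach)]:
    print(name, round(tree_prob(t), 4))
best = max([verb_attach, noun_attach], key=tree_prob)  # the highest-probability parse
```

Under these assumed numbers the verb attachment scores 0.4 × 0.8 × 0.5 × 0.8 × 0.5 = 0.064 against 0.6 × 0.2 × 0.8 × 0.5 × 0.8 × 0.5 = 0.0192 for the noun attachment, so it is chosen. In practice parsers usually add log probabilities instead of multiplying raw ones, to avoid numeric underflow on long sentences.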
- Parsing proceeds in two stages: training and recognition.
- Training stage: from a large annotated corpus, usually a treebank such as the Penn Treebank in which every sentence already has its syntax tree, extract the grammar rules and estimate each rule's probability by counting how often it is used.
- Recognition (prediction) stage: for each sentence to be parsed, use the parameters of the trained model to produce its most probable syntax tree, either by brute-force enumeration of all candidate trees or, far more efficiently, by dynamic programming.
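Both stages can be sketched end to end. Training uses the maximum-likelihood estimate q(X → β) = count(X → β) / count(X), i.e. how often a rule is used divided by how often its left-hand side appears. Recognition uses the CKY algorithm, a dynamic program that records, for every span of the sentence, the best-scoring way each non-terminal can cover it. Assumptions: the three-tree treebank is a made-up stand-in for a corpus like the Penn Treebank, and its trees are already in Chomsky normal form (binary rules X → Y Z plus lexical rules X → word), which CKY requires; real treebank trees would need to be binarized first. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple, Union

Tree = Union[str, Tuple]
Rule = Tuple[str, Tuple[str, ...]]


def count_rules(t: Tree, counts: Dict[Rule, int]) -> None:
    """Add one count for every rule X -> Y1 ... Yn used inside tree t."""
    if isinstance(t, str):
        return  # a leaf word uses no rule
    rhs = tuple(c if isinstance(c, str) else c[0] for c in t[1:])
    counts[(t[0], rhs)] += 1
    for child in t[1:]:
        count_rules(child, counts)


def train_pcfg(treebank: List[Tree]) -> Dict[Rule, float]:
    """Maximum-likelihood estimate: count(X -> beta) / count(X)."""
    rule_counts: Dict[Rule, int] = defaultdict(int)
    for t in treebank:
        count_rules(t, rule_counts)
    lhs_counts: Dict[str, int] = defaultdict(int)
    for (lhs, _), c in rule_counts.items():
        lhs_counts[lhs] += c
    return {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}


def cky(words: List[str], q: Dict[Rule, float], start: str = "S") -> Tuple[Optional[Tree], float]:
    """Viterbi CKY for a PCFG in Chomsky normal form; returns (best tree, prob)."""
    n = len(words)
    # best[i][j][X]: highest probability of X deriving words[i:j]; back records how.
    best = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    back = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    lexical = [(x, rhs[0], p) for (x, rhs), p in q.items() if len(rhs) == 1]
    binary = [(x, rhs[0], rhs[1], p) for (x, rhs), p in q.items() if len(rhs) == 2]
    for i, w in enumerate(words):  # spans of length 1: lexical rules X -> w
        for x, word, p in lexical:
            if word == w and p > best[i][i + 1].get(x, 0.0):
                best[i][i + 1][x] = p
                back[i][i + 1][x] = w
    for length in range(2, n + 1):  # longer spans, shortest first
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point: words[i:k] + words[k:j]
                for x, y, z, p in binary:
                    if y in best[i][k] and z in best[k][j]:
                        cand = p * best[i][k][y] * best[k][j][z]
                        if cand > best[i][j].get(x, 0.0):
                            best[i][j][x] = cand
                            back[i][j][x] = (k, y, z)
    if start not in best[0][n]:
        return None, 0.0  # no parse rooted at the start symbol

    def build(i: int, j: int, x: str) -> Tree:
        bp = back[i][j][x]
        if isinstance(bp, str):
            return (x, bp)  # lexical rule X -> word
        k, y, z = bp
        return (x, build(i, k, y), build(k, j, z))

    return build(0, n, start), best[0][n][start]


# Hypothetical toy treebank, already in Chomsky normal form.
treebank = [
    ("S", ("NP", "people"), ("VP", ("V", "saw"), ("NP", "dogs"))),
    ("S", ("NP", "people"),
     ("VP", ("VP", ("V", "saw"), ("NP", "dogs")), ("PP", ("P", "in"), ("NP", "parks")))),
    ("S", ("NP", "dogs"),
     ("VP", ("V", "saw"), ("NP", ("NP", "people"), ("PP", ("P", "in"), ("NP", "parks"))))),
]
q = train_pcfg(treebank)
tree, prob = cky("people saw dogs in parks".split(), q)
print(tree)
print(prob)
```

With this toy treebank the parser attaches "in parks" to the verb phrase (probability 1/216) rather than to "dogs" (1/486), purely because of the relative frequencies of the rules in the training trees.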