For greater flexibility, we change the way we treat grammatical categories such as S, NP and V. Instead of treating them as atomic labels, we decompose them into dictionary-like structures, so that a range of properties can be expressed as features.
9.1 Grammatical Features
We start with a simple example, using dictionaries to store features and their values.
>>> kim = {'CAT': 'NP', 'ORTH': 'Kim', 'REF': 'k'}
>>> chase = {'CAT': 'V', 'ORTH': 'chased', 'REL': 'chase'}
CAT: grammatical category; ORTH: orthography (spelling); REF: the referent (of kim); REL: the relation expressed (by chase). In the context of rule-based grammars, such pairings of features and values are known as feature structures.
You can also add features as needed.
>>> chase['AGT'] = 'sbj'
>>> chase['PAT'] = 'obj'
AGT: the agent role, played here by the subject. PAT: the patient role, played here by the object.
Now suppose we want to process the sentence Kim chased Lee:
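>>> sent = "Kim chased Lee"
>>> tokens = sent.split()
>>> lee = {'CAT': 'NP', 'ORTH': 'Lee', 'REF': 'l'}
>>> def lex2fs(word):
...     # Look up the feature structure whose spelling matches the word
...     for fs in [kim, lee, chase]:
...         if fs['ORTH'] == word:
...             return fs
...
>>> subj, verb, obj = lex2fs(tokens[0]), lex2fs(tokens[1]), lex2fs(tokens[2])
>>> verb['AGT'] = subj['REF']   # the agent is the subject's referent
>>> verb['PAT'] = obj['REF']    # the patient is the object's referent
>>> for k in ['ORTH', 'REL', 'AGT', 'PAT']:
...     print("%-5s => %s" % (k, verb[k]))
...
ORTH  => chased
REL   => chase
AGT   => k
PAT   => l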
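The same approach can be applied to a different verb, such as surprise, although here the subject plays the role of "source" (SRC) and the object plays the role of "experiencer" (EXP):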
>>> surprise = {'CAT': 'V', 'ORTH': 'surprised', 'REL': 'surprise',
...             'SRC': 'sbj', 'EXP': 'obj'}
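The same binding step works for any verb whose role features point at 'sbj' or 'obj'. The following is a minimal pure-Python sketch, assuming lexical entries like those for kim, lee, chase and surprise (with chase carrying AGT='sbj' and PAT='obj'); the helper bind_roles is hypothetical and not part of NLTK. It copies the verb's entry instead of modifying it, so the lexicon stays reusable across sentences.

```python
# Minimal sketch: fill in a verb's role features (those whose value is
# 'sbj' or 'obj') with the REF values of its subject and object.
# bind_roles is a hypothetical helper, not an NLTK function.

kim = {'CAT': 'NP', 'ORTH': 'Kim', 'REF': 'k'}
lee = {'CAT': 'NP', 'ORTH': 'Lee', 'REF': 'l'}
chase = {'CAT': 'V', 'ORTH': 'chased', 'REL': 'chase',
         'AGT': 'sbj', 'PAT': 'obj'}
surprise = {'CAT': 'V', 'ORTH': 'surprised', 'REL': 'surprise',
            'SRC': 'sbj', 'EXP': 'obj'}

LEXICON = [kim, lee, chase, surprise]


def lex2fs(word):
    """Return the lexical feature structure whose ORTH matches word."""
    for fs in LEXICON:
        if fs['ORTH'] == word:
            return fs
    raise KeyError(word)


def bind_roles(sentence):
    """Look up a 'subject verb object' sentence and bind the verb's roles."""
    subj, verb, obj = (lex2fs(tok) for tok in sentence.split())
    # Work on a copy so the lexical entry itself is left untouched.
    result = dict(verb)
    for feat, value in verb.items():
        if value == 'sbj':
            result[feat] = subj['REF']
        elif value == 'obj':
            result[feat] = obj['REF']
    return result


print(bind_roles('Kim chased Lee'))
print(bind_roles('Lee surprised Kim'))
```

Because the roles are found by their 'sbj'/'obj' markers rather than by name, the same function handles AGT/PAT for chase and SRC/EXP for surprise.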
Syntactic Agreement
The morphological properties of a verb co-vary with the syntactic properties of the subject noun phrase. This co-variance is called agreement.
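For example, the dogs run is grammatical, whereas the following is not (the asterisk marks an ungrammatical string):
*the dogs runs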
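One way to handle this is to refine the grammar by splitting each category into singular and plural variants. Below, (7) is a simple grammar and (8) is its refined version. This approach works, but it doubles the number of productions and quickly becomes cumbersome as the grammar grows.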
Basic grammar:
(7)
S -> NP VP
NP -> Det N
VP -> V
Det -> 'this'
N -> 'dog'
V -> 'runs'
Refined grammar:
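(8)
S -> NP_SG VP_SG
S -> NP_PL VP_PL
NP_SG -> Det_SG N_SG
NP_PL -> Det_PL N_PL
VP_SG -> V_SG
VP_PL -> V_PL
Det_SG -> 'this'
Det_PL -> 'these'
N_SG -> 'dog'
N_PL -> 'dogs'
V_SG -> 'runs'
V_PL -> 'run'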
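To check that grammar (8) really enforces agreement, the sketch below enumerates every sentence it generates. It is plain Python with no NLTK dependency; the dictionary encoding of the grammar and the generate function are our own illustration, assuming (8) contains exactly the twelve productions for this/these, dog/dogs and runs/run.

```python
# Minimal sketch: enumerate every sentence generated by the refined
# grammar (8). The dict encoding of the grammar is our own, not NLTK's.
from itertools import product

GRAMMAR = {
    'S': [['NP_SG', 'VP_SG'], ['NP_PL', 'VP_PL']],
    'NP_SG': [['Det_SG', 'N_SG']],
    'NP_PL': [['Det_PL', 'N_PL']],
    'VP_SG': [['V_SG']],
    'VP_PL': [['V_PL']],
    'Det_SG': [['this']],
    'Det_PL': [['these']],
    'N_SG': [['dog']],
    'N_PL': [['dogs']],
    'V_SG': [['runs']],
    'V_PL': [['run']],
}


def generate(symbol):
    """Yield every word sequence (as a list) derivable from symbol."""
    if symbol not in GRAMMAR:          # terminal symbol: a word
        yield [symbol]
        return
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion of each right-hand-side symbol.
        for parts in product(*(list(generate(s)) for s in rhs)):
            yield [word for part in parts for word in part]


sentences = [' '.join(words) for words in generate('S')]
print(sentences)   # ['this dog runs', 'these dogs run']
```

Only the two agreeing sentences come out; strings such as *this dogs run are never generated, at the cost of two copies of every category.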
To avoid this blow-up in the number of productions, we can use attributes and constraints instead.
Using Attributes and Constraints
Det[NUM=sg] -> 'this'
Det[NUM=pl] -> 'these'
N[NUM=sg] -> 'dog'
N[NUM=pl] -> 'dogs'
V[NUM=sg] -> 'runs'
V[NUM=pl] -> 'run'
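We can then use a variable, ?n, to state the agreement constraints in the phrasal rules. Within a production, every occurrence of ?n must take the same value: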
S -> NP[NUM=?n] VP[NUM=?n]
NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]
VP[NUM=?n] -> V[NUM=?n]
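The sketch below checks these three rules for Det N V sentences, assuming the six lexical entries for this/these, dog/dogs and runs/run. Because each rule passes its ?n value up to its parent, all three words end up constrained by one shared value, so the check reduces to binding a single variable. The encoding and the names bind and accepts are hypothetical; a real feature-grammar parser such as NLTK's performs general unification instead.

```python
# Minimal sketch: enforce S -> NP[NUM=?n] VP[NUM=?n],
# NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n] and VP[NUM=?n] -> V[NUM=?n]
# for three-word "Det N V" sentences. Since the rules chain ?n up to S,
# this is modelled as one shared binding. Names are hypothetical.

LEXICON = {
    'this': ('Det', 'sg'), 'these': ('Det', 'pl'),
    'dog': ('N', 'sg'), 'dogs': ('N', 'pl'),
    'runs': ('V', 'sg'), 'run': ('V', 'pl'),
}


def bind(bindings, var, value):
    """Bind var to value; return False if it is already bound differently."""
    if var in bindings and bindings[var] != value:
        return False
    bindings[var] = value
    return True


def accepts(sentence):
    """True if the sentence is Det N V and all three NUM values agree."""
    words = sentence.split()
    if [LEXICON[w][0] for w in words] != ['Det', 'N', 'V']:
        return False
    bindings = {}
    # Each word's NUM must be consistent with the single variable ?n.
    return all(bind(bindings, '?n', LEXICON[w][1]) for w in words)


for s in ['this dog runs', 'these dogs run', 'these dog runs', 'this dogs run']:
    print(s, accepts(s))
```

Six lexical entries and three phrasal rules now do the work that needed twelve productions in (8).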
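However, some determiners, such as the, some and any, combine with both singular and plural nouns. There are two ways to represent this. The second is simpler and clearer than the first, because it leaves NUM underspecified, so the determiner simply agrees in number with whatever noun it combines with.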
First:
Det[NUM=sg] -> 'the' | 'some' | 'any'
Det[NUM=pl] -> 'the' | 'some' | 'any'
Second:
Det[NUM=?n] -> 'the' | 'some' | 'any'
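One way to see why the second form works is to treat the unbound ?n as a variable that unification fills in. In the minimal sketch below, None stands for an unbound NUM value (an encoding chosen for illustration, not NLTK's), so the, some and any each need only one entry; the function names are hypothetical.

```python
# Minimal sketch: an underspecified determiner, Det[NUM=?n], modelled as
# a lexical entry whose NUM is None (an unbound variable). Unifying None
# with a concrete value succeeds and adopts that value.

DETS = {'this': 'sg', 'these': 'pl', 'the': None, 'some': None, 'any': None}
NOUNS = {'dog': 'sg', 'dogs': 'pl'}


def unify_atomic(a, b):
    """Unify two atomic NUM values, treating None as unbound.
    Returns the unified value or raises ValueError on a clash."""
    if a is None:
        return b
    if b is None or a == b:
        return a
    raise ValueError(f'cannot unify {a!r} with {b!r}')


def np_num(det, noun):
    """NUM of the noun phrase 'det noun'; raises ValueError if they disagree."""
    return unify_atomic(DETS[det], NOUNS[noun])


print(np_num('the', 'dog'))     # sg
print(np_num('the', 'dogs'))    # pl
try:
    np_num('these', 'dog')
except ValueError as err:
    print(err)                  # cannot unify 'pl' with 'sg'
```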
The following grammar illustrates most of the ideas introduced so far in this chapter:
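>>> import nltk
>>> nltk.data.show_cfg('grammars/book_grammars/feat0.fcfg')
% start S
S -> NP[NUM=?n] VP[NUM=?n]
NP[NUM=?n] -> N[NUM=?n]
NP[NUM=?n] -> PropN[NUM=?n]
NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]
NP[NUM=pl] -> N[NUM=pl]
VP[TENSE=?t, NUM=?n] -> IV[TENSE=?t, NUM=?n]
VP[TENSE=?t, NUM=?n] -> TV[TENSE=?t, NUM=?n] NP
Det[NUM=sg] -> 'this' | 'every'
Det[NUM=pl] -> 'these' | 'all'
Det -> 'the' | 'some' | 'several'
PropN[NUM=sg] -> 'Kim' | 'Jody'
N[NUM=sg] -> 'dog' | 'girl' | 'car' | 'child'
N[NUM=pl] -> 'dogs' | 'girls' | 'cars' | 'children'
IV[TENSE=pres, NUM=sg] -> 'disappears' | 'walks'
TV[TENSE=pres, NUM=sg] -> 'sees' | 'likes'
IV[TENSE=pres, NUM=pl] -> 'disappear' | 'walk'
TV[TENSE=pres, NUM=pl] -> 'see' | 'like'
IV[TENSE=past] -> 'disappeared' | 'walked'
TV[TENSE=past] -> 'saw' | 'liked'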
The following code parses a sentence with this grammar, tracing the parser's progress:
If the grammar cannot parse the input, trees will be empty; otherwise it will contain one or more parse trees, depending on whether the input is syntactically ambiguous.
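>>> tokens = 'Kim likes children'.split()
>>> from nltk import load_parser
>>> cp = load_parser('grammars/book_grammars/feat0.fcfg', trace=2)
>>> trees = list(cp.parse(tokens))
|.Kim .like.chil.|
|[----]    .    .| PropN[NUM='sg'] -> 'Kim' *
|[----]    .    .| NP[NUM='sg'] -> PropN[NUM='sg'] *
|[---->    .    .| S[] -> NP[NUM=?n] * VP[NUM=?n] {?n: 'sg'}
|.    [----]    .| TV[NUM='sg', TENSE='pres'] -> 'likes' *
|.    [---->    .| VP[NUM=?n, TENSE=?t] -> TV[NUM=?n, TENSE=?t] * NP[] {?n: 'sg', ?t: 'pres'}
|.    .    [----]| N[NUM='pl'] -> 'children' *
|.    .    [----]| NP[NUM='pl'] -> N[NUM='pl'] *
|.    .    [---->| S[] -> NP[NUM=?n] * VP[NUM=?n] {?n: 'pl'}
|.    [---------]| VP[NUM='sg', TENSE='pres'] -> TV[NUM='sg', TENSE='pres'] NP[] *
|[==============]| S[] -> NP[NUM='sg'] VP[NUM='sg'] *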
Finally, we can inspect the resulting parse tree:
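>>> for tree in trees: print(tree)
(S[]
  (NP[NUM='sg'] (PropN[NUM='sg'] Kim))
  (VP[NUM='sg', TENSE='pres']
    (TV[NUM='sg', TENSE='pres'] likes)
    (NP[NUM='pl'] (N[NUM='pl'] children))))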
Terminology
So far we have only seen feature values such as sg and pl. Such simple values are usually called atomic, because they cannot be decomposed into subparts. A special case of atomic values is Boolean values, which only specify whether a property is true or false.
For example, a Boolean feature aux can mark auxiliary verbs such as can:
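V[TENSE=pres, +aux] -> 'can'
This production says that can has the value pres for TENSE and + (true) for aux; a non-auxiliary verb would carry -aux instead.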
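As a rough illustration of the notation, the hypothetical helper below reads a bracketed feature list into a dictionary, mapping +f to True and -f to False. It handles only flat, comma-separated features and is not NLTK's own feature-structure reader.

```python
# Minimal sketch: read the bracketed feature notation used in productions,
# treating +f / -f as shorthand for a Boolean feature f that is
# true / false. parse_features is a hypothetical helper, not NLTK's reader.

def parse_features(text):
    """Turn e.g. 'TENSE=pres, +aux' into {'TENSE': 'pres', 'aux': True}."""
    features = {}
    for item in text.split(','):
        item = item.strip()
        if not item:
            continue
        if item[0] in '+-':                 # Boolean shorthand
            features[item[1:]] = item[0] == '+'
        else:                               # ordinary NAME=value pair
            name, value = item.split('=', 1)
            features[name.strip()] = value.strip()
    return features


print(parse_features('TENSE=pres, +aux'))   # {'TENSE': 'pres', 'aux': True}
print(parse_features('TENSE=pres, -aux'))   # {'TENSE': 'pres', 'aux': False}
```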
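Feature values can also be feature structures themselves. For example, we can group the agreement features (person, number and gender) together as a distinct part of a category, stored as the value of AGR. In this case, we say that AGR has a complex value.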
The following attribute-value matrix (AVM) shows such a structure:
[POS = N              ]
[                     ]
[AGR = [PER = 3    ]  ]
[      [NUM = pl   ]  ]
[      [GND = fem  ]  ]
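A nested dictionary captures this AVM directly, and unification of complex values can then be defined recursively: structures merge feature by feature, while atomic values must match. The sketch below is a simplified illustration that ignores variables and structure sharing (re-entrancy); it is not NLTK's FeatStruct.unify, and the function name unify is our own.

```python
# Minimal sketch: the AVM as a nested dict, plus recursive unification
# of feature structures with complex values. Variables and re-entrancy
# are not handled; this is not NLTK's implementation.

avm = {'POS': 'N', 'AGR': {'PER': 3, 'NUM': 'pl', 'GND': 'fem'}}


def unify(fs1, fs2):
    """Return the unification of fs1 and fs2, or None if they clash."""
    if isinstance(fs1, dict) and isinstance(fs2, dict):
        result = dict(fs1)
        for feat, value in fs2.items():
            if feat in result:
                merged = unify(result[feat], value)
                if merged is None:      # clash somewhere below this feature
                    return None
                result[feat] = merged
            else:                       # feature only in fs2: just add it
                result[feat] = value
        return result
    # Atomic values (or an atom against a structure) unify only if equal.
    return fs1 if fs1 == fs2 else None


print(avm['AGR']['NUM'])                          # pl
print(unify(avm, {'AGR': {'PER': 3}}) == avm)     # True
print(unify(avm, {'AGR': {'NUM': 'sg'}}))         # None
```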
With complex-valued features, we can rewrite the grammar as follows:
S -> NP[AGR=?n] VP[AGR=?n]
NP[AGR=?n] -> PropN[AGR=?n]
VP[TENSE=?t, AGR=?n] -> Cop[TENSE=?t, AGR=?n] Adj
Cop[TENSE=pres, AGR=[NUM=sg, PER=3]] -> 'is'
PropN[AGR=[NUM=sg, PER=3]] -> 'Kim'
Adj -> 'happy'
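To see how the complex AGR value enforces agreement, the sketch below checks S -> NP[AGR=?n] VP[AGR=?n] for PropN Cop Adj sentences by unifying the AGR dictionaries passed up from PropN and Cop. The entries for Kim, is and happy are assumed to follow the grammar above; the plural copula are is a hypothetical entry added only to show a clash, and the function names are our own.

```python
# Minimal sketch: check S -> NP[AGR=?n] VP[AGR=?n] for "PropN Cop Adj"
# sentences, where AGR holds a complex value (a dict of NUM/PER).
# The entry for 'are' is hypothetical, added only to show a clash.

LEXICON = {
    'Kim': ('PropN', {'NUM': 'sg', 'PER': 3}),
    'is': ('Cop', {'NUM': 'sg', 'PER': 3}),
    'are': ('Cop', {'NUM': 'pl'}),          # hypothetical entry
    'happy': ('Adj', None),
}


def unify_agr(agr1, agr2):
    """Unify two flat AGR structures; return None on any clash."""
    result = dict(agr1)
    for feat, value in agr2.items():
        # setdefault adds missing features and returns any existing value.
        if result.setdefault(feat, value) != value:
            return None
    return result


def sentence_agr(sentence):
    """Return the shared AGR value bound to ?n, or None if there is none."""
    words = sentence.split()
    if [LEXICON[w][0] for w in words] != ['PropN', 'Cop', 'Adj']:
        return None
    np_agr = LEXICON[words[0]][1]   # NP[AGR=?n] -> PropN[AGR=?n]
    vp_agr = LEXICON[words[1]][1]   # VP gets AGR=?n from Cop[AGR=?n]
    return unify_agr(np_agr, vp_agr)


print(sentence_agr('Kim is happy'))    # {'NUM': 'sg', 'PER': 3}
print(sentence_agr('Kim are happy'))   # None
```

Grouping person and number under AGR means the phrasal rules pass one variable around instead of one per agreement feature.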