Chomsky divides grammars into four types: Type 0, Type 1, Type 2, and Type 3. Mastering the concepts of these grammar types is essential, as they are a very important exam point. Most textbooks give only terse definitions of these grammars, and the concepts are abstract, so many students do not truly understand them. Below I explain each concept with examples.
Type 0 Grammar
Let G = (VN, VT, P, S). If every production α → β satisfies α ∈ (VN ∪ VT)* with α containing at least one non-terminal, and β ∈ (VN ∪ VT)*, then G is a Type 0 grammar. Type 0 grammars are also called phrase-structure grammars. An important theoretical result is that the generative power of Type 0 grammars is equivalent to that of Turing machines. In other words, every Type 0 language is recursively enumerable, and conversely every recursively enumerable set is a Type 0 language. Type 0 grammar is the least restrictive of the four types, so any grammar appearing in exam questions is at least a Type 0 grammar.
Type 1 Grammar
A Type 1 grammar is also called a context-sensitive grammar; it corresponds to a linear bounded automaton. On top of the Type 0 requirements, every production α → β must satisfy |β| ≥ |α|, where |β| denotes the length of β.
Note: although |β| ≥ |α| is required, there is one special case: α → ε is also accepted in a Type 1 grammar.
For example, A → Ba satisfies the Type 1 requirement, since |β| = 2 ≥ |α| = 1; by contrast, Aa → a does not, since |β| = 1 < |α| = 2.
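The length condition above is easy to check mechanically. The following is a minimal sketch (my own helper, not from the text) that tests whether a single production α → β meets the Type 1 length rule, including the ε special case noted above:

```python
# Hypothetical helper: check the Type 1 (context-sensitive) length
# condition |beta| >= |alpha| for a production alpha -> beta.
def satisfies_type1(alpha: str, beta: str) -> bool:
    """Return True if alpha -> beta meets the Type 1 length rule."""
    if beta == "":               # the special case alpha -> epsilon
        return True
    return len(beta) >= len(alpha)

# The examples from the text:
print(satisfies_type1("A", "Ba"))   # |beta| = 2 >= |alpha| = 1 -> True
print(satisfies_type1("Aa", "a"))   # |beta| = 1 <  |alpha| = 2 -> False
```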
Type 2 Grammar
A Type 2 grammar is also called a context-free grammar; it corresponds to a pushdown automaton. On top of the Type 1 requirements, every production α → β must have α be a single non-terminal. For example, A → Ba satisfies the Type 2 requirement.
By contrast, Ab → Bab satisfies the Type 1 requirement but not the Type 2 requirement, because its α = Ab, and Ab is not a single non-terminal.
Type 3 Grammar
A Type 3 grammar is also called a regular grammar; it corresponds to a finite automaton. On top of the Type 2 requirements, every production must have the form A → a | aB (right linear) or A → a | Ba (left linear).
For example, A → a, A → aB, B → a, B → cB satisfies the Type 3 requirement. By contrast, the grammars A → ab, A → aB, B → a, B → cB and A → a, A → Ba, B → a, B → cB do not. Specifically, in the first grammar the production A → ab does not fit the Type 3 definition; if ab is rewritten in the form "one terminal followed by one non-terminal" (that is, aB), it does. In the second grammar, B → cB would have to be changed to B → Bc, because the right-linear form A → a | aB and the left-linear form A → a | Ba may not both appear in one grammar: a Type 3 grammar must satisfy one of the two forms throughout.
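These Type 2 and Type 3 conditions can be expressed as a small checker. The sketch below is my own illustration (not from the text); it assumes the convention stated next, that uppercase letters are non-terminals and lowercase letters are terminals:

```python
import re

def is_type2(prods):
    # Type 2: every left side must be a single non-terminal.
    return all(len(a) == 1 and a.isupper() for a, b in prods)

def is_type3(prods):
    # Type 3: right linear (A -> a or A -> aB) or left linear
    # (A -> a or A -> Ba), used consistently across the grammar.
    right = all(re.fullmatch(r"[a-z]|[a-z][A-Z]", b) for a, b in prods)
    left = all(re.fullmatch(r"[a-z]|[A-Z][a-z]", b) for a, b in prods)
    return is_type2(prods) and (right or left)

g1 = [("A", "a"), ("A", "aB"), ("B", "a"), ("B", "cB")]   # right linear
g2 = [("A", "ab"), ("A", "aB"), ("B", "a"), ("B", "cB")]  # A -> ab breaks Type 3
g3 = [("A", "a"), ("A", "Ba"), ("B", "a"), ("B", "cB")]   # mixes left and right
print(is_type3(g1), is_type3(g2), is_type3(g3))  # True False False
```

Note that g2 and g3 are still Type 2 grammars; they only fail the stricter Type 3 test.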
Note: in the examples above, uppercase letters denote non-terminals and lowercase letters denote terminals.
Next we discuss context-sensitive grammars and Type 0 grammars in more detail.
We begin with context-sensitive grammars.
A rewrite rule in P of a context-sensitive grammar has the form
φ → ψ,
where both φ and ψ are symbol strings and |φ| ≤ |ψ|, that is, ψ is at least as long as φ.
Consider the formal language L = {aⁿbⁿcⁿ}, the set of strings consisting of n a's followed by n b's followed by n c's, with n ≥ 1. A grammar G generating this language is:
G = (VN, VT, S, P)
VN = {S, B, C}
VT = {a, b, c}
P:
S → aSBC ①
S → aBC ②
CB → BC ③
aB → ab ④
bB → bb ⑤
bC → bc ⑥
cC → cc ⑦
Starting from S, apply rule ① n−1 times to obtain
S ⇒* aⁿ⁻¹S(BC)ⁿ⁻¹
Then apply rule ② once to obtain
S ⇒* aⁿ(BC)ⁿ
Rule ③ transforms (BC)ⁿ into BⁿCⁿ. For example, for n = 3:
aaaBCBCBC ⇒ aaaBBCCBC ⇒ aaaBBCBCC ⇒ aaaBBBCCC,
so that
S ⇒* aⁿBⁿCⁿ
Next, apply rule ④ once to obtain
S ⇒* aⁿbBⁿ⁻¹Cⁿ
Then apply rule ⑤ n−1 times to obtain
S ⇒* aⁿbⁿCⁿ
Then apply rule ⑥ once to obtain
S ⇒* aⁿbⁿcCⁿ⁻¹
Finally, apply rule ⑦ n−1 times to obtain
S ⇒* aⁿbⁿcⁿ
This is the formal language to be generated.
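The derivation above can be replayed mechanically. The following sketch (my own, not from the text) applies the seven rewrite rules of G as plain string rewriting, always rewriting the leftmost applicable occurrence, to derive a³b³c³:

```python
# The productions of G, in the order numbered in the text.
RULES = [
    ("S", "aSBC"),   # rule 1
    ("S", "aBC"),    # rule 2
    ("CB", "BC"),    # rule 3
    ("aB", "ab"),    # rule 4
    ("bB", "bb"),    # rule 5
    ("bC", "bc"),    # rule 6
    ("cC", "cc"),    # rule 7
]

def apply_rule(s, i):
    lhs, rhs = RULES[i - 1]
    assert lhs in s, f"rule {i} not applicable to {s}"
    return s.replace(lhs, rhs, 1)  # rewrite the leftmost occurrence

# For n = 3: rule 1 twice, rule 2 once, rule 3 three times,
# then rules 4-7 as in the derivation above.
s = "S"
for i in [1, 1, 2] + [3] * 3 + [4] + [5] * 2 + [6] + [7] * 2:
    s = apply_rule(s, i)
print(s)  # -> aaabbbccc
```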
In every rewrite rule of this grammar, the number of symbols on the right side is greater than or equal to the number of symbols on the left side, satisfying the condition
|φ| ≤ |ψ|.
Therefore, this grammar is a context-sensitive grammar.
Chomsky pointed out the following relationships between context-sensitive grammars and context-free grammars:
First, every context-free grammar is also a context-sensitive grammar. In the rewrite rules φ → ψ of a context-sensitive grammar, both φ and ψ are symbol strings. When the string on the left side of a rule shrinks to a single non-terminal A, the rule takes the form A → ψ; since ψ is a symbol string, it can be written as ω, giving A → ω, which is exactly the rewrite-rule form of a context-free grammar.
Second, there are context-sensitive languages that are not context-free. For example, the copy language L3 = {αα}, where α is any non-empty string over {a, b}, cannot be generated by a finite state grammar in Chomsky's sense, nor by a context-free grammar. However, it can be generated by a context-sensitive grammar. A grammar for L3 is as follows:
G = (VN, VT, S, P)
VN = {S}
VT = {a, b}
P:
S → aS ①
S → bS ②
αS → αα ③
In rule ③, α is any non-empty string over {a, b}. The length of αS is not greater than the length of αα (since |α| ≥ 1), but the left side αS is a symbol string rather than a single non-terminal. Therefore this grammar cannot be context-free; it is context-sensitive.
For example, the string abbabb can be generated as follows: starting from S, apply rule ① once to get S ⇒ aS; apply rule ② twice to get S ⇒* abbS; then apply rule ③ once, with α = abb, to get S ⇒* abbabb.
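This derivation pattern is easy to simulate. The sketch below is my own illustration (not from the text): rules ① and ② build up a prefix α in front of S, and rule ③ (αS → αα) then doubles it:

```python
def derive(choices):
    """choices: the sequence of 'a'/'b' picks made with rules 1 and 2,
    after which rule 3 fires once."""
    s = "S"
    for ch in choices:            # rule 1 (S -> aS) or rule 2 (S -> bS)
        s = s.replace("S", ch + "S")
    alpha = s[:-1]                # everything before the trailing S
    return alpha + alpha          # rule 3: alpha S -> alpha alpha

print(derive("abb"))  # -> abbabb
```

Every string this procedure produces has the form αα, which is exactly the language L3.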
We can see that context-sensitive grammars have greater generative power than finite state grammars and context-free grammars. However, context-free grammars can use the powerful Chomsky normal form to achieve hierarchical analysis.
Therefore, in computer processing of natural language, people still prefer context-free grammars.
Finally, we discuss Type 0 grammars.
The rewrite rules of a Type 0 grammar have the form φ → ψ, with no restriction other than φ ≠ ε. Chomsky proved that every Type 0 language is a recursively enumerable set. He also proved that every context-sensitive language is at the same time a Type 0 language, while some Type 0 languages are not context-sensitive. Context-sensitive languages are therefore a proper subset of the Type 0 languages.
However, because the rewrite rules of Type 0 grammars carry almost no restrictions, it is quite difficult to describe natural languages with them: their generative power is too strong, and they produce an unmanageable number of ill-formed sentences. Therefore,
among the four types of grammar, context-free grammars are the most suitable for describing natural languages. Scholars of Chinese natural language processing often call this type phrase-structure grammar.
Chomsky's formal language theory has had a significant impact on computer science. Chomsky linked the four types of grammars with four types of automata (abstract machines used to recognize languages): the Turing machine, the linear bounded automaton, the pushdown automaton, and the finite automaton, and proved the equivalence of the generative power of the grammars and the recognition power of the corresponding automata. There are four important results:
① If a language can be recognized by a Turing machine, it can be generated by a Type 0 grammar, and vice versa.
② If a language can be recognized by a linear bounded automaton, it can be generated by a context-sensitive grammar, and vice versa.
③ If a language can be recognized by a pushdown automaton, it can be generated by a context-free grammar, and vice versa.
④ If a language can be recognized by a finite automaton, it can be generated by a finite state (regular) grammar, and vice versa.
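To illustrate result ④, here is a small sketch of my own (not from the text): a deterministic finite automaton recognizing the language of the earlier Type 3 grammar A → a | aB, B → a | cB, i.e. the strings matching a(c*a)?:

```python
# Transition table of a DFA for the language {a} ∪ {a c^k a : k >= 0}.
DELTA = {
    ("q0", "a"): "q1",   # read the leading a
    ("q1", "c"): "q2",   # start of the c-run
    ("q2", "c"): "q2",   # stay in the c-run
    ("q1", "a"): "q3",   # closing a right after the first a
    ("q2", "a"): "q3",   # closing a after the c-run
}
ACCEPT = {"q1", "q3"}    # accept "a" alone, or a c^k a

def accepts(word):
    state = "q0"
    for ch in word:
        state = DELTA.get((state, ch))
        if state is None:    # missing transition: reject
            return False
    return state in ACCEPT

print([w for w in ["a", "aa", "acca", "ac", "ab", "aaa"] if accepts(w)])
# -> ['a', 'aa', 'acca']
```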
Chomsky's conclusions provide brilliant insight into the processes of language generation and recognition. They are highly useful in programming language design, algorithm analysis, compiler technology, image recognition, artificial intelligence, and so on, and play a huge role in natural language processing.