In 1956, Noam Chomsky classified grammars into four types according to the restrictions imposed on their productions, and defined the four corresponding classes of formal languages:
Grammar type | Restriction on productions | Language generated
--- | --- | ---
Type 0 grammar | α → β, where α, β ∈ (VT ∪ VN)* and α is non-empty and contains at least one nonterminal | Type 0 language
Type 1 grammar | α → β, where α, β ∈ (VT ∪ VN)* and the length of α is at most the length of β | Type 1 language / context-sensitive language
Type 2 grammar | A → β, where A ∈ VN and β ∈ (VT ∪ VN)* | Type 2 language / context-free language
Type 3 grammar | A → α or A → αB (right-linear), or A → α or A → Bα (left-linear), where A, B ∈ VN and α ∈ VT ∪ {ε} | Type 3 language / regular language
Grammar restrictions
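From type 0 to type 3 the restrictions become progressively stricter, and the language classes are nested: every type 3 language is a type 2 language, every type 2 language is a type 1 language, and every type 1 language is a type 0 language.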
Type 0 Grammar
Type 0 grammar is also known as phrase-structure grammar. An important result is that type 0 grammars are equivalent in power to Turing machines: every type 0 language is recursively enumerable, and conversely every recursively enumerable set is a type 0 language.
Type 0 grammar is the least restricted grammar. The left side of each production only needs to contain at least one nonterminal (uppercase letter), for example, A0 → A0 and A1 → B.
Type 1 Grammar
Type 1 grammar is also known as context-sensitive grammar. On top of type 0 grammar it adds one restriction: the right side of every production must be at least as long as the left side. The only exception is S → ε, which lets the start symbol derive the empty string. For example, 0A0 → 011000.
Type 2 Grammar
Type 2 grammar, also known as context-free grammar, adds a restriction on top of type 1 grammar: the left side of every production must be a single nonterminal. For example, A → AB and B → Bac.
Type 3 Grammar
Type 3 grammar is the most restrictive type, also known as regular grammar. On top of type 2 grammar it requires the right side of each production to contain at most two symbols and to take one of the forms A → a or A → aB, where A, B ∈ VN and a ∈ VT. Note that a type 3 grammar must be either entirely right-linear or entirely left-linear; it cannot mix the two. "Right-linear" and "left-linear" describe whether the nonterminal on the right side of a production sits at the right end or the left end.
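The four sets of restrictions can be checked mechanically. The sketch below is a minimal illustration, not a standard tool: it assumes the text's convention that uppercase letters are nonterminals and every other character is a terminal, encodes productions as hypothetical (left, right) string pairs with "" standing for ε, and returns the most restrictive type a grammar satisfies. For type 1 it applies the usual side condition that S → ε is allowed only when S never appears on a right side.

```python
def is_nt(symbol):
    # Assumed convention (as in the text): uppercase letters are nonterminals.
    return symbol.isupper()

def type0(prods):
    # Every left side must contain at least one nonterminal.
    return all(any(is_nt(c) for c in lhs) for lhs, _ in prods)

def type1(prods, start="S"):
    # Left side no longer than right side, except S -> ε when S never appears on a right side.
    s_on_right = any(start in rhs for _, rhs in prods)
    for lhs, rhs in prods:
        if rhs == "" and lhs == start and not s_on_right:
            continue
        if len(lhs) > len(rhs):
            return False
    return type0(prods)

def type2(prods):
    # Every left side is a single nonterminal.
    return all(len(lhs) == 1 and is_nt(lhs) for lhs, _ in prods)

def type3(prods):
    # A -> a | aB (right-linear) or A -> a | Ba (left-linear), never mixed.
    if not type2(prods):
        return False
    right_ok = left_ok = True
    for _, rhs in prods:
        nts = [i for i, c in enumerate(rhs) if is_nt(c)]
        if len(rhs) > 2 or len(nts) > 1:
            return False
        if len(rhs) == 2:
            if not nts:
                return False  # two terminals: not of the form a, aB or Ba
            right_ok = right_ok and nts[0] == 1
            left_ok = left_ok and nts[0] == 0
    return right_ok or left_ok

def chomsky_type(prods):
    # Check from the strictest type down and report the first one that holds.
    for level, test in ((3, type3), (2, type2), (1, type1), (0, type0)):
        if test(prods):
            return level
    return None  # not a valid phrase-structure grammar

print(chomsky_type([("S", "aS"), ("S", "a")]))   # 3
print(chomsky_type([("S", "aS"), ("S", "Sa")]))  # 2: mixes right- and left-linear rules
```

The second call shows why mixing is excluded: each rule alone has a linear shape, but together they only meet the type 2 conditions.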
Note that different grammars may generate the same language.
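One way to see this on a small case is to enumerate every sentence up to some length and compare. The sketch below assumes uppercase letters are nonterminals and that no production has an empty right side, so sentential forms never shrink and can be pruned by length. The two grammars are hypothetical examples: one right-linear, one left-linear, both generating a⁺b. Agreement up to a length bound is evidence, not a proof of equality.

```python
from collections import deque

def sentences(grammar, start, max_len):
    """All terminal strings of length <= max_len derivable from start (leftmost expansion, BFS)."""
    seen, result = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, or None if form is already a sentence.
        idx = next((i for i, c in enumerate(form) if c.isupper()), None)
        if idx is None:
            result.add(form)
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return result

G1 = {"S": ["aA"], "A": ["aA", "b"]}  # right-linear: S -> aA, A -> aA | b
G2 = {"S": ["Ab"], "A": ["Aa", "a"]}  # left-linear:  S -> Ab, A -> Aa | a
print(sentences(G1, "S", 5) == sentences(G2, "S", 5))  # True
```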
Derivation tree
In a derivation tree of grammar G, the labels of the leaves read from left to right form a sentential form of G. If every leaf is labeled with a terminal, that string is a sentence of G, and the tree is called a complete derivation tree.
A derivation tree (syntax tree) of grammar G has the following properties:
1. Every node has a label, which is a symbol of V = VT ∪ VN.
2. The root is labeled S.
3. If a node N has at least one descendant other than itself and is labeled A, then A must be in VN.
4. If the direct descendants of node N, from left to right, are N1, N2, ..., Nk with labels A1, A2, ..., Ak, and N is labeled A, then A → A1A2...Ak must be a production in P.
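The four properties translate directly into a recursive check. The sketch below uses the grammar S → aAS | a, A → SbA | SS | ba and a hypothetical encoding of a node as a (label, children) pair, where a leaf has no children. It validates properties 1 to 4, computes the yield (leaves from left to right), and reports whether the tree is complete.

```python
VT = {"a", "b"}
VN = {"S", "A"}
P = {"S": ["aAS", "a"], "A": ["SbA", "SS", "ba"]}

def is_derivation_tree(node, start="S"):
    def check(n):
        label, children = n
        if label not in VT | VN:                  # property 1
            return False
        if children:
            if label not in VN:                   # property 3
                return False
            rhs = "".join(c[0] for c in children)
            if rhs not in P[label]:               # property 4
                return False
            return all(check(c) for c in children)
        return True
    return node[0] == start and check(node)       # property 2

def tree_yield(node):
    # Concatenate leaf labels from left to right.
    label, children = node
    if not children:
        return label
    return "".join(tree_yield(c) for c in children)

def is_complete(node):
    # Complete tree: every leaf is a terminal, so the yield is a sentence.
    return all(ch in VT for ch in tree_yield(node))

T = ("S", [("a", []), ("A", [("b", []), ("a", [])]), ("S", [("a", [])])])
PARTIAL = ("S", [("a", []), ("A", []), ("S", [])])
BAD = ("S", [("b", [])])
print(is_derivation_tree(T), tree_yield(T), is_complete(T))  # True abaa True
```

PARTIAL is still a valid derivation tree, but its yield aAS is only a sentential form; BAD fails property 4 because S → b is not in P.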
Given the grammar G = ({a, b}, {S, A}, S, P), where P consists of S → aAS | a and A → SbA | SS | ba, construct the derivation tree for aabbaa.
From G we have VT = {a, b} and VN = {S, A}. Splitting the alternatives, S → aAS | a stands for S → aAS and S → a, and A → SbA | SS | ba stands for A → SbA, A → SS and A → ba. Applying these productions in the leftmost derivation S ⇒ aAS ⇒ aSbAS ⇒ aabAS ⇒ aabbaS ⇒ aabbaa gives the following derivation tree:
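The leftmost derivation that the tree is built from can be replayed step by step. In this sketch each hypothetical step names the nonterminal being rewritten and the right side chosen; the function rewrites the leftmost nonterminal and rejects a step that does not match the grammar S → aAS | a, A → SbA | SS | ba.

```python
P = {"S": ["aAS", "a"], "A": ["SbA", "SS", "ba"]}

def leftmost_derive(start, steps):
    """Apply each (nonterminal, rhs) step to the leftmost nonterminal; return all sentential forms."""
    form, forms = start, [start]
    for nt, rhs in steps:
        i = next((k for k, c in enumerate(form) if c.isupper()), None)
        if i is None or form[i] != nt or rhs not in P[nt]:
            raise ValueError(f"cannot apply {nt} -> {rhs} to {form}")
        form = form[:i] + rhs + form[i + 1:]
        forms.append(form)
    return forms

STEPS = [("S", "aAS"), ("A", "SbA"), ("S", "a"), ("A", "ba"), ("S", "a")]
print(" => ".join(leftmost_derive("S", STEPS)))
# S => aAS => aSbAS => aabAS => aabbaS => aabbaa
```

Each step corresponds to expanding one interior node of the tree, so the productions used here are exactly the node/children pairs of the tree.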
Regular expressions
A regular expression is a notation for describing regular sets. Every regular expression corresponds to a regular grammar (type 3 grammar) that generates the same language.
Converting a regular grammar into a regular expression
Rule 1 (concatenation): from A → xB and B → y we get A → xB → xy, so A = xy.
Rule 2 (closure): from A → xA | y, that is, A → xA and A → y, repeated application gives A → xA → x²A → x³A → ... → x*A → x*y, so A = x*y.
Rule 3 (union): from A → x and A → y we get A = x | y.
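The three rules are enough to solve a right-linear grammar for a regular expression. The sketch below is one possible implementation, assumed rather than taken from the text: it treats each nonterminal as an equation, eliminates nonterminals other than the start symbol one by one (Rule 2 for self-recursion, then substitution and Rule 3 for merging), and finally solves the start symbol. The grammar encoding, a list of (terminal, next nonterminal or None) pairs, is hypothetical, and the result is correct but not simplified.

```python
import re

def _alt(a, b):
    # Rule 3 (union): A -> x and A -> y give x|y. None means "no term yet".
    if a is None:
        return b
    if b is None:
        return a
    return f"{a}|{b}"

def _group(r):
    # Parenthesize an alternation before it is concatenated.
    return f"({r})" if "|" in r else r

def _cat(a, b):
    # Rule 1 (concatenation): A -> xB and B -> y give xy.
    return _group(a) + _group(b)

def _star(r):
    return r + "*" if len(r) == 1 else f"({r})*"

def grammar_to_regex(grammar, start):
    """grammar maps each nonterminal to (terminal, next_nonterminal_or_None) pairs."""
    # eq[A][B] is the coefficient of B in A's equation; eq[A][None] is the constant part.
    eq = {}
    for lhs, alternatives in grammar.items():
        row = {}
        for term, nxt in alternatives:
            row[nxt] = _alt(row.get(nxt), term)
        eq[lhs] = row
    for v in [n for n in eq if n != start]:
        row = eq.pop(v)
        loop = row.pop(v, None)
        if loop is not None:
            # Rule 2 (closure): A -> xA | y gives A = x*y.
            row = {k: _cat(_star(loop), r) for k, r in row.items()}
        # Substitute the solved equation for v into every remaining equation.
        for other in eq.values():
            coeff = other.pop(v, None)
            if coeff is not None:
                for k, r in row.items():
                    other[k] = _alt(other.get(k), _cat(coeff, r))
    row = eq[start]
    loop, const = row.get(start), row.get(None)
    if const is None:
        return None  # the start symbol derives no terminal string
    return _cat(_star(loop), const) if loop is not None else const

# S -> aS | bA | b,  A -> bA | b
G = {"S": [("a", "S"), ("b", "A"), ("b", None)],
     "A": [("b", "A"), ("b", None)]}
regex = grammar_to_regex(G, "S")
print(regex)                               # a*(b|bb*b)
print(bool(re.fullmatch(regex, "aabbb")))  # True
```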
For example, consider the language L = {a^m b^n | m ≥ 0, n ≥ 1}. Since * denotes zero or more repetitions and m ≥ 0, a^m can be written as a*. Since n ≥ 1, b^n can be written as bb*, one b followed by zero or more b's. Therefore the regular expression for L is a*bb*.
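A quick sanity check of this derivation is to compare a*bb* against the set definition of L on every string over {a, b} up to a small length. The sketch uses Python's `re` module; `in_L` is a hypothetical helper that tests the definition directly.

```python
import re
from itertools import product

PATTERN = re.compile(r"a*bb*")

def in_L(w):
    # Direct definition: w = a^m b^n with m >= 0 and n >= 1.
    m = len(w) - len(w.lstrip("a"))  # number of leading a's
    rest = w[m:]
    return len(rest) >= 1 and set(rest) == {"b"}

for n in range(7):
    for chars in product("ab", repeat=n):
        w = "".join(chars)
        assert bool(PATTERN.fullmatch(w)) == in_L(w), w
print("a*bb* agrees with L on all strings up to length 6")
```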
Finite Automaton
A finite automaton is a mathematical model of a system with discrete inputs and outputs. It has a finite number of states; from each state it can move to zero or more states, and the input string determines which transitions are taken. A finite automaton can be written as a 5-tuple M = (Q, Σ, δ, q0, F), where:
- Q is a finite set of states.
- Σ is a finite input alphabet; each element is called an input symbol.
- δ: Q × Σ → Q is the transition function, a single-valued mapping from the Cartesian product Q × Σ to Q (for a nondeterministic automaton, δ: Q × Σ → 2^Q maps each pair to a set of states).
- q0 ∈ Q is the initial state.
- F ⊆ Q is the set of final states.
For example, let M = ({S, A, B, C, F}, {0, 1}, δ, S, {F}), with δ(S, 0) = B, δ(S, 1) = A, δ(A, 0) = F, δ(A, 1) = C, δ(B, 0) = C, δ(B, 1) = F, δ(C, 0) = F, δ(C, 1) = F. The corresponding state transition diagram is:
This automaton can be read as follows: it starts in S and should finish in F; from S, input 1 leads to A and input 0 leads to B, and so on. If a string w over the alphabet Σ, formed by concatenating the symbols read along the way, drives M from S to F, then w is accepted (recognized) by M. The set of all strings that M accepts is called the language recognized by M.
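The acceptance rule can be simulated directly from the transition table. In this sketch a missing table entry (for example, any move out of F) is assumed to mean the automaton gets stuck and rejects. Enumerating all short inputs shows that this particular M recognizes a finite language of six strings.

```python
from itertools import product

# Transition table of M; missing entries mean "no move", so the input is rejected.
DELTA = {
    ("S", "0"): "B", ("S", "1"): "A",
    ("A", "0"): "F", ("A", "1"): "C",
    ("B", "0"): "C", ("B", "1"): "F",
    ("C", "0"): "F", ("C", "1"): "F",
}
START, FINALS = "S", {"F"}

def accepts(w):
    state = START
    for ch in w:
        state = DELTA.get((state, ch))
        if state is None:
            return False
    return state in FINALS

LANG = {"".join(p) for n in range(6) for p in product("01", repeat=n) if accepts("".join(p))}
print(sorted(LANG))  # ['000', '001', '01', '10', '110', '111']
```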
Finite automata are divided into deterministic finite automata (DFA) and nondeterministic finite automata (NFA). The difference is that in an NFA the start state and the state reached by a transition need not be unique: there may be several start states, and δ may map a state and an input symbol to a set of states. In a compiler, finite automata are used in the lexical analysis phase to drive state transitions and perform the associated semantic actions. For example, when an identifier is recognized, it is added to the symbol table and the identifier token is passed to the syntax analyzer.
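Nondeterminism can be handled by tracking the set of all states the automaton could be in, which is also the idea behind converting an NFA into a DFA. The NFA below is a hypothetical example, not from the text: it accepts binary strings ending in 01, and from state p on input 0 it may either stay in p or move to q.

```python
# Hypothetical NFA accepting binary strings that end in "01".
NFA_DELTA = {
    ("p", "0"): {"p", "q"},
    ("p", "1"): {"p"},
    ("q", "1"): {"r"},
}
NFA_STARTS = {"p"}  # an NFA may have several start states
NFA_FINALS = {"r"}

def nfa_accepts(w):
    current = set(NFA_STARTS)
    for ch in w:
        # Union of all states reachable from any current state on ch.
        current = set().union(*(NFA_DELTA.get((s, ch), set()) for s in current))
        if not current:
            return False
    return bool(current & NFA_FINALS)

print(nfa_accepts("1101"), nfa_accepts("10"))  # True False
```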
Conversion between regular expressions and finite automata
For every regular expression r there is a finite automaton M that accepts exactly the language denoted by r, that is, L(M) = L(r).
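To construct M, start with an initial state S and a final state F joined by a single edge labeled r. This directed graph is then refined edge by edge with the conversion rules until every edge is labeled with a single symbol or ε.
Conversion rules:
1. An edge labeled r1r2 from state i to state j is replaced by i --r1--> k --r2--> j, where k is a new state.
2. An edge labeled r1|r2 from i to j is replaced by two parallel edges i --r1--> j and i --r2--> j.
3. An edge labeled r* from i to j is replaced by i --ε--> k, a loop k --r--> k, and k --ε--> j, where k is a new state.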
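The decomposition can be sketched as a recursive function over a hypothetical regular-expression syntax tree of tuples ("sym", c), ("cat", r1, r2), ("alt", r1, r2) and ("star", r). Each call refines one edge s --r--> f according to the rules for concatenation, union and closure; the resulting NFA with ε-edges is then simulated with ε-closures. This is an illustrative sketch, not a production regex engine.

```python
from itertools import count

EPS = None  # label used for ε-edges

def build(r, s, f, edges, fresh):
    """Refine the edge s --r--> f into single-symbol and ε edges."""
    kind = r[0]
    if kind == "sym":
        edges.append((s, r[1], f))
    elif kind == "cat":    # s --r1--> k --r2--> f, k new
        k = next(fresh)
        build(r[1], s, k, edges, fresh)
        build(r[2], k, f, edges, fresh)
    elif kind == "alt":    # parallel edges s --r1--> f and s --r2--> f
        build(r[1], s, f, edges, fresh)
        build(r[2], s, f, edges, fresh)
    elif kind == "star":   # s --ε--> k, loop k --r--> k, k --ε--> f
        k = next(fresh)
        edges.append((s, EPS, k))
        build(r[1], k, k, edges, fresh)
        edges.append((k, EPS, f))
    else:
        raise ValueError(f"unknown node {kind}")

def eps_closure(states, edges):
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for a, label, b in edges:
            if a == s and label is EPS and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

def nfa_match(r, w):
    fresh = count(2)  # state 0 plays the role of S, state 1 the role of F
    edges = []
    build(r, 0, 1, edges, fresh)
    current = eps_closure({0}, edges)
    for ch in w:
        moved = {b for a, label, b in edges if a in current and label == ch}
        current = eps_closure(moved, edges)
    return 1 in current

# a*bb*
R = ("cat", ("star", ("sym", "a")), ("cat", ("sym", "b"), ("star", ("sym", "b"))))
print(nfa_match(R, "aabbb"), nfa_match(R, "ba"))  # True False
```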
For example, an identifier in C must begin with "_" or a letter and may contain only underscores, letters, and digits. Let a stand for any letter {a, A, b, B, ..., z, Z} and b for any digit {0, ..., 9}. Then the regular expression for the identifiers that C accepts can be written as:
(_|a)(_|a|b)*
The finite automaton diagram corresponding to this regular expression is:
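A recognizer for (_|a)(_|a|b)* needs only two states: a start state and an accepting state that loops on underscores, letters and digits. The sketch below maps each character to the classes a (letter) and b (digit) from the text; the table and function names are hypothetical, and the cross-check against an equivalent Python regex is included only as a sanity test. It does not exclude C keywords, which a real lexer would handle afterwards.

```python
import re

def char_class(ch):
    # Map a character to the text's classes: "_", a (ASCII letter) or b (digit).
    if ch == "_":
        return "_"
    if ch.isascii() and ch.isalpha():
        return "a"
    if ch.isascii() and ch.isdigit():
        return "b"
    return None

# State 0 = start, state 1 = accepting state that loops on _, a and b.
ID_DELTA = {(0, "_"): 1, (0, "a"): 1, (1, "_"): 1, (1, "a"): 1, (1, "b"): 1}

def is_identifier(s):
    state = 0
    for ch in s:
        state = ID_DELTA.get((state, char_class(ch)))
        if state is None:
            return False
    return state == 1

for s in ["_x1", "count", "x_2y", "1abc", "", "a-b"]:
    assert is_identifier(s) == bool(re.fullmatch(r"[_A-Za-z][_A-Za-z0-9]*", s))
    print(repr(s), is_identifier(s))
```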
That covers grammars, regular expressions, and finite automata for now.