Formal definition and classification of grammars
A context-free grammar consists of four components:
- A set of terminals, also called tokens.
- A set of nonterminals, also called syntactic variables.
- A set of productions.
- A start symbol, which is one of the nonterminals.
A grammar G can be written as a 4-tuple: $G = (V_N, V_T, P, S)$,
where $V_N$ is the set of nonterminals, $V_T$ the set of terminals, $P$ the set of productions, and $S$ the start symbol.
The language generated by G is $L(G) = \{\, w \mid w \in V_T^* \text{ and } S \Rightarrow^+ w \,\}$:
- The string w is derived from the start symbol S in one or more steps.
- w consists of terminals only.
- Each such w is called a sentence of the language.
- L(G) is the set of all such sentences.
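To make the definition of L(G) concrete, here is a minimal Python sketch that enumerates the short sentences of a small grammar by breadth-first leftmost derivation. The example grammar S → aSb | ab, the dictionary encoding of P, and the function name `sentences` are illustrative assumptions; the pruning step also assumes one character per symbol and no ε-productions, so a sentential form never gets shorter.

```python
from collections import deque

# Hypothetical encoding of G = (VN, VT, P, S); one character per symbol.
VN = {"S"}
VT = {"a", "b"}
P = {"S": ["aSb", "ab"]}  # S -> aSb | ab
S = "S"


def sentences(max_len):
    """Return every w in L(G) with |w| <= max_len, found by leftmost derivation."""
    found = set()
    queue = deque([S])
    seen = {S}
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, or None if form is all terminals.
        idx = next((i for i, c in enumerate(form) if c in VN), None)
        if idx is None:
            found.add(form)  # only terminals remain: form is a sentence
            continue
        for rhs in P[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            # Without epsilon-productions a form never shrinks, so longer forms are dead ends.
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found


print(sorted(sentences(6), key=len))  # ['ab', 'aabb', 'aaabbb']
```

The output is this grammar's language {a^n b^n | n ≥ 1} cut off at length 6.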
Conventions:
- Uppercase letters A to Z denote nonterminals; a nonterminal may also be written in angle brackets, such as <expr>.
- Lowercase letters near the start of the alphabet (a, b, c, ...) denote single terminals.
- Lowercase letters near the end of the alphabet (u, v, w, x, y, z) and the Greek letters α, β, γ denote strings in $(V_T \cup V_N)^*$.
Classification of grammars (the Chomsky hierarchy):
- Type 0 grammar: unrestricted grammar, also called phrase-structure grammar. Every production has the form $\alpha \to \beta$, where α contains at least one nonterminal.
- Type 1 grammar: context-sensitive grammar. Every production $\alpha \to \beta$ satisfies $|\alpha| \le |\beta|$. For a production of the form $\alpha_1 A \alpha_2 \to \alpha_1 \beta \alpha_2$, A may be replaced by β only when it occurs with α1 on its left and α2 on its right.
- Type 2 grammar: context-free grammar. Every production has the form $A \to \beta$, where A is a single nonterminal; replacing A by β does not depend on the context in which A occurs.
- Type 3 grammar: regular grammar. Every production has the form $A \to aB$ or $A \to a$.
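The hierarchy can be checked mechanically. The sketch below, built around a hypothetical helper named `chomsky_type`, tests every production against the four forms listed above and reports the most restrictive type that all of them satisfy. It assumes one character per symbol, uses the uppercase convention for nonterminals, and does not treat ε-productions specially.

```python
def is_nonterminal(sym):
    # Convention from the text: uppercase letters denote nonterminals.
    return sym.isupper()


def chomsky_type(productions):
    """Return the most restrictive type (3, 2, 1 or 0) satisfied by every production.

    `productions` is a list of (lhs, rhs) string pairs, one character per symbol.
    """
    def type3(lhs, rhs):  # A -> aB or A -> a
        if len(lhs) != 1 or not is_nonterminal(lhs):
            return False
        if len(rhs) == 1:
            return not is_nonterminal(rhs)
        return len(rhs) == 2 and not is_nonterminal(rhs[0]) and is_nonterminal(rhs[1])

    def type2(lhs, rhs):  # A -> beta
        return len(lhs) == 1 and is_nonterminal(lhs)

    def type1(lhs, rhs):  # alpha -> beta with |alpha| <= |beta|
        return any(map(is_nonterminal, lhs)) and len(lhs) <= len(rhs)

    def type0(lhs, rhs):  # alpha contains at least one nonterminal
        return any(map(is_nonterminal, lhs))

    for number, check in ((3, type3), (2, type2), (1, type1), (0, type0)):
        if all(check(lhs, rhs) for lhs, rhs in productions):
            return number
    raise ValueError("some left-hand side contains no nonterminal")


print(chomsky_type([("S", "aA"), ("A", "b")]))  # 3
```

Checking from type 3 downward works because each type's form is a special case of the next more general one (apart from the ε-production caveat above).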
The four types are recognized, respectively, by four kinds of automata:
- Type 0: Turing machine
- Type 1: linear bounded automaton
- Type 2: pushdown automaton
- Type 3: finite automaton
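For type 3 grammars the correspondence with finite automata is direct: each nonterminal becomes a state, A → aB becomes a transition from A to B on a, and A → a becomes a transition on a into an extra accepting state. The following sketch simulates that nondeterministic automaton; the function name `accepts`, the sentinel state `FINAL`, and the example grammar S → aA, A → bA | b are assumptions for illustration.

```python
FINAL = "<final>"  # hypothetical extra accepting state, not a nonterminal


def accepts(productions, start, w):
    """Decide whether w is in L(G) for a type 3 grammar by simulating its NFA.

    Each (lhs, rhs) production is A -> aB (rhs of length 2) or A -> a (length 1).
    """
    delta = {}
    for lhs, rhs in productions:
        target = rhs[1] if len(rhs) == 2 else FINAL
        delta.setdefault((lhs, rhs[0]), set()).add(target)

    current = {start}
    for ch in w:
        # Follow every transition on ch; FINAL has no outgoing transitions.
        current = {t for q in current for t in delta.get((q, ch), set())}
    return FINAL in current


grammar = [("S", "aA"), ("A", "bA"), ("A", "b")]  # generates a b^n, n >= 1
print(accepts(grammar, "S", "abbb"), accepts(grammar, "S", "ba"))  # True False
```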
Operations on strings and languages
An alphabet is a finite set of symbols.
A string over an alphabet is a finite sequence of symbols drawn from that alphabet.
A language is any countable set of strings over some fixed alphabet.
Basic terms for strings include prefix, suffix, substring, proper prefix, proper suffix, and subsequence.
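These terms are easy to state as set comprehensions. In this sketch the function names are hypothetical, and `proper` follows the convention that excludes both ε and s itself; some texts exclude only s.

```python
from itertools import combinations


def prefixes(s):
    # Remove zero or more symbols from the end; includes "" (epsilon) and s.
    return {s[:i] for i in range(len(s) + 1)}


def suffixes(s):
    # Remove zero or more symbols from the start.
    return {s[i:] for i in range(len(s) + 1)}


def substrings(s):
    # Remove a prefix and a suffix.
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}


def proper(parts, s):
    # Proper variants exclude both "" and s itself (one common convention).
    return parts - {"", s}


def subsequences(s):
    # Remove any zero or more positions, not necessarily consecutive.
    return {"".join(c) for k in range(len(s) + 1) for c in combinations(s, k)}


print(sorted(proper(prefixes("banana"), "banana")))  # ['b', 'ba', 'ban', 'bana', 'banan']
```

The difference between a substring and a subsequence shows up on "abc": "ac" is a subsequence but not a substring.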
The product of two strings is their concatenation: for strings x and y, xy is the string formed by appending y to x.
Exponentiation is defined by $s^0 = \varepsilon$ and, for $i > 0$, $s^i = s^{i-1}s$. Since $\varepsilon s = s$, it follows that $s^1 = s$, $s^2 = ss$, and so on.
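The recursive definition of $s^i$ translates line for line into a function. This is an illustrative sketch (the name `power` is hypothetical); in practice Python's `s * i` computes the same string.

```python
def power(s, i):
    """Compute s^i from the definition: s^0 = epsilon, s^i = s^(i-1) s for i > 0."""
    if i == 0:
        return ""  # epsilon, the empty string
    return power(s, i - 1) + s


print(power("ab", 3))  # ababab
```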
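The most important operations on languages are union, concatenation, closure (Kleene closure), and positive closure.
For example, let L be the set of letters {A, B, ..., Z, a, b, ..., z} and let D be the set of digits {0, 1, ..., 9}. L and D can be regarded as languages whose strings all have length 1.
Applying the operators to L and D produces new languages: $L \cup D$ (letters and digits), $LD$ (a letter followed by a digit), $L^4$ (four-letter strings), $L^*$ (all strings of letters, including ε), $L(L \cup D)^*$ (strings of letters and digits that start with a letter), and $D^+$ (strings of one or more digits).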
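Because L* and D+ are infinite, a program can only list them up to some length. The sketch below implements union, concatenation, powers, closure, and positive closure on finite Python sets, truncating the closures at a caller-chosen `max_len`; all function names and the truncation are assumptions made for illustration.

```python
import string


def union(A, B):
    return A | B


def concat(A, B):
    # AB = { xy | x in A, y in B }
    return {x + y for x in A for y in B}


def lang_power(A, i):
    # A^0 = {epsilon}, A^i = A^(i-1) A
    result = {""}
    for _ in range(i):
        result = concat(result, A)
    return result


def closure(A, max_len):
    """A*, truncated to strings of length <= max_len."""
    result, frontier = {""}, {""}
    while frontier:
        # Extend the newest strings by one more factor from A; keep only unseen ones.
        frontier = {w for w in concat(frontier, A) if len(w) <= max_len} - result
        result |= frontier
    return result


def positive_closure(A, max_len):
    # A+ = A A*, truncated the same way.
    return {w for w in concat(A, closure(A, max_len)) if len(w) <= max_len}


L = set(string.ascii_letters)  # {A, ..., Z, a, ..., z}
D = set(string.digits)         # {0, ..., 9}
print(len(union(L, D)), len(concat(L, D)), len(lang_power(L, 2)))  # 62 520 2704
```

For example, `concat(L, closure(union(L, D), 1))` lists the 3,276 identifiers of length at most 2 from $L(L \cup D)^*$.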
Regular Expression
Regular expressions are built recursively from smaller regular expressions using the rules below. Each regular expression r denotes a language L(r), which is likewise defined recursively from the languages denoted by the subexpressions of r.
Induction basis
- ε is a regular expression, and L(ε) = {ε}.
- If a is a symbol in Σ, then a is a regular expression, and L(a) = {a}.
Induction step
Suppose r and s are regular expressions denoting the languages L(r) and L(s), respectively. Then:
- (r)|(s) is a regular expression denoting $L(r) \cup L(s)$.
- (r)(s) is a regular expression denoting $L(r)L(s)$.
- (r)* is a regular expression denoting $(L(r))^*$.
- (r) is a regular expression denoting L(r); extra parentheses around an expression do not change its language.
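The basis and induction rules map onto a small abstract syntax tree, with L(r) computed by structural recursion over it. In this sketch the node classes `Eps`, `Sym`, `Alt`, `Cat`, `Star` and the function `lang` are hypothetical names, and because a closure denotes an infinite language, `lang` returns only the strings of length at most `n`. Parenthesized expressions need no node of their own: grouping is expressed by the shape of the tree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Eps:   # epsilon
    pass


@dataclass(frozen=True)
class Sym:   # a single symbol a in Sigma
    a: str


@dataclass(frozen=True)
class Alt:   # (r)|(s)
    r: object
    s: object


@dataclass(frozen=True)
class Cat:   # (r)(s)
    r: object
    s: object


@dataclass(frozen=True)
class Star:  # (r)*
    r: object


def lang(e, n):
    """L(e) restricted to strings of length <= n, by structural recursion on e."""
    if isinstance(e, Eps):
        return {""}
    if isinstance(e, Sym):
        return {e.a} if len(e.a) <= n else set()
    if isinstance(e, Alt):
        return lang(e.r, n) | lang(e.s, n)
    if isinstance(e, Cat):
        return {x + y for x in lang(e.r, n) for y in lang(e.s, n) if len(x + y) <= n}
    if isinstance(e, Star):
        base, result, frontier = lang(e.r, n), {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base if len(x + y) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression node: {e!r}")


a, b = Sym("a"), Sym("b")
r = Cat(Cat(Cat(Star(Alt(a, b)), a), b), b)  # (a|b)*abb
print(sorted(lang(r, 4)))  # ['aabb', 'abb', 'babb']
```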
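Unnecessary parentheses can be dropped by adopting these conventions:
- The unary operator * has the highest precedence and is left-associative.
- Concatenation has the second-highest precedence and is left-associative.
- | has the lowest precedence and is left-associative.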
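The precedence and associativity rules are exactly what a recursive-descent parser encodes, with one function per precedence level. This sketch (the name `parenthesize` is hypothetical) reinserts every parenthesis the conventions allow us to drop; it assumes single-character symbols, no escape syntax, and no empty groups.

```python
def parenthesize(regex):
    """Make the implicit grouping of a regex explicit.

    One function per precedence level: * binds tightest, then concatenation,
    then |; every binary level is left-associative. Symbols are single
    characters other than '|', '*', '(' and ')'.
    """
    pos = 0

    def peek():
        return regex[pos] if pos < len(regex) else None

    def alternation():              # lowest precedence
        nonlocal pos
        left = concatenation()
        while peek() == "|":
            pos += 1
            left = f"({left}|{concatenation()})"
        return left

    def concatenation():            # middle precedence
        left = closure()
        while peek() is not None and peek() not in "|)":
            left = f"({left}{closure()})"
        return left

    def closure():                  # highest precedence, postfix *
        nonlocal pos
        operand = atom()
        while peek() == "*":
            pos += 1
            operand = f"({operand}*)"
        return operand

    def atom():
        nonlocal pos
        ch = peek()
        if ch == "(":
            pos += 1
            inner = alternation()
            if peek() != ")":
                raise SyntaxError("missing ')'")
            pos += 1
            return inner
        if ch is None or ch in "|)*":
            raise SyntaxError(f"unexpected {ch!r} at position {pos}")
        pos += 1
        return ch

    result = alternation()
    if pos != len(regex):
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")
    return result


print(parenthesize("a|bc*"))  # (a|(b(c*)))
```

Building each result as `(left op right)` inside a loop, rather than by recursing on the right operand, is what makes concatenation and | come out left-associative, e.g. "abc" becomes ((ab)c).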
A language that can be defined by a regular expression is called a regular set. If two regular expressions r and s denote the same language, they are equivalent, written r = s.
Some algebraic laws that hold for any regular expressions r, s, and t:
- r|s = s|r: | is commutative.
- r|(s|t) = (r|s)|t: | is associative.
- r(st) = (rs)t: concatenation is associative.
- r(s|t) = rs|rt and (s|t)r = sr|tr: concatenation distributes over |.
- εr = rε = r: ε is the identity for concatenation.
- r* = (r|ε)*: ε is always in a closure.
- r** = r*: * is idempotent.
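Equivalence r = s is a statement about languages, so a law can be spot-checked by comparing both sides on every short string. This sketch uses Python's standard `re` module, where `(?:...)` groups without capturing and an empty alternative such as `a|` stands in for ε; the helper name `equivalent_upto`, the alphabet {a, b}, and the length bound 6 are assumptions. A passing check is evidence, not a proof, since only finitely many strings are examined.

```python
import re
from itertools import product


def equivalent_upto(r, s, alphabet="ab", n=6):
    """Check that L(r) and L(s) agree on every string over `alphabet` of length <= n.

    This is a bounded test, not a proof of r = s.
    """
    pr, ps = re.compile(r), re.compile(s)
    for k in range(n + 1):
        for chars in product(alphabet, repeat=k):
            w = "".join(chars)
            if bool(pr.fullmatch(w)) != bool(ps.fullmatch(w)):
                return False
    return True


laws = [
    ("a|b", "b|a"),                   # r|s = s|r
    ("a(?:b|a)", "ab|aa"),            # r(s|t) = rs|rt
    ("(?:)a", "a"),                   # epsilon r = r; (?:) matches only the empty string
    ("(?:a)*", "(?:a|)*"),            # r* = (r|epsilon)*; the empty branch stands for epsilon
    ("(?:(?:ab)*)*", "(?:ab)*"),      # r** = r*; Python rejects a bare "a**"
]
print(all(equivalent_upto(r, s) for r, s in laws))  # True
print(equivalent_upto("ab", "ba"))                  # False
```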
If Σ is an alphabet of basic symbols, a regular definition is a sequence of definitions of the form $d_i \to r_i$ ($1 \le i \le n$), where:
- Each $d_i$ is a new symbol that is not in Σ and differs from every other $d_j$.
- Each $r_i$ is a regular expression over the alphabet $\Sigma \cup \{d_1, \ldots, d_{i-1}\}$. Because $r_i$ may refer only to earlier names, the definition cannot be recursive.
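A regular definition can be expanded into a single regular expression by substituting each name with its already expanded body, in order. This sketch writes references as `{name}` inside Python `re` patterns; that notation, the helper name `expand`, and the `letter`/`digit`/`id` example (which describes the identifier language $L(L \cup D)^*$ from earlier) are assumptions for illustration. Raising an error on any reference to a name not yet defined enforces the rule that $r_i$ may use only $d_1, \ldots, d_{i-1}$.

```python
import re


def expand(definitions):
    """Expand a regular definition d1 -> r1, ..., dn -> rn into plain regexes.

    A reference to an earlier name is written {name}; this notation is an
    assumption of the sketch, and it means bodies cannot use {m,n} quantifiers.
    """
    expanded = {}
    for name, body in definitions:
        if name in expanded:
            raise ValueError(f"{name} is defined twice")

        def substitute(match):
            ref = match.group(1)
            if ref not in expanded:  # only d1 .. d(i-1) may be used in ri
                raise ValueError(f"{name} refers to {ref}, which is not defined earlier")
            return f"(?:{expanded[ref]})"

        expanded[name] = re.sub(r"\{(\w+)\}", substitute, body)
    return expanded


definitions = [
    ("letter", "[A-Za-z]"),
    ("digit", "[0-9]"),
    ("id", "{letter}(?:{letter}|{digit})*"),
]
id_pattern = re.compile(expand(definitions)["id"])
print(bool(id_pattern.fullmatch("x25")), bool(id_pattern.fullmatch("2x")))  # True False
```

A self-referential definition such as a → {a}b is rejected, because a is not yet defined when its own body is expanded.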
Grammar, language, and regular expression