Introduction: I have recently been studying compiler principles. I started with textbooks, but after a while I found them too complicated: the authors seem unable to show their expertise without losing the reader, and even simple concepts were explained in a way I could not follow, which was discouraging. Later I found some videos online and realized the material is not that difficult.
This article briefly introduces the basics of grammars:
Basic formal language knowledge
- Alphabet: a non-empty finite set of symbols.
- String: a finite sequence of symbols from an alphabet is called a string over that alphabet.
- String length: |s|, that is, the number of symbols in the string s.
- Prefix: a string obtained by deleting zero or more symbols from the end of s.
- Suffix: a string obtained by deleting zero or more symbols from the beginning of s.
- Substring: a string obtained by deleting a prefix and a suffix of s.
- Subsequence: a string obtained by deleting zero or more symbols from s (the deleted symbols need not be contiguous).
- Language: any set of strings over a given alphabet.
- Terminal symbol: an element that cannot be broken down further. Terminals are usually written as lowercase letters.
- Non-terminal symbol: an element that can be broken down further. Non-terminals are usually written as uppercase letters.
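As a rough illustration of the string definitions above, here is a minimal Python sketch. It assumes strings are ordinary Python `str` values with one character per symbol; the function names are made up for this example.

```python
def prefixes(s):
    """All prefixes of s: delete zero or more symbols from the end."""
    return [s[:i] for i in range(len(s) + 1)]


def suffixes(s):
    """All suffixes of s: delete zero or more symbols from the beginning."""
    return [s[i:] for i in range(len(s) + 1)]


def substrings(s):
    """All substrings of s: delete some prefix and some suffix."""
    n = len(s)
    return {s[i:j] for i in range(n + 1) for j in range(i, n + 1)}


def is_subsequence(t, s):
    """True if t can be obtained from s by deleting zero or more symbols."""
    remaining = iter(s)
    # Each symbol of t must be found, in order, in what is left of s.
    return all(symbol in remaining for symbol in t)


print(len("abc"))                 # string length |s|
print(prefixes("abc"))
print(suffixes("abc"))
print(sorted(substrings("abc")))
print(is_subsequence("ac", "abc"))
```

Note that "ac" is a subsequence of "abc" but not a substring, because the deleted symbol "b" sits in the middle.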
Formal language operations
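- Union of languages L and M: L ∪ M = {s | s ∈ L or s ∈ M}
- Concatenation of languages L and M: LM = {st | s ∈ L, t ∈ M}
- Kleene closure of language L: L* = L^0 ∪ L^1 ∪ L^2 ∪ ..., where L^0 = {ε} (ε is the empty string) and L^i = L^(i-1) L
- Positive closure of language L: L+ = L^1 ∪ L^2 ∪ L^3 ∪ ...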
The Kleene closure L* is the set of strings formed by concatenating zero or more strings from L, and the positive closure L+ is the set formed by concatenating one or more strings from L. For example:
L1 = {a, b, ..., y, z}, M1 = {1, 2, ..., 8, 9}, L1 ∪ M1 = {a, b, ..., y, z, 1, 2, ..., 8, 9}
(L1 ∪ M1)* = {ε, a, b, ..., y, z, 1, 2, ..., 8, 9, aa, 1a, ..., xyz, 6789st, ...}
L1(L1 ∪ M1)* = {all strings of letters and digits that begin with a letter}
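The language operations above can be tried out on small finite sets. The following is only a sketch: a language is modelled as a Python `set` of strings, and because the real closures are infinite, `kleene` and `positive` stop at a hypothetical bound `max_n` on the number of concatenated strings.

```python
import string


def union(L, M):
    """L ∪ M: strings that are in L or in M."""
    return L | M


def concat(L, M):
    """LM: every string of L followed by every string of M."""
    return {s + t for s in L for t in M}


def power(L, n):
    """L^n, with L^0 = {""} (the set holding only the empty string)."""
    result = {""}
    for _ in range(n):
        result = concat(result, L)
    return result


def kleene(L, max_n):
    """Finite approximation of L*: L^0 ∪ L^1 ∪ ... ∪ L^max_n."""
    out = set()
    for n in range(max_n + 1):
        out |= power(L, n)
    return out


def positive(L, max_n):
    """Finite approximation of L+: L^1 ∪ ... ∪ L^max_n."""
    out = set()
    for n in range(1, max_n + 1):
        out |= power(L, n)
    return out


L1 = set(string.ascii_lowercase)   # {a, b, ..., z}
M1 = set("123456789")              # {1, 2, ..., 9}

# L1(L1 ∪ M1)*, limited here to at most 2 symbols after the first letter.
identifiers = concat(L1, kleene(union(L1, M1), 2))
print("x9" in identifiers)   # starts with a letter
print("9x" in identifiers)   # starts with a digit
```

The only difference between the two closures is whether L^0 is included, so the empty string is always in L* and is in L+ only when L itself contains it.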
With the groundwork in place, let's talk about what a grammar is for. A grammar is a formal tool whose role is to describe a language generatively: every sentence in the language can be constructed using strictly defined rules.
In my understanding, every language has its own rules. Take Chinese, where the most common rule is: subject + predicate + object. Each programming language also has its own syntax rules, and a grammar is the set of rules that constrains the sequences of words allowed in source code.
Grammar
A grammar is a 4-tuple: G = (Vt, Vn, S, P)
Vt: a non-empty finite set of symbols. Each element of Vt is a terminal symbol, such as a, b, c.
Vn: a non-empty finite set of symbols. Each element of Vn is a non-terminal symbol, such as A, B, C, D.
S: a non-terminal symbol (S ∈ Vn). It is the start symbol of grammar G.
P: a non-empty finite set. Its elements are called productions (rules).
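One possible way to write the four components down as a data structure is sketched below. The class name, field names, and the example grammar (which generates ab, aabb, aaabbb, ...) are all assumptions made for illustration, with each symbol represented by a single character.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset      # Vt
    nonterminals: frozenset   # Vn
    start: str                # S, must be an element of Vn
    productions: tuple        # P, each rule is a (left, right) pair of strings


# Hypothetical example grammar with two rules: S → aSb and S → ab
G = Grammar(
    terminals=frozenset("ab"),
    nonterminals=frozenset("S"),
    start="S",
    productions=(("S", "aSb"), ("S", "ab")),
)

print(G.start in G.nonterminals)
print(G.productions)
```

Frozen sets and a tuple are used so that the grammar cannot be modified by accident after it has been defined; a plain tuple of four values would work just as well.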
Grammar Constraints
Vn and Vt have no elements in common, that is, Vn ∩ Vt = ∅ (uppercase and lowercase letters do not overlap).
V denotes Vn ∪ Vt and is called the alphabet of grammar G.
A rule, also called a production, has the form α → β. α is called the left side of the rule and β is called the right side. The start symbol S must appear on the left side of at least one rule.
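The constraints above can be checked mechanically. This is a minimal sketch rather than a full validator: `check_grammar` and `derive_step` are hypothetical helper names, each symbol is assumed to be a single character, and a rule α → β is stored as the pair `(alpha, beta)`.

```python
def check_grammar(vt, vn, start, productions):
    """Return a list of violated constraints (empty if the grammar is fine)."""
    problems = []
    if not vt or not vn or not productions:
        problems.append("Vt, Vn and P must be non-empty")
    if vt & vn:
        problems.append("Vn and Vt must not share symbols")
    if start not in vn:
        problems.append("the start symbol must be a non-terminal")
    v = vt | vn  # V = Vn ∪ Vt, the alphabet of the grammar
    for left, right in productions:
        if not left or any(symbol not in v for symbol in left + right):
            problems.append(f"rule {left} -> {right} uses symbols outside V")
    if not any(start in left for left, _ in productions):
        problems.append("the start symbol must appear on the left of a rule")
    return problems


def derive_step(sentential_form, left, right):
    """Apply the rule left -> right to the leftmost occurrence of left."""
    if left not in sentential_form:
        raise ValueError("rule does not apply")
    return sentential_form.replace(left, right, 1)


rules = [("S", "aSb"), ("S", "ab")]
print(check_grammar(set("ab"), set("S"), "S", rules))

# Build a sentence from the start symbol: S => aSb => aabb
step1 = derive_step("S", "S", "aSb")
step2 = derive_step(step1, "S", "ab")
print(step1, step2)
```

The two `derive_step` calls show the generative idea in miniature: starting from S and replacing a left side with its right side until only terminals remain.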
Related article: Compilation principles: grammar classification and derivation trees