Three basic elements of a high-level programming language:

Syntax: The set of rules that describes how the language is composed (including lexical rules and grammatical rules)

Semantics: A description of the meaning of each grammatical component

Pragmatics: A description of how each grammatical component is used

Formal language theory is the mathematical study of how languages are generated, their general properties, and their rules; it applies to natural languages (such as English) as well as artificial languages (such as programming languages). A formal language is a mathematical model of such a language: it is written with mathematical symbols and built according to strict grammatical rules. In the broad sense, a formal language is a set of strings of symbols taken from an alphabet. Just as a natural language has grammatical rules, a formal language is generated by a formal grammar. A formal grammar has a finite set of variables, also called nonterminals or syntactic categories. Each variable represents a language, and these languages are defined recursively in terms of one another and of primitive symbols called terminals. The rules relating the variables are called productions, and the productions determine how the language is constructed. A typical production says that the language represented by a given variable contains the strings obtained by concatenating strings from the languages of certain other variables, possibly together with some terminals.

PS: Formal language theory is an important theoretical foundation of compiler principles. It mainly studies the mathematical mechanisms of languages and grammars and their classification. The grammar is one of the most important basic concepts of formal languages.

1, Alphabet and symbols: An alphabet is a non-empty finite set of elements. For example, the set of the 26 English letters is an alphabet. An alphabet is usually denoted Σ. The elements of an alphabet are called symbols. For example, the elements a, b, c of the 26-letter English alphabet are symbols.

2, Symbol strings and their operations

A, Symbol string: A finite sequence of symbols is called a symbol string. Any finite sequence of symbols from the alphabet is a symbol string over that alphabet. The string that contains no symbols is called the empty symbol string and is denoted by the Greek letter ε.

B, Length of a symbol string: The number of symbols in a symbol string is called its length. If x is a symbol string, its length is written |x|; for example, if x = string, then |x| = 6. In particular, |ε| = 0: the length of the empty symbol string is 0.

C, Concatenation of symbol strings: Let x and y be two symbol strings; then xy is called the concatenation of x and y. In particular, εa = aε = a for any symbol string a.

D, Product of sets of symbol strings: Let A and B be two sets of symbol strings. AB denotes the product of A and B, defined as AB = {xy | x ∈ A, y ∈ B}. In particular, {ε}A = A{ε} = A and φA = Aφ = φ, where φ is the empty set.

E, ε ∉ φ: the empty symbol string does not belong to the empty set.

F, Power of a symbol string: The concatenation of a symbol string with itself can be written as a power. Let x be a symbol string; then define

x^0 = ε

x^1 = x

x^2 = xx

x^3 = x^2 x = xxx

G, Power of a set of symbol strings: The product of a set of symbol strings with itself can also be written as a power. Let A be a set of symbol strings; then define

A^0 = {ε}

A^1 = A

A^2 = AA

A^3 = A^2 A = AAA

H, Positive closure of a set of symbol strings: The positive closure of a set of symbol strings A is written A+ and defined as A+ = A^1 ∪ A^2 ∪ A^3 ∪ ... ∪ A^n ∪ ... That is, A+ is the set of all strings formed by concatenating one or more strings from A.

I, Reflexive closure (star closure) of a set of symbol strings: The reflexive closure of a set of symbol strings A is written A* and defined as A* = {ε} ∪ A+ = A+ ∪ {ε}. The difference between the positive closure and the star closure of A is the empty symbol string ε: the star closure always contains it, whereas the positive closure contains it only if A itself does.
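The set operations in items D, G, H and I can be tried out on small finite sets. The following is only a sketch in Python: the function names are made up for illustration, the empty Python string "" stands in for ε, and because A+ and A* are infinite in general, the closures are cut off at an assumed bound max_power.

```python
from itertools import product as cartesian


def concat_sets(a, b):
    """Product AB = {xy | x in A, y in B} of two sets of strings."""
    return {x + y for x, y in cartesian(a, b)}


def power(a, n):
    """A^n, with A^0 = {""}; the empty string plays the role of ε."""
    result = {""}
    for _ in range(n):
        result = concat_sets(result, a)
    return result


def positive_closure(a, max_power):
    """A^1 ∪ ... ∪ A^max_power: a finite approximation of A+."""
    out = set()
    for n in range(1, max_power + 1):
        out |= power(a, n)
    return out


def star_closure(a, max_power):
    """{ε} ∪ (approximation of A+): a finite approximation of A*."""
    return {""} | positive_closure(a, max_power)


A = {"a", "b"}
B = {"c"}
print(sorted(concat_sets(A, B)))        # ['ac', 'bc']
print(sorted(power(A, 2)))              # ['aa', 'ab', 'ba', 'bb']
print(concat_sets(set(), A) == set())   # True, since φA = φ
print(concat_sets({""}, A) == A)        # True, since {ε}A = A
print("" in star_closure(A, 3), "" in positive_closure(A, 3))  # True False
```

The last line shows the difference between the two closures for a set A that does not itself contain ε.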

3, Grammar

Grammars are the foundation of compiler principles: they are the means of describing a programming language and of implementing its compiler. A grammar is a set of formal rules that defines the syntactic structure of a language; the grammar of a programming language describes all of its constructs with a finite number of rules. A grammar can be defined as a four-tuple G = (Vn, Vt, P, S), where Vn is a non-empty finite set of nonterminal symbols, Vt is a non-empty finite set of terminal symbols, P is a set of productions, and S ∈ Vn is the recognition symbol (also called the start symbol). Starting from the start symbol, nonterminals are repeatedly replaced and expanded using the productions, and in this way the sentences of the language are derived. Because a grammar is defined in terms of its productions, productions have to be defined first.

Vn: A non-empty finite set of nonterminal symbols. A nonterminal can be decomposed further, that is, rewritten by a production. By convention, when symbols are written as letters, uppercase letters denote nonterminals and lowercase letters denote terminals. When symbols are written as words (as in Chinese-language notation), names enclosed in "<>" are nonterminals and names without "<>" are terminals.

Vt: A non-empty finite set of terminal symbols. A terminal cannot be decomposed further. Under the same conventions, lowercase letters and names without "<>" denote terminals.

P: A non-empty finite set of productions. A production, also called a rewrite rule, states that one symbol string may be replaced by another: the string on its left side may be replaced by the string on its right side. A production is written with "::=" or "→" (read "is defined as"), that is, α → β, where V = Vn ∪ Vt, Vn ∩ Vt = φ, α ∈ V+ and β ∈ V*.

S: A nonterminal called the recognition symbol or start symbol. It must appear as the left side of at least one production.
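To make the four-tuple concrete, here is one possible way to hold G = (Vn, Vt, P, S) in Python and check the conditions listed above. This is only a sketch: the class name Grammar, its field names and the validate method are invented for illustration, and every symbol is assumed to be a single character so that a symbol string can be an ordinary Python string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """G = (Vn, Vt, P, S); every symbol is assumed to be one character."""
    nonterminals: frozenset   # Vn
    terminals: frozenset      # Vt
    productions: tuple        # P, as (alpha, beta) pairs of strings
    start: str                # S

    def validate(self):
        """Return a list of violated conditions (empty if G is well formed)."""
        problems = []
        vocab = self.nonterminals | self.terminals  # V = Vn ∪ Vt
        if not self.nonterminals or not self.terminals or not self.productions:
            problems.append("Vn, Vt and P must be non-empty")
        if self.nonterminals & self.terminals:
            problems.append("Vn and Vt must be disjoint")
        if self.start not in self.nonterminals:
            problems.append("S must belong to Vn")
        for alpha, beta in self.productions:
            if not alpha or not set(alpha) <= vocab:
                problems.append(f"left side {alpha!r} is not in V+")
            if not set(beta) <= vocab:
                problems.append(f"right side {beta!r} is not in V*")
        if not any(alpha == self.start for alpha, _ in self.productions):
            problems.append("S must be the left side of some production")
        return problems


g = Grammar(
    nonterminals=frozenset("SA"),
    terminals=frozenset("ab"),
    productions=(("S", "aA"), ("A", "b")),
    start="S",
)
print(g.validate())  # []
```

Returning a list of problems rather than raising on the first one makes it easier to see everything that is wrong with a hand-written grammar at once.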

4, Grammar classification (grammars are divided into four types: type 0, type 1, type 2 and type 3)

1, Type 0 grammar (phrase-structure grammar): In a type 0 grammar, the left side α and the right side β of every production α → β are symbol strings with no further restriction, except that the left side must contain at least one nonterminal. Imposing additional restrictions on the productions of a type 0 grammar yields the other three types.

2, Type 1 grammar (context-sensitive grammar): The left side of every production may contain one, two, or more symbols but must contain at least one nonterminal, and the right side must be at least as long as the left side (|β| ≥ |α|). The production S → ε is permitted as a special case.

3, Type 2 grammar (context-free grammar): The left side of every production is a single nonterminal, and the right side is any string of terminals and nonterminals (A → β).

4, Type 3 grammar (regular grammar, here in right-linear form): The left side of every production is a single nonterminal, and the right side is either a single terminal or a single terminal followed by a single nonterminal (A → a or A → aB). The right side therefore has length at most 2.
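The four definitions translate fairly directly into a check on the productions. The sketch below is one way this might look in Python; the function name grammar_type is hypothetical, each character is assumed to be one symbol, and any character outside the given nonterminal set is treated as a terminal. It is a simplification: the S → ε special case of type 1 is not handled, and falling through to type 0 does not verify that the left side contains a nonterminal.

```python
def grammar_type(nonterminals, productions):
    """Return the most restricted type (3, 2, 1 or 0) that fits.

    productions is a list of (alpha, beta) string pairs; each character
    is one symbol, and characters not in nonterminals are terminals.
    """
    def is_type3(alpha, beta):
        # A -> a  or  A -> aB
        if len(alpha) != 1 or alpha not in nonterminals:
            return False
        if len(beta) == 1:
            return beta not in nonterminals
        return (len(beta) == 2 and beta[0] not in nonterminals
                and beta[1] in nonterminals)

    def is_type2(alpha, beta):
        # A -> any string over V
        return len(alpha) == 1 and alpha in nonterminals

    def is_type1(alpha, beta):
        # at least one nonterminal on the left, and |alpha| <= |beta|
        return (any(s in nonterminals for s in alpha)
                and len(alpha) <= len(beta))

    # Try the most restricted type first, then relax step by step.
    for level, check in ((3, is_type3), (2, is_type2), (1, is_type1)):
        if all(check(a, b) for a, b in productions):
            return level
    return 0  # simplification: no further check is made here


print(grammar_type({"S", "A"}, [("S", "aA"), ("A", "b")]))    # 3
print(grammar_type({"S"}, [("S", "aSb"), ("S", "ab")]))       # 2
print(grammar_type({"S", "A"}, [("S", "aA"), ("aA", "ab")]))  # 1
print(grammar_type({"S", "A"}, [("S", "aA"), ("aA", "a")]))   # 0
```

Checking from type 3 downward means the function reports the most restricted type whose conditions every production satisfies.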

For example:

Grammar G = ({A, B, T, S}, {x, y, z}, P, S), where P = {S → xTB | xB, T → xTA | xA, B → yz, Ay → yA, Az → yzz}.

Explanation: To determine the type of grammar G, start with the most restricted type, type 3, and work downward: if G is not type 3, check whether it is type 2; if it is not type 2, check whether it is type 1. In G, two productions in P have a left side that is not a single nonterminal (Ay → yA and Az → yzz), so G is neither a type 3 nor a type 2 grammar (both types require the left side of every production in P to be a single nonterminal). Now check type 1: the left side of every production in P consists of one or two symbols and contains a nonterminal, and the right side of every production is at least as long as its left side. Therefore G is a type 1 grammar.
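To get a feel for what this grammar generates, its sentences can be enumerated by applying the productions starting from S. The following Python sketch does a breadth-first search over sentential forms; the function name sentences and the length bound max_len are assumptions made for illustration. The search relies on the type 1 property established above: no production shortens a string, so any form longer than the bound can safely be dropped.

```python
from collections import deque

NONTERMINALS = set("ABTS")
PRODUCTIONS = [
    ("S", "xTB"), ("S", "xB"),
    ("T", "xTA"), ("T", "xA"),
    ("B", "yz"),
    ("Ay", "yA"), ("Az", "yzz"),
]


def sentences(start, productions, nonterminals, max_len):
    """Breadth-first search over sentential forms of length <= max_len.

    Because no production of G shortens a string, any form longer than
    max_len can be discarded, so the search always terminates.
    """
    seen = {start}
    queue = deque([start])
    found = set()
    while queue:
        form = queue.popleft()
        if not any(s in nonterminals for s in form):
            found.add(form)       # only terminals left: a sentence
            continue
        for alpha, beta in productions:
            pos = form.find(alpha)
            while pos != -1:      # rewrite every occurrence of alpha
                new = form[:pos] + beta + form[pos + len(alpha):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                pos = form.find(alpha, pos + 1)
    return sorted(found, key=len)


print(sentences("S", PRODUCTIONS, NONTERMINALS, 9))
# ['xyz', 'xxyyzz', 'xxxyyyzzz']
```

One derivation of the second sentence is S → xTB → xxAB → xxAyz → xxyAz → xxyyzz, which uses both of the productions whose left side is longer than a single nonterminal.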

The relationship among the four grammar types is shown in the figure:

[Figure: 1.jpg, http://s2.51cto.com/wyfs02/M01/88/5F/wKiom1fzJ73xxN3cAABvqYd4O6g088.jpg]

PS: The definitions of the four grammar types impose progressively stronger restrictions. So every regular grammar is context-free, every context-free grammar is context-sensitive, and every context-sensitive grammar is a type 0 grammar.

This article is from the blog "Luo Chen's blog"; please do not reproduce it!

Compiler principles: grammar classification of formal languages