Grammar, language, and regular expression

Source: Internet
Author: User
Formal definition and classification of grammars

A context-free grammar consists of four components:

  • A set of terminals, also known as lexical units (tokens).
  • A set of nonterminals, also known as syntactic variables.
  • A set of productions.
  • A start symbol.

A grammar G can be written as a four-tuple: G = (V_N, V_T, P, S)

V_N is the set of nonterminals, V_T is the set of terminals, P is the set of productions, and S is the start symbol.

The language generated by G is L(G) = {w | w ∈ V_T* and S ⇒+ w}, that is:

  • The string w is derived from the start symbol S in one or more steps.
  • w consists of terminals only.
  • Each such w is called a sentence of the language.
  • L(G) is the set of all such sentences.
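To make the definition of L(G) concrete, here is a minimal Python sketch. The grammar (S -> aSb | ab) is a hypothetical example that does not come from the text, and the function `sentences` is an illustrative helper. It assumes single-character symbols and no empty right-hand sides, so sentential forms never shrink and long ones can be pruned.

```python
from collections import deque

# Hypothetical example grammar G = (V_N, V_T, P, S) for a^n b^n, n >= 1.
V_N = {"S"}
V_T = {"a", "b"}
P = {"S": ["aSb", "ab"]}   # productions S -> aSb | ab
S = "S"

def sentences(productions, start, nonterminals, max_len):
    """Return all sentences of length <= max_len, found by breadth-first
    leftmost derivation from the start symbol.

    Assumes no production has an empty right-hand side.
    """
    found = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Index of the leftmost nonterminal, or None if there is none.
        pos = next((i for i, ch in enumerate(form) if ch in nonterminals), None)
        if pos is None:
            found.add(form)   # terminals only: this form is a sentence
            continue
        for rhs in productions[form[pos]]:
            new_form = form[:pos] + rhs + form[pos + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(found, key=lambda w: (len(w), w))

print(sentences(P, S, V_N, 6))   # ['ab', 'aabb', 'aaabbb']
```

Every string printed consists of terminals only and is reachable from S, which is exactly the membership condition for L(G).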

Conventions:

  • Uppercase letters A to Z denote nonterminals; alternatively, a nonterminal is enclosed in angle brackets.
  • Lowercase letters near the start of the alphabet (a, b, c) denote single terminals.
  • The lowercase letters u, v, w, x, y, z and the Greek letters α, β, γ denote strings of symbols over (V_T ∪ V_N).

Grammar classification:

  1. Type 0 grammar: unrestricted grammar (phrase-structure grammar). Productions have the form α -> β, where α contains at least one nonterminal.
  2. Type 1 grammar: context-sensitive grammar. Each production α -> β satisfies |α| <= |β|. For a production α1 A α2 -> α1 β α2, A can be replaced by β only when it appears in the context α1 ... α2.
  3. Type 2 grammar: context-free grammar. Productions have the form A -> β, where A is a single nonterminal. Replacing A by β does not depend on the context in which A appears.
  4. Type 3 grammar: regular grammar. Each production has the form A -> aB or A -> a.
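The production forms above can be checked mechanically. The following Python sketch is an illustration only: the function names are hypothetical, symbols are assumed to be single characters with uppercase letters as nonterminals, and empty right-hand sides get no special treatment (A -> ε is simply reported as type 2).

```python
def is_nonterminal(symbol):
    # Convention from the text: uppercase letters are nonterminals.
    return symbol.isupper()

def production_type(lhs, rhs):
    """Return the largest type number (3, 2, 1, or 0) whose form lhs -> rhs fits."""
    if not any(is_nonterminal(ch) for ch in lhs):
        raise ValueError("left-hand side needs at least one nonterminal")
    if len(lhs) == 1:
        # Type 3: A -> a or A -> aB
        if len(rhs) == 1 and not is_nonterminal(rhs):
            return 3
        if len(rhs) == 2 and not is_nonterminal(rhs[0]) and is_nonterminal(rhs[1]):
            return 3
        # Type 2: a single nonterminal on the left-hand side
        return 2
    # Type 1: the right-hand side is at least as long as the left-hand side
    if len(lhs) <= len(rhs):
        return 1
    return 0

def grammar_type(productions):
    """A grammar is only as restricted as its least restricted production."""
    return min(production_type(lhs, rhs) for lhs, rhs in productions)

print(grammar_type([("S", "aA"), ("A", "b")]))       # 3
print(grammar_type([("S", "aSb"), ("S", "ab")]))     # 2
print(grammar_type([("S", "aSBc"), ("cB", "Bc")]))   # 1
print(grammar_type([("AB", "a")]))                   # 0
```

The check is purely syntactic: it classifies the grammar as written, not the language, so a regular language described by a context-free grammar is still reported as type 2.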

The four grammar types are recognized, respectively, by four kinds of automata:

  1. Turing Machine
  2. Linear Bounded Automaton
  3. Pushdown Automaton
  4. Finite Automaton
String and language operations

An alphabet is a finite collection of symbols.

A string over an alphabet is a finite sequence of symbols drawn from that alphabet.

A language is any set of strings over a given alphabet.

The basic terms for parts of a string are prefix, suffix, substring, proper prefix, proper suffix, proper substring, and subsequence.

The concatenation of two strings can be regarded as their product.

This lets us define exponentiation: s^0 = ε, and for i > 0, s^i = s^(i-1) s. Because εs = s, we get s^1 = s, s^2 = ss, and so on.
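The recursive definition of string exponentiation translates directly into code. This is a small Python sketch; the function name `power` is an illustrative choice, and the empty Python string stands in for ε.

```python
def power(s, i):
    """s^0 = ε (the empty string); s^i = s^(i-1) s for i > 0."""
    if i == 0:
        return ""
    return power(s, i - 1) + s

print(repr(power("ab", 0)))   # ''
print(power("ab", 1))         # ab
print(power("ab", 3))         # ababab
```

In everyday Python the same result is written `s * i`; the recursive form is shown only because it mirrors the definition.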

The most important operations on languages are union, concatenation, closure, and positive closure.

For example, let L be the set of letters {A, B, ..., Z, a, b, ..., z} and let D be the set of digits {0, 1, ..., 9}. L and D can be regarded as languages in which every string has length 1.

New languages can be built from L and D with these operators: L ∪ D, LD, L^4, L*, L(L ∪ D)*, and D+.
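As a rough illustration, the operations can be tried out on Python sets of strings. The helper names below are hypothetical. Because L* and D+ are infinite, the closures are truncated at a maximum power (`max_power`, an assumption of this sketch), so they only approximate the real languages.

```python
import string

L = set(string.ascii_letters)   # {A, ..., Z, a, ..., z}
D = set(string.digits)          # {0, ..., 9}

def concat(X, Y):
    """Concatenation XY: every string of X followed by every string of Y."""
    return {x + y for x in X for y in Y}

def lang_power(X, n):
    """X^n, with X^0 = {ε}."""
    result = {""}
    for _ in range(n):
        result = concat(result, X)
    return result

def closure(X, max_power):
    """Truncated X*: the union of X^0, X^1, ..., X^max_power."""
    result = set()
    for n in range(max_power + 1):
        result |= lang_power(X, n)
    return result

def positive_closure(X, max_power):
    """Truncated X+: the union of X^1, ..., X^max_power."""
    result = set()
    for n in range(1, max_power + 1):
        result |= lang_power(X, n)
    return result

print(len(L | D))                  # 62 letters and digits
print(len(concat(L, D)))           # 520 strings: a letter followed by a digit
print(len(lang_power(L, 2)))       # 2704 two-letter strings
identifiers = concat(L, closure(L | D, 1))   # L(L ∪ D)*, truncated to length 2
print("x9" in identifiers, "9x" in identifiers)   # True False
print(len(positive_closure(D, 2)))   # 110 digit strings of length 1 or 2
```

L^4 is left out of the printed examples only because it already contains 52^4 strings; `lang_power(L, 4)` would compute it.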

Regular Expression

Regular expressions are built recursively from smaller regular expressions according to the rules below. Each regular expression r denotes a language L(r), which is also defined recursively from the languages denoted by the subexpressions of r.

Basis

  1. ε is a regular expression, and L(ε) = {ε}.
  2. If a is a symbol in Σ, then a is a regular expression, and L(a) = {a}.

Induction

Assume that r and s are regular expressions denoting the languages L(r) and L(s), respectively. Then:

  1. (r)|(s) is a regular expression denoting the language L(r) ∪ L(s).
  2. (r)(s) is a regular expression denoting the language L(r)L(s).
  3. (r)* is a regular expression denoting the language (L(r))*.
  4. (r) is a regular expression denoting the language L(r).
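The basis and induction rules can be read as a recursive function from expressions to languages. The Python sketch below assumes a hypothetical tuple encoding of regular expressions (`("eps",)`, `("sym", a)`, `("alt", r, s)`, `("cat", r, s)`, `("star", r)`) and returns only the strings of L(r) up to a chosen length, since (L(r))* is generally infinite.

```python
def lang(r, max_len):
    """Return the strings of L(r) whose length is at most max_len."""
    kind = r[0]
    if kind == "eps":                       # L(ε) = {ε}
        return {""}
    if kind == "sym":                       # L(a) = {a}
        return {r[1]} if max_len >= 1 else set()
    if kind == "alt":                       # L(r) ∪ L(s)
        return lang(r[1], max_len) | lang(r[2], max_len)
    if kind == "cat":                       # L(r)L(s)
        left, right = lang(r[1], max_len), lang(r[2], max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if kind == "star":                      # (L(r))*
        base = lang(r[1], max_len)
        result = {""}
        frontier = {""}
        while frontier:
            # Extend the newest strings by one more factor; keep only new ones.
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node kind: {kind}")

a, b = ("sym", "a"), ("sym", "b")
r = ("cat", ("star", ("alt", a, b)), a)     # (a|b)*a
print(sorted(lang(r, 2)))                   # ['a', 'aa', 'ba']
```

Rule 4, the parenthesized expression (r), needs no case of its own here because the tuple nesting already records the grouping.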

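Unnecessary parentheses can be dropped by adopting the following conventions:

  1. The unary operator * has the highest precedence and is left associative.
  2. Concatenation has the second-highest precedence and is left associative.
  3. | has the lowest precedence and is left associative.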
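Python's `re` module uses the same relative precedence for these three operators, so the conventions can be checked experimentally. The sketch below compares a|b*c with its fully parenthesized form on all strings up to a small length; the helper `matched_strings` and the length bound are assumptions of this example.

```python
import re
from itertools import product

def matched_strings(pattern, alphabet, max_len):
    """Return every string over alphabet, up to max_len, that pattern matches in full."""
    compiled = re.compile(pattern)
    return {
        "".join(symbols)
        for n in range(max_len + 1)
        for symbols in product(alphabet, repeat=n)
        if compiled.fullmatch("".join(symbols))
    }

plain = matched_strings("a|b*c", "abc", 3)
grouped = matched_strings("(a)|((b)*(c))", "abc", 3)
other = matched_strings("(a|b)*c", "abc", 3)

print(sorted(plain))      # ['a', 'bbc', 'bc', 'c']
print(plain == grouped)   # True: * binds tighter than concatenation, then |
print(plain == other)     # False: a different grouping gives a different language
```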

A language that can be defined by a regular expression is called a regular set. If two regular expressions r and s denote the same language, they are equivalent, written r = s.

Some algebraic laws that hold for arbitrary regular expressions r, s, and t:

  • r|s = s|r: | is commutative.
  • r|(s|t) = (r|s)|t: | is associative.
  • r(st) = (rs)t: concatenation is associative.
  • r(s|t) = rs|rt; (s|t)r = sr|tr: concatenation distributes over |.
  • εr = rε = r: ε is the identity for concatenation.
  • r* = (r|ε)*: ε is guaranteed to be in a closure.
  • r** = r*: * is idempotent.
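These laws can be spot-checked with Python's `re` module by comparing the two sides of each law on every string up to a small length. This is a sanity check for one hypothetical choice of r, s, and t, not a proof; the helper `language`, the alphabet, and the length bound are all assumptions of the sketch. The empty group `(?:)` stands in for ε.

```python
import re
from itertools import product

def language(pattern, alphabet="ab", max_len=4):
    """Return the strings over alphabet, up to max_len, that pattern matches in full."""
    compiled = re.compile(pattern)
    return {
        "".join(symbols)
        for n in range(max_len + 1)
        for symbols in product(alphabet, repeat=n)
        if compiled.fullmatch("".join(symbols))
    }

# Hypothetical sample expressions, wrapped in groups so they can be combined safely.
r, s, t = "(?:ab)", "(?:b*)", "(?:a|ba)"

laws = {
    "| is commutative": (f"{r}|{s}", f"{s}|{r}"),
    "| is associative": (f"{r}|(?:{s}|{t})", f"(?:{r}|{s})|{t}"),
    "concatenation is associative": (f"{r}(?:{s}{t})", f"(?:{r}{s}){t}"),
    "left distribution": (f"{r}(?:{s}|{t})", f"{r}{s}|{r}{t}"),
    "right distribution": (f"(?:{s}|{t}){r}", f"{s}{r}|{t}{r}"),
    "ε is a left identity": (f"(?:){r}", r),
    "ε is a right identity": (f"{r}(?:)", r),
    "closure contains ε": (f"{r}*", f"(?:{r}|)*"),
    "* is idempotent": (f"(?:{r}*)*", f"{r}*"),
}

for name, (lhs, rhs) in laws.items():
    print(name, language(lhs) == language(rhs))
```

Concatenation is not commutative, so a pair such as rs and sr would fail the same comparison.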

If Σ is an alphabet of basic symbols, a regular definition is a sequence of definitions of the form d_i = r_i (1 <= i <= n), where:

  • Each d_i is a new symbol that is not in Σ and is different from all the other d's.
  • Each r_i is a regular expression over the alphabet Σ ∪ {d_1, ..., d_(i-1)}. Restricting r_i to earlier names avoids recursive definitions.
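A regular definition can be expanded into ordinary patterns by substituting earlier names in order. The Python sketch below uses a hypothetical definition of identifiers (letter, digit, id) written in Python's `re` syntax, with `{name}` marking a reference to an earlier definition. That placeholder syntax is an assumption of this example and would clash with `re` quantifiers such as `a{2}`.

```python
import re

# Hypothetical regular definition:
#   letter = [A-Za-z]
#   digit  = [0-9]
#   id     = letter (letter | digit)*
definitions = [
    ("letter", "[A-Za-z]"),
    ("digit", "[0-9]"),
    ("id", "{letter}({letter}|{digit})*"),
]

def expand(definitions):
    """Substitute each d_i into later bodies, in order.

    Only names defined earlier are visible, so a forward reference or a
    self reference raises KeyError, mirroring the rule that r_i may use
    only d_1, ..., d_(i-1).
    """
    expanded = {}
    for name, body in definitions:
        expanded[name] = "(?:" + body.format_map(expanded) + ")"
    return expanded

patterns = expand(definitions)
print(bool(re.fullmatch(patterns["id"], "x9y")))   # True
print(bool(re.fullmatch(patterns["id"], "9x")))    # False
```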

 
