Introduction to Automata Theory, Languages, and Computation: Section 1.5 (Translation)

Source: Internet
Author: User
Document directory
  • 1.5.2 Strings
  • 1.5.3 Languages
  • 1.5.4 Problems

This section introduces the core concepts that run throughout automata theory. These are the "alphabet" (a set of symbols, also called a character set), "strings" (a list of symbols drawn from an alphabet), and "language" (a set of strings over the same alphabet).

1.5.1 Alphabets

An alphabet is a finite, non-empty set of symbols. By convention, we use the symbol Σ to denote an alphabet. Common alphabets include:
1. Σ = {0, 1}, the binary alphabet
2. Σ = {a, b, ..., z}, the set of all lowercase letters
3. The set of all ASCII characters, or the set of all printable ASCII characters

1.5.2 Strings

A string (sometimes called a word) is a finite sequence of symbols chosen from some alphabet. For example, 01101 is a string over the binary alphabet Σ = {0, 1}, and 111 is another string over the same alphabet.
The Empty String

The empty string is the string with zero occurrences of symbols. It is usually denoted by ε, and it is a string over any alphabet.
Length of a String

It is often useful to classify strings by their length, that is, the number of positions for symbols in the string. For example, 01101 has length 5. It is common to say that the length of a string is the "number of symbols" in the string. This is acceptable in everyday use, but strictly speaking it is not correct: the string 01101 contains only two symbols, 0 and 1, yet it has five positions for symbols and its length is 5. "Number of symbols" is therefore fine as long as it is understood to mean "number of positions".
The standard notation for the length of a string w is |w|. For example, |011| = 3 and |ε| = 0.
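The difference between positions and distinct symbols can be checked directly. The following is a minimal sketch, assuming strings are modeled as ordinary Python `str` values and ε as the empty string `""` (both are modeling choices of this sketch, not part of the original text):

```python
# Model a string over the alphabet {0, 1} as a Python str (assumption).
w = "01101"

length = len(w)                 # number of positions, i.e. |w|
distinct_symbols = len(set(w))  # number of different symbols that occur in w

EPSILON = ""                    # the empty string ε (assumed representation)

print(length)            # 5
print(distinct_symbols)  # 2
print(len("011"))        # 3
print(len(EPSILON))      # 0
```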
Powers of an Alphabet

If Σ is an alphabet, we can express the set of all strings of a certain length over that alphabet using an exponential notation. We define Σ^k to be the set of strings of length k, each of whose symbols is in Σ.
Example 1.24: Note that Σ^0 = {ε}, regardless of what the alphabet Σ is; that is, ε is the only string of length 0. If Σ = {0, 1}, then Σ^1 = {0, 1}, Σ^2 = {00, 01, 10, 11}, Σ^3 = {000, 001, 010, 011, 100, 101, 110, 111}, and so on. It is easy to confuse Σ with Σ^1. The former is an alphabet, whose members 0 and 1 are symbols. The latter is a set of strings, whose members are the string 0 and the string 1, each of length 1. We will not use separate notations for these two sets; the context usually makes it clear whether {0, 1} is an alphabet or a set of strings.

The set of all strings over an alphabet Σ is conventionally denoted Σ^*. For example, {0, 1}^* = {ε, 0, 1, 00, 01, 10, 11, 000, ...}. Put another way:
Σ^* = Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ ...
Sometimes we want to exclude the empty string from the set of strings. The set of non-empty strings over the alphabet Σ is denoted Σ^+. This gives two useful equivalences:
Σ^+ = Σ^1 ∪ Σ^2 ∪ Σ^3 ∪ ...
Σ^* = Σ^+ ∪ {ε}
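The definitions of Σ^k, Σ^* and Σ^+ can be made concrete with a short enumeration. This is a minimal sketch, assuming an alphabet is given as a non-empty list of one-character strings; the function names `power`, `star` and `plus` are hypothetical helpers introduced only for illustration:

```python
from itertools import islice, product


def power(sigma, k):
    """Return Σ^k: all strings of length k over the alphabet sigma."""
    return ["".join(symbols) for symbols in product(sigma, repeat=k)]


def star(sigma):
    """Lazily yield Σ^* = Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ ... in order of increasing length.

    Σ^* is infinite, so this is a generator that never terminates on its own.
    """
    k = 0
    while True:
        yield from power(sigma, k)
        k += 1


def plus(sigma):
    """Lazily yield Σ^+ = Σ^1 ∪ Σ^2 ∪ ..., that is, Σ^* without ε."""
    return (w for w in star(sigma) if w != "")


SIGMA = ["0", "1"]
print(power(SIGMA, 0))               # ['']  (only ε)
print(power(SIGMA, 2))               # ['00', '01', '10', '11']
print(list(islice(star(SIGMA), 7)))  # ['', '0', '1', '00', '01', '10', '11']
print(list(islice(plus(SIGMA), 2)))  # ['0', '1']
```

Because Σ^* is infinite, the sketch only ever materializes a finite prefix of it with `islice`.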
Concatenation of Strings
If x and y are strings, then xy denotes the concatenation of x and y, that is, the string formed by a copy of x followed by a copy of y. More precisely, if x is the string of i symbols x = a_1 a_2 ... a_i and y is the string of j symbols y = b_1 b_2 ... b_j, then xy is the string of length i + j given by xy = a_1 a_2 ... a_i b_1 b_2 ... b_j.
Example 1.25: Let x = 01101 and y = 110. Then xy = 01101110 and yx = 11001101. For any string w, the equations εw = wε = w hold. That is, ε is the identity for concatenation: concatenating it with any string yields that string itself.
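Example 1.25 can be reproduced in a few lines. This is a minimal sketch, again assuming strings are Python `str` values, concatenation is `+`, and ε is `""`:

```python
EPSILON = ""  # ε modeled as the empty Python string (assumption)

x = "01101"
y = "110"

xy = x + y  # a copy of x followed by a copy of y
yx = y + x  # concatenation is not commutative in general

print(xy)                               # 01101110
print(yx)                               # 11001101
print(len(xy) == len(x) + len(y))       # True
print(EPSILON + x == x == x + EPSILON)  # True
```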

1.5.3 Languages

A set of strings, all of which are chosen from Σ^* for some particular alphabet Σ, is called a language. If Σ is an alphabet and L ⊆ Σ^*, then L is a language over Σ. Note that a language over Σ does not need to include strings that use every symbol of Σ. So once we have established that L is a language over Σ, we also know that it is a language over any alphabet that is a superset of Σ.
The choice of the term "language" may seem strange. However, common languages can be viewed as sets of strings. For example, English can be seen as the set of all legal English words, which are strings over the alphabet of letters. Another example is C, or any other programming language, where the legal programs form a subset of the strings that can be built from the alphabet of the language. This alphabet is usually a subset of the ASCII characters. Strictly speaking, the alphabet may differ slightly between programming languages, but it generally includes upper-case and lower-case letters, digits, punctuation, and mathematical operators.
However, many other languages appear when we study automata. Here are some abstract examples:
1. The language of all strings consisting of n 0's followed by n 1's, for some n ≥ 0: {ε, 01, 0011, 000111, ...}
2. The set of strings with an equal number of 0's and 1's: {ε, 01, 10, 0011, 0101, 1001, ...}
3. The set of binary numbers whose value is a prime: {10, 11, 101, 111, 1011, ...}
4. Σ^* is a language for any alphabet Σ
5. ∅, the empty language, is a language over any alphabet
6. {ε}, the language consisting of only the empty string, is also a language over any alphabet. Note that ∅ ≠ {ε}: the former contains no strings, while the latter contains one string, the empty string
The only important constraint on what can be a language is that every alphabet is finite. A language may contain infinitely many strings, but the symbols in those strings must all be drawn from one fixed, finite alphabet.
(Translator's note: the original text gives a list of language definitions in set-former notation here.)
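Since a language is a set of strings, an infinite language can be represented in code by a membership predicate instead of an explicit set. The following is a minimal sketch for the first two example languages over {0, 1}; the function names are hypothetical and chosen only for illustration:

```python
def in_zeros_then_ones(w):
    """Membership in the language of n 0's followed by n 1's, n >= 0."""
    n, remainder = divmod(len(w), 2)
    return remainder == 0 and w == "0" * n + "1" * n


def in_equal_zeros_ones(w):
    """Membership in the language of strings with equally many 0's and 1's."""
    return set(w) <= {"0", "1"} and w.count("0") == w.count("1")


print(in_zeros_then_ones(""))        # True  (ε, the case n = 0)
print(in_zeros_then_ones("000111"))  # True
print(in_zeros_then_ones("0101"))    # False
print(in_equal_zeros_ones("1001"))   # True

# Finite languages can be ordinary sets. The empty language and the
# language containing only ε are different sets.
EMPTY_LANGUAGE = set()
ONLY_EPSILON = {""}
print(EMPTY_LANGUAGE == ONLY_EPSILON)  # False
print(len(EMPTY_LANGUAGE), len(ONLY_EPSILON))  # 0 1
```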

1.5.4 Problems

In automata theory, a problem is the question of deciding whether a given string is a member of some particular language. As we shall see, anything we more colloquially call a "problem" can be expressed as membership in a language. More precisely, if Σ is an alphabet and L is a language over Σ, then the problem L is:
Given a string w in Σ^*, decide whether or not w is in L.
Example 1.26: The problem of testing primality can be expressed by the language L_p, which consists of all binary strings whose value as a binary number is a prime.
That is, given a string of 0's and 1's, answer "yes" if the string is the binary representation of a prime, and "no" otherwise. For some strings this decision is easy. For example, 0011101 cannot be the representation of a prime, because every integer except 0 has a binary representation that begins with 1. However, it is less obvious whether the string 11101 belongs to L_p. In general, any solution to this problem has to use significant computational resources of some kind: time and/or space.
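The decision problem for L_p can be sketched as a function that answers yes or no. This is a minimal sketch using trial division, which is chosen here for simplicity and is not an efficient primality test; the name `in_L_p` is hypothetical:

```python
def in_L_p(w):
    """Decide whether the binary string w is in L_p (its value is a prime)."""
    # Only non-empty strings over {0, 1} can represent a number.
    if w == "" or set(w) - {"0", "1"}:
        return False
    # Every integer except 0 has a binary representation beginning with 1.
    if w != "0" and w[0] == "0":
        return False
    n = int(w, 2)
    if n < 2:
        return False
    # Trial division up to the square root of n.
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


print(in_L_p("0011101"))  # False (leading zeros)
print(in_L_p("11101"))    # True  (11101 is 29 in binary)
print(in_L_p("1001"))     # False (1001 is 9 in binary)
```

The running time of trial division grows with the value of the number, which is exponential in the length of w; this is one way to see why the resources needed are not negligible.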
One potentially unsatisfactory aspect of our definition of "problem" is that we usually think of problems not as decision questions (yes or no) but as requests to compute or transform some input. For example, the task of the parser in a C compiler can be seen as a problem in our formal sense: given an ASCII string, decide whether it belongs to the language L_c, the set of valid C programs. However, the parser does more than decide. It usually produces a parse tree, makes entries in a symbol table, and perhaps more. Worse, the compiler as a whole has to translate a C program into object code for a specific machine, which is far from simply answering "yes" or "no" about whether the program is valid.
However, the definition of "problems" as languages has stood the test of time as the appropriate way to deal with the important questions of complexity theory. In this theory, we are interested in proving lower bounds on the complexity of certain problems. Especially important are techniques for proving that certain problems cannot be solved efficiently, for example in less than time exponential in the size of their input. It turns out that the yes/no, language-based version of a known problem is just as hard in this sense as its "solve it" version.

That is, if we can prove that it is hard to decide whether a given string belongs to the language of valid programs in some programming language, then it will not be easier to translate programs in that language into object code. For if it were easy to generate object code, we could run the translator and conclude that the input is a valid member of the language exactly when the translator succeeds in producing object code. The final step, checking whether object code was produced, cannot be hard, so we could use the fast code-generation algorithm to decide membership in the language efficiently. This contradicts the assumption that testing membership is hard. We therefore have a proof by contradiction of the statement "if testing membership in the language is hard, then compiling programs in that language is hard".
This technique, showing that one problem is hard by using its supposed efficient algorithm to efficiently solve another problem that is already known to be hard, is called a "reduction" of the second problem to the first. It is an essential tool in the study of the complexity of problems, and it is greatly facilitated by treating problems as questions about membership in a language rather than as more general kinds of questions.
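The shape of this argument can be sketched in code: any translator can be wrapped into a membership decider at negligible extra cost. This is an illustrative sketch only. `decide_membership` and `toy_translate` are hypothetical names, and the toy "programming language" (strings of n 0's followed by n 1's) and its made-up "object code" are assumptions introduced for the example:

```python
def decide_membership(w, translate):
    """Decide membership in a language using a translator for it.

    `translate` is assumed to return object code for valid inputs and
    None for invalid ones. Checking the result for None is the cheap
    final step of the argument.
    """
    return translate(w) is not None


def toy_translate(w):
    """Toy stand-in for a translator (hypothetical).

    Valid "programs" are n 0's followed by n 1's; the "object code"
    is a made-up instruction string.
    """
    n, remainder = divmod(len(w), 2)
    if remainder == 0 and w == "0" * n + "1" * n:
        return f"PUSH x{n}; POP x{n}"
    return None


print(decide_membership("0011", toy_translate))  # True
print(decide_membership("0101", toy_translate))  # False
```

Since the wrapper adds only a constant amount of work, a fast translator would give a fast membership test; read in the other direction, a hard membership problem implies a hard translation problem.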
