1.4 Non-numerical information and encoding

Last Update: 2014-09-01  Source: Internet

Author: User



1. Common codes for decimal digits

Commonly used codes for decimal digits include the 8421 BCD code, the excess-3 code, and the Gray code. Each encodes the ten decimal digits 0 to 9, representing every digit by a 4-bit binary code. See Table 1-30.

(1) Generation of the 8421 BCD code

The 8421 BCD code is a weighted code: reading the four binary bits from left (high) to right (low), their weights are 8, 4, 2, and 1.

For a single decimal digit, the 8421 BCD code is the same as the digit's binary number. This no longer holds for numbers with two or more digits. For example, 32 is encoded as 0011 0010, which is clearly different from its binary representation 100000. This encoding is therefore regarded as a pseudo-binary code, also known as binary-coded decimal. In the early years, most decimal data input to computers used this encoding: the code was punched on paper tape and read in through the holes by a photoelectric device.
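A minimal Python sketch of the 8421 BCD encoding, assuming the 4-bit groups are separated by spaces for readability (the function names are hypothetical). It shows that each decimal digit is encoded on its own, so 32 becomes 0011 0010 rather than the binary 100000.

```python
def to_8421_bcd(number: int) -> str:
    """Encode a non-negative decimal integer in 8421 BCD, one 4-bit group per digit."""
    if number < 0:
        raise ValueError("only non-negative integers are supported")
    # Each decimal digit is replaced by its own weighted 8-4-2-1 code.
    return " ".join(format(int(d), "04b") for d in str(number))


def from_8421_bcd(bcd: str) -> int:
    """Decode space-separated 4-bit BCD groups back into a decimal integer."""
    digits = []
    for group in bcd.split():
        value = int(group, 2)
        if value > 9:  # 1010 to 1111 are not valid 8421 BCD groups
            raise ValueError(f"invalid BCD group: {group}")
        digits.append(str(value))
    return int("".join(digits))


print(to_8421_bcd(32))             # 0011 0010
print(format(32, "b"))             # 100000, the pure binary form of 32
print(from_8421_bcd("0011 0010"))  # 32
```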

(2) Origin of the excess-3 code

Adding 3 (binary 0011) to the 8421 BCD code of each of the 10 decimal digits gives the corresponding excess-3 code. For example, the 8421 BCD code of the digit 1 is 0001, so its excess-3 code is 0001 + 0011 = 0100. Although the excess-3 code is derived from the 8421 BCD code, it is an unweighted code.
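The "add 0011" rule can be sketched in a few lines of Python; the function name is hypothetical, and the loop simply prints the BCD and excess-3 codes side by side for the ten digits.

```python
def excess3(digit: int) -> str:
    """Excess-3 code of one decimal digit: its 8421 BCD value plus 3 (binary 0011)."""
    if not 0 <= digit <= 9:
        raise ValueError("digit must be between 0 and 9")
    return format(digit + 3, "04b")


# Side-by-side table: digit, 8421 BCD code, excess-3 code
for d in range(10):
    print(d, format(d, "04b"), excess3(d))
```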

(3) Generation of the Gray code

The Gray code is computed from a binary number: an n-bit binary number yields an n-bit Gray code. The rule for generating a Gray code from a binary number is: "the highest bit stays the same; every other bit is the exclusive-OR (XOR) of that bit and the next higher bit". The XOR rule is: different bits give 1, identical bits give 0.
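A small Python sketch of the rule "the highest bit stays; each other bit is XORed with the bit above it", working on bit strings so the steps stay visible (the function name is hypothetical):

```python
def binary_to_gray(bits: str) -> str:
    """Convert a binary string to a Gray code of the same length.

    The highest bit is copied unchanged; every other Gray bit is the XOR of
    the corresponding binary bit and the binary bit just above it.
    """
    gray = [bits[0]]
    for higher, current in zip(bits, bits[1:]):
        gray.append("1" if higher != current else "0")
    return "".join(gray)


for d in range(10):
    b = format(d, "04b")
    print(d, b, binary_to_gray(b))
```

On integers the same rule is the familiar one-liner `g = b ^ (b >> 1)`.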

2. Common character code: ASCII

ASCII is the abbreviation of American Standard Code for Information Interchange. It has been adopted by the International Organization for Standardization as an international standard code for information interchange and is widely used in PCs. ASCII is a 7-bit code that occupies one byte in the computer and defines 128 characters in total.

The 128 codes divide into 33 control codes and 95 printable character codes. Analyzing the code reveals many rules useful for information processing and information filtering. See Table 1-32.

Analyzing ASCII also yields methods for converting between characters, as shown in Table 1-33.

The analysis above shows that ASCII codes are internally related: by remembering a few key character codes, you can calculate the others.
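Tables 1-32 and 1-33 are not reproduced here, so the following Python sketch only illustrates the commonly used relationships, under the assumption that the key codes to remember are '0' = 48, 'A' = 65, and 'a' = 97 (standard ASCII values). The helper names are hypothetical.

```python
# Key codes worth remembering (standard ASCII values):
#   space = 32, '0' = 48, 'A' = 65, 'a' = 97
CASE_OFFSET = ord("a") - ord("A")   # 32: a lowercase letter's code is its uppercase code + 32


def to_upper(ch: str) -> str:
    """Convert an ASCII lowercase letter to uppercase; leave other characters unchanged."""
    return chr(ord(ch) - CASE_OFFSET) if "a" <= ch <= "z" else ch


def to_lower(ch: str) -> str:
    """Convert an ASCII uppercase letter to lowercase; leave other characters unchanged."""
    return chr(ord(ch) + CASE_OFFSET) if "A" <= ch <= "Z" else ch


def digit_value(ch: str) -> int:
    """Numeric value of a digit character: its code minus the code of '0'."""
    if not "0" <= ch <= "9":
        raise ValueError("not an ASCII digit")
    return ord(ch) - ord("0")


def is_control(code: int) -> bool:
    """Codes 0-31 and 127 are the 33 control codes; 32-126 are the 95 printable ones."""
    return code < 32 or code == 127


print(to_upper("g"), to_lower("Q"), digit_value("7"))   # G q 7
print(sum(is_control(c) for c in range(128)))          # 33
```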

3. Chinese character code

Enabling computers to use Chinese characters was a major innovation. To make Chinese characters convenient to use and to process by computer, several code sets are involved across the whole processing of a Chinese character. See Table 1-34.

When a Chinese character is entered from the keyboard, the key scan code indicating the key position is obtained first and is then converted into an internal code.

The internal code of a Chinese character is 16 bits long, so each Chinese character occupies two bytes of storage. The shape of a Chinese character is generally described by a dot matrix: the denser the matrix, the more lifelike the character's shape and the more storage it uses.

The processing of Chinese characters in a PC is shown in Figure 1-5.

To standardize the use of Chinese characters in computers, China published the "Chinese Character Coded Character Set for Information Interchange (Basic Set)" in 1981, referred to as GB2312-80. The standard contains 6763 common Chinese characters: 3755 level-1 characters and 3008 level-2 characters. Together with symbols and letters, they make up the character set.

To use Chinese characters in a computer while remaining compatible with ASCII, 160 is added to both the zone number and the position number of a character's zone-position code. For example, the code "0101" becomes 161 161, and the code "9494" becomes 254 254. This encoding serves as the internal code of Chinese characters (and of the other characters in the character set). It is easy to see that each byte of a Chinese character's internal code has its high bit set to 1, whereas ASCII is a 7-bit code whose high bit is always 0 when stored in a byte. The internal code of a Chinese character is therefore clearly distinguishable from an ASCII code.
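Because the high bit separates the two kinds of bytes, a byte stream that mixes ASCII with two-byte Chinese internal codes can be split without any extra markers. The Python sketch below assumes a GB2312-style stream in which every Chinese character is exactly two bytes; the function name is hypothetical. The sample bytes 0xB0 0xA1 are the internal code for zone 16, position 01 (the character 啊), since 16 + 160 = 0xB0 and 1 + 160 = 0xA1.

```python
def split_mixed_bytes(data: bytes):
    """Split a byte string that mixes 7-bit ASCII with two-byte Chinese internal codes.

    A byte whose high bit is 0 is one ASCII character; a byte whose high bit
    is 1 starts a two-byte Chinese character internal code.
    """
    units = []
    i = 0
    while i < len(data):
        if (data[i] & 0x80) == 0:            # high bit 0: ASCII
            units.append(("ascii", data[i:i + 1]))
            i += 1
        else:                                # high bit 1: Chinese character
            if i + 1 >= len(data):
                raise ValueError("truncated two-byte code")
            units.append(("hanzi", data[i:i + 2]))
            i += 2
    return units


# 'A', then zone 16 / position 01 (0xB0 0xA1), then '1'
print(split_mixed_bytes(b"A\xb0\xa11"))
```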

4. Verification methods for information codes

To ensure the accuracy of information exchange, information codes must be checked. Common methods include the parity check, the Hamming check, and the cyclic redundancy check.

Checking an information code adds check bits to the original code to increase its code distance. For encoding efficiency, the fewer check bits the better; however, fewer check bits also weaken the ability to detect and correct errors.

1) Parity

A parity check adds one odd or even check bit before the highest bit or after the lowest bit of the original information code. A check bit added so that the group of codes contains an odd number of 1s is called an odd parity bit; one added so that the group contains an even number of 1s is called an even parity bit.

The check bit appended to the information code is calculated by an encoding equation, and the code formed by appending it is called a codeword. After the codeword has been transmitted, a supervision equation is used to verify whether the received code is still a codeword; if it is not, it was transmitted incorrectly. Note that parity cannot detect an even number of errors occurring during transmission.
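A minimal Python sketch of parity encoding and the supervision check, assuming codes are strings of '0'/'1' and the check bit is appended after the lowest bit (names and sample data are hypothetical). The last line shows why an even number of errors slips through.

```python
def parity_bit(info: str, odd: bool = False) -> str:
    """Check bit X for an information code given as a string of '0'/'1'.

    Even parity: X = I1 xor I2 xor ... xor In, so the total count of 1s is even.
    Odd parity:  X is the complement of that, so the total count of 1s is odd.
    """
    x = 0
    for bit in info:
        x ^= int(bit)
    return str(x ^ 1 if odd else x)


def supervise(codeword: str, odd: bool = False) -> bool:
    """Supervision check: XOR of all bits must be 1 (odd parity) or 0 (even parity)."""
    s = 0
    for bit in codeword:
        s ^= int(bit)
    return s == (1 if odd else 0)


info = "1011001"                      # hypothetical 7-bit information code
word = info + parity_bit(info)        # even parity bit appended after the lowest bit
print(word, supervise(word))          # 10110010 True
print(supervise("0" + word[1:]))      # False: a single error is detected
print(supervise("01" + word[2:]))     # True: two errors cancel out and go undetected
```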

2) Hamming check

The Hamming check can detect a single error and correct it. Before the information is exchanged, it is encoded with encoding equations; after the exchange, the received information is checked with check equations. If the computed check value is 0, the information was transmitted correctly; if it is not 0, the information is in error. A check value of 1 means the error is in bit 1, and in general a check value of n means the error is in bit n. The error is corrected by inverting the bit at that position. The steps from encoding to checking are as follows.

① Insert M check bits into the n-bit information code, where M is the smallest integer satisfying:

$$2^M \ge n + M + 1$$

For example, with n = 4 (a 4-bit information code), M = 3, because 2^3 = 8 ≥ 4 + 3 + 1. Thus 3 check bits must be inserted into the 4-bit information code to form a 7-bit Hamming code.

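② The relationship between the subscript i of an inserted check bit P_i and the subscript j of the Hamming code bit H_j that holds it is:

$$j = 2^{i-1}, \quad i = 1, 2, \ldots, M$$

For the 7-bit code, P_1, P_2, and P_3 therefore occupy H_1, H_2, and H_4.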

Using this relationship, compute the position of each check bit in the Hamming code and place it there. The remaining positions are filled with the information bits from low to high, forming an (n + M)-bit Hamming code.

③ Once the information bits and check bits have been placed, the next step is to use the known information bits to calculate the values of the inserted check bits, which generates the Hamming code. From the correspondence between each check bit and the Hamming code positions it covers, the encoding equation for each check bit can be obtained.

After the Hamming code is transmitted, the receiver performs the Hamming check based on the check equations.

④ Derive the check equations from the principle that any bit XORed with itself is 0, that is, $x \oplus x = 0$.

⑤ Test error detection and error correction.
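The five steps above can be sketched in Python as follows, assuming even parity, check bit P_i at position 2^(i-1), and information bits filled into the remaining positions from low to high. The function names and sample bits are hypothetical; the check value is computed as the XOR of the positions of all 1 bits, which equals the position of a single wrong bit.

```python
def check_bit_count(n: int) -> int:
    """Smallest M with 2**M >= n + M + 1."""
    m = 0
    while 2 ** m < n + m + 1:
        m += 1
    return m


def hamming_encode(info):
    """Encode information bits (info[0] = lowest-order bit) into a Hamming code.

    Returns a list h where h[1]..h[n+M] are the bits H1..H(n+M); h[0] is unused.
    Check bit P_i sits at H_j with j = 2**(i-1); information bits fill the
    remaining positions from low to high. Even parity is assumed.
    """
    n = len(info)
    length = n + check_bit_count(n)
    h = [0] * (length + 1)
    bits = iter(info)
    for pos in range(1, length + 1):
        if pos & (pos - 1):              # not a power of two: an information position
            h[pos] = next(bits)
    p = 1
    while p <= length:                   # p = 1, 2, 4, ...: check-bit positions
        for pos in range(1, length + 1):
            if pos != p and pos & p:     # P covers every position whose index has bit p set
                h[p] ^= h[pos]
        p <<= 1
    return h


def hamming_check(h) -> int:
    """Return 0 if no error is found, otherwise the position of the single wrong bit."""
    syndrome = 0
    for pos in range(1, len(h)):
        if h[pos]:
            syndrome ^= pos              # XOR of the indices of all 1 bits
    return syndrome


code = hamming_encode([1, 0, 1, 1])      # hypothetical information bits I1..I4
print(code[1:])                          # [0, 1, 1, 0, 0, 1, 1]
code[5] ^= 1                             # simulate a transmission error in H5
err = hamming_check(code)
print(err)                               # 5
code[err] ^= 1                           # correct it by inverting that bit
print(hamming_check(code))               # 0
```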

3) Cyclic redundancy check

Cyclic redundancy checking can detect a single error and correct it. It uses a remainder twice: once for encoding and once for verification. The resulting code, called a CRC code, is generated from the information code and a generator code according to the encoding rules. CRC(7,4) denotes a 7-bit CRC code in which 4 bits are the information code and the other 3 bits are the remainder obtained during encoding. After the encoded cyclic check code has been transmitted and received, the received code is divided by the generator code to obtain a remainder, the second remainder mentioned above. If this remainder is 0, the transfer was correct. If it is not 0, the error position can be obtained from the remainder, and inverting the bit at that position corrects the transfer error. The encoding and verification methods are described below.

(1) Generating the cyclic check code

The known information code is shifted left by R bits and then divided by an agreed generator polynomial of degree R to obtain a remainder. Appending this remainder to the information code gives the cyclic check code.

Note the following points during generation:

① The number of left-shift bits R is tied to the agreed polynomial (that is, the generator code): if the information code is shifted left by 3 bits, the generator code must have 4 bits, that is, the generator polynomial must be of degree 3.

② The lowest-order term of the generator polynomial must be 1.

③ During division, if the highest bit of the current partial dividend is 1, the quotient bit is 1; otherwise, the quotient bit is 0.

④ Subtraction during division is modulo-2 subtraction, that is, an exclusive-OR with no borrows.

(2) Verification method

Divide the received cyclic check code by the generator polynomial (that is, the generator code) used during encoding. If the remainder is 0, the code was transferred without error. If the remainder is not 0, the error position is determined from the correspondence between remainders and error positions; inverting the value of that bit corrects the error.
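A Python sketch of CRC(7,4) generation and verification. The generator G(x) = x^3 + x + 1 (code 1011) is an assumption chosen for illustration, since the source's own generator is not reproduced here; all function names are hypothetical. The remainder-to-position table is built from single-bit error patterns alone, which shows that it depends only on the generator. With this generator the table is unambiguous for 7-bit codes only.

```python
GENERATOR = "1011"   # assumed generator polynomial G(x) = x^3 + x + 1 (hypothetical choice)


def mod2_remainder(bits: str, generator: str = GENERATOR) -> str:
    """Remainder of modulo-2 long division: subtraction is XOR, with no borrows."""
    r = len(generator) - 1
    work = [int(b) for b in bits]
    gen = [int(b) for b in generator]
    for i in range(len(work) - r):
        if work[i]:                      # leading bit 1: quotient bit 1, subtract G
            for j, g in enumerate(gen):
                work[i + j] ^= g
        # leading bit 0: quotient bit 0, nothing to subtract
    return "".join(str(b) for b in work[len(work) - r:])


def crc_encode(info: str, generator: str = GENERATOR) -> str:
    """Shift the information code left by R bits, divide by G, append the R-bit remainder."""
    r = len(generator) - 1
    return info + mod2_remainder(info + "0" * r, generator)


def error_positions(n: int, generator: str = GENERATOR) -> dict:
    """Map the remainder of each single-bit error pattern to its position (1 = leftmost)."""
    table = {}
    for pos in range(1, n + 1):
        pattern = "0" * (pos - 1) + "1" + "0" * (n - pos)
        table[mod2_remainder(pattern, generator)] = pos
    return table


def crc_correct(received: str, generator: str = GENERATOR) -> str:
    """Return the corrected code word, assuming at most one bit is wrong."""
    remainder = mod2_remainder(received, generator)
    if set(remainder) == {"0"}:
        return received                  # remainder 0: transferred without error
    pos = error_positions(len(received), generator)[remainder]
    flipped = "1" if received[pos - 1] == "0" else "0"
    return received[:pos - 1] + flipped + received[pos:]


print(crc_encode("1100"))      # 1100010
print(crc_correct("1101010"))  # bit 4 flipped in transit; prints 1100010
```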

Difficulties

1. Chinese character encoding

(1) External codes of Chinese characters

There are currently many kinds of external codes used for Chinese character input. By their encoding features, they can be classified into the following types.

① Phonetic code: uses the pinyin letters of a Chinese character as its external code. For example, the character 王 ("king") has the external code "wang".

② Shape code: converts the shape features of a Chinese character into a combination of keys that serves as its external code. The Wubi (five-stroke) input method uses a shape code. There are many kinds of shape encoding; for example, if strokes are coded as horizontal = 1, vertical = 2, dot = 3, and so on, 王 ("king") is encoded as 1121.

③ Phonetic-shape code: a code that combines a character's pronunciation with its shape features. For example, if the phonetic code of 王 ("king") is wang and its shape code is 1121, its phonetic-shape code is w1121. Phonetic-shape codes avoid the ambiguity between homophones that are all pronounced "wang" in phonetic codes, and also distinguish similar-looking characters such as "yu" and "wang" in shape codes.

④ Letter-shape code: a code that expresses the components of a Chinese character with the 26 English letters. For example, if the "mouth" component (口) of "leaf" (叶) is represented by O and the "ten" component (十) by X, the code for "leaf" is "OX". By the same rule, the character for "can" is coded "OT" and the character for "order" is coded "IT".

⑤ Zone-position code: Chinese characters are arranged in rows and columns, with each row forming a zone and each column a position. The zone (row) number and position (column) number of a character together form its input code, called the zone-position code. For example, the character 啊 ("ah") is in zone 16, position 01, so its zone-position code is 1601. Each Chinese character corresponds to a unique four-digit zone-position code, with no duplicate codes.

The zone-position code follows GB2312-80, the basic set of the Chinese character set for information interchange published in China in 1981, which arranges Chinese characters and symbols in 94 zones of 94 positions each, for a total of 94 × 94 = 8836 positions.

The zone layout of the character set is as follows: zones 01 to 09 hold symbols and letters; zones 10 to 15 are user-defined symbol zones; zones 16 to 55 hold the 3755 level-1 Chinese characters in pinyin order (level-1 characters are those used most frequently); zones 56 to 87 hold the 3008 level-2 Chinese characters ordered by radical; and zones 88 to 94 provide space for user-defined Chinese characters.
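The zone layout above can be captured in a small lookup, sketched here in Python with hypothetical function names. It splits a four-digit zone-position code and reports which part of the character set the zone belongs to, following the layout as the text states it.

```python
def split_code(code: str):
    """Split a four-digit zone-position code such as '1601' into (zone, position)."""
    if len(code) != 4 or not code.isdigit():
        raise ValueError("expected four digits")
    return int(code[:2]), int(code[2:])


def classify_zone(zone: int) -> str:
    """Describe the contents of a GB2312-80 zone (1 to 94), following the layout above."""
    if 1 <= zone <= 9:
        return "symbols and letters"
    if 10 <= zone <= 15:
        return "user-defined symbols"
    if 16 <= zone <= 55:
        return "level-1 Chinese characters (pinyin order)"
    if 56 <= zone <= 87:
        return "level-2 Chinese characters (radical order)"
    if 88 <= zone <= 94:
        return "user-defined Chinese characters"
    raise ValueError("zone must be between 1 and 94")


zone, position = split_code("1601")      # the character 啊
print(zone, position, classify_zone(zone))

# Capacity check: 40 zones x 94 = 3760 >= 3755, and 32 zones x 94 = 3008
print((55 - 16 + 1) * 94, (87 - 56 + 1) * 94)
```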

Of the more than 8,000 positions in the character set, the reserved zones are left blank, and some other zones are not filled with a full 94 characters or symbols either. With so many characters and symbols, it is not easy to remember the zone and position of each one when entering them.

Because this code has been adopted as a national standard, it is also called the national standard zone-position code.

(2) Internal code of Chinese characters

The code used to store, exchange, and process Chinese characters inside a computer is called the internal code of Chinese characters. It is derived from the national standard zone-position code and the national standard code, according to practical needs.

The internal code of a Chinese character is 16 bits long, so each character occupies two bytes. How is this internal code formed? We start with the national standard zone-position code, which is an external code of Chinese characters. If it were also used as the internal code, the internal and external codes could be unified into a single encoding. However, using the zone-position code directly as the internal code could conflict with Western ASCII codes, so, for compatibility, the zone-position code is not used as the internal code.

The national standard code (GB code) is derived from the national standard zone-position code by adding 32 to the zone number and 32 to the position number.

Using the national standard code as the internal code also causes problems in practice. The largest national standard code is 126|126, so each byte holds only a 7-bit value in the computer: 0111 1110 | 0111 1110. Western ASCII is also a 7-bit code, so when the computer processes codes byte by byte, it cannot easily tell whether a byte is an ASCII code or a byte of a Chinese character code. The national standard code is therefore not used directly as the internal code of Chinese characters.

To distinguish Chinese character internal codes from 7-bit ASCII codes in a computer, the internal code is formed from the national standard code by changing the highest bit of each byte from 0 to 1. Each byte of the internal code is thus an 8-bit value with its high bit set to 1, which numerically adds 128 to each byte. This completes the analysis of how the national standard code is generated from the zone-position code and how the internal code is then generated from the national standard code.

In summary, the relationship among the national standard zone-position code, the national standard code, and the internal code is: adding 32 to the zone and position numbers forms the national standard code; adding 128 to each of the two bytes of the national standard code forms the internal code. The internal code can therefore be obtained directly by adding 160 to the zone and position numbers of the zone-position code.
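The two-stage conversion can be sketched in Python as follows (the function name is hypothetical). It reproduces the examples 0101 → 161 161 and 9494 → 254 254, plus zone 16, position 01 for the character 啊.

```python
def zone_position_to_codes(zone: int, position: int):
    """Derive the national standard (GB) code and the internal code of a character."""
    if not (1 <= zone <= 94 and 1 <= position <= 94):
        raise ValueError("zone and position must each be between 1 and 94")
    gb_code = (zone + 32, position + 32)               # national standard code
    internal = (gb_code[0] + 128, gb_code[1] + 128)    # high bit of each byte set to 1
    return gb_code, internal


for zone, position in [(1, 1), (16, 1), (94, 94)]:
    gb_code, internal = zone_position_to_codes(zone, position)
    print(f"{zone:02d}{position:02d} -> GB {gb_code} -> internal {internal} "
          f"= 0x{bytes(internal).hex().upper()}")
```

As a cross-check, CPython's built-in `gb2312` codec encodes 啊 as `b"\xb0\xa1"`, which matches the internal code (176, 161) computed for zone 16, position 01.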

(3) Glyph codes

A glyph code represents the shape of a Chinese character. Any Chinese character or symbol in the GB2312-80 character set that needs to be displayed or printed must use a glyph. The glyph code is a group of 0s and 1s converted from the dot matrix representing the character's shape; the dot matrix is the array of points obtained by discretizing the character's shape within a rectangular area.

The denser the dot matrix used to discretize a character, the more faithfully the character is represented and the more storage it occupies. Common dot-matrix sizes are: simple, 16×16; common, 24×24; enhanced, 32×32; and high-precision, 48×48.
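The storage cost follows from one bit per dot; this short Python calculation (function name hypothetical) gives the bytes per character for each lattice size and the total for the 6763 Chinese characters of GB2312-80 in a 16×16 font.

```python
def glyph_bytes(size: int) -> int:
    """Bytes needed for one size x size dot-matrix glyph, at one bit per dot."""
    return size * size // 8


for size in (16, 24, 32, 48):
    print(f"{size}x{size}: {glyph_bytes(size)} bytes per character")

# Font storage for the 6763 Chinese characters of GB2312-80 at 16x16
print(6763 * glyph_bytes(16), "bytes")   # 216416 bytes
```

The per-character sizes come out as 32, 72, 128, and 288 bytes.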

The glyph code of a Chinese character is a matrix of 0s and 1s. When a character is displayed, the display's pixels are driven by the glyph code: a 1 lights the pixel and a 0 leaves it dark, so the character appears on the screen. When a character is printed, a 1 prints a dot and a 0 prints nothing, so the character is printed out. Typically, the glyph codes of all Chinese characters are collected into a Chinese character font library. When output is needed, the character's code is converted into its internal code, the internal code is used to look up the font library, and the character is output after its glyph code has been retrieved.
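To show how 1 and 0 bits become lit and dark dots, here is a small Python sketch. The 8×8 glyph is a made-up pattern loosely resembling 王, not data from any real font library, and it assumes one byte per row with the most significant bit as the leftmost dot (a real 16×16 font would use two bytes per row).

```python
# Hypothetical 8x8 glyph: one byte per row, most significant bit = leftmost dot.
GLYPH = [0b01111110, 0b00011000, 0b00011000, 0b01111110,
         0b00011000, 0b00011000, 0b01111110, 0b00000000]


def render(rows, width: int = 8) -> str:
    """Turn glyph rows into text: a 1 bit lights a dot ('#'), a 0 bit leaves it dark ('.')."""
    lines = []
    for row in rows:
        dots = ["#" if (row >> (width - 1 - col)) & 1 else "." for col in range(width)]
        lines.append("".join(dots))
    return "\n".join(lines)


print(render(GLYPH))
```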

The same process is also required to output Western letters and symbols.

2. Parity code

(1) Encoding overview

In a computer, any information code is a combination of 0s and 1s, and the number of 1s in a group of codes is either odd or even. Parity checking is built on this fact. It is divided into odd parity, which uses an odd number of 1s as the reference for encoding and checking, and even parity, which uses an even number of 1s. Because parity checking is easy to implement and has high encoding efficiency, it is widely used.

(2) Encoding method

A parity code consists of an n-bit information code with one parity bit appended, so its encoding efficiency is n/(n + 1). The check bit can be placed before or after the group of information bits.

① Odd parity code: append one check bit X to the n-bit information code $I_1 I_2 \cdots I_n$ so that the total number of 1s is odd. X is calculated as follows:

$$X = \overline{I_1 \oplus I_2 \oplus \cdots \oplus I_n}$$

In the formula, $\oplus$ is the exclusive-OR operator, whose rule is: "different gives 1, otherwise 0".

② Even parity code: append one check bit X to the n-bit information code so that the total number of 1s is even. X is calculated as follows:

$$X = I_1 \oplus I_2 \oplus \cdots \oplus I_n$$

where $\oplus$ is again the exclusive-OR operator.

(3) Verification method

The parity check is performed at the receiving end using the supervision equations below.

① The odd parity supervision equation is:

$$S = X \oplus I_1 \oplus I_2 \oplus \cdots \oplus I_n$$

Substitute the received odd parity code into this equation. If S = 1, the exchange is correct; otherwise, an error has occurred. Note, however, that if two or any other even number of errors occur at the same time, the equation still yields 1, so such errors go undetected.

② The even parity supervision equation is:

$$S = X \oplus I_1 \oplus I_2 \oplus \cdots \oplus I_n$$

Substitute the received even parity code into this equation. If S = 0, the exchange is correct; otherwise, an error has occurred. However, if two or any other even number of errors occur at the same time, the check fails to detect them.

(4) Practical forms

① Horizontal check: a check bit is set for each row; this encoding is called a horizontal check code.

② Vertical check: a check bit is set for each column; this encoding is called a vertical check code.

③ Horizontal and vertical double check: check bits are set for both rows and columns.

Using horizontal and vertical checks together, a single error can be both detected and corrected.
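Why the double check can correct a single error: the flipped bit makes exactly one row and one column fail, and their crossing point is the bad bit. A Python sketch assuming even parity on rows and columns, with hypothetical function names and sample data:

```python
def add_2d_parity(rows):
    """Append an even-parity bit to each row, then an even-parity row for the columns."""
    with_row_bits = [r + [sum(r) % 2] for r in rows]
    column_bits = [sum(col) % 2 for col in zip(*with_row_bits)]
    return with_row_bits + [column_bits]


def locate_and_fix(block):
    """Find a single flipped bit from its failing row and column, flip it back, return (row, col)."""
    bad_rows = [i for i, r in enumerate(block) if sum(r) % 2]
    bad_cols = [j for j, c in enumerate(zip(*block)) if sum(c) % 2]
    if not bad_rows and not bad_cols:
        return None                      # no error detected
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        block[i][j] ^= 1                 # correct the bit at the crossing point
        return (i, j)
    raise ValueError("more than one error: not correctable by this scheme")


data = [[1, 0, 1, 1],
        [0, 1, 1, 0],
        [1, 1, 0, 0]]
block = add_2d_parity(data)
block[1][2] ^= 1                         # simulate a single transmission error
print(locate_and_fix(block))             # (1, 2)
```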

3. Hamming code

(1) Encoding overview

The Hamming code was proposed in 1950 by Richard Hamming of Bell Laboratories to detect and correct a single-bit error. It is formed by inserting check bits at prescribed positions into an information code (such as an ASCII code or a Chinese character code). After the Hamming code has been exchanged, its check value is calculated using the supervision relations. A result of 0 means the transfer was correct; a nonzero result means it was wrong. A result of 1 indicates an error in bit 1, a result of 2 indicates an error in bit 2, and so on. The error is corrected by inverting the bit at that position (0 to 1, 1 to 0).

(2) Encoding composition

A Hamming code is formed by inserting m check bits among n information bits. Once the length n of the information code is known, first calculate from n the number m of check bits to insert; second, determine the positions of the information bits and the check bits.

(3) Encoding generation

A Hamming code contains known information bits and unknown check bits. Once the information bits and the inserted check bits have been placed, generating the Hamming code means calculating the values of the check bits. The Hamming check also relies on the parity of the code, concentrating this property in the check bits: each check bit's value is determined by the parity of the information bits it checks (see [1], p. 94).

(4) Encoding example

(5) Verification method

A Hamming code is verified with check equations, also known as supervision equations. In Hamming encoding, the encoding equations are derived from the parity relation between each check bit and the bits it checks, and the check equations are likewise built on the parity relations among the bits (see [1], p. 96).

4. Cyclic check code

(1) Encoding overview

The cyclic check code, also known as the cyclic redundancy code (CRC) or polynomial code, is a check code widely used in computer networks and data communication.

As its name implies, the cyclic check code is both a code with a cyclic property and a code that involves redundancy. The cyclic property means that a cyclic shift of any cyclic check codeword is still a codeword. Redundancy refers to the use of a remainder in forming the code.

A cyclic check code is formed by appending check bits after the information code. If the cyclic check code is written as (7,4), the codeword is 7 bits long, the information code is 4 bits, and the appended check code is 7 - 4 = 3 bits.

Generating a cyclic check code from the information code requires a specific known code that operates on the information code. This code is called the generator code or generator polynomial. Why is a known generator code called a polynomial? Because any binary number or binary code can be expressed as a polynomial.
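The correspondence between a binary code and a polynomial is simply "bit k from the right is the coefficient of x^k". A Python sketch (the function name is hypothetical):

```python
def to_polynomial(bits: str) -> str:
    """Write a binary code as a polynomial in x: the leftmost bit is the highest power."""
    n = len(bits)
    terms = []
    for i, b in enumerate(bits):
        if b == "1":
            power = n - 1 - i
            terms.append("1" if power == 0 else "x" if power == 1 else f"x^{power}")
    return " + ".join(terms) if terms else "0"


print(to_polynomial("1011"))      # x^3 + x + 1
print(to_polynomial("1100010"))   # x^6 + x^5 + x
```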

(2) Encoding method

To understand the encoding method of the cyclic check code intuitively, an encoding example is first used to demonstrate the encoding process (see [1], p. 100).

(3) Encoding example (see [1], p. 101)

(4) Verification method

Verifying a received cyclic check code requires the generator polynomial used during encoding. The received code is divided by the generator polynomial; if the remainder is 0, the transfer was correct, as guaranteed by the way the code was constructed. If the remainder is not 0, the received code contains an error, and the error position indicated by the remainder depends only on the generator polynomial used.

References:

[1] Liu Kewu. Software Designer Examination Subject 1: Computer and Software Engineering Knowledge, Test Point Analysis and Simulation Training [M]. Beijing: Tsinghua University Press, 2005.1.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
