Chinese character location code, GB code (exchange code), and internal code, and how to convert between them

Source: Internet
Author: User


First, Location code

To meet the need for computer processing of Chinese character information, China issued the GB2312 national standard in 1980, and it took effect in 1981. The standard selects 6,763 commonly used Chinese characters (3,755 level-1 characters, which are the most commonly used, and 3,008 level-2 characters) and 682 non-Chinese characters, and assigns a standard code to each character so that Chinese text can be exchanged between different computer systems.

The GB2312 character set is laid out as a two-dimensional table of 94 rows and 94 columns. The row number is called the area number (area code), the column number is called the bit number, and each character or symbol in the code table is identified by its area number and bit number. Together they form its location code.

For convenience of processing and storage, the area number and the bit number of each Chinese character are each represented by one byte in the computer. For example, the character "learning" (学) has an area number of 49 and a bit number of 07, so its location code is 4907, written as a 2-byte binary number:

00110001 00000111
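The split of a location code into two bytes can be sketched in Python as follows. This is only an illustration: the helper name `location_to_bytes` is hypothetical, and it assumes the location code is given as a four-digit decimal number.

```python
def location_to_bytes(location_code):
    """Split a 4-digit decimal location code into (area, position)."""
    area, position = divmod(location_code, 100)
    # GB2312 is a 94 x 94 table, so both numbers must lie in 1..94.
    if not (1 <= area <= 94 and 1 <= position <= 94):
        raise ValueError(f"invalid location code: {location_code}")
    return area, position


area, position = location_to_bytes(4907)
# Each number fits in one byte; print both as 8-bit binary.
print(f"{area:08b} {position:08b}")  # 00110001 00000111
```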


Second, the National Standard Exchange code

The location code cannot be used directly for transmitting Chinese characters, because its byte values may collide with the control codes 00H~1FH (decimal 0~31) used in communication. In ASCII, the first 32 codes are control codes such as carriage return, line feed, and backspace. To avoid them, the GB code adds 20H (decimal 32, binary 00100000) to both the area number and the bit number of the location code, as ISO 2022 requires. The code obtained this way is called the national standard exchange code, or exchange code or GB code for short. The exchange code of the character "learning" (学) is therefore calculated as:

  00110001 00000111
+ 00100000 00100000
-------------------
  01010001 00100111

In hexadecimal this is 5127H.
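The same addition can be sketched in Python. The function name `to_exchange_code` is a hypothetical helper introduced here for illustration; it adds 20H to each byte separately and checks that the result stays clear of the control-code range.

```python
def to_exchange_code(area, position):
    """Add 20H to the area and bit numbers and pack them into 16 bits."""
    high = area + 0x20
    low = position + 0x20
    # Valid results lie in 21H..7EH, above the control codes 00H..1FH.
    if not (0x21 <= high <= 0x7E and 0x21 <= low <= 0x7E):
        raise ValueError("area and bit numbers must be in 1..94")
    return (high << 8) | low


code = to_exchange_code(49, 7)
print(f"{code:04X}H")  # 5127H
```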

Third, internal code (in-machine code)

Because Chinese and Western characters are commonly mixed in the same text, the bytes of a Chinese character would be confused with single-byte ASCII codes unless they are specially marked. One solution is to treat a Chinese character as two extended ASCII codes, setting the highest bit of each of the two bytes that represent a GB2312 character to 1. This double-byte encoding with both high bits set to 1 is the in-machine code of GB2312 characters, called the internal code for short.

Therefore, the internal code of the character "learning" (学) is:

11010001 10100111

In hexadecimal this is D1A7H.

Finally, note that the input code of a Chinese character and its internal code are different kinds of concept. Whatever input method is used to enter a Chinese character (pinyin, Wubi, and so on), its internal code is the same.
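A minimal Python sketch of why the high bit matters: setting it on both bytes gives the internal code, and a reader of mixed text can then tell one-byte ASCII from two-byte Chinese characters by testing that bit. The names `to_internal_code` and `split_mixed_text` are hypothetical, and the splitter assumes well-formed input with no truncated trailing byte.

```python
def to_internal_code(exchange_code):
    """Set the highest bit of both bytes of a 16-bit exchange code."""
    return exchange_code | 0x8080


def split_mixed_text(data):
    """Split a byte string into 1-byte ASCII and 2-byte GB2312 units."""
    units = []
    i = 0
    while i < len(data):
        # A set high bit marks the first byte of a two-byte character.
        width = 2 if data[i] & 0x80 else 1
        units.append(data[i:i + width])
        i += width
    return units


internal = to_internal_code(0x5127)
print(f"{internal:04X}H")  # D1A7H
print(split_mixed_text(b"A" + internal.to_bytes(2, "big") + b"B"))
# [b'A', b'\xd1\xa7', b'B']
```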

Fourth, summary

The conversion relationship between the location code, the GB code, and the internal code

Method:

(1) Convert the area number and the bit number of the location code to hexadecimal, one byte each;

(2) Location code (in hexadecimal) + 2020H = GB code;

(3) GB code + 8080H = internal code.
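The three steps can be sketched as 16-bit arithmetic in Python. The function name `location_to_codes` is hypothetical, and the sketch assumes a valid four-digit decimal location code as input.

```python
def location_to_codes(location_code):
    """Return (gb_code, internal_code) for a 4-digit decimal location code."""
    area, position = divmod(location_code, 100)
    # Step 1: write the area and bit numbers as one byte each.
    location_hex = (area << 8) | position
    # Step 2: adding 2020H gives the GB code.
    gb_code = location_hex + 0x2020
    # Step 3: adding 8080H gives the internal code. Each GB code byte is
    # at most 7EH, so the sum never carries from the low byte to the high.
    internal_code = gb_code + 0x8080
    return gb_code, internal_code


gb, internal = location_to_codes(4907)
print(f"{gb:04X}H {internal:04X}H")  # 5127H D1A7H
```

Because no carry can occur between the bytes, adding 2020H and 8080H to the whole 16-bit value gives the same result as adding 20H and 80H to each byte separately.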

Example: take the Chinese character "big" (大), whose location code is 2083.

Worked steps:

1. The area number is 20 and the bit number is 83.

2. Converting each number to hexadecimal (20 = 14H, 83 = 53H) gives the location code 1453H.

3. 1453H + 2020H = 3473H, which is the GB code.

4. 3473H + 8080H = B4F3H, which is the internal code.

5. Equivalently, 1453H + A0A0H = B4F3H gives the internal code in one step.

6. In reverse, internal code B4F3H - A0A0H = 1453H, which is the location code in hexadecimal.

That is, the area number is 14H (decimal 20) and the bit number is 53H (decimal 83).
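The reverse direction in step 6 can be sketched in Python as below. The helper name `internal_to_location` is hypothetical. The cross-check at the end relies on Python's built-in `gb2312` codec, which encodes characters in the internal-code byte form described in this article.

```python
def internal_to_location(internal_code):
    """Convert a 16-bit GB2312 internal code to a decimal location code."""
    location_hex = internal_code - 0xA0A0
    area = location_hex >> 8        # high byte: area number
    position = location_hex & 0xFF  # low byte: bit number
    return area * 100 + position


print(internal_to_location(0xB4F3))  # 2083

# Cross-check: decode the internal-code bytes, then encode them again.
char = bytes([0xB4, 0xF3]).decode("gb2312")
print(char.encode("gb2312").hex().upper())  # B4F3
```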


This article is from the Web Learning Notes blog; please keep this source link when reposting: http://cdlaowang.blog.51cto.com/5022872/1764660
