Basic information
"Chinese Character Coded Character Set for Information Interchange" is a national standard character set issued by China's State Bureau of Standards in 1980 and implemented on May 1, 1981. Its standard number is GB 2312-1980.
The standard defines codes that a computer can recognize and is intended for exchanging information between Chinese character processing systems, Chinese character communication systems, and similar systems. The basic set contains 6,763 Chinese characters and 682 non-Chinese graphic characters. The entire character set is divided into 94 zones, with 94 positions per zone. Each position holds exactly one character, so a character can be encoded by its zone number and position number; this pair is called the location code.
The location code is unique: no two characters share the same code. Converting the zone and position numbers to hexadecimal and adding 2020H gives the GB code (national standard code). Adding 8080H to the GB code gives the internal (machine) code commonly used in computers.
In 1995 the "Chinese Character Internal Code Extension Specification" (GBK) was also promulgated. GBK is compatible with the internal code of the GB 2312-1980 national standard and, at the character repertoire level, supports all Chinese, Japanese and Korean (CJK) characters in ISO/IEC 10646-1 and GB 13000-1, 20,902 characters in total.
Chinese character interchange codes relate to Chinese character input codes as follows: characters are entered into the computer through one of the various Chinese input methods, and the input must first be converted into the interchange code before the computer can recognize and process it. For output, the internal code is first converted into the Chinese character code and then sent to the output device.
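The two additions described above can be sketched in Python. This is only an illustration under stated assumptions: the function names location_to_gb and gb_to_internal are hypothetical, and the zone and position are passed as separate decimal numbers.

```python
def location_to_gb(zone: int, position: int) -> int:
    """Convert a decimal location code (zone, position) to the GB code."""
    if not (1 <= zone <= 94 and 1 <= position <= 94):
        raise ValueError("zone and position must each be in 1..94")
    # Zone and position each become one byte; adding 0x2020 adds 0x20 to both.
    return ((zone << 8) | position) + 0x2020


def gb_to_internal(gb_code: int) -> int:
    """Convert a GB code to the internal (machine) code by adding 0x8080."""
    return gb_code + 0x8080


gb = location_to_gb(16, 1)        # zone 16, position 1
internal = gb_to_internal(gb)
print(f"{gb:04X} {internal:04X}")  # 3021 B0A1
```

Adding 0x8080 sets the high bit of both bytes, which is what lets the internal code coexist with 7-bit ASCII text.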
GB Standard
GB 2312
GB 2312, or GB 2312-80, is the Chinese national standard for a simplified Chinese character set. Its full name is "Chinese Character Coded Character Set for Information Interchange: Basic Set", and it is also known as GB0. It was issued by China's State Bureau of Standards and implemented on May 1, 1981. GB 2312 is used in mainland China, Singapore, and other places. Almost all Chinese-language systems and internationalized software in mainland China support GB 2312.
The GB 2312 standard contains 6,763 Chinese characters: 3,755 level-1 characters and 3,008 level-2 characters. It also contains 682 full-width characters, including Latin letters, Greek letters, Japanese hiragana and katakana, and Russian Cyrillic letters.
GB 2312 basically met the needs of computer processing of Chinese characters: the characters it includes cover 99.75% of usage frequency in mainland China.
GB 2312 cannot handle the rarely used characters that appear in personal names, classical Chinese, and similar contexts, which led to the later GBK and GB 18030 character sets.
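The coverage gap can be checked with the codecs that ship with Python. The sketch below is illustrative: the helper name is_encodable is hypothetical, and the character 镕 is used as an assumed example of a name character that is absent from GB 2312 but present in GBK and GB 18030.

```python
def is_encodable(ch: str, codec: str) -> bool:
    """Return True if the character can be encoded with the given codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# A common character, a few non-Chinese symbols, and one rare name character.
for ch in ["啊", "α", "あ", "Я", "镕"]:
    support = {codec: is_encodable(ch, codec)
               for codec in ("gb2312", "gbk", "gb18030")}
    print(ch, support)
```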
Partition representation
In GB 2312, the included characters are partitioned into zones, with 94 characters or symbols per zone. This representation is also called the location code.
Zones 01-09 hold special symbols.
Zones 16-55 hold level-1 Chinese characters, sorted by pinyin.
Zones 56-87 hold level-2 Chinese characters, sorted by radical and stroke count.
Zones 10-15 and 88-94 are not encoded.
For example, "啊" (a) is the first Chinese character in GB 2312, and its location code is 1601.
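The zone layout above can be expressed as a small lookup. This is a minimal sketch, not part of the standard; the function names split_location and classify_zone and the returned labels are hypothetical.

```python
def split_location(code: str) -> tuple[int, int]:
    """Split a four-digit location code such as "1601" into (zone, position)."""
    if len(code) != 4 or not code.isdigit():
        raise ValueError("a location code is four decimal digits")
    return int(code[:2]), int(code[2:])


def classify_zone(zone: int) -> str:
    """Name the block of GB 2312 that a zone number belongs to."""
    if 1 <= zone <= 9:
        return "special symbols"
    if 16 <= zone <= 55:
        return "level-1 Chinese characters (pinyin order)"
    if 56 <= zone <= 87:
        return "level-2 Chinese characters (radical/stroke order)"
    if 10 <= zone <= 15 or 88 <= zone <= 94:
        return "not encoded"
    raise ValueError("zone must be in 1..94")


zone, position = split_location("1601")
print(zone, position, classify_zone(zone))
```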
Byte structure
Programs that use GB 2312 usually store it in the EUC form for compatibility with ASCII. The "GB2312" entry in a browser's encoding menu usually refers to the EUC-CN representation.
Each character or symbol is expressed in two bytes. The first byte is called the high byte (also known as the zone byte), and the second byte is called the low byte (also known as the position byte).
The high byte uses 0xA1-0xF7 (zone numbers 01-87 plus 0xA0), and the low byte uses 0xA1-0xFE (position numbers 01-94 plus 0xA0). Because level-1 Chinese characters start at zone 16, the Chinese characters have a high byte in the range 0xB0-0xF7 and a low byte in the range 0xA1-0xFE, occupying 72*94=6768 code positions. Five of these, D7FA-D7FE, are vacant.
For example, "啊" is stored in most programs as two bytes, 0xB0 (the first byte) and 0xA1 (the second byte). Each byte is the corresponding part of the location code plus 0xA0: 0xB0 = 0xA0 + 16 and 0xA1 = 0xA0 + 1.
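The byte arithmetic can be tried out with Python's built-in gb2312 codec, which implements the EUC-CN form. The sketch below is illustrative only; the helper names char_to_location, location_to_char and vacant_hanzi_positions are hypothetical.

```python
def char_to_location(ch: str) -> tuple[int, int]:
    """Return (zone, position) for a character encoded in GB 2312 (EUC-CN)."""
    encoded = ch.encode("gb2312")
    if len(encoded) != 2:
        raise ValueError("not a two-byte GB 2312 character")
    high, low = encoded
    # Subtract 0xA0 from each byte to recover the location code.
    return high - 0xA0, low - 0xA0


def location_to_char(zone: int, position: int) -> str:
    """Return the character stored at the given zone and position."""
    return bytes([zone + 0xA0, position + 0xA0]).decode("gb2312")


def vacant_hanzi_positions() -> list[int]:
    """List two-byte codes in the Chinese character range that hold no character."""
    vacant = []
    for high in range(0xB0, 0xF8):       # zones 16-87
        for low in range(0xA1, 0xFF):    # positions 01-94
            try:
                bytes([high, low]).decode("gb2312")
            except UnicodeDecodeError:
                vacant.append((high << 8) | low)
    return vacant


print("啊".encode("gb2312").hex())      # b0a1
print(char_to_location("啊"))           # (16, 1)
print(location_to_char(16, 1))          # 啊
print([f"{code:04X}" for code in vacant_hanzi_positions()])
```

Scanning the whole range with the codec is a brute-force way to confirm the count: 6768 positions minus the vacant ones leaves the 6,763 Chinese characters.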
Coding table
GB2312 Simplified Chinese Code Table
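In the rows below, -- marks a position that holds no character, and SP is the full-width (ideographic) space.
Code +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 +A +B +C +D +E +F
A1A0 -- SP 、 。 · ˉ ˇ ¨ 〃 々 — ～ ‖ … ‘ ’
A1B0 “ ” 〔 〕 〈 〉 《 》 「 」 『 』 〖 〗 【 】
A1C0 ± × ÷ ∶ ∧ ∨ ∑ ∏ ∪ ∩ ∈ ∷ √ ⊥ ∥ ∠
A1D0 ⌒ ⊙ ∫ ∮ ≡ ≌ ≈ ∽ ∝ ≠ ≮ ≯ ≤ ≥ ∞ ∵
A1E0 ∴ ♂ ♀ ° ′ ″ ℃ ＄ ¤ ￠ ￡ ‰ § № ☆ ★
A1F0 ○ ● ◎ ◇ ◆ □ ■ △ ▲ ※ → ← ↑ ↓ 〓 --
Code +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 +A +B +C +D +E +F
A2A0 -- ⅰ ⅱ ⅲ ⅳ ⅴ ⅵ ⅶ ⅷ ⅸ ⅹ -- -- -- -- --
A2B0 -- ⒈ ⒉ ⒊ ⒋ ⒌ ⒍ ⒎ ⒏ ⒐ ⒑ ⒒ ⒓ ⒔ ⒕ ⒖
A2C0 ⒗ ⒘ ⒙ ⒚ ⒛ ⑴ ⑵ ⑶ ⑷ ⑸ ⑹ ⑺ ⑻ ⑼ ⑽ ⑾
A2D0 ⑿ ⒀ ⒁ ⒂ ⒃ ⒄ ⒅ ⒆ ⒇ ① ② ③ ④ ⑤ ⑥ ⑦
A2E0 ⑧ ⑨ ⑩ -- -- ㈠ ㈡ ㈢ ㈣ ㈤ ㈥ ㈦ ㈧ ㈨ ㈩ --
A2F0 -- Ⅰ Ⅱ Ⅲ Ⅳ Ⅴ Ⅵ Ⅶ Ⅷ Ⅸ Ⅹ Ⅺ Ⅻ -- -- --
Code +0 +1 +2 +3 +4 +5 +6 +7 +8 +9 +A +B +C +D +E +F
A3A0 -- ！ ＂ ＃ ￥ ％ ＆ ＇ （ ） ＊ ＋ ， － ． ／
A3B0 ０ １ ２ ３ ４ ５ ６ ７ ８ ９ ： ； ＜ ＝ ＞ ？
A3C0 ＠ Ａ Ｂ Ｃ Ｄ Ｅ Ｆ Ｇ Ｈ Ｉ Ｊ Ｋ Ｌ Ｍ Ｎ Ｏ
A3D0 Ｐ Ｑ Ｒ Ｓ Ｔ Ｕ Ｖ Ｗ Ｘ Ｙ Ｚ ［ ＼ ］ ＾ ＿
A3E0 ｀ ａ ｂ ｃ ｄ ｅ ｆ ｇ ｈ ｉ ｊ ｋ ｌ ｍ ｎ ｏ
A3F0 ｐ ｑ ｒ ｓ ｔ ｕ ｖ ｗ ｘ ｙ ｚ ｛ ｜ ｝ ￣ --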
...