The basic unit of Chinese text is the Chinese character. More than 60,000 Chinese characters exist in total. Their large number, complicated shapes, and many variant forms cause a series of problems for representing, processing, transmitting, exchanging, inputting, and outputting Chinese characters on a computer, and make working with Chinese text difficult. Several coding schemes exist for Chinese characters:
1. GB2312-80 code
GB2312 is the national standard character code for information interchange of the People's Republic of China. Its full name is "Code of Chinese Graphic Character Set for Information Interchange, Primary Set". It was issued by the National Standards Bureau and took effect on May 1, 1981, for use in mainland China. The code is also used in Singapore and other regions.
GB2312 contains 7,445 graphic characters in total, including Chinese characters, symbols, letters, and kana; 6,763 of them are Chinese characters. The standard stipulates that every graphic character is represented by two bytes, each using a 7-bit encoding. By convention the first byte is called the "high byte" and the second the "low byte". GB2312-80 covers most commonly used level-1 and level-2 Chinese characters, plus the symbols in 9 areas. Almost all Chinese systems and internationalized software support this character set, which makes it the most basic Chinese character set.
In the 8-bit form, the high byte ranges over 0xA1-0xFE and the low byte also ranges over 0xA1-0xFE; Chinese characters start at 0xB0A1 and end at 0xF7FE. GB2312 divides the code table into 94 areas, corresponding to the first byte (0xA1-0xFE), with 94 positions per area, corresponding to the second byte (0xA1-0xFE). In the 7-bit form, the two byte values are the area number and the position number, each plus 32 (20H); in the 8-bit form they are each plus 160 (A0H). The area number and position number together are therefore also known as the location code.
Areas 01-09 hold symbols and numerals, areas 16-87 hold Chinese characters (0xB0-0xF7), and areas 10-15 and 88-94 are blank, reserved for further standardization. GB2312 divides its Chinese characters into two levels. Level 1 consists of the 3,755 commonly used characters, placed in areas 16-55 and ordered by pinyin, then by stroke. Level 2 consists of 3,008 less commonly used characters, placed in areas 56-87 and ordered by radical, then by stroke count. Together the two levels give 3,755 + 3,008 = 6,763 Chinese characters.
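The area/position arithmetic can be sketched in a few lines of Python. This is a minimal illustration, assuming the 8-bit form in which each byte equals the area or position number plus 0xA0 (consistent with the 0xA1-0xFE ranges above). The function names `location_to_bytes` and `bytes_to_location` are hypothetical; the decode step relies on Python's built-in `gb2312` codec.

```python
def location_to_bytes(area: int, position: int) -> bytes:
    """Map an area/position pair (each 1-94) to its 8-bit two-byte code."""
    if not (1 <= area <= 94 and 1 <= position <= 94):
        raise ValueError("area and position must both be in 1..94")
    # Assumption: 8-bit form, so each byte is the number plus 0xA0.
    return bytes([area + 0xA0, position + 0xA0])


def bytes_to_location(code: bytes) -> tuple[int, int]:
    """Recover the area/position pair from an 8-bit two-byte code."""
    if len(code) != 2 or not all(0xA1 <= b <= 0xFE for b in code):
        raise ValueError("expected two bytes in 0xA1..0xFE")
    return code[0] - 0xA0, code[1] - 0xA0


first = location_to_bytes(16, 1)        # first slot of the Chinese character areas
last = location_to_bytes(87, 94)        # last slot of the Chinese character areas
print(first.hex(), last.hex())          # b0a1 f7fe
print(first.decode("gb2312") == "啊")   # True
print(bytes_to_location(b"\xb0\xa1"))   # (16, 1)
```

The two boundary values reproduce the 0xB0A1 and 0xF7FE limits quoted for the Chinese character range.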
In its original 7-bit form, GB2312's coding range is 2121H-777EH, which overlaps with ASCII. To distinguish the two, the highest bit of each of the two GB code bytes is set to 1.
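The effect of setting the highest bit can be illustrated as follows. This is a sketch under the assumption that text arrives as a byte stream mixing 7-bit ASCII with two-byte GB2312 codes whose high bits are set; `set_high_bits` and `split_mixed` are hypothetical helper names, not part of any standard library.

```python
def set_high_bits(gb_code: int) -> int:
    """Set the highest bit of both bytes of a 16-bit GB code."""
    high, low = gb_code >> 8, gb_code & 0xFF
    # Each byte of a 7-bit GB code lies in the 94-value range 0x21..0x7E.
    if not (0x21 <= high <= 0x7E and 0x21 <= low <= 0x7E):
        raise ValueError("each byte of the GB code must be in 0x21..0x7E")
    return gb_code | 0x8080


def split_mixed(data: bytes) -> list[bytes]:
    """Split a byte stream into 1-byte ASCII and 2-byte GB2312 units."""
    units = []
    i = 0
    while i < len(data):
        if data[i] & 0x80:              # high bit set: first byte of a pair
            if i + 1 >= len(data):
                raise ValueError("truncated two-byte character")
            units.append(data[i:i + 2])
            i += 2
        else:                           # high bit clear: plain ASCII
            units.append(data[i:i + 1])
            i += 1
    return units


print(hex(set_high_bits(0x3021)))       # 0xb0a1
print(split_mixed(b"A\xb0\xa1B"))       # [b'A', b'\xb0\xa1', b'B']
```

Because every ASCII byte has its high bit clear, a single bit test is enough to tell where a two-byte character begins when scanning from the start of the stream.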
Figure 1: GB2312 coding diagram
In the figure, the dotted region inside the ASCII area is the original GB2312 encoding region, and the solid-line region in the bottom right corner is the shifted GB2312 encoding region. The detailed area distribution is as follows:
Area     Characters   Character category
01       94           General symbols
02       72           Serial numbers
03       94           Latin alphabet
04       83           Japanese hiragana
05       86           Japanese katakana
06       48           Greek letters
07       66           Russian letters
08       63           Hanyu Pinyin symbols
09       76           Graphic symbols
10-15                 Spare areas
16-55    3,755        Level-1 Chinese characters, in pinyin order
56-87    3,008        Level-2 Chinese characters, in radical/stroke order
88-94                 Spare areas
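As a quick consistency check, the distribution table can be written down as data and queried. This is an illustrative sketch only: the `AREAS` list transcribes the table, and `category_of` is a hypothetical helper that assumes the first byte is in the 8-bit range 0xA1-0xFE.

```python
# (first area, last area, character count, category), transcribed from the table.
AREAS = [
    (1, 1, 94, "general symbols"),
    (2, 2, 72, "serial numbers"),
    (3, 3, 94, "Latin alphabet"),
    (4, 4, 83, "Japanese hiragana"),
    (5, 5, 86, "Japanese katakana"),
    (6, 6, 48, "Greek letters"),
    (7, 7, 66, "Russian letters"),
    (8, 8, 63, "Hanyu Pinyin symbols"),
    (9, 9, 76, "graphic symbols"),
    (10, 15, 0, "spare"),
    (16, 55, 3755, "level-1 Chinese characters"),
    (56, 87, 3008, "level-2 Chinese characters"),
    (88, 94, 0, "spare"),
]


def category_of(first_byte: int) -> str:
    """Return the table category for the first byte of an 8-bit code."""
    if not (0xA1 <= first_byte <= 0xFE):
        raise ValueError("first byte must be in 0xA1..0xFE")
    area = first_byte - 0xA0            # assumption: 8-bit form, area + 0xA0
    for low, high, _count, category in AREAS:
        if low <= area <= high:
            return category
    raise AssertionError("unreachable: the table covers areas 1..94")


total = sum(count for _, _, count, _ in AREAS)
print(total)                 # 7445
print(category_of(0xB0))     # level-1 Chinese characters
print(category_of(0xA4))     # Japanese hiragana
```

The per-area counts sum to 7,445, which matches the total number of graphic characters stated for GB2312.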