1) Chinese Character Exchange Code (Chinese character code)
The Chinese character exchange code is mainly used for exchanging Chinese character information. National standard code: the exchange code defined in the basic set of the "Chinese character encoding character set for Information Exchange" (designated GB2312-80), promulgated by the National Bureau of Standards in 1980, serves as the national standard Chinese character encoding. GB2312-80 contains a total of 7445 characters:
6763 Chinese characters, of which 3755 are level-1 characters (sorted by Chinese pinyin) and 3008 are level-2 characters (sorted by radical and stroke count)
682 non-Chinese characters
In GB2312-80, all Chinese characters and symbols form a 94 x 94 matrix. Each row of the matrix is called a "zone" and each column is called a "position". The matrix therefore consists of 94 zones (numbered 01 to 94), each holding 94 positions (numbered 01 to 94).
The zone number and position number of a character together form its "zone-position code". The upper two digits are the zone number and the lower two digits are the position number. A zone-position code therefore identifies exactly one Chinese character or symbol, and conversely every Chinese character or symbol has exactly one zone-position code, with no duplicates.
The zones are allocated as follows:
Zone No.    Content
1           Symbols not found on the keyboard
2           Serial numbers
3           Keyboard symbols (in full-width form)
4-5         Japanese kana
6           Greek letters
7           Russian letters
8           Pinyin vowels with tone marks, and phonetic (zhuyin) letters
9           Tabulation (box-drawing) symbols
10-15       Unused
16-55       Level-1 Chinese characters (sorted by pinyin)
56-87       Level-2 Chinese characters (sorted by radical and stroke count)
88-94       User-defined Chinese characters
The 94 zones can thus be divided into four groups:
(1) Zones 1-15: the graphic symbol area. Zones 1-9 hold the standard symbols, and zones 10-15 are reserved for custom symbols.
(2) Zones 16-55: the level-1 Chinese character area, containing 3755 characters. Characters in these zones are sorted by pinyin, and characters with the same pronunciation are ordered by stroke.
(3) Zones 56-87: the level-2 Chinese character area, containing 3008 characters, sorted by radical and stroke count.
(4) Zones 88-94: the user-defined Chinese character area.
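The four groups above can be expressed as a small lookup. This is only an illustrative sketch: the function name `classify_zone` and the group labels are made up for this example, and the last group is taken to be zones 88-94.

```python
def classify_zone(zone: int) -> str:
    """Return the GB2312-80 group that a zone number (1-94) falls into.

    The function name and the returned labels are hypothetical,
    chosen only for this illustration.
    """
    if not 1 <= zone <= 94:
        raise ValueError("zone number must be in 1..94")
    if zone <= 15:
        return "graphic symbols"
    if zone <= 55:
        return "level-1 Chinese characters"
    if zone <= 87:
        return "level-2 Chinese characters"
    return "user-defined Chinese characters"


print(classify_zone(16))  # level-1 Chinese characters
print(classify_zone(56))  # level-2 Chinese characters
```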
The national standard code represents every character (Chinese or not) with a two-byte code. The most significant bit of each byte is 0, so only the low 7 bits are used. Of the 128 possible 7-bit values, 34 are reserved for control purposes, leaving 2^7 - 34 = 94 values per byte for encoding characters. Two bytes therefore give 94 x 94 = 8836 code points. Of the two bytes that represent a character, the high byte corresponds to the row of the encoding table (the zone number) and the low byte corresponds to the column (the position number).
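The arithmetic can be checked with a few lines of Python. The sketch assumes that the 34 excluded values are the ASCII control codes 0-31, the space (32) and DEL (127), which is what leaves the range 21H-7EH for each byte; the variable names are illustrative only.

```python
# Assumption: the 34 reserved 7-bit values are the control codes 0-31,
# the space character (32) and DEL (127).
excluded = set(range(0, 32)) | {32, 127}

# Values that remain available for each byte of the national standard code.
usable = [v for v in range(128) if v not in excluded]

codes_per_byte = len(usable)             # 2^7 - 34
total_code_points = codes_per_byte ** 2  # two bytes per character

print(codes_per_byte, total_code_points)   # 94 8836
print(hex(usable[0]), hex(usable[-1]))     # 0x21 0x7e
```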
The range of the national standard code, expressed in binary, is 00100001 00100001 to 01111110 01111110, that is, (1 + 32)10 (1 + 32)10 to (94 + 32)10 (94 + 32)10.
The 7-bit ASCII code is a character set of 128 characters. The values 0 to 31 (00000000 to 00011111) do not correspond to any printable character; they are called control characters and are used for communication control or for controlling functions on computer devices. The value 32 (00100000) is the space character SP, and the value 127 (01111111) is the delete character DEL.
The national standard code starts at 00100001, i.e. (33)10, so that it skips the 32 control characters and the space character of ASCII. As a result, the high and low bytes of the national standard code are each larger than the corresponding zone and position numbers by (32)10 = (00100000)2 = (20)H:
national standard code high byte = zone number + 20H (H denotes hexadecimal)
national standard code low byte = position number + 20H
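As a rough sketch of this rule, the following Python function adds 20H to each half of a zone-position pair. The function name `quwei_to_national` is hypothetical and exists only for this illustration.

```python
def quwei_to_national(zone: int, position: int) -> tuple[int, int]:
    """Convert a zone number and position number (each 1-94) into the
    two bytes of the national standard code by adding 0x20 to each.

    Hypothetical helper, written only to illustrate the +20H rule.
    """
    if not (1 <= zone <= 94 and 1 <= position <= 94):
        raise ValueError("zone and position must each be in 1..94")
    return zone + 0x20, position + 0x20


high, low = quwei_to_national(16, 1)
print(f"{high:02X}{low:02X}H")  # 3021H
```

The lowest pair (1, 1) maps to 21H 21H and the highest pair (94, 94) maps to 7EH 7EH, matching the binary range given earlier.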
2) Chinese Character Machine Internal Code (internal code, the storage code for Chinese characters)
GB2312 (1980) contains a total of 7445 characters: 6763 Chinese characters and 682 other symbols. In internal code, the Chinese character area spans high bytes B0-F7 and low bytes A1-FE, which gives 72 x 94 = 6768 code points. Five of these, D7FA-D7FE, are unassigned.
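The count can be reproduced directly from the byte ranges. This is a minimal arithmetic sketch with illustrative variable names; it treats the five code points D7FA-D7FE as empty slots, which is what reconciles 6768 with the 6763 Chinese characters.

```python
# Byte ranges of the Chinese character area in internal code.
high_bytes = range(0xB0, 0xF7 + 1)   # 72 possible high bytes
low_bytes = range(0xA1, 0xFE + 1)    # 94 possible low bytes

slots = len(high_bytes) * len(low_bytes)

# D7FA-D7FE are taken here to be the five empty slots.
unused = range(0xD7FA, 0xD7FE + 1)
hanzi_count = slots - len(unused)

print(len(high_bytes), len(low_bytes), slots, hanzi_count)  # 72 94 6768 6763
```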
The machine internal code gives Chinese characters a single representation inside the computer, independent of the input code used to type them. Whatever input method is used, its input code is converted into the machine internal code for storage, which simplifies the processing of Chinese characters inside the machine.
The internal code is the form in which Chinese characters are stored and processed in the computer. A computer must handle both Chinese characters and English letters, so it must be able to tell them apart. The internal code of an English character is its 7-bit ASCII code, stored in one byte whose most significant bit is 0. To avoid conflict with ASCII, the most significant bit of each byte of the national standard code is changed from 0 to 1, with the remaining bits unchanged; the result is the internal code of the Chinese character.
The range of the internal code, expressed in binary, is 10100001 10100001 to 11111110 11111110.
The high and low bytes of the internal code are each larger than the corresponding bytes of the national standard code by (128)10 = (10000000)2 = (80)H:
internal code high byte = national standard code high byte + 80H
internal code low byte = national standard code low byte + 80H
Since:
national standard code high byte = zone number + 20H
national standard code low byte = position number + 20H
it follows that:
internal code high byte = zone number + A0H
internal code low byte = position number + A0H
In other words, the high and low bytes of the internal code are larger than the zone number and position number, respectively, by (160)10 = (10100000)2 = (A0)H.
For example, the zone-position code of the Chinese character "啊" (a) is "1601": the zone number is (16)10 = (10)H and the position number is (01)10 = (01)H.
Then: internal code high byte = 10H + A0H = B0H, and internal code low byte = 01H + A0H = A1H, so:
Internal code = B0A1H
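The same calculation can be sketched in Python and cross-checked against Python's built-in `gb2312` codec. The helper name `quwei_to_internal` is hypothetical; only `bytes.decode` and `str.encode` with the `"gb2312"` codec come from the standard library.

```python
def quwei_to_internal(quwei: str) -> bytes:
    """Convert a four-digit zone-position code such as "1601" into the
    two-byte machine internal code by adding 0xA0 to each half.

    Hypothetical helper, written only to illustrate the +A0H rule.
    """
    if len(quwei) != 4 or not quwei.isdecimal():
        raise ValueError("expected a four-digit zone-position code")
    zone, position = int(quwei[:2]), int(quwei[2:])
    if not (1 <= zone <= 94 and 1 <= position <= 94):
        raise ValueError("zone and position must each be in 01..94")
    return bytes([zone + 0xA0, position + 0xA0])


code = quwei_to_internal("1601")
print(code.hex().upper())       # B0A1
print(code.decode("gb2312"))    # 啊
```

Decoding the two bytes with the `gb2312` codec gives back the character, which confirms that the internal code is the byte sequence actually stored for GB2312 text.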
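// Event handler for a Windows Forms form; requires "using System;".
// tbCharOrString and tbResult are text boxes on the form.
// Assumes Encoding.Default is the GB2312/GBK ANSI code page (for example,
// .NET Framework on Simplified Chinese Windows). On .NET Core and later,
// Encoding.Default is UTF-8 and the byte ranges below do not apply.
private void btnBrowser_Click(object sender, EventArgs e)
{
    string s = tbCharOrString.Text;
    string st = string.Empty;
    byte[] array = System.Text.Encoding.Default.GetBytes(s);
    for (int i = 0; i < array.Length; i++)
    {
        // A high byte in A1-F7 (161-247) starts a two-byte character.
        // The bounds check avoids reading past the end of the array.
        if (array[i] >= 161 && array[i] <= 247 && i + 1 < array.Length)
        {
            st = st + System.Text.Encoding.Default.GetString(array, i, 2);
            st = st + String.Format("high byte: {0}, low byte: {1}" + Environment.NewLine, array[i], array[i + 1]);
            i++; // skip the low byte that was just consumed
        }
        else
        {
            // Single-byte (ASCII) character.
            st = st + System.Text.Encoding.Default.GetString(array, i, 1);
            st = st + String.Format("ASCII: {0}" + Environment.NewLine, array[i]);
        }
    }
    tbResult.Text = st;
}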
The demo interface is shown in the following figure: