Chinese character encoding can be divided into the following types:
Input code: an encoding designed for entering Chinese characters into a computer using the keys of a standard keyboard. An English letter is entered by pressing the corresponding key, and the code entered is the same as its internal code. A Chinese character, by contrast, usually takes several keystrokes. There are hundreds of Chinese character input schemes, but however different these external codes are, they are all converted into one uniform internal code once entered into the computer. Input schemes can be roughly divided into the following four types:
(1) phonetic codes: such as Quanpin (full pinyin), Shuangpin (double pinyin), and Microsoft Pinyin
(2) shape codes: such as Wubi, Zheng code (Zhengma), and Biaoxing code (Biaoxingma)
(3) phonetic-shape codes: such as Smart ABC (Zhineng ABC) and natural code (Ziranma)
(4) numeric codes: such as the area-position code (Quwei code) and the telegraph code
Internal code (the storage code for Chinese characters): used to give the many different input codes a single representation inside the computer. Whichever input code is used for entry, it is converted into this machine internal code for storage, which simplifies Chinese character processing inside the machine.
Take GB2312 as an example.
The derivation process is as follows:
The national standard code specifies that each Chinese character (and some non-Chinese symbols) is encoded in 2 bytes. The highest bit of each byte is 0 and only the low 7 bits are used. Of the 128 possible 7-bit values, 34 are reserved for control purposes, so each byte has only 128 - 34 = 94 values available for encoding Chinese characters, and two bytes give 94 x 94 = 8836 code positions. Of the two bytes that represent a Chinese character, the high byte corresponds to the row number in the code table and is called the area code; the low byte corresponds to the column number in the code table and is called the position code (also called the bit code).
The range of the national standard code, expressed in binary, is:
00100001 00100001 to 01111110 01111110
that is, (1 + 32) (1 + 32) to (94 + 32) (94 + 32) in decimal.
7-bit ASCII is a 128-character set. The code values 0 to 31 (00000000 to 00011111) do not correspond to any printable character. They are usually called control characters and are used for communication control in data transmission or for function control on computer devices. The code value 32 (00100000) is the space character SP. The code value 127 (01111111) is the delete character DEL.
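The count of 94 usable values per byte can be checked with a short sketch. It simply enumerates the 7-bit codes and removes the control characters, the space, and DEL described above; the variable names are chosen here for illustration only.

```python
# Enumerate the 128 seven-bit codes and remove the 34 reserved ones.
CONTROL = set(range(0x00, 0x20))  # 0..31: control characters
SPACE = {0x20}                    # 32: space (SP)
DELETE = {0x7F}                   # 127: delete (DEL)

reserved = CONTROL | SPACE | DELETE
usable = [code for code in range(128) if code not in reserved]

print(len(reserved))                    # 34
print(len(usable))                      # 94
print(hex(usable[0]), hex(usable[-1]))  # 0x21 0x7e
print(len(usable) ** 2)                 # 8836
```

The first and last usable values, 21H and 7EH, are the same bounds that appear in the binary range of the national standard code.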
00100001, or 33 in decimal, is chosen as the starting value of the national standard code in order to skip the 32 control characters and the space character of ASCII. Each byte of the national standard code is therefore larger than the corresponding area code or position code by 32 decimal, that is, 00100000 binary or 20H (H indicates hexadecimal):
national standard code high byte = area code + 20H
national standard code low byte = position code + 20H
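The two addition rules can be sketched in a few lines of Python. The function name quwei_to_national is a hypothetical helper introduced here for illustration, and the range check of 1 to 94 follows from the 94 x 94 code table.

```python
def quwei_to_national(area, position):
    """Convert an area code and a position code (each 1..94)
    into the two bytes of the national standard code."""
    if not (1 <= area <= 94 and 1 <= position <= 94):
        raise ValueError("area and position must both be in 1..94")
    # Adding 20H (32) skips the ASCII control characters and the space.
    return area + 0x20, position + 0x20


high, low = quwei_to_national(16, 1)
print(f"{high:02X}{low:02X}H")  # 3021H
```

Area 1, position 1 maps to 2121H and area 94, position 94 maps to 7E7EH, which matches the range given above.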
The internal code of an English character is its 7-bit ASCII code stored in an 8-bit byte whose highest bit is 0. To avoid conflicts with ASCII, the highest bit of each byte of the national standard code is changed from 0 to 1 and the remaining bits are left unchanged; the result is the internal code of the Chinese character.
The range of the machine internal code, expressed in binary, is:
10100001 10100001 to 11111110 11111110
Each byte of the internal code is larger than the corresponding byte of the national standard code by 128, or 80H:
internal code high byte = national standard code high byte + 80H
internal code low byte = national standard code low byte + 80H
And because:
national standard code high byte = area code + 20H
national standard code low byte = position code + 20H
it follows that:
internal code high byte = area code + A0H
internal code low byte = position code + A0H
That is, each byte of the internal code is larger than the corresponding area code or position code by 160, or A0H. For example, the area-position code of the Chinese character "啊" is 1601, where the area code is 16 decimal (10H) and the position code is 01 decimal (01H):
internal code high byte = 10H + A0H = B0H
internal code low byte = 01H + A0H = A1H
So the internal code is B0A1H.
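The worked example can be reproduced in Python and cross-checked against Python's built-in gb2312 codec. The helper name quwei_to_internal is hypothetical and used only for this sketch.

```python
def quwei_to_internal(area, position):
    """Convert an area code and a position code (each 1..94)
    into the two-byte machine internal code."""
    if not (1 <= area <= 94 and 1 <= position <= 94):
        raise ValueError("area and position must both be in 1..94")
    # +20H gives the national standard code, +80H sets the highest bit:
    # together that is +A0H on each byte.
    return bytes([area + 0xA0, position + 0xA0])


code = quwei_to_internal(16, 1)
print(code.hex().upper())              # B0A1
print(code.decode("gb2312") == "啊")   # True
print("啊".encode("gb2312") == code)   # True
```

Because the highest bit of both bytes is 1, the two bytes cannot be mistaken for 7-bit ASCII characters when they appear in the same text stream.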
Output code: this is what is commonly called the "font". It stores the dot-matrix glyph of each symbol. The Chinese character font (output code) is used to display and print Chinese characters; it is the digitized shape information of each character. The internal code is only a numeric code that identifies a character, so to show the character on an output device its glyph must be looked up and output. Chinese character systems generally represent glyphs as dot matrices: a 16*16 dot-matrix glyph takes 32 bytes of storage (16*16/8 = 32) and a 24*24 dot-matrix glyph takes 72 bytes (24*24/8 = 72). In general, the larger the dot matrix, the better the displayed character looks, and the more storage each glyph requires.
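The storage figures follow directly from one bit per dot. A minimal sketch, assuming the glyph width is a multiple of 8 so that no padding bits are needed; the function name glyph_bytes is chosen here for illustration.

```python
def glyph_bytes(width, height):
    """Bytes needed for one dot-matrix glyph at one bit per dot.
    Assumes width is a multiple of 8, as with 16*16 and 24*24 glyphs."""
    if width % 8 != 0:
        raise ValueError("this sketch only handles widths that are multiples of 8")
    return width * height // 8


print(glyph_bytes(16, 16))  # 32
print(glyph_bytes(24, 24))  # 72

# Storage for a glyph at every one of the 94 x 94 code positions.
print(94 * 94 * glyph_bytes(16, 16))  # 282752
```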
Address code: the Chinese character address code is the logical address at which a character's glyph is stored in the font library (here mainly a dot-matrix font library). In a font library the glyph information is stored contiguously on the storage medium in a fixed order, in most cases the order of the characters in the national standard Chinese character interchange code. Most address codes are therefore sequential and have a simple correspondence with the internal code, which simplifies the conversion from internal code to address code.
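As a rough sketch of that correspondence, assume a hypothetical dot-matrix font file that stores one fixed-size glyph per code position in area and position order, starting at area 1, position 1. This layout and the function name glyph_offset are assumptions made for illustration; the text above does not specify them, and real font libraries may be organized differently.

```python
def glyph_offset(internal, glyph_size=32):
    """Byte offset of a glyph in a hypothetical font file laid out in
    area/position order (assumed layout), from a two-byte internal code."""
    if len(internal) != 2:
        raise ValueError("expected a two-byte internal code")
    area = internal[0] - 0xA0      # undo the +A0H on the high byte
    position = internal[1] - 0xA0  # undo the +A0H on the low byte
    if not (1 <= area <= 94 and 1 <= position <= 94):
        raise ValueError("not a valid two-byte internal code")
    # 94 glyphs per area; areas and positions are numbered from 1.
    index = (area - 1) * 94 + (position - 1)
    return index * glyph_size


print(glyph_offset(b"\xa1\xa1"))  # 0
print(glyph_offset(b"\xb0\xa1"))  # 45120
```

Under this assumed layout, the 32-byte glyph for internal code B0A1H (area 16, position 01) would begin at byte 45120 of a 16*16 font file.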
Reference: Chinese character encoding in linwen