GB2312 Chinese Character Zone-Position Code, Exchange Code, and Internal Code Conversion Method (repost)

To let computers process Chinese character information, China put the GB2312 national standard into effect in 1981. The standard selects 6,763 commonly used Chinese characters (3,755 level-1 characters, the most frequently used, and 3,008 level-2 characters) plus 682 non-Chinese characters, and assigns each character a standard code so that Chinese text can be exchanged between different computer systems.

The GB2312 character set forms a two-dimensional table of 94 rows and 94 columns. The row number is called the zone number and the column number is called the position number, so each Chinese character or symbol is located in the code table by its zone number and position number. For convenience of processing and storage, the computer stores the zone number and the position number of each character in one byte each. For example, the character 学 ("learning") has zone number 49 and position number 07, so its zone-position code is 4907, which as two binary bytes is 00110001 00000111.

The zone-position code cannot be used directly for communicating Chinese characters, because its byte values may conflict with the control codes 00H to 1FH (0 to 31). ISO 2022 therefore requires that 32 (binary 00100000) be added to both the zone number and the position number. The result is called the national standard exchange code, or exchange code for short. The GB exchange code of 学 is calculated as follows:

  00110001 00000111
+ 00100000 00100000
-------------------
  01010001 00100111

which is 5127H in hexadecimal.

Because Chinese and Western characters are usually mixed in the same text, Chinese characters must be marked in some way; otherwise each exchange-code byte (21H to 7EH) would be confused with a single-byte ASCII code.
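The zone-position to exchange-code step above is plain per-byte arithmetic. The following is a minimal Python sketch of it; `zone_position_to_exchange` is a hypothetical helper name, not part of any standard API, and it reproduces the 学 example (4907 to 5127H).

```python
def zone_position_to_exchange(zone: int, position: int) -> int:
    """Return the GB2312 exchange code for a zone-position pair (hypothetical helper).

    Zone and position both run from 1 to 94. Adding 0x20 (32) to each moves
    the two bytes into 0x21-0x7E, clear of the control codes 0x00-0x1F.
    """
    if not (1 <= zone <= 94 and 1 <= position <= 94):
        raise ValueError("zone and position must be in 1..94")
    high = zone + 0x20
    low = position + 0x20
    return (high << 8) | low


# 学: zone 49, position 07
print(f"{49:08b} {7:08b}")                       # 00110001 00000111
code = zone_position_to_exchange(49, 7)
print(f"{code >> 8:08b} {code & 0xFF:08b}")      # 01010001 00100111
print(f"{code:04X}H")                            # 5127H
```

The range check rejects zone or position 0 and values above 94, since those have no place in the 94 x 94 table.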
One solution to this problem is to treat each Chinese character as two extended ASCII codes by setting the highest bit of both bytes to 1. This double-byte code, with the high bit of each byte set, is the internal code of a GB2312 Chinese character. The internal code of 学 is therefore 11010001 10100111, or D1A7H in hexadecimal.

It should be pointed out that the input code of a Chinese character and its internal code belong to different categories. Whatever input method is used to type a character (for example, Pinyin or Wubi five-stroke), its internal code is the same.

Note: This article is based on the book "University Computer Information Technology Tutorial" (Nanjing University Press).

-----------------------------------------------------------------

The collection of Chinese characters in a computer's software system is generally called the Chinese character library (font library). The number of characters in a library differs depending on the standard it follows. The first of these standards is the GB 2312 Chinese character coded character set, the groundwork for which began in 1975.
To study how often Chinese characters are used, China carried out large-scale character frequency statistics covering publications in industry, agriculture, the military, science and technology, politics, economics, literature, art, education, sports, medicine and health, astronomy and geography, nature, chemistry, script reform, archaeology, and other fields. Across a body of literature running to hundreds of millions of characters, 6,335 distinct Chinese characters were used. The just over 3,000 most frequent of them account for 99.9% of cumulative usage, while the remaining 3,000-odd characters together account for less than 0.1%. This shows that the frequently used and less frequently used Chinese characters number fewer than 7,000, which gave China a basis for drawing up its Chinese character library standards.

In 1980, the national standard "Chinese Character Coded Character Set for Information Interchange: Basic Set" was issued, with standard number GB 2312-80. It selects 6,763 Chinese characters: the level-1 library contains 3,755 frequently used characters and the level-2 library contains 3,008 less frequently used characters. The standard also contains 682 non-Chinese characters, including digits, general symbols, Latin letters, Japanese kana, Greek letters, Russian letters, Pinyin letters, and Zhuyin phonetic symbols.

In the past, the various Chinese DOS versions and Chinese Windows 3.2 used in mainland China installed only the national-standard level-1 and level-2 libraries, so they could not handle Chinese characters outside those two levels. Later, the State Bureau of Technical Supervision issued a corresponding traditional Chinese character set, whose full name is "Chinese Character Coded Character Set for Information Interchange: Supplementary Set", with standard number GB/T 12345-90.
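The level-1 and level-2 counts above follow from where the characters sit in the 94 x 94 table. This layout comes from the GB 2312 code table itself rather than from this article: symbols occupy zones 1 to 9, level-1 characters fill zones 16 to 54 and zone 55 up to position 89 (39 x 94 + 89 = 3,755), and level-2 characters fill zones 56 to 87 (32 x 94 = 3,008). The following minimal Python sketch, with the hypothetical helper name `gb2312_zone`, recovers a character's zone and position from its internal code via Python's `gb2312` codec and classifies it by those ranges.

```python
def gb2312_zone(ch: str) -> tuple[str, int, int]:
    """Return (category, zone, position) for one character (hypothetical helper).

    Zone ranges follow the GB 2312 code table: 1-9 symbols,
    16-55 level-1 characters, 56-87 level-2 characters.
    """
    data = ch.encode("gb2312")   # raises UnicodeEncodeError if ch is not in GB2312
    if len(data) == 1:
        return ("ascii", 0, 0)   # single-byte ASCII has no zone-position code
    zone = data[0] - 0xA0        # internal code byte = zone (or position) + A0H
    position = data[1] - 0xA0
    if 1 <= zone <= 9:
        category = "symbol"
    elif 16 <= zone <= 55:
        category = "level-1"
    elif 56 <= zone <= 87:
        category = "level-2"
    else:
        category = "unassigned"  # not expected for characters the codec accepts
    return (category, zone, position)


print(gb2312_zone("学"))                          # ('level-1', 49, 7)
print(gb2312_zone(b"\xd8\xa1".decode("gb2312")))  # ('level-2', 56, 1)
```

Subtracting A0H from each internal-code byte simply undoes the two steps described earlier in the article (add 20H for the exchange code, then set the high bit).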
This article is reposted from http://blog.21ic.com/user1/1003/archives/2005/3648.html