Chinese character encoding and matrix summary

Source: Internet
Author: User

I. Introduction

While writing this article I referred to the following two articles. My respect and thanks go to their authors.
Code reference:
http://www.ugia.cn/?p=82 (author: legend)
Background reading:
http://dev.gameres.com/Program/Control/fontDOS.htm (author: Wu Jin)

II. Chinese character internal code, location code, and dot matrix
The basic principle of Chinese character display: the computer usually has a Chinese character font library (also called a dot-matrix font).
The basic process of displaying Chinese characters under DOS is:
(1) The computer first obtains the internal code of the Chinese character, which is the two-byte value stored in the text file.
(2) The location code of the character is calculated from the internal code.
(3) The location code gives the actual position of the character's dot matrix in the font file.
(4) The dots (coordinates) marked in the dot matrix are drawn at the corresponding positions on the screen, forming the Chinese character we see.
1. Internal code
The internal code is the encoding used to store Chinese characters on a computer. For example, if we write a text file whose content is
"节日快乐" ("Happy Holidays") and open it in a hexadecimal editor, the content is
BD DA C8 D5 BF EC C0 D6
Here BD DA is the internal code of the first Chinese character, "节".
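
The same bytes can be inspected without a hex editor. The following is a minimal sketch; the helper name hex_dump and its buffer convention are illustrative assumptions, not part of the original program. The sample string is best written with hex escapes (\xBD\xDA...) so the result does not depend on the encoding the source file is saved in.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the bytes of the NUL-terminated string s into out as
   space-separated uppercase hex. out needs 3 * strlen(s) + 1 bytes. */
void hex_dump(const char *s, char *out)
{
    size_t i, len;

    assert(s != NULL && out != NULL);
    len = strlen(s);
    out[0] = '\0';
    for (i = 0; i < len; ++i) {
        /* Cast through unsigned char so bytes >= 0x80 are not sign-extended. */
        sprintf(out + 3 * i, "%02X%s", (unsigned)(unsigned char)s[i],
                i + 1 < len ? " " : "");
    }
}
```

Calling hex_dump on the eight bytes of the text above fills the buffer with the same sequence the hex editor shows; the first two bytes form the internal code of the first character.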

2. Location code
The location code is the Chinese character encoding defined by the national standard GB2312. It is used to index font library (dot matrix) files.
Font files of different sizes (such as 12*12, 16*16, 24*24) differ in
glyph size, but they store the Chinese characters in the same order. This is explained later.
What we need to know is that the location code table defined by GB2312 is a 94 × 94 matrix.
In this square matrix, each row is called a "zone" (区) and each column is called a "position" (位).
The matrix therefore has 94 zones (numbered 1 to 94 in decimal),
and each zone contains 94 positions (also numbered 1 to 94).
The zone code and position code of a Chinese character are simply concatenated to form its "location code".
In the location code, the upper two decimal digits are the zone code and the lower two digits are the position code.
For example, the location code of "节" is
节 2958
The zone code is 29, that is, hexadecimal 1D.
The position code is 58, that is, hexadecimal 3A.
The location codes of all Chinese characters can be looked up at the following URL:
http://www.knowsky.com/resource/gb2312tbm.htm

3. Conversion between internal code and location code
Internal code high byte = zone code + A0 (hexadecimal, i.e. 160 in decimal)
Internal code low byte = position code + A0

Let's verify this with the character "节":
The high byte of the internal code is BD = zone code 1D + A0
The low byte of the internal code is DA = position code 3A + A0
Note that the high byte of the internal code is stored first (at the lower address) and the low byte second (at the higher address). This is the byte order of the text itself, so it is the same on every CPU.
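
The conversion in both directions can be sketched as follows. The function names and the GB_OFFSET macro are assumptions made for this example; only the A0 offset itself comes from the text.

```c
#include <assert.h>

#define GB_OFFSET 0xA0  /* added to each half of the location code */

/* Internal code (two bytes, high byte first) -> the two halves of
   the location code. */
void internal_to_location(unsigned char hi, unsigned char lo,
                          int *zone, int *pos)
{
    assert(hi > GB_OFFSET && lo > GB_OFFSET);
    *zone = hi - GB_OFFSET;
    *pos = lo - GB_OFFSET;
}

/* The two halves of the location code -> internal code, high byte first. */
void location_to_internal(int zone, int pos, unsigned char out[2])
{
    assert(zone >= 1 && zone <= 94 && pos >= 1 && pos <= 94);
    out[0] = (unsigned char)(zone + GB_OFFSET);
    out[1] = (unsigned char)(pos + GB_OFFSET);
}
```

With the bytes BD DA, internal_to_location produces 29 and 58 (location code 2958), and location_to_internal(29, 58, ...) gives back BD DA.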

4. Dot matrix
To the computer, each Chinese character is a square matrix of dots.
Positions marked 0 are drawn in the background color (black) and positions marked 1 in the foreground color (white),
so that the matrix looks like a Chinese character on the screen. For example, the dot matrix of the character "一" ("one")
in a 12*12 font file is
000000000000
000000000000
000000000000
000000000000
000000000100
111111111110
000000000000
000000000000
000000000000
000000000000
000000000000
000000000000
When the computer outputs a Chinese character to a display device, it colors the screen according to the dot matrix:
positions marked 0 get the background color (black by default) and positions marked 1 get the foreground color (white by default).
This forms the Chinese characters we see on the screen.

5. Font files
As mentioned above, the dot matrix information of Chinese characters is stored in the font file in location code order. Therefore, to retrieve the dot matrix of a Chinese character, we must know its position in the font. The formula for this position is
94 * (zone code - 1) + (position code - 1)
The subtraction of 1 is needed because offsets in the file start at 0, while zone and position codes start at 1.
However, this only gives the index of the character within the font. To get the actual byte offset in the font file,
we must also multiply by the number of bytes occupied by one glyph.
The number of bytes per glyph is the number of bytes occupied by its dot matrix.
For the 12*12 font used here, that is 12*12/8 = 18 bytes.

Combining these, the formula for the storage location of a character's dot matrix in the font file is
(94 * (zone code - 1) + (position code - 1)) * (bytes per glyph)
bytes per glyph = rows in the dot matrix * columns in the dot matrix / number of bits in a byte
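
As a worked example, for the character with location code 2958 the index is 94 * 28 + 57 = 2689, and with 12*12/8 = 18 bytes per glyph the offset is 2689 * 18 = 48402. The sketch below wraps the formula in two helpers; their names are hypothetical, and the code assumes the tightly packed layout described here (all bits stored back to back with no padding per row), which may not hold for every font file.

```c
#include <assert.h>

/* Bytes per glyph, assuming the bits of all rows are packed back to
   back with no padding. */
long glyph_bytes(int width, int height)
{
    return (long)width * height / 8;
}

/* Byte offset of a glyph in a font file that stores glyphs in
   location code order. zone and pos are the two halves of the
   location code, each in the range 1..94. */
long glyph_offset(int zone, int pos, int width, int height)
{
    assert(zone >= 1 && zone <= 94 && pos >= 1 && pos <= 94);
    return (94L * (zone - 1) + (pos - 1)) * glyph_bytes(width, height);
}
```

The first glyph (location code 0101) sits at offset 0, which is exactly what the two subtractions of 1 in the formula ensure.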

III. A simple glyph display program
With the preparation above, we can write a program that prints the dot matrix of a Chinese character. The code is as follows:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define REGLEN      94                             // each zone (row) has 94 positions (columns)
#define FONT_WIDTH  12                             // glyph width (number of columns)
#define FONT_HEIGHT 12                             // glyph height (number of rows)
#define DOTSIZE     (FONT_WIDTH * FONT_HEIGHT / 8) // bytes occupied by one glyph
#define SUBCODE     0xA0                           // difference between internal code and zone/position code

const char *font_file_name = "simsun12.fon"; // dot matrix font file name
char str[] = "\xD2\xBB";                     // GB2312 internal code of "一", the character to display
unsigned char bindot[DOTSIZE] = {0};         // buffer for the dot matrix data

void printcharbindot(const unsigned char *bindot, int len);

int main(int argc, char *argv[])
{
    FILE *fp = fopen(font_file_name, "rb");
    if (fp == NULL)
    {
        perror(font_file_name);
        return 1;
    }

    int i = 0;
    unsigned char regcode; // zone code
    unsigned char bitcode; // position code

    // Calculate the location code
    regcode = (unsigned char)str[i] - SUBCODE;
    bitcode = (unsigned char)str[i + 1] - SUBCODE;

    // Calculate the index of the character in the font, then the byte offset of its dot matrix in the font file
    long offset = ((regcode - 1) * REGLEN + (bitcode - 1)) * DOTSIZE;

    // Read the dot matrix data from the font file
    fseek(fp, offset, SEEK_SET);
    fread(bindot, sizeof(bindot), 1, fp);

    // Output the dot matrix
    printcharbindot(bindot, DOTSIZE);

    fclose(fp);
    system("pause"); // Windows only: keeps the console window open
    return 0;
}

// Output every bit of the dot matrix in order
void printcharbindot(const unsigned char *bindot, int len)
{
    int charnum = 0;  // index of the current byte
    int bitnum = 0;   // number of bits printed so far
    int bitindex = 0; // index of the current bit
    int bitvalue;     // value of the current bit
    for (charnum = 0; charnum < len; ++charnum)
    {
        // Output each byte from the highest bit to the lowest
        for (bitindex = 7; bitindex >= 0; --bitindex)
        {
            // Output bit number bitindex of the current byte
            bitvalue = (bindot[charnum] >> bitindex) & 0x1;
            printf("%c", bitvalue + '0');

            // Start a new line after every FONT_WIDTH (12) bits
            if ((++bitnum % FONT_WIDTH) == 0)
                printf("\n");
        }
    }
}
The font library file simsun12.fon used by the program can be downloaded here:
http://www.ugia.cn/wp-data/fontfun.rar

The output is
000000000000
000000000000
000000000000
000000000000
000000000100
111111111110
000000000000
000000000000
000000000000
000000000000
000000000000
000000000000

From: http://hi.baidu.com/boger/blog/item/3e36830182e7edd4277fb5fd.html
