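GB2312 is the standard character encoding for Simplified Chinese systems. It is built on the location code, which addresses each character by a "region" and a "position".
The region selects a large block of characters, and the position is the offset within that block.
Each Chinese character takes two bytes.
The high byte ranges from 0xB0 to 0xF7, and the low byte ranges from 0xA1 to 0xFE.
The first-level characters (regions 16 to 55) are ordered by pinyin from A to Z; the less common second-level characters (regions 56 to 87) are ordered by radical and stroke count.
The character 啊 ("ah") is the first Chinese character in GB2312. Its location code is 1601: region 16, position 01.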
With this in mind, let us output a Chinese character from its code.
C# here runs on a little-endian machine, so the bytes are stored in reverse: B0 goes in the low-order half of the value.
The code is as follows:
// Little-endian: the low-order byte B0 is stored first, then A1
ushort u = 0xA1B0;
byte[] chs = BitConverter.GetBytes(u);
Console.Write(Encoding.GetEncoding("GB2312").GetString(chs));
This prints the Chinese character 啊 ("ah") on the screen.
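The same two bytes can also be derived from the location code, without relying on machine byte order: in GB2312 each half of the location code is offset by 0xA0, so region 16 becomes 0xB0 and position 01 becomes 0xA1. The sketch below illustrates this; `LocationCode.ToBytes` is a hypothetical helper name, and the `RegisterProvider` call assumes .NET Core 3.0 or later, where GB2312 is not available until the code pages provider is registered.

```csharp
using System;
using System.Text;

static class LocationCode
{
    // Hypothetical helper: converts a GB2312 region/position pair (1..94 each)
    // into its two bytes by adding 0xA0 to each part, high byte first.
    public static byte[] ToBytes(int region, int position)
    {
        if (region < 1 || region > 94 || position < 1 || position > 94)
            throw new ArgumentOutOfRangeException();
        return new byte[] { (byte)(region + 0xA0), (byte)(position + 0xA0) };
    }

    static void Main()
    {
        // Assumption: modern .NET, where GB2312 must be registered first.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        byte[] chs = ToBytes(16, 1); // location code 1601
        Console.WriteLine("{0:X2} {1:X2}", chs[0], chs[1]); // B0 A1
        Console.WriteLine(Encoding.GetEncoding("GB2312").GetString(chs) == "\u554A"); // True
    }
}
```

Because the byte array is built in file order, the result is the same on little-endian and big-endian machines.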
Note, however, that this does not mean a single loop from 0xB0A1 to 0xF7FE will output all the characters. The reason is simple: if the high digit runs from 1 to 9 and the low digit runs from 1 to 9, there are only 9 × 9 = 81 combinations, not the 89 values between 11 and 99.
This point confuses many people. Counted this way, the high byte has 72 possible values and the low byte has 94, so the range holds 72 × 94 = 6,768 code positions in total (GB2312 assigns Chinese characters to 6,763 of them). Once you understand the concept of the location code, you know how to handle GB2312 encoding.
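The difference between the two ways of counting can be checked with a few lines of arithmetic. This is only an illustrative sketch; the class and method names are made up for the example.

```csharp
using System;

static class Gb2312Count
{
    // Number of two-byte combinations when each byte has its own range.
    public static int CodePositions()
    {
        int highBytes = 0xF7 - 0xB0 + 1; // 72 regions
        int lowBytes = 0xFE - 0xA1 + 1;  // 94 positions per region
        return highBytes * lowBytes;
    }

    // Number of values a single counter would visit from 0xB0A1 to 0xF7FE.
    public static int SingleLoopValues()
    {
        return 0xF7FE - 0xB0A1 + 1;
    }

    static void Main()
    {
        Console.WriteLine(CodePositions());    // 6768
        Console.WriteLine(SingleLoopValues()); // 18270
    }
}
```

A single counter would visit 18,270 values, most of which have a low byte outside 0xA1 to 0xFE and so are not valid byte pairs in this range.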
Here is how to print all the characters.
The code is as follows:
// GB2312
// High byte runs from B0 to F7, low byte from A1 to FE
Encoding gb2312 = Encoding.GetEncoding("GB2312");
for (byte i = 0xB0; i <= 0xF7; i++)
{
    for (byte j = 0xA1; j <= 0xFE; j++)
    {
        Console.Write(gb2312.GetString(new byte[] { i, j }));
    }
}
For an explanation of GB2312, see: http://www.jb51.net/article/34630.htm
ASCII, the American Standard Code for Information Interchange, covers the values 0 to 127. An 8-bit byte can hold values up to 255, so ASCII does not use up the whole byte.
GB2312 also contains its own copies of the letters, called full-width characters. The ASCII codes that GB2312 includes are called half-width characters.
Full-width characters look slightly different, like this. Full-width: Ａ Half-width: A
Full-width characters have no practical use outside of text systems.
The first byte of a full-width character is always 163 (0xA3), and the second byte is the code of the corresponding half-width character plus 128 (the space is excluded).
If half-width A is 65, then full-width A is 163 (first byte), 193 (second byte, 128 + 65).
Knowing this rule, we can also enumerate the full-width counterparts of all the ASCII characters:
The code is as follows:
/**
* The first byte of a full-width character is always 163,
* and the second byte is the half-width character code plus 128 (excluding the space).
* If half-width A is 65, then full-width A is 163 (first byte), 193 (second byte, 128 + 65).
*/
// Start at 0x21: control characters and the space have no counterpart here
for (byte k = 0x21; k < 0x7F; k++)
{
    byte[] ch = new byte[2];
    ch[0] = 163;
    ch[1] = (byte)(128 + k);
    Console.Write(Encoding.GetEncoding("GB2312").GetString(ch));
}
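The rule works in both directions, so a full-width byte pair can be mapped back to its half-width code by subtracting 128 from the second byte. The following is a minimal sketch of that round trip; `FromAscii` and `ToAscii` are hypothetical helper names, not part of .NET.

```csharp
using System;

static class FullWidth
{
    // Half-width ASCII code (0x21..0x7E) -> GB2312 full-width byte pair.
    public static byte[] FromAscii(byte ascii)
    {
        if (ascii < 0x21 || ascii > 0x7E)
            throw new ArgumentOutOfRangeException("ascii");
        return new byte[] { 163, (byte)(ascii + 128) };
    }

    // GB2312 full-width byte pair -> half-width ASCII code.
    public static byte ToAscii(byte first, byte second)
    {
        if (first != 163 || second < 0xA1 || second > 0xFE)
            throw new ArgumentException("not a full-width ASCII pair");
        return (byte)(second - 128);
    }

    static void Main()
    {
        byte[] wideA = FromAscii(65);
        Console.WriteLine("{0} {1}", wideA[0], wideA[1]); // 163 193
        Console.WriteLine(ToAscii(wideA[0], wideA[1]));   // 65
    }
}
```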
On Windows XP, the default encoding for saving text files is ANSI. Note that ANSI is a different concept from GB2312, and there are also Unicode and UTF-8.
The relationship between them is as follows:
Different countries and regions developed their own standards, which gave rise to GB2312, BIG5, JIS and other encoding standards.
These extended encodings, which use 2 bytes to represent a single character, are collectively called ANSI encodings.
On a Simplified Chinese system, ANSI means GB2312; on a Japanese operating system, ANSI means JIS.
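Since "ANSI" names a different encoding on each system, the same bytes decode to different text depending on which one is assumed. The sketch below decodes the bytes B0 A1 once as GB2312 and once as "shift_jis", used here as the .NET name for the Japanese Windows code page. It assumes .NET Core 3.0 or later, where these legacy code pages must be registered first; `AnsiDemo.Decode` is a hypothetical helper.

```csharp
using System;
using System.Text;

static class AnsiDemo
{
    // Decodes the same bytes with a named encoding.
    public static string Decode(byte[] data, string encodingName)
    {
        return Encoding.GetEncoding(encodingName).GetString(data);
    }

    static void Main()
    {
        // Assumption: modern .NET, where legacy code pages must be registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        byte[] data = { 0xB0, 0xA1 };
        string asChinese = Decode(data, "GB2312");
        string asJapanese = Decode(data, "shift_jis");
        Console.WriteLine(asChinese == asJapanese); // False
    }
}
```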
People who read text files in C# most often run into garbled text without understanding why:
The code is as follows:
StreamReader sr = new StreamReader(Application.StartupPath + @"\config.txt");
string line;
while ((line = sr.ReadLine()) != null)
{
    Console.WriteLine(line);
}
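The text is garbled because the encoding used to decode it while reading differs from the one the file was stored in. Without an explicit encoding, StreamReader decodes the file as UTF-8, so it is best to specify the encoding when constructing the StreamReader. On the .NET Framework, Encoding.Default is the system ANSI code page.
The code is as follows: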
StreamReader sr = new StreamReader(Application.StartupPath + @"\config.txt", System.Text.Encoding.Default);
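When the file is known to be GB2312, naming that encoding explicitly avoids depending on the system default. The sketch below writes a small sample file and reads it back with both a matching and a mismatched decoder. The temporary file and the `ReadFirstLine` helper are hypothetical, and the `RegisterProvider` call assumes .NET Core 3.0 or later.

```csharp
using System;
using System.IO;
using System.Text;

static class ReadGb2312
{
    // Reads the first line of a file, decoding it with the given encoding.
    public static string ReadFirstLine(string path, Encoding encoding)
    {
        using (StreamReader sr = new StreamReader(path, encoding))
        {
            return sr.ReadLine();
        }
    }

    static void Main()
    {
        // Assumption: modern .NET, where GB2312 must be registered first.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding gb2312 = Encoding.GetEncoding("GB2312");

        // Hypothetical sample file holding the single character B0 A1.
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[] { 0xB0, 0xA1 });

        string right = ReadFirstLine(path, gb2312);        // decoded as stored
        string wrong = ReadFirstLine(path, Encoding.UTF8); // mismatched decoder
        Console.WriteLine(right == "\u554A"); // True
        Console.WriteLine(wrong == right);    // False
        File.Delete(path);
    }
}
```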