String encoding in C#

Source: Internet
Author: User
Tags coding standards

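GB2312 is the standard character encoding for Simplified Chinese. It locates each character by a "zone" and a "position" (together called the location code).
A zone is a row of 94 positions, so the zone number acts like a large offset.
Each Chinese character occupies two bytes.
The high byte ranges from 0xB0 to 0xF7, and the low byte ranges from 0xA1 to 0xFE.
Level-1 characters (zones 16-55) are arranged in pinyin order from a to z; level-2 characters (zones 56-87) are ordered by radical and stroke count.
"啊" ("ah") is the first Chinese character in GB2312, and its location code is 1601 (zone 16, position 01), which is stored as the bytes 0xB0 0xA1.
Now let's output this character with code.
BitConverter uses the machine's byte order, which is little-endian on x86/x64, so the ushort is written as 0xA1B0: its low-order byte 0xB0 is emitted first.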

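The code is as follows:
// needs: using System; using System.Text;
// on .NET Core/.NET 5+, first call Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
ushort u = 0xA1B0;                     // stored little-endian as B0 A1
byte[] chs = BitConverter.GetBytes(u); // {0xB0, 0xA1} on a little-endian machine
Console.Write(Encoding.GetEncoding("GB2312").GetString(chs));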

This prints the Chinese character 啊 ("ah") on the screen.
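The byte values follow directly from the location code: each part is offset by 0xA0, so zone 16 becomes 0xB0 and position 01 becomes 0xA1 (and zone 87, position 94 becomes 0xF7 0xFE, matching the ranges above). A minimal sketch, assuming C# 9 or later for top-level statements; `LocationToGb2312` is a hypothetical helper name, not a framework API. Building the bytes in order also sidesteps the byte-order trick, which only works on little-endian machines.

```csharp
using System;

// Hypothetical helper: GB2312 stores each part of the location code offset by
// 0xA0, so zone 16 -> 0xB0 and position 01 -> 0xA1.
static byte[] LocationToGb2312(int zone, int position)
{
    if (zone < 1 || zone > 94 || position < 1 || position > 94)
        throw new ArgumentOutOfRangeException(nameof(zone), "zone and position must be 1-94");
    // Bytes are written zone first, so the machine's byte order never matters.
    return new byte[] { (byte)(zone + 0xA0), (byte)(position + 0xA0) };
}

int location = 1601;                                  // "ah": zone 16, position 01
byte[] direct = LocationToGb2312(location / 100, location % 100);

// The original snippet's approach: reverse the bytes in the ushort and let
// BitConverter emit them in the machine's (little-endian) order.
byte[] viaBitConverter = BitConverter.GetBytes((ushort)0xA1B0);

Console.WriteLine($"{direct[0]:X2} {direct[1]:X2}"); // prints "B0 A1"
Console.WriteLine(BitConverter.IsLittleEndian
    ? $"{viaBitConverter[0]:X2} {viaBitConverter[1]:X2}" // "B0 A1" on little-endian machines
    : "big-endian machine: BitConverter gives A1 B0");
```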
However, a single loop from 0xB0A1 to 0xF7FE does not simply output every Chinese character, because such a counter also passes through low bytes outside 0xA1-0xFE (for example 0xB0FF or 0xB100). It is like two-digit numbers: if both the tens digit and the ones digit must be 1 to 9, there are only 9 × 9 = 81 combinations, not all 89 numbers from 11 to 99.
The valid hanzi codes are therefore 72 zones (high byte 0xB0-0xF7) × 94 positions (low byte 0xA1-0xFE) = 6768 code points, of which 6763 are assigned characters (the last five positions of zone 55 are empty). Once you understand the location code, handling GB2312 Chinese character encoding is straightforward.
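To check that arithmetic, the sketch below counts how many codes a single counter from 0xB0A1 to 0xF7FE visits, how many of those have a low byte in 0xA1-0xFE, and how many pairs the nested loops visit. It is plain C# (top-level statements, C# 9 or later) and needs no GB2312 decoding.

```csharp
using System;

// A single counter from 0xB0A1 to 0xF7FE also visits codes whose low byte
// lies outside 0xA1-0xFE (e.g. 0xB0FF, 0xB100), which are not hanzi codes.
int singleCounter = 0, valid = 0;
for (int code = 0xB0A1; code <= 0xF7FE; code++)
{
    singleCounter++;
    int lo = code & 0xFF;                     // high byte is always 0xB0-0xF7 here
    if (lo >= 0xA1 && lo <= 0xFE) valid++;
}

// Nested loops visit only valid pairs: 72 zones x 94 positions.
int nested = 0;
for (int hi = 0xB0; hi <= 0xF7; hi++)
    for (int lo = 0xA1; lo <= 0xFE; lo++)
        nested++;

Console.WriteLine($"{singleCounter} {valid} {nested}"); // prints "18270 6768 6768"
```

Roughly two thirds of the codes a single counter produces are not valid pairs, which is why the article uses two nested loops.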
Next we will use this method to output all Chinese characters.

The code is as follows:
// GB2312 hanzi: high byte 0xB0-0xF7, low byte 0xA1-0xFE
// needs: using System; using System.Text;
Encoding gb2312 = Encoding.GetEncoding("GB2312"); // create once, not per character
for (byte i = 0xB0; i <= 0xF7; i++)
{
    for (byte j = 0xA1; j <= 0xFE; j++)
    {
        // each (high, low) pair is one double-byte code; 0xD7FA-0xD7FE are unassigned
        Console.Write(gb2312.GetString(new byte[] { i, j }));
    }
}
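Going the other way, each pair the loop emits can be mapped back to its location code by subtracting 0xA0 from both bytes. The sketch also labels the zone according to GB2312's layout: zones 16-55 hold level-1 hanzi (pinyin order) and zones 56-87 hold level-2 hanzi (radical and stroke order); zones 1-9 are symbols and the remaining zones are unassigned. `DescribeGb2312` is a hypothetical helper name, and the code assumes C# 9 or later.

```csharp
using System;

// Inverse of the byte layout: subtract 0xA0 from each byte to recover the
// zone and position, then classify the zone.
static (int Zone, int Position, string Level) DescribeGb2312(byte hi, byte lo)
{
    int zone = hi - 0xA0, position = lo - 0xA0;
    if (zone < 1 || zone > 94 || position < 1 || position > 94)
        throw new ArgumentException("not a GB2312 double-byte code");
    string level = zone >= 16 && zone <= 55 ? "level-1 hanzi"
                 : zone >= 56 && zone <= 87 ? "level-2 hanzi"
                 : "symbols or unassigned";
    return (zone, position, level);
}

var info = DescribeGb2312(0xB0, 0xA1);
Console.WriteLine($"{info.Zone:D2}{info.Position:D2} {info.Level}"); // prints "1601 level-1 hanzi"
```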

Interpretation of GB2312: http://www.jb51.net/article/34630.htm

ASCII (American Standard Code for Information Interchange) ranges from 0 to 127. Since 8 bits can hold values up to 255, the highest bit of an ASCII byte is always unused (0).
GB2312 also contains letters known as fullwidth characters, while the ASCII characters it includes are called halfwidth characters.
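Because ASCII never sets the high bit, a GB2312 byte stream can mix both kinds of characters unambiguously: a byte below 0x80 is a single halfwidth ASCII character, and a byte of 0x80 or above starts a two-byte character. A simplified sketch assuming well-formed input; `SplitGb2312` is a hypothetical helper and does not validate the second byte.

```csharp
using System;
using System.Collections.Generic;

// Split a GB2312 byte stream into characters using the high-bit rule.
static List<byte[]> SplitGb2312(byte[] data)
{
    var chars = new List<byte[]>();
    for (int i = 0; i < data.Length; )
    {
        if (data[i] < 0x80)
        {
            chars.Add(new[] { data[i] });           // halfwidth ASCII: one byte
            i += 1;
        }
        else
        {
            if (i + 1 >= data.Length)
                throw new ArgumentException("truncated double-byte character");
            chars.Add(new[] { data[i], data[i + 1] }); // double-byte character
            i += 2;
        }
    }
    return chars;
}

// "A", then "ah" (B0 A1), then fullwidth "A" (A3 C1): five bytes, three characters.
byte[] mixed = { 0x41, 0xB0, 0xA1, 0xA3, 0xC1 };
Console.WriteLine(SplitGb2312(mixed).Count); // prints 3
```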
Fullwidth characters look a little unusual, for example: halfwidth A versus fullwidth Ａ, halfwidth a versus fullwidth ａ.
Outside of text layout, fullwidth characters have little practical use.
The first byte of a fullwidth character is always 163 (0xA3), and the second byte is the code of the corresponding halfwidth character plus 128 (the space is an exception).
For example, halfwidth A is 65, so fullwidth A is 163 (the first byte) and 193 (the second byte, 128 + 65).
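The rule translates directly into a conversion function. A minimal sketch; `ToFullwidthGb2312` is a hypothetical name, and it accepts only printable ASCII 0x21-0x7E because the space is the stated exception (C# 9 or later for top-level statements).

```csharp
using System;

// Map a printable halfwidth ASCII character to its GB2312 fullwidth bytes:
// first byte 163 (0xA3), second byte = ASCII code + 128.
static byte[] ToFullwidthGb2312(char c)
{
    if (c < 0x21 || c > 0x7E)
        throw new ArgumentOutOfRangeException(nameof(c), "only printable ASCII except space");
    return new byte[] { 163, (byte)(128 + c) };
}

byte[] fullA = ToFullwidthGb2312('A');
Console.WriteLine($"{fullA[0]} {fullA[1]}"); // prints "163 193"
```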
Knowing this rule, we can also enumerate all the fullwidth ASCII characters:

The code is as follows:
/**
 * The first byte of a fullwidth character is always 163 (0xA3);
 * the second byte is the halfwidth code plus 128 (the space is excluded).
 * Halfwidth A is 65, so fullwidth A is 163 (first byte) and 193 (second byte, 128 + 65).
 */
// needs: using System; using System.Text;
Encoding gb2312 = Encoding.GetEncoding("GB2312");
for (byte k = 0x21; k <= 0x7E; k++) // printable ASCII only; other codes fall outside 0xA1-0xFE
{
    byte[] ch = new byte[2];
    ch[0] = 163;
    ch[1] = (byte)(128 + k);
    Console.Write(gb2312.GetString(ch));
}

On Windows XP, text files are saved in ANSI encoding by default (Unicode and UTF-8 are the other options). Note that ANSI is not the same concept as GB2312.
The relationship between them is:
Different countries and regions developed their own standards, resulting in encodings such as GB2312, BIG5, and JIS.
These extended encodings, which use two bytes to represent a single character such as a Chinese character, are collectively called ANSI encodings.
On a Simplified Chinese system, ANSI means GB2312 (strictly, Windows code page 936, i.e. GBK, a superset of GB2312); on a Japanese system, ANSI means JIS (Shift_JIS, code page 932).
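To see that "ANSI" only names the system's local code page, one can decode the same two bytes with two different ANSI code pages. A sketch assuming .NET Core 3.0 or later, where code pages 936 (Simplified Chinese) and 932 (Japanese) become available after registering `CodePagesEncodingProvider` (older .NET Core versions may need the System.Text.Encoding.CodePages package; on .NET Framework the registration line is unnecessary).

```csharp
using System;
using System.Text;

// Legacy Windows code pages are not built into .NET Core / .NET 5+ by default;
// this registration makes them available.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

byte[] bytes = { 0xB0, 0xA1 };                                   // GB2312 bytes for "ah"
string asChinese = Encoding.GetEncoding(936).GetString(bytes);  // Simplified Chinese ANSI
string asJapanese = Encoding.GetEncoding(932).GetString(bytes); // Japanese ANSI

// In code page 932, 0xA1-0xDF are single-byte halfwidth katakana-range characters,
// so the same bytes read as two unrelated characters.
Console.WriteLine(asChinese == "\u554A"); // True (U+554A is "ah")
Console.WriteLine(asJapanese.Length);     // 2
```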
When reading text in C#, beginners most often run into garbled output from text files, as with this code:

The code is as follows:
// needs: using System; using System.IO; using System.Windows.Forms; (for Application.StartupPath)
using (StreamReader sr = new StreamReader(Application.StartupPath + @"\config.txt"))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        Console.WriteLine(line);
    }
}

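The text is garbled because it is decoded with a different encoding from the one it was saved in: unless it finds a byte order mark, StreamReader assumes UTF-8, while the file was saved as ANSI. It is best to specify the encoding when constructing the StreamReader. On .NET Framework, Encoding.Default is the system's ANSI code page; on .NET Core and .NET 5+, Encoding.Default is always UTF-8, so there the ANSI code page must be named explicitly (for example Encoding.GetEncoding("GB2312") after registering CodePagesEncodingProvider).
The code is as follows:
StreamReader sr = new StreamReader(Application.StartupPath + @"\config.txt", System.Text.Encoding.Default);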
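The sketch below reproduces the problem in memory instead of reading config.txt: a MemoryStream stands in for the file, holding the GB2312 bytes of "ah". It decodes them once with StreamReader's default UTF-8 and once with GB2312 named explicitly. It assumes .NET Core 3.0 or later for `CodePagesEncodingProvider` and C# 9 or later for top-level statements.

```csharp
using System;
using System.IO;
using System.Text;

// Make legacy code pages such as GB2312 available on .NET Core / .NET 5+.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding gb2312 = Encoding.GetEncoding("GB2312");

// The bytes an ANSI (GB2312) editor would save for the character "ah".
byte[] fileBytes = { 0xB0, 0xA1 };

// No encoding argument: StreamReader decodes as UTF-8, and B0 A1 is invalid UTF-8.
using var utf8Reader = new StreamReader(new MemoryStream(fileBytes));
string asUtf8 = utf8Reader.ReadToEnd();

// Naming the encoding the file was saved with decodes it correctly.
using var gbReader = new StreamReader(new MemoryStream(fileBytes), gb2312);
string asGb2312 = gbReader.ReadToEnd();

Console.WriteLine(asUtf8.IndexOf('\uFFFD') >= 0); // True: replacement characters
Console.WriteLine(asGb2312 == "\u554A");          // True: U+554A is "ah"
```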
