BOM: byte order mark
Encoding | BOM value
UTF-8 | EF BB BF
UTF-16 big endian | FE FF
UTF-16 little endian (Unicode) | FF FE
UTF-32 big endian | 00 00 FE FF
UTF-32 little endian | FF FE 00 00
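The table above can be turned into a small BOM sniffer. The following is only a minimal sketch: `DetectBom` is a hypothetical helper name, not a .NET API, and it returns the string "none" when no signature matches. The one subtlety is ordering: the UTF-32 little endian BOM starts with the same two bytes as the UTF-16 little endian BOM, so the longer signatures are checked first.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of .NET): names the BOM found at the start
// of `data`, or returns "none" when no signature from the table matches.
static string DetectBom(byte[] data)
{
    // Longer signatures first: FF FE 00 00 (UTF-32 LE) begins with FF FE (UTF-16 LE).
    var signatures = new (string Name, byte[] Bom)[]
    {
        ("UTF-32 big endian",    new byte[] { 0x00, 0x00, 0xFE, 0xFF }),
        ("UTF-32 little endian", new byte[] { 0xFF, 0xFE, 0x00, 0x00 }),
        ("UTF-8",                new byte[] { 0xEF, 0xBB, 0xBF }),
        ("UTF-16 big endian",    new byte[] { 0xFE, 0xFF }),
        ("UTF-16 little endian", new byte[] { 0xFF, 0xFE }),
    };

    foreach (var (name, bom) in signatures)
    {
        if (data.Length >= bom.Length && data.Take(bom.Length).SequenceEqual(bom))
        {
            return name;
        }
    }
    return "none";
}

Console.WriteLine(DetectBom(new byte[] { 0xEF, 0xBB, 0xBF, 0x41 })); // UTF-8
Console.WriteLine(DetectBom(new byte[] { 0xFF, 0xFE, 0x41, 0x00 })); // UTF-16 little endian
Console.WriteLine(DetectBom(new byte[] { 0x41, 0x42 }));             // none
```

A BOM is only a hint: UTF-16 little endian text whose first character is U+0000 also starts with FF FE 00 00, so a sniffer like this one would report it as UTF-32.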
You may not know: most Chinese characters take three bytes when encoded as UTF-8. By default, StreamWriter in .NET encodes strings as UTF-8, but it does not write the UTF-8 BOM (EF BB BF) to the stream. An instance created with the default constructor of UTF8Encoding emits no BOM, while Encoding.UTF8 does.

DBCS stands for double-byte character set and SBCS for single-byte character set. In a Chinese environment, the Encoding.Default property returns a DBCSCodePageEncoding, which is an internal class.
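These claims are easy to check with a few lines of C#. The sketch below writes to a MemoryStream instead of a file so that the raw bytes can be inspected; the variable names are only illustrative, and the character is written as the escape \u4e2d so the result does not depend on how the source file itself is saved.

```csharp
using System;
using System.IO;
using System.Text;

// U+4E2D, a common Chinese character, takes three bytes in UTF-8.
byte[] zhong = Encoding.UTF8.GetBytes("\u4e2d");
Console.WriteLine(BitConverter.ToString(zhong)); // E4-B8-AD

// Encoding.UTF8 carries the BOM as its preamble; new UTF8Encoding() does not.
byte[] withBom = Encoding.UTF8.GetPreamble();
byte[] withoutBom = new UTF8Encoding().GetPreamble();
Console.WriteLine(BitConverter.ToString(withBom)); // EF-BB-BF
Console.WriteLine(withoutBom.Length);              // 0

// StreamWriter without an explicit encoding: UTF-8 bytes, no BOM.
var plain = new MemoryStream();
using (var writer = new StreamWriter(plain))
{
    writer.Write("\u4e2d");
}
byte[] plainBytes = plain.ToArray();

// Passing Encoding.UTF8 explicitly makes StreamWriter write the BOM first.
var marked = new MemoryStream();
using (var writer = new StreamWriter(marked, Encoding.UTF8))
{
    writer.Write("\u4e2d");
}
byte[] markedBytes = marked.ToArray();

Console.WriteLine(BitConverter.ToString(plainBytes));  // E4-B8-AD
Console.WriteLine(BitConverter.ToString(markedBytes)); // EF-BB-BF-E4-B8-AD
```

To control the BOM explicitly rather than rely on either default, the UTF8Encoding(bool) constructor takes a flag that says whether the encoding should emit the identifier.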
Code page ID | .NET name
936 | gb2312
950 | big5
1200 | utf-16
52936 | hz-gb-2312
54936 | GB18030
65000 | utf-7
65001 | utf-8
932 | shift_jis (Japanese Shift-JIS)
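Code page IDs and names from the table can both be passed to Encoding.GetEncoding. The sketch below makes one assumption that is not in the original note: it targets .NET Core 3.0 or later, where legacy code pages such as 936 are only available after CodePagesEncodingProvider has been registered. On the .NET Framework that registration should not be needed for 936, and the provider type comes from a separate package there.

```csharp
using System;
using System.Text;

// Assumption: .NET Core 3.0 or later. Without this registration,
// Encoding.GetEncoding(936) throws on those runtimes.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

Encoding gb = Encoding.GetEncoding(936);         // look up by code page ID
Encoding utf8 = Encoding.GetEncoding(65001);
Encoding utf16 = Encoding.GetEncoding("utf-16"); // look up by name

Console.WriteLine(gb.CodePage + " " + gb.WebName);
Console.WriteLine(utf8.CodePage + " " + utf8.WebName);
Console.WriteLine(utf16.CodePage + " " + utf16.WebName);

// U+4E2D: two bytes in the double-byte code page 936, three in UTF-8.
byte[] gbBytes = gb.GetBytes("\u4e2d");
byte[] utf8Bytes = utf8.GetBytes("\u4e2d");
Console.WriteLine(gbBytes.Length + " vs " + utf8Bytes.Length);

// A DBCS code page reports that it is not single-byte.
Console.WriteLine(gb.IsSingleByte);
```

The IsSingleByte property is the public way to tell an SBCS code page from a DBCS one, since the concrete encoding classes behind them are internal.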
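// Appears to be the internal CreateDefaultEncoding() method of System.Text.Encoding.
// Win32Native and SBCSCodePageEncoding are internal types, so user code cannot call this.
private static Encoding CreateDefaultEncoding()
{
    // Ask Windows for the ANSI code page (ACP) of the current system.
    int acp = Win32Native.GetACP();
    if (acp == 0x4e4) // 0x4e4 == 1252
    {
        return new SBCSCodePageEncoding(acp);
    }
    return GetEncoding(acp);
}

http://www.cnblogs.com/bitfan/archive/2010/11/25/1887590.html
http://en.wikipedia.org/wiki/Code_page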