The following link covers the basics of characters, bytes, and encodings:
Http://www.regexlab.com/zh/encoding.htm
A string in a program passes through several encoding conversions:
1. In the source file, string literals are saved in whatever encoding the text editor uses for that file.
2. While the code runs, strings are held in memory in some encoding, which may differ from the source file's.
3. When the code outputs a string, different I/O classes write it in different encodings; this can of course be changed through their settings.
Because the same string goes through these three encodings, the bytes actually stored in the three places may not have the same length.
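The length difference can be checked at compile time. The following is a minimal sketch, not taken from the original article: it assumes a C++11 or later compiler, and the constant names are made up for illustration. The two characters "汉字" are written as universal character names so the result does not depend on how the source file itself is saved.

```cpp
#include <cstddef>

// "汉字" (U+6C49 U+5B57), written with \u escapes so the bytes the compiler
// sees are the same regardless of the encoding of this source file.
// Subtracting the size of an empty literal removes the terminating null.

// UTF-8: each of these two characters takes 3 bytes.
constexpr std::size_t kUtf8Bytes  = sizeof(u8"\u6C49\u5B57") - sizeof(u8"");

// UTF-16: each of these two characters takes one 2-byte code unit.
constexpr std::size_t kUtf16Bytes = sizeof(u"\u6C49\u5B57") - sizeof(u"");

// UTF-32: every character takes 4 bytes.
constexpr std::size_t kUtf32Bytes = sizeof(U"\u6C49\u5B57") - sizeof(U"");

static_assert(kUtf8Bytes  == 6, "2 characters x 3 bytes");
static_assert(kUtf16Bytes == 4, "2 characters x 2 bytes");
static_assert(kUtf32Bytes == 8, "2 characters x 4 bytes");
```

The same two characters therefore occupy 6, 4, or 8 bytes depending only on the encoding, which is why a byte count taken at one of the three stages cannot be reused at another.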
In addition, if a string is processed as a byte stream, be extra careful and confirm the specific encoding first. A typical Chinese character takes 2 bytes in "Unicode" (which on Windows means UTF-16) and 3 bytes in UTF-8; for ANSI, the size depends on the specific locale code page.
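One common byte-stream pitfall is cutting a buffer in the middle of a multi-byte character. The following is a minimal sketch under stated assumptions: the input is valid UTF-8, and the function name utf8_safe_prefix_length is hypothetical, not part of any library. It relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx.

```cpp
#include <cstddef>
#include <string>

// Returns the largest length <= max_bytes at which a valid UTF-8 byte
// string can be cut without splitting a multi-byte character.
// Hypothetical helper for illustration; assumes `bytes` is valid UTF-8.
std::size_t utf8_safe_prefix_length(const std::string& bytes,
                                    std::size_t max_bytes) {
    if (max_bytes >= bytes.size()) {
        return bytes.size();  // the whole string fits, nothing to cut
    }
    std::size_t pos = max_bytes;
    // If the first byte after the cut is a continuation byte (10xxxxxx),
    // the cut falls inside a character, so back up to its lead byte.
    while (pos > 0 &&
           (static_cast<unsigned char>(bytes[pos]) & 0xC0) == 0x80) {
        --pos;
    }
    return pos;
}
```

For example, with the UTF-8 bytes of "汉字" (E6 B1 89 E5 AD 97), a limit of 4 bytes yields a safe length of 3, and a limit of 2 bytes yields 0 because not even the first character fits. The same logic would be wrong for UTF-16 or an ANSI code page, which is why the encoding has to be known before touching the bytes.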
Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission. Blogger contact: [email protected].
Notes on string handling and encoding format issues in C++ and C#