When tinyxml parses an XML file in UTF-8 format, if the file contains the following two strings: "<Name> classic document </Name>" and "<Name> News Express </Name>" fail to be resolved.
AnalysisCodeThe cause of the failure is the following code:
Function in tinyxmlparser. cpp file: const char * tixmlbase: readtext ()
Int Len; char Carr [4] = {0, 0, 0}; P = getchar (p, Carr, & Len, encoding); If (LEN = 1) (* Text) + = Carr [0]; // more efficientelsetext-> append (Carr, Len );
The preliminary analysis is to resolve the problem of UTF-8 string.
The following table is used for parsing:
Const int tixmlbase: utf8bytetable [256] = {// 0 1 2 3 4 5 6 7 8 9 a B c d e F1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // 0x001, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, // 0x101, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // 0x201, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // 0x301, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // 0x401, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // 0X501, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // 0x601, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // 0x70 end of ASCII range1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // 0x80 0x80 to 0xc1 invalid1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // 0x90 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // 0xa0 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // 0xb0 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, // 0xc0 0xc2 to 0xdf 2 byte2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, // 0xd03, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, // 0xe0 0xe0 to 0xef 3 byte4, 4, 4, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 // 0xf0 0xf0 to 0xf4 4 byte, 0xf5 and higher invalid };
This table can be accessed by Google on the Internet. The reason for parsing failure is pending.