Converting between UTF-8 and GB2312
Author: Wu kangbin
I believe many developers run into character encoding problems, and they are a real headache. These errors are latent, and it takes some experience in this area to recognize them. They come up especially often when processing XML documents. I once worked on a system whose server was written in Java and whose client was written in VC, with the whole interaction protocol expressed in XML. During communication the received data came out wrong, which puzzled me. I captured the traffic with a network packet capture tool and found that the XML header sent by the Java side was <?xml version="1.0" encoding="UTF-8"?>, while VC defaults to GB2312, so the Chinese characters were garbled. I have seen very few articles on this subject, so here I introduce a conversion program I wrote. The program is very simple, and I hope the experts will not laugh at it.
If you are not yet familiar with UTF-8, Unicode, GB2312 and so on, please see http://www.linuxforum.net/books/4268-unicode.html; I will not repeat that material here. The conversion relies on two WinAPI functions, WideCharToMultiByte and MultiByteToWideChar.
Function prototypes:
int WideCharToMultiByte(
    UINT CodePage,            // code page
    DWORD dwFlags,            // performance and mapping flags
    LPCWSTR lpWideCharStr,    // wide-character string
    int cchWideChar,          // number of chars in string
    LPSTR lpMultiByteStr,     // buffer for new string
    int cbMultiByte,          // size of buffer
    LPCSTR lpDefaultChar,     // default for unmappable chars
    LPBOOL lpUsedDefaultChar  // set when default char used
); // converts a wide-character string to a multibyte string

int MultiByteToWideChar(
    UINT CodePage,            // code page
    DWORD dwFlags,            // character-type options
    LPCSTR lpMultiByteStr,    // string to map
    int cbMultiByte,          // number of bytes in string
    LPWSTR lpWideCharStr,     // wide-character buffer
    int cchWideChar           // size of buffer
); // converts a multibyte string to a wide-character string
The following helper functions are required:
CString CXmlProcess::HexToBin(CString string) // convert one hexadecimal digit to four binary digits
{
    string.MakeUpper(); // accept both "a" and "A"
    if (string == "0") return "0000";
    if (string == "1") return "0001";
    if (string == "2") return "0010";
    if (string == "3") return "0011";
    if (string == "4") return "0100";
    if (string == "5") return "0101";
    if (string == "6") return "0110";
    if (string == "7") return "0111";
    if (string == "8") return "1000";
    if (string == "9") return "1001";
    if (string == "A") return "1010";
    if (string == "B") return "1011";
    if (string == "C") return "1100";
    if (string == "D") return "1101";
    if (string == "E") return "1110";
    if (string == "F") return "1111";
    return "";
}

CString CXmlProcess::BinToHex(CString BinString) // convert four binary digits to one hexadecimal digit
{
    if (BinString == "0000") return "0";
    if (BinString == "0001") return "1";
    if (BinString == "0010") return "2";
    if (BinString == "0011") return "3";
    if (BinString == "0100") return "4";
    if (BinString == "0101") return "5";
    if (BinString == "0110") return "6";
    if (BinString == "0111") return "7";
    if (BinString == "1000") return "8";
    if (BinString == "1001") return "9";
    if (BinString == "1010") return "A";
    if (BinString == "1011") return "B";
    if (BinString == "1100") return "C";
    if (BinString == "1101") return "D";
    if (BinString == "1110") return "E";
    if (BinString == "1111") return "F";
    return "";
}

int CXmlProcess::BinToInt(CString string) // convert an 8-character binary string to a decimal integer
{
    int len = 0;
    int tempInt = 0;
    int strInt = 0;
    for (int i = 0; i < string.GetLength(); i++)
    {
        tempInt = 1;
        strInt = (int)string.GetAt(i) - 48;  // '0' -> 0, '1' -> 1
        for (int k = 0; k < 7 - i; k++)      // weight of bit i is 2^(7-i)
        {
            tempInt = 2 * tempInt;
        }
        len += tempInt * strInt;
    }
    return len;
}
UTF-8 to GB2312: first convert the UTF-8 to Unicode, then convert the Unicode to GB2312 with the WideCharToMultiByte function.
WCHAR* CXmlProcess::UTF_8ToUnicode(char *ustart) // convert one three-byte UTF-8 character to Unicode
{
    char char_one;
    char char_two;
    char char_three;
    int Hchar;
    int Lchar;
    WCHAR *unicode;
    CString string_one;
    CString string_two;
    CString string_three;
    CString combiString;

    char_one = *ustart;
    char_two = *(ustart + 1);
    char_three = *(ustart + 2);
    string_one.Format("%X", char_one);
    string_two.Format("%X", char_two);
    string_three.Format("%X", char_three);
    // A negative char is printed as FFFFFFxx, so keep only the last two digits.
    string_three = string_three.Right(2);
    string_two = string_two.Right(2);
    string_one = string_one.Right(2);
    string_three = HexToBin(string_three.Left(1)) + HexToBin(string_three.Right(1));
    string_two = HexToBin(string_two.Left(1)) + HexToBin(string_two.Right(1));
    string_one = HexToBin(string_one.Left(1)) + HexToBin(string_one.Right(1));
    combiString = string_one + string_two + string_three; // 1110xxxx 10yyyyyy 10zzzzzz
    combiString = combiString.Right(20); // drop the leading "1110"
    combiString.Delete(4, 2);            // drop the "10" prefix of the second byte
    combiString.Delete(10, 2);           // drop the "10" prefix of the third byte
    Hchar = BinToInt(combiString.Left(8));
    Lchar = BinToInt(combiString.Right(8));
    // Allocate the result on the heap; returning the address of a local array would leave a dangling pointer.
    unicode = new WCHAR[1];
    *unicode = (WCHAR)((Hchar << 8) | Lchar);
    return unicode; // the caller must delete[] it
}

char* CXmlProcess::UnicodeToGB2312(unsigned short uData) // convert one Unicode character to GB2312
{
    char *buffer;
    buffer = new char[sizeof(WCHAR)]; // two bytes, not null-terminated
    // CP_ACP is the system ANSI code page, which covers GB2312 only on Simplified Chinese Windows;
    // pass 936 instead to select it explicitly.
    WideCharToMultiByte(CP_ACP, 0, (LPCWSTR)&uData, 1, buffer, sizeof(WCHAR), NULL, NULL);
    return buffer; // the caller must delete[] it
}
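The string manipulation above strips the 1110, 10 and 10 prefixes and joins the remaining 16 bits. As a minimal sketch, the same step can be written with bit masks and shifts. DecodeUtf8Three and DecodeZhongExample are hypothetical names that are not part of the original class, and the sketch does not reject overlong or surrogate encodings.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not in the original class): decodes one three-byte
// UTF-8 sequence 1110xxxx 10yyyyyy 10zzzzzz into the 16-bit value
// xxxxyyyyyyzzzzzz. Returns false if the bytes do not match that pattern.
bool DecodeUtf8Three(const unsigned char* p, std::uint16_t* out)
{
    if ((p[0] & 0xF0) != 0xE0 || (p[1] & 0xC0) != 0x80 || (p[2] & 0xC0) != 0x80)
        return false;
    *out = static_cast<std::uint16_t>(((p[0] & 0x0F) << 12) |
                                      ((p[1] & 0x3F) << 6) |
                                      (p[2] & 0x3F));
    return true;
}

// Usage: the UTF-8 bytes E4 B8 AD decode to U+4E2D.
inline std::uint16_t DecodeZhongExample()
{
    const unsigned char zhong[3] = {0xE4, 0xB8, 0xAD};
    std::uint16_t cp = 0;
    bool ok = DecodeUtf8Three(zhong, &cp);
    assert(ok);
    (void)ok;
    return cp;
}
```

The result could then be passed to WideCharToMultiByte exactly as in UnicodeToGB2312 above.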
GB2312 to UTF-8: first convert the GB2312 to Unicode with the MultiByteToWideChar function, then take the Unicode value apart and reassemble its bits into UTF-8.
WCHAR* CXmlProcess::GB2312ToUnicode(char *gbBuffer) // convert one two-byte GB2312 character to Unicode
{
    WCHAR *uniChar;
    uniChar = new WCHAR[1];
    ::MultiByteToWideChar(CP_ACP, MB_PRECOMPOSED, gbBuffer, 2, uniChar, 1);
    return uniChar; // the caller must delete[] it
}

char* CXmlProcess::UnicodeToUTF_8(WCHAR *UniChar) // convert one Unicode character to three-byte UTF-8
{
    char *buffer;
    CString strOne;
    CString strTwo;
    CString strThree;
    CString strFour;
    CString strAnd;
    buffer = new char[3];
    int hInt, lInt;
    hInt = (int)((*UniChar) / 256); // high byte
    lInt = (*UniChar) % 256;        // low byte
    CString string;
    // "%02X" always yields two digits; with "%x" a byte below 0x10 would lose its leading zero.
    string.Format("%02X", hInt);
    strTwo = HexToBin(string.Right(1));
    string = string.Left(string.GetLength() - 1);
    strOne = HexToBin(string.Right(1));
    string.Format("%02X", lInt);
    strFour = HexToBin(string.Right(1));
    string = string.Left(string.GetLength() - 1);
    strThree = HexToBin(string.Right(1));
    strAnd = strOne + strTwo + strThree + strFour; // the 16 bits of the character
    // The three-byte form assumes a value of U+0800 or above, which holds for Chinese characters.
    strAnd.Insert(0, "1110"); // prefix of the first byte
    strAnd.Insert(8, "10");   // prefix of the second byte
    strAnd.Insert(16, "10");  // prefix of the third byte
    strOne = strAnd.Left(8);
    strAnd = strAnd.Right(16);
    strTwo = strAnd.Left(8);
    strThree = strAnd.Right(8);
    *buffer = (char)BinToInt(strOne);
    buffer[1] = (char)BinToInt(strTwo);
    buffer[2] = (char)BinToInt(strThree);
    return buffer; // three bytes, not null-terminated; the caller must delete[] it
}
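Inserting 1110, 10 and 10 into the 16-bit string is equivalent to three shift-and-mask operations. The following is a minimal sketch of that equivalent form, not the author's code. EncodeUtf8Three is a hypothetical name, and the sketch assumes the value lies in U+0800..U+FFFF, the range that needs exactly three bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not in the original class): writes the three UTF-8
// bytes 1110xxxx 10yyyyyy 10zzzzzz for a 16-bit value xxxxyyyyyyzzzzzz.
// Assumption: cp is at least U+0800; smaller values need one or two bytes.
void EncodeUtf8Three(std::uint16_t cp, unsigned char out[3])
{
    assert(cp >= 0x0800);
    out[0] = static_cast<unsigned char>(0xE0 | (cp >> 12));          // top 4 bits
    out[1] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));  // middle 6 bits
    out[2] = static_cast<unsigned char>(0x80 | (cp & 0x3F));         // low 6 bits
}
```

For example, U+4E2D becomes the bytes E4 B8 AD, and U+4E00, whose low byte is zero, becomes E4 B8 80.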
Example: converting a GB2312 buffer to UTF-8:
char* CXmlProcess::TranslateCharToUTF_8(char *xmlStream, int len)
{
    int newCharLen = 0;
    int oldCharLen = 0;
    int revCharLen = len;
    char *newCharBuffer;
    char *finalCharBuffer;
    char *buffer;
    // Worst case: every two GB2312 bytes become three UTF-8 bytes, plus one byte for the terminator.
    newCharBuffer = new char[int(1.5 * revCharLen) + 1];
    while (oldCharLen < revCharLen)
    {
        if (*(xmlStream + oldCharLen) >= 0) // ASCII byte (assumes char is signed): copy unchanged
        {
            *(newCharBuffer + newCharLen) = *(xmlStream + oldCharLen);
            newCharLen++;
            oldCharLen++;
        }
        else // two-byte GB2312 character
        {
            WCHAR *pbuffer = this->GB2312ToUnicode(xmlStream + oldCharLen);
            buffer = this->UnicodeToUTF_8(pbuffer);
            *(newCharBuffer + newCharLen) = *buffer;
            *(newCharBuffer + newCharLen + 1) = *(buffer + 1);
            *(newCharBuffer + newCharLen + 2) = *(buffer + 2);
            delete[] pbuffer; // release the per-character buffers
            delete[] buffer;
            newCharLen += 3;
            oldCharLen += 2;
        }
    }
    newCharBuffer[newCharLen] = '\0';
    finalCharBuffer = new char[newCharLen + 1];
    memcpy(finalCharBuffer, newCharBuffer, newCharLen + 1);
    delete[] newCharBuffer;
    return finalCharBuffer; // the caller must delete[] it
}
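The factor of 1.5 in the buffer size follows from the loop: an ASCII byte is copied as one byte, and each two-byte GB2312 character is written as three UTF-8 bytes. As a rough sketch, the bound can be computed exactly before allocating. MaxUtf8LengthForGb2312 is a hypothetical helper, not part of the original class. It returns an upper bound, because a few GB2312 symbols and Greek or Cyrillic letters map to code points below U+0800 and need only two UTF-8 bytes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not in the original class): upper bound on the number
// of UTF-8 bytes needed for len bytes of GB2312 text, excluding the terminator.
// Bytes below 0x80 are ASCII (1 byte out); any other byte is taken as the
// first half of a two-byte character (at most 3 bytes out).
std::size_t MaxUtf8LengthForGb2312(const char* src, std::size_t len)
{
    assert(src != nullptr || len == 0);
    std::size_t out = 0;
    std::size_t i = 0;
    while (i < len) {
        if ((static_cast<unsigned char>(src[i]) & 0x80) == 0) {
            out += 1;
            i += 1;
        } else if (i + 1 < len) {
            out += 3;
            i += 2;
        } else {
            break;  // truncated two-byte character at the end: ignore it
        }
    }
    return out;
}
```

Testing the high bit through unsigned char also avoids relying on char being signed, which the comparison with 0 in the loop above depends on.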
The program is very simple, so I will not describe it in detail. Besides, I am broke and have eaten nothing but instant noodles for two days, so I am too dizzy to write more. It is rare for a programmer to sink to my level. My salary was cut, and there is nothing to be done about it. Hey!!!!