Interchange between UTF-8 and GB2312

Last Update: 2018-12-03  Source: Internet

Author: User

Author: Wu Kangbin

I believe many developers run into character encoding problems, and they are a real headache. Because these errors are latent, you need experience in this area to recognize them, and they show up especially often when processing XML documents. I once wrote a server program in Java whose client was written in VC (Visual C++). The two sides exchanged messages using an XML-based protocol, and the data received during communication came out wrong. Puzzled, I captured the traffic with a packet capture tool and found that the XML declaration on the Java side was <?xml version="1.0" encoding="UTF-8"?>, while the VC side defaulted to GB2312, so the Chinese character data was corrupted.

There are very few articles on this subject, so I will introduce a conversion program I wrote for this problem. The program is, of course, very simple, so please go easy on it.
If you are not yet familiar with UTF-8, Unicode, GB2312 and related encodings, see http://www.linuxforum.net/books/4268-unicode.html; I won't repeat that material here. The conversion relies on two Win32 API functions: WideCharToMultiByte and MultiByteToWideChar.

Function prototypes:

    // Converts a wide-character (UTF-16) string to a multibyte (narrow) string
    int WideCharToMultiByte(
        UINT    CodePage,           // code page
        DWORD   dwFlags,            // performance and mapping flags
        LPCWSTR lpWideCharStr,      // wide-character string
        int     cchWideChar,        // number of chars in string
        LPSTR   lpMultiByteStr,     // buffer for new string
        int     cbMultiByte,        // size of buffer
        LPCSTR  lpDefaultChar,      // default for unmappable chars
        LPBOOL  lpUsedDefaultChar   // set when default char used
    );

    // Converts a multibyte (narrow) string to a wide-character (UTF-16) string
    int MultiByteToWideChar(
        UINT    CodePage,           // code page
        DWORD   dwFlags,            // character-type options
        LPCSTR  lpMultiByteStr,     // string to map
        int     cbMultiByte,        // number of bytes in string
        LPWSTR  lpWideCharStr,      // wide-character buffer
        int     cchWideChar         // size of buffer
    );

The following functions are required:

    // Lookup table shared by HexToBin and BinToHex: the 4-bit pattern of each hex digit.
    static const char* const kNibbleBits[16] = {
        "0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111",
        "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111" };

    // Convert one hexadecimal digit to a 4-character binary string, e.g. "e" -> "1110".
    // Both cases are accepted; CString::Format("%x") produces lowercase digits.
    CString CXMLProcess::HexToBin(CString string)
    {
        string.MakeLower();
        int pos = CString("0123456789abcdef").Find(string);
        if (string.GetLength() != 1 || pos < 0)
            return "";
        return kNibbleBits[pos];
    }

    // Convert a 4-character binary string to one lowercase hexadecimal digit, e.g. "1010" -> "a".
    CString CXMLProcess::BinToHex(CString binstring)
    {
        for (int i = 0; i < 16; i++)
            if (binstring == kNibbleBits[i])
                return CString("0123456789abcdef"[i]);
        return "";
    }

    // Convert a binary string to its decimal value, e.g. "11100100" -> 228.
    int CXMLProcess::BinToInt(CString string)
    {
        int value = 0;
        for (int i = 0; i < string.GetLength(); i++)
            value = value * 2 + (string.GetAt(i) - '0');   // shift in one bit at a time
        return value;
    }
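The helpers above spell out every bit pattern as a string. As a hedged alternative, the C++ standard library can perform the same conversions directly. The sketch below assumes standard C++11; `ByteToBin`, `BinToInt` and `LeadPayload` are hypothetical names for illustration, not members of the original class.

```cpp
#include <bitset>
#include <cassert>
#include <string>

// Hypothetical helper: byte -> 8-character binary string, e.g. 0xE4 -> "11100100".
std::string ByteToBin(unsigned char byte) {
    return std::bitset<8>(byte).to_string();
}

// Hypothetical helper: binary string -> integer, e.g. "11100100" -> 228 (base 2 for std::stoi).
int BinToInt(const std::string& bits) {
    assert(!bits.empty());
    return std::stoi(bits, nullptr, 2);
}

// Hypothetical helper: the same kind of bit field extracted without strings, using a mask.
// Here: the 4 payload bits of a 3-byte UTF-8 lead byte (1110xxxx).
int LeadPayload(unsigned char lead) {
    return lead & 0x0F;
}
```

The mask form is what the UTF-8 conversion actually needs; the string round trip is only useful for inspecting bit patterns.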

UTF-8 to GB2312: first convert the UTF-8 sequence to Unicode, then convert the Unicode character to GB2312 with WideCharToMultiByte.

    // Decode one 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx) into a UTF-16 code unit.
    // ustart must point to 3 valid bytes. Only 3-byte sequences (U+0800..U+FFFF) are handled,
    // which covers the CJK characters. The value is returned by value, so nothing dangles.
    WCHAR CXMLProcess::UTF_8ToUnicode(char* ustart)
    {
        CString string_one, string_two, string_three, combistring;
        // "%02x" of the unsigned byte always yields exactly two hex digits
        string_one.Format("%02x", (unsigned char)ustart[0]);
        string_two.Format("%02x", (unsigned char)ustart[1]);
        string_three.Format("%02x", (unsigned char)ustart[2]);
        string_one   = HexToBin(string_one.Left(1))   + HexToBin(string_one.Right(1));
        string_two   = HexToBin(string_two.Left(1))   + HexToBin(string_two.Right(1));
        string_three = HexToBin(string_three.Left(1)) + HexToBin(string_three.Right(1));
        combistring = string_one + string_two + string_three;   // 24 bits
        combistring = combistring.Right(20);   // drop the "1110" lead-byte marker
        combistring.Delete(4, 2);              // drop the "10" marker of the second byte
        combistring.Delete(10, 2);             // drop the "10" marker of the third byte: 16 bits left
        int hchar = BinToInt(combistring.Left(8));
        int lchar = BinToInt(combistring.Right(8));
        return (WCHAR)((hchar << 8) | lchar);
    }

    // Convert one UTF-16 code unit to GB2312 using code page 936 (GBK, a superset of GB2312);
    // CP_ACP would work only where the system ANSI code page is 936.
    // The result is NUL-terminated: 1 byte for ASCII, 2 bytes for a Chinese character.
    // The caller must delete[] the returned buffer.
    char* CXMLProcess::UnicodeToGB2312(WCHAR udata)
    {
        char* buffer = new char[3];
        int n = ::WideCharToMultiByte(936, 0, &udata, 1, buffer, 2, NULL, NULL);
        buffer[n] = '\0';   // n is 0 if the conversion failed
        return buffer;
    }
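The string manipulation in UTF_8ToUnicode amounts to keeping the payload bits of each byte. A minimal sketch of the same decoding with shifts and masks follows, assuming standard C++; `DecodeUtf8` is a hypothetical helper. Unlike the routine above it also accepts the 1- and 2-byte forms and rejects overlong encodings, but it stays within the BMP (one WCHAR) and does not check for surrogate code points.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical helper: decode one UTF-8 sequence of 1 to 3 bytes starting at s (n bytes
// available) into a BMP code point, storing the sequence length in *consumed.
// Overlong forms are rejected; 4-byte sequences (above U+FFFF) are out of scope.
std::uint16_t DecodeUtf8(const unsigned char* s, std::size_t n, std::size_t* consumed) {
    assert(s != nullptr && consumed != nullptr);
    if (n >= 1 && s[0] < 0x80) {                                       // 0xxxxxxx
        *consumed = 1;
        return s[0];
    }
    if (n >= 2 && (s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {    // 110xxxxx 10xxxxxx
        unsigned cp = ((s[0] & 0x1Fu) << 6) | (s[1] & 0x3Fu);
        if (cp >= 0x80) {
            *consumed = 2;
            return static_cast<std::uint16_t>(cp);
        }
    } else if (n >= 3 && (s[0] & 0xF0) == 0xE0 &&
               (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {       // 1110xxxx 10xxxxxx 10xxxxxx
        unsigned cp = ((s[0] & 0x0Fu) << 12) | ((s[1] & 0x3Fu) << 6) | (s[2] & 0x3Fu);
        if (cp >= 0x800) {
            *consumed = 3;
            return static_cast<std::uint16_t>(cp);
        }
    }
    throw std::invalid_argument("not a valid 1-3 byte UTF-8 sequence");
}
```

For example, the bytes E4 B8 AD decode to U+4E2D (中) with a length of 3.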

GB2312 to UTF-8: first convert the GB2312 character to Unicode with MultiByteToWideChar, then split the Unicode value into bit groups and reassemble them as UTF-8.

    // Convert one double-byte GB2312 character to a UTF-16 code unit using code page 936
    // (GBK, a superset of GB2312); CP_ACP would work only where the system ANSI code page is 936.
    // The caller must delete[] the returned buffer.
    WCHAR* CXMLProcess::GB2312ToUnicode(char* gbbuffer)
    {
        WCHAR* unichar = new WCHAR[1];
        unichar[0] = 0;
        ::MultiByteToWideChar(936, MB_PRECOMPOSED, gbbuffer, 2, unichar, 1);
        return unichar;
    }

    // Encode one UTF-16 code unit as a 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx).
    // Always emitting 3 bytes is correct only for U+0800..U+FFFF; GB2312 also contains
    // characters below U+0800 (Greek and Cyrillic letters, symbols such as U+00A7),
    // which need the 2-byte form. The caller must delete[] the returned 3-byte buffer.
    char* CXMLProcess::UnicodeToUTF_8(WCHAR* unichar)
    {
        CString string, strand;
        string.Format("%04x", (unsigned int)*unichar);   // exactly 4 hex digits, leading zeros kept
        for (int i = 0; i < 4; i++)
            strand += HexToBin(string.Mid(i, 1));        // 16-character binary string
        strand.Insert(0, "1110");   // lead-byte marker
        strand.Insert(8, "10");     // continuation marker of the second byte
        strand.Insert(16, "10");    // continuation marker of the third byte: 24 bits total
        char* buffer = new char[3];
        buffer[0] = (char)BinToInt(strand.Left(8));
        buffer[1] = (char)BinToInt(strand.Mid(8, 8));
        buffer[2] = (char)BinToInt(strand.Right(8));
        return buffer;
    }
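UnicodeToUTF_8 always produces three bytes. A hedged sketch of a general BMP encoder that picks the 1-, 2- or 3-byte form by range is shown below; it avoids overlong output for GB2312 characters below U+0800. It assumes standard C++, and `EncodeUtf8` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: encode a BMP code point as UTF-8 using the shortest form.
// Lone surrogates (U+D800..U+DFFF) are not valid on their own and must not be passed.
std::string EncodeUtf8(std::uint16_t cp) {
    assert(cp < 0xD800 || cp > 0xDFFF);
    std::string out;
    if (cp < 0x80) {                                     // 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {                             // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                                             // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For U+4E2D (中) this yields the same three bytes as UnicodeToUTF_8 (E4 B8 AD), while U+00A7 (§) becomes the two bytes C2 A7.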

Example: converting a GB2312 stream to UTF-8:

    // Convert a GB2312-encoded byte stream of length len to a NUL-terminated UTF-8 string.
    // The caller must delete[] the result.
    char* CXMLProcess::TranslateCharToUTF_8(char* xmlstream, int len)
    {
        int newcharlen = 0;
        int oldcharlen = 0;
        // Worst case: every 2-byte GB2312 character becomes 3 UTF-8 bytes (1.5 * len), plus the NUL
        char* newcharbuffer = new char[len + len / 2 + 1];
        while (oldcharlen < len)
        {
            if ((unsigned char)xmlstream[oldcharlen] < 0x80)
            {
                // ASCII byte: identical in GB2312 and UTF-8, copy it as is
                newcharbuffer[newcharlen++] = xmlstream[oldcharlen++];
            }
            else
            {
                if (oldcharlen + 1 >= len)
                    break;   // truncated double-byte character at the end of the input
                // Double-byte GB2312 character: GB2312 -> Unicode -> 3-byte UTF-8
                WCHAR* pbuffer = GB2312ToUnicode(xmlstream + oldcharlen);
                char* buffer = UnicodeToUTF_8(pbuffer);
                memcpy(newcharbuffer + newcharlen, buffer, 3);
                delete[] pbuffer;
                delete[] buffer;
                newcharlen += 3;
                oldcharlen += 2;
            }
        }
        newcharbuffer[newcharlen] = '\0';
        return newcharbuffer;
    }
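The loop in TranslateCharToUTF_8 can be exercised without Windows. The sketch below is an illustration under stated assumptions: `Gb2312ToUnicode` is a hypothetical two-entry table standing in for MultiByteToWideChar with code page 936, and all function names are invented for the example. The loop structure (copy ASCII bytes, convert byte pairs, reserve 1.5 times the input size) follows the routine above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for MultiByteToWideChar(936, ...): a two-entry GB2312 table.
// A real program would use the OS converter or a complete mapping table.
std::uint16_t Gb2312ToUnicode(unsigned char hi, unsigned char lo) {
    static const std::map<std::uint16_t, std::uint16_t> table = {
        {0xD6D0, 0x4E2D},   // 中
        {0xCEC4, 0x6587},   // 文
    };
    auto it = table.find(static_cast<std::uint16_t>((hi << 8) | lo));
    if (it == table.end()) throw std::out_of_range("character not in sample table");
    return it->second;
}

// Hypothetical helper: 3-byte UTF-8 form, valid for U+0800..U+FFFF (true for the sample table).
std::string ToUtf8ThreeBytes(std::uint16_t cp) {
    assert(cp >= 0x800);
    std::string out;
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
    return out;
}

// Hypothetical driver: ASCII bytes are copied, every other byte starts a 2-byte character.
std::string GbStreamToUtf8(const std::string& gb) {
    std::string out;
    out.reserve(gb.size() + gb.size() / 2);   // worst case: each pair grows from 2 to 3 bytes
    std::size_t i = 0;
    while (i < gb.size()) {
        unsigned char c = static_cast<unsigned char>(gb[i]);
        if (c < 0x80) {
            out += gb[i];
            ++i;
        } else {
            if (i + 1 >= gb.size()) throw std::invalid_argument("truncated GB2312 character");
            out += ToUtf8ThreeBytes(Gb2312ToUnicode(c, static_cast<unsigned char>(gb[i + 1])));
            i += 2;
        }
    }
    return out;
}
```

With the input bytes 41 D6 D0 CE C4 (the letter A followed by 中文 in GB2312), the result is 41 E4 B8 AD E6 96 87: one copied byte plus two 3-byte sequences, 7 bytes from 5, within the 1.5 times bound.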

The program is very simple. I'm broke and have been living on instant noodles for two days, so I'm a bit dizzy and won't describe the program in detail. It's rare for a programmer to end up in a state like mine, but what can you do when the pay is this low? Sigh!

This article is an English version of an article originally published in Chinese on aliyun.com.
