Converting Between Unicode and UTF-8 (Using MultiByteToWideChar)

Last Update:2016-12-21 Source: Internet

Author: User


1. Brief introduction

Recently, while sending network requests, I ran into garbled Chinese characters. The strings looked normal when debugging the code, and the text in the captured packets also displayed normally in a packet-capture tool, yet the server showed it garbled. The fix is to have the client and the server agree on one encoding (UTF-8). Our program generally uses Unicode (wide-character) strings, so Chinese characters must be converted to UTF-8 before they are sent; English letters and digits keep their ASCII byte values in UTF-8, so they need no special handling. The conversion is described below.

2. Code

Unicode to UTF-8

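#include <windows.h>
#include <stdlib.h>
#include <string.h>

// Convert a wide (UTF-16) string to a malloc'ed UTF-8 string.
// The caller must free() the result.
char* UnicodeToUtf8(const wchar_t* unicode)
{
    // With cchWideChar = -1 the returned size already includes the terminating '\0'.
    int len = WideCharToMultiByte(CP_UTF8, 0, unicode, -1, NULL, 0, NULL, NULL);
    char* szUtf8 = (char*)malloc(len + 1);
    if (szUtf8 == NULL) return NULL;
    memset(szUtf8, 0, len + 1);
    WideCharToMultiByte(CP_UTF8, 0, unicode, -1, szUtf8, len, NULL, NULL);
    return szUtf8;
}

int main(int argc, char* argv[])
{
    const wchar_t* wCharUnicode = L"中国";
    char* cCharUtf = UnicodeToUtf8(wCharUnicode);
    free(cCharUtf);
    return 0;
}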

The result is as follows:

After the conversion to UTF-8, the string looks different (garbled) when viewed in the Visual Studio debugger. To verify that the converted bytes are correct, we can use Notepad++. Create a new file containing the converted bytes and open it in Notepad++; the file encoding defaults to ANSI, and the text shows the same value as in the VS debugger.

After switching the file encoding to UTF-8, the text displays normally, which confirms that the conversion code is correct.
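The same check can also be done in code by printing the converted bytes in hexadecimal and comparing them with the expected UTF-8 sequence. The following is a minimal, portable sketch; `ToHex` is a hypothetical helper that does not appear in the original article.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not from the original article): format every byte of
// a string as two uppercase hex digits, separated by single spaces.
std::string ToHex(const std::string& bytes)
{
    std::string out;
    char buf[4];
    for (unsigned char b : bytes) {
        if (!out.empty()) out += ' ';
        std::snprintf(buf, sizeof(buf), "%02X", static_cast<unsigned>(b));
        out += buf;
    }
    return out;
}
```

For the string "中国", a correct UTF-8 conversion should give `E4 B8 AD E5 9B BD`, which is what `ToHex` returns when it is passed the result of `UnicodeToUtf8`.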

UTF-8 to Unicode

#include <windows.h>
#include <stdlib.h>
#include <string.h>
#include <atlstr.h>   // CString; assumes a Unicode build

CString UTF82WCS(const char* szU8)
{
    // First call: get the number of wide characters required.
    int wcsLen = ::MultiByteToWideChar(CP_UTF8, 0, szU8, (int)strlen(szU8), NULL, 0);
    // Allocate one extra element: with an explicit input length,
    // MultiByteToWideChar does not write the terminating '\0'.
    wchar_t* wszString = new wchar_t[wcsLen + 1];
    // Second call: do the conversion.
    ::MultiByteToWideChar(CP_UTF8, 0, szU8, (int)strlen(szU8), wszString, wcsLen);
    // Finally, append the terminating '\0'.
    wszString[wcsLen] = L'\0';
    CString unicodeString(wszString);
    delete[] wszString;
    return unicodeString;
}

int main(int argc, char* argv[])
{
    const wchar_t* wCharUnicode = L"中国";
    char* cCharUtf = UnicodeToUtf8(wCharUnicode);
    CString strUnicode = UTF82WCS(cCharUtf);
    free(cCharUtf);
    return 0;
}

The result shows that the UTF-8 string was successfully converted back to Unicode. The code is simple, but it still pays to think it through, practice, and read up on the details.
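To see roughly what the UTF-8 to Unicode direction involves, here is a portable sketch that decodes UTF-8 bytes into code points by hand. It is only an illustration, not how `MultiByteToWideChar` is implemented: `DecodeUtf8` is a hypothetical helper, it assumes well-formed input and does no validation, and it produces 32-bit code points rather than the UTF-16 units that `wchar_t` holds on Windows.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode well-formed UTF-8 into Unicode code points.
// Assumption: the input is valid UTF-8; malformed bytes are not detected.
std::u32string DecodeUtf8(const std::string& in)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        // Number of continuation bytes that follow this lead byte.
        int extra = b < 0x80 ? 0 : b < 0xE0 ? 1 : b < 0xF0 ? 2 : 3;
        // Keep only the payload bits of the lead byte.
        char32_t cp = extra == 0 ? b : (b & (0x3F >> extra));
        // Each continuation byte contributes its low 6 bits.
        for (int k = 0; k < extra && i + 1 < in.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[++i]) & 0x3F);
        ++i;
        out += cp;
    }
    return out;
}
```

Decoding the six bytes `E4 B8 AD E5 9B BD` this way yields the two code points U+4E2D and U+56FD, that is, "中国".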

A few small examples follow. The conversion results are noted in the comments of the test code.

Example One:

int length;
const wchar_t* wCharUnicode = L"中国你好";
length = (int)wcslen(wCharUnicode);       // length = 4
char* cCharUtf = UnicodeToUtf8(wCharUnicode);
length = (int)strlen(cCharUtf);           // length = 12
// Convert the UTF-8 char* directly to a CString
CString strUtf(cCharUtf);
length = strUtf.GetLength();              // length = 6
CString strUnicode = UTF82WCS(cCharUtf);
length = strUnicode.GetLength();          // length = 4

Example Two:

int length;
const wchar_t* wCharUnicode = L"中国,你好abc";
length = (int)wcslen(wCharUnicode);       // length = 8
char* cCharUtf = UnicodeToUtf8(wCharUnicode);
length = (int)strlen(cCharUtf);           // length = 16
// Convert the UTF-8 char* directly to a CString
CString strUtf(cCharUtf);
length = strUtf.GetLength();              // length = 10
CString strUnicode = UTF82WCS(cCharUtf);
length = strUnicode.GetLength();          // length = 8

Here an English (ASCII) comma is inserted between "中国" and "你好", and the result is still displayed correctly.

Example Three:

int length;
const wchar_t* wCharUnicode = L"中国，你好abc";
length = (int)wcslen(wCharUnicode);       // length = 8
char* cCharUtf = UnicodeToUtf8(wCharUnicode);
length = (int)strlen(cCharUtf);           // length = 18
// Convert the UTF-8 char* directly to a CString
CString strUtf(cCharUtf);
length = strUtf.GetLength();              // length = 10
CString strUnicode = UTF82WCS(cCharUtf);
length = strUnicode.GetLength();          // length = 8

Here a Chinese (full-width) comma is inserted between "中国" and "你好". The value of cCharUtf cannot be read directly in the VS debugger, but it can be converted to a CString to inspect it, and the result is correct.
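The `strlen` values in the three examples (12, 16 and 18) can be predicted from the wide string alone. The sketch below is one way to do that, assuming well-formed UTF-16 input; `Utf8ByteCount` is a hypothetical helper, and `std::u16string` stands in for the Windows `wchar_t` string so that the code stays portable.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of bytes the UTF-8 encoding of a UTF-16 string
// occupies, not counting the terminating '\0'.
// Assumption: the input is well-formed UTF-16 (surrogates come in pairs).
std::size_t Utf8ByteCount(const std::u16string& s)
{
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char16_t u = s[i];
        if (u < 0x80) {
            bytes += 1;                      // ASCII
        } else if (u < 0x800) {
            bytes += 2;
        } else if (u >= 0xD800 && u <= 0xDBFF) {
            bytes += 4;                      // surrogate pair: one 4-byte sequence
            ++i;                             // skip the low surrogate
        } else {
            bytes += 3;                      // rest of the BMP, e.g. common Chinese characters
        }
    }
    return bytes;
}
```

Passing the three test strings gives 12, 16 and 18, matching the `strlen` results noted in the examples.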

Summary

Across the three test cases, each Chinese character and each Chinese punctuation mark takes three bytes in UTF-8, while English letters and English punctuation take one byte each. A Chinese punctuation mark is therefore two bytes longer than its English counterpart, which deserves special attention. (According to reference material, UTF-8 is a variable-length encoding: 1 byte for ASCII, 2 bytes for alphabets such as Greek, 3 bytes for commonly used Chinese characters, and 4 bytes for the less common Chinese characters in the CJK extension sets.)
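The byte counts above follow from the first byte of each UTF-8 sequence, whose leading bits announce how long the sequence is. The following sketch shows this under the assumption of valid UTF-8 input; both function names are hypothetical and chosen only for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: length in bytes of a UTF-8 sequence, judged from its
// first byte. Returns 0 for a continuation byte or an invalid lead byte.
int Utf8SequenceLength(unsigned char lead)
{
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

// Hypothetical helper: count characters (code points) by counting every byte
// that is not a continuation byte (10xxxxxx). Assumes valid UTF-8.
std::size_t CountCodePoints(const std::string& utf8)
{
    std::size_t n = 0;
    for (unsigned char b : utf8)
        if ((b & 0xC0) != 0x80) ++n;
    return n;
}
```

For the 18-byte UTF-8 form of "中国，你好abc", `CountCodePoints` gives 8, the same value `wcslen` reports for the wide string in Example Three.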

We can also see that a CString reports different lengths depending on whether it received the Unicode string or the raw UTF-8 bytes. Note that these lengths are character counts, not byte counts. Although the UTF-8 data is longer than the Unicode data in these examples, that is not always the case, because UTF-8 uses a different number of bytes for different characters. Here "Unicode" means the two-byte wide-character encoding used on Windows (UTF-16). An ASCII character takes one byte in UTF-8 but two bytes in UTF-16, whereas a common Chinese character takes three bytes in UTF-8 but two bytes in UTF-16. Such differences in storage cost are one reason why several encoding formats (UTF-8, UTF-16, UTF-32 and others) exist side by side.
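The size comparison can be made concrete with a small function that reports how many bytes one code point needs in each encoding. This is an illustrative sketch; `EncodedSize` and `EncodedSizeOf` are hypothetical names, not part of any library.

```cpp
#include <cassert>

// Hypothetical type: storage needed for one code point in each encoding.
struct EncodedSize {
    int utf8Bytes;
    int utf16Bytes;
};

// Hypothetical helper: bytes needed to store one Unicode code point.
EncodedSize EncodedSizeOf(char32_t cp)
{
    assert(cp <= 0x10FFFF);  // outside this range is not a Unicode code point
    EncodedSize s;
    s.utf8Bytes  = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    s.utf16Bytes = cp < 0x10000 ? 2 : 4;  // above U+FFFF a surrogate pair is needed
    return s;
}
```

For example, `EncodedSizeOf(U'A')` reports 1 byte in UTF-8 and 2 in UTF-16, while `EncodedSizeOf(0x4E2D)` (the character 中) reports 3 and 2. A code point above U+FFFF, such as U+20000, needs 4 bytes in both.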

UTF stands for Unicode Transformation Format. UTF-8 is one implementation of Unicode, and its most notable feature is that it is a variable-length encoding: it uses 1 to 4 bytes to represent a symbol, with the byte count depending on the symbol.
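The variable-length scheme can be illustrated by encoding a single code point by hand. The sketch below follows the standard UTF-8 bit layout; `EncodeUtf8` is a hypothetical helper written for this illustration and is not how `WideCharToMultiByte` is implemented.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
std::string EncodeUtf8(char32_t cp)
{
    // Surrogates and values above U+10FFFF are not encodable code points.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {                       // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With this layout, `EncodeUtf8(0x4E2D)` produces the three bytes `E4 B8 AD`, and `EncodeUtf8(0x41)` produces the single byte `41`.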

There is a lot of material on the Internet about character encoding, and the accounts do not always agree, so it is worth cross-checking several sources.

http://blog.csdn.net/goforwardtostep/article/details/53207804
