VC++: Processing UTF-8 Encoded Strings

Last Update:2018-07-25 Source: Internet

Author: User


Open Notepad on Windows and save a file; the Save dialog offers four encoding choices. ANSI is the multibyte character set of the system code page, which corresponds to char strings in VC. Unicode means UTF-16 (little endian), which corresponds to WCHAR (wchar_t) strings in VC. Unicode big endian is UTF-16 with big-endian byte order (not UTF-32); it is used relatively rarely. UTF-8 is used by almost all web pages. It encodes every character in 1 to 4 bytes: English letters need only 1 byte, most Chinese characters need 3 bytes, and rarer ones need 4. Because much of the text sent over the network is English, UTF-8 usually saves bandwidth compared with UTF-16, which uses at least 2 bytes per character and 4 bytes for characters outside the Basic Multilingual Plane.
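To make the size comparison concrete, here is a minimal sketch that computes how many bytes a single Unicode code point occupies in UTF-8 and in UTF-16. The function names utf8_length and utf16_length are made up for this illustration; they are not part of VC or the Windows API.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bytes needed to encode one code point in UTF-8.
// Returns 0 for values that are not valid Unicode scalar values.
std::size_t utf8_length(std::uint32_t cp)
{
    if (cp <= 0x7F) return 1;                    // ASCII, e.g. English letters
    if (cp <= 0x7FF) return 2;                   // e.g. Latin with accents, Greek
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  // surrogates are not characters
    if (cp <= 0xFFFF) return 3;                  // most Chinese characters
    if (cp <= 0x10FFFF) return 4;                // supplementary planes
    return 0;                                    // beyond the Unicode range
}

// Hypothetical helper: bytes needed to encode one code point in UTF-16.
std::size_t utf16_length(std::uint32_t cp)
{
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  // surrogates are not characters
    if (cp <= 0xFFFF) return 2;                  // one 16-bit code unit
    if (cp <= 0x10FFFF) return 4;                // a surrogate pair
    return 0;
}
```

For example, the letter A (U+0041) takes 1 byte in UTF-8 but 2 in UTF-16, while the Chinese character U+4E2D takes 3 bytes in UTF-8 and 2 in UTF-16.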

Suppose we write a VC program that fetches HTML web page data encoded in UTF-8. Once the data is in a char array in our program, English text displays normally but Chinese text is garbled, because our char strings are treated as ANSI-encoded. There are generally two ways to convert UTF-8 to ANSI. One is to implement the conversion by hand: with a thorough understanding of these character encodings you can write it yourself, and a web search (for example on Baidu) turns up plenty of material and ready-made conversion functions. The other is to use a function library. Since we are writing for the Windows platform, we can use the API functions MultiByteToWideChar and WideCharToMultiByte. With these functions the conversion takes two steps. First, call MultiByteToWideChar to convert the UTF-8 encoded char string into a WCHAR string; the first parameter specifies the code page of the source string, which here is CP_UTF8. Then call WideCharToMultiByte to convert the WCHAR string into a char string, passing 936 as the first parameter; code page 936 is Simplified Chinese (GBK, a superset of GB2312). For more on code pages, see an encyclopedia entry such as Baidu Baike.
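For the manual approach, the following is a minimal portable sketch of the first step only: decoding UTF-8 bytes into UTF-16 code units without any Windows API. The function name utf8_to_utf16 and its use of std::string and std::u16string are assumptions made for this illustration; a production decoder may want different error handling.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decodes UTF-8 bytes into UTF-16 code units.
// Returns false on malformed input (bad lead byte, truncated sequence,
// overlong form, surrogate value, or a value above U+10FFFF).
bool utf8_to_utf16(const std::string &in, std::u16string &out)
{
    out.clear();
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;      // code point being assembled
        std::size_t extra = 0;     // number of continuation bytes expected
        std::uint32_t min = 0;     // smallest value allowed for this length
        if (b0 < 0x80)                { cp = b0;        extra = 0; min = 0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; min = 0x10000; }
        else return false;         // stray continuation byte or invalid lead byte

        if (in.size() - i <= extra) return false;   // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) return false;   // not of the form 10xxxxxx
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;          // overlong, out of range, or surrogate

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Characters above U+FFFF become a surrogate pair in UTF-16.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += extra + 1;
    }
    return true;
}
```

Converting the resulting UTF-16 text to code page 936 by hand would additionally require a GBK mapping table, which is one reason the Windows API route is usually simpler.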

The two functions I wrote for converting between ANSI and UTF-8 are posted below. The parameter is an MFC CString (in a multibyte-character-set build, where CString holds char data); if you want to pass in a C-style character array instead, you need to modify the functions slightly.

// UTF-8 to ANSI (code page 936). Requires <windows.h> and an MFC multibyte build.
void UTF8toANSI(CString &strUTF8)
{
	// Get the buffer size (in wide characters, terminator included) needed for the UTF-16 text, then create the wide-character buffer
	int nLen = MultiByteToWideChar(CP_UTF8, 0, strUTF8, -1, NULL, 0);
	WCHAR *wszBuffer = new WCHAR[nLen + 1];
	nLen = MultiByteToWideChar(CP_UTF8, 0, strUTF8, -1, wszBuffer, nLen);
	wszBuffer[nLen] = 0;

	// Get the buffer size (in bytes) needed for the code page 936 text, then create the multibyte buffer
	nLen = WideCharToMultiByte(936, 0, wszBuffer, -1, NULL, 0, NULL, NULL);
	CHAR *szBuffer = new CHAR[nLen + 1];
	nLen = WideCharToMultiByte(936, 0, wszBuffer, -1, szBuffer, nLen, NULL, NULL);
	szBuffer[nLen] = 0;

	strUTF8 = szBuffer;
	// Clean up memory
	delete[] szBuffer;
	delete[] wszBuffer;
}


// ANSI (code page 936) to UTF-8. Requires <windows.h> and an MFC multibyte build.
void ANSItoUTF8(CString &strAnsi)
{
	// Get the buffer size (in wide characters, terminator included) needed for the UTF-16 text, then create the wide-character buffer; 936 is the Simplified Chinese (GBK) code page
	int nLen = MultiByteToWideChar(936, 0, strAnsi, -1, NULL, 0);
	WCHAR *wszBuffer = new WCHAR[nLen + 1];
	nLen = MultiByteToWideChar(936, 0, strAnsi, -1, wszBuffer, nLen);
	wszBuffer[nLen] = 0;
	// Get the buffer size (in bytes) needed for the UTF-8 text, then create the multibyte buffer
	nLen = WideCharToMultiByte(CP_UTF8, 0, wszBuffer, -1, NULL, 0, NULL, NULL);
	CHAR *szBuffer = new CHAR[nLen + 1];
	nLen = WideCharToMultiByte(CP_UTF8, 0, wszBuffer, -1, szBuffer, nLen, NULL, NULL);
	szBuffer[nLen] = 0;

	strAnsi = szBuffer;
	// Clean up memory
	delete[] wszBuffer;
	delete[] szBuffer;
}

It is worth noting that a UTF-8 encoded string is normally stored in a char array, not in a WCHAR (wchar_t) array. Why? The code unit of UTF-8 is one byte: each character occupies 1 to 4 bytes, and some occupy only 1, so a char array is the natural container. A WCHAR is two bytes on Windows, so storing one-byte UTF-8 units in it does not fit, and any API that takes a WCHAR string would interpret the data as UTF-16 and produce wrong results.
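One consequence of storing UTF-8 in a char array is that the number of bytes and the number of characters differ. The sketch below, which assumes valid, NUL-terminated UTF-8 input and uses the made-up names utf8_byte_length and utf8_code_points, counts both.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: number of bytes (char elements) before the terminator.
std::size_t utf8_byte_length(const char *s)
{
    assert(s != nullptr);
    return std::strlen(s);
}

// Hypothetical helper: number of code points in a valid UTF-8 string.
// Every character contributes exactly one byte that is NOT a continuation
// byte (continuation bytes have the bit pattern 10xxxxxx), so counting the
// non-continuation bytes counts the characters.
std::size_t utf8_code_points(const char *s)
{
    assert(s != nullptr);
    std::size_t count = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the bytes "Hi" followed by the UTF-8 encodings of U+4E2D and U+6587, utf8_byte_length reports 8 while utf8_code_points reports 4.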

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
