How to correctly use MultiByteToWideChar and WideCharToMultiByte and detailed descriptions of parameters

Last Update:2018-12-07 Source: Internet

Author: User


This article is excerpted from "Windows Core Programming" (fifth edition), page 26.

The book describes the usage of these two functions in detail; I am recording it here only as a memo. For more information about the function parameters, see the Baidu Baike entries for MultiByteToWideChar and WideCharToMultiByte.

Function prototypes:

int MultiByteToWideChar(
    UINT CodePage,
    DWORD dwFlags,
    LPCSTR lpMultiByteStr,
    int cbMultiByte,
    LPWSTR lpWideCharStr,
    int cchWideChar
);
int WideCharToMultiByte(
    UINT CodePage,
    DWORD dwFlags,
    LPCWSTR lpWideCharStr,
    int cchWideChar,
    LPSTR lpMultiByteStr,
    int cbMultiByte,
    LPCSTR lpDefaultChar,
    LPBOOL lpUsedDefaultChar
);

The general steps are as follows:
MultiByteToWideChar:
1) Call MultiByteToWideChar with NULL for the lpWideCharStr parameter, 0 for the cchWideChar parameter, and -1 for the cbMultiByte parameter (the byte length of the source string). The return value is the number of wide characters required, including the terminating null.
2) Allocate a block of memory large enough to hold the converted Unicode string. Its size in bytes is the return value of the previous MultiByteToWideChar call multiplied by sizeof(wchar_t).
3) Call MultiByteToWideChar again, this time passing the buffer address as lpWideCharStr and the return value of the first call as cchWideChar. cchWideChar is measured in characters, not bytes, so do not multiply it by sizeof(wchar_t) here; doing so tells the function that the buffer is twice as large as it really is.
4) Use the converted string.
5) Free the memory block occupied by the Unicode string.

WideCharToMultiByte:
The procedure is the same, except that the return value of the first call is already the number of bytes required for the conversion, so no multiplication is needed.

Chapter 2 of Windows Core Programming (character and string processing) also covers many standard practices for handling characters and strings, for example whether to use the C runtime library string functions or the secure versions implemented by Microsoft with the _s suffix.

[Appendix] Windows core programming Chapter 2 pdf download: http://dl.dbank.com/c0parcjxsv

Detailed description of MultiByteToWideChar and WideCharToMultiByte parameters
The following part is taken from: http://www.cnblogs.com/wanghao111/archive/2009/05/25/1489021.html#2270293

WideCharToMultiByte
This function converts a wide-character string to a new string in the specified code page, such as ANSI or UTF-8. The new string does not have to use a multi-byte character set.
Parameters:

CodePage: The code page to convert to. It can be any installed or available code page, or one of the following values:
CP_ACP: the current system ANSI code page
CP_MACCP: the current system Macintosh code page
CP_OEMCP: the current system OEM (original equipment manufacturer) code page
CP_SYMBOL: the Symbol code page (42), available on Windows 2000 and later. I do not know what it is used for.
CP_THREAD_ACP: the ANSI code page of the current thread, available on Windows 2000 and later. I do not know what it is used for.
CP_UTF7: UTF-7. Both lpDefaultChar and lpUsedDefaultChar must be NULL when this value is used.
CP_UTF8: UTF-8. Both lpDefaultChar and lpUsedDefaultChar must be NULL when this value is used.
/* I think CP_ACP and CP_UTF8 are the most commonly used: the former converts wide characters to ANSI and the latter to UTF-8. */

dwFlags: Specifies how to handle characters that cannot be converted directly. The function runs faster when no flags are set; I set it to 0. The available values are:
WC_NO_BEST_FIT_CHARS: Converts any Unicode character that has no direct multi-byte equivalent to the default character specified by lpDefaultChar. In other words, if converting a character from Unicode to multi-byte and back again would not yield the same Unicode character, the function uses the default character. This flag can be used alone or together with other flags.
WC_COMPOSITECHECK: Converts composite characters to precomposed characters. It can be combined with any one of the last three flags below (WC_DISCARDNS, WC_SEPCHARS, WC_DEFAULTCHAR); if none of them is given, the behavior is the same as with WC_SEPCHARS.
WC_ERR_INVALID_CHARS: Makes the function fail when it encounters an invalid character, with GetLastError returning ERROR_NO_UNICODE_TRANSLATION. Without this flag the function still succeeds and invalid characters are silently dropped (Windows Vista and later substitute U+FFFD instead). This flag can only be used with UTF-8.
WC_DISCARDNS: Discards nonspacing characters during conversion. Used together with WC_COMPOSITECHECK.
WC_SEPCHARS: Generates separate characters during conversion. This is the default conversion behavior. Used together with WC_COMPOSITECHECK.
WC_DEFAULTCHAR: Replaces exception characters with the default character (most commonly '?') during conversion. Used together with WC_COMPOSITECHECK.
/* When WC_COMPOSITECHECK is specified, the function converts composite characters to precomposed characters. A composite character consists of a base character and a nonspacing character (such as the diacritics used in European languages and in Chinese pinyin), each with its own character value. A precomposed character has a single character value that represents the combination of the base character and the nonspacing character. When WC_COMPOSITECHECK is specified, the last three flags listed above customize the conversion rules for precomposed characters: they determine what the function does when a composite sequence in the wide string has no precomposed form. If none of them is specified, the function defaults to WC_SEPCHARS.
For the following code pages dwFlags must be 0; otherwise the function fails with the error code ERROR_INVALID_FLAGS: 50220, 50221, 50222, 50225, 50227, 50229, 52936, 54936, 57002 through 57011, 65000 (UTF-7), 42 (Symbol).
For UTF-8, dwFlags must be 0 or WC_ERR_INVALID_CHARS. Otherwise the function fails and sets the error code ERROR_INVALID_FLAGS, which you can retrieve by calling GetLastError. */

lpWideCharStr: The wide string to convert.
cchWideChar: Length of the string to convert, in characters. -1 means the string is null-terminated and is converted up to and including its terminating null.
lpMultiByteStr: The buffer that receives the converted string.
cbMultiByte: Size of the output buffer, in bytes. If it is 0, lpMultiByteStr is ignored and the function returns the required buffer size without writing any output.
lpDefaultChar: Pointer to the character to use when a character cannot be represented in the specified code page. If it is NULL, the system default character is used. If the code page requires this parameter to be NULL (CP_UTF7 and CP_UTF8) and it is not, the function fails and sets the error code ERROR_INVALID_PARAMETER.
lpUsedDefaultChar: Pointer to a BOOL flag that indicates whether the default character was used. If the code page requires this parameter to be NULL (CP_UTF7 and CP_UTF8) and it is not, the function fails and sets the error code ERROR_INVALID_PARAMETER. The function runs faster when both lpDefaultChar and lpUsedDefaultChar are NULL.
/* Note: using WideCharToMultiByte incorrectly can compromise the security of a program. Calling it can easily cause a buffer overrun, because the size of the input buffer pointed to by lpWideCharStr is counted in wide characters while the size of the output buffer pointed to by lpMultiByteStr is counted in bytes. To avoid an overrun, make sure to specify an appropriate size for the output buffer. My approach is to call WideCharToMultiByte once with cbMultiByte set to 0 to get the required buffer size, allocate the buffer, and then call WideCharToMultiByte again to fill it. For details, see the example code at the end of this article. In addition, converting from Unicode (UTF-16) to a non-Unicode character set can lose data, because that character set may have no character that represents a particular Unicode value. */

Return value: If the function succeeds and cbMultiByte is not 0, it returns the number of bytes written to lpMultiByteStr (this includes the terminating null when cchWideChar is -1). If cbMultiByte is 0, it returns the number of bytes required for the conversion. If the function fails, it returns 0.

MultiByteToWideChar
This function converts a multi-byte string to a wide-character (Unicode) string. The string to be converted does not have to use a multi-byte character set.
For the parameters, return value, and precautions of this function, see the WideCharToMultiByte description above. Only dwFlags is described briefly here.

dwFlags: Whether to convert to precomposed or composite wide characters, whether to use glyph characters in place of control characters, and how to handle invalid characters.
MB_PRECOMPOSED: Always use precomposed characters, that is, use a single character value for a base character and nonspacing character combination when one exists. This is the default and cannot be combined with MB_COMPOSITE.
MB_COMPOSITE: Always use composite (decomposed) characters, that is, a base character followed by a nonspacing character.
MB_ERR_INVALID_CHARS: If the function encounters an invalid character, it fails and GetLastError returns ERROR_NO_UNICODE_TRANSLATION. Without this flag, invalid characters are discarded (Windows Vista and later substitute U+FFFD instead).
MB_USEGLYPHCHARS: Use glyph characters instead of control characters.
/* For the following code pages dwFlags must be 0; otherwise the function fails with the error code ERROR_INVALID_FLAGS: 50220, 50221, 50222, 50225, 50227, 50229, 52936, 54936, 57002 through 57011, 65000 (UTF-7), 42 (Symbol). For UTF-8, dwFlags must be 0 or MB_ERR_INVALID_CHARS; otherwise the function fails with the error code ERROR_INVALID_FLAGS. */

Here is an example for reference. Test environment: VC 6.0 on a pirated copy of 32-bit Windows 7 Ultimate.


# Include <windows. h>
Int APIENTRY WinMain (HINSTANCE hInstance,
HINSTANCE hPrevInstance,
LPSTR lpCmdLine,
Int nCmdShow)
{
// TODO: Place code here.
Wchar_t wszTest [] = L "ziwuge ";
Wchar_t wszTestNew [] = L "ziwuge blog Park ";
Int nwszTestLen = lstrlenW (wszTest); // 6
Int nwszTestNewLen = lstrlenW (wszTestNew); // 9
Int nwszTestSize = sizeof (wszTest); // 14
Int nwszTestNewSize = sizeof (wszTestNew); // 20
Int nChar = WideCharToMultiByte (CP_ACP, 0, wszTestNew,-1, NULL, 0, NULL, NULL); // 13, the returned result contains the memory to be occupied by '\ 0'
NChar = nChar * sizeof (char); // 13. In fact, this step is not required. Please refer to the description above.
Char * szResult = new char [nChar];
ZeroMemory (szResult, nChar );
Int I = WideCharToMultiByte (CP_ACP, 0, wszTestNew,-1, szResult, nChar, NULL, NULL); // 13
Int nszResultLen = lstrlenA (szResult); // 12
Int nszResultSize = sizeof (szResult); // 4

Char szTest [] = "ziwuge ";
Char szTestNew [] = "ziwuge blog ";
Int nszTestLen = lstrlenA (szTest); // 6
Int nszTestNewLen = lstrlenA (szTestNew); // 12
Int nszTestSize = sizeof (szTest); // 7
Int nszTestNewSize = sizeof (szTestNew); // 13
Int nWChar = MultiByteToWideChar (CP_ACP, 0, szTestNew,-1, NULL, 0); // 10, the returned result contains the memory occupied by '\ 0'
NWChar = nWChar * sizeof (wchar_t); // 20
Wchar_t * wszResult = new wchar_t [nWChar];
ZeroMemory (wszResult, nWChar );
Int j = MultiByteToWideChar (CP_ACP, 0, szTestNew,-1, wszResult, nWChar); // 10
Int nwszResultLen = lstrlenW (wszResult); // 9
Int nwszResultSize = sizeof (wszResult); // 4
Return 0;
}
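
The listing reports 4 for sizeof(szResult) and sizeof(wszResult) because sizeof applied to a pointer yields the size of the pointer itself (4 bytes in a 32-bit build), not the size of the allocated buffer. The portable sketch below separates the three measurements; the names SizeReport and MeasureSizes are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical record of the three measurements compared in the listing.
struct SizeReport {
    std::size_t arrayBytes;    // sizeof applied to an array
    std::size_t pointerBytes;  // sizeof applied to a pointer
    std::size_t length;        // string length without the terminator
};

SizeReport MeasureSizes()
{
    char szTest[] = "ziwuge";                     // array of 7 chars
    char* pszCopy = new char[sizeof(szTest)];     // heap buffer of 7 bytes
    std::memcpy(pszCopy, szTest, sizeof(szTest));

    SizeReport report;
    report.arrayBytes = sizeof(szTest);     // 7: six characters plus '\0'
    report.pointerBytes = sizeof(pszCopy);  // pointer size: 4 in a 32-bit build, 8 in a 64-bit build
    report.length = std::strlen(pszCopy);   // 6

    delete[] pszCopy;
    return report;
}
```

To know the size of a heap buffer, keep the value that was passed to new[] (nChar or nWChar in the listing) rather than applying sizeof to the pointer.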

[References, with thanks]

http://www.cnblogs.com/wanghao111/tag/%E5%AE%BD%E5%AD%97%E7%AC%A6%E5%BA%93%E5%87%BD%E6%95%B0/
