MultiByteToWideChar and WideCharToMultiByte usage explanation

Last Update:2016-08-14 Source: Internet

Author: User

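I ran into this problem today while writing an INI file. request.newversion is a narrow (char) string, but the wide version of WritePrivateProfileString expects a wide string. Casting the pointer to LPCWSTR does not convert anything, so the string has to go through MultiByteToWideChar first:

    WCHAR temp[64];   // assumed size; must hold the version string plus the terminating null
    // strcpy_s(temp, request.newversion);
    MultiByteToWideChar(CP_ACP, 0, request.newversion, -1,
                        temp, sizeof(temp) / sizeof(temp[0]));
    WritePrivateProfileString(L"deviceinfo", L"firmwareversion",
                              temp /* (LPCWSTR)request.newversion */ /* L"1.0.15" */,
                              Getexpath() + L"Deviceinfo.ini");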

Note:

These two functions are conversion functions provided by Windows, so they are not portable.

The equivalent conversion functions in the standard C library are mbstowcs() and wcstombs().
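As a rough illustration of the portable route, here is a minimal sketch built on mbstowcs() and wcstombs(). The helper names ToWide and ToNarrow are made up for this example. Both helpers convert according to the current C locale (set with setlocale); in the default "C" locale only plain ASCII text is guaranteed to convert.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: multibyte string -> wide string, using the current C locale.
// Returns an empty string if the input contains an invalid multibyte sequence.
std::wstring ToWide(const std::string& narrow)
{
    // A wide string never has more characters than the input has bytes,
    // so size() + 1 elements always leave room for the terminating null.
    std::vector<wchar_t> buf(narrow.size() + 1, L'\0');
    std::size_t written = std::mbstowcs(buf.data(), narrow.c_str(), buf.size());
    if (written == static_cast<std::size_t>(-1))
        return std::wstring();
    assert(written < buf.size());
    return std::wstring(buf.data(), written);
}

// Hypothetical helper: wide string -> multibyte string, using the current C locale.
// Returns an empty string if a character cannot be represented.
std::string ToNarrow(const std::wstring& wide)
{
    // Each wide character needs at most MB_CUR_MAX bytes in the current locale.
    std::vector<char> buf(wide.size() * MB_CUR_MAX + 1, '\0');
    std::size_t written = std::wcstombs(buf.data(), wide.c_str(), buf.size());
    if (written == static_cast<std::size_t>(-1))
        return std::string();
    assert(written < buf.size());
    return std::string(buf.data(), written);
}
```

Unlike the Windows functions, these take no code page argument, so the encoding depends entirely on the locale that is active when they are called.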

I. Brief introduction to the functions

Required header files:

Header for the two functions: windows.h

#include <windows.h>

Header for the wchar_t type: wchar.h

#include <wchar.h>

(1) MultiByteToWideChar()

Purpose: maps a character string to a wide-character (Unicode) string. The input string does not have to come from a multibyte character set.

Function prototype:

int MultiByteToWideChar(
    UINT   CodePage,
    DWORD  dwFlags,
    LPCSTR lpMultiByteStr,
    int    cchMultiByte,
    LPWSTR lpWideCharStr,
    int    cchWideChar);

Parameters:

1> CodePage: the code page (character set) of the multibyte string being converted.

This parameter can be the identifier of any code page that is installed or available on the system. It can also be one of the following values:

Value	Description
CP_ACP	ANSI code page
CP_MACCP	not supported
CP_OEMCP	OEM code page
CP_SYMBOL	not supported
CP_THREAD_ACP	not supported
CP_UTF7	UTF-7 code page
CP_UTF8	UTF-8 code page

The "not supported" entries appear to come from the Windows CE documentation; desktop Windows accepts CP_MACCP, CP_SYMBOL, and CP_THREAD_ACP.

2> dwFlags: a set of bit flags that indicate whether to translate to precomposed or composite wide characters (when a composite form exists), whether to use glyph characters in place of control characters, and how to handle invalid characters. It can be a combination of the following constants:

MB_PRECOMPOSED: always use precomposed characters, that is, a base character combined with a nonspacing character has a single character value. This is the default. Cannot be used with MB_COMPOSITE.

MB_COMPOSITE: always use composite characters, that is, the base character and the nonspacing character have separate character values. Cannot be used with MB_PRECOMPOSED.

MB_ERR_INVALID_CHARS: if the function encounters an invalid input character, it fails and GetLastError returns ERROR_NO_UNICODE_TRANSLATION.

MB_USEGLYPHCHARS: use glyph characters instead of control characters.

A composite character consists of a base character and a nonspacing character, each with its own character value. A precomposed character has a single character value for the base/nonspacing combination. In the character è, e is the base character and the grave accent is the nonspacing character.

MB_PRECOMPOSED and MB_COMPOSITE are mutually exclusive, while MB_USEGLYPHCHARS and MB_ERR_INVALID_CHARS can be set regardless of the other flags. These flags are usually not needed, so the value is normally 0.

3> lpMultiByteStr: pointer to the string to be converted.

4> cchMultiByte: the size, in bytes, of the string pointed to by lpMultiByteStr. It can be set to -1, in which case the function determines the length itself and the converted result includes the terminating null character. The string must then be null-terminated; if it is not, passing -1 is unsafe (the call may fail, or may only appear to succeed). If this parameter is 0, the function fails.

5> lpWideCharStr: pointer to the buffer that receives the converted string.

6> cchWideChar: the size, in wide characters, of the buffer pointed to by lpWideCharStr. If this value is 0, the function performs no conversion and instead returns the number of wide characters required for the destination buffer lpWideCharStr.

Return value:

If the function succeeds and cchWideChar is not 0, the return value is the number of wide characters written to the buffer pointed to by lpWideCharStr.

If the function succeeds and cchWideChar is 0, the return value is the required size, in wide characters, of the buffer for the converted string. (Use this to find out how many wchar_t the conversion needs.)

If the function fails, the return value is 0.

To get more error information, call GetLastError(). It can return the following error codes:

ERROR_INSUFFICIENT_BUFFER, ERROR_INVALID_FLAGS, ERROR_INVALID_PARAMETER, ERROR_NO_UNICODE_TRANSLATION.

(2) WideCharToMultiByte()

Purpose: maps a Unicode (wide-character) string to a multibyte string.

Function prototype:

int WideCharToMultiByte(
    UINT    CodePage,
    DWORD   dwFlags,
    LPCWSTR lpWideCharStr,
    int     cchWideChar,
    LPSTR   lpMultiByteStr,
    int     cchMultiByte,
    LPCSTR  lpDefaultChar,
    LPBOOL  pfUsedDefaultChar);

Parameters:

The parameters are similar to those of MultiByteToWideChar(), with two additions:

lpDefaultChar and pfUsedDefaultChar: WideCharToMultiByte uses these two parameters only when it encounters a wide character that has no representation in the code page identified by the CodePage parameter. (Both are usually NULL.)

1> If a wide character cannot be converted, the function uses the character pointed to by lpDefaultChar. If that parameter is NULL (the usual case), the function uses the system default character, which is normally a question mark. This is dangerous for file names, because the question mark is a wildcard character.

2> pfUsedDefaultChar points to a Boolean variable. If at least one character in the Unicode string cannot be converted to an equivalent multibyte character, the function sets the variable to TRUE. If every character converts successfully, the function sets it to FALSE. Test the variable after the function returns to check whether the wide string was converted without loss.

Return value:

If the function succeeds and cchMultiByte is nonzero, the return value is the number of bytes written to the buffer pointed to by lpMultiByteStr.

If the function succeeds and cchMultiByte is zero, the return value is the required size, in bytes, of the buffer for the converted string. (Use this to find out how many char the conversion needs.)

If the function fails, the return value is 0.

To get more error information, call GetLastError(). It can return the following error codes:

ERROR_INSUFFICIENT_BUFFER, ERROR_INVALID_FLAGS, ERROR_INVALID_PARAMETER, ERROR_NO_UNICODE_TRANSLATION.

II. Methods of use

(1) Convert a multibyte string to a wide string:

1) Call MultiByteToWideChar() with the cchWideChar parameter set to 0 to obtain the required size of the receive buffer, in wide characters.

2) For cchMultiByte, pass the size of the input string in bytes. (Passing the exact size saves space. You can also pass -1, but then the string must end with a null character or the call will go wrong.)

3) Allocate a memory block large enough to hold the converted Unicode string.

The size of the block is given by the return value of the first MultiByteToWideChar() call. Other sizing strategies also work, but this one wastes no memory.

4) Call MultiByteToWideChar() again, this time passing the address of the buffer as lpWideCharStr and the return value of the first call as cchWideChar.

5) Use the converted string.

6) Free the memory block used by the receive buffer.

Example code:

    #include <windows.h>
    #include <stdio.h>
    #include <string.h>

    int main()
    {
        const char sbuf[] = "I'm the best";

        // Input size in bytes, not counting the terminating null
        int sbufSize = (int)strlen(sbuf);

        // First call: get the output size in wchar_t.
        // VC++ uses ANSI by default, so the first parameter is CP_ACP.
        int dbufSize = MultiByteToWideChar(CP_ACP, 0, sbuf, sbufSize, NULL, 0);
        if (dbufSize == 0) {
            printf("size query failed: %lu\n", GetLastError());
            return 1;
        }
        printf("wchar_t needed: %d\n", dbufSize);

        // One extra zero-initialized element, because the input length
        // excluded the null and the output is therefore not terminated.
        wchar_t* dbuf = new wchar_t[dbufSize + 1]();

        // Second call: convert
        int nRet = MultiByteToWideChar(CP_ACP, 0, sbuf, sbufSize, dbuf, dbufSize);
        if (nRet <= 0) {
            printf("conversion failed\n");
            switch (GetLastError()) {
            case ERROR_INSUFFICIENT_BUFFER:    printf("ERROR_INSUFFICIENT_BUFFER\n");    break;
            case ERROR_INVALID_FLAGS:          printf("ERROR_INVALID_FLAGS\n");          break;
            case ERROR_INVALID_PARAMETER:      printf("ERROR_INVALID_PARAMETER\n");      break;
            case ERROR_NO_UNICODE_TRANSLATION: printf("ERROR_NO_UNICODE_TRANSLATION\n"); break;
            }
        } else {
            printf("conversion succeeded\n");
            printf("%ls\n", dbuf);   // %ls prints a wide string
        }

        delete[] dbuf;   // allocated with new[], so release with delete[]
        return 0;
    }

Note: the two calls to MultiByteToWideChar() must use the same cchMultiByte value. Otherwise the second call can fail with an error such as an insufficient receive buffer.

(2) Convert a wide string to a multibyte (narrow) string

The steps mirror those in (1), so they are not repeated here.

Example code:

    #include <windows.h>
    #include <stdio.h>

    int main()
    {
        // Wide string to narrow string
        const wchar_t sbuf[] = L"I'm the best";

        // Get the destination buffer size in bytes. With -1 as the input
        // length, the count includes the terminating null.
        int dbufSize = WideCharToMultiByte(CP_OEMCP, 0, sbuf, -1, NULL, 0, NULL, NULL);
        if (dbufSize == 0) {
            printf("size query failed\n");
            return 1;
        }

        // Allocate the destination buffer, zero-initialized
        char* dbuf = new char[dbufSize]();

        // Convert
        int nRet = WideCharToMultiByte(CP_OEMCP, 0, sbuf, -1, dbuf, dbufSize, NULL, NULL);
        if (nRet <= 0) {
            printf("conversion failed\n");
        } else {
            printf("conversion succeeded\nafter convert: %s\n", dbuf);
        }

        delete[] dbuf;
        return 0;
    }

III. Garbled output from MultiByteToWideChar()

Some readers may have noticed that in the standard WinCE 4.2 or WinCE 5.0 SDK emulator this function does not work properly: the converted characters come out garbled.

Changing the MultiByteToWideChar() parameters makes no difference. This is not a code problem; the cause lies in the customized operating system image. It happens when the default language of the customized OS is not Chinese.

Because the default language of the Standard SDK is English, the problem is certain to occur there. It cannot be solved simply by changing the "default language" under "Regional Options" in the Control Panel. The default language has to be set to "Chinese" when the system is customized. In the system customization tool the option is located at Platform -> Setting -> default language; select "Chinese", then rebuild.

Unicode: the wide-character set
1. How do I get the number of characters in a string that mixes single-byte and double-byte characters?
Call _mbslen from the Microsoft Visual C++ run-time library, which works on multibyte strings (strings containing both single-byte and double-byte characters).
strlen cannot tell you how many characters such a string contains; it only reports how many bytes come before the terminating 0.
2. How do I manipulate a DBCS (double-byte character set) string?
Function	Description
PTSTR CharNext(LPCTSTR);	Returns the address of the next character in a string
PTSTR CharPrev(LPCTSTR, LPCTSTR);	Returns the address of the previous character in a string
BOOL IsDBCSLeadByte(BYTE);	Returns a nonzero value if the byte is the first byte of a DBCS character
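To make the difference between bytes and characters concrete, here is a small portable sketch. It does not call the Windows functions listed above. Instead it uses a hypothetical helper, IsGbkLeadByte, as a simplified stand-in for IsDBCSLeadByte, hard-coded for the GBK (code page 936) lead-byte range 0x81 to 0xFE, and a hypothetical CountGbkChars that walks the string one character at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for IsDBCSLeadByte, valid only for GBK (code page 936),
// whose lead bytes fall in the range 0x81..0xFE.
bool IsGbkLeadByte(unsigned char b)
{
    return b >= 0x81 && b <= 0xFE;
}

// Counts characters (not bytes) in a null-terminated GBK string.
std::size_t CountGbkChars(const char* s)
{
    assert(s != nullptr);
    std::size_t count = 0;
    while (*s != '\0') {
        // A lead byte followed by another byte forms one two-byte character.
        if (IsGbkLeadByte(static_cast<unsigned char>(*s)) && s[1] != '\0')
            s += 2;
        else
            s += 1;
        ++count;
    }
    return count;
}
```

For the four-byte GBK string "A\xD6\xD0" "B" (the letter A, one two-byte Chinese character, the letter B), strlen reports 4 while CountGbkChars reports 3.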
3. Why use Unicode?
(1) It makes it easy to exchange data between different languages.
(2) It lets you distribute a single binary .exe or DLL file that supports all languages.
(3) It improves the efficiency of the application.
Windows 2000 was developed from scratch using Unicode. If you call a Windows function and pass it an ANSI string, the system first converts the string to Unicode and then passes the Unicode string to the operating system. If you want the function to return an ANSI string, the system converts the Unicode string to ANSI before returning the result to your application. These conversions cost time and memory. By developing an application with Unicode from the start, you make it run more efficiently.
Windows CE is itself a Unicode operating system and does not support the ANSI Windows functions at all.
Windows 98 supports only ANSI, so you can only develop ANSI applications for it.
When Microsoft ported COM from 16-bit Windows to Win32, the company decided that all COM interface methods that take strings would accept only Unicode strings.
4. How do I write Unicode source code?
Microsoft designed the Windows API for Unicode in a way that minimizes the impact on your code. In fact, you can write a single source file that compiles either with or without Unicode. You only need to define two macros (UNICODE and _UNICODE) and recompile the source file.
The _UNICODE macro is used by the C run-time header files, while the UNICODE macro is used by the Windows header files. When you compile a source module, you normally define both macros together.
5. What are the Unicode data types defined by Windows?
Data type	Description
WCHAR	Unicode character
PWSTR	Pointer to a Unicode string
PCWSTR	Pointer to a constant Unicode string
The corresponding ANSI data types are CHAR, LPSTR, and LPCSTR.
The generic ANSI/Unicode data types are TCHAR, PTSTR, and LPCTSTR.
6. How do I work with Unicode?
Character set	Prefix	Example
ANSI	str	strcpy
Unicode	wcs	wcscpy
MBCS	_mbs	_mbscpy
ANSI/Unicode	_tcs	_tcscpy (C run-time library)
ANSI/Unicode	lstr	lstrcpy (Windows functions)
In Windows 2000, every function, new or obsolete, comes in both an ANSI and a Unicode version. The ANSI version ends with A and the Unicode version ends with W. Windows defines the generic name as follows:
#ifdef UNICODE
#define CreateWindowEx CreateWindowExW
#else
#define CreateWindowEx CreateWindowExA
#endif // !UNICODE
7. How do I represent Unicode string constants?
Character set	Example
ANSI	"string"
Unicode	L"string"
ANSI/Unicode	_T("string") or _TEXT("string"), for example: if (szError[0] == _TEXT('J')) { }
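The name mapping from question 6 and the literal macro from question 7 can be imitated in a few lines of portable code. This is only a sketch of the mechanism: DescribeA, DescribeW, Describe, MY_TEXT, and DescribeBuild are hypothetical names invented for the example, not Windows APIs.

```cpp
#include <cassert>
#include <string>

// Hypothetical "A" (ANSI) and "W" (wide) versions of one function.
std::string DescribeA(const char*)    { return "ANSI version"; }
std::string DescribeW(const wchar_t*) { return "Unicode version"; }

// The generic name and the literal macro both switch on UNICODE,
// in the same way the Windows headers do for real APIs.
#ifdef UNICODE
#define Describe   DescribeW
#define MY_TEXT(s) L##s
#else
#define Describe   DescribeA
#define MY_TEXT(s) s
#endif

// A call site written once; it compiles unchanged in either build.
inline std::string DescribeBuild()
{
    std::string result = Describe(MY_TEXT("hello"));
    assert(!result.empty());
    return result;
}
```

Compiling with UNICODE defined makes DescribeBuild() call DescribeW with a wide literal; without it, the same source calls DescribeA with a narrow literal.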
8. Why should I use operating system functions as much as possible?
Doing so slightly improves the performance of your application, because the operating system's string functions are heavily used by large applications such as the shell process, Explorer.exe. Since they are used so much, they are probably already loaded in RAM while your application runs.
Examples: StrCat, StrChr, StrCmp, and StrCpy (declared in ShlWApi.h).
9. How do I write an ANSI-and Unicode-compliant application?
(1) Think of a text string as an array of characters, not as an array of chars or an array of bytes.
(2) Use the generic data types (such as TCHAR and PTSTR) for text characters and strings.
(3) Use explicit data types (such as BYTE and PBYTE) for bytes, byte pointers, and data buffers.
(4) Use the TEXT macro for literal characters and strings.
(5) Perform global replacements (for example, replace PSTR with PTSTR).
(6) Fix string arithmetic. For example, functions usually expect a buffer size in characters, not bytes. This means you should pass (sizeof(szBuffer) / sizeof(TCHAR)) rather than sizeof(szBuffer). Conversely, if you need to allocate a memory block for a string and you know the number of characters in it, remember that memory is allocated in bytes. This means you should call malloc(nCharacters * sizeof(TCHAR)) rather than malloc(nCharacters).
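The arithmetic in point (6) can be sketched portably with wchar_t standing in for TCHAR. The helper names CharCount and DuplicateWide are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cwchar>

// Capacity of a fixed-size buffer in characters: bytes divided by bytes per character.
template <typename T, std::size_t N>
constexpr std::size_t CharCount(const T (&buffer)[N])
{
    return sizeof(buffer) / sizeof(buffer[0]);
}

// Duplicates a wide string. Lengths are counted in characters,
// but malloc is given a size in bytes. The caller frees with std::free.
wchar_t* DuplicateWide(const wchar_t* src)
{
    assert(src != nullptr);
    std::size_t nCharacters = std::wcslen(src) + 1;      // characters, including the null
    wchar_t* copy = static_cast<wchar_t*>(
        std::malloc(nCharacters * sizeof(wchar_t)));     // bytes
    if (copy != nullptr)
        std::wmemcpy(copy, src, nCharacters);            // count is in characters again
    return copy;
}
```

Passing sizeof(buffer) where a character count is expected overstates the capacity of a wide buffer by a factor of sizeof(wchar_t), which is the kind of overrun this rule is meant to prevent.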
10. How do I make a selective comparison of strings?
Call CompareString with one or more of the following flags.
Flag	Meaning
NORM_IGNORECASE	Ignore letter case
NORM_IGNOREKANATYPE	Do not distinguish between hiragana and katakana characters
NORM_IGNORENONSPACE	Ignore nonspacing characters
NORM_IGNORESYMBOLS	Ignore symbols
NORM_IGNOREWIDTH	Do not distinguish between a single-byte character and the same character as a double-byte character
SORT_STRINGSORT	Treat punctuation the same as symbols
11. How can I tell if a text file is ANSI or Unicode?
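Check the first two bytes of the file. If they are 0xFF and 0xFE (the little-endian Unicode byte order mark), the file is Unicode; otherwise treat it as ANSI. This is only a heuristic: a Unicode file saved without a byte order mark will be classified as ANSI.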
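A minimal sketch of that two-byte check follows. The function names are hypothetical, and the check looks only for the FF FE mark described above.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Returns true when the buffer starts with the bytes 0xFF 0xFE.
bool StartsWithUtf16LeBom(const unsigned char* data, std::size_t size)
{
    assert(data != nullptr || size == 0);
    return size >= 2 && data[0] == 0xFF && data[1] == 0xFE;
}

// Applies the same check to the first two bytes of a file.
// A file that is missing or shorter than two bytes is reported as not Unicode.
bool FileStartsWithUtf16LeBom(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    unsigned char head[2] = {0, 0};
    in.read(reinterpret_cast<char*>(head), 2);
    return StartsWithUtf16LeBom(head, static_cast<std::size_t>(in.gcount()));
}
```

The file must be opened in binary mode so that the two leading bytes reach the check unmodified.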
12. How can I tell if a string is ANSI or Unicode?
Use IsTextUnicode. IsTextUnicode applies a series of statistical and deterministic tests to guess the contents of the buffer. Because this is not an exact science, IsTextUnicode can return an incorrect result.
13. How do I convert a string between Unicode and ANSI?
The Windows function MultiByteToWideChar is used to convert a multibyte string into a wide string; the function WideCharToMultiByte converts a wide string into an equivalent multibyte string.

Reference: http://blog.csdn.net/xiaobai1593/article/details/7382984

Http://www.lxway.com/140002681.htm
