Unicode and ASCII Conversion

Source: Internet
Author: User
Tags ole

Whenever a COM method returns a string, that string is a Unicode string (this applies to every method written to the COM specification). Unicode is a character encoding set, similar to ASCII, but on Windows each character is stored in two bytes. If you want the string in a form that is easier to manipulate, convert it to a TCHAR string.
TCHAR and the functions beginning with _t (such as _tcscpy()) are designed to let you handle Unicode and ANSI strings with the same source code. In most cases, your code will work with ANSI strings and the ANSI Windows APIs. Even so, you should be familiar with the TCHAR type, particularly so that you recognize it when reading code written by others.
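A simplified picture of how one source file can serve both builds is sketched below. This is not the real <tchar.h>: the names MY_TCHAR, MY_T and my_tcslen are hypothetical stand-ins, chosen so they do not clash with the real header, and the real definitions are more extensive.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for TCHAR, _T() and _tcslen(); not the real <tchar.h>.
#ifdef _UNICODE
typedef wchar_t MY_TCHAR;
#define MY_T(s) L##s
inline std::size_t my_tcslen(const MY_TCHAR* s) { return std::wcslen(s); }
#else
typedef char MY_TCHAR;
#define MY_T(s) s
inline std::size_t my_tcslen(const MY_TCHAR* s) { return std::strlen(s); }
#endif

// The same source lines compile in both the ANSI and the Unicode build.
inline std::size_t GreetingLength() {
    const MY_TCHAR* greeting = MY_T("hello");
    return my_tcslen(greeting);  // counts characters, not bytes
}
```

Defining _UNICODE before compiling switches every MY_TCHAR to wchar_t without touching GreetingLength().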
When a COM method returns a Unicode string, you can convert it to a char string in one of the following ways:

1. Call the WideCharToMultiByte() API.
2. Call the CRT function wcstombs().
3. Use the CString constructor or assignment operator (MFC only).
4. Use the ATL string conversion macros.

1. WideCharToMultiByte()
You can use WideCharToMultiByte() to convert a Unicode string into an ANSI string. The prototype of this function is as follows:
int WideCharToMultiByte (
    UINT    CodePage,
    DWORD   dwFlags,
    LPCWSTR lpWideCharStr,
    int     cchWideChar,
    LPSTR   lpMultiByteStr,
    int     cbMultiByte,
    LPCSTR  lpDefaultChar,
    LPBOOL  lpUsedDefaultChar );

The parameters are described below:
CodePage
The code page to convert the Unicode characters into. You can pass CP_ACP to use the current ANSI code page. A single-byte code page is a set of 256 characters. Characters 0-127 are the same as in ASCII. Characters 128-255 differ from one code page to another and can contain graphic characters or letters with diacritics. Each language or region has its own code page, so using the correct code page is important for displaying accented characters correctly.
dwFlags
dwFlags determines how Windows handles "composite" Unicode characters, that is, a letter followed by a diacritic. An example of a composite character is è. If the character exists in the code page specified by CodePage, nothing special happens. Otherwise, Windows has to convert it to something else.
Passing WC_COMPOSITECHECK makes the API check for composite characters that do not map.
Passing WC_SEPCHARS makes Windows split the character in two: the letter followed by the diacritic, for example "e" followed by "`".
Passing WC_DISCARDNS makes Windows discard the diacritic.
Passing WC_DEFAULTCHAR makes Windows replace the composite character with the "default" character given in the lpDefaultChar parameter.
When WC_COMPOSITECHECK is set and none of the other three flags is given, the default behavior is WC_SEPCHARS.
lpWideCharStr
The Unicode string to convert.
cchWideChar
The length of lpWideCharStr in Unicode characters. You will usually pass -1, which indicates that the string is zero-terminated.
lpMultiByteStr
A char buffer that receives the converted string.
cbMultiByte
The size of lpMultiByteStr, in bytes.
lpDefaultChar
Optional. A one-character ANSI string containing the "default" character to insert when dwFlags contains WC_COMPOSITECHECK | WC_DEFAULTCHAR and a Unicode character cannot be mapped to an equivalent ANSI character. You can pass NULL to have the API use the system default character (a question mark).
lpUsedDefaultChar
Optional. A pointer to a BOOL that is set to indicate whether the default character was inserted into the ANSI string. You can pass NULL to ignore this information.

The following example shows how to use this API:
// Assume that you already have a Unicode string wszSomeString...
char szANSIString [MAX_PATH];

WideCharToMultiByte ( CP_ACP,                 // ANSI code page
                      WC_COMPOSITECHECK,      // check for accented characters
                      wszSomeString,          // source Unicode string
                      -1,                     // -1 means the string is zero-terminated
                      szANSIString,           // destination char string
                      sizeof(szANSIString),   // size of the buffer, in bytes
                      NULL,                   // use the system default character
                      NULL );                 // ignore this parameter
After this call, szANSIString contains the ANSI version of the Unicode string.
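The fixed MAX_PATH buffer above cannot hold longer strings. A common alternative is to call the API twice: once with a zero-sized buffer to ask how many bytes are needed, then again to convert. The following is a minimal sketch rather than a definitive implementation. The helper name WideToAnsi is hypothetical, dwFlags is left at 0 for brevity, and the #else branch is an assumption added only so the sketch builds outside Windows.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#else
#include <cstdlib>
#endif

// Hypothetical helper: converts a zero-terminated wide string to a char string.
std::string WideToAnsi(const wchar_t* wsz) {
#ifdef _WIN32
    // First call: a zero-sized buffer asks for the required size in bytes,
    // including the terminating zero because the input length is -1.
    int cb = WideCharToMultiByte(CP_ACP, 0, wsz, -1, NULL, 0, NULL, NULL);
    if (cb <= 0) return std::string();
    std::string out(static_cast<std::size_t>(cb), '\0');
    // Second call: convert into the correctly sized buffer.
    if (WideCharToMultiByte(CP_ACP, 0, wsz, -1, &out[0], cb, NULL, NULL) == 0)
        return std::string();
    out.resize(static_cast<std::size_t>(cb) - 1);  // drop the terminating zero
    return out;
#else
    // Assumption: outside Windows, fall back to the CRT (POSIX behavior for NULL).
    std::size_t cb = std::wcstombs(NULL, wsz, 0);  // bytes needed, excluding the zero
    if (cb == static_cast<std::size_t>(-1)) return std::string();
    std::string out(cb + 1, '\0');
    std::wcstombs(&out[0], wsz, cb + 1);
    out.resize(cb);
    return out;
#endif
}
```

A call such as WideToAnsi(wszSomeString) then returns a std::string sized to fit the input.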

2. wcstombs()
The CRT function wcstombs() is simpler to call, but it ends up calling WideCharToMultiByte(), so the final result is the same. The prototype is as follows:
size_t wcstombs (
    char*          mbstr,
    const wchar_t* wcstr,
    size_t         count );

The parameters are described below:
mbstr
A char buffer that receives the resulting ANSI string.
wcstr
The Unicode string to convert.
count
The size, in bytes, of the buffer pointed to by mbstr.

wcstombs() uses the WC_COMPOSITECHECK | WC_SEPCHARS flags in its call to WideCharToMultiByte(). Converting the Unicode string from the preceding example with wcstombs() gives the same result:

wcstombs ( szANSIString, wszSomeString, sizeof(szANSIString) );
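The one-line call above ignores the return value. wcstombs() returns (size_t)-1 when a character cannot be converted, and it does not write a terminating zero when the buffer fills up, so it is worth checking both cases. A minimal sketch follows. The helper name ConvertChecked is hypothetical, and the conversion follows the current C locale, which the caller is assumed to have set if non-ASCII text is expected.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: converts wcs into buf (cb bytes) and always
// zero-terminates buf. Returns false on a conversion error or truncation.
bool ConvertChecked(char* buf, std::size_t cb, const wchar_t* wcs) {
    if (cb == 0) return false;
    std::size_t n = std::wcstombs(buf, wcs, cb);
    if (n == static_cast<std::size_t>(-1)) {  // a character could not be converted
        buf[0] = '\0';
        return false;
    }
    if (n == cb) {            // buffer full: no terminating zero was written
        buf[cb - 1] = '\0';   // terminate by hand; the output is truncated
        return false;
    }
    return true;
}
```

With the buffer from the earlier example, the call would be ConvertChecked(szANSIString, sizeof(szANSIString), wszSomeString).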

3. CString
The MFC CString class has constructors and assignment operators that accept Unicode strings, so you can let CString do the conversion for you. For example:

// Assume there is a Unicode string wszSomeString...

CString str1 ( wszSomeString );  // convert with the constructor
CString str2;

str2 = wszSomeString;            // convert with the assignment operator

4. ATL macros
ATL has a convenient set of macros for string conversion. W2A() converts a Unicode string to an ANSI string (the mnemonic is "wide to ANSI"). Strictly speaking, OLE2A() is the more accurate choice, where "OLE" indicates that the string came from a COM or OLE source. The following is an example of using these macros:

#include <atlconv.h>

// Assume that there is a Unicode string wszSomeString...

{
    char szANSIString [MAX_PATH];
    USES_CONVERSION;  // declare the local variable used by the macros

    lstrcpy ( szANSIString, OLE2A(wszSomeString) );
}

The OLE2A() macro "returns" a pointer to the converted string, but the converted string is stored in a temporary stack variable, so you must make your own copy of it with lstrcpy(). Other macros worth knowing are W2T() (Unicode to TCHAR) and W2CT() (Unicode to a constant TCHAR string).
There is also an OLE2CA() macro (Unicode to a constant char string). In the preceding example, OLE2CA() is actually the more correct macro, because the second parameter of lstrcpy() is a const char*. This topic is discussed in more detail later.

On the other hand, if you do not need to do any complex processing on the string, you can simply keep it as a Unicode string. If you are writing a console application, use the global variable std::wcout to print Unicode strings, for example:

wcout << wszSomeString;

Keep in mind that std::wcout expects every string to be Unicode, so any "normal" strings must still be printed with std::cout. For string literals, add the prefix L to make them Unicode, for example:

wcout << L"The Oracle says..." << endl << wszOracleResponse;

If you keep strings in Unicode, there are two restrictions on your code:

- You must use the wcsXXX() Unicode string functions, such as wcslen().
- On Windows 9x, Unicode strings cannot be passed to most Windows APIs. To write applications that run on both 9x and NT, you must use the TCHAR type. For details, see MSDN.
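As an illustration of the first restriction, the wide counterparts of strlen(), strcpy() and strcmp() are wcslen(), wcscpy() and wcscmp(). A small sketch, with the hypothetical helper name CopyWide:

```cpp
#include <cstddef>
#include <cwchar>

// Hypothetical helper: copies src into dst, whose capacity is cch wide
// characters. Returns false if src (plus its terminating zero) does not fit.
bool CopyWide(wchar_t* dst, std::size_t cch, const wchar_t* src) {
    // wcslen() counts wide characters, not bytes.
    if (std::wcslen(src) + 1 > cch) return false;
    std::wcscpy(dst, src);
    return true;
}
```

Note that the capacity is given in characters. Passing sizeof(buffer) here would overstate it, because each wchar_t occupies more than one byte.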

String type definitions
A    ANSI character string.
W    Unicode (wide) character string.
T    Generic character string (equivalent to W when _UNICODE is defined, equivalent to A otherwise).
OLE  OLE character string (equivalent to W).

Conversion of Chinese Characters
I could not convert a string containing Chinese characters with the mbstowcs() function.
Using LPCOLESTR pwzStr = T2COLE(pszStr), where pszStr is an LPCSTR, the conversion succeeded.
Usage:
#include <atlconv.h>

USES_CONVERSION;
LPCOLESTR pwzStr = T2COLE(pszStr);  // pszStr is an LPCSTR
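One possible reason for mbstowcs() failing on Chinese text is that it converts according to the current C locale, which stays the minimal "C" locale until setlocale() is called. The following sketch works under that assumption and is not a diagnosis of the problem described above. The helper name AnsiToWide is hypothetical, and the environment's locale is assumed to match the encoding of the input bytes.

```cpp
#include <clocale>
#include <cstdlib>
#include <string>

// Hypothetical helper: converts a multibyte string using the user's locale.
std::wstring AnsiToWide(const char* psz) {
    std::setlocale(LC_CTYPE, "");  // adopt the environment's locale (assumption)
    // With a NULL destination, the call only counts the wide characters needed.
    std::size_t cch = std::mbstowcs(NULL, psz, 0);
    if (cch == static_cast<std::size_t>(-1)) return std::wstring();  // invalid input
    std::wstring out(cch + 1, L'\0');
    std::mbstowcs(&out[0], psz, cch + 1);
    out.resize(cch);  // drop the terminating zero
    return out;
}
```

An empty result for non-empty input indicates that the bytes were not valid in the active locale.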

 
