Windows Core Programming: Character Sets

Last Update:2018-12-04 Source: Internet

Author: User


Unicode and Windows

1. The real problem in software localization is how to handle different character sets. Historically, we programmed with single-byte character sets.
2. Single-byte character set: a text string is encoded as a series of single-byte characters followed by a terminating zero character. (Each character occupies one byte.)
3. Double-byte character set (DBCS): each character in a string consists of either one or two bytes.
4. Unicode character set: Unicode provides a simple, consistent way to represent strings. Every character in a Unicode string is a 16-bit value (two bytes).
5. When Microsoft ported COM from 16-bit Windows to Win32, the company decided that all COM interface methods that require strings accept only Unicode strings.
6. The C run-time library supports Unicode, even on Windows 98.
7. The Windows 2000 Notepad application can open both Unicode and ANSI files, and can create both kinds as well.
8. The IsTextUnicode function helps distinguish ANSI text from Unicode text:
DWORD IsTextUnicode(CONST PVOID pvBuffer, int cb, PINT pResult);

The first parameter, pvBuffer, identifies the address of the buffer to test. It is a void pointer because you do not know whether you have an array of ANSI characters or an array of Unicode characters.

The second parameter, cb, specifies the number of bytes that pvBuffer points to. Again, because you do not know what is in the buffer, cb is a count of bytes rather than a count of characters. You do not have to specify the buffer's entire length, but the more bytes IsTextUnicode can test, the more accurate its result is likely to be.

The third parameter, pResult, is the address of an integer that you must initialize before calling IsTextUnicode. You initialize this integer to indicate which tests you want IsTextUnicode to perform. You can also pass NULL for this parameter, in which case IsTextUnicode performs every test it can (see the Platform SDK documentation for details).
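One of the checks IsTextUnicode can perform is looking for a byte-order mark at the start of the buffer. The sketch below illustrates only that single idea in portable C. The name has_utf16le_bom is hypothetical and is not a Windows function; the real API combines several tests, most of them statistical, so treat this as an illustration and not as a replacement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the Windows API: returns 1 if the
   buffer starts with the UTF-16 little-endian byte-order mark (FF FE).
   Like IsTextUnicode, it takes an untyped buffer and a size in bytes. */
int has_utf16le_bom(const void *buffer, size_t cb)
{
    const unsigned char *bytes = (const unsigned char *)buffer;

    assert(buffer != NULL || cb == 0);

    /* Fewer than two bytes cannot hold the mark. */
    return cb >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE;
}
```

A missing mark proves nothing, since many Unicode files omit it. That is why a byte count large enough for the other tests matters.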

9. Helper functions for manipulating DBCS strings:

Function	Description
PTSTR CharNext(PCTSTR pszCurrentChar);	Returns the address of the next character in the string.
PTSTR CharPrev(PCTSTR pszStart, PCTSTR pszCurrentChar);	Returns the address of the previous character in the string.
BOOL IsDBCSLeadByte(BYTE bTestChar);	Returns TRUE if this byte is the first byte of a DBCS character.
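To see why these helpers are needed, here is a minimal portable sketch of walking a DBCS string character by character. It assumes the Shift-JIS (code page 932) lead-byte ranges. The functions is_lead_byte, next_char, and dbcs_char_count are hypothetical stand-ins for illustration; on Windows you would call IsDBCSLeadByte and CharNext instead of hard-coding ranges.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: lead-byte ranges of Shift-JIS (code page 932).
   Real code should ask the system via IsDBCSLeadByte. */
static int is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Hypothetical stand-in for CharNext: steps over one character,
   which is two bytes long when it starts with a lead byte. */
static const char *next_char(const char *p)
{
    if (*p == '\0')
        return p;                 /* never step past the terminator */
    if (is_lead_byte((unsigned char)*p) && p[1] != '\0')
        return p + 2;             /* lead byte plus trail byte */
    return p + 1;                 /* single-byte character */
}

/* Counts characters (not bytes) in a DBCS string. */
size_t dbcs_char_count(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    while (*s != '\0') {
        s = next_char(s);
        count++;
    }
    return count;
}
```

For the three-byte string "\x82\xA0" "A" (one double-byte character followed by one single-byte character) the function reports two characters, whereas a byte-oriented length function would report three.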

10. Microsoft's support for Unicode:

• Windows 2000 supports both Unicode and ANSI, so you can develop applications for either.

• Windows 98 supports only ANSI, so you can develop only ANSI applications for it.

• Windows CE supports only Unicode, so you can develop only Unicode applications for it.

11. Unicode data types defined in the Windows header files:

Data Type	Description
WCHAR	A Unicode character
PWSTR	Pointer to a Unicode string
PCWSTR	Pointer to a constant Unicode string

Example of how the Windows header files map a generic function name to its Unicode or ANSI version:

#ifdef UNICODE
#define CreateWindowEx CreateWindowExW
#else
#define CreateWindowEx CreateWindowExA
#endif // !UNICODE
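The same compile-time selection can be sketched for your own functions. The example below is a portable illustration only: TextLengthA, TextLengthW, TextLength, and TEXT_LIT are hypothetical names (TEXT_LIT is a simplified stand-in for the TEXT macro from the Windows headers), and the functions simply wrap strlen and wcslen.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical functions following the A/W naming convention. */
size_t TextLengthA(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

size_t TextLengthW(const wchar_t *s)
{
    assert(s != NULL);
    return wcslen(s);
}

/* The generic name resolves at compile time, in the same way the
   Windows headers resolve CreateWindowEx. */
#ifdef UNICODE
#define TextLength  TextLengthW
#define TEXT_LIT(s) L##s      /* "abc" becomes L"abc" */
#else
#define TextLength  TextLengthA
#define TEXT_LIT(s) s
#endif
```

Code written as TextLength(TEXT_LIT("abc")) compiles unchanged whether or not UNICODE is defined, which is the point of the scheme: one source file, two possible builds.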

Converting strings between Unicode and ANSI

The Windows function MultiByteToWideChar converts a multibyte-character string to a wide-character string. Its prototype is shown below.

int MultiByteToWideChar(
    UINT CodePage,          // code page
    DWORD dwFlags,          // character-type options
    LPCSTR lpMultiByteStr,  // address of string to map
    int cchMultiByte,       // number of bytes in string
    LPWSTR lpWideCharStr,   // address of wide-character buffer
    int cchWideChar         // size of buffer
);

The CodePage parameter identifies the code page associated with the multibyte string. The dwFlags parameter lets you specify additional control that affects characters with diacritical marks such as accents. These flags are usually not needed, so you pass 0 for dwFlags. The lpMultiByteStr parameter specifies the string to convert, and the cchMultiByte parameter gives the length of that string in bytes. If you pass -1 for cchMultiByte, the function determines the length of the source string itself.

The Unicode version of the string resulting from the conversion is written to the buffer whose address is given by the lpWideCharStr parameter. You must specify the maximum size of this buffer, in characters, in the cchWideChar parameter. If you call MultiByteToWideChar and pass 0 for cchWideChar, the function performs no conversion and instead returns the buffer size, in characters, required for the conversion to succeed. In general, you convert a multibyte string to its Unicode equivalent using the following steps:

1) Call MultiByteToWideChar, passing NULL for the lpWideCharStr parameter and 0 for the cchWideChar parameter.
2) Allocate a block of memory large enough to hold the converted Unicode string. The required size is the value returned by the previous call to MultiByteToWideChar. Because that value is a count of characters, multiply it by sizeof(WCHAR) to get the size in bytes.
3) Call MultiByteToWideChar again, this time passing the address of the buffer as lpWideCharStr and the size returned by the first call as cchWideChar.
4) Use the converted string.
5) Free the memory block occupied by the Unicode string.
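The steps above can be sketched as a small helper with error checking. The name to_wide is hypothetical. On Windows the sketch calls MultiByteToWideChar twice as described; on other platforms it falls back to the C library function mbstowcs, relying on the POSIX behavior of returning the required length when the destination is NULL, so that the same query-allocate-convert pattern can be tried anywhere. The fallback is only an analog and does not use Windows code pages.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: returns a newly allocated wide-character copy
   of src, or NULL on failure. The caller releases it with free(). */
wchar_t *to_wide(const char *src)
{
    wchar_t *dst;

    assert(src != NULL);
#ifdef _WIN32
    /* Step 1: query the required size in characters (includes the null). */
    int cch = MultiByteToWideChar(CP_ACP, 0, src, -1, NULL, 0);
    if (cch == 0)
        return NULL;

    /* Step 2: the count is in characters, so scale it to bytes. */
    dst = (wchar_t *)malloc((size_t)cch * sizeof(wchar_t));
    if (dst == NULL)
        return NULL;

    /* Step 3: convert into the buffer. */
    if (MultiByteToWideChar(CP_ACP, 0, src, -1, dst, cch) == 0) {
        free(dst);
        return NULL;
    }
#else
    /* Analog for other platforms: with a NULL destination, mbstowcs
       returns the length without the terminator (POSIX behavior). */
    size_t len = mbstowcs(NULL, src, 0);
    if (len == (size_t)-1)
        return NULL;

    dst = (wchar_t *)malloc((len + 1) * sizeof(wchar_t));
    if (dst == NULL)
        return NULL;

    if (mbstowcs(dst, src, len + 1) == (size_t)-1) {
        free(dst);
        return NULL;
    }
#endif
    return dst;   /* Steps 4 and 5 (use, then free) belong to the caller. */
}
```

Checking both return values matters: a zero from MultiByteToWideChar signals failure, and using the buffer after a failed second call would read uninitialized memory.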
The WideCharToMultiByte function converts a wide-character string to its multibyte equivalent. Its prototype is shown below:

int WideCharToMultiByte(
    UINT CodePage,             // code page
    DWORD dwFlags,             // performance and mapping flags
    LPCWSTR lpWideCharStr,     // address of wide-character string
    int cchWideChar,           // number of characters in string
    LPSTR lpMultiByteStr,      // address of buffer for new string
    int cchMultiByte,          // size of buffer
    LPCSTR lpDefaultChar,      // address of default for unmappable
                               // characters
    LPBOOL lpUsedDefaultChar   // address of flag set when default
                               // char. used
);

This function is similar to MultiByteToWideChar. Again, the CodePage parameter identifies the code page to associate with the newly converted string. The dwFlags parameter lets you specify additional control over the conversion; the flags affect characters with diacritical marks and characters that the system cannot convert. You usually do not need this degree of control, so you pass 0 for dwFlags.

The lpWideCharStr parameter specifies the address of the string to convert, and the cchWideChar parameter gives the length of that string in characters. If you pass -1 for cchWideChar, the function determines the length of the source string itself.

The multibyte version of the string resulting from the conversion is written to the buffer given by the lpMultiByteStr parameter. You must specify the maximum size of this buffer, in bytes, in the cchMultiByte parameter. If you pass 0 for the cchMultiByte parameter, WideCharToMultiByte returns the size required for the destination buffer. In general, you convert a wide-character string to a multibyte string using a sequence of steps similar to the one described for converting a multibyte string to a wide-character string.

You will notice that WideCharToMultiByte accepts two more parameters than MultiByteToWideChar: lpDefaultChar and lpUsedDefaultChar. WideCharToMultiByte uses these parameters only when it encounters a wide character that has no representation in the code page identified by the CodePage parameter. If a wide character cannot be converted, the function uses the character pointed to by lpDefaultChar. If this parameter is NULL (which is the usual case), the function uses the system default character. This default character is usually a question mark, which is dangerous for file names because the question mark is a wildcard character.

The lpUsedDefaultChar parameter points to a Boolean variable. The function sets this variable to TRUE if at least one character in the wide-character string could not be converted to its multibyte equivalent, and to FALSE if all characters were converted successfully. After the function returns, you can test this variable to check whether the wide-character string was converted without loss. Again, you usually pass NULL for this parameter.
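The behavior of the default character and the "used default" flag can be modeled with a short portable sketch. This is not the Windows API: narrow_to_ascii is a hypothetical function whose target "code page" is assumed to be plain 7-bit ASCII, chosen only to make the substitution rule easy to see.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical model, not the Windows API: narrows a wide string to
   7-bit ASCII. A character outside ASCII is replaced by default_char
   and *used_default is set to 1, mirroring lpUsedDefaultChar.
   dst must have room for wcslen(src) + 1 bytes.
   used_default may be NULL if the caller does not care. */
void narrow_to_ascii(const wchar_t *src, char *dst,
                     char default_char, int *used_default)
{
    size_t i;

    assert(src != NULL && dst != NULL);
    if (used_default != NULL)
        *used_default = 0;            /* assume success until proven otherwise */

    for (i = 0; src[i] != L'\0'; i++) {
        if ((unsigned long)src[i] < 0x80) {
            dst[i] = (char)src[i];    /* representable: copy as is */
        } else {
            dst[i] = default_char;    /* unmappable: substitute */
            if (used_default != NULL)
                *used_default = 1;
        }
    }
    dst[i] = '\0';
}
```

Converting L"a\u4E2D" L"b" with '?' as the default yields "a?b" and sets the flag. A caller building a file name from the result would want to check the flag and reject the name instead of passing a wildcard character to the file system.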

For more information about how to use these functions, see the Platform SDK documentation.

With these two functions, you can easily create both Unicode and ANSI versions of your own functions. For example, you might have a dynamic-link library containing a function that reverses all the characters in a string. You could write the Unicode version of the function as follows:

BOOL StringReverseW(PWSTR pWideCharStr)
{
   // Get a pointer to the last character in the string.
   PWSTR pEndOfStr = pWideCharStr + wcslen(pWideCharStr) - 1;
   wchar_t cCharT;

   // Repeat until we reach the center character
   // in the string.
   while (pWideCharStr < pEndOfStr)
   {
      // Save a character in a temporary variable.
      cCharT = *pWideCharStr;

      // Put the last character in the first character.
      *pWideCharStr = *pEndOfStr;

      // Put the temporary character in the last character.
      *pEndOfStr = cCharT;

      // Move in one character from the left.
      pWideCharStr++;

      // Move in one character from the right.
      pEndOfStr--;
   }

   // The string is reversed; return success.
   return(TRUE);
}

You can write the ANSI version of the function so that it does not perform the reversal itself. Instead, the ANSI version converts the ANSI string to Unicode, passes the Unicode string to StringReverseW, and then converts the reversed string back to ANSI. The function would look like this:

BOOL StringReverseA(PSTR pMultiByteStr)
{
   PWSTR pWideCharStr;
   int nLenOfWideCharStr;
   BOOL fOk = FALSE;

   // Calculate the number of characters needed to hold
   // the wide-character version of the string.
   nLenOfWideCharStr = MultiByteToWideChar(CP_ACP, 0,
      pMultiByteStr, -1, NULL, 0);

   // Allocate memory from the process's default heap to
   // accommodate the size of the wide-character string.
   // Don't forget that MultiByteToWideChar returns the
   // number of characters, not the number of bytes, so
   // you must multiply by the size of a wide character.
   pWideCharStr = (PWSTR) HeapAlloc(GetProcessHeap(), 0,
      nLenOfWideCharStr * sizeof(WCHAR));
   if (pWideCharStr == NULL)
      return(fOk);

   // Convert the multibyte string to a wide-character string.
   MultiByteToWideChar(CP_ACP, 0, pMultiByteStr, -1,
      pWideCharStr, nLenOfWideCharStr);

   // Call the wide-character version of this
   // function to do the actual work.
   fOk = StringReverseW(pWideCharStr);
   if (fOk)
   {
      // Convert the wide-character string back
      // to a multibyte string. The source length is -1,
      // so the terminating null is written too and the
      // buffer size must include it (strlen + 1).
      WideCharToMultiByte(CP_ACP, 0, pWideCharStr, -1,
         pMultiByteStr, (int) strlen(pMultiByteStr) + 1, NULL, NULL);
   }

   // Free the memory containing the wide-character string.
   HeapFree(GetProcessHeap(), 0, pWideCharStr);
   return(fOk);
}

Finally, in the header file that you distribute with the dynamic-link library, you prototype the two functions as follows:

BOOL StringReverseW(PWSTR pWideCharStr);
BOOL StringReverseA(PSTR pMultiByteStr);

#ifdef UNICODE
#define StringReverse StringReverseW
#else
#define StringReverse StringReverseA
#endif // !UNICODE

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
