[Repost] Unicode programming in VC

Reposted from http://www.leewei.org/?p=1304

Unicode description

VC is commonly used for programming on Windows. If a program must run on systems in several languages (such as Japanese, Chinese, and Portuguese), building it with VC's default MBCS option can produce garbled characters and can even crash the program. To avoid this, build the program with Unicode. A brief description of Unicode follows:

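Unicode is a character encoding standard. Its original design used two bytes per character (0000h-FFFFh), giving 65,536 code values. The standard has since been extended to code points up to 10FFFFh so that it can cover the characters of all the world's languages. Windows stores Unicode text as UTF-16: most characters in common use occupy one 16-bit unit, and characters beyond FFFFh occupy two units (a surrogate pair). Each character has a unique Unicode code point.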
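To make the 16-bit representation concrete, here is a minimal sketch of how a single code point maps to UTF-16 code units, the form in which Windows stores wide characters. The function name encode_utf16 is hypothetical and not part of any Windows or C runtime API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode one Unicode code point (U+0000..U+10FFFF) as UTF-16 code units.
// Returns an empty vector for values that are not valid scalar values.
// encode_utf16 is a hypothetical helper name used for illustration only.
std::vector<std::uint16_t> encode_utf16(std::uint32_t cp)
{
    std::vector<std::uint16_t> units;
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return units;                        // out of range, or a lone surrogate
    if (cp < 0x10000) {
        // Code points up to U+FFFF fit in a single 16-bit unit.
        units.push_back(static_cast<std::uint16_t>(cp));
    } else {
        cp -= 0x10000;                       // 20 significant bits remain
        units.push_back(static_cast<std::uint16_t>(0xD800 + (cp >> 10)));    // high surrogate
        units.push_back(static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF)));  // low surrogate
    }
    return units;
}
```

For example, U+4E2D fits in one unit, while U+1F600 becomes the pair D83D DE00. Code that assumes one wchar_t per character therefore holds only for code points up to FFFFh.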

Windows NT and later systems have Unicode-based kernels. The Windows headers use the UNICODE macro to decide whether Unicode is enabled, and the C/C++ runtime headers use the _UNICODE macro. Both macros therefore need to be added to the preprocessor definitions.

For example, the tchar.h header file contains the following statements:

#define _T(x)       __T(x)

#ifdef  _UNICODE
typedef wchar_t     TCHAR;
#define __T(x)      L##x
#else
typedef char        TCHAR;
#define __T(x)      x
#endif
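The same mechanism can be tried outside Visual C++ with stand-in names. In the sketch below, DEMO_UNICODE, demo_tchar, DEMO_T, and demo_len are hypothetical substitutes for _UNICODE, TCHAR, _T, and a generic length function; they are assumptions for illustration and do not come from tchar.h.

```cpp
#include <cassert>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for _UNICODE, TCHAR and _T (not the real tchar.h).
#define DEMO_UNICODE              // comment this out to get the narrow build

#define DEMO_T(x)    DEMO_T_(x)   // extra level, mirroring _T -> __T

#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;
#define DEMO_T_(x)   L##x         // paste the L prefix onto the literal
inline std::size_t demo_len(const demo_tchar *s) { return std::wcslen(s); }
#else
typedef char demo_tchar;
#define DEMO_T_(x)   x            // leave the literal unchanged
inline std::size_t demo_len(const demo_tchar *s) { return std::strlen(s); }
#endif

// This line compiles unchanged in either configuration.
const demo_tchar *greeting = DEMO_T("hello");
```

Only the single #define at the top decides whether greeting is a wide or a narrow string. The rest of the source stays the same.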

The WinNT.h header file defines the following data types:

typedef char CHAR, *LPSTR;
typedef CONST CHAR *LPCSTR, *PCSTR;

typedef unsigned short WCHAR, *LPWSTR;    // 16-bit UNICODE character
typedef CONST WCHAR *LPCWSTR, *PCWSTR;

#ifdef  UNICODE
typedef WCHAR TCHAR, *PTCHAR;
typedef LPWSTR LPTCH, PTCH;
typedef LPWSTR PTSTR, LPTSTR;
typedef LPCWSTR LPCTSTR;
#define __TEXT(quote) L##quote
#else
typedef char TCHAR, *PTCHAR;
typedef LPSTR LPTCH, PTCH;
typedef LPSTR PTSTR, LPTSTR;
typedef LPCSTR LPCTSTR;
#define __TEXT(quote) quote
#endif /* UNICODE */

In fact, most Win32 API functions that take strings come in two versions: one accepts an MBCS string and the other accepts a Unicode string. For example, there is actually no API function named SetWindowText(). Instead, there are SetWindowTextA() and SetWindowTextW(). The suffix A indicates the MBCS (ANSI) function, and the suffix W indicates the Unicode (wide) function. These API functions are declared in winuser.h. The part of winuser.h that deals with SetWindowText() is shown below:

#ifdef UNICODE
#define SetWindowText  SetWindowTextW
#else
#define SetWindowText  SetWindowTextA
#endif // !UNICODE
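The A/W naming scheme can be imitated in a few lines to see which function a call actually reaches. SetTitleA, SetTitleW, SetTitle, DEMO_TEXT, and DEMO_UNICODE below are hypothetical names invented for this sketch; they are not Win32 APIs.

```cpp
#include <cassert>
#include <string>

// Hypothetical functions that imitate the A/W naming scheme.
std::string last_call;   // records which version ran

void SetTitleA(const char *)    { last_call = "A"; }
void SetTitleW(const wchar_t *) { last_call = "W"; }

// The generic name is only a macro that selects one of the two functions.
#ifdef DEMO_UNICODE
#define SetTitle      SetTitleW
#define DEMO_TEXT(s)  L##s
#else
#define SetTitle      SetTitleA
#define DEMO_TEXT(s)  s
#endif

// Call through the generic name; the preprocessor picks the real target.
void set_default_title() { SetTitle(DEMO_TEXT("caption")); }
```

Because DEMO_UNICODE is not defined here, set_default_title() reaches SetTitleA. Either version can still be called explicitly by its full name, which is also true of the real API.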

Unicode practice

Follow these steps to use Unicode in VC6.0:
1. Project -> Settings... -> C/C++ -> Preprocessor definitions: delete _MBCS, and add _UNICODE and UNICODE.
2. Project -> Settings... -> Link -> Category: select Output, and enter wWinMainCRTStartup in the Entry-point symbol field.

[Note] If the project is not an EXE project (for example, a DLL or LIB), skip the second step. Otherwise, the linker reports warning LNK4086.
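Since two separate macros are involved, it is easy to define only one of them by mistake. A minimal sketch of a compile-time guard that could be placed in a common header is shown below; the constant kUnicodeBuild is a hypothetical name, and the guard is a suggestion rather than something the Windows headers provide.

```cpp
#include <cassert>

// Stop the build early if only one of the two macros is defined.
#if defined(UNICODE) && !defined(_UNICODE)
#error "UNICODE is defined but _UNICODE is not"
#elif defined(_UNICODE) && !defined(UNICODE)
#error "_UNICODE is defined but UNICODE is not"
#endif

// kUnicodeBuild is a hypothetical constant, handy for logging the build type.
#if defined(UNICODE)
const bool kUnicodeBuild = true;
#else
const bool kUnicodeBuild = false;
#endif
```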

C++ uses wchar_t to represent a wide character. In VC6 it is defined as unsigned short and occupies two bytes. The C runtime provides a set of wide-character functions that parallel the ordinary character functions. The following table compares the wide-character functions with their ordinary counterparts:

Wide-character functions and their ordinary counterparts

Character classification:

  Wide function    C function    Description
  iswalnum()       isalnum()     Tests whether the character is a letter or digit
  iswalpha()       isalpha()     Tests whether the character is a letter
  iswcntrl()       iscntrl()     Tests whether the character is a control character
  iswdigit()       isdigit()     Tests whether the character is a digit
  iswgraph()       isgraph()     Tests whether the character is a visible character
  iswlower()       islower()     Tests whether the character is a lowercase letter
  iswprint()       isprint()     Tests whether the character is a printable character
  iswpunct()       ispunct()     Tests whether the character is a punctuation mark
  iswspace()       isspace()     Tests whether the character is a whitespace character
  iswupper()       isupper()     Tests whether the character is an uppercase letter
  iswxdigit()      isxdigit()    Tests whether the character is a hexadecimal digit

Case conversion:

  Wide function    C function    Description
  towlower()       tolower()     Converts the character to lowercase
  towupper()       toupper()     Converts the character to uppercase

Character comparison:

  Wide function    C function    Description
  wcscoll()        strcoll()     Compares two strings according to the locale

Date and time conversion:

  Function         Description
  strftime()       Formats a date and time according to the given format string and the locale
  wcsftime()       Same as strftime(), but returns a wide string
  strptime()       Converts a string to a time value according to the given format (the reverse of strftime())

Printing and scanning strings:

  Function                    Description
  fprintf() / fwprintf()      Formatted output to a stream using a vararg parameter list
  fscanf() / fwscanf()        Formatted read from a stream
  printf()                    Formatted output to standard output using a vararg parameter list
  scanf()                     Formatted read from standard input
  sprintf() / swprintf()      Formats a vararg parameter list into a string
  sscanf()                    Formatted read from a string
  vfprintf() / vfwprintf()    Formatted output to a file using a stdarg parameter list
  vprintf()                   Formatted output to standard output using a stdarg parameter list
  vsprintf() / vswprintf()    Formats a stdarg parameter list and writes it to a string

Numeric conversion:

  Wide function    C function    Description
  wcstod()         strtod()      Converts the initial part of a wide string to a double-precision floating-point number
  wcstol()         strtol()      Converts the initial part of a wide string to a long integer
  wcstoul()        strtoul()     Converts the initial part of a wide string to an unsigned long integer

Multibyte and wide-character conversion:

  Function              Description
  mblen()               Determines the number of bytes in a character according to the locale
  mbstowcs()            Converts a multibyte string to a wide string
  mbtowc() / btowc()    Converts a multibyte character to a wide character
  wcstombs()            Converts a wide string to a multibyte string
  wctomb() / wctob()    Converts a wide character to a multibyte character

Input and output:

  Wide function    C function    Description
  fgetwc()         fgetc()       Reads a character from a stream and converts it to a wide character
  fgetws()         fgets()       Reads a string from a stream and converts it to a wide string
  fputwc()         fputc()       Converts a wide character to a multibyte character and writes it to a stream
  fputws()         fputs()       Converts a wide string to multibyte characters and writes it to a stream
  getwc()          getc()        Reads a character from a stream and converts it to a wide character
  getwchar()       getchar()     Reads a character from standard input and converts it to a wide character
  (none)           gets()        Use fgetws() instead
  putwc()          putc()        Converts a wide character to a multibyte character and writes it to a stream
  putwchar()       putchar()     Converts a wide character to a multibyte character and writes it to standard output
  (none)           puts()        Use fputws() instead
  ungetwc()        ungetc()      Pushes a wide character back onto the input stream

String operations:

  Wide function          C function    Description
  wcscat()               strcat()      Appends one string to the end of another
  wcsncat()              strncat()     Like wcscat(), with a limit on the number of characters appended
  wcschr()               strchr()      Finds the first occurrence of a character in a string
  wcsrchr()              strrchr()     Finds the last occurrence of a character in a string
  wcspbrk()              strpbrk()     Finds the first occurrence in one string of any character from another string
  wcswcs() / wcsstr()    strstr()      Finds the first occurrence of one string inside another
  wcscspn()              strcspn()     Returns the length of the initial segment that contains no characters from the second string
  wcsspn()               strspn()      Returns the length of the initial segment that consists only of characters from the second string
  wcscpy()               strcpy()      Copies a string
  wcsncpy()              strncpy()     Like wcscpy(), with a limit on the number of characters copied
  wcscmp()               strcmp()      Compares two wide strings
  wcsncmp()              strncmp()     Like wcscmp(), with a limit on the number of characters compared
  wcslen()               strlen()      Returns the number of wide characters in a string
  wcstok()               strtok()      Splits a wide string into tokens separated by the given delimiters
  wcswidth()             (none)        Returns the display width of a wide string
  wcwidth()              (none)        Returns the display width of a wide character

In addition, there are the wide memory functions wmemcpy(), wmemchr(), wmemcmp(), wmemmove(), and wmemset().

Note: strptime(), wcwidth(), and wcswidth() are POSIX functions and are not part of the Visual C++ runtime library.
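A few of the functions listed above can be exercised with standard C++ alone. The sketch below is illustrative only: the helper names to_upper, parse_long, and narrow_ascii are hypothetical, and the conversion helper assumes ASCII input so that the default "C" locale is sufficient.

```cpp
#include <cassert>
#include <cstdlib>
#include <cwchar>
#include <cwctype>
#include <string>

// Upper-case a wide string with towupper(), the wide counterpart of toupper().
std::wstring to_upper(const std::wstring &in)
{
    std::wstring out(in);
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = static_cast<wchar_t>(std::towupper(out[i]));
    return out;
}

// Parse a leading integer with wcstol(), the wide counterpart of strtol().
long parse_long(const wchar_t *text)
{
    wchar_t *end = 0;
    return std::wcstol(text, &end, 10);
}

// Convert a wide string to a multibyte string with wcstombs().
// Assumption: the input is ASCII and shorter than the 64-byte buffer.
std::string narrow_ascii(const wchar_t *text)
{
    char buf[64];
    std::size_t n = std::wcstombs(buf, text, sizeof(buf) - 1);
    if (n == static_cast<std::size_t>(-1))
        return std::string();          // a character could not be converted
    buf[n] = '\0';
    return std::string(buf, n);
}
```

Text outside ASCII would additionally need a suitable locale set with setlocale() before calling wcstombs(), which this sketch leaves out.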

In Unicode programming, a wide string is declared like this:
wchar_t *wstr = L"Hello";
The prefix L tells the compiler to build a wide string. There must be no space between the L and the opening quotation mark.
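The difference between a narrow and a wide literal shows up in storage size rather than in character count. A minimal sketch follows; the variable and function names are chosen for illustration only.

```cpp
#include <cassert>
#include <cstring>
#include <cwchar>

const char    narrow[] = "Hello";    // 5 chars plus the terminator
const wchar_t wide[]   = L"Hello";   // 5 wide chars plus the terminator

// Number of characters, excluding the terminator.
std::size_t narrow_len() { return std::strlen(narrow); }
std::size_t wide_len()   { return std::wcslen(wide); }

// Storage in bytes, including the terminator. sizeof(wchar_t) is 2 with
// Visual C++ and commonly 4 elsewhere, so it is not hard-coded here.
std::size_t wide_bytes() { return sizeof(wide); }
```

Both strings report a length of 5, but the wide array occupies 6 * sizeof(wchar_t) bytes instead of 6. This is why byte counts and character counts must be kept apart once a program moves to wide strings.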

Although the declaration above is correct, it is not recommended, because the code then compiles only as a wide-character build.
Recall the macros described earlier: _T(x) expands to L##x when _UNICODE is defined and to x otherwise, and TCHAR is defined as wchar_t or char accordingly. So we can write:
TCHAR *str = _T("hello");
If the _UNICODE macro is defined, this expands to:
wchar_t *str = L"hello";
Otherwise, it expands to:
char *str = "hello";

If you need to ship a library in both Unicode and non-Unicode versions, you only have to switch the two Unicode macros. No code changes are needed.

Migrate to Unicode

If, unfortunately, your project was not designed for Unicode from the start (that is, it does not use the _T() macro and the TCHAR type), then adding the two Unicode macros and the entry point to gain internationalization support may produce a large number of compilation errors (I ran into 566). The exact fixes depend on the project, but there are common patterns. Working through them step by step is better than fixing errors at random.

1. Search for all AfxMessageBox and MessageBox calls, and wrap their string arguments in the _T() macro.
2. Search for all str.Format calls, and wrap the first argument in the _T() macro.
3. Wrap the remaining string constants in the _T() macro.
4. Replace strlen, strcpy, and similar functions with their wide-character versions, such as wcslen and wcscpy.
5. If the third argument of a function such as wcsncpy or wcsncmp is sizeof(dst), change it to sizeof(dst)/2 or define a macro such as tsizeof, because these functions count characters, not bytes.
6. If a function requires a char * argument, convert the argument with the T2A() macro, and add "USES_CONVERSION;" at the beginning of the calling function.
7. Find all forced casts such as char *p = (LPSTR)(LPCTSTR)cstring, and replace them with char *p = T2A(cstring).
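Step 5 is the easiest one to get wrong, so here is a minimal sketch of the element-count idea. TSIZEOF is a hypothetical spelling of the "tsizeof" macro mentioned above, and Record and set_name are made-up names for the example. Dividing by the element size instead of by 2 is an assumption on my part; it gives the same result with Visual C++ and stays correct where wchar_t is wider.

```cpp
#include <cassert>
#include <cwchar>

// Hypothetical "tsizeof" macro: number of elements in an array, whatever
// the character type. Valid only for real arrays, not for pointers.
#define TSIZEOF(arr) (sizeof(arr) / sizeof((arr)[0]))

struct Record {
    wchar_t name[8];   // room for 7 characters plus the terminator
};

// Copy src into the fixed-size buffer and always terminate it.
// wcsncpy() counts characters, so passing sizeof(r.name) would overrun.
void set_name(Record &r, const wchar_t *src)
{
    std::wcsncpy(r.name, src, TSIZEOF(r.name) - 1);
    r.name[TSIZEOF(r.name) - 1] = L'\0';
}
```

With this helper, an over-long source string is truncated to seven characters instead of writing past the end of the buffer.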

After these changes, the number of errors on recompilation should drop by more than half. The remaining errors are then much easier to go through and fix one by one.

Finally, the configuration file must be stored in Unicode format. A Unicode file starts with the identifier 0xFEFF (the byte order mark). If you write the configuration file through ::WritePrivateProfileString(), you only need to write the 0xFEFF header to the file before the first call to this API. After that, WritePrivateProfileString saves subsequent content as Unicode. For simplicity, replace every call to ::WritePrivateProfileString() in the program with the following wrapper:

static BOOL _WritePrivateProfileString(
    LPCTSTR lpAppName,   // section name
    LPCTSTR lpKeyName,   // key name
    LPCTSTR lpString,    // string to add
    LPCTSTR lpFileName   // initialization file
    )
{
    // If the file does not exist yet, create it and write the 0xFEFF
    // header so that the API stores later content as Unicode.
    FILE *fp = _tfopen(lpFileName, _T("r"));
    if (fp == NULL)
    {
        fp = _tfopen(lpFileName, _T("w+b"));
        if (fp != NULL)
            fputwc(wchar_t(0xFEFF), fp);
    }
    if (fp != NULL)      // do not call fclose() when both opens failed
        fclose(fp);

    return ::WritePrivateProfileString(lpAppName, lpKeyName, lpString, lpFileName);
}
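The file-header step can also be tried without the Windows headers. The sketch below writes the two bytes FF FE (0xFEFF in little-endian order) explicitly instead of relying on fputwc(), because wchar_t is not 16 bits wide on every platform. The helper name ensure_utf16_bom is hypothetical, and the sketch covers only the file creation, not the profile API itself.

```cpp
#include <cassert>
#include <cstdio>

// Create the file with a UTF-16 little-endian byte order mark if it does
// not exist yet. Returns true if the file exists when the function returns.
// ensure_utf16_bom is a hypothetical helper name.
bool ensure_utf16_bom(const char *path)
{
    std::FILE *fp = std::fopen(path, "rb");
    if (fp != NULL) {                 // already there: leave it untouched
        std::fclose(fp);
        return true;
    }
    fp = std::fopen(path, "wb");
    if (fp == NULL)
        return false;                 // the file could not be created
    const unsigned char bom[2] = { 0xFF, 0xFE };   // U+FEFF, little-endian
    bool ok = std::fwrite(bom, 1, sizeof(bom), fp) == sizeof(bom);
    std::fclose(fp);
    return ok;
}
```

Calling the helper a second time on the same path leaves the existing file as it is, matching the "only when the file is missing" logic of the wrapper above.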

 
