Unicode programming using C++

Source: Internet
Author: User
Support for wide characters is part of the ANSI C standard; it exists so that a single character can be represented by more than one byte.
A wide character is not the same thing as a Unicode character. Unicode is only one of the encodings that wide characters can carry.

1. Definition of wide characters

In ANSI, a character (char) is one byte long. When Unicode is used, a character occupies one word (two bytes). C++ provides the basic wide character type wchar_t; on Windows, the wchar.h header file defines it as:

 
typedef unsigned short wchar_t;

Here we can clearly see that, under this definition, a wide character is simply an unsigned short integer.




2. Wide string constants

Constructing string constants is a routine task in a C++ program. So how do you construct a wide string constant? Simply put an uppercase L in front of the string constant, for example:

 
const wchar_t *str1 = L"hello";

This L is very important: only with it does the compiler know that you want the string stored as wide characters. Note that there must be no space between the L and the string.


3. Wide string library functions

C++ provides a dedicated set of functions for operating on wide strings. For example, the function that returns the length of a wide string is declared as:

size_t __cdecl wcslen(const wchar_t *);

Why do we need to define these functions? The most fundamental reason is that every ANSI string is terminated by a single '\0' byte, whereas a Unicode string is terminated by a wide null character, that is, two zero bytes ("\0\0"). In addition, a wide character occupies two bytes in memory, and for many characters one of those bytes is zero, so the ANSI string functions cannot process wide strings correctly.

Take the string "hello" as an example. It contains the following five characters:
0x0048 0x0065 0x006c 0x006c 0x006f
In memory (low byte first), the bytes are actually laid out as:

 
48 00 65 00 6C 00 6C 00 6F 00

Therefore, when an ANSI string function such as strlen meets the first 00 after 48, it assumes the string has ended. For a wide string like this one, whose first character has a zero high byte, strlen returns 1!



4. Macro-based programming for ANSI and Unicode


As shown above, C++ has a complete set of data types and functions for Unicode programming, so you can write Unicode programs in C++.
Suppose we want our program to exist in two versions, ANSI and Unicode. Writing two sets of code, one for each version, is of course feasible. However, maintaining two sets of code is very troublesome. To reduce the programming burden, the tchar.h header that ships with C++ on Windows defines a series of macros that let the same source build as either ANSI or Unicode.
These macros all work the same way: they test whether _UNICODE (note the leading underscore) is defined and expand to the ANSI or the Unicode characters (strings) accordingly.

Some code from the tchar.h header file is excerpted below:

 
#ifdef _UNICODE

typedef wchar_t TCHAR;

#define __T(x) L ## x

#define _T(x) __T(x)

#else

#define _T(x) x

typedef char TCHAR;

#endif

As the excerpt shows, these macros expand to ANSI or Unicode forms depending on whether _UNICODE is defined. The macros defined in the tchar.h header file fall into two groups:

A. Macros that define characters and string constants. We list only the two most common:


Macro    _UNICODE undefined (ANSI character)    _UNICODE defined (Unicode character)
TCHAR    char                                   wchar_t
_T(x)    x                                      L##x

Note:
"##" is ANSI C preprocessor syntax. It is called the token-pasting operator, and here it pastes the L prefix onto the front of the macro argument. That is to say, if we write _T("hello"), it expands to L"hello".


B. Macros for calling string functions

The tchar.h header also defines a series of macros for the string functions. Again, we give only a few commonly used examples:

Macro      _UNICODE undefined (ANSI character)    _UNICODE defined (Unicode character)
_tcschr    strchr                                 wcschr
_tcscmp    strcmp                                 wcscmp
_tcslen    strlen                                 wcslen
