_T("") is a macro whose role is to let your program support Unicode encoding.
Windows uses two character sets, ANSI and Unicode.
The former is the traditional single-byte mode.
It is inconvenient for processing double-byte characters such as Chinese characters,
because a string operation can easily split a character and leave half of it behind.
The latter is a double-byte mode that makes such characters easy to process.
All character-related functions in Windows NT come in two versions, while Windows 9x supports essentially only the ANSI versions.
If you compile a program in ANSI mode,
_T does not actually do anything.
If you compile it in Unicode mode, the compiler stores a string such as _T("Hello") as Unicode. The difference between _T and the L prefix is that a string with the L prefix is stored as Unicode no matter how you compile.
size_t, __T, _T, TEXT, _TEXT, and other special macros
wchar_t is defined in WCHAR.H as follows:
typedef unsigned short wchar_t;
That should make things clearer.
With this definition, the wchar_t data type is the same as an unsigned short integer: both are 16 bits wide.
To define a variable that contains a single wide character, use the following statement:
wchar_t c = 'A';
The variable c is the two-byte value 0x0041, which is the Unicode representation of the letter A. (However, because Intel microprocessors store multibyte values with the least significant byte first, the bytes are actually stored in memory in the order 0x41, 0x00. Keep this in mind if you examine a memory dump of Unicode text.)
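The byte order described above can be checked with a small sketch. It uses uint16_t rather than wchar_t so that the value is exactly 16 bits on any platform; the function names first_byte_of_u16 and host_is_little_endian are illustrative and do not come from any header.

```c
#include <stdint.h>
#include <string.h>

/* Returns the byte stored at the lowest address of a 16-bit value. */
unsigned char first_byte_of_u16(uint16_t value)
{
    unsigned char bytes[sizeof value];
    memcpy(bytes, &value, sizeof value);  /* copy the in-memory representation */
    return bytes[0];
}

/* Returns 1 on a little-endian host (such as Intel x86), 0 otherwise. */
int host_is_little_endian(void)
{
    return first_byte_of_u16(0x0001) == 0x01;
}
```

On a little-endian machine, first_byte_of_u16(0x0041) should yield 0x41, matching the order 0x41, 0x00 given in the text.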
You can also define a pointer to a wide string:
wchar_t * p = L"Hello!";
Note the uppercase letter L (for "long") immediately before the first quotation mark. It tells the compiler to store the string as wide characters, that is, with each character occupying 2 bytes. The pointer variable p typically occupies 4 bytes (on 32-bit Windows), while the string itself requires 14 bytes: 2 bytes for each of the six characters and 2 bytes for the terminating zero.
Similarly, you can use the following statement to define an array of wide characters:
static wchar_t a[] = L"Hello!";
This string also requires 14 bytes of storage, and sizeof (a) returns 14. You can index the array a to obtain individual characters. The value of a[1] is the wide character 'e', or 0x0065.
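A minimal sketch of the array arithmetic follows. It assumes nothing about the width of wchar_t: with the 16-bit wchar_t described here the array takes 14 bytes, while on platforms where wchar_t is 4 bytes wide (common outside Windows) the same array takes 28. The helper names are made up for illustration.

```c
#include <stddef.h>
#include <wchar.h>

static const wchar_t a[] = L"Hello!";

/* Total storage of the array in bytes, terminator included. */
size_t wide_array_bytes(void)
{
    return sizeof a;
}

/* Number of elements, terminator included (6 characters + 1). */
size_t wide_array_elements(void)
{
    return sizeof a / sizeof a[0];
}

/* Indexing yields a whole wide character, not a single byte. */
wchar_t second_character(void)
{
    return a[1];
}
```

The byte count is always the element count times sizeof (wchar_t), which is why sizeof (a) is 14 when each character occupies 2 bytes.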
Although it looks more like a typo than anything else, the L before the first quotation mark is very important, and there must be no space between the two symbols. Only with the L does the compiler know that you want the string stored with 2 bytes per character. Later, when we look at wide strings in places other than variable definitions, you will again see the L before the first quotation mark. Fortunately, if you forget the L, the C compiler usually issues a warning or error message.
You can also use the L prefix before a single character constant to indicate that it should be interpreted as a wide character, as follows:
wchar_t c = L'A';
But this is usually unnecessary. The C compiler will zero-extend the character to make it a wide character.
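As a quick check of that claim, the sketch below compares a plain character constant assigned to a wchar_t with the same constant written with the L prefix. It assumes an ASCII-based execution character set; plain_matches_wide is an illustrative name.

```c
#include <wchar.h>

/* Returns 1 when a plain character constant, widened on assignment,
   equals the same constant written with the L prefix. */
int plain_matches_wide(void)
{
    wchar_t widened  = 'A';   /* int constant converted to wchar_t */
    wchar_t prefixed = L'A';  /* wide character constant */
    return widened == prefixed;
}
```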
Wide-character library functions
We all know how to get the length of a string. For example, if we have defined a string pointer as follows:
char * pc = "Hello!";
we can call
iLength = strlen (pc);
The variable iLength is then equal to 6, the number of characters in the string.
Great! Now let's try defining a pointer to a wide string:
wchar_t * pw = L"Hello!";
and calling strlen again:
iLength = strlen (pw);
Now we are in trouble. First, the C compiler issues a warning message, something like:
'function' : incompatible types - from 'unsigned short *' to 'const char *'
This message says that strlen is declared to accept a pointer to char, but it is receiving a pointer to unsigned short. You can still compile and run the program, but you will find that iLength is equal to 1. Why?
The six characters of the string "Hello!" have the following 16-bit values:
0x0048 0x0065 0x006C 0x006C 0x006F 0x0021
An Intel processor stores them in memory as:
48 00 65 00 6C 00 6C 00 6F 00 21 00
The strlen function, trying to find the length of a string of characters, counts the first byte as a character. It then finds that the next byte is 0 and takes that as the end of the string.
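The effect can be reproduced without relying on the platform's wchar_t width. The sketch below spells out the bytes listed above, plus the 2-byte terminator, and hands them to strlen as if they were an ordinary char string. The array and function names are illustrative.

```c
#include <string.h>

/* The bytes of the wide string "Hello!" as a little-endian machine with a
   16-bit wchar_t stores them, followed by the 2-byte terminator. */
static const unsigned char hello_wide_bytes[] = {
    0x48, 0x00, 0x65, 0x00, 0x6C, 0x00, 0x6C, 0x00,
    0x6F, 0x00, 0x21, 0x00, 0x00, 0x00
};

/* What strlen reports when it treats those bytes as a char string:
   it stops at the first zero byte, which is the high byte of 'H'. */
size_t narrow_length_of_wide_bytes(void)
{
    return strlen((const char *)hello_wide_bytes);
}

/* Total storage used by the wide string, terminator included. */
size_t stored_bytes(void)
{
    return sizeof hello_wide_bytes;
}
```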
This small exercise clearly illustrates the difference between the C language itself and the run-time library functions. The compiler interprets the string L"Hello!" as a collection of 16-bit short integers and stores them in a wchar_t array. The compiler also handles array indexing and the sizeof operator, so these work properly. But run-time library functions such as strlen are added only at link time. These functions assume that a string consists of single-byte characters, and when they encounter a wide string they do not behave as we would like.
You may say, "Oh, what a hassle! Now every C library function has to be rewritten to accept wide characters." In fact, not every C library function needs to be rewritten, only those that take string arguments, and you do not have to do it yourself. It has already been done.
The wide-character version of strlen is called wcslen ("wide-character string length"), and it is declared both in STRING.H (where strlen is declared) and in WCHAR.H. The strlen function is declared as follows:
size_t __cdecl strlen (const char *);
and the wcslen function is declared as follows:
size_t __cdecl wcslen (const wchar_t *);
So now we know that to obtain the length of a wide string, we can call
iLength = wcslen (pw);
The function returns 6, the number of characters in the string. Remember that the character length of a string does not change when you switch to wide characters; only the byte length changes.
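The distinction between character length and byte length can be sketched as follows. The helper names are illustrative, and the byte count is expressed in terms of sizeof (wchar_t) because the width of wchar_t differs between platforms (2 bytes in the environment described here).

```c
#include <string.h>
#include <wchar.h>

/* Character count of a narrow string. */
size_t narrow_character_count(const char *s)
{
    return strlen(s);
}

/* Character count of a wide string: the same number for the same text. */
size_t wide_character_count(const wchar_t *s)
{
    return wcslen(s);
}

/* Storage in bytes for a wide string, terminator included. */
size_t wide_byte_count(const wchar_t *s)
{
    return (wcslen(s) + 1) * sizeof(wchar_t);
}
```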
All the C run-time library functions you are familiar with that take string arguments have wide-character versions. For example, wprintf is the wide-character version of printf. These functions are declared in WCHAR.H and in the header file where the normal function is declared.
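To illustrate the pairing of narrow and wide functions, the sketch below uses snprintf and its wide counterpart swprintf rather than printf and wprintf, only because writing into a buffer makes the result easy to inspect. It assumes the standard C99 signature of swprintf, which takes a buffer size; format_narrow and format_wide are illustrative names.

```c
#include <stdio.h>
#include <wchar.h>

/* Narrow version: formats the value into a char buffer. */
int format_narrow(char *buf, size_t size, int value)
{
    return snprintf(buf, size, "%d items", value);
}

/* Wide version: the same job with a wchar_t buffer and an L"" format. */
int format_wide(wchar_t *buf, size_t size, int value)
{
    return swprintf(buf, size, L"%d items", value);
}
```

Both functions return the number of characters written, not the number of bytes, so the two results agree for the same input.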
Maintaining a single source
Of course, Unicode also has disadvantages. The first and most important is that every string in the program occupies twice as much storage. In addition, you will find that the wide-character run-time library functions are larger than the ordinary ones. For this reason, you may want to create two versions of your program: one that uses ASCII strings and one that uses Unicode strings. The best solution is to maintain a single source code file that can be compiled for either ASCII or Unicode.
That is a bit of a problem, even for a short program, because the run-time library functions have different names, you define characters differently, and there is the nuisance of putting an L in front of string literals.
One answer is to use the TCHAR.H header file included with Microsoft Visual C++. This header file is not part of the ANSI C standard, so every function and macro defined in it is preceded by an underscore. TCHAR.H provides a set of alternative names (for example, _tprintf and _tcslen) for the normal run-time library functions that require string parameters. These are sometimes called "generic" function names, because they can refer to either the Unicode or the non-Unicode version of a function.
If an identifier named _UNICODE is defined and the TCHAR.H header file is included in your program, _tcslen is defined as wcslen:
#define _tcslen wcslen
If _UNICODE is not defined, _tcslen is defined as strlen:
#define _tcslen strlen
And so on. TCHAR.H also solves the problem of the two character data types with a new data type named TCHAR. If the _UNICODE identifier is defined, TCHAR is wchar_t:
typedef wchar_t TCHAR;
Otherwise, TCHAR is simply char:
typedef char TCHAR;
Now for the problem of the L in front of string literals. If the _UNICODE identifier is defined, a macro called __T is defined as follows:
#define __T(x) L##x
This is rather obscure syntax, but it complies with the ANSI C standard for the preprocessor. The pair of number signs is called the "token paste" operator, and it joins the letter L to the front of the macro parameter. Therefore, if the macro parameter is "Hello!", then L##x is L"Hello!".
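Token pasting can be tried in isolation with a sketch like the one below. WIDEN is a hypothetical macro name chosen for this example; it mirrors the L##x definition but is not part of TCHAR.H.

```c
#include <wchar.h>

/* WIDEN is a hypothetical name; the ## operator pastes the letter L
   onto the string literal passed as the macro argument. */
#define WIDEN(x) L##x

/* After preprocessing, the return statement reads: return L"Hello!"; */
const wchar_t *widened_greeting(void)
{
    return WIDEN("Hello!");
}
```

Because the operands of ## are not macro-expanded before pasting, a one-level macro like this works for a literal argument but not for an argument that is itself a macro. That is one reason such macros are commonly wrapped in a second macro layer.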
If the _UNICODE identifier is not defined, the __T macro is simply defined as follows:
#define __T(x) x
In either case, two other macros are defined to be the same as __T:
#define _T(x) __T(x)
#define _TEXT(x) __T(x)
Which one you use in a Win32 console program depends on whether you prefer to be concise or verbose. Basically, you must write your string literals inside the _T or _TEXT macro, as follows:
_TEXT ("Hello!")
If _UNICODE is defined, the string is then interpreted as being composed of wide characters; otherwise it is interpreted as a string of 8-bit characters.
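The whole mechanism can be imitated in portable C to see how the pieces fit together. In the sketch below every name beginning with my_ or MY_ is a hypothetical stand-in for the corresponding TCHAR.H name (MY_UNICODE for _UNICODE, my_tchar for TCHAR, MY_T for _T, my_tcslen for _tcslen); it is not the real header.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for the TCHAR.H definitions. */
#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define MY_T_IMPL(x) L##x      /* paste L onto the literal */
#define my_tcslen    wcslen
#else
typedef char my_tchar;
#define MY_T_IMPL(x) x         /* leave the literal unchanged */
#define my_tcslen    strlen
#endif
#define MY_T(x) MY_T_IMPL(x)   /* second layer, as with _T and __T */

/* The same source line compiles to a narrow or a wide array. */
static const my_tchar greeting[] = MY_T("Hello!");

/* Character count: identical in both builds. */
size_t greeting_length(void)
{
    return my_tcslen(greeting);
}

/* Size of one character: differs between the two builds. */
size_t greeting_char_size(void)
{
    return sizeof greeting[0];
}
```

Compiling the file with MY_UNICODE defined (for example, with a -DMY_UNICODE compiler option) switches every line to the wide variants without touching the source.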
Wide characters and Windows
Windows NT supports Unicode from the ground up. This means that Windows NT internally uses strings composed of 16-bit characters. Because much of the rest of the world does not use 16-bit strings, Windows NT must often convert strings on their way into the operating system. Windows NT can run programs written for ASCII, for Unicode, or for a mix of ASCII and Unicode. That is, Windows NT supports different API function calls that accept 8-bit or 16-bit strings. (We will see shortly how this works.)
Windows 98 has much less Unicode support than Windows NT. Only a few Windows 98 function calls support wide strings. (These functions are listed in Microsoft Knowledge Base article Q125671; they include MessageBox.) If you are going to distribute only one .EXE file that must run under both Windows NT and Windows 98, it should not use Unicode, or it will not run under Windows 98; in particular, the program cannot call the Unicode versions of the Windows functions. However, you will be in a much better position to distribute a Unicode version of your program in the future if you write source code that can be compiled for either ASCII or Unicode. That is how all the programs in this book are written.
Windows header file types
As you saw in chapter 1, a Windows program includes the header file WINDOWS.H. This file includes a number of other header files, among them WINDEF.H, which has many of the basic type definitions used in Windows and which itself includes WINNT.H. WINNT.H handles the basic Unicode support.
WINNT.H begins by including the C header file CTYPE.H, which is one of many C header files that contain a definition of wchar_t. WINNT.H then defines new data types named CHAR and WCHAR:
typedef char CHAR;
typedef wchar_t WCHAR; // wc
CHAR and WCHAR are the data types recommended for use in a Windows program when you need to define an 8-bit or a 16-bit character. The comment following the WCHAR definition is a suggestion for Hungarian notation: a variable based on the WCHAR data type can be prefixed with the letters wc to indicate a wide character.
The WINNT.H header file goes on to define six data types you can use as pointers to 8-bit strings and four data types you can use as pointers to const 8-bit strings. Here are some selected statements from the header file, condensed to show the data types:
typedef CHAR * PCHAR, * LPCH, * PCH, * NPSTR, * LPSTR, * PSTR;
typedef const CHAR * LPCCH, * PCCH, * LPCSTR, * PCSTR;
The N and L prefixes stand for "near" and "long" and refer to the two different pointer sizes in 16-bit Windows. In Win32 there is no difference between near and long pointers.
Similarly, WINNT.H defines six data types you can use as pointers to 16-bit strings and four data types you can use as pointers to const 16-bit strings:
typedef WCHAR * PWCHAR, * LPWCH, * PWCH, * NWPSTR, * LPWSTR, * PWSTR;
typedef const WCHAR * LPCWCH, * PCWCH, * LPCWSTR, * PCWSTR;
So far, we have the data types CHAR (an 8-bit char) and WCHAR (a 16-bit wchar_t), as well as pointers to CHAR and WCHAR. As in TCHAR.H, WINNT.H defines TCHAR as a generic character type. If the identifier UNICODE (without the underscore) is defined, TCHAR and pointers to TCHAR are defined as WCHAR and pointers to WCHAR. If the identifier UNICODE is not defined, TCHAR and pointers to TCHAR are defined as char and pointers to char:
#ifdef UNICODE
typedef WCHAR TCHAR, * PTCHAR;
typedef LPWSTR LPTCH, PTCH, PTSTR, LPTSTR;
#else
typedef char TCHAR, * PTCHAR;
typedef LPSTR LPTCH, PTCH, PTSTR, LPTSTR;
#endif
Both the WINNT.H and WCHAR.H header files are careful to avoid redundant definitions if the TCHAR data type has already been defined by one or the other of these header files. However, whenever you use other header files in your program, you should include WINDOWS.H before all the others.
The WINNT.H header file also defines a macro that adds the L to the first quotation mark of a string. If the UNICODE identifier is defined, a macro called __TEXT is defined as follows:
#define __TEXT(quote) L##quote
If the identifier UNICODE is not defined, the __TEXT macro is defined as follows:
#define __TEXT(quote) quote
In addition, the TEXT macro is defined as follows:
#define TEXT(quote) __TEXT(quote)
This is the same way the _TEXT macro is defined in TCHAR.H, except that you do not have to bother with the underscore. I will use the TEXT version of this macro in this book.
These definitions let you mix ASCII and Unicode strings in the same program, or write a single program that can be compiled for either ASCII or Unicode. If you want to explicitly define 8-bit character variables and strings, use CHAR, PCHAR (or one of the others), and strings in plain quotation marks. For explicit 16-bit character variables and strings, use WCHAR, PWCHAR, and put an L before the quotation marks. For variables and strings that will be 8-bit or 16-bit depending on the definition of the UNICODE identifier, use TCHAR, PTCHAR, and the TEXT macro.
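As a rough illustration of mixing explicitly narrow and explicitly wide strings in one program, the sketch below copies a char string into a wchar_t buffer. It uses the standard types char and wchar_t in place of CHAR and WCHAR so that it compiles without WINDOWS.H. widen_ascii is a hypothetical helper, and it is only meaningful for 7-bit ASCII text; anything else would need a real character-set conversion.

```c
#include <stddef.h>
#include <wchar.h>

/* Copies a 7-bit ASCII string into a wide buffer, one character at a time.
   dst_count is the buffer capacity in wide characters, terminator included.
   Returns the number of characters copied, excluding the terminator. */
size_t widen_ascii(wchar_t *dst, size_t dst_count, const char *src)
{
    size_t i = 0;

    if (dst_count == 0)
        return 0;                               /* no room even for the terminator */

    while (src[i] != '\0' && i + 1 < dst_count) {
        dst[i] = (wchar_t)(unsigned char)src[i];  /* zero-extend each byte */
        i++;
    }
    dst[i] = L'\0';
    return i;
}
```

The narrow literal passed in carries no prefix, while the buffer it fills is explicitly wide, so both kinds of string coexist in the same source file regardless of how UNICODE is defined.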