ANSI and Unicode encoding: the meaning of TCHAR, LPSTR, LPCSTR, LPWSTR, LPCWSTR, LPTSTR, and LPCTSTR
A character can be stored in 1 byte, which is ANSI encoding.
A character can also be stored in 2 bytes, which is Unicode encoding (Unicode actually covers more than fits in 2 bytes).
Visual C++ supports char and wchar_t as the built-in data types for ANSI and Unicode characters, respectively.
For example:
char cResponse; // 'Y' or 'N'
char sUsername[64];
// str* functions
and
wchar_t cResponse; // 'Y' or 'N'
wchar_t sUsername[64];
// wcs* functions
Both can be written in a unified way:
#include <TCHAR.H> // Implicit or explicit include
TCHAR cResponse; // 'Y' or 'N'
TCHAR sUsername[64];
// _tcs* functions
TCHAR resolves to char or wchar_t, depending on the Character Set selected in the project settings.
Therefore, TCHAR is defined as follows:
#ifdef _UNICODE
typedef wchar_t TCHAR;
#else
typedef char TCHAR;
#endif
In Windows, the T prefix generally indicates a name that adapts to the selected character set.
For example, strcpy, strlen, and strcat (including the secure versions with the _s suffix) are the ANSI versions;
wcscpy, wcslen, and wcscat (including the secure versions with the _s suffix) are the Unicode versions, where "wc" stands for wide character;
_tcscpy, _tcslen, and _tcscat resolve to one or the other, depending on the character set:
size_t strlen(const char*); // ANSI
size_t wcslen(const wchar_t*); // Unicode
size_t _tcslen(const TCHAR*); // ANSI or Unicode
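The same compile-time switch can be tried outside Visual C++. The following is a minimal sketch, not the real <tchar.h>: DEMO_UNICODE, demo_tchar, demo_tcslen, and username_length are hypothetical stand-ins for _UNICODE, TCHAR, and _tcslen, chosen so the example compiles with any C++ compiler.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for _UNICODE, TCHAR and _tcslen (not the real <tchar.h>).
// Define DEMO_UNICODE before this point to select the wide-character branch.
#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;
inline std::size_t demo_tcslen(const demo_tchar* s) { return std::wcslen(s); }
#else
typedef char demo_tchar;
inline std::size_t demo_tcslen(const demo_tchar* s) { return std::strlen(s); }
#endif

// Length, in characters, of a string declared with the adaptive type.
inline std::size_t username_length(const demo_tchar* username) {
    return demo_tcslen(username);
}
```

Without DEMO_UNICODE defined, demo_tchar is char and username_length("Saturn") forwards to strlen; with it defined, the same call site would need a wide literal and would forward to wcslen.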
A string literal written in double quotation marks is an ANSI string, and each character occupies 1 byte. For example:
"This is ANSI String. Each letter takes 1 byte."
To make it a Unicode string, prefix it with L:
L"This is Unicode string. Each letter would take 2 bytes, including spaces."
In a Unicode string, each character occupies 2 bytes, even if it could be expressed in 1 byte, such as English letters, digits, and the null character. The number of bytes occupied by a Unicode string is therefore always a multiple of 2.
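The byte counts can be checked with sizeof. This is a small sketch under one assumption worth stating: wchar_t is 2 bytes with Visual C++, which is the case described here, but other compilers may use 4 bytes, so the wide literal is checked against sizeof(wchar_t) and a char16_t literal is added to show the 2-byte case on any platform. The constant names are made up for the illustration.

```cpp
#include <cstddef>

// sizeof on a string literal counts every character plus the terminating null.
constexpr std::size_t narrow_bytes = sizeof("Saturn");   // 7 characters * 1 byte
constexpr std::size_t wide_bytes   = sizeof(L"Saturn");  // 7 characters * sizeof(wchar_t)
constexpr std::size_t utf16_bytes  = sizeof(u"Saturn");  // 7 characters * 2 bytes (char16_t)

static_assert(narrow_bytes == 7, "1 byte per character, including the null");
static_assert(wide_bytes == 7 * sizeof(wchar_t), "a multiple of the wide character size");
static_assert(utf16_bytes == 14, "2 bytes per character, including the null");
```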
Combined with the T prefix described above, a literal can be written so that it works with either character set:
"ANSI String"; // ANSI
L"Unicode String"; // Unicode
_T("Either string, depending on compilation"); // ANSI or Unicode
_T and TEXT are macros that correspond to the T prefix. They are defined as follows:
// SIMPLIFIED
#ifdef _UNICODE
 #define _T(c) L##c
 #define TEXT(c) L##c
#else
 #define _T(c) c
 #define TEXT(c) c
#endif
The ## above is called the "token-pasting operator". In a Unicode build, _T("Unicode") expands to L"Unicode"; in an ANSI build, _T("Unicode") expands to "Unicode".
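Token pasting can be tried in isolation. In the sketch below, DEMO_T is a hypothetical macro that copies only the Unicode branch of _T, so the example does not depend on <tchar.h>; the two helper functions are likewise invented for the illustration.

```cpp
#include <cstddef>
#include <cwchar>

// DEMO_T is a hypothetical stand-in for the Unicode branch of _T.
#define DEMO_T(x) L##x

// The preprocessor pastes L onto the literal, so this is wcslen(L"Unicode").
inline std::size_t pasted_literal_length() {
    return std::wcslen(DEMO_T("Unicode"));
}

// The same pasting works for character literals: DEMO_T('Y') becomes L'Y'.
inline wchar_t pasted_char() {
    return DEMO_T('Y');
}
```

The pasting happens before compilation, on the text of the argument, which is why it can only act on what is literally written between the parentheses.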
Note: You cannot use _T to convert a variable (string or character). The following is not allowed:
char c = 'C';
char str[16] = "CodeProject";
_T(c);
_T(str);
In an ANSI (Multi-Byte) build, _T(c) and _T(str) simply expand to c and str.
In a Unicode build, however, the compiler reports errors:
error C2065: 'Lc' : undeclared identifier
error C2065: 'Lstr' : undeclared identifier
Given the definition of _T, the reason is easy to see: the macro pastes L onto the variable name, producing the identifiers Lc and Lstr, which do not exist.
In Windows, almost every API that takes a string or character has a generic version. For example, SetWindowTextA/W can be written as:
BOOL SetWindowText(HWND, const TCHAR*);
SetWindowText, however, is a macro that resolves to one of the following two functions:
BOOL SetWindowTextA(HWND, const char*);
BOOL SetWindowTextW(HWND, const wchar_t*);
Internally, only the Unicode version is implemented. When you call SetWindowTextA with an ANSI string, it first converts the string to Unicode and then calls SetWindowTextW. Only the Unicode version does the real work!
We therefore recommend calling the Unicode API directly when writing code, even though ANSI strings may feel more familiar.
Note: Another typedef exists, WCHAR, which is equivalent to wchar_t.
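The forwarding pattern described above can be sketched without the Windows headers. Everything here is an assumption made for illustration: demo_set_text_a and demo_set_text_w are hypothetical stand-ins for SetWindowTextA and SetWindowTextW, the "window" is just a std::wstring, and std::mbstowcs from the standard library stands in for whatever conversion Windows performs internally.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical wide version: the only place where real work happens.
inline bool demo_set_text_w(std::wstring& window_text, const wchar_t* text) {
    if (text == nullptr) return false;
    window_text = text;
    return true;
}

// Hypothetical narrow version: convert to wide characters, then forward.
inline bool demo_set_text_a(std::wstring& window_text, const char* text) {
    if (text == nullptr) return false;
    // A narrow string never converts to more wide characters than it has bytes.
    std::vector<wchar_t> wide(std::strlen(text) + 1);
    const std::size_t written = std::mbstowcs(wide.data(), text, wide.size());
    if (written == static_cast<std::size_t>(-1)) return false;  // invalid input
    const std::wstring converted(wide.data(), written);
    return demo_set_text_w(window_text, converted.c_str());
}
```

Calling the wide version directly skips the temporary buffer and the conversion, which is the practical argument for preferring the Unicode API.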
Recall the declaration of strlen:
size_t strlen(const char*);
It can also be written as:
size_t strlen(LPCSTR);
because LPCSTR is defined as:
// Simplified
typedef const char* LPCSTR;
Its meaning is as follows:
- LP: Long Pointer
- C: Constant
- STR: String
A Long Pointer means the same as an ordinary Pointer.
The Unicode counterpart is:
size_t wcslen(const wchar_t* szString); // Or WCHAR*
size_t wcslen(LPCWSTR szString);
Here LPCWSTR is defined as:
typedef const WCHAR* LPCWSTR;
Its meaning is as follows:
- LP: Pointer
- C: Constant
- WSTR: Wide-character String
Finally, there is LPCTSTR:
- LP: Pointer
- C: Constant
- T: TCHAR
- STR: String
Summary:
- TCHAR: char or wchar_t (depending on the character set)
- LPSTR: char *
- LPCSTR: const char *
- LPWSTR: wchar_t *
- LPCWSTR: const wchar_t *
- LPTSTR: TCHAR *
- LPCTSTR: const TCHAR *
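The summary can be checked mechanically. The sketch below re-creates the typedefs for an ANSI build inside a hypothetical demo namespace (these are not the definitions from the Windows headers) and uses static_assert to confirm what each name resolves to; planet is an invented helper.

```cpp
#include <type_traits>

namespace demo {
// Hypothetical re-creation of the typedefs for an ANSI build (not <windows.h>).
typedef char           TCHAR;
typedef char*          LPSTR;
typedef const char*    LPCSTR;
typedef wchar_t*       LPWSTR;
typedef const wchar_t* LPCWSTR;
typedef TCHAR*         LPTSTR;
typedef const TCHAR*   LPCTSTR;
}

// In an ANSI build the T variants collapse onto the narrow ones.
static_assert(std::is_same<demo::LPTSTR, demo::LPSTR>::value, "LPTSTR is char*");
static_assert(std::is_same<demo::LPCTSTR, demo::LPCSTR>::value, "LPCTSTR is const char*");
// The explicitly wide variants never depend on the character set.
static_assert(std::is_same<demo::LPCWSTR, const wchar_t*>::value, "LPCWSTR is const wchar_t*");

// A function returning the adaptive constant-string type.
inline demo::LPCTSTR planet() { return "Saturn"; }
```

Changing demo::TCHAR to wchar_t would model a Unicode build: the first two assertions would then fail, and planet would need a wide literal.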
Code written for one character set often fails to compile under the other. The following compiles in an ANSI build but reports errors in a Unicode build:
int main()
{
 TCHAR name[] = "Saturn";
 int nLen; // Or size_t
 nLen = strlen(name);
}
- Error C2440: 'initializing': cannot convert from 'const char [7]' to 'TCHAR []'
- Error C2664: 'strlen': cannot convert parameter 1 from 'TCHAR []' to 'const char *'
The same problem occurs in:
nLen = wcslen("Saturn");
// ERROR: cannot convert parameter 1 from 'const char [7]' to 'const wchar_t *'
Unfortunately, this error cannot be fixed with a cast:
nLen = wcslen((const wchar_t*)"Saturn");
This produces incorrect results and often reads out of bounds. The reason is that "Saturn" occupies 7 bytes:
'S' (83) | 'a' (97) | 't' (116) | 'u' (117) | 'r' (114) | 'n' (110) | '\0' (0)
When this is passed to wcslen, however, every 2 bytes are treated as one character. The first two bytes [83, 97] are therefore read as a single character with the value 24915 (97 << 8 | 83), which is the unrelated Unicode character U+6153. The same happens to each following pair.
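The arithmetic can be reproduced safely, without the cast. This sketch assumes a little-endian layout with 2-byte wide characters, as in the description above; read_as_utf16le and first_misread_char are hypothetical helpers that combine the bytes explicitly instead of reinterpreting memory.

```cpp
#include <cstdint>

// Combine two adjacent bytes the way a little-endian machine reads a 2-byte character.
inline std::uint16_t read_as_utf16le(const char* bytes) {
    const unsigned lo = static_cast<unsigned char>(bytes[0]);
    const unsigned hi = static_cast<unsigned char>(bytes[1]);
    return static_cast<std::uint16_t>((hi << 8) | lo);
}

// First "wide character" seen when the bytes of "Saturn" are misread as wide text.
inline std::uint16_t first_misread_char() {
    return read_as_utf16le("Saturn");  // (97 << 8) | 83 == 24915 == 0x6153
}
```

After three such pairs only the single null byte remains, so a scan for a 2-byte terminator continues past the end of the array, which is where the out-of-bounds read comes from.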
So when Unicode APIs are used, write the literal as a wide string from the start:
TCHAR name[] = _T("Saturn");
// or
wcslen(L"Saturn");
In the earlier example, name in strlen(name) is compiled as Unicode, so each character occupies 2 bytes. If it is forcibly cast to ANSI:
nLen = strlen((const char*)name);
the same kind of problem occurs. The first byte [83] is correctly read as 'S', but the second byte [0] is read as '\0', which ends the string. strlen therefore returns 1.
In short, C-style casts do not work here.
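This result can be reproduced on any platform by writing the bytes out by hand. The array below is an assumption about layout: it spells L"Saturn" as a build with 2-byte, little-endian wide characters would store it. The names saturn_utf16le and misread_length are invented for the sketch.

```cpp
#include <cstddef>
#include <cstring>

// The bytes of L"Saturn" with 2 bytes per character, low byte first
// (written by hand so the sketch does not depend on the host's wchar_t size).
static const char saturn_utf16le[] = {
    'S', 0, 'a', 0, 't', 0, 'u', 0, 'r', 0, 'n', 0, 0, 0
};

// strlen stops at the first zero byte, which is the high byte of 'S'.
inline std::size_t misread_length() {
    return std::strlen(saturn_utf16le);
}
```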
When allocating memory with new in C++, you specify the number of characters, not the number of bytes:
LPTSTR pBuffer; // TCHAR*
pBuffer = new TCHAR[128]; // Allocates 128 or 256 BYTES, depending on compilation.
However, if you use APIs such as malloc, LocalAlloc, and GlobalAlloc to allocate space, you need to specify the number of bytes:
pBuffer = (TCHAR*)malloc(128 * sizeof(TCHAR));
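The two allocation styles can be compared side by side. In this sketch demo_tchar is a hypothetical stand-in for TCHAR in a Unicode build, and the three helper functions are invented for the illustration.

```cpp
#include <cstddef>
#include <cstdlib>

typedef wchar_t demo_tchar;  // hypothetical stand-in for TCHAR in a Unicode build

// new[] takes a character count; the compiler multiplies by the element size.
inline demo_tchar* alloc_with_new(std::size_t chars) {
    return new demo_tchar[chars];
}

// malloc takes a byte count, so the multiplication is the caller's job.
inline demo_tchar* alloc_with_malloc(std::size_t chars) {
    return static_cast<demo_tchar*>(std::malloc(chars * sizeof(demo_tchar)));
}

// Bytes needed for a buffer holding the given number of characters.
inline std::size_t bytes_for(std::size_t chars) {
    return chars * sizeof(demo_tchar);
}
```

A buffer from alloc_with_new must be released with delete[], and one from alloc_with_malloc with free. Forgetting the sizeof factor in the malloc version would allocate too few bytes in a Unicode build.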