ANSI and Unicode encoding, TCHAR | LPSTR | LPCSTR | LPWSTR | LPCWSTR | LPTSTR

Last Update:2017-04-13 Source: Internet

Author: User


A character can be stored in 1 byte, which is ANSI encoding.

A character can also be stored in 2 bytes, which is what Windows calls Unicode encoding (Unicode as a whole covers more characters than fit in 2 bytes).

Visual C++ supports char and wchar_t as the built-in data types for ANSI and Unicode characters.

For example

char cResponse; // 'Y' or 'N'
char sUsername[64]; // str* functions

And

wchar_t cResponse; // 'Y' or 'N'
wchar_t sUsername[64]; // wcs* functions

They can be written in a unified manner.

#include <tchar.h> // Implicit or explicit include
TCHAR cResponse; // 'Y' or 'N'
TCHAR sUsername[64]; // _tcs* functions

TCHAR resolves to char or wchar_t depending on the Character Set selected in the project settings.

Therefore, TCHAR is defined as follows:

#ifdef _UNICODE
typedef wchar_t TCHAR;
#else
typedef char TCHAR;
#endif

In Windows, the prefix T generally indicates that a name adapts to the selected character set.

For example, strcpy, strlen, and strcat (including the secure versions with the _s suffix) are the ANSI versions;

wcscpy, wcslen, and wcscat (including the secure versions with the _s suffix) are the Unicode versions, where wc stands for wide character;

_tcscpy, _tcslen, and _tcscat resolve to one or the other depending on the character set:

size_t strlen(const char*); // ANSI
size_t wcslen(const wchar_t*); // Unicode
size_t _tcslen(const TCHAR*); // ANSI or Unicode
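The mapping above can be sketched in portable C++ as follows. This is only an illustration: MY_UNICODE, my_tchar, my_tcslen, kPlanet and planet_length are hypothetical names standing in for _UNICODE, TCHAR and _tcslen, chosen so the sketch builds without <tchar.h>.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// MY_UNICODE, my_tchar and my_tcslen are hypothetical stand-ins for
// _UNICODE, TCHAR and _tcslen, so the sketch builds without <tchar.h>.
#define MY_UNICODE 1

#if MY_UNICODE
typedef wchar_t my_tchar;                 // wide build
static const my_tchar kPlanet[] = L"Saturn";
inline std::size_t my_tcslen(const my_tchar* s) { return std::wcslen(s); }
#else
typedef char my_tchar;                    // narrow build
static const my_tchar kPlanet[] = "Saturn";
inline std::size_t my_tcslen(const my_tchar* s) { return std::strlen(s); }
#endif

// Returns the length in characters, which is the same in both builds.
inline std::size_t planet_length() {
    return my_tcslen(kPlanet);
}
```

Changing MY_UNICODE to 0 switches every name to its narrow counterpart without touching planet_length, which is the point of the generic-text names.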

A string literal is written in double quotation marks. Without a prefix it is an ANSI string, and each character occupies 1 byte. For example:

"This is ANSI String. Each letter takes 1 byte."

To make it a Unicode string, the prefix L is required:

L"This is Unicode string. Each letter would take 2 bytes, including spaces."

In a Unicode string, each character occupies 2 bytes, even if it could be expressed in 1 byte, such as English letters, digits, and the null character. Therefore, the number of bytes occupied by a Unicode string is always a multiple of 2.
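The difference between character count and byte count can be checked with sizeof. In this sketch char16_t is used as a portable stand-in for the 2-byte wchar_t of Visual C++ (an assumption of the sketch; wchar_t is 4 bytes on many non-Windows compilers), and all variable names are made up for illustration.

```cpp
#include <cstddef>

// char16_t stands in for the 2-byte wchar_t of Visual C++ (assumption).
const char     narrow_text[] = "Saturn";
const char16_t wide_text[]   = u"Saturn";

// Character counts are equal; both include the terminating null.
const std::size_t narrow_chars = sizeof(narrow_text) / sizeof(narrow_text[0]);
const std::size_t wide_chars   = sizeof(wide_text) / sizeof(wide_text[0]);

// Byte counts differ: each wide character takes sizeof(char16_t) bytes.
const std::size_t narrow_bytes = sizeof(narrow_text);
const std::size_t wide_bytes   = sizeof(wide_text);
```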

Combined with the T prefix mentioned above, a literal can be written for either character set:

"ANSI String"; // ANSI
L"Unicode String"; // Unicode
_T("Either string, depending on compilation"); // ANSI or Unicode

_T and TEXT are macros that play the role of the T prefix. They are defined as follows:

// SIMPLIFIED
#ifdef _UNICODE
  #define _T(c) L##c
  #define TEXT(c) L##c
#else
  #define _T(c) c
  #define TEXT(c) c
#endif

The ## above is the token-pasting operator. In a Unicode build, _T("Unicode") expands to L"Unicode"; in an ANSI build, _T("Unicode") expands to "Unicode".
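To see both expansions side by side, the two branches can be given separate names. MY_T_WIDE and MY_T_NARROW below are hypothetical macros written for this sketch, not part of any Windows header; the static_assert lines check the type of each resulting literal at compile time.

```cpp
#include <type_traits>

// Hypothetical equivalents of the two branches of _T, so that both can be
// inspected in one translation unit.
#define MY_T_WIDE(x)   L##x   // pastes L onto the literal: L"text"
#define MY_T_NARROW(x) x      // leaves the literal unchanged: "text"

// "Unicode" is 7 characters plus the terminating null, so 8 elements.
static_assert(std::is_same<decltype(MY_T_WIDE("Unicode")),
                           const wchar_t (&)[8]>::value,
              "the pasted form is a wide string literal");
static_assert(std::is_same<decltype(MY_T_NARROW("Unicode")),
                           const char (&)[8]>::value,
              "the unpasted form is a narrow string literal");

// Character literals are handled the same way.
const wchar_t wide_c   = MY_T_WIDE('Y');
const char    narrow_c = MY_T_NARROW('Y');
```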

Note: You cannot use _T to convert a variable (string or character). The following is not allowed:

char c = 'C';
char str[16] = "CodeProject";
_T(c);
_T(str);

If you compile in ANSI (Multi-Byte), _T(c) and _T(str) expand to c and str;

However, when compiling in Unicode, an error is returned:

error C2065: 'Lc' : undeclared identifier
error C2065: 'Lstr' : undeclared identifier

Given the definition of _T, this is not hard to understand: L##c pastes the tokens into the identifier Lc, which has never been declared.
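Since a macro cannot convert a variable, the conversion has to happen at run time. The following is a minimal sketch of that idea, assuming the input is 7-bit ASCII only; widen_ascii is a hypothetical helper, and real code would normally call a proper conversion routine such as MultiByteToWideChar on Windows.

```cpp
#include <string>

// Hypothetical helper: widens a narrow string at run time, one character at
// a time. Only correct for 7-bit ASCII input (assumption of this sketch).
inline std::wstring widen_ascii(const char* narrow) {
    std::wstring wide;
    for (const char* p = narrow; *p != '\0'; ++p) {
        // Go through unsigned char so the value is never sign-extended.
        wide.push_back(static_cast<wchar_t>(static_cast<unsigned char>(*p)));
    }
    return wide;
}
```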

In Windows, almost every API that takes a string or character has a generic version. For example, SetWindowTextA/W can be written as follows:

BOOL SetWindowText(HWND, const TCHAR*);

But SetWindowText is actually a macro, which resolves to one of the following two functions:

BOOL SetWindowTextA(HWND, const char*);
BOOL SetWindowTextW(HWND, const wchar_t*);

Internally, however, Windows implements only the Unicode version. When you call SetWindowTextA (passing an ANSI string), it first converts the string to Unicode and then calls SetWindowTextW. Only the Unicode version does the real work!

Therefore, we recommend calling the Unicode API directly when writing code, even though ANSI strings are more familiar.
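The convert-then-forward pattern described above might look roughly like this. It is only a model: set_title_a, set_title_w, set_title, g_title and MY_UNICODE are hypothetical names, not Windows functions, and the conversion is an ASCII-only widening chosen to keep the sketch portable.

```cpp
#include <string>

// Hypothetical stand-ins for an A/W API pair; these are not Windows functions.
std::wstring g_title;   // the stored text is always wide

// The wide version does the real work.
bool set_title_w(const wchar_t* text) {
    if (text == nullptr) return false;
    g_title = text;
    return true;
}

// The narrow version only converts its argument and forwards the call.
bool set_title_a(const char* text) {
    if (text == nullptr) return false;
    std::wstring wide;
    for (const char* p = text; *p != '\0'; ++p) {
        // ASCII-only widening, an assumption of this sketch.
        wide.push_back(static_cast<wchar_t>(static_cast<unsigned char>(*p)));
    }
    return set_title_w(wide.c_str());
}

// Hypothetical generic name, selected the way SetWindowText is.
#ifdef MY_UNICODE
#define set_title set_title_w
#else
#define set_title set_title_a
#endif
```

Calling the wide version directly skips the conversion step, which is the reason behind the recommendation above.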

Note: Another typedef exists, WCHAR, which is equivalent to wchar_t.

We know the definition of strlen:

size_t strlen(const char*);

It can also be written as

size_t strlen(LPCSTR);

where LPCSTR is defined as

// Simplified
typedef const char* LPCSTR;

Its meaning is as follows:

LP:Long Pointer
C:Constant
STR:String

Long Pointer means the same as Pointer; the distinction is a holdover from 16-bit Windows.

The Unicode counterpart is:

size_t wcslen(const wchar_t* szString); // Or WCHAR*
size_t wcslen(LPCWSTR szString);

Here LPCWSTR is defined as:

typedef const WCHAR* LPCWSTR;

Its meaning is as follows:

LP-Pointer
C-Constant
WSTR-Wide character String

Further, there is LPCTSTR:

LP-Pointer
C-Constant
T-TCHAR
STR-String

Summary:

TCHAR-char/wchar_t (depending on the character set)
LPSTR-char *
LPCSTR-const char *
LPWSTR-wchar_t *
LPCWSTR-const wchar_t *
LPTSTR-TCHAR *
LPCTSTR-const TCHAR *
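The summary can be written out as typedefs and checked at compile time. The sketch below keeps its copies inside a namespace named sketch (a made-up name) because the real definitions come from <windows.h>; MY_UNICODE is a hypothetical stand-in for the macro that selects the Unicode character set.

```cpp
#include <type_traits>

namespace sketch {   // hypothetical copies; the real ones come from <windows.h>

typedef wchar_t      WCHAR;
typedef char*        LPSTR;
typedef const char*  LPCSTR;
typedef WCHAR*       LPWSTR;
typedef const WCHAR* LPCWSTR;

#ifdef MY_UNICODE    // not defined here, so this sketch is an ANSI build
typedef wchar_t TCHAR;
#else
typedef char TCHAR;
#endif

typedef TCHAR*       LPTSTR;
typedef const TCHAR* LPCTSTR;

}  // namespace sketch

// The fixed-width names never change with the character set.
static_assert(std::is_same<sketch::LPCSTR, const char*>::value, "LPCSTR");
static_assert(std::is_same<sketch::LPCWSTR, const wchar_t*>::value, "LPCWSTR");
```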

Sometimes code compiles under one character set and fails under the other. The following is fine in ANSI, but produces errors in Unicode:

int main()
{
    TCHAR name[] = "Saturn";
    int nLen; // Or size_t
    nLen = strlen(name);
}

error C2440: 'initializing' : cannot convert from 'const char [7]' to 'TCHAR []'
error C2664: 'strlen' : cannot convert parameter 1 from 'TCHAR []' to 'const char *'

The same problem occurs in:

nLen = wcslen("Saturn");// ERROR: cannot convert parameter 1 from 'const char [7]' to 'const wchar_t *'

Unfortunately, this error cannot be fixed with a cast:

nLen = wcslen((const wchar_t*)"Saturn");

This produces incorrect results and often reads out of bounds. The reason is that "Saturn" occupies 7 bytes:

'S'(83)
'a'(97)
't'(116)
'u'(117)
'r'(114)
'n'(110)
'\0'(0)

However, wcslen reads 2 bytes per character, so the first two bytes [83, 97] are treated as one character with the value (97 << 8 | 83), that is 24915 (0x6153), which is not 'S'. The remaining bytes are paired in the same way. Because the literal has only one zero byte, nothing guarantees a 2-byte null terminator, so wcslen may run past the end of the string.
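The pairing of bytes can be reproduced by hand. The sketch below assumes little-endian byte order, as on x86 Windows; first_wide_unit is a hypothetical helper that combines two adjacent bytes the way a 2-byte character read would.

```cpp
#include <cstdint>

// Combines two adjacent bytes the way a little-endian CPU reads one 2-byte
// character, which is what wcslen effectively sees after the bad cast.
inline std::uint16_t first_wide_unit(const char* bytes) {
    const unsigned low  = static_cast<unsigned char>(bytes[0]);
    const unsigned high = static_cast<unsigned char>(bytes[1]);
    return static_cast<std::uint16_t>((high << 8) | low);
}
```

For "Saturn" the first pair is 'S' (83) and 'a' (97), so the helper returns 97 * 256 + 83 = 24915.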

Therefore, when Unicode APIs are used, write the string in the right form from the start:

TCHAR name[] = _T("Saturn"); // or wcslen(L"Saturn");

In the earlier example, when compiled in Unicode, name in strlen(name) holds 2 bytes per character. If it is cast to an ANSI string:

nLen = strlen((const char*)name);

a problem also occurs. The first byte [83] is correctly read as 'S', but the second byte [0] is read as '\0', which ends the string. Therefore, strlen returns 1.
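This effect can be shown without depending on the compiler's wchar_t. In the sketch below the bytes of "Saturn" as 2-byte little-endian characters are written out by hand; saturn_utf16le and narrow_view_length are hypothetical names used only here.

```cpp
#include <cstddef>
#include <cstring>

// "Saturn" laid out as 2-byte little-endian characters, written out by hand
// so the sketch does not depend on the host's wchar_t size or byte order.
const char saturn_utf16le[] = {
    83, 0,  97, 0,  116, 0,  117, 0,  114, 0,  110, 0,  0, 0
};

// strlen stops at the first zero byte, which is the high byte of 'S'.
inline std::size_t narrow_view_length() {
    return std::strlen(saturn_utf16le);
}
```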

In summary, C-style casts do not work here.

To allocate memory with new in C++, specify the number of characters; the number of bytes is worked out for you:

LPTSTR pBuffer; // TCHAR*
pBuffer = new TCHAR[128]; // Allocates 128 or 256 BYTES, depending on compilation.

However, if you use functions such as malloc, LocalAlloc, and GlobalAlloc to allocate space, you need to specify the number of bytes:

pBuffer = (TCHAR*) malloc(128 * sizeof(TCHAR));
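The two allocation styles can be put next to each other as follows. This is a sketch under the assumption of a Unicode build: my_tchar is a hypothetical stand-in for TCHAR, and kChars, alloc_with_new and alloc_with_malloc are names invented for the example.

```cpp
#include <cstddef>
#include <cstdlib>

typedef wchar_t my_tchar;   // hypothetical stand-in for TCHAR in a Unicode build

const std::size_t kChars = 128;

// new[] takes an element count; the compiler multiplies by sizeof(my_tchar).
// The trailing () zero-initializes the buffer.
inline my_tchar* alloc_with_new() {
    return new my_tchar[kChars]();
}

// malloc takes a byte count, so the multiplication is written out.
inline my_tchar* alloc_with_malloc() {
    return static_cast<my_tchar*>(std::malloc(kChars * sizeof(my_tchar)));
}
```

Each buffer has to be released with the matching call: delete[] for the first, free for the second.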

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
