ANSI and Unicode encoding: the meaning of TCHAR, LPSTR, LPCSTR, LPWSTR, LPCWSTR, LPTSTR, and LPCTSTR

Source: Internet
Author: User

A character can be stored in 1 byte, as in single-byte ANSI encodings.

A character can also be stored in 2 bytes, as in the Unicode encoding used by Windows (UTF-16). Unicode itself defines more characters than fit in 2 bytes; UTF-16 stores those as a pair of 2-byte units.

Visual C++ supports char and wchar_t as the built-in data types for ANSI and Unicode characters, respectively.

For example,

char cResponse;      // 'Y' or 'N'
char sUsername[64];
// str* functions

and

wchar_t cResponse;   // 'Y' or 'N'
wchar_t sUsername[64];
// wcs* functions

can be written in a unified manner:

#include <tchar.h>   // implicit or explicit include
TCHAR cResponse;     // 'Y' or 'N'
TCHAR sUsername[64];
// _tcs* functions

 

TCHAR is either char or wchar_t, depending on the Character Set selected in the project settings.

Therefore, TCHAR is defined as follows:

#ifdef _UNICODE
typedef wchar_t TCHAR;
#else
typedef char TCHAR;
#endif
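The same switch can be tried outside Visual C++ with a small portable sketch. The names MY_UNICODE, MyTCHAR, kWideBuild, and kCharBytes are hypothetical stand-ins invented for this illustration; they are not part of tchar.h.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for the tchar.h switch: MY_UNICODE plays the
// role of _UNICODE, MyTCHAR the role of TCHAR.
#ifdef MY_UNICODE
typedef wchar_t MyTCHAR;
#else
typedef char MyTCHAR;
#endif

// True when the build selected the wide character type.
constexpr bool kWideBuild = std::is_same<MyTCHAR, wchar_t>::value;

// Size in bytes of one character in this build.
constexpr std::size_t kCharBytes = sizeof(MyTCHAR);
```

Compiling with -DMY_UNICODE flips both constants without touching the rest of the code. Note that wchar_t is 2 bytes with Visual C++ but is commonly 4 bytes on other platforms.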

 

In Windows, a T prefix generally indicates that a name adapts to the selected character set.

For example, strcpy, strlen, and strcat (including the variants with the secure suffix _s) are the ANSI versions;

wcscpy, wcslen, and wcscat (including the variants with the secure suffix _s) are the Unicode versions, where wc stands for Wide Character;

_tcscpy, _tcslen, and _tcscat resolve to one or the other, depending on the character set:

size_t strlen(const char*);      // ANSI
size_t wcslen(const wchar_t*);   // Unicode
size_t _tcslen(const TCHAR*);    // ANSI or Unicode
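In portable C++, a similar "one name, two character types" effect can be had with overloads instead of macros. This is only a sketch of the idea: TextLength is a hypothetical helper, and tchar.h itself maps names such as _tcslen with the preprocessor rather than with overloading.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical helper: the compiler picks the overload from the
// argument type, so callers use one name for both string kinds.
inline std::size_t TextLength(const char* s) { return std::strlen(s); }
inline std::size_t TextLength(const wchar_t* s) { return std::wcslen(s); }
```

Both TextLength("Saturn") and TextLength(L"Saturn") count six characters, even though the two strings occupy different numbers of bytes.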

 

A string written in plain double quotation marks is an ANSI string, in which each character occupies 1 byte. For example:

"This is ANSI String. Each letter takes 1 byte."

To make it a Unicode string, the prefix L is required:

L"This is Unicode string. Each letter would take 2 bytes, including spaces."

 

In a Unicode string, each character occupies 2 bytes even if it could be expressed in 1 byte, as with English letters, digits, and the null character. Therefore, the number of bytes occupied by a Unicode string is always a multiple of 2.
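The byte counts can be checked with sizeof on string literals. This is a rough sketch with hypothetical constant names; it adds a u"" (char16_t) literal because wchar_t is 2 bytes only with Visual C++ and is commonly 4 bytes elsewhere.

```cpp
#include <cstddef>

// Bytes occupied by each literal, including the terminating null.
constexpr std::size_t kNarrowBytes = sizeof("Saturn");   // 7 units of 1 byte
constexpr std::size_t kWideBytes   = sizeof(L"Saturn");  // 7 * sizeof(wchar_t)
constexpr std::size_t kUtf16Bytes  = sizeof(u"Saturn");  // 7 * sizeof(char16_t)
```

In every case the total is a multiple of the character size, which is the property the paragraph above describes for 2-byte characters.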

Combined with the T prefix mentioned above, this gives three ways to write a string literal:

"ANSI String";   // ANSI
L"Unicode String";   // Unicode
_T("Either string, depending on compilation");   // ANSI or Unicode

 

_T and TEXT are macros that correspond to the T prefix. They are defined as follows:

// SIMPLIFIED
#ifdef _UNICODE
  #define _T(c) L##c
  #define TEXT(c) L##c
#else
  #define _T(c) c
  #define TEXT(c) c
#endif

The ## above is called the "token-pasting operator". In a Unicode build, _T("Unicode") is translated into L"Unicode"; in an ANSI build, _T("Unicode") is translated into "Unicode".
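Token pasting can be observed in isolation with two hypothetical macros modeled on the simplified definition above. MY_WIDE, MY_NARROW, and WideLen are names made up for this sketch; they are not the real tchar.h macros.

```cpp
#include <cstddef>
#include <cwchar>

// Hypothetical macros modeled on the two branches of the simplified _T.
#define MY_WIDE(c) L##c    // pastes L onto the token: MY_WIDE("x") becomes L"x"
#define MY_NARROW(c) c     // leaves the token unchanged

// Length of a literal that only becomes wide through token pasting.
inline std::size_t WideLen() { return std::wcslen(MY_WIDE("Unicode")); }
```

Because the pasting happens on source tokens before compilation, it works for literals such as "Unicode" or 'A' but has no way to convert the contents of a variable.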

 

Note: You cannot use _T to convert a variable (string or character). The following is not allowed:

char c = 'C';
char str[16] = "CodeProject";
_T(c);
_T(str);

If you compile in ANSI (Multi-Byte) mode, _T(c) and _T(str) are translated into c and str;

however, when compiling in Unicode mode, errors are reported:

error C2065: 'Lc' : undeclared identifier
error C2065: 'Lstr' : undeclared identifier

Given the definition of _T, this is not difficult to understand: the macro pastes L onto the identifier itself, producing the undeclared names Lc and Lstr.

 

In Windows, almost all APIs that take a string or character argument have a generic version. For example, SetWindowTextA/W can be written as follows:

BOOL SetWindowText(HWND, const TCHAR*);

But SetWindowText is actually a macro that stands for one of the following two functions:

BOOL SetWindowTextA(HWND, const char*);
BOOL SetWindowTextW(HWND, const wchar_t*);

However, internally only the Unicode version is implemented. When you call SetWindowTextA with an ANSI string, it first converts the string to Unicode and then calls SetWindowTextW. Only the Unicode version does the real work!

Therefore, we recommend that you call the Unicode API directly when writing code, even though ANSI strings are more familiar.
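The "A forwards to W" pattern can be sketched in portable C++ as follows. SetTitleA and SetTitleW are hypothetical functions invented for this illustration, not Windows APIs, and the widening loop assumes ASCII-only input; a real conversion depends on the active code page.

```cpp
#include <string>

// Hypothetical "W" function: the only one that does real work.
// Here it simply returns the text it would display.
inline std::wstring SetTitleW(const wchar_t* text) {
    return std::wstring(text);
}

// Hypothetical "A" wrapper: widens the input, then forwards to W.
// Assumption: ASCII-only input, so each byte maps to one wide char.
inline std::wstring SetTitleA(const char* text) {
    std::wstring wide;
    for (const char* p = text; *p != '\0'; ++p) {
        wide.push_back(static_cast<wchar_t>(static_cast<unsigned char>(*p)));
    }
    return SetTitleW(wide.c_str());
}
```

Every call through the narrow wrapper pays for an extra conversion and a temporary buffer, which is one reason to prefer calling the wide version directly.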

Note: Another typedef exists: WCHAR, which is equivalent to wchar_t.

 

We know the definition of strlen:

size_t strlen(const char*);

It can also be written as

size_t strlen(LPCSTR);

because

// Simplified
typedef const char* LPCSTR;

Its meaning is as follows:

  • LP: Long Pointer
  • C: Constant
  • STR: String

Long Pointer now has the same meaning as Pointer; the name is a holdover from 16-bit Windows.

 

The Unicode counterpart is:

size_t wcslen(const wchar_t* szString);   // or WCHAR*
size_t wcslen(LPCWSTR szString);

Here LPCWSTR stands for

typedef const WCHAR* LPCWSTR;

Its meaning is as follows:

  • LP: Pointer
  • C: Constant
  • WSTR: Wide character String

Further, there is LPCTSTR:

  • LP: Pointer
  • C: Constant
  • T: TCHAR
  • STR: String

Summary:

  • TCHAR: char or wchar_t (depending on the character set)
  • LPSTR: char *
  • LPCSTR: const char *
  • LPWSTR: wchar_t *
  • LPCWSTR: const wchar_t *
  • LPTSTR: TCHAR *
  • LPCTSTR: const TCHAR *

 

Code that compiles under one character set can fail to compile under another. The following is fine in ANSI, but reports errors in Unicode:

int main()
{
    TCHAR name[] = "Saturn";
    int nLen;   // or size_t
    nLen = strlen(name);
}

  • Error C2440: 'initializing': cannot convert from 'const char [7]' to 'TCHAR []'
  • Error C2664: 'strlen': cannot convert parameter 1 from 'TCHAR []' to 'const char *'

The same problem occurs in:

nLen = wcslen("Saturn");
// ERROR: cannot convert parameter 1 from 'const char [7]' to 'const wchar_t *'

Unfortunately, this error cannot be fixed with a cast:

nLen = wcslen((const wchar_t*)"Saturn");

This produces incorrect results and often reads out of bounds. The reason is that "Saturn" occupies 7 bytes:

'S'(83) 'a'(97) 't'(116) 'u'(117) 'r'(114) 'n'(110) '\0'(0)

However, wcslen reads 2 bytes per character. The first two bytes [83, 97] are therefore treated as one character with the value 24915 (97 << 8 | 83), which is an unrelated CJK character (U+6153), and so on for the following pairs.
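The arithmetic behind that misreading can be reproduced without any cast. The helper below is a hypothetical sketch that combines two adjacent bytes the way a little-endian machine (the usual case for Windows) reads one 16-bit character.

```cpp
#include <cstdint>

// Hypothetical helper: combine two adjacent bytes into one 16-bit unit,
// as a little-endian CPU does (first byte is the low half).
constexpr std::uint16_t CombineLittleEndian(unsigned char first,
                                            unsigned char second) {
    return static_cast<std::uint16_t>((second << 8) | first);
}
```

For the first two bytes of "Saturn", CombineLittleEndian(83, 97) gives 24915 (0x6153), which has nothing to do with either 'S' or 'a'. The single null byte at the end is also not a 2-byte terminator, so the scan can continue past the end of the buffer.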

Therefore, when using Unicode APIs, write the string in the right form to begin with:

TCHAR name[] = _T("Saturn");   // or wcslen(L"Saturn");

 

In the earlier example, when the code is compiled in Unicode, each character of name in strlen(name) occupies 2 bytes. If name is cast to an ANSI string:

nLen = strlen((const char*)name);

a problem also occurs. The first byte [83] is correctly read as 'S', but the second byte [0] is read as '\0', which ends the string. Therefore, strlen returns 1.
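This early stop can be reproduced portably with a char16_t string, which uses 2-byte units like a wide string on Windows. NarrowLengthOfWide is a hypothetical function for this sketch, and the result described assumes a little-endian machine.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical demo: measure a 16-bit string with the narrow strlen.
inline std::size_t NarrowLengthOfWide() {
    // Stored as 2-byte units; on little-endian, 'S' is the bytes 0x53 0x00.
    static const char16_t name[] = u"Saturn";
    // Viewing the same memory as char makes strlen stop at the first 0x00.
    return std::strlen(reinterpret_cast<const char*>(name));
}
```

On a little-endian machine the scan sees 'S' and then a zero byte, so the six-character string is reported as having length 1.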

In summary, C-style casts do not work here.

 

When allocating memory with new in C++, you specify the number of characters, regardless of how many bytes that takes:

LPTSTR pBuffer;   // TCHAR*
pBuffer = new TCHAR[128];   // Allocates 128 or 256 BYTES, depending on compilation.

However, if you use APIs such as malloc, LocalAlloc, and GlobalAlloc to allocate space, you need to specify the number of bytes:

pBuffer = (TCHAR*) malloc(128 * sizeof(TCHAR));
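The character-count versus byte-count distinction can be wrapped in a small helper so the multiplication is never forgotten. BytesFor and AllocChars are hypothetical names for this sketch; the template parameter stands in for TCHAR.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: bytes needed for `count` characters of type CharT.
template <typename CharT>
constexpr std::size_t BytesFor(std::size_t count) {
    return count * sizeof(CharT);
}

// Hypothetical helper: allocate room for `count` characters with malloc,
// which takes a size in bytes. The caller releases it with std::free.
template <typename CharT>
CharT* AllocChars(std::size_t count) {
    return static_cast<CharT*>(std::malloc(BytesFor<CharT>(count)));
}
```

AllocChars<wchar_t>(128) requests 128 * sizeof(wchar_t) bytes, whereas passing a bare 128 to malloc would leave a wide buffer too small.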

 
