Address: http://www.codeproject.com/Articles/76252/What-are-TCHAR-WCHAR-LPSTR-LPWSTR-LPCTSTR-etc
Many C++ Windows programmers get confused over what bizarre identifiers like TCHAR and LPCTSTR are. In this article, I will do my best to clear away the fog.
In general, a character can be represented in 1 byte or 2 bytes. Let's say a 1-byte character is an ANSI character; all English characters are represented through this encoding. And let's say a 2-byte character is Unicode, which can represent all languages in the world.
The Visual C++ compiler supports char and wchar_t as native data types for ANSI and Unicode characters respectively. There is a more precise definition of Unicode, but for this discussion treat it as the two-byte character encoding (UTF-16) that the Windows OS uses for multiple-language support.
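As a quick check of the two native types, here is a minimal sketch that reports their sizes. The function names are my own, not part of any Windows header. Note that the 2-byte size of wchar_t holds for Visual C++ on Windows; other compilers commonly use 4 bytes, so the code reports the size rather than assuming it.

```cpp
#include <cstddef>
#include <iostream>

// sizeof(char) is 1 by definition of the language.
constexpr std::size_t narrow_char_size() { return sizeof(char); }

// sizeof(wchar_t) is implementation-defined: 2 with Visual C++ on Windows,
// commonly 4 with GCC/Clang on Linux and macOS.
constexpr std::size_t wide_char_size() { return sizeof(wchar_t); }

// Print both sizes for the compiler this is built with.
inline void print_char_sizes()
{
    std::cout << "sizeof(char)    = " << narrow_char_size() << '\n'
              << "sizeof(wchar_t) = " << wide_char_size() << '\n';
}
```

Calling print_char_sizes() from main shows what your own toolchain uses.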
What if you want your C/C++ code to be independent of the character encoding/mode used?
Suggestion: Use generic data types and names to represent characters and strings.
For example, instead of replacing:
    char cResponse; // 'Y' or 'N'
    char sUsername[64];
    // str* functions
with
    wchar_t cResponse; // 'Y' or 'N'
    wchar_t sUsername[64];
    // wcs* functions
in order to support multiple languages (i.e., Unicode) in your application, you can simply code it in a more generic manner:
    #include <tchar.h> // Implicit or explicit include
    TCHAR cResponse; // 'Y' or 'N'
    TCHAR sUsername[64];
    // _tcs* functions
The Character Set option on the General page of the project settings determines which character set is used for compilation (General -> Character Set).
This way, when your project is being compiled as Unicode, TCHAR would translate to wchar_t. If it is being compiled as ANSI/MBCS, it would translate to char. You are free to use char and wchar_t directly, and project settings will not affect any direct use of these keywords.
TCHAR is defined as:
    #ifdef _UNICODE
    typedef wchar_t TCHAR;
    #else
    typedef char TCHAR;
    #endif
The macro _UNICODE is defined when you set the character set to "Use Unicode Character Set", and therefore TCHAR would mean wchar_t. When the character set is set to "Use Multi-Byte Character Set", TCHAR would mean char.
Likewise, to support multiple character sets using a single code base, and possibly to support multiple languages, use the generic functions (macros). Instead of using strcpy, strlen, strcat (including the secure versions suffixed with _s), or wcscpy, wcslen, wcscat (including the secure versions), you should use the _tcscpy, _tcslen, _tcscat functions.
As you know, strlen is prototyped as:
    size_t strlen(const char*);
And wcslen is prototyped as:
    size_t wcslen(const wchar_t*);
You should rather use _tcslen, which is logically prototyped as:
    size_t _tcslen(const TCHAR*);
WC stands for wide character. Therefore, wcs means wide-character string. In the same way, _tcs would mean _T character string. And you know _T may be char or wchar_t, logically.
But, in reality, _tcslen (and the other _tcs functions) are actually not functions, but macros. They are defined simply as:
    #ifdef _UNICODE
    #define _tcslen wcslen
    #else
    #define _tcslen strlen
    #endif
You should refer to TCHAR.H to look up more macro definitions like this.
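To see the typedef and the function macro working together, here is a small sketch that builds on any compiler. It does not include TCHAR.H; the names DEMO_UNICODE, DemoTChar, demo_tcslen and username_length are hypothetical stand-ins for _UNICODE, TCHAR, _tcslen and your own code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>   // std::strlen
#include <cwchar>    // std::wcslen

// Hypothetical stand-ins so this builds without <tchar.h>:
//   DEMO_UNICODE ~ _UNICODE, DemoTChar ~ TCHAR, demo_tcslen ~ _tcslen.
#define DEMO_UNICODE 1   // remove this line to build the "ANSI" flavour

#ifdef DEMO_UNICODE
typedef wchar_t DemoTChar;
#define demo_tcslen std::wcslen
#else
typedef char DemoTChar;
#define demo_tcslen std::strlen
#endif

// Written once against the generic names; the preprocessor picks the flavour.
inline std::size_t username_length(const DemoTChar* name)
{
    assert(name != nullptr);
    return demo_tcslen(name);   // expands to std::wcslen or std::strlen
}
```

With DEMO_UNICODE defined as above, the function accepts a wide string such as L"Ajay"; with the define removed, the same source accepts "Ajay" instead.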
You might ask why they are defined as macros, and not implemented as functions instead. The reason is simple: a library or DLL may export a single function with a given name and prototype (ignore the overloading concept of C++). For instance, when you export a function as:
    void _TPrintChar(char);
how is the client supposed to call it as:
    void _TPrintChar(wchar_t);
_TPrintChar cannot be magically converted into a function taking a 2-byte character. There have to be two separate functions:
    void PrintCharA(char); // A = ANSI
    void PrintCharW(wchar_t); // W = Wide character
And a simple macro, as defined below, would hide the difference:
    #ifdef _UNICODE
    #define _TPrintChar PrintCharW
    #else
    #define _TPrintChar PrintCharA
    #endif
The client would simply call it as:
    TCHAR cChar;
    _TPrintChar(cChar);
Note that both TCHAR and _TPrintChar would map to either Unicode or ANSI, and therefore cChar and the argument to the function would be either char or wchar_t.
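The A/W pairing can be tried out without a DLL. The sketch below is only an illustration under my own assumptions: DEMO_UNICODE and DemoTChar stand in for _UNICODE and TCHAR, and the DescribeCharA/DescribeCharW pair plays the role of PrintCharA/PrintCharW, returning a label instead of printing so that the chosen version is easy to check.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins: DEMO_UNICODE ~ _UNICODE, DemoTChar ~ TCHAR.
#define DEMO_UNICODE 1   // remove this line for the "ANSI" flavour

#ifdef DEMO_UNICODE
typedef wchar_t DemoTChar;
#else
typedef char DemoTChar;
#endif

// The library provides two real functions with distinct names...
inline std::string DescribeCharA(char)    { return "A (ANSI) version"; }
inline std::string DescribeCharW(wchar_t) { return "W (wide) version"; }

// ...and a macro gives clients one generic name for both.
#ifdef DEMO_UNICODE
#define DescribeChar DescribeCharW
#else
#define DescribeChar DescribeCharA
#endif

// Client code: written once, bound to A or W at compile time.
inline std::string describe_response()
{
    DemoTChar cChar = 'Y';
    return DescribeChar(cChar);
}
```

In this configuration the check assert(describe_response() == "W (wide) version") holds; removing the DEMO_UNICODE line switches the same client code to the A version.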
Macros avoid these complications, and allow us to use either ANSI or Unicode functions for characters and strings. Most of the Windows functions that take a string or a character are implemented this way, and for the programmer's convenience there is just one name (a macro!) to use. SetWindowText is one example:
    // WinUser.H
    #ifdef UNICODE
    #define SetWindowText SetWindowTextW
    #else
    #define SetWindowText SetWindowTextA
    #endif // !UNICODE
There are very few functions that do not have macros, and are available only with the W or A suffix. One example is ReadDirectoryChangesW, which doesn't have an ANSI equivalent.
You all know that we use double quotation marks to represent strings. A string represented in this manner is an ANSI string, with each character taking 1 byte. Example:
    "This is ANSI String. Each letter takes 1 byte."
The string text given above is not Unicode, and so is not suitable for multi-language support. To represent a Unicode string, you need to use the prefix L. An example:
    L"This is Unicode string. Each letter would take 2 bytes, including spaces."
Note the L at the beginning of the string, which makes it a Unicode string. All characters (I repeat, all characters) would take two bytes, including all English letters, spaces, digits, and the null character. Therefore, the length of a Unicode string would always be a multiple of 2 bytes. A Unicode string of 7 characters would need 14 bytes, and so on. A Unicode string taking 15 bytes, for example, would not be valid in any context.
In general, a string would take a multiple of sizeof(TCHAR) bytes!
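The byte arithmetic above can be checked with sizeof. This is a minimal sketch; wide_payload_bytes is a name I made up for illustration. It relies on sizeof(wchar_t) rather than a hard-coded 2, because only Windows compilers such as Visual C++ use 2-byte wide characters.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>    // std::wcslen

// sizeof on a literal counts the terminating null character as well.
static_assert(sizeof("ABCDEFG")  == 8 * sizeof(char),    "7 chars + null");
static_assert(sizeof(L"ABCDEFG") == 8 * sizeof(wchar_t), "7 chars + null");

// Bytes taken by the characters of a wide string, excluding the terminator.
inline std::size_t wide_payload_bytes(const wchar_t* text)
{
    assert(text != nullptr);
    return std::wcslen(text) * sizeof(wchar_t);
}
```

With Visual C++, wide_payload_bytes(L"ABCDEFG") gives the 14 bytes mentioned above; the result is always a multiple of sizeof(wchar_t).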
When you need to express a hard-coded string, you can use:
    "ANSI String"; // ANSI
    L"Unicode String"; // Unicode
    _T("Either string, depending on compilation"); // ANSI or Unicode
    // or use TEXT macro, if you need more readability
The non-prefixed string is an ANSI string, the L-prefixed string is Unicode, and a string specified in _T or TEXT would be either, depending on compilation.
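How can a macro add the L prefix? As far as I can tell, the usual technique is token pasting with the ## preprocessor operator. The sketch below shows that technique with hypothetical names (DEMO_UNICODE, DEMO_T, DEMO_T_PASTE, GREETING); it is an illustration of the idea, not a copy of the real headers.

```cpp
#include <cstddef>

// Hypothetical stand-ins: DEMO_UNICODE ~ _UNICODE, DEMO_T ~ _T / TEXT.
#define DEMO_UNICODE 1   // remove this line for the narrow flavour

#ifdef DEMO_UNICODE
#define DEMO_T_PASTE(s) L##s   // glue the L prefix onto the literal
#else
#define DEMO_T_PASTE(s) s      // leave the literal untouched
#endif

// Extra level so that a macro argument is expanded before pasting.
#define DEMO_T(s) DEMO_T_PASTE(s)

#define GREETING "Hello"       // a literal hidden behind another macro

// Size in bytes of one character of DEMO_T text in this build.
constexpr std::size_t demo_text_char_size()
{
    return sizeof(DEMO_T(GREETING)[0]);
}
```

The second macro level is a design choice: without it, DEMO_T(GREETING) would paste L onto the name GREETING itself instead of onto the string literal it stands for.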
String classes, like MFC/ATL's CString, implement two versions using a macro. There are two classes, named CStringA for ANSI and CStringW for Unicode. When you use CString (which is a macro/typedef), it translates to either of the two classes.
Okay. The TCHAR type definition was for a single character. You can definitely declare an array of TCHAR. What if you want to express a character pointer, or a const character pointer? Which one of the following?
    // ANSI characters
    foo_ansi(char*);
    foo_ansi(const char*);
    /*const*/ char* pString;

    // Unicode/wide-string
    foo_uni(WCHAR*); // or wchar_t*
    foo_uni(const WCHAR*);
    /*const*/ WCHAR* pString;

    // Independent
    foo_char(TCHAR*);
    foo_char(const TCHAR*);
    /*const*/ TCHAR* pString;
After reading about the TCHAR stuff, you'd definitely select the last one as your choice. But there is a better alternative. Before that, note that the TCHAR.H header file declares only the TCHAR datatype, and for the following stuff you need to include Windows.h (the types are defined in WinNT.h).
NOTE: If your project implicitly or explicitly includes Windows.h, you need not include TCHAR.H.
- char* replacement: LPSTR
- const char* replacement: LPCSTR
- WCHAR* replacement: LPWSTR
- const WCHAR* replacement: LPCWSTR (C before W, since const is before WCHAR)
- TCHAR* replacement: LPTSTR
- const TCHAR* replacement: LPCTSTR
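The list above boils down to a handful of typedefs. Here is a sketch of how such a family can be declared. All the Demo* names are hypothetical mirrors of the Windows ones, so that it builds without Windows.h; the real declarations live in the Windows headers and may differ in detail.

```cpp
#include <type_traits>

// Hypothetical stand-ins so the sketch builds without <windows.h>.
#define DEMO_UNICODE 1   // remove this line for the "ANSI" flavour

typedef char    DemoCHAR;
typedef wchar_t DemoWCHAR;
#ifdef DEMO_UNICODE
typedef DemoWCHAR DemoTCHAR;
#else
typedef DemoCHAR  DemoTCHAR;
#endif

typedef DemoCHAR*        DemoLPSTR;    // LP = (long) pointer, STR = string
typedef const DemoCHAR*  DemoLPCSTR;   // C  = const
typedef DemoWCHAR*       DemoLPWSTR;   // W  = wide
typedef const DemoWCHAR* DemoLPCWSTR;
typedef DemoTCHAR*       DemoLPTSTR;   // T  = follows the build setting
typedef const DemoTCHAR* DemoLPCTSTR;

// True when the generic pointer type resolves to the wide flavour.
constexpr bool generic_pointer_is_wide()
{
    return std::is_same<DemoLPCTSTR, const wchar_t*>::value;
}
```

Reading the names letter by letter (LP, C, W or T, STR) is usually enough to recover the underlying pointer type.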
Now, I hope you understand the following signatures:
    BOOL SetCurrentDirectory( LPCTSTR lpPathName );
    DWORD GetCurrentDirectory(DWORD nBufferLength, LPTSTR lpBuffer);
Continuing. You must have seen some functions/methods asking you to pass the number of characters, or returning the number of characters. Well, with functions like GetCurrentDirectory, you need to pass the number of characters, and not the number of bytes. For example:
    TCHAR sCurrentDir[255];
    // Pass 255 and not 255*2
    GetCurrentDirectory(255, sCurrentDir);
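To avoid repeating the number 255 by hand, the character count can be derived from the array itself. The sketch below assumes a hypothetical function demo_fill, shaped like GetCurrentDirectory (capacity in characters first, buffer second); char_count is also my own helper. Visual C++ offers the _countof macro for the same purpose.

```cpp
#include <cassert>
#include <cstddef>

// Number of elements (characters) in a fixed-size array. This is the value
// that character-count parameters expect; sizeof(array) gives bytes instead.
template <typename T, std::size_t N>
constexpr std::size_t char_count(const T (&)[N]) { return N; }

// Hypothetical API in the style of GetCurrentDirectory: the first argument
// is the buffer capacity in characters. It fills the buffer with 'x' and
// returns the number of characters written, excluding the terminator.
inline std::size_t demo_fill(std::size_t capacity_in_chars, wchar_t* buffer)
{
    assert(buffer != nullptr && capacity_in_chars > 0);
    std::size_t i = 0;
    for (; i + 1 < capacity_in_chars; ++i)
        buffer[i] = L'x';
    buffer[i] = L'\0';   // always leave room for the null character
    return i;
}
```

Passing sizeof(buffer) here instead of char_count(buffer) would tell the function that the buffer is larger than it really is whenever a character takes more than one byte.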
On the other hand, if you need to allocate a number of characters, you must allocate the proper number of bytes. In C++, you can simply use new:
    LPTSTR pBuffer; // TCHAR*
    pBuffer = new TCHAR[128]; // Allocates 128 or 256 BYTES, depending on compilation.
But if you use memory allocation functions like malloc, LocalAlloc, GlobalAlloc, etc., you must specify the number of bytes!
    pBuffer = (TCHAR*) malloc (128 * sizeof(TCHAR) );
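The characters-to-bytes conversion can be wrapped in a small helper so that it is never forgotten. This is only a sketch; bytes_for and alloc_chars are hypothetical names, and the character type is a template parameter here so that the example builds without TCHAR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>   // std::malloc, std::free

// Bytes required for `chars` characters of type CharT.
template <typename CharT>
constexpr std::size_t bytes_for(std::size_t chars)
{
    return chars * sizeof(CharT);
}

// Allocate room for `chars` characters with malloc.
// Returns a null pointer on failure; the caller must release it with free().
template <typename CharT>
CharT* alloc_chars(std::size_t chars)
{
    assert(chars > 0);
    return static_cast<CharT*>(std::malloc(bytes_for<CharT>(chars)));
}
```

For example, alloc_chars<wchar_t>(128) requests 128 * sizeof(wchar_t) bytes, while alloc_chars<char>(128) requests 128 bytes.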
Typecasting the return value is required in C++, as you know. The expression in malloc's argument ensures that it allocates the desired number of bytes, and so makes room for the desired number of characters.
License
This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).
About the author
Ajay Vijayvargiya
Software Developer (Senior), India
Started programming with GW-BASIC back in 1996 (those lovely days!). Found the hidden talent! Touched COBOL and Quick Basic for a while. Finally learned C and C++ entirely on my own, and fell in love with C++, still in love! Began with Turbo C 2.0/3.0, then moved to VC6 for 4 years! Finally on VC2008/2010. I enjoy programming, mostly system programming, but the UI is always on top of MFC! Quite experienced on other environments and platforms, but I prefer Visual C++. Zeal to learn, and to share!