Introduction
C-style strings are error-prone and difficult to manage, and attackers can even exploit buffer overflow bugs in code that handles them. As a result, many string wrapper classes have emerged. Unfortunately, it is not always clear which string class to use, or how to convert a C-style string into a wrapper class.
This article covers all the string types that appear in the Win32 API, MFC, STL, WTL, and the Visual C++ runtime library. I will describe the usage of each class, show how to construct an object of each class, and show how to convert one class to another. The sections on managed strings and the Visual C++ 7 classes were written by Nish.
To get the most out of this article, you should understand the different character types and encodings, which are covered in Part 1.
Rule #1 of string classes
Using a cast to convert between string types is a bad idea, unless some documentation explicitly says that the cast is allowed.
I was prompted to write these two articles by some common questions about converting between string types. People use a cast to convert a string from type X to type Z and then cannot understand why the code does not work. The many string types, especially BSTR, are rarely documented clearly as to where a cast may be used, so I suspect many people simply throw in a cast and hope the conversion works.
A cast does not convert a string unless the source string is a wrapper class that explicitly provides a conversion operator. Casting a string literal does nothing at all, so this code:
void SomeFunc ( LPCWSTR widestr );

int main()
{
  SomeFunc ( (LPCWSTR) "C:\\foo.txt" );  // WRONG!
}
will certainly fail. It compiles, because the cast overrides the compiler's type checking, but that does not make the code correct.
In the examples that follow, I will point out the places where a cast is valid.
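The failure is easy to reproduce outside Windows. The portable sketch below assumes char16_t as a stand-in for the 16-bit WCHAR (so it also builds where wchar_t is 32 bits), and reinterpret_bytes() and widen_ascii() are hypothetical helpers, not Windows APIs. The first does what the (LPCWSTR) cast effectively does, reusing the same bytes; the second actually creates new 16-bit characters, and only for 7-bit ASCII input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// What "(LPCWSTR) narrow_string" really amounts to: the same bytes, with
// every pair of them reinterpreted as one 16-bit character.
std::u16string reinterpret_bytes(const char* narrow) {
    std::size_t nbytes = std::strlen(narrow);
    std::u16string out(nbytes / sizeof(char16_t), u'\0');
    std::memcpy(&out[0], narrow, out.size() * sizeof(char16_t));
    return out;
}

// A real conversion creates new characters. This hypothetical helper only
// handles 7-bit ASCII, where each char maps to the same UTF-16 code unit.
std::u16string widen_ascii(const char* narrow) {
    std::u16string out;
    for (; *narrow != '\0'; ++narrow) {
        assert(static_cast<unsigned char>(*narrow) < 0x80);  // ASCII only
        out.push_back(static_cast<char16_t>(*narrow));
    }
    return out;
}
```

For "C:\\foo.txt", reinterpret_bytes() yields 5 meaningless 16-bit units built from 10 bytes (their values depend on byte order), while widen_ascii() yields the expected 10 characters.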
C-style strings and typedefs
As I covered in Part 1, the Windows APIs are defined in terms of TCHARs, which are compiled as MBCS or Unicode characters depending on whether you define _MBCS or _UNICODE. See Part 1 for a full description of TCHAR. For convenience, here are the character typedefs:
| Type | Meaning |
| --- | --- |
| WCHAR | Unicode character (wchar_t) |
| TCHAR | MBCS or Unicode character, depending on preprocessor settings |
| LPSTR | string of char (char*) |
| LPCSTR | constant string of char (const char*) |
| LPWSTR | string of WCHAR (WCHAR*) |
| LPCWSTR | constant string of WCHAR (const WCHAR*) |
| LPTSTR | string of TCHAR (TCHAR*) |
| LPCTSTR | constant string of TCHAR (const TCHAR*) |
One additional character type is OLECHAR. It is the character type used in Automation interfaces (such as the interfaces Word exposes so that you can manipulate documents). It is normally defined as wchar_t; however, if you define the OLE2ANSI preprocessor symbol, OLECHAR is defined as char. I know of no reason to define OLE2ANSI (Microsoft has not used it since MFC 3), so from here on I will treat OLECHAR as a Unicode character.
Here are the OLECHAR-related typedefs you will see:
| Type | Meaning |
| --- | --- |
| OLECHAR | Unicode character (wchar_t) |
| LPOLESTR | string of OLECHAR (OLECHAR*) |
| LPCOLESTR | constant string of OLECHAR (const OLECHAR*) |
There are two macros used to wrap string and character literals so that the same code works in both MBCS and Unicode builds:
| Macro | Meaning |
| --- | --- |
| _T(x) | Prepends L to the literal in Unicode builds. |
| OLESTR(x) | Prepends L to the literal to make it an LPCOLESTR. |
You will also see variants of _T in documentation and sample code. There are four equivalent macros, TEXT, _TEXT, __TEXT, and __T, and they all do the same thing.
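To see how one body of source code can serve both kinds of build, here is a minimal, hypothetical re-creation of the mechanism. The real TCHAR, LPCTSTR, and _T come from the Windows headers and switch on _UNICODE/UNICODE; the SKETCH_ names and the SKETCH_UNICODE symbol below are made up so that the sketch compiles anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins for TCHAR, LPTSTR, LPCTSTR and _T().
#ifdef SKETCH_UNICODE
typedef wchar_t SKETCH_TCHAR;
#define SKETCH_T(x) L##x      // paste L onto the literal: L"..."
#else
typedef char SKETCH_TCHAR;
#define SKETCH_T(x) x         // leave the literal unchanged
#endif

typedef SKETCH_TCHAR*       SKETCH_LPTSTR;   // like LPTSTR
typedef const SKETCH_TCHAR* SKETCH_LPCTSTR;  // like LPCTSTR

// One function body that compiles against either character type.
std::size_t count_chars(SKETCH_LPCTSTR s) {
    return std::char_traits<SKETCH_TCHAR>::length(s);
}
```

Built without SKETCH_UNICODE, SKETCH_T("Bob") is a plain char literal; define the symbol and the same line becomes L"Bob", while count_chars() compiles unchanged against wchar_t.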
COM strings: BSTR and VARIANT
Many Automation and COM interfaces use BSTR for strings. BSTRs have a few pitfalls, so I have given them a section of their own.
A BSTR is a hybrid of a Pascal-style string (the length is stored explicitly) and a C-style string (the length is computed by searching for a terminator). A BSTR is a Unicode string with its length prepended and a zero character as the terminator. Here is an example of a BSTR:
| Bytes | Meaning |
| --- | --- |
| 06 00 00 00 | length (6 bytes) |
| 42 00 | B |
| 6F 00 | o |
| 62 00 | b |
| 00 00 | EOS (terminator) |
Notice how the length is stored in front of the string data. The length is a DWORD holding the number of bytes in the string, not counting the terminator. In this example, "Bob" contains three Unicode characters (excluding the terminator), for a total of 6 bytes. The length is stored up front so that COM knows how much data to transmit when a BSTR is marshaled between processes or computers. (As a side note, a BSTR can hold any arbitrary block of data, not just characters, and can even contain embedded zero characters. For the purposes of this article, I will not consider those cases.)
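The layout in the table can be imitated in portable C++ to make the prefix and terminator explicit. This is only a sketch under stated assumptions: char16_t stands in for OLECHAR, malloc()/free() stand in for the real COM allocator, and sketch_alloc_string(), sketch_byte_len(), and sketch_free_string() are hypothetical names that only loosely mirror SysAllocString(), SysStringByteLen(), and SysFreeString(). Real code must use the Sys* functions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical BSTR-like pointer type; NOT the real OLE allocator.
typedef char16_t* SketchBSTR;

// Layout: [4-byte byte count][UTF-16 data][2-byte zero terminator].
// The returned pointer addresses the first character, as a BSTR does.
SketchBSTR sketch_alloc_string(const char16_t* src) {
    std::size_t nchars = std::char_traits<char16_t>::length(src);
    std::uint32_t nbytes = static_cast<std::uint32_t>(nchars * sizeof(char16_t));
    unsigned char* block = static_cast<unsigned char*>(
        std::malloc(sizeof nbytes + nbytes + sizeof(char16_t)));
    if (block == nullptr) return nullptr;                      // out of memory
    std::memcpy(block, &nbytes, sizeof nbytes);                // length prefix
    std::memcpy(block + sizeof nbytes, src, nbytes);           // character data
    std::memset(block + sizeof nbytes + nbytes, 0, sizeof(char16_t));  // EOS
    return reinterpret_cast<SketchBSTR>(block + sizeof nbytes);
}

// Read the byte count stored just before the first character.
std::uint32_t sketch_byte_len(SketchBSTR s) {
    std::uint32_t nbytes = 0;
    if (s != nullptr)
        std::memcpy(&nbytes, reinterpret_cast<unsigned char*>(s) - sizeof nbytes,
                    sizeof nbytes);
    return nbytes;
}

// Free the whole block, starting from the length prefix.
void sketch_free_string(SketchBSTR s) {
    if (s != nullptr)
        std::free(reinterpret_cast<unsigned char*>(s) - sizeof(std::uint32_t));
}
```

On a little-endian machine, the first four bytes of the block for u"Bob" are 06 00 00 00, matching the table.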
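In C++, a BSTR is actually a pointer to the first character of the string; the type is declared as typedef OLECHAR* BSTR;, so to the compiler it looks just like a Unicode string pointer. Because of the hidden length prefix, BSTRs must be created and freed with the BSTR API functions, such as SysAllocString() and SysFreeString():
BSTR bstr = NULL;

  bstr = SysAllocString ( L"Hi Bob!" );

  if ( NULL == bstr )
    {
    // out of memory error: handle it, and don't use bstr
    }

  // Use bstr here...

  SysFreeString ( bstr );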
Naturally, the various BSTR wrapper classes take care of this memory management for you.
Another type used in Automation interfaces is VARIANT. It is used to pass data to and from typeless languages such as JScript and VBScript. A VARIANT can hold many different types of data, such as long and IDispatch*. When a VARIANT holds a string, the string is stored as a BSTR. I will say more about VARIANT when I cover the VARIANT wrapper classes.
String wrapper classes
Now that I have covered the various string types, I will describe the wrapper classes. For each class, I will show how to construct an object and how to convert it to a C-style string pointer. A C-style string pointer is often needed for API calls or to construct an object of a different string class. I will not cover other operations the classes provide, such as sorting or comparison.
Once again: do not blindly use casts to convert types unless you know exactly what the resulting code will do.
CRT classes
_bstr_t
_bstr_t is a complete wrapper around BSTR; in fact, it hides the underlying BSTR. It provides various constructors, as well as operators to access the underlying C-style string. However, there is no operator that gives access to the BSTR itself, so a _bstr_t cannot be passed as an output parameter to a COM method. If you need a BSTR* parameter, it is easier to use the ATL class CComBSTR.
A _bstr_t can be passed to a function that takes a BSTR, but only because three conditions happen to hold at the same time. First, _bstr_t has a conversion function to wchar_t*. Second, because of how BSTR is defined, wchar_t* and BSTR are the same type to the compiler. Third, the wchar_t* that _bstr_t keeps internally points to a block of memory that follows the BSTR format. So even though this conversion is undocumented, it works.
// Constructing
_bstr_t bs1 = "char string";        // construct from a LPCSTR
_bstr_t bs2 = L"wide char string";  // construct from a LPCWSTR
_bstr_t bs3 = bs1;                  // copy from another _bstr_t
_variant_t v = "Bob";
_bstr_t bs4 = v;                    // construct from a _variant_t that has a string

// Extracting data
LPCSTR psz1 = bs1;              // automatically converts to MBCS string
LPCSTR psz2 = (LPCSTR) bs1;     // cast OK, same as previous line
LPCWSTR pwsz1 = bs1;            // returns the internal Unicode string
LPCWSTR pwsz2 = (LPCWSTR) bs1;  // cast OK, same as previous line
BSTR bstr = bs1.copy();         // copies bs1, returns it as a BSTR

  // ...
  SysFreeString ( bstr );
Note that _bstr_t also has conversion operators to non-const char* and wchar_t*. This is a questionable design: even though these are non-const pointers, you must not use them to modify the buffer they point to, because that could corrupt the internal BSTR structure.
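The first two conditions, and the danger of the non-const conversion, can be modeled with a toy class. SketchBSTR, takes_bstr(), and SketchBstrWrapper are hypothetical; the sketch only shows that a conversion to a pointer type is enough to satisfy a parameter declared with a pointer typedef, and that writing through that pointer can desynchronize the wrapper's own bookkeeping. It does not reproduce _bstr_t's reference counting or BSTR allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical: like "typedef OLECHAR* BSTR", this is only a pointer alias.
typedef char16_t* SketchBSTR;

// A function that, like a COM method, takes a BSTR by value.
std::size_t takes_bstr(SketchBSTR s) {
    return std::char_traits<char16_t>::length(s);  // scans for the terminator
}

// A toy owner with a _bstr_t-style conversion to a *non-const* pointer.
class SketchBstrWrapper {
public:
    explicit SketchBstrWrapper(const char16_t* s) : text_(s) {}
    // Implicit conversion: accepted wherever SketchBSTR is expected,
    // because SketchBSTR is just another name for char16_t*.
    operator char16_t*() { return &text_[0]; }
    std::size_t length() const { return text_.size(); }  // stored length
private:
    std::u16string text_;
};
```

After writing a zero through the pointer, the terminator-based length and the stored length disagree, which is the same kind of inconsistency that corrupts a real BSTR whose length prefix no longer matches its contents.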
_variant_t
_variant_t is a complete wrapper around VARIANT. It provides many constructors and conversion functions to operate on the many types of data a VARIANT can hold. I will cover only the string-related operations here.
// Constructing
_variant_t v1 = "char string";        // construct from a LPCSTR
_variant_t v2 = L"wide char string";  // construct from a LPCWSTR
_bstr_t bs1 = "Bob";
_variant_t v3 = bs1;                  // copy from a _bstr_t object

// Extracting data
_bstr_t bs2 = v1;            // extract BSTR from the VARIANT
_bstr_t bs3 = (_bstr_t) v1;  // cast OK, same as previous line
Note:
The _variant_t methods can throw an exception if a conversion cannot be performed, so be prepared to catch _com_error exceptions.
Also note:
There is no direct conversion from a _variant_t to an MBCS string. You need to create a temporary _bstr_t, use another string class that provides Unicode-to-MBCS conversion, or use an ATL conversion macro.
Unlike _bstr_t, a _variant_t can be passed directly as a parameter to a COM method. _variant_t derives from VARIANT, so passing a _variant_t in place of a VARIANT is allowed by the C++ language.
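The exception-based error style can be illustrated with std::variant (C++17) standing in for VARIANT. This is an analogy under stated assumptions: SketchVariant, SketchUnconvertible, and extract_string() are hypothetical, the real _variant_t performs its conversions through the OLE Automation library, and it throws _com_error rather than the std::invalid_argument used here.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <variant>

struct SketchUnconvertible {};  // made-up type with no string form
using SketchVariant = std::variant<long, std::u16string, SketchUnconvertible>;

// Extract (or convert to) a string; throw when no conversion exists.
std::u16string extract_string(const SketchVariant& v) {
    if (const auto* s = std::get_if<std::u16string>(&v))
        return *s;                                            // already a string
    if (const auto* n = std::get_if<long>(&v)) {
        std::string digits = std::to_string(*n);              // e.g. "-42"
        return std::u16string(digits.begin(), digits.end());  // ASCII widens 1:1
    }
    throw std::invalid_argument("value has no string representation");
}
```

Callers therefore wrap extractions in try/catch, just as _variant_t code must be ready to catch _com_error.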
STL classes
STL has only one string class, basic_string. A basic_string manages a zero-terminated array of characters, and the character type is a template parameter of basic_string. In general, a basic_string should be treated as an opaque object. You can get a read-only pointer to its internal buffer, but any write must go through basic_string's operators and methods.
basic_string has two predefined specializations: string, which holds chars, and wstring, which holds wchar_t. There is no built-in TCHAR specialization, but you can define one with the code listed below.
#include <windows.h>
#include <tchar.h>
#include <string>

// Specializations
typedef std::basic_string<TCHAR> tstring;  // string of TCHARs

// Constructing
std::string str = "char string";          // construct from a LPCSTR
std::wstring wstr = L"wide char string";  // construct from a LPCWSTR
tstring tstr = _T("TCHAR string");        // construct from a LPCTSTR

// Extracting data
LPCSTR psz = str.c_str();     // read-only pointer to str's buffer
LPCWSTR pwsz = wstr.c_str();  // read-only pointer to wstr's buffer
LPCTSTR ptsz = tstr.c_str();  // read-only pointer to tstr's buffer
Unlike _bstr_t, a basic_string cannot convert between character sets directly. However, you can pass the pointer returned by c_str() to another class's constructor, provided that constructor accepts that character type. For example:
// Example, construct _bstr_t from basic_string
_bstr_t bs1 = str.c_str();   // construct a _bstr_t from a LPCSTR
_bstr_t bs2 = wstr.c_str();  // construct a _bstr_t from a LPCWSTR
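Because basic_string never converts character sets itself, any conversion between string and wstring has to be done explicitly. For text known to be 7-bit ASCII, a common shortcut is copying character by character, since each char value equals the corresponding wchar_t value. The sketch below (helper names are hypothetical) assumes ASCII-only input and asserts it; it is not a substitute for a real MBCS-to-Unicode conversion.

```cpp
#include <cassert>
#include <string>

// ASCII-only widening: the iterator-range constructor converts each char.
std::wstring ascii_to_wstring(const std::string& s) {
    for (char c : s)
        assert(static_cast<unsigned char>(c) < 0x80);  // ASCII only
    return std::wstring(s.begin(), s.end());
}

// ASCII-only narrowing, with the same restriction.
std::string ascii_to_string(const std::wstring& w) {
    std::string out;
    out.reserve(w.size());
    for (wchar_t c : w) {
        assert(static_cast<unsigned long>(c) < 0x80ul);  // ASCII only
        out.push_back(static_cast<char>(c));
    }
    return out;
}
```

Bytes of 0x80 and above (MBCS lead bytes, accented characters) would come out as meaningless values, which is why the asserts are there.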
ATL classes
CComBSTR
CComBSTR is ATL's BSTR wrapper, and in some situations it is more useful than _bstr_t. Most notably, CComBSTR gives access to the underlying BSTR, which means you can pass a CComBSTR object to COM methods, and the object manages the BSTR memory for you automatically. For example, say you want to call methods of this interface:
// Sample interface:
struct IStuff : public IUnknown
{
  // Boilerplate COM stuff omitted...
  STDMETHOD(SetText)(BSTR bsText);
  STDMETHOD(GetText)(BSTR* pbsText);
};
CComBSTR has an operator BSTR() method, so it can be passed directly to SetText(). It also overloads operator &, which returns a BSTR*, so you can apply & to a CComBSTR object and pass the result to a function that takes a BSTR* parameter.
CComBSTR bs1;
CComBSTR bs2 = "new text";

  pStuff->GetText ( &bs1 );        // ok, takes address of internal BSTR
  pStuff->SetText ( bs2 );         // ok, calls BSTR converter
  pStuff->SetText ( (BSTR) bs2 );  // cast ok, same as previous line
CComBSTR has constructors similar to those of _bstr_t, but it has no built-in conversion to MBCS strings. For that, you need an ATL conversion macro.
// Constructing
CComBSTR bs1 = "char string";        // construct from a LPCSTR
CComBSTR bs2 = L"wide char string";  // construct from a LPCWSTR
CComBSTR bs3 = bs1;                  // copy from another CComBSTR
CComBSTR bs4;

  bs4.LoadString ( IDS_SOME_STR );   // load string from string table

// Extracting data
BSTR bstr1 = bs1;         // returns internal BSTR, but don't modify it!
BSTR bstr2 = (BSTR) bs1;  // cast ok, same as previous line
BSTR bstr3 = bs1.Copy();  // copies bs1, returns it as a BSTR
BSTR bstr4;

  bstr4 = bs1.Detach();   // bs1 no longer manages its BSTR

  // ...
  SysFreeString ( bstr3 );
  SysFreeString ( bstr4 );
Note the use of the Detach() method in the previous example. After Detach() is called, the CComBSTR object no longer manages its BSTR or the associated memory. That is why SysFreeString() must be called on bstr4.
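The Attach()/Detach() contract is the same ownership handoff that std::unique_ptr::release() performs. The hypothetical SketchOwner class below mimics it on a malloc'd char buffer, with free() standing in for SysFreeString(); it illustrates the contract and is not CComBSTR's implementation.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical owner imitating CComBSTR's Attach()/Detach() contract.
class SketchOwner {
public:
    SketchOwner() = default;
    ~SketchOwner() { std::free(p_); }           // frees only what it still owns
    SketchOwner(const SketchOwner&) = delete;
    SketchOwner& operator=(const SketchOwner&) = delete;

    // Take ownership of an existing buffer, releasing any current one.
    void Attach(char* p) {
        if (p != p_) { std::free(p_); p_ = p; }
    }
    // Give up ownership: the caller must now free the returned buffer.
    char* Detach() { char* p = p_; p_ = nullptr; return p; }
    const char* get() const { return p_; }
private:
    char* p_ = nullptr;
};

// Hypothetical helper: a malloc'd copy, so the sketch has something to own.
char* sketch_dup(const char* s) {
    std::size_t n = std::strlen(s) + 1;
    char* p = static_cast<char*>(std::malloc(n));
    if (p != nullptr) std::memcpy(p, s, n);
    return p;
}
```

Once Detach() returns, the caller is responsible for the buffer, exactly as with bstr4 in the CComBSTR example.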
As a side note, the overloaded operator & means that you cannot use CComBSTR directly in some STL containers, such as list. The containers require operator & to return a pointer to the contained class, but applying & to a CComBSTR returns a BSTR* rather than a CComBSTR*. However, ATL has a class that solves this problem: CAdapt. For example, you can declare a list of CComBSTR like this:
std::list< CAdapt<CComBSTR> > bstr_list;
CAdapt provides the operators the container requires, but it is transparent to your code: you can use bstr_list just as if it were a list of CComBSTR.
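The effect of an overloaded operator & can be shown without ATL. SketchBstrLike, sketch_get_text(), and SketchAdapt below are hypothetical: the first overloads unary & the way CComBSTR does, so &obj yields a pointer to its member, which is handy for out-parameters but is not what generic code expects. SketchAdapt does not overload &, which is the essence of what CAdapt provides. Generic code can also use std::addressof() (C++11, <memory>) to get an object's real address regardless of such overloads.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Hypothetical class that, like CComBSTR, overloads unary operator&.
struct SketchBstrLike {
    const char* str = nullptr;
    // Returns the member's address, not a SketchBstrLike*.
    const char** operator&() { return &str; }
};

// An "out" parameter function, like IStuff::GetText(BSTR*).
void sketch_get_text(const char** out) { *out = "text from callee"; }

// Hypothetical CAdapt-style holder: no operator& overload, so &holder is
// an ordinary pointer to the holder, as containers expect.
template <typename T>
struct SketchAdapt {
    T value;
};
```

Here sketch_get_text(&b) fills b.str directly, while std::addressof(b) still returns a genuine SketchBstrLike*.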
CComVariant
CComVariant is a wrapper for VARIANT. However, unlike _variant_t, CComVariant does not hide the VARIANT; in fact, you need to access the VARIANT members directly. CComVariant provides many constructors to handle the many types a VARIANT can hold. I will cover only the string-related operations here.
// Constructing
CComVariant v1 = "char string";        // construct from a LPCSTR
CComVariant v2 = L"wide char string";  // construct from a LPCWSTR
CComBSTR bs1 = "BSTR bob";
CComVariant v3 = (BSTR) bs1;           // copy from a BSTR

// Extracting data
CComBSTR bs2 = v1.bstrVal;             // extract BSTR from the VARIANT
Unlike _variant_t, CComVariant has no conversion operators to the various types a VARIANT can hold. As shown above, you must access the VARIANT members directly and make sure the VARIANT holds the type you expect. If you need to convert a CComVariant's data to a BSTR, call the ChangeType() method.
CComVariant v4 = ...  // Init v4 from somewhere
CComBSTR bs3;

  if ( SUCCEEDED( v4.ChangeType ( VT_BSTR ) ))
    bs3 = v4.bstrVal;
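The status-returning style of ChangeType() can be sketched with std::variant standing in for VARIANT. SketchVariant, SketchUnconvertible, and change_to_string() are hypothetical; the function returns bool where CComVariant::ChangeType() returns an HRESULT, and it converts only long values, whereas the real method delegates to VariantChangeType() and handles many more types.

```cpp
#include <cassert>
#include <string>
#include <variant>

struct SketchUnconvertible {};  // made-up type with no string form
using SketchVariant = std::variant<long, std::u16string, SketchUnconvertible>;

// Convert v to a string in place. Returns false, leaving v untouched, when
// no conversion exists: failure is reported, not thrown.
bool change_to_string(SketchVariant& v) {
    if (std::holds_alternative<std::u16string>(v))
        return true;                                     // already a string
    if (const long* n = std::get_if<long>(&v)) {
        std::string digits = std::to_string(*n);         // copy before v changes
        v = std::u16string(digits.begin(), digits.end()); // ASCII widens 1:1
        return true;
    }
    return false;
}
```

As with the CComVariant code, the caller checks the result before reading the string member.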
Like _variant_t, CComVariant has no direct conversion to an MBCS string. You need to create an intermediate _bstr_t, use another string class that provides Unicode-to-MBCS conversion, or use an ATL conversion macro.
ATL conversion macros
ATL's conversion macros are a very convenient way to convert between character encodings, and they are especially handy in function calls. They are named according to the pattern [source type]2[new type] or [source type]2C[new type]. Macros of the second form convert to a constant pointer (hence the "C" in the name). The type abbreviations are:
A: MBCS string, char* (A for ANSI)
W: Unicode string, wchar_t* (W for wide)
T: TCHAR string, TCHAR*
OLE: OLECHAR string, OLECHAR* (in practice, equivalent to W)
BSTR: BSTR (used as the destination type only)
So, for example, W2A() converts a Unicode string to an MBCS string, and T2CW() converts a TCHAR string to a constant Unicode string.
To use the macros, first include the atlconv.h header file. You can include it even in non-ATL projects, because it does not depend on the rest of ATL and does not need a _Module global variable. When you use a conversion macro in a function, put the USES_CONVERSION macro at the beginning of that function; it defines some local variables the macros need.
When the destination type is anything other than BSTR, the converted string is stored on the stack. So if you want the string to live longer than the current function, you need to copy it into another string class. When the destination type is BSTR, the memory is not freed automatically; you must assign the return value to a BSTR variable or a BSTR wrapper class to avoid memory leaks.
Here are some examples showing the various conversion macros:
#include <string>
#include <atlbase.h>  // CComBSTR
#include <atlconv.h>  // conversion macros

// Functions taking various strings:
void Foo ( LPCWSTR wstr );
void Bar ( BSTR bstr );
// Functions returning strings:
void Baz ( BSTR* pbstr );

int main()
{
using std::string;
USES_CONVERSION;  // declare locals used by the ATL macros

// Example 1: Send an MBCS string to Foo()
LPCSTR psz1 = "Bob";
string str1 = "Bob";

  Foo ( A2CW(psz1) );
  Foo ( A2CW(str1.c_str()) );

// Example 2: Send an MBCS and a Unicode string to Bar()
LPCSTR psz2 = "Bob";
LPCWSTR wsz = L"Bob";
BSTR bs1;
CComBSTR bs2;

  bs1 = A2BSTR(psz2);           // create a BSTR
  bs2.Attach ( W2BSTR(wsz) );   // ditto, assign to a CComBSTR

  Bar ( bs1 );
  Bar ( bs2 );

  SysFreeString ( bs1 );        // free bs1 memory
  // No need to free bs2 since CComBSTR will do it for us.

// Example 3: Convert the BSTR returned by Baz()
BSTR bs3 = NULL;
string str2;

  Baz ( &bs3 );                 // Baz() fills in bs3

  str2 = W2CA(bs3);             // convert to an MBCS string
  SysFreeString ( bs3 );        // free bs3 memory
}
As you can see, the conversion macros are very handy when you have a string in one format and the function you are calling needs a different format.
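If a converted string has to outlive the current function, an owning conversion avoids the stack-lifetime issue described above. The portable sketch below (sketch_a2w() and sketch_w2a() are hypothetical names) uses the standard C functions mbstowcs() and wcstombs(), which follow the C runtime's current locale (the "C" locale unless setlocale() is called) rather than an ATL code page. Treat it as an illustration, not a drop-in replacement for the ATL macros.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <cwchar>
#include <string>
#include <vector>

// Owning analogue of A2W(): the result lives as long as the std::wstring.
std::wstring sketch_a2w(const char* s) {
    // A multibyte string never yields more wide characters than it has bytes.
    std::vector<wchar_t> buf(std::strlen(s) + 1);
    std::size_t n = std::mbstowcs(buf.data(), s, buf.size());
    if (n == static_cast<std::size_t>(-1))
        return std::wstring();  // invalid multibyte sequence
    return std::wstring(buf.data(), n);
}

// Owning analogue of W2A().
std::string sketch_w2a(const wchar_t* w) {
    // Each wide character needs at most MB_CUR_MAX bytes.
    std::vector<char> buf(std::wcslen(w) * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(buf.data(), w, buf.size());
    if (n == static_cast<std::size_t>(-1))
        return std::string();   // character not representable in this locale
    return std::string(buf.data(), n);
}
```

Note that both helpers return an empty string on failure, which a real implementation would want to distinguish from an empty input.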
MFC classes
CString
An MFC CString object holds TCHARs, so its exact character type depends on the preprocessor symbols you define. In general, a CString is like an STL string, in that you should treat it as an opaque object and modify it only through CString methods. One advantage CString has over string is that it has constructors that accept both MBCS and Unicode strings, and it has a conversion operator to LPCTSTR, so you can pass a CString object directly to a function that takes an LPCTSTR without calling a method like c_str().
// Constructing
CString s1 = "char string";        // construct from a LPCSTR
CString s2 = L"wide char string";  // construct from a LPCWSTR
CString s3 ( ' ', 100 );           // 100 characters, all spaces
CString s4 = "New window text";

  // You can pass a CString in place of an LPCTSTR:
  SetWindowText ( hwndSomeWindow, s4 );

  // Or, equivalently, explicitly cast the CString:
  SetWindowText ( hwndSomeWindow, (LPCTSTR) s4 );
You can also load a string from your string table. A CString constructor can do this, as can the LoadString() method. The Format() method can optionally read its format string from the string table as well.
// Constructing/loading from string table
CString s5 ( (LPCTSTR) IDS_SOME_STR );  // load from string table
CString s6, s7;

  // Load from string table.
  s6.LoadString ( IDS_SOME_STR );

  // Load printf-style format string from the string table:
  s7.Format ( IDS_SOME_FORMAT, "bob", nSomeStuff, ... );
The first constructor looks odd, but it is in fact the documented way to load a string. Note that the only legal cast you can apply to a CString is a cast to LPCTSTR. Casting to LPTSTR (a non-const pointer) is wrong. Getting into the habit of casting a CString to LPTSTR will only hurt you: when your program later crashes, you may not know why, because you used the same code elsewhere and it happened to work. The correct way to get a non-const pointer to the buffer is to call the GetBuffer() method. Here is an example of correct usage, setting the text of an item in a list control:
CString str = _T("new text");
LVITEM item = {0};

  item.mask = LVIF_TEXT;
  item.iItem = 1;
  item.pszText = (LPTSTR)(LPCTSTR) str;  // WRONG!
  item.pszText = str.GetBuffer(0);       // correct

  ListView_SetItem ( hwndList, &item );  // hwndList: the list control's HWND
  str.ReleaseBuffer();  // return control of the buffer to str
The pszText member is an LPTSTR, a non-const pointer, so you need to call GetBuffer() on str. The parameter to GetBuffer() is the minimum length you want CString to allocate for the buffer. If, for some reason, you wanted a modifiable buffer large enough to hold 1K TCHARs, you would call GetBuffer(1024). Passing 0 simply returns a pointer to the current contents of the string.
The line marked WRONG above compiles, and in this case it even works. But that does not mean the code is correct. By using the non-const cast, you break object-oriented encapsulation and make assumptions about the internal implementation of CString. If you make a habit of casting like that, you will eventually run into code that crashes, and you will wonder why, since you use the same code everywhere else and it looks correct.
You know how people are always complaining about how buggy software is? Bugs are caused by programmers writing incorrect code. Do you really want to write code you know is wrong, and contribute to the perception that all software is full of bugs? Take the time to learn the correct way to use CString, and your code will work correctly every time.
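The GetBuffer()/ReleaseBuffer() idea carries over to std::string: make writable room, let the C-style API fill it, then trim the string at the terminator. In this sketch, sketch_fill_text() is a hypothetical stand-in for an API that writes into a caller-supplied buffer, and resize() plays the roles of GetBuffer() and ReleaseBuffer(); it is an analogy, not how CString is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical C-style API that writes a zero-terminated string into buf.
void sketch_fill_text(char* buf, std::size_t cap) {
    std::snprintf(buf, cap, "%s", "new text");  // always zero-terminates
}

std::string sketch_get_text() {
    std::string s;
    s.resize(64);                            // like GetBuffer(64): writable room
    sketch_fill_text(&s[0], s.size());       // the API writes into the buffer
    s.resize(std::strlen(s.c_str()));        // like ReleaseBuffer(): trim at EOS
    return s;
}
```

Skipping the final resize() would leave the string 64 characters long, with the text followed by embedded zeros, the standard-library counterpart of forgetting ReleaseBuffer().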
CString has two functions that create a BSTR from a CString: AllocSysString() and SetSysString().
// Converting to BSTR
CString s5 = "Bob!";
BSTR bs1 = NULL, bs2 = NULL;

  bs1 = s5.AllocSysString();
  s5.SetSysString ( &bs2 );

  SysFreeString ( bs1 );
  SysFreeString ( bs2 );
COleVariant
COleVariant is very similar to CComVariant. COleVariant derives from VARIANT, so it can be passed to functions that take a VARIANT. Unlike CComVariant, however, COleVariant has only an LPCTSTR constructor; there are no separate constructors for LPCSTR and LPCWSTR. In most cases this is not a problem, since your strings are likely to be LPCTSTRs anyway, but it is something to be aware of. COleVariant also has a constructor that takes a CString.
// Constructing
CString s1 = _T("tchar string");
COleVariant v1 = _T("Bob");  // construct from an LPCTSTR
COleVariant v2 = s1;         // copy from a CString
As with CComVariant, you must access the VARIANT members directly, and you use the ChangeType() method to convert a VARIANT to a string. However, COleVariant::ChangeType() throws an exception if it fails, instead of returning an HRESULT failure code.
// Extracting data
COleVariant v3 = ...;  // fill in v3 from somewhere
BSTR bs = NULL;

  try
    {
    v3.ChangeType ( VT_BSTR );
    bs = SysAllocString ( v3.bstrVal );  // our own copy; v3 still owns bstrVal
    }
  catch ( COleException* e )
    {
    // error, couldn't convert
    e->Delete();
    }

  SysFreeString ( bs );
WTL classes
CString
WTL's CString behaves exactly like MFC's CString, so refer to the MFC CString section above.
CLR and VC 7 classes
System::String
System::String is a .NET class for handling strings. Internally, a String object holds an immutable sequence of characters. Any String operation that appears to change the string actually returns a new String object, because the original cannot be modified. One nice property of String is that if you have more than one managed String literal containing the same sequence of characters, they all refer to the same object. The Managed Extensions for C++ add a new string literal prefix, S, which marks a managed string literal.
You can create a String object from an unmanaged string literal, but this is slightly less efficient than using a managed literal. That is because all S-prefixed literals with the same contents represent the same object, which is not true of unmanaged literals. For String objects created without the S prefix, the correct way to compare them is String::CompareTo(), which returns 0 when the two strings are equal.
Converting between String and the MFC 7 CString is easy. CString has a conversion operator to LPCTSTR, and String has two constructors that take char* and wchar_t*, so you can pass a CString directly to a String constructor. The conversion in the other direction works similarly: since VS.NET, CString has a constructor that takes a String object. This may seem confusing, but it does work.
For some quick operations, you may want to access the underlying string directly. PtrToStringChars() returns a const __wchar_t* pointing to the underlying string, and you must pin it; otherwise the garbage collector may move the string while you are working with its contents.
Using string classes with printf-style functions
Be very careful when passing string wrapper classes to printf() or any similar function. This includes sprintf() and its variants, as well as the TRACE and ATLTRACE macros.
Because these functions do no type checking on their additional arguments, you must pass them only C-style string pointers, never a complete string object. For example, to pass a _bstr_t to ATLTRACE(), you must explicitly cast it to (LPCSTR) or (LPCWSTR).
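The same rule applies to standard C++ strings. In this sketch, sketch_format_line() is a hypothetical helper; the important detail is the explicit .c_str(), which hands snprintf() the char pointer that %s expects. Passing the std::string object itself is wrong, because a variadic function cannot check what it received and %s would misinterpret the object.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

std::string sketch_format_line(const std::string& name, int line) {
    char buf[128];
    // Correct: pass the C-style pointer that %s expects.
    std::snprintf(buf, sizeof buf, "name=%s line=%d", name.c_str(), line);
    // Wrong (do not do this): std::snprintf(buf, sizeof buf, "%s", name);
    return std::string(buf);  // output longer than 127 chars is truncated
}
```

For example, sketch_format_line("Bob!", 42) produces "name=Bob! line=42".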