About the CString class
Copyright ©
Stevencao@benq.com
2003-11-6
While reading through a lot of programs, including some of my own code, I found that a large share of the bugs come from incorrect use of the MFC CString class. The main cause of these errors is not understanding how CString is implemented.
CString wraps the C-style string of standard C. After years of programming, we have found that many bugs are related to strings, such as buffer overflows and memory leaks. These bugs can be fatal and bring a system down. C++ therefore provides classes that manage the string pointer for you: the standard C++ library has string, and the Microsoft MFC class library has CString. Using a string class avoids many of the pointer problems of C strings.
Here we briefly look at how CString is implemented in MFC. The best way to understand the principle is to read the code directly; most of the CString implementation is in strcore.cpp.
CString encapsulates a buffer that stores a string together with the operations applied to that string. That is, CString needs a buffer for the string and a pointer to that buffer, which is LPTSTR m_pchData. Some string operations increase or decrease the length of the string, so to avoid frequent memory allocation and release, CString first allocates a block larger than the string needs. When the string later grows, no new allocation is needed as long as the new total length fits in the preallocated block. When the new length exceeds the preallocated block, CString allocates a larger block, copies the string into it, and releases the original one. Similarly, when the string shrinks, the surplus memory is not released immediately; it is released in one go once the surplus accumulates to a certain level.
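The preallocation idea above can be sketched in portable C++. This is only an illustration under stated assumptions, not the MFC code: GrowBuffer is a hypothetical type, plain char stands in for TCHAR, and the doubling growth policy is an assumption, since the article does not say how much extra memory CString requests.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical growable buffer. The doubling policy is an assumption
// made for this sketch; it is not taken from the MFC source.
struct GrowBuffer {
    char* chars = nullptr;
    int length = 0;       // characters in use, terminator excluded
    int capacity = 0;     // characters available, terminator excluded
    int allocations = 0;  // how many times memory was requested

    explicit GrowBuffer(int initialCapacity) { Reallocate(initialCapacity); }
    ~GrowBuffer() { delete[] chars; }
    GrowBuffer(const GrowBuffer&) = delete;
    GrowBuffer& operator=(const GrowBuffer&) = delete;

    void Append(const char* s) {
        int add = static_cast<int>(std::strlen(s));
        if (length + add > capacity) {
            // Only now is a new, larger block requested.
            int wanted = capacity * 2;
            if (wanted < length + add) wanted = length + add;
            Reallocate(wanted);
        }
        std::memcpy(chars + length, s, add + 1);  // copies the terminator too
        length += add;
    }

private:
    void Reallocate(int newCapacity) {
        assert(newCapacity >= length);
        char* fresh = new char[newCapacity + 1];
        if (chars) std::memcpy(fresh, chars, length + 1);
        else fresh[0] = '\0';
        delete[] chars;  // the old block is released after the copy
        chars = fresh;
        capacity = newCapacity;
        ++allocations;
    }
};
```

As long as appended text fits into the capacity, the allocations counter does not change; it increases only when the string outgrows the block.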
In addition, when a CString object a is used to initialize another CString object b, the new object b does not allocate any space of its own, in order to save memory. All it does is point its pointer at the memory of object a. Only when the string in object a or object b needs to be modified is new memory allocated for the object being changed. This is called copy-before-write (CopyBeforeWrite).
As a result, a single pointer cannot fully describe the state of the memory, and more information is required.
First, a variable is needed to describe the total size of the current memory block.
Second, a variable is needed to describe how much of the memory block is in use, that is, the length of the current string.
Third, a variable is needed to count how many CString objects reference the memory block. Each time another object references the block, the count is incremented.
CString defines a struct to hold this information:
struct CStringData
{
    long nRefs;         // reference count
    int nDataLength;    // length of data (including terminator)
    int nAllocLength;   // length of allocation
    // TCHAR data[nAllocLength]
    TCHAR* data()       // TCHAR* to managed data
        { return (TCHAR*)(this + 1); }
};
In actual use, the memory block occupied by this struct is not of fixed size. The struct sits at the head of the CString memory block, and the memory that begins sizeof(CStringData) bytes after the head of the block is the actual space used to store the string. Such a block is allocated as follows:
pData = (CStringData*) new BYTE[sizeof(CStringData) + (nLen + 1) * sizeof(TCHAR)];
pData->nAllocLength = nLen;
nLen is the number of characters to be allocated in one request.
It is easy to see from the code that if you want a block that stores nLen TCHARs (say 256), the size actually allocated is:
sizeof(CStringData) bytes + (nLen + 1) TCHARs
The leading sizeof(CStringData) bytes store the CStringData information. The following nLen + 1 TCHARs actually store the string, the extra one holding the terminating '\0'.
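The header-plus-characters layout described above can be tried out without MFC. The following is a minimal sketch, not the MFC source: Header, AllocBlock, HeaderOf and FreeBlock are hypothetical names, and plain char replaces TCHAR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical stand-in for CStringData; char replaces TCHAR.
struct Header {
    long nRefs;         // reference count
    int  nDataLength;   // current string length
    int  nAllocLength;  // capacity in characters, terminator excluded

    // The characters start immediately after the header.
    char* data() { return reinterpret_cast<char*>(this + 1); }
};

// Allocate the header and (nLen + 1) characters as one block.
inline Header* AllocBlock(int nLen) {
    assert(nLen >= 0);
    std::size_t bytes =
        sizeof(Header) + (static_cast<std::size_t>(nLen) + 1) * sizeof(char);
    void* raw = ::operator new(bytes);
    Header* p = new (raw) Header{1, 0, nLen};
    p->data()[0] = '\0';
    return p;
}

// Step back from the character pointer to the header,
// which is the idea behind GetData() in the article.
inline Header* HeaderOf(char* pch) {
    return reinterpret_cast<Header*>(pch) - 1;
}

inline void FreeBlock(Header* p) { ::operator delete(p); }
```

Because the string pointer and the header are always sizeof(Header) bytes apart, an object only has to store the character pointer; the bookkeeping fields can be recovered from it at any time.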
All CString operations work on this buffer. For example, LPTSTR CString::GetBuffer(int nMinBufLength) is implemented as follows:
First, the CStringData pointer is obtained through CString::GetData(), which steps back sizeof(CStringData) bytes from the string pointer m_pchData to reach the CStringData address.
Then, if the buffer is shared or shorter than nMinBufLength, a new CStringData block is allocated whose string buffer is at least nMinBufLength characters long.
Next, the old string is copied over and the descriptive fields in the new CStringData are reset.
Finally, the string buffer in the new CStringData object is returned directly to the caller.
In C++ code, the process looks like this:
if (GetData()->nRefs > 1 || nMinBufLength > GetData()->nAllocLength)
{
    // we have to grow the buffer
    CStringData* pOldData = GetData();
    int nOldLen = GetData()->nDataLength;   // AllocBuffer will tromp it
    if (nMinBufLength < nOldLen)
        nMinBufLength = nOldLen;
    AllocBuffer(nMinBufLength);
    memcpy(m_pchData, pOldData->data(), (nOldLen + 1) * sizeof(TCHAR));
    GetData()->nDataLength = nOldLen;
    CString::Release(pOldData);
}
ASSERT(GetData()->nRefs <= 1);
// return a pointer to the character storage for this string
ASSERT(m_pchData != NULL);
return m_pchData;
We often copy and modify large numbers of strings, which is why CString uses the CopyBeforeWrite technique. When a CString object a is used to construct another object b, the two objects hold the same value. Simply allocating memory for both makes little difference for strings of a few or a few dozen bytes, but it is a great waste for data of several kilobytes or even megabytes.
Therefore, CString simply points the string address m_pchData of the new object b at the string address m_pchData of object a. The only extra work is to increment CStringData::nRefs, the reference count of the shared memory block.
CString::CString(const CString& stringSrc)
{
    m_pchData = stringSrc.m_pchData;
    InterlockedIncrement(&GetData()->nRefs);
}
In this way, before the string content of object a or object b is modified, the value of CStringData::nRefs is checked first. If it is greater than one (a value of one means the block has a single user), the object is sharing the memory with another object. The object first decrements the reference count, leaving the memory to the other objects, and then allocates a new block and copies the original content into it.
In simplified form, the code is as follows:
void CString::CopyBeforeWrite()
{
    if (GetData()->nRefs > 1)
    {
        CStringData* pData = GetData();
        Release();
        AllocBuffer(pData->nDataLength);
        memcpy(m_pchData, pData->data(),
            (pData->nDataLength + 1) * sizeof(TCHAR));
    }
}
Release decrements the reference count and frees the memory once nothing references it any longer.
void CString::Release()
{
    if (GetData() != _afxDataNil)
    {
        if (InterlockedDecrement(&GetData()->nRefs) <= 0)
            FreeData(GetData());
    }
}
When several objects share the same memory, the memory belongs to all of them rather than to the object that originally allocated it. At the end of its life cycle, each object first decrements the reference count of the memory and then checks the value. If it is less than or equal to zero, the memory is released; otherwise it is left under the control of the other objects that still reference it.
With this data structure, CString operations on large amounts of data avoid a great deal of repeated allocation and release, which improves system performance.
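The sharing scheme described above (share on copy, copy before write, release on destruction) can be reproduced in a small self-contained class. This is a hedged sketch rather than the MFC implementation: CowString and Rep are hypothetical names, char replaces TCHAR, and std::atomic is used here in place of InterlockedIncrement and InterlockedDecrement.

```cpp
#include <atomic>
#include <cassert>
#include <cstring>

// Hypothetical miniature of the sharing scheme; not the MFC code.
class CowString {
    struct Rep {
        std::atomic<long> refs{1};  // plays the role of nRefs
        int len;
        char* chars;
        Rep(const char* s, int n) : len(n), chars(new char[n + 1]) {
            std::memcpy(chars, s, n);
            chars[n] = '\0';
        }
        ~Rep() { delete[] chars; }
    };
    Rep* rep_;

    // Drop one reference; the last owner frees the block.
    void Release() { if (--rep_->refs == 0) delete rep_; }

    // Give this object a private copy if the block is shared.
    void CopyBeforeWrite() {
        if (rep_->refs.load() > 1) {
            Rep* old = rep_;
            rep_ = new Rep(old->chars, old->len);
            if (--old->refs == 0) delete old;
        }
    }

public:
    explicit CowString(const char* s)
        : rep_(new Rep(s, static_cast<int>(std::strlen(s)))) {}
    // Copying only shares the block and bumps the count.
    CowString(const CowString& other) : rep_(other.rep_) { ++rep_->refs; }
    CowString& operator=(const CowString& other) {
        if (rep_ != other.rep_) {
            ++other.rep_->refs;
            Release();
            rep_ = other.rep_;
        }
        return *this;
    }
    ~CowString() { Release(); }

    void SetAt(int i, char c) {
        assert(i >= 0 && i < rep_->len);
        CopyBeforeWrite();  // every modifying operation starts here
        rep_->chars[i] = c;
    }
    const char* c_str() const { return rep_->chars; }
    long RefCount() const { return rep_->refs.load(); }
    bool SharesWith(const CowString& o) const { return rep_ == o.rep_; }
};
```

Copying a CowString costs one increment regardless of the string length; the characters are duplicated only when one of the sharing objects is about to be modified.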
Through the above analysis we now have a general understanding of the internal mechanism of CString. On the whole, CString in MFC is fairly successful. However, because of its relatively complex data structure (CStringData), problems arise in use. The most typical one is that the fields describing the memory block become inconsistent with its actual contents. This happens because CString, for the convenience of some applications, provides operations that directly return the address of the string inside the memory block, and the caller can modify the memory at that address. If, after the modification, the corresponding operation is not called to bring the values in CStringData back in sync, the object is left inconsistent. For example, you can obtain the string address through such an operation and then append characters to the string, making it longer. Because the change was made directly through the pointer, nDataLength in CStringData still holds the original length, so GetLength returns an incorrect value.
The operations with this problem are described below.
1. GetBuffer
The most typical source of these errors is CString::GetBuffer(). MSDN describes the operation as follows:
Returns a pointer to the internal character buffer for the CString object. The returned LPTSTR is not const and thus allows direct modification of CString contents.
This clearly states that we can directly modify the string through the pointer returned by this operation:
CString str1("This is the string 1");     // 1
int nOldLen = str1.GetLength();           // 2
char* pstr1 = str1.GetBuffer(nOldLen);    // 3
strcpy(pstr1, "modified");                // 4
int nNewLen = str1.GetLength();           // 5
By setting breakpoints we can step through this code. When execution reaches line 3, the value of str1 is "This is the string 1" and nOldLen is 20. When execution reaches line 5, the value of str1 has changed to "modified". In other words, we passed the string pointer returned by GetBuffer to strcpy to overwrite the memory it points to, the write succeeded, and the value of the CString object str1 changed to "modified" as well. However, when we call str1.GetLength() again, we are surprised to find that it still returns 20, although the string in str1 is now "modified" and the result should be 8, the length of "modified", not 20. CString is no longer working properly. What is going on?
Clearly, str1 stops working properly after a string is copied through the pointer returned by GetBuffer.
Looking again at the description of this operation on MSDN, we find the following:
If you use the pointer returned by GetBuffer to change the string contents, you must call ReleaseBuffer before using any other CString member functions.
That is, after modifying the string through the pointer returned by GetBuffer, you must call ReleaseBuffer before using any other CString operation. In the code above, we add one line between lines 4 and 5, str1.ReleaseBuffer(), and observe nNewLen again: its value is now 8.
The CString mechanism explains this as well. GetBuffer returns the start address of the string buffer in the CStringData object. Writing through this address changes only the contents of the string buffer; the other fields in CStringData that describe the buffer are no longer correct. For example, CStringData::nDataLength still holds the original value of 20 although the string length is now 8. In other words, the other values in CStringData have to be updated as well, and that is why ReleaseBuffer() must be called.
As expected, the ReleaseBuffer source code confirms our guess:
CopyBeforeWrite();  // just in case GetBuffer was not called
if (nNewLength == -1)
    nNewLength = lstrlen(m_pchData);  // zero terminated
ASSERT(nNewLength <= GetData()->nAllocLength);
GetData()->nDataLength = nNewLength;
m_pchData[nNewLength] = '\0';
CopyBeforeWrite implements the copy-before-write technique described earlier.
The remaining code resets the field that describes the string length in the CStringData object. It first gets the length of the current string, then obtains the CStringData pointer through GetData() and updates its nDataLength.
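The stale-length effect and its repair can be reproduced without MFC. The sketch below assumes a hypothetical class, CachedLenString, that caches its length the way CStringData::nDataLength does; it mimics only the GetBuffer/ReleaseBuffer contract and is not the MFC implementation.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a string class that caches its length.
class CachedLenString {
    std::vector<char> buf_;  // characters plus terminator
    int len_;                // cached length, like nDataLength

public:
    explicit CachedLenString(const char* s)
        : buf_(s, s + std::strlen(s) + 1),
          len_(static_cast<int>(std::strlen(s))) {}

    int GetLength() const { return len_; }  // never rescans the buffer
    const char* c_str() const { return buf_.data(); }

    // Hand out the raw buffer, growing it if needed.
    char* GetBuffer(int minLength) {
        if (static_cast<int>(buf_.size()) < minLength + 1)
            buf_.resize(minLength + 1, '\0');
        return buf_.data();
    }

    // Re-synchronise the cached length with the buffer contents.
    void ReleaseBuffer(int newLength = -1) {
        if (newLength == -1)
            newLength = static_cast<int>(std::strlen(buf_.data()));
        assert(newLength < static_cast<int>(buf_.size()));
        len_ = newLength;
        buf_[newLength] = '\0';
    }
};
```

Writing "modified" through the pointer from GetBuffer leaves GetLength at 20 in this sketch; only the call to ReleaseBuffer brings it down to 8, which matches the behaviour observed in the debugger session above.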
The question, however, is this: now that we know the cause of the error, and know that after modifying the data through the pointer returned by GetBuffer we must call ReleaseBuffer before using other CString operations, can we avoid making this mistake? The answer is no. Everyone who knows a little about programming knows that memory obtained with new must be released with delete after use. Simple as that is, memory leaks still occur in practice because the call to delete is forgotten.
In practice, the data returned by GetBuffer is often modified while the call to ReleaseBuffer is forgotten. Moreover, this error is not as widely known as the new/delete pairing, and no checking mechanism looks for it specifically, so errors caused by a forgotten ReleaseBuffer are carried into the release version of the program.
There are many ways to avoid this error, but the simplest and most effective is to avoid this usage altogether. Most of the time we do not need it and can achieve the same result with other, safer methods.
For example, the code above can be written as follows:
CString str1("This is the string 1");
int nOldLen = str1.GetLength();
str1 = "modified";
int nNewLen = str1.GetLength();
But sometimes it is required, for example:
Suppose we need to convert the string in a CString object, and the conversion is done by a function Translate exported from a DLL. Unfortunately, for whatever reason, the parameters of this function are of type char*:
DWORD Translate(char* pSrc, char* pDest, int nSrcLen, int nDestLen);
In this case we may need this method:
CString strDest;
int nDestLen = 100;
DWORD dwRet = Translate(_strSrc.GetBuffer(_strSrc.GetLength()),
                        strDest.GetBuffer(nDestLen),
                        _strSrc.GetLength(), nDestLen);
_strSrc.ReleaseBuffer();
strDest.ReleaseBuffer();
if (SUCCESSCALL(dwRet))
{
}
if (FAILEDCALL(dwRet))
{
}
Such situations do exist. Still, we recommend avoiding this method whenever possible. If you really need it, do not store the value returned by GetBuffer in a separate pointer variable, because that often makes us forget to call ReleaseBuffer. As in the code above, call ReleaseBuffer immediately after the call that uses GetBuffer, so that the CString object is brought back into a consistent state.
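One possible way to make the forgotten ReleaseBuffer harder to write is to let a small scope guard make the call. This is a design alternative offered as a sketch, not something MFC provides: BufferGuard, MockString and LegacyFill are hypothetical names, and the guard assumes a string type with GetBuffer(int) and ReleaseBuffer() in a non-Unicode build, where the buffer is char*.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical RAII helper: GetBuffer in the constructor,
// ReleaseBuffer in the destructor, so the pair cannot be split.
template <class StringT>
class BufferGuard {
    StringT& str_;
    char* p_;

public:
    BufferGuard(StringT& str, int minLength)
        : str_(str), p_(str.GetBuffer(minLength)) {}
    ~BufferGuard() { str_.ReleaseBuffer(); }
    BufferGuard(const BufferGuard&) = delete;
    BufferGuard& operator=(const BufferGuard&) = delete;
    char* get() const { return p_; }
};

// Minimal mock so the sketch compiles without MFC.
class MockString {
    std::string buf_;
    int len_ = 0;  // cached length, refreshed only by ReleaseBuffer

public:
    int GetLength() const { return len_; }
    char* GetBuffer(int minLength) {
        if (static_cast<int>(buf_.size()) < minLength + 1)
            buf_.resize(minLength + 1, '\0');
        return &buf_[0];
    }
    void ReleaseBuffer() {
        len_ = static_cast<int>(std::strlen(buf_.c_str()));
    }
};

// Stand-in for a legacy char* API such as Translate.
// dest must have room for destLen characters plus the terminator.
inline void LegacyFill(char* dest, int destLen) {
    std::strncpy(dest, "converted", destLen);
    dest[destLen] = '\0';
}
```

The guard is meant to live in a narrow scope around the legacy call; once the scope ends, the string has been resynchronised and the raw pointer is gone with it.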
2. LPCTSTR
Beginners often make mistakes involving LPCTSTR.
For example, given
DWORD Translate(char* pSrc, char* pDest, int nSrcLen, int nDestLen);
beginners often write the following:
int nLen = _strSrc.GetLength();
DWORD dwRet = Translate((char*)(LPCTSTR)_strSrc,
                        (char*)(LPCTSTR)_strSrc,
                        nLen,
                        nLen);
if (SUCCESSCALL(dwRet))
{
}
if (FAILEDCALL(dwRet))
{
}
The intent was to put the converted string back into _strSrc. However, when _strSrc is used after the call to Translate, it turns out that _strSrc no longer works properly, and checking the code does not reveal the problem.
In fact, this problem is the same as the first one. The CString class overloads the LPCTSTR conversion operator, so in CString, LPCTSTR is actually an operation. Invoking it is similar to calling GetBuffer: it directly returns the start address of the string buffer in the CStringData object.
Its C++ implementation is:
_AFX_INLINE CString::operator LPCTSTR() const
    { return m_pchData; }
Therefore, if the string is modified through this pointer, ReleaseBuffer() has to be called afterwards as well.
But who would think of that?
In fact, the essence of this problem lies in the type conversion. operator LPCTSTR returns a const char* (in a non-Unicode build), so passing it to Translate does not compile. A beginner, or even someone with long programming experience, then uses a cast to force the const char* into a char*. In the end CString no longer works properly, and a buffer overflow can easily result.
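When a legacy function insists on char* parameters, one alternative to casting away const is to hand it writable copies. The sketch below shows that idea under stated assumptions: UpperCaseLegacy is a hypothetical stand-in for a function like Translate, CallWithWritableCopy is a hypothetical helper, and std::string and std::vector are used in place of CString so that the example builds without MFC.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical legacy API with non-const parameters, in the style of
// Translate. dest must have room for destLen characters plus terminator.
inline void UpperCaseLegacy(char* src, char* dest, int srcLen, int destLen) {
    int n = srcLen < destLen ? srcLen : destLen;
    for (int i = 0; i < n; ++i) {
        char c = src[i];
        dest[i] = (c >= 'a' && c <= 'z') ? static_cast<char>(c - 'a' + 'A') : c;
    }
    dest[n] = '\0';
}

// Copy the read-only text into writable buffers instead of casting
// the const away; the original characters are never written to.
inline std::string CallWithWritableCopy(const char* readOnly) {
    int len = static_cast<int>(std::strlen(readOnly));
    std::vector<char> src(readOnly, readOnly + len + 1);
    std::vector<char> dest(len + 1, '\0');
    UpperCaseLegacy(src.data(), dest.data(), len, len);
    return std::string(dest.data());
}
```

The result can then be assigned back to the string object through its normal assignment operator, so the length bookkeeping is updated by the class itself and the buffers have an explicit, known size.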
With the above description of the CString mechanism and of some easy-to-make mistakes, we can use CString better.