How CString Works and Common Pitfalls

Last Update:2018-12-04 Source: Internet

Author: User

How CString works (reposted). After reading many programs, including some code I wrote myself, I have found that a large share of bugs come from incorrect use of the MFC CString class. The main cause of these errors is an incomplete understanding of how CString is implemented.

CString wraps the string type of standard C. After years of programming, it became clear that many bugs are related to strings, such as buffer overflows and memory leaks. These bugs can be fatal and may bring a system down. C++ therefore provides a class that manages the string pointer. In standard C++ this class is string, and in the Microsoft MFC class library it is CString. Using a string class avoids many of the string-pointer problems found in C.

Here we briefly look at how CString is implemented in Microsoft MFC. The best way to understand the principle is to read the code directly. Most of the CString implementation in MFC is in strcore.cpp.

CString is a buffer that stores a string, together with the operations applied to that string. In other words, CString needs a buffer for the string and a pointer to that buffer, which is LPTSTR m_pchData. Some string operations lengthen or shorten the string, so to avoid frequent memory allocation and release, CString first allocates a larger memory block to hold the string. When the string grows later, no new allocation is needed as long as the total length does not exceed the block that was allocated in advance. When the new length does exceed it, CString allocates a larger block, copies the string into it, and releases the original block. Similarly, when the string becomes shorter, the extra memory is not released immediately. Instead, the excess is released in one step once it has accumulated to a certain level.
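The idea of keeping spare capacity can be illustrated with a small portable sketch. GrowBuffer is a hypothetical class written for this article, not MFC code, and the doubling policy is an assumption chosen only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper, not part of MFC: a character buffer that keeps
// spare capacity so that small appends do not reallocate.
class GrowBuffer {
public:
    explicit GrowBuffer(std::size_t capacity)
        : data_(new char[capacity + 1]), length_(0), capacity_(capacity),
          reallocations_(0) {
        data_[0] = '\0';
    }
    ~GrowBuffer() { delete[] data_; }
    GrowBuffer(const GrowBuffer&) = delete;
    GrowBuffer& operator=(const GrowBuffer&) = delete;

    void append(const char* text) {
        std::size_t extra = std::strlen(text);
        if (length_ + extra > capacity_) {
            // Not enough room: allocate a larger block, copy the old
            // content, then free the old block (doubling is an assumption).
            std::size_t newCapacity = (length_ + extra) * 2;
            char* bigger = new char[newCapacity + 1];
            std::memcpy(bigger, data_, length_ + 1);
            delete[] data_;
            data_ = bigger;
            capacity_ = newCapacity;
            ++reallocations_;
        }
        std::memcpy(data_ + length_, text, extra + 1);  // copies the '\0' too
        length_ += extra;
    }

    const char* c_str() const { return data_; }
    std::size_t length() const { return length_; }
    std::size_t capacity() const { return capacity_; }
    int reallocations() const { return reallocations_; }

private:
    char* data_;
    std::size_t length_;
    std::size_t capacity_;
    int reallocations_;
};
```

Appending "abc" and then "de" to a GrowBuffer created with capacity 8 fits in the existing block; only an append that pushes the length past 8 triggers a new allocation.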

In addition, when a CString object A is used to initialize another CString object B, B does not allocate its own space. To save memory, it simply points at the memory of object A. Only when the string in A or B needs to be modified is new memory allocated for the object being modified. MFC calls this CopyBeforeWrite (more commonly known as copy-on-write).

A single pointer therefore cannot fully describe the state of the memory block; more information is required.

First, a variable is needed to record the total size of the memory block.
Second, a variable is needed to record how much of the block is in use, that is, the length of the current string.
Third, a variable is needed to record how many CString objects reference the block. Each time another object references the block, this value is incremented by one.

CString defines a struct to hold this information:
struct CStringData
{
    long nRefs;        // reference count
    int nDataLength;   // length of data (including terminator)
    int nAllocLength;  // length of allocation
    // TCHAR data[nAllocLength]

    TCHAR* data()      // TCHAR* to managed data
        { return (TCHAR*)(this + 1); }
};

In actual use, the size of the memory block that holds this struct is not fixed. The struct sits at the head of the CString memory block, and the memory that actually stores the string begins sizeof(CStringData) bytes after the start of the block. The block is allocated as follows:
pData = (CStringData*) new BYTE[sizeof(CStringData) + (nLen + 1) * sizeof(TCHAR)];
pData->nAllocLength = nLen;
Here nLen is the number of characters to allocate room for in one step.

The code shows that if you request room for nLen characters, for example 256 TCHARs, the amount actually allocated is sizeof(CStringData) bytes + (nLen + 1) TCHARs.

The leading sizeof(CStringData) bytes store the CStringData information. The following nLen + 1 TCHARs store the string itself, and the extra one holds the terminating '\0'.
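The header-then-characters layout can be tried out without MFC. The following is a minimal sketch under stated assumptions: StringHeader, allocHeader, headerFromData, and freeHeader are hypothetical names, and char stands in for TCHAR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Simplified stand-in for CStringData (hypothetical, char instead of TCHAR).
struct StringHeader {
    long nRefs;
    int nDataLength;
    int nAllocLength;
    // The characters start immediately after the header.
    char* data() { return reinterpret_cast<char*>(this + 1); }
};

// Allocate one block: the header first, then nLen + 1 characters.
StringHeader* allocHeader(int nLen) {
    std::size_t bytes =
        sizeof(StringHeader) + (static_cast<std::size_t>(nLen) + 1) * sizeof(char);
    void* raw = ::operator new(bytes);
    StringHeader* h = new (raw) StringHeader;
    h->nRefs = 1;
    h->nDataLength = 0;
    h->nAllocLength = nLen;
    h->data()[0] = '\0';
    return h;
}

// Step back from the character pointer to the header that precedes it.
StringHeader* headerFromData(char* p) {
    return reinterpret_cast<StringHeader*>(p) - 1;
}

void freeHeader(StringHeader* h) { ::operator delete(h); }
```

Because the header always sits directly in front of the characters, a single character pointer is enough to reach both the string and its bookkeeping data.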

All CString operations work on this buffer. Take LPTSTR CString::GetBuffer(int nMinBufLength) as an example. It is implemented as follows:
First, the CStringData pointer is obtained through CString::GetData(). GetData steps back sizeof(CStringData) bytes from the string pointer m_pchData to obtain the CStringData address.
Then, if the buffer is shared or too small, a new CStringData is allocated according to the nMinBufLength parameter, so that the string buffer in the new object is at least nMinBufLength long.
Next, the original string is copied over and the descriptive values in the new CStringData are reset.
Finally, the string buffer in the new CStringData is returned directly to the caller.

In C++ code, these steps look like this:
if (GetData()->nRefs > 1 || nMinBufLength > GetData()->nAllocLength)
{
    // we have to grow the buffer
    CStringData* pOldData = GetData();
    int nOldLen = GetData()->nDataLength;   // AllocBuffer will tromp it
    if (nMinBufLength < nOldLen)
        nMinBufLength = nOldLen;
    AllocBuffer(nMinBufLength);
    memcpy(m_pchData, pOldData->data(), (nOldLen + 1) * sizeof(TCHAR));
    GetData()->nDataLength = nOldLen;
    CString::Release(pOldData);
}
ASSERT(GetData()->nRefs <= 1);

// return a pointer to the character storage for this string
ASSERT(m_pchData != NULL);
return m_pchData;
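The decision that GetBuffer makes can be isolated in a small portable sketch. BufferState, GetBufferPlan, and planGetBuffer are hypothetical names introduced here; the logic only mirrors the condition and the length adjustment in the listing above and performs no allocation.

```cpp
#include <cassert>

// Hypothetical snapshot of the three CStringData fields.
struct BufferState {
    long nRefs;
    int nDataLength;
    int nAllocLength;
};

// What GetBuffer would do: reallocate or not, and with which length.
struct GetBufferPlan {
    bool reallocate;
    int allocLength;
};

GetBufferPlan planGetBuffer(const BufferState& s, int nMinBufLength) {
    if (s.nRefs > 1 || nMinBufLength > s.nAllocLength) {
        // A shared or too-small buffer is replaced, and the new one is
        // never shorter than the string it has to hold.
        if (nMinBufLength < s.nDataLength)
            nMinBufLength = s.nDataLength;
        return GetBufferPlan{true, nMinBufLength};
    }
    // An unshared, large enough buffer is handed out as it is.
    return GetBufferPlan{false, s.nAllocLength};
}
```

For an unshared buffer, a request of 0 keeps the existing block. For a shared buffer, a request of 0 still produces a private block that is at least as long as the current string.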

Programs often copy and modify strings in large batches, which is why CString uses the CopyBeforeWrite technique. When a CString object A is used to construct another object B, the two objects hold the same value. Allocating separate memory for each of them hardly matters for strings of a few bytes or a few dozen bytes, but it is a great waste for strings of several kilobytes or even several megabytes.
Therefore, CString simply points the string address m_pchData of the new object B at the string address m_pchData of the other object. The only extra work is incrementing CStringData::nRefs for the shared memory block.
CString::CString(const CString& stringSrc)
{
    m_pchData = stringSrc.m_pchData;
    InterlockedIncrement(&GetData()->nRefs);
}

In this way, before the string content of object A or object B is modified, the value of CStringData::nRefs is checked first. If it is greater than one (a value of one means that only one object references the memory), the object is either referencing another object's memory or its own memory is referenced by others. The object then decrements the reference count by one, leaves the memory to the other objects to manage, allocates a new block for itself, and copies the original content into it.

The simplified code is as follows:
void CString::CopyBeforeWrite()
{
    if (GetData()->nRefs > 1)
    {
        CStringData* pData = GetData();
        Release();   // nRefs > 1, so the old block stays alive
        AllocBuffer(pData->nDataLength);
        memcpy(m_pchData, pData->data(),
            (pData->nDataLength + 1) * sizeof(TCHAR));
    }
}
Release decrements the reference count and frees the memory once no object references it any longer.
void CString::Release()
{
    if (GetData() != _afxDataNil)
    {
        if (InterlockedDecrement(&GetData()->nRefs) <= 0)
            FreeData(GetData());
    }
}

When multiple objects share the same memory, the memory belongs to all of them rather than to the object that originally allocated it. At the end of its lifetime, each object first drops its reference to the memory and then checks the reference count. If the count is less than or equal to zero, the memory is released. Otherwise, the memory is left to the other objects that still reference it.
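The sharing and copy-before-write behavior can be reproduced in portable C++. The following is a minimal sketch, not MFC code: CowString and its members are hypothetical names, std::atomic stands in for InterlockedIncrement and InterlockedDecrement, and the class is not meant to be fully thread-safe.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical, simplified model of a reference-counted string.
class CowString {
    struct Block {
        std::atomic<long> refs;
        std::size_t length;
        char* chars;
    };
    Block* block_;

    static Block* makeBlock(const char* text, std::size_t length) {
        Block* b = new Block;
        b->refs = 1;
        b->length = length;
        b->chars = new char[length + 1];
        std::memcpy(b->chars, text, length);
        b->chars[length] = '\0';
        return b;
    }
    // Drop one reference; free the block when nobody references it.
    static void releaseBlock(Block* b) {
        if (--b->refs <= 0) {
            delete[] b->chars;
            delete b;
        }
    }
    // Give this object a private copy if the block is shared.
    void copyBeforeWrite() {
        if (block_->refs > 1) {
            Block* shared = block_;
            block_ = makeBlock(shared->chars, shared->length);
            releaseBlock(shared);  // other owners keep the old block alive
        }
    }

public:
    explicit CowString(const char* text)
        : block_(makeBlock(text, std::strlen(text))) {}
    CowString(const CowString& other) : block_(other.block_) {
        ++block_->refs;  // share the block instead of copying it
    }
    CowString& operator=(const CowString&) = delete;  // omitted for brevity
    ~CowString() { releaseBlock(block_); }

    void setAt(std::size_t index, char ch) {
        copyBeforeWrite();
        block_->chars[index] = ch;
    }
    const char* c_str() const { return block_->chars; }
    long refCount() const { return block_->refs; }
    bool sharesWith(const CowString& other) const {
        return block_ == other.block_;
    }
};
```

Copying a CowString only bumps the counter. The first setAt call on either copy is what pays for the allocation, and it leaves the other copy untouched.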

With this data structure, CString avoids much of the time spent repeatedly allocating and releasing memory when handling large amounts of data, which improves performance.

The analysis above gives a general picture of the internal mechanism of CString. Overall, the CString in MFC is a fairly successful design. However, because of its more complicated data structure (CStringData), many problems can occur in use. The most typical one is that the attribute values describing the memory block become inconsistent with the actual content. This happens because CString, for convenience, provides operations that return the address of the string inside the memory block, and the caller can modify the content at that address directly. If the matching operation that brings the values in CStringData back in sync (such as ReleaseBuffer) is not called after the modification, the two disagree. For example, you can obtain the string address through one of these operations and then append characters through the pointer, which makes the string longer. Because the change was made directly through the pointer, nDataLength in CStringData still holds the original length, so the value returned by GetLength is wrong.
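The stale-length problem is easy to reproduce outside MFC. The sketch below is an assumption-laden model, not the real class: CachedLengthString, getBuffer, releaseBuffer, and getLength are hypothetical names that only imitate a string which caches its length the way nDataLength does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical model: a string that stores its length separately
// from the characters, as CStringData::nDataLength does.
class CachedLengthString {
public:
    CachedLengthString(const char* text, std::size_t capacity)
        : capacity_(capacity), length_(std::strlen(text)),
          chars_(new char[capacity + 1]) {
        assert(length_ <= capacity_);
        std::memcpy(chars_, text, length_ + 1);
    }
    ~CachedLengthString() { delete[] chars_; }
    CachedLengthString(const CachedLengthString&) = delete;
    CachedLengthString& operator=(const CachedLengthString&) = delete;

    // Raw, writable access: the class cannot see what the caller writes.
    char* getBuffer() { return chars_; }
    // Resynchronize the cached length with the actual terminator position.
    void releaseBuffer() { length_ = std::strlen(chars_); }
    std::size_t getLength() const { return length_; }

private:
    std::size_t capacity_;
    std::size_t length_;
    char* chars_;
};
```

If a caller appends "def" to "abc" through the pointer returned by getBuffer, getLength keeps returning 3 until releaseBuffer is called, after which it returns 6.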

The following examples of incorrect usage illustrate the operations that have these problems.

Example 1

CString strTest = "123";
char* p = strTest.GetBuffer(0);
int i = atoi(p);
strTest.ReleaseBuffer();

There is nothing wrong with this usage as such, but the GetBuffer/ReleaseBuffer pair is unnecessary here. Why? Because the parameter of int __cdecl atoi(const char*) is const char*, so the internal data of the CString will not be modified.
The code above can therefore be written directly as

CString strTest = "123";
int i = atoi((LPCTSTR)strTest);

By the way, many examples on the Internet pass a constant to GetBuffer, such as GetBuffer(5) or GetBuffer(10). In a real program the required size is rarely known in advance, so people write strTest.GetBuffer(strTest.GetLength()). In fact, GetBuffer(0) does the same thing, because GetBuffer raises nMinBufLength to the current string length. The source code of GetBuffer confirms this.

Example 2

CString strTest = "123 45";

// some other code
CString strTest2 = strTest;
char seps[] = " ";
char* pToken = 0;
// char* pStr = strTest2.GetBuffer(0);
pToken = strtok((char*)(LPCTSTR)strTest2, seps);
// pToken = strtok(pStr, seps);
while (pToken)
    pToken = strtok(NULL, seps);
// strTest2.ReleaseBuffer(0);

Run the code above and you will see that the value of strTest has changed as well, because strTest and strTest2 share one buffer and strtok writes into it. This is the origin of some strange CString-related problems in programs. If the GetBuffer/ReleaseBuffer calls in the comments are used instead, strTest is not affected.
Similarly, the parameter of ReleaseBuffer defaults to -1, but I do not recommend relying on the default here. A value of -1 means that the new length is determined by the position of the current '\0' terminator, and in the example above strtok has overwritten separators with '\0'. The safe approach is therefore to set the length of the CString to 0 with ReleaseBuffer(0): its content has been changed, and nobody should use it anymore.
Note: the GetBuffer/ReleaseBuffer approach only guarantees that strTest stays unchanged; strTest2 is still modified. For a member variable such as m_strTest2, calling ReleaseBuffer therefore needs extra care, while for local variables there is less to worry about.
So how could the mistake have been spotted from the very beginning? In the code above, the cast (char*)(LPCTSTR) removes const, which is very dangerous. Without the cast, the strtok call would not compile, which also shows the importance of const.
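The hazard does not depend on MFC and can be shown with plain C strings. In the sketch below, tokenizeInPlace and splitCopy are hypothetical helpers written for illustration: the first shows that strtok rewrites the buffer it is given, and the second shows one possible alternative, which is to tokenize a private copy so that the caller's string stays intact.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Shows the hazard: strtok writes '\0' over separators in the buffer
// it is given. Returns the apparent length of the buffer afterwards.
std::size_t tokenizeInPlace(char* buffer, const char* seps) {
    for (char* tok = std::strtok(buffer, seps); tok != nullptr;
         tok = std::strtok(nullptr, seps)) {
        // Tokens are ignored; only the side effect matters here.
    }
    return std::strlen(buffer);
}

// Safer alternative: tokenize a private, writable copy so that the
// caller's string is never touched.
std::vector<std::string> splitCopy(const char* text, const char* seps) {
    std::vector<char> scratch(text, text + std::strlen(text) + 1);
    std::vector<std::string> tokens;
    for (char* tok = std::strtok(scratch.data(), seps); tok != nullptr;
         tok = std::strtok(nullptr, seps)) {
        tokens.push_back(tok);
    }
    return tokens;
}
```

Calling tokenizeInPlace on a buffer holding "123 45" leaves every other pointer to that buffer seeing only "123", which is what happens to strTest in Example 2. splitCopy returns the tokens "123" and "45" and leaves its input unchanged.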

This article is from: Take the Wind Original Program (http://www.qqcf.com). Detailed source: http://study.qqcf.com/web/522/97976.htm

and http://www.cppblog.com/flyingxu/archive/2006/03/21/4430.aspx

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
