Copy-on-Write Technology in the Standard C++ Class string

Memory Sharing and Copy-on-Write Technology in the Standard C++ Class std::string (Chenhao)

1. Concept

Do you remember the example Scott Meyers gives in "More Effective C++"? When you were still at school, your parents told you to stop watching TV and review your lessons, so you shut yourself in your room and pretended to study while actually doing something else, such as writing a love letter to a girl in your class. Only when your parents came in to check on you did you really pick up your textbook and read it. This is the "delaying tactic": putting a job off until you absolutely have to do it.

This kind of thing happens all the time in real life, but in the programming world it turns out to be one of the most useful techniques. For example, C++ lets you declare variables anywhere, and Scott Meyers recommends declaring a variable (and so allocating its memory) only at the point where you really need the storage. This keeps the program's run-time memory cost to a minimum, and the time-consuming work of allocating memory is done only if execution actually reaches that point, which gives the program better performance. After all, 20% of a program's code accounts for 80% of its running time.

Delaying tactics are not limited to this one technique. They are widely used, especially in operating systems. When a program exits, the operating system is in no hurry to evict it from memory, because the program may well be run again immediately (loading a program from disk into memory is a slow process). Only when memory runs short are these lingering programs cleared out.

Copy-on-write (COW) is a product of this "lazy behavior", the delaying tactic of the programming world. Take a program that writes a file, continuously appending data received from the network. If every fwrite or fprintf performed a disk I/O operation, the performance loss would be huge. So the usual practice is for each write to go into a block of memory of a particular size (a disk cache), which is written to disk only when the buffer fills up or when we close the file (which is why data may be lost if the file is never closed). Some systems go further: closing the file still does not write to disk, and the write waits until shutdown or until memory runs short. UNIX is such a system; if it exits abnormally, the data is lost and the file is damaged.
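
The buffering idea described above can be sketched roughly as follows. This is only an illustration of delayed writing, not how stdio or any operating system actually implements it; the class name `BufferedWriter`, its `capacity` parameter, and the simulated "device" string are all hypothetical names introduced for this sketch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical illustration of delayed writing: data accumulates in memory
// and is pushed to the slow "device" only when the buffer fills or on flush().
class BufferedWriter {
public:
    explicit BufferedWriter(std::size_t capacity) : capacity_(capacity) {}

    void write(const std::string& data) {
        buffer_ += data;
        if (buffer_.size() >= capacity_) flush();   // buffer full: do the real work now
    }

    void flush() {
        if (buffer_.empty()) return;
        device_ += buffer_;      // stands in for the expensive disk write
        buffer_.clear();
        ++flushes_;
    }

    const std::string& device() const { return device_; }   // what reached the "disk"
    std::size_t flush_count() const { return flushes_; }

private:
    std::size_t capacity_;
    std::size_t flushes_ = 0;
    std::string buffer_;   // pending data, lost if the program dies before flush()
    std::string device_;   // simulated disk contents
};
```

With a capacity of 8, writing "abc" leaves the device untouched; a further write of "defgh" fills the buffer and triggers the single expensive operation. Anything still sitting in `buffer_` when the program dies is exactly the data the text warns about losing.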

So for the sake of performance we take quite a risk. Fortunately, our programs are never so busy that they forget there is still a block of data waiting to be written to disk, so the practice is well worth it.

2. Copy-on-Write in the Standard C++ Class std::string

The string class in the Standard Template Library (STL), which we use all the time, is also a class that uses copy-on-write. C++ has long been questioned and criticized over performance, and to improve performance many STL classes adopt copy-on-write. This lazy behavior does give programs that use the STL relatively high performance.

Here I would like to lift the veil on how copy-on-write is implemented in string, from the point of view of C++ classes and design patterns, as a small reference for your own class library design in C++.

Before getting to the technique itself, let me briefly explain how the string class allocates memory. Typically, a string class has a private member, a char*, which records the address of memory allocated from the heap. The memory is allocated at construction time and freed at destruction time. Because the memory comes from the heap, the string class is extremely careful in maintaining it: it hands out only a const char*, which is read-only, and if you want to write you can only do so through the methods that string provides.

2.1. Characteristics

Working from the outside in, from the perceptual to the rational, let us first look at the visible behavior of copy-on-write in the string class. Consider the following program:

#include <stdio.h>
#include <string>
using namespace std;

int main()
{
    string str1 = "Hello World";
    string str2 = str1;

    // %p (with a cast to void*) is the portable way to print a pointer
    printf("Sharing the memory:\n");
    printf("\tstr1's address: %p\n", (const void*)str1.c_str());
    printf("\tstr2's address: %p\n", (const void*)str2.c_str());

    str1[1] = 'q';
    str2[1] = 'w';

    printf("After copy-on-write:\n");
    printf("\tstr1's address: %p\n", (const void*)str1.c_str());
    printf("\tstr2's address: %p\n", (const void*)str2.c_str());

    return 0;
}


The intent of this program is to construct the second string from the first, print the memory addresses where their data resides, then modify the contents of str1 and str2 and check the addresses again. The output of the program looks like this (I got the same result in VC6.0 and g++ 2.95):

> g++ -o stringtest stringTest.cpp
> ./stringtest
Sharing the memory:
        str1's address: 343be9
        str2's address: 343be9
After copy-on-write:
        str1's address: 3407a9
        str2's address: 343be9


From the results we can see that after the first two statements, str1 and str2 store their data at the same address. After the contents are modified, the address of str1 has changed while the address of str2 is still the original one. This example shows the copy-on-write technique of the string class at work.
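
The same experiment can be wrapped in a small check that reports whether a given standard library shares the buffer. This is a sketch, and `shares_buffer`, `Observation`, and `run_experiment` are hypothetical names introduced here. Note that C++11 and later effectively rule out copy-on-write for std::string (libstdc++, for instance, switched to a non-sharing string by default with GCC 5), so on a current compiler the first flag will most likely be false, unlike the VC6.0 and g++ 2.95 results quoted above.

```cpp
#include <string>

// Returns true when the two strings currently point at the same character buffer.
bool shares_buffer(const std::string& a, const std::string& b) {
    return a.data() == b.data();
}

// Repeats the experiment from the text and records what was observed.
struct Observation {
    bool shared_after_copy;    // true only on a copy-on-write implementation
    bool shared_after_write;   // expected to be false everywhere
};

Observation run_experiment() {
    std::string str1 = "Hello World";
    std::string str2 = str1;

    Observation obs;
    obs.shared_after_copy = shares_buffer(str1, str2);

    str1[1] = 'q';   // a write: a copy-on-write string must detach here
    str2[1] = 'w';

    obs.shared_after_write = shares_buffer(str1, str2);
    return obs;
}
```

Whatever the first flag says, the second must be false: once the two strings hold different contents they cannot occupy the same memory.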

2.2. In Depth

Before going deeper, the demo above tells us that to implement copy-on-write the string class has to solve two problems: memory sharing, and the copy-on-write itself. These two topics raise a lot of questions, so let us study them with the following questions in mind:
1. What is the principle behind copy-on-write?
2. Under what circumstances does the string class share memory?
3. Under what circumstances does the string class trigger copy-on-write?
4. What happens when copy-on-write takes place?
5. How is copy-on-write actually implemented?

You may say that reading the STL string source code will answer all of this. Of course it will, and I referred to the source of basic_string, the template class behind string, myself. But if reading STL source feels like reading machine code, badly dents your confidence in C++, or even makes you doubt whether you understand C++ at all, then read on.

OK, let's discuss one question at a time, and all the technical details will gradually come to the surface.

2.3. What is the principle behind copy-on-write?

Programmers with some experience will know that copy-on-write must use a "reference count"; yes, there must be a variable along the lines of refcnt. When the first object is constructed, the string constructor allocates memory from the heap according to the argument passed in. When other objects need the same memory, the count is incremented, and when an object is destroyed the count is decremented by one. When the last object is destroyed, at which point refcnt is 1 or 0, the program actually frees the block of memory allocated from the heap.

Yes, reference counting is the principle behind copy-on-write in the string class!
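
The rise and fall of such a count can be observed with std::shared_ptr, which keeps a reference count of its own. This is only a way to watch the principle in action, not how string stores its count; `Counts` and `count_owners` are hypothetical names for this sketch. Note also that shared_ptr counts the first owner as 1.

```cpp
#include <memory>
#include <string>

// Owner counts observed at three points in the life of one buffer.
struct Counts {
    long one_owner;
    long two_owners;
    long after_release;
};

Counts count_owners() {
    Counts c;
    std::shared_ptr<std::string> first = std::make_shared<std::string>("Hello World");
    c.one_owner = first.use_count();                  // only `first` owns the buffer
    {
        std::shared_ptr<std::string> second = first;  // a sharer arrives: count goes up
        c.two_owners = first.use_count();
    }                                                 // `second` destroyed: count goes down
    c.after_release = first.use_count();
    return c;   // the buffer itself is freed when `first` goes out of scope
}
```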

But the question is: where does this refcnt live? If it is a member of the string class, every string instance has its own copy, and there is no way for them to share a single refcnt. If it is declared as a global variable or a static member, all string objects share one count, which is not what we want either. What we need is a solution that combines "democracy and centralization". How is this done? Life is a cycle of being confused, finding out, understanding, and then being confused again. Don't worry, I will explain it all later.

2.3.1. Under what circumstances does the string class share memory?

The answer should be obvious. By common sense and logic, if one object uses the data of another, it can share the memory of the object being used. This is perfectly reasonable: if you don't need my data, there is nothing to share; only when you use my data do we share.

There are only two ways to use the data of another object: 1) constructing yourself from another object, and 2) being assigned from another object. The first case triggers the copy constructor, and the second triggers the assignment operator. We can implement the corresponding method for each case in the class. For the first, a little processing in the copy constructor of the string class increments the reference count; likewise, for the second, we overload the assignment operator of the string class and add the same processing there.


A few side notes:

1) The difference between construction and assignment

Consider these two statements from the earlier program:
string str1 = "Hello World";
string str2 = str1;

Do not assume that an "=" always means assignment. These two statements are in fact equivalent to:

string str1("Hello World");   // calls the constructor
string str2(str1);            // calls the copy constructor

If str2 is written as follows instead, the assignment operator is used:

string str2;                  // calls the default constructor, same as: string str2("");
str2 = str1;                  // calls the assignment operator: str2.operator=(str1);

2) A different situation
char tmp[] = "Hello World";
string str1 = tmp;
string str2 = tmp;
Does this trigger memory sharing? You might take it for granted that it should. However, going by the sharing rules described above, the declarations and initializations of these two strings match neither of the two cases, so no memory sharing takes place. Moreover, the existing features of C++ give us no way to share memory between objects in this situation.
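
Both side notes can be checked with a short sketch. The tracing type `Traced` and the two helper functions are hypothetical names introduced for illustration; the counters simply record which special member function the compiler selects.

```cpp
#include <string>

// Hypothetical tracing type: counts which special member function runs.
struct Traced {
    static int copy_ctor_calls;
    static int assign_calls;

    Traced() {}
    Traced(const Traced&) { ++copy_ctor_calls; }
    Traced& operator=(const Traced&) { ++assign_calls; return *this; }
};
int Traced::copy_ctor_calls = 0;
int Traced::assign_calls = 0;

// Note 1: "=" in a declaration is construction, not assignment.
void construction_vs_assignment() {
    Traced a;
    Traced b = a;   // copy constructor
    Traced c;
    c = a;          // assignment operator
    (void)b;        // silence "unused variable" warnings
}

// Note 2: two strings built from the same char array each get their own buffer.
bool built_from_array_share_memory() {
    char tmp[] = "Hello World";
    std::string str1 = tmp;
    std::string str2 = tmp;
    return str1.data() == str2.data();
}
```

After one call to `construction_vs_assignment()`, each counter has been incremented exactly once, and `built_from_array_share_memory()` returns false whether or not the library uses copy-on-write.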



2.3.2. Under what circumstances does the string class trigger copy-on-write?

When does the copy happen? Obviously, copy-on-write takes place when the contents of an object that shares its memory with others are about to change. Examples are the string operators [], =, +=, and +, member functions such as insert, replace, and append, and also the class destructor, which has to update the reference count.

Modifying the data triggers copy-on-write; without modification nothing is copied. This is the true meaning of the delaying tactic: don't do the work until you have to.

2.3.3. What happens when copy-on-write takes place?

Based on the reference count, we decide whether a copy is needed. See the following code:

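if (refcnt > 0) {                                  // someone else still shares _ptr
    char* tmp = (char*)malloc(strlen(_ptr) + 1);   // room for the data and '\0'
    strcpy(tmp, _ptr);
    _ptr = tmp;                                    // from now on we own a private copy
}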


The code above is an imaginary copy routine. If other objects still reference this memory (which we find out by checking the reference count), the object being modified has to perform this "copy" before it writes.

We can wrap this copy operation in a function that is called by every member function that changes the contents.
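
One way to sketch such a shared "copy before writing" helper is shown below. It deliberately swaps in std::shared_ptr for the hand-written reference count, so it is not how string itself does it; `CowString`, `detach`, `set`, `at`, `address`, and `owners` are hypothetical names. The `use_count()` check is only meaningful in single-threaded code.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical copy-on-write wrapper. std::shared_ptr stands in for the
// hand-written reference count discussed in the text.
class CowString {
public:
    explicit CowString(const char* s) : buf_(std::make_shared<std::string>(s)) {}

    // The implicit copy constructor and assignment operator copy only the
    // shared_ptr, so copies of a CowString share one buffer.

    // Read access never copies.
    char at(std::size_t i) const { return (*buf_)[i]; }

    // Every mutating member calls detach() first.
    void set(std::size_t i, char c) {
        detach();
        (*buf_)[i] = c;
    }

    const char* address() const { return buf_->data(); }
    long owners() const { return buf_.use_count(); }

private:
    // The "copy" step: make a private buffer only if someone else shares ours.
    void detach() {
        if (buf_.use_count() > 1) {
            buf_ = std::make_shared<std::string>(*buf_);
        }
    }

    std::shared_ptr<std::string> buf_;
};
```

Copying a `CowString` leaves both objects reporting the same `address()`; the first `set()` on either of them moves that object to a fresh buffer and leaves the other untouched.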

2.3.4. How is copy-on-write actually implemented?

Finally, we come to the "democracy and centralization" problem. Please look at the following code first:

string h1 = "Hello";
string h2 = h1;
string h3;
h3 = h2;

string w1 = "World";
string w2("");
w2 = w1;


Obviously, we want h1, h2, and h3 to share one block of memory, and w1 and w2 to share another. So we need to maintain one reference count for h1, h2, and h3, and a separate one for w1 and w2.

How can we produce these two reference counts in a neat way? Recall that the memory of a string is dynamically allocated on the heap. Since all objects that share memory point to the same block, why not allocate a little extra space in that block to hold the reference count? Then all objects sharing a block have the same reference count, and because the count lives inside the shared block, every object sharing it can reach the count and know how many references the block has.

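Take a look at the picture below:

[Figure: layout of the shared memory block, which holds the string data together with its reference count]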

So, with this mechanism, whenever we allocate memory for a string we always allocate a little extra space to hold the reference count, and the count is incremented on every copy construction or assignment. When the contents are about to be modified, the string class checks whether the reference count is 0. If it is not, someone else is sharing the memory, so the object decrements the reference count by one, makes its own copy, and copies the data over. The following fragments illustrate these actions:

// Constructor (allocates memory)
string::string(const char* tmp)
{
    _len = strlen(tmp);
    _ptr = new char[_len + 1 + 1];   // data + '\0' + one byte for the count
    strcpy(_ptr, tmp);
    _ptr[_len + 1] = 0;              // reference count: 0 means "not shared"
}

// Copy constructor (shares memory)
string::string(const string& str)
{
    _ptr = str._ptr;                 // share the block
    _len = str._len;
    _ptr[_len + 1]++;                // reference count plus one
}

// Copy-on-write when writing
char& string::operator[](unsigned int idx)
{
    if (idx > _len || _ptr == 0) {
        static char nullchar = 0;
        return nullchar;
    }

    if (_ptr[_len + 1] > 0) {        // someone else shares this block
        _ptr[_len + 1]--;            // reference count minus one
        char* tmp = new char[_len + 1 + 1];
        strncpy(tmp, _ptr, _len + 1);
        _ptr = tmp;
        _ptr[_len + 1] = 0;          // reference count of the new private block
    }

    return _ptr[idx];
}

// Destructor
string::~string()
{
    if (_ptr[_len + 1] == 0) {
        delete[] _ptr;               // last owner: free the memory
    } else {
        _ptr[_len + 1]--;            // otherwise just reference count minus one
    }
}


And with that, all the technical details have come to the surface.

However, this differs a little from the implementation details of the STL's basic_string. If you open the STL source you will find that the reference count is accessed as _ptr[-1]: the standard library places the memory for the reference count in front of the data. (The code I gave above puts the reference count at the end, which is a poor choice.) The advantage of putting it in front is that when the string grows, only the memory after it needs to be extended, and the location of the reference count does not have to move, which saves a little time.

The memory layout of a string in the STL is like the one I pictured earlier: _ptr points to the data area, and refcnt sits at _ptr-1, that is, _ptr[-1].
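
A count stored in front of the data might be sketched like this. It is a simplified illustration under stated assumptions, not the actual layout of any standard library: `Header`, `header_of`, `cow_alloc`, `cow_share`, and `cow_release` are hypothetical names, the count is a full std::size_t rather than a single byte, and the count starts at 1 for the creator.

```cpp
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical header placed immediately before the character data.
struct Header {
    std::size_t refs;
};

// Finds the header from a pointer to the character data (the "_ptr[-1]" idea).
inline Header* header_of(char* data) {
    return reinterpret_cast<Header*>(data) - 1;
}

// Allocates header + characters in one block and returns a pointer to the characters.
char* cow_alloc(const char* text) {
    std::size_t len = std::strlen(text);
    void* raw = ::operator new(sizeof(Header) + len + 1);
    Header* h = new (raw) Header;
    h->refs = 1;                                  // the creator is the first owner
    char* data = reinterpret_cast<char*>(h + 1);  // characters start right after the header
    std::memcpy(data, text, len + 1);             // copies the terminating '\0' too
    return data;
}

// A new sharer only bumps the count; no characters are copied.
char* cow_share(char* data) {
    ++header_of(data)->refs;
    return data;
}

// Drops one owner; returns true if the block was actually freed.
bool cow_release(char* data) {
    Header* h = header_of(data);
    if (--h->refs == 0) {
        ::operator delete(h);
        return true;
    }
    return false;
}
```

Because the count sits in front, the character area could be extended at the back without the header ever moving, which is the advantage described above.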


2.4. The bug

Who was it that said "where there is sunlight there is shadow"? Many of us have blind faith in anything standard, believing it to be time-tested and incapable of error. Drop that belief: any design and any code, however good, will show a bug under certain conditions, and the STL is no exception. The shared-memory/copy-on-write technique in string has one too, and this bug can bring your whole program down!

Don't believe it? Then let's look at a test case.

Suppose there is a dynamic-link library (called MyNet.dll or mynet.so) in which a function returns a string:

string getipaddress(string hostname)
{
    static string ip;
    ......
    ......
    return ip;
}


Your main program dynamically loads this library and calls the function in it:

int main()
{
    // Load a function from the dynamic-link library
    hDll = LoadLibrary(...);
    pFun = GetModule(hDll, "getipaddress");

    // Call the function in the dynamic-link library
    string ip = (*pFun)("host1");
    ......
    ......
    // Release the dynamic-link library
    FreeLibrary(hDll);
    ......
    cout << ip << endl;
}


Let's walk through this code. The program loads the dynamic-link library at run time, calls the function in it through a function pointer, stores the return value in a string, and then releases the library. After the release, it prints the contents of ip.

From the function's definition we know it returns by value, so the copy constructor is called when the function returns. By the memory-sharing mechanism of the string class, the variable ip in the main program then shares memory with the static string variable inside the function (and this memory lies in the address space of the dynamic-link library). Assume that ip is never modified anywhere in the main program. Then, when the main program releases the dynamic-link library, the shared memory is released with it. Any later access to ip therefore touches an invalid memory address, and the program crashes. Even if you never use ip again, a memory access violation still occurs when the main program exits, because ip is destroyed at exit and its destructor accesses the freed memory.

A memory access violation means two things: 1) however beautiful your program is, this error overshadows it, and your reputation suffers with it; 2) for some time to come you will be tormented by this system-level error (in the C++ world, finding and eliminating this kind of memory error is not easy). It is a lasting pain in the heart of every C++ programmer. And if you do not know this property of string, hunting for such a memory error in thousands of lines of code is a nightmare.

Note: there are several ways to work around this bug. Here is one for reference:
string ip = (*pFun)("host1").c_str();
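
The workaround relies on the fact that a string constructed from a plain const char* has no other string to share with, so it must allocate its own storage. A minimal sketch of that idea, with `detached_copy` as a hypothetical helper name:

```cpp
#include <string>

// Builds a string that cannot share storage with `source`, because it is
// constructed from a plain const char* rather than from another string.
std::string detached_copy(const std::string& source) {
    return std::string(source.c_str());
}
```

One caveat: going through c_str() stops at the first embedded '\0'. If the data may contain null characters, constructing from `source.data()` together with `source.size()` is a possible alternative that preserves the full length.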

3. Postscript

This article should end here. It had the following purposes:

1) To introduce the copy-on-write/memory-sharing technique.
2) To introduce a design pattern, using the string class in the STL as an example.
3) To show that in the C++ world, no matter how sophisticated your design and how stable your code, it is hard to cover every situation. The smart pointer is a typical example: however you design it, it will have very serious bugs.
4) C++ is a double-edged sword. Only by understanding the principles can you use C++ well; otherwise you will get burned. If, when designing and using class libraries, you feel that "playing with C++ is like playing with fire, so you must be careful", then you have made a start; once you can control the "fire" with ease, you have mastered it.

Finally, let me use this postscript to introduce myself. I currently work on software research and development across UNIX platforms, mainly system-level product software. I am very interested in grid computing, the next computing revolution, and also in distributed computing, peer-to-peer, Web Services, and Java. I also have some experience in project implementation, team management, and project management. I hope that others of my generation fighting on the "technology and management" front will get in touch. My MSN and email address is: haoel@hotmail.com.
