std::string Memory Sharing and Copy-on-Write Technology

Source: Internet
Author: User

1. Concepts

Scott Meyers gives an example in More Effective C++; do you remember it? When you were at school, your parents wanted you to study instead of watching TV, so you shut yourself in your room and pretended to do your homework while actually doing something else, such as writing a love letter to a girl in your class. Only when your parents came to check whether you were studying did you really pick up your textbooks and read. This is the "procrastination tactic": don't do something until you have to.

Of course, this kind of thing happens all the time in real life, and in programming it has become one of the most useful techniques. Just as C++ lets you declare variables anywhere, Scott Meyers recommends declaring a variable (and allocating its memory) only when you really need it. This way the program uses as little memory as possible while it runs, and time-consuming work such as memory allocation is done only when it is actually needed, which gives the program better runtime performance. After all, 20% of a program's code accounts for 80% of its running time.

Of course, procrastination is not limited to this. The technique is used widely, especially in operating systems. When a program finishes running, the operating system is in no hurry to clear its memory, because the program may well be run again soon (loading a program from disk into memory is a very slow process). Only when memory runs short are the programs still resident in memory evicted.

Copy-on-write is a product of this "lazy behavior", the procrastination tactic, in the programming world. For example, suppose a program continually receives data from the network and writes it to a file. If every fwrite or fprintf caused a disk I/O operation, the performance loss would be enormous. So the common practice is to collect file writes in a memory buffer (a disk cache) of a fixed size and write them to disk only when the buffer fills up or the file is closed (which is why written data can be lost if the file is never closed). What's more, even closing a file does not necessarily put the data on disk: the operating system may keep it in memory until the system shuts down or memory runs short. UNIX is such a system; if it exits unexpectedly, data is lost and files can be damaged.
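The buffering idea can be sketched in a few lines. BufferedWriter below is a hypothetical class, and the "disk" is simulated with a std::string plus a counter of disk writes; real C stdio streams already buffer this way (their buffer can be configured with setvbuf). This is an illustration of deferred writing, not how any particular C library or kernel implements it.

```cpp
#include <cstddef>
#include <string>

// Hypothetical write-behind buffer. Writes go to memory first and reach the
// "disk" (simulated by a std::string) only when the buffer fills up or
// close() is called. If close() is never called, buffered data never reaches
// the disk, which is the data-loss risk described above.
class BufferedWriter {
public:
    explicit BufferedWriter(std::size_t capacity) : capacity_(capacity) {}

    void write(const std::string& data) {
        buffer_ += data;                      // cheap: memory only
        if (buffer_.size() >= capacity_)
            flush();                          // buffer full: one "disk" write
    }

    void close() { flush(); }                 // push out whatever is left

    const std::string& disk() const { return disk_; }
    int disk_writes() const { return disk_writes_; }

private:
    void flush() {
        if (buffer_.empty()) return;
        disk_ += buffer_;                     // the expensive operation
        ++disk_writes_;
        buffer_.clear();
    }

    std::size_t capacity_;
    std::string buffer_;
    std::string disk_;
    int disk_writes_ = 0;
};
```

With a 16-byte buffer, three small writes cost only one "disk" write until close() flushes the remainder.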

Well, we take on such a big risk for the sake of performance. Fortunately, the system is rarely so busy that it forgets data still waiting to be written to disk, so this practice is still worthwhile.

2. Copy-on-Write in Standard C++ std::string

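The string class of the STL, which we use all the time, is also a class that uses copy-on-write in some implementations. C++ has long been questioned and criticized for its performance, and to improve performance some STL implementations adopted copy-on-write, most notably for string. This lazy behavior does give such programs relatively high performance. (Note that this describes older library implementations: C++11 changed the requirements on std::basic_string in ways that rule out copy-on-write implementations, and libstdc++ switched to a non-copy-on-write string with its new ABI in GCC 5.)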

Here I want to unveil the implementation of copy-on-write in string from the perspective of C++ classes and design patterns, as a reference for your own C++ class library design.

Before discussing this technique, let me briefly describe how the string class allocates memory. Normally, the string class has a private member of type char* that records the address of memory allocated from the heap. The memory is allocated during construction and released during destruction. Because the memory comes from the heap, the string class maintains it very carefully: when it hands out this memory address, it returns only a const char*, that is, read-only access. To modify the data, you must go through the member functions that string provides.
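To make this concrete, here is a sketch of such a class under a hypothetical name, MyString. It does no sharing yet: every copy allocates its own buffer, which is exactly the cost that copy-on-write tries to avoid. The set() member is a hypothetical stand-in for string's modifying member functions.

```cpp
#include <cstddef>
#include <cstring>

// Minimal sketch of a string class that owns a heap buffer (no sharing yet).
class MyString {
public:
    MyString(const char* s = "") {
        len_ = std::strlen(s);
        ptr_ = new char[len_ + 1];           // allocated in the constructor
        std::memcpy(ptr_, s, len_ + 1);      // copy including the '\0'
    }
    MyString(const MyString& other) : MyString(other.ptr_) {}  // deep copy
    MyString& operator=(const MyString& other) {
        if (this != &other) {
            char* copy = new char[other.len_ + 1];
            std::memcpy(copy, other.ptr_, other.len_ + 1);
            delete[] ptr_;
            ptr_ = copy;
            len_ = other.len_;
        }
        return *this;
    }
    ~MyString() { delete[] ptr_; }           // released in the destructor

    // Callers only get read-only access to the buffer...
    const char* c_str() const { return ptr_; }
    std::size_t size() const { return len_; }
    // ...and must go through member functions to modify it.
    void set(std::size_t i, char c) { if (i < len_) ptr_[i] = c; }

private:
    char* ptr_;
    std::size_t len_;
};
```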

2.1. Features

Moving from the surface inward and from intuition to reasoning, let's first look at the observable copy-on-write behavior of the string class. Consider the following program:
#include <stdio.h>
#include <string>
using namespace std;

int main()
{
    string str1 = "hello world";
    string str2 = str1;          // str2 is constructed from str1

    printf("Sharing the memory:\n");
    printf("\tstr1's address: %p\n", (const void*)str1.c_str());
    printf("\tstr2's address: %p\n", (const void*)str2.c_str());

    str1[1] = 'q';               // modify both strings
    str2[1] = 'w';

    printf("After copy-on-write:\n");
    printf("\tstr1's address: %p\n", (const void*)str1.c_str());
    printf("\tstr2's address: %p\n", (const void*)str2.c_str());

    return 0;
}

The program constructs the second string from the first, prints the addresses of their data, then modifies str1 and str2 and prints the addresses again. The output is as follows (I got the same result with VC6.0 and g++ 2.95):
> g++ -o stringtest stringtest.cpp
> ./stringtest
Sharing the memory:
    str1's address: 343be9
    str2's address: 343be9
After copy-on-write:
    str1's address: 3407a9
    str2's address: 343be9

The results show that after the first two statements, str1 and str2 store their data at the same address; after the contents are modified, str1's address changes while str2's stays the same. This example shows the copy-on-write behavior of the string class.

2.2. In-depth

Before going further, we should know that to copy data only on write, the string class must solve two problems: one is memory sharing, the other is copy-on-write. These two topics raise a number of questions, so let's take them one at a time:

1. What is the principle of copy-on-write?

2. Under what circumstances does the string class share the memory?

3. Under what circumstances is copy-on-write triggered?

4. What happens during copy-on-write?

5. What is the specific implementation of copy-on-write?

Well, you could simply read the string source code in the STL to find the answers; I also consulted the source of basic_string, the class template of which string is an instantiation. But if the STL source looks like machine code to you, deals a heavy blow to your confidence in C++, and even makes you wonder whether you understand C++ at all, then read on.

Okay, let's discuss the questions one by one; all the technical details will gradually emerge.

2.3. What is the principle of copy-on-write?

Programmers with some experience will know that copy-on-write requires a reference count. Yes, there must be a variable like refcnt. When the first string is constructed, the string constructor allocates memory from the heap based on its arguments. Whenever another string needs this memory, the count is incremented, and whenever one of those strings is destroyed, the count is decremented. When the last string is destroyed (at which point refcnt is 1 or 0, depending on whether the count starts at 1 or 0), the program actually frees the memory allocated from the heap.
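A minimal sketch of the reference-count lifecycle just described, assuming a hypothetical SharedBuffer struct and helper functions create_buffer, acquire, and release. Here the count starts at 1 for the string that allocated the buffer, so the memory is freed when it drops to 0 (the code later in this article counts only the other sharers instead, so 0 means "not shared"; both conventions work).

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical reference-counted character buffer.
struct SharedBuffer {
    std::size_t refcnt;   // number of strings using this buffer
    char* data;
};

// The first string allocates the buffer: count = 1.
SharedBuffer* create_buffer(const char* s) {
    std::size_t n = std::strlen(s);
    SharedBuffer* b = new SharedBuffer{1, new char[n + 1]};
    std::memcpy(b->data, s, n + 1);
    return b;
}

// Another string starts using the buffer: count + 1.
SharedBuffer* acquire(SharedBuffer* b) {
    ++b->refcnt;
    return b;
}

// A string using the buffer is destroyed: count - 1.
// Returns true if this was the last user and the memory was freed.
bool release(SharedBuffer* b) {
    if (--b->refcnt == 0) {
        delete[] b->data;
        delete b;
        return true;
    }
    return false;
}
```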

Yes, reference counting is the principle behind copy-on-write in the string class!

However, another problem arises: where should refcnt live? If it is stored in each string object, every instance has its own copy and there is no single shared count. If it is declared as a global variable or a static member, all string objects share a single count, which does not work either. We need a solution that is both "democratic and centralized". How is that done? Heh, life is a process of exploring the unknown. Don't worry; I will reveal it later.

2.3.1. Under what circumstances does the string class share memory?

The answer should be obvious. By common sense and logic, if one string uses the data of another, it can share the memory of the string it uses. That is quite reasonable: if you do not use my data, there is no need to share; only when you use mine can we share.

There are only two cases in which one string uses another string's data: 1) it is constructed from another string, and 2) it is assigned from another string. The first case invokes the copy constructor; the second invokes the assignment operator. In both cases we can implement the corresponding method in the class. In the first case, we only need to add a little processing to the string class's copy constructor to increment the reference count. Similarly, in the second case, we only need to overload the string class's assignment operator and add a bit of processing to it.
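Here is a sketch of those two hooks in a hypothetical SharedString class, where the count and the character pointer live in a separately allocated Rep object (an assumption for illustration). The copy constructor only shares and increments; the assignment operator must also release the buffer the left-hand string was using, and it increments the new count first so that self-assignment does not free the buffer.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical sketch: copy construction and assignment share the buffer.
class SharedString {
public:
    SharedString(const char* s) : rep_(new Rep) {
        std::size_t n = std::strlen(s);
        rep_->refcnt = 1;
        rep_->chars = new char[n + 1];
        std::memcpy(rep_->chars, s, n + 1);
    }

    // Case 1: constructed from another string -> share and count.
    SharedString(const SharedString& other) : rep_(other.rep_) {
        ++rep_->refcnt;
    }

    // Case 2: assigned from another string -> share the new buffer and
    // leave the old one.
    SharedString& operator=(const SharedString& other) {
        ++other.rep_->refcnt;   // increment first: safe for self-assignment
        release();
        rep_ = other.rep_;
        return *this;
    }

    ~SharedString() { release(); }

    const char* c_str() const { return rep_->chars; }
    std::size_t use_count() const { return rep_->refcnt; }

private:
    struct Rep {
        std::size_t refcnt;
        char* chars;
    };

    // Drop one reference; free the buffer when nobody uses it any more.
    void release() {
        if (--rep_->refcnt == 0) {
            delete[] rep_->chars;
            delete rep_;
        }
    }

    Rep* rep_;
};
```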

Side notes:

1) The difference between construction and assignment

Consider the two statements in the earlier example:

string str1 = "hello world";
string str2 = str1;

Do not assume that "=" here is an assignment. In fact, these two statements are equivalent to:

string str1("hello world");   // calls the constructor
string str2(str1);            // calls the copy constructor

If instead str2 is written like this:

string str2;    // default constructor: an empty string, same as string str2("");
str2 = str1;    // calls str2's assignment operator: str2.operator=(str1);

2) Another situation

char tmp[] = "hello world";
string str1 = tmp;
string str2 = tmp;

In this case, is memory shared? One might hope so. However, by the rules for sharing described above, the declarations and initializations of these two strings match neither of the two cases, so no memory sharing occurs. Moreover, the existing features of C++ give the class no way to share memory in this case.
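To see the difference between construction and assignment for yourself, here is a small sketch with a hypothetical Traced class that counts which special member function runs. It assumes the compiler elides the temporary in Traced str1 = "hello world"; (guaranteed since C++17, and the default behavior of mainstream compilers before that).

```cpp
#include <string>

// Hypothetical class that records which special member function ran, to show
// that "T b = a;" is copy construction while "b = a;" is assignment.
struct Traced {
    static int ctor_calls, copy_ctor_calls, assign_calls;
    std::string value;

    Traced(const char* s = "") : value(s) { ++ctor_calls; }
    Traced(const Traced& other) : value(other.value) { ++copy_ctor_calls; }
    Traced& operator=(const Traced& other) {
        value = other.value;
        ++assign_calls;
        return *this;
    }
};

int Traced::ctor_calls = 0;
int Traced::copy_ctor_calls = 0;
int Traced::assign_calls = 0;
```

Usage: Traced str1 = "hello world"; runs the converting constructor, Traced str2 = str1; runs the copy constructor, and only a later str3 = str1; on an already constructed object runs operator=.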

2.3.2. Under what circumstances is copy-on-write triggered?

So when does a string copy its data on write? Obviously, copy-on-write happens only when the contents of one of the strings sharing the same memory change: for example, the string class's non-const operator[] and +=, and member functions such as insert, replace, and append. Assignment (=) and the destructor must also adjust the reference count of the buffer a string is leaving, although they do not copy it.

Copy-on-write is triggered only when data is modified. That is the true meaning of the procrastination tactic.

2.3.3. What happens during copy-on-write?

We can decide whether to copy the data based on the reference count; see the following code:

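// refcnt: how many OTHER strings share _ptr (0 means not shared)
if (refcnt > 0) {
    char* tmp = (char*)malloc(strlen(_ptr) + 1);
    strcpy(tmp, _ptr);
    _ptr = tmp;    // this string now has a private copy
                   // (a complete version would also decrement the old buffer's count)
}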

The code above is a hypothetical copy routine: if other strings also reference this memory (which the reference count tells us), the string being modified must first make its own copy.

We can wrap this copy operation in a function and call it from every member function that changes the contents.
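A sketch of that idea with a hypothetical CowString class: every modifying member first calls one detach() helper, and read-only members never copy. Instead of a hand-rolled count, this sketch swaps in std::shared_ptr, whose use_count() plays the role of the reference count; use_count() is only exact without concurrent copies, so treat this as single-threaded illustration code.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical copy-on-write wrapper. The implicitly generated copy
// constructor and assignment copy the shared_ptr, so copies share one
// buffer and the count is maintained for us.
class CowString {
public:
    CowString(const char* s) : data_(std::make_shared<std::string>(s)) {}

    // Read-only access: never copies.
    const char* c_str() const { return data_->c_str(); }
    char at(std::size_t i) const { return (*data_)[i]; }

    // Modifying operations: copy first if the buffer is shared.
    void set(std::size_t i, char c) { detach(); (*data_)[i] = c; }
    void append(const char* s) { detach(); data_->append(s); }
    void insert(std::size_t pos, const char* s) { detach(); data_->insert(pos, s); }

    long use_count() const { return data_.use_count(); }

private:
    // The single "copy before writing" helper.
    void detach() {
        if (data_.use_count() > 1)                          // someone else shares it
            data_ = std::make_shared<std::string>(*data_);  // make a private copy
    }

    std::shared_ptr<std::string> data_;
};
```

Once a string has its own buffer, further writes go straight to it without another copy.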

2.3.4. What is the specific implementation of copy-on-write?

Finally, let's solve the "democratic centralism" problem. Look at the following code:

string h1 = "hello";
string h2 = h1;
string h3;
h3 = h2;

string w1 = "world";
string w2("");
w2 = w1;

Obviously, we want h1, h2, and h3 to share one block of memory, and w1 and w2 to share another. So h1, h2, and h3 need one reference count, and w1 and w2 need another.

How can we cleverly create these two reference counts? The memory of a string is allocated dynamically on the heap, and strings that share memory all point to the same block. So why not allocate a little extra space in that block to store the reference count? Then all strings sharing a block see the same count, and because the count lives inside the shared block, every one of them can reach it and know how many strings are referencing the memory.

[Figure not available: the shared heap block, holding the string data together with its reference count]

So with this mechanism, every time we allocate memory for a string, we also allocate extra space to store the reference count, and whenever a copy construction or an assignment takes place, the count is incremented. When the content is modified, the string checks whether the reference count is 0. If it is not, someone else is sharing the memory, so the string decrements the reference count and then makes its own copy of the data. The following code snippets illustrate these actions (here the count records how many other strings share the buffer, so 0 means "not shared"):

// Simplified String class; assumed members: char* _ptr; size_t _len;
// The reference count is ONE byte stored right after the terminating '\0',
// so this sketch only supports a small number of sharers.

// Constructor (allocates new memory)
String::String(const char* tmp)
{
    _len = strlen(tmp);
    _ptr = new char[_len + 1 + 1];   // data + '\0' + 1 byte of reference count
    strcpy(_ptr, tmp);
    _ptr[_len + 1] = 0;              // set the reference count: not shared
}

// Copy constructor (shares memory)
String::String(const String& str)
{
    _ptr = str._ptr;                 // share the same buffer
    _len = str._len;
    _ptr[_len + 1]++;                // reference count plus one
}

// Copy only when writing
char& String::operator[](unsigned int idx)
{
    if (_ptr == 0 || idx >= _len) {
        static char nullchar = 0;
        return nullchar;
    }
    if (_ptr[_len + 1] > 0) {        // the buffer is shared: copy it first
        _ptr[_len + 1]--;            // reference count minus one
        char* tmp = new char[_len + 1 + 1];
        memcpy(tmp, _ptr, _len + 1); // copy the data and the '\0'
        _ptr = tmp;
        _ptr[_len + 1] = 0;          // the new buffer is not shared
    }
    return _ptr[idx];
}

// Destructor
String::~String()
{
    if (_ptr[_len + 1] == 0)         // nobody else uses the buffer: free it
        delete[] _ptr;
    else
        _ptr[_len + 1]--;            // reference count minus one
}

Haha, the technical details have now fully surfaced.

However, this differs slightly from the implementation of basic_string in the STL. If you open the STL source code, you will find that the reference count is accessed as _ptr[-1]: the standard library allocates the reference count in front of the data (the code I gave puts it after the data, which is not a good choice). The advantage is that when the string grows, only the memory after it needs to be extended, and the reference count does not have to move, which saves a little time.

The memory layout of string in the STL is like the figure I drew above: _ptr points to the data area, and refcnt sits just before it, at _ptr - 1, that is, _ptr[-1].
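The "count in front" layout can be sketched as follows. FrontCountString and its Rep header are hypothetical names, not the real basic_string internals: the header (count and length) and the characters share one allocation, the object keeps only a pointer to the characters, and the header is recovered by stepping back sizeof(Rep) bytes, the equivalent of _ptr[-1]. The test reuses the h1/h2/h3 and w1/w2 example from earlier.

```cpp
#include <cstddef>
#include <cstring>
#include <new>

// One allocation holds [Rep header][characters]['\0'].
class FrontCountString {
public:
    FrontCountString(const char* s) {
        std::size_t len = std::strlen(s);
        ptr_ = allocate(len);
        std::memcpy(ptr_, s, len + 1);
    }
    FrontCountString(const FrontCountString& other) : ptr_(other.ptr_) {
        ++rep()->refcnt;                       // share and count
    }
    FrontCountString& operator=(const FrontCountString& other) {
        ++other.rep()->refcnt;                 // safe for self-assignment
        release();
        ptr_ = other.ptr_;
        return *this;
    }
    ~FrontCountString() { release(); }

    // Copy-on-write: detach before handing out a writable reference.
    char& operator[](std::size_t i) {
        if (rep()->refcnt > 1) {
            std::size_t len = rep()->len;
            char* fresh = allocate(len);
            std::memcpy(fresh, ptr_, len + 1);
            --rep()->refcnt;                   // leave the shared block
            ptr_ = fresh;
        }
        return ptr_[i];
    }

    const char* c_str() const { return ptr_; }
    std::size_t use_count() const { return rep()->refcnt; }

private:
    struct Rep {
        std::size_t refcnt;
        std::size_t len;
    };

    // The header sits immediately before the characters.
    Rep* rep() const { return reinterpret_cast<Rep*>(ptr_ - sizeof(Rep)); }

    static char* allocate(std::size_t len) {
        char* block = static_cast<char*>(::operator new(sizeof(Rep) + len + 1));
        new (block) Rep{1, len};               // header at the front
        return block + sizeof(Rep);            // callers see only the data
    }

    void release() {
        if (--rep()->refcnt == 0)
            ::operator delete(ptr_ - sizeof(Rep));
    }

    char* ptr_;
};
```

One limitation of this sketch: a char& obtained from operator[] stays usable after the string is later copied, so writing through it would also change the copy; production copy-on-write strings had to guard against this.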

2.4. A Bug

Who said, "Where there is sunlight, there is shadow"? Many of us are superstitious about standard things and believe that, having been tested for so long, they cannot be wrong. Well, don't be superstitious: any design or code, however well crafted, may have bugs in specific situations. The STL is no exception, and neither is the shared-memory/copy-on-write technique of the string class; this bug can crash your entire program!

Don't believe it? Let's look at a test case:

Suppose a dynamic link library (mynet.dll or mynet.so) contains a function like this that returns a string:

string GetIPAddress(string hostname)
{
    static string ip;
    ......
    ......
    return ip;
}

Your main program dynamically loads this library and calls the function:

int main()
{
    // Load the function from the dynamic link library
    HMODULE hDll = LoadLibrary(.....);
    typedef string (*GetIPAddressFunc)(string);
    GetIPAddressFunc pfun = (GetIPAddressFunc)GetProcAddress(hDll, "GetIPAddress");

    // Call the function in the dynamic link library
    string ip = (*pfun)("host1");
    ......
    ......
    // Release the dynamic link library
    FreeLibrary(hDll);
    ......
    cout << ip << endl;
}

Let's look at this code. The program dynamically loads the library, calls its function through a function pointer, and stores the return value in a string; then it releases the library. After the release, it prints the contents of ip.

From the function definition, we know that the function returns by value, so returning invokes the copy constructor, and by the string class's memory-sharing mechanism, the variable ip in the main program shares memory with the static string in the function (this memory lives in the address space of the dynamic link library). We also assume that ip is never modified anywhere in the main program. So when the main program releases the library, the shared memory is released too, and any later access to ip inevitably touches an invalid memory address and crashes the program. Even if you never use ip again, a memory access exception may still occur when the main program exits, because ip is destroyed at exit and its destructor accesses that memory.

A memory access exception means two things: 1) however beautiful your program is, this error will eclipse it, and your reputation will suffer along with it. 2) You will be tormented by this system-level error (in the C++ world, memory errors like this are not easy to find and fix). This is the eternal pain of C/C++ programmers: a dike of a thousand li can collapse because of a single ant hole. If you do not know this feature of the string class, finding such a memory exception among thousands of lines of code would be a nightmare.

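Note: There are many ways to fix the bug above. Here is one, for reference only: construct ip from the returned string's c_str(), so that ip allocates a new buffer of its own instead of sharing the one inside the library:

string ip = (*pfun)("host1").c_str();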
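Why does the c_str() fix work? In a copy-on-write string, copying a string shares its buffer, while constructing one from a const char* always allocates a fresh buffer. The sketch below shows this with a hypothetical SharingString class (std::shared_ptr holds the shared buffer and its count) and a hypothetical get_ip_address() standing in for the library function; the address value is made up for the example.

```cpp
#include <memory>
#include <string>

// Hypothetical minimal copy-on-write-style string: copying shares buf_ and
// bumps the count; constructing from a const char* allocates a new buffer.
class SharingString {
public:
    SharingString(const char* s) : buf_(std::make_shared<std::string>(s)) {}
    // Implicit copy constructor: shares buf_.

    const char* c_str() const { return buf_->c_str(); }
    long use_count() const { return buf_.use_count(); }

private:
    std::shared_ptr<const std::string> buf_;
};

// Stand-in for the library function: returns a copy of a static string.
SharingString get_ip_address() {
    static SharingString ip("192.168.0.1");
    return ip;          // copy constructor: the caller shares ip's buffer
}
```

Usage: SharingString shared = get_ip_address(); leaves shared pointing into the static string's buffer, whereas SharingString own = get_ip_address().c_str(); owns an independent buffer that outlives whatever happens to the static string.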

3. Postscript

The article ends here. Its main purposes are:

1) To introduce the memory-sharing and copy-on-write technique.

2) To present a design pattern, using the STL string class as an example.

3) To show that in the C++ world, however elaborate your design and code, it is hard to cover every situation. Smart pointers are a typical example: no matter how you design them, some very serious bugs can remain.

4) C++ is a double-edged sword: only by understanding the underlying principles can you use it well; otherwise it will turn on you. If, when designing or using a class library, you feel that "playing with C++ is like playing with fire; you must be careful", you have gotten started; when you can control the fire, you have mastered it.
