Standard C++ std::string: Memory Sharing and Copy-on-Write Technology

Source: Internet
Author: User
1. Concepts
Scott Meyers gives an example in More Effective C++. Do you remember it? When you were still at school, your parents wanted you to review your lessons instead of watching TV, so you shut yourself in your room and pretended to do homework while actually doing something else, such as writing a love letter to a girl in your class. Only when your parents came to check on you did you really pick up a textbook and start reading. This is the "procrastination tactic": nothing gets done until it has to be done.

Of course, this happens all the time in real life, but in the programming world it has become one of the most useful techniques. For example, C++ lets you declare variables anywhere, and Scott Meyers recommends declaring a variable (and so allocating its memory) only when you really need it. This keeps the program's memory consumption at run time to a minimum, and time-consuming work such as allocating memory is done only if execution actually reaches that point, which gives the program better run-time performance. After all, 20% of the code accounts for 80% of the running time.

The procrastination tactic comes in more than one form, and it is widely used, especially in operating systems. When a program finishes running, the operating system is in no hurry to evict it from memory, because the program may well be run again right away (loading a program from disk into memory is a very slow process). Only when memory runs short are these programs that still reside in memory cleared out.

Copy-on-write is a product of this "lazy behavior", the delaying tactic, in the programming world. Consider a program that writes files, continuously saving data that arrives from the network. If every fwrite or fprintf required a disk I/O operation, the performance loss would be huge. The common practice is therefore for each write to go into a block of memory of a fixed size (a disk cache), which is written to disk only when it fills up or the file is closed (this is why data can be lost if the file is not closed). Some systems go further: even closing the file does not write the data to disk, and the data is written only at shutdown or when memory runs short. UNIX is such a system, so if the system exits unexpectedly, data is lost and files are damaged.
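
The delayed write described above can be sketched in a few lines. This is only an illustration under stated assumptions: BufferedWriter and its members are hypothetical names, not part of any library, and a std::vector stands in for the slow disk so that the number of "disk" operations can be counted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical illustration: writes are collected in memory and handed to
// the "disk" only when the buffer reaches its capacity or flush() is called.
class BufferedWriter {
public:
    explicit BufferedWriter(std::size_t capacity) : capacity_(capacity) {}

    void write(const std::string& data) {
        buffer_ += data;                        // cheap: memory only
        if (buffer_.size() >= capacity_) flush();
    }

    // What closing the file would do: push out whatever is pending.
    void flush() {
        if (buffer_.empty()) return;
        disk_.push_back(buffer_);               // the one expensive operation
        buffer_.clear();
    }

    std::size_t disk_writes() const { return disk_.size(); }
    std::size_t pending() const { return buffer_.size(); }

private:
    std::size_t capacity_;
    std::string buffer_;
    std::vector<std::string> disk_;             // stand-in for real disk I/O
};

// Three small writes end up as two "disk" operations.
inline void buffered_writer_demo() {
    BufferedWriter w(8);
    w.write("abc");                             // stays in memory
    assert(w.disk_writes() == 0);
    w.write("defgh");                           // buffer reaches 8 bytes, flushed
    assert(w.disk_writes() == 1);
    w.write("x");
    w.flush();                                  // pending data is pushed out
    assert(w.disk_writes() == 2);
}
```

Anything still counted by pending() when the program dies is lost, which is the risk discussed here.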

For the sake of performance we take on quite a large risk. Fortunately, our programs are never so busy that they forget a block of data still has to be written to disk, so the practice is worthwhile.

2. Standard C++ std::string Copy-on-Write

We often use the string class from the STL, the standard template library, and it too is a class that uses copy-on-write. C++ has long been questioned and criticized over performance, and to improve performance many STL classes adopt copy-on-write. This lazy behavior really does give programs that use the STL relatively high performance.

First, a brief word on how the string class allocates memory. A string class generally has a private member of type char*, which records the address of memory allocated from the heap. The class allocates the memory on construction and releases it on destruction. Because the memory comes from the heap, the string class maintains it very carefully: when it hands out the address it returns only a const char*, that is, a read-only pointer. If you want to write to the contents, you can do so only through the methods that string provides.
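
As a baseline before any sharing is introduced, here is a minimal sketch of such a class. SimpleString and its member names are hypothetical, chosen only for this illustration. Every object owns its own heap block, so a copy always duplicates the characters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical minimal string: one private char* into the heap,
// allocated on construction and released on destruction. No sharing.
class SimpleString {
public:
    explicit SimpleString(const char* s)
        : len_(std::strlen(s)), ptr_(new char[len_ + 1]) {
        std::memcpy(ptr_, s, len_ + 1);          // copies the terminator too
    }
    SimpleString(const SimpleString& other)
        : len_(other.len_), ptr_(new char[len_ + 1]) {
        std::memcpy(ptr_, other.ptr_, len_ + 1); // deep copy, always
    }
    SimpleString& operator=(const SimpleString&) = delete; // kept short
    ~SimpleString() { delete[] ptr_; }

    // Read access hands out a read-only pointer.
    const char* c_str() const { return ptr_; }

    // Writing is possible only through a member function.
    void set_char(std::size_t i, char c) {
        if (i < len_) ptr_[i] = c;
    }

private:
    std::size_t len_;   // declared before ptr_ so it is initialized first
    char* ptr_;
};

inline void simple_string_demo() {
    SimpleString a("hello");
    SimpleString b(a);                           // second heap block
    b.set_char(0, 'j');
    assert(a.c_str() != b.c_str());
    assert(std::strcmp(a.c_str(), "hello") == 0);
    assert(std::strcmp(b.c_str(), "jello") == 0);
}
```

The cost that copy-on-write tries to avoid is the allocation and copy inside the copy constructor.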

2.1. Features

Working from the outside in, from observation to understanding, let's first look at the visible features of copy-on-write in the string class. Write the following program:

#include <stdio.h>
#include <string>
using namespace std;

int main()
{
    string str1 = "Hello World";
    string str2 = str1;

    // %p with a cast to const void* is the portable way to print an address
    printf("sharing the memory:\n");
    printf("\tstr1's address: %p\n", (const void*)str1.c_str());
    printf("\tstr2's address: %p\n", (const void*)str2.c_str());

    str1[1] = 'q';
    str2[1] = 'w';

    printf("after copy-on-write:\n");
    printf("\tstr1's address: %p\n", (const void*)str1.c_str());
    printf("\tstr2's address: %p\n", (const void*)str2.c_str());

    return 0;
}

The idea of this program is to construct the second string from the first, print the memory addresses of their data, then modify the contents of str1 and str2 in turn and look at the addresses again. The output looks like this (VC6.0 and g++ 2.95 give the same result; newer standard libraries that do not use copy-on-write print two different addresses from the start):

> g++ -o stringtest stringtest.cpp
> ./stringtest
sharing the memory:
    str1's address: 343be9
    str2's address: 343be9
after copy-on-write:
    str1's address: 3407a9
    str2's address: 343be9

The results show that after the first two statements str1 and str2 store their data at the same address, and that after the contents are modified, the address of str1 has changed while the address of str2 is still the original one. This example shows the copy-on-write technique of the string class at work.
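
Whether the two addresses start out equal depends on the standard library in use. As far as I know, the rules introduced with C++11 effectively rule out copy-on-write for std::string, so current libraries give every copy its own storage. The sketch below is a small probe with hypothetical helper names. It reports what the library at hand does, and checks the one property that holds either way: the two strings never affect each other.

```cpp
#include <cassert>
#include <string>

// Returns true if a freshly copied std::string shares its source's buffer
// (a copy-on-write library), false if every copy owns its own characters.
inline bool copy_shares_buffer() {
    std::string a = "Hello World";
    std::string b = a;
    return static_cast<const void*>(a.c_str()) ==
           static_cast<const void*>(b.c_str());
}

// Holds on every conforming library, shared buffer or not:
// writing to one string never changes the other.
inline void copies_stay_independent() {
    std::string a = "Hello World";
    std::string b = a;
    a[1] = 'q';
    b[1] = 'w';
    assert(a == "Hqllo World");
    assert(b == "Hwllo World");
    assert(a.c_str() != b.c_str());   // after the writes they must differ
}
```

On a library of the VC6.0 or g++ 2.95 era, copy_shares_buffer() would be expected to return true; on a current one it returns false.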

2.2. In Depth
The string class has two things to deal with, memory sharing and copy-on-write, and these two topics raise a number of questions:
1. What is the principle of copy-on-write?
2. Under what circumstances does the string class share the memory?
3. Under what circumstances does the string class trigger copy-on-write?
4. What happened during copy-on-write?
5. What is the specific implementation of copy-on-write?

2.3. What is the principle of copy-on-write?

Programmers with some experience will know that copy-on-write must rely on a "reference count". Yes, there has to be a variable along the lines of refcnt. When the first object is constructed, the string constructor allocates memory from the heap according to the argument passed in. Each time another object needs this memory, the count is incremented; each time an object is destructed, the count is decremented by one, until the last object is destructed. At that point the refcnt is 1 or 0 (depending on the convention used), and only then does the program really free the memory allocated from the heap.

Reference counting is the principle behind copy-on-write in the string class!
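
The rise and fall of the count can be watched with std::shared_ptr, the standard library's reference-counted pointer. It is used here only as a stand-in for a hand-written refcnt, and is not how any string class is claimed to be implemented. The function name is hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>

inline void reference_count_demo() {
    // First owner: the payload is allocated and the count is 1.
    std::shared_ptr<std::string> first = std::make_shared<std::string>("hello");
    assert(first.use_count() == 1);
    {
        // Second owner: nothing is copied, the count just goes up.
        std::shared_ptr<std::string> second = first;
        assert(first.use_count() == 2);
        assert(first.get() == second.get());   // same memory
    }
    // The second owner was destroyed: the count goes back down.
    assert(first.use_count() == 1);
    // When 'first' is destroyed as well, the payload is finally freed.
}
```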

But this raises another question: where should the refcnt live? If it is stored in the string class as an ordinary member, every string instance has its own copy, and there is no single refcnt for the group. If it is declared as a global variable or a static member, all string objects share one, which does not work either. We need a solution that is both "democratic and centralized".

2.3.1. Under what circumstances does the string class share the memory?

The answer should be obvious. By common sense and logic, if one object uses another object's data, it can share the memory of the object being used. That is quite reasonable: if you don't use mine, there is nothing to share; only when you use mine can we share it.

There are only two ways to use another object's data: 1) constructing yourself from another object, and 2) being assigned from another object. The first case invokes the copy constructor and the second invokes the assignment operator, and we can implement both in the class. For the first case, the copy constructor of the string class only needs to do a little extra work and increment the reference count. Likewise, for the second case, we only need to overload the assignment operator of the string class and add the same bit of processing.

Note:
1) The difference between construction and assignment
Take the two statements from the earlier program:
    string str1 = "Hello World";
    string str2 = str1;
Do not assume that "=" here is an assignment. These two statements are in fact equivalent to:
    string str1("Hello World");   // calls the constructor
    string str2(str1);            // calls the copy constructor

If str2 is written like this instead:
    string str2;      // calls the constructor whose parameter defaults to the empty string: string str2("");
    str2 = str1;      // calls str2's assignment operator: str2.operator=(str1);

2) Another situation
    char tmp[] = "hello world";
    string str1 = tmp;
    string str2 = tmp;
Will memory sharing be triggered in this case? One would think it should be. However, going by the conditions for sharing described above, the declarations and initializations of these two string objects match neither of the two cases, so no memory sharing takes place. Moreover, the existing features of C++ give us no way to share memory between the objects in this situation.
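
The distinction in note 1) is easy to verify with a small counting class. Probe is a hypothetical type written only for this check; it counts how often its copy constructor and its assignment operator run.

```cpp
#include <cassert>

// Hypothetical probe type: counts copy constructions and assignments.
struct Probe {
    static int copy_constructions;
    static int assignments;

    Probe() {}
    Probe(const Probe&) { ++copy_constructions; }
    Probe& operator=(const Probe&) { ++assignments; return *this; }
};
int Probe::copy_constructions = 0;
int Probe::assignments = 0;

inline void construction_vs_assignment_demo() {
    Probe::copy_constructions = 0;
    Probe::assignments = 0;

    Probe a;
    Probe b = a;     // looks like assignment, but is copy construction
    assert(Probe::copy_constructions == 1);
    assert(Probe::assignments == 0);

    Probe c;
    c = a;           // a real assignment: c.operator=(a)
    assert(Probe::copy_constructions == 1);
    assert(Probe::assignments == 1);
}
```

These two member functions are the only places where a string class has the chance to start sharing, which is why constructing two strings from the same char array, as in note 2), cannot share anything.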

2.3.2. Under what circumstances does the string class trigger copy-on-write?

Obviously, copy-on-write happens only when an object that shares memory with others has its contents changed. Examples are the string operators [], =, +=, and +, member functions such as insert, replace, and append, and also the class destructor.

Copy-on-write is triggered only when the data is modified; without a modification, nothing is copied. That is the true meaning of the delaying tactic.

2.3.3. What happened during copy-on-write?

We can decide whether a copy is needed from the reference count. Consider the following code:

if (refcnt > 0) {
    char* tmp = (char*) malloc(strlen(_Ptr) + 1);
    strcpy(tmp, _Ptr);
    _Ptr = tmp;
}

The code above is a hypothetical copy routine: if other objects are still referencing the memory (which we learn by checking the reference count), the object being modified must first make its own copy.

We can wrap this copy operation in a function for the member functions that change the contents to call.
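
One way to arrange such a helper is sketched below. CowText, detach(), and the other member names are hypothetical, and std::shared_ptr is swapped in to supply the reference count, so the test is use_count() > 1 rather than refcnt > 0. Every member function that modifies the contents calls detach() first; member functions that only read never do.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical copy-on-write wrapper around a shared payload.
class CowText {
public:
    explicit CowText(const std::string& s)
        : data_(std::make_shared<std::string>(s)) {}
    // The compiler-generated copy constructor and assignment operator
    // copy only the shared_ptr, so copies share the payload.

    // Reads never copy.
    const std::string& str() const { return *data_; }
    const void* address() const { return data_.get(); }

    // Writes detach first, then modify their private copy.
    void set_char(std::size_t i, char c) {
        detach();
        data_->at(i) = c;          // at() throws if i is out of range
    }
    void append(const std::string& tail) {
        detach();
        *data_ += tail;
    }

private:
    // The encapsulated copy step: duplicate the payload only if
    // another object is still sharing it.
    void detach() {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::string>(*data_);
        }
    }

    std::shared_ptr<std::string> data_;
};

inline void cow_text_demo() {
    CowText a("hello");
    CowText b = a;                         // shares the payload
    assert(a.address() == b.address());
    b.set_char(0, 'j');                    // b detaches, a is untouched
    assert(a.address() != b.address());
    assert(a.str() == "hello");
    assert(b.str() == "jello");
}
```

Keeping the check in one private function means a newly added mutating member cannot get the copy logic subtly wrong; it can only forget to call detach().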

2.3.4. What is the specific implementation of copy-on-write?

Finally, we come to the main "democratic centralism" problem. Take a look at the following code:

string h1 = "hello";
string h2 = h1;
string h3;
h3 = h2;

string w1 = "world";
string w2("");
w2 = w1;

Clearly, we want h1, h2, and h3 to share one block of memory and w1 and w2 to share another. We need one reference count maintained among h1, h2, and h3, and a separate one among w1 and w2.

How can these two reference counts be produced in a clever way? Recall that the string class's memory is dynamically allocated on the heap. Since the objects that share memory all point to the same block, why not allocate a little extra space in that block to hold the reference count? Then every object sharing the block has the same reference count, and because the count lives inside the shared block, every object that shares the block can reach it and so knows how many objects reference the block.

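[Figure: h1, h2, and h3 point to one heap block that holds a reference count together with "hello"; w1 and w2 point to a second block that holds its own reference count together with "world".]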

With this mechanism, every time we allocate memory for a string we allocate a little extra space to hold the reference count. Whenever a copy construction or an assignment takes place, the count stored in that memory is incremented. When the contents are about to be modified, the string class checks whether the reference count is 0. If it is not 0, someone else is sharing the memory, so the object decrements the reference count and copies the data into a block of its own before writing. The following program fragments illustrate these actions:

// Constructor (allocates the memory)
string::string(const char* tmp)
{
    _Len = strlen(tmp);
    _Ptr = new char[_Len + 1 + 1];
    strcpy(_Ptr, tmp);
    _Ptr[_Len + 1] = 0;    // set the reference count: 0 means nobody else shares the block
}

// Copy constructor (shares the memory)
string::string(const string& str)
{
    _Ptr = str._Ptr;       // share the memory
    _Len = str._Len;
    _Ptr[_Len + 1]++;      // reference count plus one
}

// Copy only when writing (copy-on-write)
char& string::operator[](unsigned int idx)
{
    if (idx > _Len || _Ptr == 0) {
        static char nullchar = 0;
        return nullchar;
    }

    if (_Ptr[_Len + 1] > 0) {    // someone else is sharing the block
        _Ptr[_Len + 1]--;        // reference count minus one
        char* tmp = new char[_Len + 1 + 1];
        strncpy(tmp, _Ptr, _Len + 1);
        _Ptr = tmp;
        _Ptr[_Len + 1] = 0;      // set the reference count for the new block
    }

    return _Ptr[idx];
}

// Destructor
string::~string()
{
    // When the reference count is 0 this is the last owner, so the memory is released
    if (_Ptr[_Len + 1] == 0) {
        delete[] _Ptr;
    } else {
        _Ptr[_Len + 1]--;        // otherwise just give up our share
    }
}

With that, all the technical details have come to the surface.

However, this differs a little from the implementation of basic_string in the STL. If you open the STL source code, you will find that the reference count is fetched as _Ptr[-1]: the standard library allocates the memory for the reference count in front of the data (the code I gave puts the count after the data, which is not good). The advantage of putting it in front is that when the string grows, only the memory after it needs to be extended and the reference count does not have to be moved, which saves a little time.

The memory layout of string in the STL is like the figure I drew above: _Ptr points to the data area, and the refcnt sits at _Ptr-1, that is, _Ptr[-1].
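
The "count in front of the data" layout can be sketched as follows. This is a simplified illustration, not the actual basic_string source: Header, make_shared_chars, header_of, acquire, and release are hypothetical names, and the count here starts at 1 for the first owner (unlike the fragments above, where 0 means a single owner).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical block layout: [Header][characters ...][NUL].
// Owners keep only a pointer to the characters.
struct Header {
    int refcnt;
    std::size_t len;
};

// Allocate one block holding the header followed by the characters.
inline char* make_shared_chars(const char* s) {
    std::size_t len = std::strlen(s);
    void* raw = ::operator new(sizeof(Header) + len + 1);
    Header* h = new (raw) Header{1, len};        // first owner
    char* data = reinterpret_cast<char*>(h + 1); // characters start after it
    std::memcpy(data, s, len + 1);
    return data;
}

// Step back from the characters to the header in front of them.
inline Header* header_of(char* data) {
    return reinterpret_cast<Header*>(data) - 1;
}

// A new owner starts sharing the block.
inline char* acquire(char* data) {
    ++header_of(data)->refcnt;
    return data;
}

// An owner lets go; returns true if the block was freed.
inline bool release(char* data) {
    Header* h = header_of(data);
    if (--h->refcnt > 0) return false;
    h->~Header();
    ::operator delete(h);
    return true;
}

inline void header_layout_demo() {
    char* a = make_shared_chars("hello");
    char* b = acquire(a);                        // two owners, one block
    assert(header_of(b)->refcnt == 2);
    bool freed = release(a);
    assert(!freed);                              // b still needs the block
    assert(std::strcmp(b, "hello") == 0);
    freed = release(b);
    assert(freed);                               // last owner frees it
    (void)freed;
}
```

Because the header is at a fixed offset in front of the characters, its position does not depend on the string's length, which is the advantage mentioned here.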

2.4. Bug

Who was it that said, "Where there is sunshine, there is shadow"? Many of us may have blind faith in anything standard, believing that it has stood the test of time and cannot go wrong. Do not be so trusting: any code, however well designed and well written, can show a bug under some particular conditions, and the STL is no different. The memory sharing and copy-on-write technique of the string class is no exception, and this bug can crash your entire program!

Suppose a dynamic link library (mynet.dll or mynet.so) contains a function like this one, which returns a string:

string getipaddress(string hostname)
{
    static string ip;
    ......
    ......
    return ip;
}

And your main program loads this library dynamically and calls the function:

int main()
{
    // Load the function from the dynamic link library
    hdll = LoadLibrary(.....);
    pfun = GetProcAddress(hdll, "getipaddress");

    // Call the function in the dynamic link library
    string ip = (*pfun)("host1");
    ......
    ......
    // Release the dynamic link library
    FreeLibrary(hdll);
    ......
    cout
