C++ string implementation principle

Source: Internet
Author: User


C++ programmers use the string (and wstring) class all the time, but have you ever thought about its internal implementation? For example: how is its iterator implemented? How many bytes of memory does an object occupy? Does it have any virtual functions? How is its memory allocated? What do construction and destruction cost? Below is a brief walk-through based on the source code I have read over the past two days and on my own understanding of it. The code shown is from the standard library shipped with Microsoft Visual C++. Corrections from readers are welcome.

First, let's take a look at the definitions of the string and wstring classes:

typedef basic_string<char, char_traits<char>, allocator<char> > string;
typedef basic_string<wchar_t, char_traits<wchar_t>, allocator<wchar_t> > wstring;

From these definitions we can see that string and wstring are specializations of the class template basic_string for char and wchar_t, respectively.
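As a small check of that statement, the sketch below asserts at compile time that the two typedefs name specializations of one template, and then uses a single function template for both. The helper name count_char is hypothetical and only serves as an illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <string>
#include <type_traits>

// Both typedefs name specializations of the same class template.
static_assert(std::is_same<std::string,
                           std::basic_string<char, std::char_traits<char>,
                                             std::allocator<char> > >::value,
              "string is basic_string<char>");
static_assert(std::is_same<std::wstring,
                           std::basic_string<wchar_t, std::char_traits<wchar_t>,
                                             std::allocator<wchar_t> > >::value,
              "wstring is basic_string<wchar_t>");

// Hypothetical helper: because both types come from one template,
// a single function template can serve string and wstring alike.
template <typename Elem>
std::size_t count_char(const std::basic_string<Elem>& s, Elem c) {
    return static_cast<std::size_t>(std::count(s.begin(), s.end(), c));
}
```

Calling count_char(std::string("banana"), 'a') and count_char(std::wstring(L"banana"), L'n') instantiates the same template twice, once per character type.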

Let's take a look at the inheritance hierarchy of the basic_string class (member functions are not listed):


The top-level class is _Container_base, which is also the base class of the other STL containers. It holds a single _Iterator_base* member, the head of the list of iterators attached to the container, which is what lets the library keep track of those iterators.

This class actually defines only two functions:

void _Orphan_all() const;   // orphan all iterators
void _Swap_all(_Container_base_secure&) const;   // swaps all iterators

The _String_base class has no data members and defines only three functions, all for exception handling:

static void _Xlen();   // report a length_error
static void _Xran();   // report an out_of_range error
static void _Xinvarg();

_String_val contains an allocator object. This class is also very simple: apart from its constructors, it defines no other functions.

These three base classes are very simple, while the implementation of basic_string itself is quite complicated. However, like most of the standard library, it splits the complex functionality into several parts, which keeps the coupling between modules low.

Iterator operations are implemented by the _String_iterator class, per-character operations by the char_traits class, and memory allocation by the allocator class.

The inheritance hierarchy of the _String_iterator class is as follows:


This class implements the usual iterator operations, such as:

reference operator*() const;
pointer operator->() const;
_String_iterator& operator++();
_String_iterator operator++(int);
_String_iterator& operator--();
_String_iterator operator--(int);
_String_iterator& operator+=(difference_type _Off);
_String_iterator operator+(difference_type _Off) const;
_String_iterator& operator-=(difference_type _Off);
_String_iterator operator-(difference_type _Off) const;
difference_type operator-(const _Mybase& _Right) const;
reference operator[](difference_type _Off) const;

With iterators available, the functions of the algorithm library are easy to use. For example, to convert all characters to lowercase:

string s("Hello String");
// ::tolower is the <cctype> function; an unqualified tolower can be ambiguous with the overload in <locale>
transform(s.begin(), s.end(), s.begin(), ::tolower);
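A self-contained version of the same idea might look like the sketch below. The function name to_lower_copy is hypothetical; the lambda goes through unsigned char because passing a negative char value to std::tolower is undefined behavior.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns a lowercase copy of s.
// The string iterators are passed straight to std::transform.
std::string to_lower_copy(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    return s;
}
```

The lambda also sidesteps the overload ambiguity, since no function name has to be resolved as a template argument.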

The char_traits class diagram is as follows:


This class defines character assignment, copy, comparison, and other operations. If you have special requirements, you can supply your own traits class instead.
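One well-known use of a custom traits class is a case-insensitive string. The sketch below is only an illustration: the names ci_char_traits and ci_string are hypothetical, only comparison is overridden, and everything else is inherited from std::char_traits<char>.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical traits class: compares characters without regard to case.
struct ci_char_traits : public std::char_traits<char> {
    static char fold(char c) {
        return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    static bool eq(char a, char b) { return fold(a) == fold(b); }
    static bool lt(char a, char b) { return fold(a) < fold(b); }
    static int compare(const char* a, const char* b, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(a[i], b[i])) return -1;
            if (lt(b[i], a[i])) return 1;
        }
        return 0;
    }
};

// A string type whose comparisons go through the traits class above.
typedef std::basic_string<char, ci_char_traits> ci_string;
```

With this typedef, ci_string("Hello") == ci_string("hELLO") holds, because basic_string delegates its comparisons to the traits class.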

The allocator class diagram is as follows:


The default allocator uses new and delete to allocate and release memory. You can also define your own allocator; MSDN describes which members it must provide.
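As a rough illustration of a user-defined allocator, the sketch below counts how often a string asks for memory. It assumes a C++11 or later library, where a minimal allocator only needs value_type, allocate, deallocate, a converting constructor, and equality operators. The names CountingAllocator, g_allocations, counted_string, and allocations_for are all hypothetical.

```cpp
#include <cstddef>
#include <new>
#include <string>

static std::size_t g_allocations = 0;  // hypothetical global counter

// Hypothetical minimal allocator that counts calls to allocate().
template <typename T>
struct CountingAllocator {
    typedef T value_type;
    CountingAllocator() {}
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) {}
    T* allocate(std::size_t n) {
        ++g_allocations;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) {
    return true;  // stateless: any instance can free another's memory
}
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) {
    return false;
}

typedef std::basic_string<char, std::char_traits<char>,
                          CountingAllocator<char> > counted_string;

// Number of allocations needed to build a string of the given length.
std::size_t allocations_for(std::size_t length) {
    const std::size_t before = g_allocations;
    counted_string s(length, 'x');
    return g_allocations - before;
}
```

A long string needs at least one allocation. For a very short one the count depends on whether the library keeps small strings inside the object, so the sketch makes no claim about that case.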

Let's take a look at the data members of the basic_string class:

_Mysize is the actual number of elements; its initial value is 0.

_Myres is the maximum number of elements that can currently be stored (exceeding it triggers a reallocation); its initial value is _BUF_SIZE - 1.

_BUF_SIZE is an enumerator:

enum
{   // length of internal buffer, [1, 16]
    _BUF_SIZE = 16 / sizeof (_Elem) < 1 ? 1 : 16 / sizeof (_Elem)
};

According to this definition, _BUF_SIZE is 16 for char and 8 for wchar_t (which is 2 bytes wide on Windows).
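The same arithmetic can be reproduced outside the library. In the sketch below, SmallBufferSize and Wide32 are hypothetical names; the expression itself is the one from the enum above.

```cpp
#include <cstddef>

// Hypothetical stand-in for the enum inside basic_string:
// the number of elements that fit in a 16-byte buffer, at least 1.
template <typename Elem>
struct SmallBufferSize {
    enum { value = 16 / sizeof(Elem) < 1 ? 1 : 16 / sizeof(Elem) };
};

// Hypothetical oversized element type, to exercise the "at least 1" branch.
struct Wide32 {
    char bytes[32];
};
```

SmallBufferSize<char>::value is 16, and SmallBufferSize<wchar_t>::value is 16 / sizeof(wchar_t), which gives 8 where wchar_t is 2 bytes. For the 32-byte Wide32 the division yields 0, so the result is clamped to 1.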

_Bxty is a union:

union _Bxty
{   // storage for small buffer or pointer to larger one
    _Elem _Buf[_BUF_SIZE];
    _Elem *_Ptr;
} _Bx;

Why is _Bxty defined like that? Let's look at the following code:

_Elem * _Myptr()
{   // determine current pointer to buffer for mutable string
    return (_BUF_SIZE <= _Myres ? _Bx._Ptr : _Bx._Buf);
}
This function returns the pointer to the elements stored in the basic_string (c_str calls it).

Therefore, as long as the number of elements is smaller than _BUF_SIZE, no memory is allocated: the _Buf array is used directly and _Myptr returns _Buf. Otherwise memory is allocated and _Myptr returns _Ptr.
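The mechanism just described can be imitated in a few lines. The class below is a deliberately tiny, hypothetical sketch (TinyString, char only, no copying, no growth). It is not the library's code, but it uses the same test as _Myptr: the capacity decides which member of the union is active.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical miniature of the small-buffer scheme.
class TinyString {
public:
    explicit TinyString(const char* text)
        : size_(std::strlen(text)), capacity_(kBufSize - 1) {
        if (size_ > capacity_) {          // too long for the internal buffer
            capacity_ = size_;
            storage_.ptr = new char[capacity_ + 1];
        }
        std::memcpy(data(), text, size_ + 1);  // copy including the '\0'
    }
    ~TinyString() {
        if (on_heap()) delete[] storage_.ptr;
    }

    // Same test as _Myptr: a capacity of kBufSize or more means heap storage.
    bool on_heap() const { return kBufSize <= capacity_; }
    char* data() { return on_heap() ? storage_.ptr : storage_.buf; }
    const char* c_str() const {
        return on_heap() ? storage_.ptr : storage_.buf;
    }
    std::size_t size() const { return size_; }

private:
    TinyString(const TinyString&);             // copying left out of the sketch
    TinyString& operator=(const TinyString&);

    static const std::size_t kBufSize = 16;
    union Storage {
        char buf[kBufSize];  // used while the text fits
        char* ptr;           // used once the text is longer
    } storage_;
    std::size_t size_;
    std::size_t capacity_;
};
```

A 15-character text still fits in the buffer together with its terminating null character, while a 16-character text moves to the heap.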

But what is the memory allocation policy? Does the capacity simply double on every reallocation, as some vector implementations do? The answer is no. Look at the following code:

void _Copy(size_type _Newsize, size_type _Oldlen)
{   // copy _Oldlen elements to newly allocated buffer
    size_type _Newres = _Newsize | _ALLOC_MASK;
    if (max_size() < _Newres)
        _Newres = _Newsize;   // undo roundup if too big
    else if (_Newres / 3 < _Myres / 2 && _Myres <= max_size() - _Myres / 2)
        _Newres = _Myres + _Myres / 2;   // grow exponentially if possible
    // other code
}

The value of _ALLOC_MASK is _BUF_SIZE - 1. The code looks a bit complicated, but in short: at first _Myres grows by _BUF_SIZE at a time, and once it reaches a certain size, each reallocation increases it by half.

For char and wchar_t, the capacity thresholds (exceeding any of them triggers a reallocation) are:

char: 15, 31, 47, 70, 105, 157, 235, 352, 528, 792, 1188, ...

wchar_t: 7, 15, 23, 34, 51, 76, 114, 171, 256, 384, 576, 864, 1296, 1944, ...
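These sequences can be recomputed from the capacity calculation in _Copy. The sketch below restates that calculation as a free function and then simulates appending one element at a time. The names next_capacity and capacity_steps are hypothetical, and the max_size value used is an assumption that only needs to be large enough not to interfere.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical free-function rendering of the calculation in _Copy.
std::size_t next_capacity(std::size_t new_size, std::size_t old_res,
                          std::size_t alloc_mask, std::size_t max_size) {
    std::size_t new_res = new_size | alloc_mask;  // round up to the mask
    if (max_size < new_res)
        new_res = new_size;                       // undo roundup if too big
    else if (new_res / 3 < old_res / 2 && old_res <= max_size - old_res / 2)
        new_res = old_res + old_res / 2;          // grow by half
    return new_res;
}

// Capacities reached when elements are appended one at a time,
// starting from the internal buffer (capacity buf_size - 1).
std::vector<std::size_t> capacity_steps(std::size_t buf_size,
                                        std::size_t count) {
    const std::size_t max_size = static_cast<std::size_t>(-1) / 2;  // assumed
    std::vector<std::size_t> steps;
    std::size_t res = buf_size - 1;
    steps.push_back(res);
    while (steps.size() < count) {
        // The string is full, so one more element forces a reallocation.
        res = next_capacity(res + 1, res, buf_size - 1, max_size);
        steps.push_back(res);
    }
    return steps;
}
```

capacity_steps(16, 11) reproduces the char sequence and capacity_steps(8, 14) the wchar_t sequence, where 8 is the buffer size for a 2-byte wchar_t.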

On reallocation, the old elements are first copied to the new memory block. So when a string keeps growing and its maximum size is known in advance, you can call the reserve function to preallocate the memory and improve efficiency.
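The effect of reserve can be observed by watching the capacity while appending. The helper below, count_reallocations, is a hypothetical name; it treats every change of capacity() as one reallocation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: appends n characters one at a time and counts
// how often the capacity changes along the way.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::string s;
    if (reserve_first)
        s.reserve(n);  // preallocate room for all n characters
    std::size_t changes = 0;
    std::size_t last = s.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        s.push_back('x');
        if (s.capacity() != last) {
            ++changes;
            last = s.capacity();
        }
    }
    return changes;
}
```

With reserve_first set, the count is zero, because the capacity already covers all n characters. Without it, the count depends on the growth policy of the library in use.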

How many bytes of memory does the string class occupy?

On a 32-bit build, _Container_base contains one pointer, which takes 4 bytes. The _String_val class contains an allocator object; string uses the default allocator class, which has no data members but still occupies 4 bytes because of alignment. The data members of basic_string itself take 24 bytes (16 for _Bx, 4 for _Mysize, and 4 for _Myres), so the total is 32 bytes. A wstring is also 32 bytes, for the reason analyzed above: its internal buffer holds 8 two-byte elements, which is again 16 bytes.
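The addition can be checked with a mock layout. The struct below is hypothetical: it is not the library's class, and it uses 32-bit integers in place of pointers and sizes so that the arithmetic of a 32-bit build can be reproduced on any platform. The real sizeof(std::string) differs between libraries and between 32-bit and 64-bit builds.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical mock of the 32-bit layout described above.
struct MockLayout32 {
    std::uint32_t first_iter;      // stands in for the _Iterator_base* member
    std::uint32_t allocator_slot;  // empty allocator, padded to 4 bytes
    union {
        char buf[16];              // _Buf: 16 one-byte elements
        std::uint32_t ptr;         // stands in for _Ptr
    } bx;
    std::uint32_t mysize;          // _Mysize
    std::uint32_t myres;           // _Myres
};

static_assert(sizeof(MockLayout32) == 32, "4 + 4 + 16 + 4 + 4 = 32");

// std::string has no virtual functions, so no vtable pointer is stored.
static_assert(!std::is_polymorphic<std::string>::value,
              "std::string is not polymorphic");
```

Both assertions are checked at compile time, so the sketch needs no output to make its point.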


To sum up, string and wstring use _String_iterator to implement iterator operations, each occupies 32 bytes of memory, and neither has virtual functions. Construction and destruction are cheap, and memory allocation is flexible.

The string class also has a number of inconveniences in everyday use, so I have written an extension class. Comments are welcome.

Extension link: http://blog.csdn.net/passion_wu128/article/details/38354541


What is the internal structure of the string type in C++?

The previous answer is incorrect.
In C++, string is a class, and a here is an object of that class.
When you apply sizeof to an object, the result depends on the following:
1. The data members of the string class, which differ from one SDK to another.
2. Whether the class has virtual functions. If it does, each object stores a pointer to the virtual table, which takes up additional memory.
3. Memory alignment within the class. Declaring int a, b; double c; can give a different size from declaring int a; double b; int c;.
Taking these three points into account gives you the result of sizeof.
What I want to convey is the principle. The libraries shipped with different compilers differ, so the members of string may differ too,
but the principle behind sizeof is as described above.
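The alignment point in that list can be seen with two small structs. The names Grouped and Interleaved are hypothetical. On typical 64-bit platforms, where double is aligned to 8 bytes, the first struct takes 16 bytes and the second 24. Other platforms may pad differently, so the sketch relies only on the ordering.

```cpp
#include <cstddef>

// Same three members, declared in two different orders.
struct Grouped {
    int a;
    int b;     // a and b can share one 8-byte slot
    double c;
};

struct Interleaved {
    int a;     // may be followed by padding so that b is aligned
    double b;
    int c;     // may be followed by tail padding
};

// Hypothetical helper: bytes lost to padding in the interleaved order.
std::size_t extra_padding() {
    return sizeof(Interleaved) - sizeof(Grouped);
}
```

Grouping members of the same size together never makes this struct larger, which is why the declaration order matters for sizeof.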
Reference: ogin_u

In Java, how many objects does String a = "a" + "b" + "c" + "d" create? (The principle is explained below.)

One.
Because "a", "b", "c", and "d" are constants, the value of a can be determined at compile time. The statement is equivalent to:
String a = "abcd";
The constant "abcd" is first looked up in the constant pool; only if it is not found there is a single object created for it. No additional object is allocated on the heap.

The claim in the previous answer that this is equivalent to String a = new String(A) is not accurate: new String() allocates memory on the heap, and here no heap space is allocated.
