How the .NET StringBuilder implementation differs between 2.0 and 4.0

Source: Internet
Author: User

.NET 4.0 reimplements StringBuilder with a new data storage scheme. The change not only improves efficiency considerably but also keeps the temporary string objects created during intermediate processing out of the large object heap (LOH). This article analyzes the change.

Reviewing the StringBuilder implementation in .NET 2.0

The following is based on viewing the StringBuilder implementation in Reflector.

Its internal storage is a string (the member variable m_StringValue). The StringBuilder constructor can initialize it from a specified string and capacity; the default value is the empty string (String.Empty) and the default capacity is 16. The constructor assigns m_StringValue by calling the String.GetStringForStringBuilder method.

In the implementation of GetStringForStringBuilder, m_StringValue is treated as a char array whose size is determined by the capacity, which can be larger than the string actually stored.

In the Append method, if the current character array cannot hold the new data, GetNewString is called to expand it; otherwise wstrcpy copies the characters into the storage area.

GetNewString expands the character array by doubling its capacity. If doubling is still not enough, the capacity is set directly to requiredLength.

It follows that capacity does not necessarily grow along the power-of-two sequence 16, 32, 64, 128; an expansion can change the base of the sequence. For example, if a StringBuilder is initialized with the string "ABCDE", the character array has a capacity of 16, of which 5 positions are used. If a string of length 40 is then appended, doubling to 32 is not enough, so the capacity is set to 45, and the growth sequence becomes 16, 45, 90, 180.
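The growth rule can be checked with a small model. This is a Python sketch of the rule as described in this article, not the .NET source; the function names and the way the initial capacity is chosen are assumptions made for illustration.

```python
def grow_capacity(capacity: int, required_length: int) -> int:
    """Model of the .NET 2.0 growth rule described in the text (not the .NET source)."""
    if required_length <= capacity:
        return capacity  # the data still fits, no reallocation
    doubled = capacity * 2
    # Fall back to the exact required length when doubling is not enough.
    return doubled if doubled >= required_length else required_length


def capacity_history(initial_length: int, append_lengths, initial_capacity: int = 16):
    """Return every capacity the buffer goes through (hypothetical helper)."""
    capacity = max(initial_capacity, initial_length)  # assumed starting capacity
    length = initial_length
    history = [capacity]
    for n in append_lengths:
        length += n
        new_capacity = grow_capacity(capacity, length)
        if new_capacity != capacity:
            capacity = new_capacity
            history.append(capacity)
    return history


# "ABCDE" (5 chars), a 40-char append, then two appends that each force a doubling.
print(capacity_history(5, [40, 1, 45]))  # [16, 45, 90, 180]
```

Once the capacity has been set to 45, every later doubling starts from 45 rather than from a power of two.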

Capacity expansion in the .NET 2.0 StringBuilder has several drawbacks:

1. Each expansion creates a new string object whose contents are copied from the old one, and the old string object is then discarded. If the string object is 85,000 bytes or larger (about 42,500 characters, since each UTF-16 character takes 2 bytes), it is allocated on the LOH.

2. As capacity grows, for example beyond 1 MB, it can be hard to find contiguous free space in the LOH, especially if the LOH is fragmented.

3. The extra copying causes a slight performance loss.

4. Thread-safety handling, including declaring m_StringValue as volatile, causes a further slight performance loss.
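To put the first drawback in numbers, the short Python calculation below finds the point at which a pure doubling sequence starting from 16 first crosses the large-object threshold. It is a rough sketch: it counts character data only and ignores the object header, and the function name is made up for this example.

```python
LOH_THRESHOLD_BYTES = 85_000  # objects of this size or larger go to the LOH
BYTES_PER_CHAR = 2            # one UTF-16 code unit


def first_loh_capacity(initial_capacity: int = 16) -> int:
    """First capacity (in chars) of a doubling sequence whose character data
    alone reaches the threshold. Object header overhead is ignored."""
    capacity = initial_capacity
    while capacity * BYTES_PER_CHAR < LOH_THRESHOLD_BYTES:
        capacity *= 2
    return capacity


print(LOH_THRESHOLD_BYTES // BYTES_PER_CHAR)  # 42500 characters
print(first_loh_capacity())                   # 65536 characters
```

Under these assumptions, the buffer that holds 32,768 characters is still an ordinary object, while the next one, and every buffer after it, is a large object.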

As shown, the internal storage of the .NET 2.0 StringBuilder is a plain string object. When that string does not have enough capacity for newly appended data, the capacity is expanded by doubling. After the expansion, the contents of the old string object and the newly appended data are copied into the new memory area. In the process, a new string object is created and the old one is discarded. If the string object is already larger than 85,000 bytes, each expansion involves two string objects in the LOH: the old one and the new one. In essence, compared with concatenating strings directly with "+", StringBuilder only greatly reduces how often temporary string objects are created; it does not eliminate them.

The new StringBuilder implementation in .NET 4.0

The way .NET 4.0 implements StringBuilder is elegant.

The storage is no longer a string object but explicitly a character array (the member variable m_ChunkChars). The maximum size of each StringBuilder's character array is limited by MaxChunkSize, which is 8000 (0x1F40 in hexadecimal). How, then, is more than 8000 characters of data stored? The answer lies in the m_ChunkPrevious member variable: each StringBuilder holds a reference to another StringBuilder instance, and together they form a linked list of StringBuilder objects. In other words, one logical buffer is actually shared among all the StringBuilder instances on the list.
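The linked-list layout can be illustrated with a toy model. The Python class below is only a sketch of the idea: the attribute names are illustrative rather than the .NET fields, and the rule used to size a new chunk is an assumption.

```python
MAX_CHUNK_SIZE = 8000  # 0x1F40, the chunk limit quoted in the text


class ChunkedBuilder:
    """Toy chunked builder. Attribute names are illustrative, not the .NET fields."""

    def __init__(self, capacity=16):
        self.chars = [""] * capacity  # storage for the newest chunk
        self.used = 0                 # slots filled in the newest chunk
        self.offset = 0               # characters stored in all earlier chunks
        self.previous = None          # previous chunk in the list, or None

    def __len__(self):
        return self.offset + self.used

    def _expand(self, min_chars):
        # Hand the full chunk to a new node and link to it; no characters are copied.
        node = ChunkedBuilder.__new__(ChunkedBuilder)
        node.chars, node.used = self.chars, self.used
        node.offset, node.previous = self.offset, self.previous
        self.previous = node
        self.offset += self.used
        self.used = 0
        # Assumed sizing rule: at least what is needed, otherwise capped at MAX_CHUNK_SIZE.
        self.chars = [""] * max(min_chars, min(self.offset, MAX_CHUNK_SIZE))

    def append(self, text):
        for i, ch in enumerate(text):
            if self.used == len(self.chars):
                self._expand(len(text) - i)
            self.chars[self.used] = ch
            self.used += 1
        return self

    def chunk_count(self):
        count, node = 0, self
        while node is not None:
            count, node = count + 1, node.previous
        return count


sb = ChunkedBuilder().append("ABCDE").append("x" * 40)
print(len(sb), sb.chunk_count())  # 45 2
```

In this model the first 16 characters stay in the original array, which simply becomes the previous node, while the remaining 29 characters go into a newly allocated chunk.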

The constructor initializes m_ChunkChars, and the default character array size is still 16.

Capacity expansion works as follows. When the current chunk is full, its contents are handed over to a new StringBuilder node, m_ChunkPrevious is set to point to that node, and the current instance gets a fresh character array. It is an elegant design: the copy overhead of .NET 2.0 is replaced by a simple reference assignment.

The Math.Max calculation that determines the size of the new character array keeps each chunk to at most 8000 characters (16,000 bytes) when data is appended in small pieces, far below the LOH threshold.
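The effect of the size calculation can be simulated. The Python sketch below assumes the rule is "at least the number of characters still needed, otherwise the current total length capped at 8000"; that rule and both function names are assumptions for illustration, not a copy of the .NET code.

```python
MAX_CHUNK_SIZE = 8000  # 0x1F40 characters


def new_chunk_size(min_chars: int, total_length: int) -> int:
    """Assumed sizing rule: at least min_chars, otherwise the current
    total length capped at MAX_CHUNK_SIZE."""
    return max(min_chars, min(total_length, MAX_CHUNK_SIZE))


def chunk_sizes(total_chars: int, first_chunk: int = 16):
    """Chunk sizes created when total_chars are appended one character at a time."""
    sizes = [first_chunk]
    capacity = first_chunk
    while capacity < total_chars:
        # A new chunk is allocated only when every existing chunk is full.
        sizes.append(new_chunk_size(1, capacity))
        capacity += sizes[-1]
    return sizes


sizes = chunk_sizes(30_000)
print(sizes)
print(max(sizes) * 2)  # bytes of character data in the largest chunk: 16000
```

With single-character appends, the chunk sizes double from 16 up to 4096 and then stay at 8000, so no individual array comes close to 85,000 bytes.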

Overall, the main implementation flaws of .NET 2.0 have been addressed. Of course, introducing a linked list makes some methods more complex. The implementation remains elegant, though: where characters have to be copied out, there is no need to copy from beginning to end; starting from the tail is the efficient approach.
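Copying from the tail works because each chunk can record where its characters belong in the full text. The Python sketch below illustrates the idea with a hypothetical Chunk type that stores such an offset; it is not the .NET implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    chars: str                         # characters stored in this chunk
    offset: int                        # position of chars[0] in the full text
    previous: Optional["Chunk"] = None


def to_text(tail: Chunk) -> str:
    """Assemble the full text by walking from the tail chunk back to the head."""
    result = [""] * (tail.offset + len(tail.chars))  # one buffer of the final size
    node = tail
    while node is not None:
        # The stored offset says where this chunk goes, so order of traversal does not matter.
        result[node.offset:node.offset + len(node.chars)] = node.chars
        node = node.previous
    return "".join(result)


head = Chunk("Hello, ", 0)
middle = Chunk("chunked ", 7, head)
tail = Chunk("world", 15, middle)
print(to_text(tail))  # Hello, chunked world
```

Because the builder only holds a reference to the tail and each node links backward, walking tail to head is the natural direction; no forward links or reversal step are needed.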

Summary

The .NET 4.0 StringBuilder replaces the original single string object with a series of memory chunks. Each StringBuilder contains one block of memory and, by referencing other StringBuilder objects, forms a linked list of memory blocks. Limiting the maximum size of each block ensures that the StringBuilder does not produce large objects as its capacity expands. In terms of the algorithm, the most commonly used method, Append, is more efficient than in .NET 2.0, ToString is roughly equivalent, and Insert is less efficient than in .NET 2.0. Overall, the implementation is quite elegant.

Personally, I find the default StringBuilder capacity of 16 too small; it often forces several capacity expansions right at the start. If a capacity of 16 were enough, one would not use StringBuilder at all and would simply concatenate strings with "+". In my view, 512 or 1024 would be a more appropriate default.
