C/C++ string processing overview: char*/String/StringBuilder/TextPool/Rope

Xu Shiwei
2008-3-20

Summary

When introducing stdext, I mentioned that the STL is well designed overall but is still weak (or missing entirely) in the following areas:

  • Allocators (memory management)
  • Strings (string processing and text processing)
  • Parallel programming

We have already said a lot about memory management. Here we focus on string processing and text processing. This is the first article in the series "Complete String Processing Reference".

History

String processing and text processing form a long-standing and complex topic. It ranges from simple operations such as comparison (compare) and concatenation (concat) to complex ones such as text editing, regular expressions, and parsing HTML content.

In the C era, the C library provided string functions based on the char* data type, typified by strlen, strcpy, and strcat. These functions are primitive and error-prone. In addition, strcat is inefficient, because every call must scan the destination from the start to find its end. Borland introduced strecpy to solve this problem; the general form of strecpy is in fact std::copy in the STL. String search (strstr) likewise uses the most primitive method.
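The cost difference between strcat and an end-pointer copy can be sketched as follows. This is only an illustration: copy_to_end, join_with_strcat, and join_with_end_pointer are hypothetical names introduced here, and copy_to_end is assumed to behave the way the text describes strecpy (copy, then return the new end), built on std::copy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not a standard function): copies the NUL-terminated
// string src to dest and returns a pointer to the terminator it wrote, so
// the next append can continue from there.
char* copy_to_end(char* dest, const char* src) {
    const std::size_t n = std::strlen(src);
    char* end = std::copy(src, src + n, dest);  // std::copy returns the end of the output
    *end = '\0';
    return end;
}

// Joins parts with strcat: every call rescans dest from its first byte to
// find the terminator, so total work grows quadratically with the result.
void join_with_strcat(char* dest, const char* const* parts, std::size_t count) {
    dest[0] = '\0';
    for (std::size_t i = 0; i < count; ++i)
        std::strcat(dest, parts[i]);
}

// Joins parts by carrying the end pointer forward: dest is never rescanned.
// The caller must provide a buffer large enough for the whole result.
void join_with_end_pointer(char* dest, const char* const* parts, std::size_t count) {
    char* end = dest;
    *end = '\0';
    for (std::size_t i = 0; i < count; ++i)
        end = copy_to_end(end, parts[i]);
}
```

Both functions produce the same result; they differ only in how often the destination is rescanned.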

The arrival of the STL string (basic_string) improved the situation to some extent. C++ programmers at least had a "friendly" string class. However, string is arguably the most controversial class in the STL (explained in detail below), and that controversy suggests, at the very least, that the class has design flaws.

SGI STL introduced the rope class, a heavyweight string class. In English a rope is a thick cord and a string is a thin one, so "rope" is an apt name for a heavyweight string.

When the stdext library began to add string processing support, I introduced four classes: std::String, std::StringBuilder, std::TextPool, and std::Rope. std::String and std::StringBuilder split the responsibilities of the STL string class: std::String is a regular string, while std::StringBuilder is responsible for modifying strings. The String/StringBuilder split is clearly borrowed from Java; I have always thought that Java's design is much more reasonable than the C++ string, which combines both roles in one class. std::TextPool and std::Rope are heavyweight string implementations, used to process very large strings.

STL string (basic_string) Defects

To sum up, the STL string class is controversial on the following points:

  • Too many interfaces, with conventions that are inconsistent with other STL containers. For example, string::find reports positions as subscripts rather than iterators, unlike the other containers.
  • Memory fragmentation. Frequent construction and destruction of strings causes serious fragmentation of system memory.
  • Copy-on-write and thread safety. String (basic_string) implementations are typically based on copy-on-write so that string assignment is cheap. Once thread safety is taken into account, however, copy-on-write spends a lot of time on locking. Some newer STL implementations (such as SGI STL) have abandoned copy-on-write strings.
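The interface inconsistency in the first point can be seen in a small sketch. The two helper functions are hypothetical names used only for illustration; std::string::find, std::string::npos, and std::find are the standard facilities being compared.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Member find: the position is a subscript, and "not found" is the
// special value std::string::npos.
bool contains_via_member(const std::string& text, char c) {
    return text.find(c) != std::string::npos;
}

// Generic algorithm: the position is an iterator, and "not found" is the
// end iterator. The same code works for string, vector, list, and so on.
template <class Container, class T>
bool contains_via_algorithm(const Container& items, const T& value) {
    return std::find(items.begin(), items.end(), value) != items.end();
}
```

Code written against the iterator convention works unchanged on a std::string, whereas the subscript convention is specific to string.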

Overview of the stdext string classes: String/StringBuilder/TextPool/Rope

Why do we need so many string classes? For one reason: string processing is used in very different settings, and each needs a tailored solution. No single string class can be expected to fit them all.

In terms of supported string size, String/StringBuilder targets small strings (StringBuilder in particular is bound to hit performance bottlenecks on large strings), while TextPool and Rope target very large strings.

In terms of implementation, String/StringBuilder uses linear (contiguous) memory. TextPool and Rope strings are not physically contiguous; they are strings only in the logical sense.
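To make "logically a string, physically not contiguous" concrete, here is a minimal sketch of a chunked buffer. It is a hypothetical illustration and not the stdext TextPool implementation: the class name ChunkedText, its members, and the fixed chunk size are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical illustration only; this is not the stdext TextPool.
class ChunkedText {
public:
    explicit ChunkedText(std::size_t chunk_size)
        : chunk_size_(chunk_size == 0 ? 1 : chunk_size) {}

    // Appends by filling the last chunk and opening new chunks as needed.
    // std::deque does not relocate existing elements on push_back, so
    // text that is already stored is never copied into a larger buffer.
    void append(const std::string& text) {
        std::size_t pos = 0;
        while (pos < text.size()) {
            if (chunks_.empty() || chunks_.back().size() == chunk_size_) {
                chunks_.emplace_back();
                chunks_.back().reserve(chunk_size_);
            }
            std::string& last = chunks_.back();
            const std::size_t room = chunk_size_ - last.size();
            const std::size_t n = std::min(room, text.size() - pos);
            last.append(text, pos, n);
            pos += n;
        }
        size_ += text.size();
    }

    // Every chunk except the last is full, so the chunk holding index i
    // is found by division. The caller must ensure i < size().
    char at(std::size_t i) const {
        return chunks_[i / chunk_size_][i % chunk_size_];
    }

    std::size_t size() const { return size_; }
    std::size_t chunk_count() const { return chunks_.size(); }

    // Flattens the logical string into one contiguous std::string.
    std::string str() const {
        std::string out;
        out.reserve(size_);
        for (const std::string& chunk : chunks_) out += chunk;
        return out;
    }

private:
    std::size_t chunk_size_;
    std::size_t size_ = 0;
    std::deque<std::string> chunks_;
};
```

Appending stays cheap however large the text grows, which is the property a contiguous buffer loses when it has to reallocate; the price is that inserting in the middle would require shifting data across chunks.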

In terms of supported operations, String is a regular string. StringBuilder/TextPool mainly support assignment (set) and append operations; insertion is not recommended. TextPool scales better than StringBuilder. Rope focuses on optimizing complex string-level operations, such as taking substrings, inserting, and deleting, but reading or modifying a single character costs slightly more than with String/StringBuilder/TextPool.
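The rope trade-off described above (cheap string-level operations, more expensive single-character access) can be illustrated with a minimal binary-tree sketch. This is an assumption-laden toy and not the SGI STL or stdext rope: RopeNode, make_leaf, concat, char_at, and flatten are hypothetical names, and the tree is never rebalanced.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal rope node; not the SGI STL or stdext rope.
struct RopeNode {
    std::shared_ptr<const RopeNode> left;   // set only on concatenation nodes
    std::shared_ptr<const RopeNode> right;  // set only on concatenation nodes
    std::string leaf;                       // text, used only on leaf nodes
    std::size_t length = 0;                 // total characters below this node
};
using Rope = std::shared_ptr<const RopeNode>;

Rope make_leaf(std::string text) {
    auto node = std::make_shared<RopeNode>();
    node->length = text.size();
    node->leaf = std::move(text);
    return node;
}

// Concatenation allocates one node and copies no characters.
Rope concat(const Rope& a, const Rope& b) {
    auto node = std::make_shared<RopeNode>();
    node->left = a;
    node->right = b;
    node->length = a->length + b->length;
    return node;
}

// Indexing walks down from the root, so it costs time proportional to the
// tree depth instead of the constant time of a contiguous string.
// The caller must ensure i < r->length.
char char_at(const Rope& r, std::size_t i) {
    const RopeNode* n = r.get();
    while (n->left) {
        if (i < n->left->length) {
            n = n->left.get();
        } else {
            i -= n->left->length;
            n = n->right.get();
        }
    }
    return n->leaf[i];
}

// Flattens the rope into one contiguous std::string.
std::string flatten(const Rope& r) {
    if (!r->left) return r->leaf;
    return flatten(r->left) + flatten(r->right);
}
```

Because nodes are immutable and shared, two ropes can reuse the same subtrees, which is what keeps substring and concatenation operations cheap in this kind of design.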

The following sections describe these components.

 
