Introduction
When introducing StdExt, I mentioned that the STL is well designed overall, but that the following areas are still poorly designed (or missing):
Allocator (memory management)
String (string handling/text processing)
Concurrency (parallel programming)
We have already talked a lot about memory management. Here we focus on string processing and text processing. This is the first article in the series "A Complete Reference for String Processing".
History
String processing/text processing is a long-standing and complex topic. Everything from simple string comparison (compare) and concatenation (concat) to complex text editing, regular expressions, and parsing of HTML content falls into this category.
In the C era, the C library provided string functions based on the char* data type, typified by strlen, strcpy, and strcat. This style of string processing is primitive and error-prone. In addition, strcat is inefficient, because every call must first scan the destination to find its terminator. (Borland introduced strecpy to solve this problem; the generalized version of strecpy is, in effect, what later became std::copy in the STL.) String search (strstr) also used the most naive algorithm.
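The strcat problem and the end-pointer remedy can be sketched as follows. This is only an illustration of the idea: copy_to_end and demo_concat are hypothetical names made up for this example, not C library or Borland functions. The helper returns a pointer to the terminator it wrote, so the next copy can resume there instead of rescanning the buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper (an assumption for illustration): copies the
// NUL-terminated string src to dest and returns a pointer to the
// terminator it wrote, which is where the next piece should go.
char* copy_to_end(char* dest, const char* src) {
    assert(dest != nullptr && src != nullptr);
    char* end = std::copy(src, src + std::strlen(src), dest);
    *end = '\0';
    return end;
}

// Joins three pieces. Each call continues from the pointer returned by
// the previous call, so the buffer is never scanned from the start again
// (which is what repeated strcat calls would do).
std::string demo_concat() {
    char buf[32];
    char* p = buf;
    p = copy_to_end(p, "Hello");
    p = copy_to_end(p, ", ");
    p = copy_to_end(p, "world");
    return std::string(buf, p);
}
```

With strcat, joining n pieces costs time proportional to the total length on every call; with the end pointer, each character is touched once.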
The arrival of the STL string (basic_string) improved this situation to some extent. At least C++ programmers now have a string class with a "friendly" interface. However, string is arguably the most controversial class in the STL (as explained in detail below). These controversies at least show that the STL string class has design flaws.
The SGI STL introduced the rope class, a heavyweight string class. A "rope" is a thick cord, while a "string" is a thin thread, so a rope is a heavyweight string; the name is vivid and apt.
When the StdExt library began to consider string processing support, I introduced the following four classes: std::String, std::StringBuilder, std::TextPool, and std::Rope. Of these, std::String and std::StringBuilder are really the functionality of the STL string class split in two: std::String is a constant string, and std::StringBuilder is responsible for modifying strings. As is well known, the String/StringBuilder concept comes from Java, and I have always thought that Java's string classes are designed more logically than the C++ one, which merges the two roles. std::TextPool and std::Rope are heavyweight string implementations intended for very large strings.
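The split between a constant string and a separate builder can be sketched roughly as below. The class names ImmutableText and TextBuilder, and all of their members, are hypothetical and chosen for this illustration; they are not the StdExt interfaces, which are not shown in this article.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical read-only string: it exposes only const operations,
// so a value can be shared freely without fear of modification.
class ImmutableText {
public:
    explicit ImmutableText(std::string s) : data_(std::move(s)) {}
    std::size_t size() const { return data_.size(); }
    char at(std::size_t i) const {
        assert(i < data_.size());
        return data_[i];
    }
    const std::string& str() const { return data_; }
private:
    std::string data_;
};

// Hypothetical builder: all mutation lives here. build() hands out a
// constant snapshot that later appends do not affect.
class TextBuilder {
public:
    TextBuilder& append(const std::string& s) {
        buf_ += s;
        return *this;
    }
    ImmutableText build() const { return ImmutableText(buf_); }
private:
    std::string buf_;
};
```

The point of the design is that code receiving an ImmutableText needs no defensive copy, while code that is assembling text pays for mutation only in the builder.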
Defects of the STL string (basic_string)
To sum up, the main points of contention around the STL string class are:
Too many interfaces, and poor consistency with the conventions of the other STL containers. For example, string::find reports a position as an index rather than an iterator, unlike the other containers.
Memory fragmentation. Strings are constructed and destroyed so frequently that they seriously fragment the system's memory.
Copy-on-write and thread safety. string (basic_string) was traditionally implemented with copy-on-write so that assigning a string would be cheap. But once thread safety is taken into account, copy-on-write spends a great deal of time on locking. Some newer STL implementations, such as the SGI STL, have abandoned copy-on-write string implementations.
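The first point, the inconsistency of find, is easy to see side by side. In the sketch below, index_of and iterator_offset_of are hypothetical wrapper names used only for this comparison; the calls inside them are the standard std::string::find and std::find.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// The member function reports a position as an index and signals
// "not found" with the special value std::string::npos.
std::size_t index_of(const std::string& s, char c) {
    return s.find(c);
}

// The generic algorithm, as used with every other container, reports an
// iterator and signals "not found" by returning end(). Here the iterator
// is converted to an offset, so "not found" shows up as s.size().
std::size_t iterator_offset_of(const std::string& s, char c) {
    std::string::const_iterator it = std::find(s.begin(), s.end(), c);
    return static_cast<std::size_t>(it - s.begin());
}
```

Both calls locate the same character, but they use two different position types and two different failure conventions, which is the inconsistency described above.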
An overview of StdExt's string classes: String/StringBuilder/TextPool/Rope
Why do we need so many string classes? For one reason: the environments in which strings are processed are complex, and each needs a tailored solution. We cannot expect a single string class to serve every purpose.
In terms of the string sizes supported, String and StringBuilder focus on small strings (StringBuilder in particular is bound to hit performance bottlenecks with large strings), while TextPool and Rope focus on very large strings.
In terms of implementation, String and StringBuilder use linear (contiguous) memory, while TextPool and Rope strings are not physically contiguous; they are logical strings.
In terms of supported operations, String is a constant string. StringBuilder and TextPool mainly support overwrite (set) and append operations, and insert operations are not recommended on them; of the two, TextPool scales better than StringBuilder. Rope concentrates on optimizing complex string-level operations such as taking substrings, inserting, and deleting, but reading or modifying a single character is slightly more expensive than with String, StringBuilder, or TextPool.
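To make the idea of a logical string concrete, here is a minimal sketch of text stored in fixed-size chunks rather than in one contiguous block. ChunkedText and its members are hypothetical and greatly simplified; this is not how TextPool or Rope is actually implemented, and the chunk size of 4 is an assumption chosen only to make the chunking visible.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical chunked string: characters live in separate small blocks,
// so appending never has to reallocate and copy the whole text.
class ChunkedText {
public:
    void append(const std::string& s) {
        for (char c : s) {
            // Start a new chunk when there is none yet or the last is full.
            if (chunks_.empty() || chunks_.back().size() == kChunk) {
                chunks_.emplace_back();
                chunks_.back().reserve(kChunk);
            }
            chunks_.back().push_back(c);
            ++size_;
        }
    }
    std::size_t size() const { return size_; }
    std::size_t chunk_count() const { return chunks_.size(); }
    // Every chunk except the last is full, so a logical index maps
    // directly to (chunk number, offset inside the chunk).
    char at(std::size_t i) const {
        assert(i < size_);
        return chunks_[i / kChunk][i % kChunk];
    }
    // Flattens the logical string into contiguous memory on demand.
    std::string str() const {
        std::string out;
        out.reserve(size_);
        for (const std::string& chunk : chunks_) out += chunk;
        return out;
    }
private:
    static constexpr std::size_t kChunk = 4;  // assumed, deliberately tiny
    std::vector<std::string> chunks_;
    std::size_t size_ = 0;
};
```

Appending stays cheap no matter how long the text grows, at the price of an extra index calculation on every character access, which mirrors the trade-off between linear and logical strings.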
We'll expand on these components later in this article.