Boost Learning Series 5: String Processing (Part 1)


I. Overview

Recently I went back to work, so I have had fewer chances to study Boost. I also run into many compilation errors when using Boost, which makes it hard to predict how long each article will take to write. Even so, I have been looking forward to this part: string processing is one of the most common everyday tasks and one of Boost's strengths, so I will cover it in detail. In standard C++, strings are handled by the class std::string, which provides many string operations, including functions that search for a given character or substring. Although std::string has a great many member functions and is one of the most bloated classes in standard C++, it still cannot meet the needs of many developers in their daily work. For example, Java provides a function that converts a string to uppercase, but std::string has no equivalent. The Boost C++ libraries try to fill this gap.

II. Locales

Before getting to the main topic, we need to look at locales. Many of the functions mentioned in this chapter take an additional locale parameter. In standard C++, a locale encapsulates culture-specific conventions, including the currency symbol, date and time formats, the character that separates the integer part of a number from the fractional part (the radix character, or decimal point), and the separator used to group digits when a number has more than three digits (the thousands separator).

For string processing, the locale determines the ordering of characters and how special characters are treated in a particular culture. For example, whether the alphabet contains umlauts, and where they appear in the alphabet, depends on the language and culture. If a function converts a string to uppercase, the steps it takes depend on the locale. In German, the letter 'ä' is obviously converted to 'Ä', but this is not necessarily true in other languages.

When using std::string, the locale can be ignored because none of its functions depend on a particular language. To use the Boost C++ libraries covered in this chapter, however, some knowledge of locales is essential. The C++ standard defines the class std::locale in the header <locale>. Every C++ program automatically has one instance of this class, the global locale, which cannot be accessed directly. To get at it, default-construct an object of type std::locale; it is initialized with the same properties as the global locale, as follows:

#include <locale>
#include <iostream>

int main()
{
  std::locale loc;
  std::cout << loc.name() << std::endl;
}

The program above prints C to the standard output stream, which is the name of the classic locale. It contains the default conventions used by programs written in C. It is also the default global locale of every C++ application and broadly follows American English conventions: for example, the decimal point is a period and the months in dates are written in English. The global locale can be changed with the static function global() of the std::locale class.

#include <locale>
#include <iostream>

int main()
{
  std::locale::global(std::locale("German"));
  std::locale loc;
  std::cout << loc.name() << std::endl;
}

The static function global() takes an object of type std::locale as its only parameter. Another constructor of this class accepts a string of type const char*, which makes it possible to create a locale object for a particular culture. However, apart from the C locale, locale names are not standardized; which names are accepted depends on the C++ standard library in use. The language strings documentation for Visual Studio 2008 lists the names that can be used there, and according to it "German" selects German culture.

The program above prints German_Germany.1252. Specifying "German" as the language string selects German culture as both the primary language and the sublanguage, and character map 1252 is chosen. To specify a sublanguage other than that of Germany, such as Swiss German, a different language string is needed.

#include <locale>
#include <iostream>

int main()
{
  std::locale::global(std::locale("German_Switzerland"));
  std::locale loc;
  std::cout << loc.name() << std::endl;
}

Now the program prints German_Switzerland.1252.

With this basic understanding of locales and of how to change the global locale, the following example shows how a locale affects string operations.

#include <locale>
#include <iostream>
#include <cstring>

int main()
{
  std::cout << std::strcoll("ä", "z") << std::endl;
  std::locale::global(std::locale("German"));
  std::cout << std::strcoll("ä", "z") << std::endl;
}

This example uses the function std::strcoll(), defined in the header <cstring>, to determine whether the first string is lexicographically smaller than the second, that is, which of the two strings comes first in a dictionary. (Annoyingly, VC would not let me enter 'ä' and automatically changed it to '?'.) Running the program prints 1 and -1. Although the function receives the same arguments both times, the results differ. The reason is simple: the first call to std::strcoll() uses the global C locale, whereas by the second call the global locale has been changed to German culture. The output shows that the characters 'ä' and 'z' are ordered differently in the two locales.

Many C functions and C++ streams depend on locales. Although the functions of the std::string class work independently of the locale, the functions covered in the following sections do not. That is why this chapter discusses locales as well.

III. Boost.StringAlgorithms

The Boost C++ string algorithms library, Boost.StringAlgorithms, provides many functions for manipulating strings. The strings can be of type std::string, std::wstring, or any other instance of the class template std::basic_string. To use the library, include the header boost/algorithm/string.hpp. Many of its functions accept an object of type std::locale as an optional additional parameter; if it is omitted, the global locale is used. Consider the following example, which uses the German locale:

#include <boost/algorithm/string.hpp>
#include <locale>
#include <iostream>
#include <clocale>

int main()
{
  std::setlocale(LC_ALL, "German");
  std::string s = "Boris Schäling";
  std::cout << boost::algorithm::to_upper_copy(s) << std::endl;
  std::cout << boost::algorithm::to_upper_copy(s, std::locale("German")) << std::endl;
}

The function boost::algorithm::to_upper_copy() converts a string to uppercase and returns the converted string. The code above uses the global locale for the first call and explicitly passes the German locale for the second. The second conversion is clearly the correct one, because the uppercase form of the lowercase letter 'ä' is 'Ä'. In the C locale, 'ä' is an unknown character and therefore cannot be converted. To get the correct result, either pass the right locale explicitly or change the global locale before calling boost::algorithm::to_upper_copy(). Note that the program also calls std::setlocale(), defined in the header <clocale>, to set the locale for C functions, because std::cout uses C functions to display information on the screen. Once the correct locale is set, umlauts such as 'ä' and 'Ä' are displayed correctly. Alternatively, the call to std::setlocale() can be replaced by a call to std::locale::global(), which sets the global locale.

The Boost.StringAlgorithms library also provides several functions for deleting individual characters from a string, and lets you specify where in the string to delete. For example, boost::algorithm::erase_all_copy() removes every occurrence of a given character from the string. To remove only the first occurrence, use boost::algorithm::erase_first_copy(). To shorten a string by a given number of characters at its beginning or end, use boost::algorithm::erase_head_copy() and boost::algorithm::erase_tail_copy().

#include <boost/algorithm/string.hpp>
#include <locale>
#include <iostream>

int main()
{
  std::locale::global(std::locale("German"));
  std::string s = "Boris Schäling";
  boost::iterator_range<std::string::iterator> r = boost::algorithm::find_first(s, "Boris");
  std::cout << r << std::endl;
  r = boost::algorithm::find_first(s, "xyz");
  std::cout << r << std::endl;
}

The functions boost::algorithm::find_first(), boost::algorithm::find_last(), boost::algorithm::find_nth(), boost::algorithm::find_head(), and boost::algorithm::find_tail() search for a substring within a string.

The program above also uses boost::iterator_range, which is the return type of all of these functions. This class comes from the Boost.Range library, which implements the concept of a range on top of iterators. Because operator<< is overloaded for boost::iterator_range, the result of a search can be written directly to the standard output stream. The program above prints Boris as the first result and an empty string as the second.

#include <boost/algorithm/string.hpp>
#include <locale>
#include <iostream>
#include <vector>

int main()
{
  std::locale::global(std::locale("German"));
  std::vector<std::string> v;
  v.push_back("Boris");
  v.push_back("Schäling");
  std::cout << boost::algorithm::join(v, " ") << std::endl;
}

The function boost::algorithm::join() takes a container of strings as its first parameter and concatenates them, separated by the string passed as the second parameter. This example prints Boris Schäling.

#include <boost/algorithm/string.hpp>
#include <locale>
#include <iostream>

int main()
{
  std::locale::global(std::locale("German"));
  std::string s = "Boris Schäling";
  std::cout << boost::algorithm::replace_first_copy(s, "B", "D") << std::endl;
  std::cout << boost::algorithm::replace_nth_copy(s, "B", 0, "D") << std::endl;
  std::cout << boost::algorithm::replace_last_copy(s, "B", "D") << std::endl;
  std::cout << boost::algorithm::replace_all_copy(s, "B", "D") << std::endl;
  std::cout << boost::algorithm::replace_head_copy(s, 5, "Doris") << std::endl;
  std::cout << boost::algorithm::replace_tail_copy(s, 8, "Becker") << std::endl;
}

Besides functions for searching for substrings and deleting characters, the Boost.StringAlgorithms library also provides functions for replacing a substring with another string: boost::algorithm::replace_first_copy(), boost::algorithm::replace_nth_copy(), boost::algorithm::replace_last_copy(), boost::algorithm::replace_all_copy(), boost::algorithm::replace_head_copy(), and boost::algorithm::replace_tail_copy(). They are used in the same way as the search and delete functions, except that they take the replacement string as an additional parameter.

#include <boost/algorithm/string.hpp>
#include <locale>
#include <iostream>

int main()
{
  std::locale::global(std::locale("German"));
  std::string s = "\t Boris Schäling \t";
  std::cout << "." << boost::algorithm::trim_left_copy(s) << "." << std::endl;
  std::cout << "." << boost::algorithm::trim_right_copy(s) << "." << std::endl;
  std::cout << "." << boost::algorithm::trim_copy(s) << "." << std::endl;
}

To remove whitespace from the beginning or end of a string, use the trim functions boost::algorithm::trim_left_copy(), boost::algorithm::trim_right_copy(), and boost::algorithm::trim_copy(). Which characters count as whitespace depends on the global locale.

Several functions of the Boost.StringAlgorithms library accept an additional predicate that determines which characters of the string the function operates on. The predicate versions of the trim functions are named boost::algorithm::trim_left_copy_if(), boost::algorithm::trim_right_copy_if(), and boost::algorithm::trim_copy_if().

#include <boost/algorithm/string.hpp>
#include <locale>
#include <iostream>

int main()
{
  std::locale::global(std::locale("German"));
  std::string s = "--Boris Schäling--";
  std::cout << "." << boost::algorithm::trim_left_copy_if(s, boost::algorithm::is_any_of("-")) << "." << std::endl;
  std::cout << "." << boost::algorithm::trim_right_copy_if(s, boost::algorithm::is_any_of("-")) << "." << std::endl;
  std::cout << "." << boost::algorithm::trim_copy_if(s, boost::algorithm::is_any_of("-")) << "." << std::endl;
}

The program above calls the helper function boost::algorithm::is_any_of(), which creates a predicate that checks whether a character is contained in the string passed to is_any_of(). Here, as the example shows, it specifies the hyphen as the character to trim. The Boost.StringAlgorithms library also provides many helper functions that return commonly used predicates.

#include <boost/algorithm/string.hpp>
#include <locale>
#include <iostream>

int main()
{
  std::locale::global(std::locale("German"));
  std::string s = "123456789Boris Schäling123456789";
  std::cout << "." << boost::algorithm::trim_left_copy_if(s, boost::algorithm::is_digit()) << "." << std::endl;
  std::cout << "." << boost::algorithm::trim_right_copy_if(s, boost::algorithm::is_digit()) << "." << std::endl;
  std::cout << "." << boost::algorithm::trim_copy_if(s, boost::algorithm::is_digit()) << "." << std::endl;
}

The predicate returned by boost::algorithm::is_digit() returns true when a character is a digit. The helper functions that check whether a character is uppercase or lowercase are boost::algorithm::is_upper() and boost::algorithm::is_lower(). All of these functions use the global locale by default, unless a different locale is passed as a parameter.

In addition to predicates that test individual characters, the Boost.StringAlgorithms library also provides functions that operate on whole strings.

#include <boost/algorithm/string.hpp>
#include <locale>
#include <iostream>

int main()
{
  std::locale::global(std::locale("German"));
  std::string s = "Boris Schäling";
  std::cout << boost::algorithm::starts_with(s, "Boris") << std::endl;
  std::cout << boost::algorithm::ends_with(s, "Schäling") << std::endl;
  std::cout << boost::algorithm::contains(s, "is") << std::endl;
  std::cout << boost::algorithm::lexicographical_compare(s, "Boris") << std::endl;
}

The functions boost::algorithm::starts_with(), boost::algorithm::ends_with(), boost::algorithm::contains(), and boost::algorithm::lexicographical_compare() all compare two strings.

Next comes a function for splitting strings.

#include <boost/algorithm/string.hpp>
#include <locale>
#include <iostream>
#include <vector>

int main()
{
  std::locale::global(std::locale("German"));
  std::string s = "Boris Schäling";
  std::vector<std::string> v;
  boost::algorithm::split(v, s, boost::algorithm::is_space());
  std::cout << v.size() << std::endl;
}

Given a delimiter, the function boost::algorithm::split() splits a string into a container of strings. Its third parameter is a predicate that determines where the string is split. This example uses the helper function boost::algorithm::is_space() to create a predicate that splits the string at every whitespace character.

Many of the functions in this section also come in versions that ignore case. These versions usually have the same name as the original function, prefixed with an 'i'. For example, the case-insensitive counterpart of boost::algorithm::erase_all_copy() is boost::algorithm::ierase_all_copy().

Finally, it is worth noting that many functions of the Boost.StringAlgorithms library support regular expressions. The following program uses the function boost::algorithm::find_regex() to search for a regular expression.

#include <boost/algorithm/string.hpp>
#include <boost/algorithm/string/regex.hpp>
#include <locale>
#include <iostream>

int main()
{
  std::locale::global(std::locale("German"));
  std::string s = "Boris Schäling";
  boost::iterator_range<std::string::iterator> r = boost::algorithm::find_regex(s, boost::regex("\\w\\s\\w"));
  std::cout << r << std::endl;
}

To work with regular expressions, the program uses the class boost::regex from the Boost.Regex library, which will be introduced in the next section.
