C++ String Tokenization


Dompo

qq:84638372

1 Brief introduction

String tokenization divides a complete string into fields according to some rule. In the C library, strtok/wcstok provide this functionality, and the C++ standard library is compatible with the C library. The C++ stringstream offers similar functionality, and the Boost string algorithm library also provides comparable generic algorithms. In addition, Boost has a library dedicated to this job, Boost.Tokenizer; its implementation is a good illustration of C++ generic design, although it is far from perfect. Matthew Wilson also provides a similar component in his STLSoft libraries, stlsoft::string_tokeniser. Each has its own characteristics, which we discuss and examine below.

2 The C library

The C library provides strtok/wcstok to achieve similar functionality, but they have obvious drawbacks:

1. They are not reentrant, because they use an internal static variable to hold the tokenization state. If the C library implementation does not use TLS for that state, there are also race conditions between threads (see Windows via C/C++, Fifth Edition, Chapter 21: Thread-Local Storage, for more information).

2. The string argument must be writable, because the delimiters found are overwritten with null terminators.

3. The argument must be a C-style string.

4. Empty fields are always skipped.

The following example uses strtok (adapted from Matthew Wilson's Extended STL, Volume 1, Chapter 27):

#include <cstdio>
#include <cstring>

using namespace std;

int main()
{
    char str[] = "abc,def;ghi,jkl;;";
    char* outer = NULL;
    char* inner = NULL;

    // Outer loop: split on ';'.
    for (outer = strtok(str, ";"); NULL != outer; outer = strtok(NULL, ";"))
    {
        printf("Outer token:%s\n", outer);
        // Inner loop: split on ','. This call replaces strtok's hidden state.
        for (inner = strtok(outer, ","); NULL != inner; inner = strtok(NULL, ","))
        {
            printf("Inner token:%s\n", inner);
        }
    }
    return 0;
}

In the program above, the nested inner loop keeps it from working as intended: the inner strtok calls overwrite the hidden state that the outer loop relies on, so the outer loop stops after the first token. The output may be as follows:

Outer token:abc,def
Inner token:abc
Inner token:def

Please press any key to continue ...

On Windows, Visual C++ 2005 introduced new "secure" versions of these functions, such as strtok_s (strtok_r provides similar functionality on UNIX systems). They solve the problem to some extent because they are reentrant. The following is an upgraded version of the previous example:

#include <cstdio>
#include <cstring>

using namespace std;

int main()
{
    char str[] = "abc,def;ghi,jkl;;";
    char* outer = NULL;
    char* inner = NULL;
    char* pOut = NULL;   // context for the outer tokenization
    char* pIn = NULL;    // context for the inner tokenization

    // Visual C++ form: strtok_s(string, delimiters, &context).
    for (outer = strtok_s(str, ";", &pOut); NULL != outer; outer = strtok_s(NULL, ";", &pOut))
    {
        printf_s("Outer token:%s\n", outer);
        for (inner = strtok_s(outer, ",", &pIn); NULL != inner; inner = strtok_s(NULL, ",", &pIn))
        {
            printf_s("Inner token:%s\n", inner);
        }
    }
    return 0;
}

Because each loop keeps its own context, this version prints the following:

Outer token:abc,def
Inner token:abc
Inner token:def
Outer token:ghi,jkl
Inner token:ghi
Inner token:jkl

Please press any key to continue ...

Even so, this does not solve the other problems, namely points 2, 3 and 4 listed above.

3 C++ stringstream

stringstream can also be used for tokenization, but in practice it is rarely used this way: it is not powerful enough, and many people do not have a good grasp of stringstream because they seldom use it.

#include <iostream>
#include <string>
#include <sstream>

using namespace std;

int main()
{
    stringstream str("abcd efg kk dd");
    string tok;
    // Read one space-delimited field per iteration.
    while (getline(str, tok, ' '))
    {
        cout << tok << endl;
    }
    return 0;
}

The output is as follows:

abcd
efg
kk
dd

Please press any key to continue ...

4 The Boost string algorithm library

I introduced this library in my "C++ string in-depth 2.0", but that coverage is not sufficient, and the next release (3.0) will cover more of it. The string algorithm library provides both generic algorithms and iterators for splitting strings.

4.1 Generic algorithm

This approach is based on the range concept and requires us to provide a container to hold the split results. Here is a simple example:

#include <algorithm>
#include <cassert>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

using namespace std;

#include <boost/algorithm/string.hpp>
#include <boost/utility/addressof.hpp>

int main()
{
    string ss("HelloWorld!He.lloWorld!he");
    vector<string> tmp;
    // Split at punctuation characters.
    vector<string>& tt = boost::algorithm::split(tmp, ss, boost::algorithm::is_punct());
    // split returns a reference to the container that was passed in.
    assert(boost::addressof(tmp) == boost::addressof(tt));
    copy(tt.begin(), tt.end(), ostream_iterator<string>(cout, "\n"));
    return 0;
}

Output:

HelloWorld
He
lloWorld
he

Please press any key to continue ...

As we can see, we have no control over the splitting process. Even if we are only interested in the first two words after the split, we still have to use a container to receive and store all the results, which is unacceptable for a long string. Perhaps we should split "on demand", and that is the idea behind the iterator approach.

4.2 Iterators

boost::algorithm::split_iterator can be used to split strings. It needs to be paired with a finder (such as token_finder) and a predicate. Of course, we can also write our own finder. The following is a simple example:

#include <iostream>
#include <string>

using namespace std;

#include <boost/algorithm/string.hpp>

int main()
{
    string str("abc@*d*dd*a");
    // Tokens are separated by any one of the characters '@' and '*'.
    boost::algorithm::split_iterator<string::iterator> istr(
        str,
        boost::algorithm::token_finder(boost::algorithm::is_any_of("@*"))
    );
    // A default-constructed split_iterator marks the end.
    boost::algorithm::split_iterator<string::iterator> end;
    while (istr != end)
    {
        cout << *istr << endl;
        ++istr;
    }
    return 0;
}

Output:

abc

d
dd
a

Please press any key to continue ...

This output may not be what we want; the code can be modified slightly:

boost::algorithm::split_iterator<string::iterator> istr(
    str,
    boost::algorithm::token_finder(
        boost::algorithm::is_any_of("@*"),
        boost::algorithm::token_compress_on)
);

With compression turned on, adjacent delimiters are merged into one. The output may be as follows:

abc
d
dd
a

Please press any key to continue ...

Compared with Boost.Tokenizer, the string algorithm library provides fewer splitting techniques; if we need more functionality, we have to write our own finder. Writing a finder is not complicated: we only need to make sure it provides a function call operator like this:

template<typename ForwardIteratorT>
iterator_range<ForwardIteratorT>
operator()(
    ForwardIteratorT Begin,
    ForwardIteratorT End) const;

What state you keep inside the finder is up to you. This is similar to the strategy adopted by Boost.Tokenizer, so both are highly extensible. More could be said about the Boost string algorithm library, since, after all, it has been proposed for TR2, but that is not the focus of this document.

5 Boost.Tokenizer

Boost.Tokenizer is a library dedicated to string tokenization. It is made up of a view container, some iterators, and iterator views. Although I think the library may eventually be removed as the Boost string algorithm library matures and grows, it still contains valuable things to study and learn from.

5.1 Components

5.1.1 tokenizer

tokenizer is a view container that does not itself hold the data; it is defined in boost/tokenizer.hpp.

template <
    typename TokenizerFunc = char_delimiters_separator<char>,
    typename Iterator = std::string::const_iterator,
    typename Type = std::string
>
class tokenizer

TokenizerFunc: the splitter, a type that conforms to the TokenizerFunction concept.

Iterator: the iterator type used to traverse the sequence being split.

Type: the type used to hold each token.

5.1.2 token_iterator

token_iterator is the iterator used to access the split data; it is defined in boost/token_iterator.hpp.

template <class TokenizerFunc, class Iterator, class Type>
class token_iterator
    : public iterator_facade<
          token_iterator<TokenizerFunc, Iterator, Type>
        , Type
        , typename detail::minimum_category<
              forward_traversal_tag
            , typename iterator_traversal<Iterator>::type
          >::type
        , const Type&
      >

This is its declaration. If you are not familiar with the new-style iterator concepts and template metaprogramming, this code may be hard to follow, but that does not matter. Put simply, deriving from iterator_facade makes it easy to obtain iterator behavior: the derived class only needs to implement a few required member functions to fit the concept. The template metaprogram that follows merely guarantees that our iterator is at most a forward iterator; even if a random access iterator is supplied, token_iterator presents itself as a forward iterator. Typically, deriving from iterator_facade requires declaring the class iterator_core_access as a friend.

token_iterator holds the following data members:

TokenizerFunc f_;
Iterator begin_;
Iterator end_;
bool valid_;
Type tok_;

They are, respectively: the splitter object, the start position, the end position, a validity flag, and the current token.

In the implementation of token_iterator, these two functions are the important ones:

void increment() {
    BOOST_ASSERT(valid_);
    valid_ = f_(begin_, end_, tok_);
}

const Type& dereference() const {
    BOOST_ASSERT(valid_);
    return tok_;
}

They correspond to the iterator's increment and dereference operations, respectively. We can see that dereferencing only returns a const reference, so it has little overhead. The cost of incrementing depends on the implementation of the splitter or, more precisely, on the implementation of the splitter's operator().

5.1.3 The concept model of the splitter class (TokenizerFunc)
