Introduction to Boost tokenizer
-------------------------------
1. Introduction
Boost.Tokenizer provides a way to break a sequence of characters into a series of tokens. You can also supply a TokenizerFunction that defines the delimiters on which the sequence is split. If none is specified, the default splits on whitespace and discards punctuation.
2. A few simple examples
The following is a simple example:
// simple_example_1.cpp
#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>

int main()
{
    using namespace std;
    using namespace boost;
    string s = "This is, a test";
    tokenizer<> tok(s);
    for (tokenizer<>::iterator beg = tok.begin(); beg != tok.end(); ++beg) {
        cout << *beg << "\n";
    }
}
The result is as follows:
This
is
a
test
Punctuation marks have been filtered out here.
The following is an example of splitting at fixed character offsets:
// simple_example_3.cpp
#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>

int main()
{
    using namespace std;
    using namespace boost;
    string s = "12252001";
    int offsets[] = {2, 2, 4};  // three offsets are specified here
    offset_separator f(offsets, offsets + 3);
    tokenizer<offset_separator> tok(s, f);
    for (tokenizer<offset_separator>::iterator beg = tok.begin(); beg != tok.end(); ++beg) {
        cout << *beg << "\n";
    }
}
The result is as follows:
12
25
2001
3. What is a TokenizerFunction?
A TokenizerFunction is a function object that decides where one token ends and the next begins. Three TokenizerFunction models are currently provided:
* escaped_list_separator is mainly used to parse strings in CSV format.
  explicit escaped_list_separator(char e = '\\', char c = ',', char q = '\"')
  escaped_list_separator(string_type e, string_type c, string_type q)
* offset_separator is mainly used to split at fixed offsets (step sizes).
  template <typename Iter>
  offset_separator(Iter begin, Iter end, bool bwrapoffsets = true, bool breturnpartiallast = true)
* char_separator is mainly used to split on specific delimiter characters.
  explicit char_separator(const char* dropped_delims,
                          const char* kept_delims = "",
                          empty_token_policy empty_tokens = drop_empty_tokens)
4. A simple example of parsing/etc/passwd
/**
 * @auth lemo.lu
 * @date 2011.11.03
 *
 * Example of Boost tokenizer template usage.
 * This example uses a delimiter-based separator.
 */

// STL headers
#include <iostream>  // std::cout
#include <string>    // std::string, std::getline
#include <fstream>   // std::ifstream

// Boost
#include <boost/tokenizer.hpp>  // boost::tokenizer

int main()
{
    std::ifstream passwdFile("/etc/passwd");

    typedef boost::tokenizer<boost::char_separator<char> > passwdTokenizer;

    // Set a TokenizerFunction: drop the delimiter ":", keep no delimiters,
    // and keep empty tokens (passwd fields may be empty)
    boost::char_separator<char> tokenSep(":", "", boost::keep_empty_tokens);

    // passwd field names
    static const char* passwd_st[] = {
        "Account", "Password", "UID", "GID", "GECOS", "Dir", "Shell"
    };

    // iterate over the passwd file line by line
    std::string passwdString;
    while (std::getline(passwdFile, passwdString)) {
        if (passwdString.empty())
            continue;

        // the line string must outlive the tokenizer, which stores
        // iterators into it
        passwdTokenizer tok(passwdString, tokenSep);

        int passwd_c = 0;
        for (passwdTokenizer::iterator curTok = tok.begin();
             curTok != tok.end() && passwd_c < 7; ++curTok)
            std::cout << passwd_st[passwd_c++] << ":" << *curTok << std::endl;
        std::cout << "---------------------" << std::endl;
    }
    return 0;
}
Some results are as follows:
Account:root
Password:x
UID:0
GID:0
GECOS:root
Dir:/root
Shell:/bin/bash
---------------------
Account:daemon
Password:x
UID:1
GID:1
GECOS:daemon
Dir:/usr/sbin
Shell:/bin/sh
---------------------
5. Reference
http://www.boost.org/doc/libs/1_47_0/libs/tokenizer/