A regular expression is a pattern-matching notation that is commonly used in text-processing programs. The grep tool that we use so often, for example, and the Perl language both rely on regular expressions. Handling regular expressions in traditional C++ has always been troublesome, which made C++ a laughingstock among fans of many other languages. The situation is different now, because we have Boost.
Boost is a template-based, open-source collection of libraries. Its many sub-libraries efficiently handle all kinds of problems, such as string splitting, formatting, threading, and so on. Every C++ enthusiast should know Boost. For C++ Builder users who are already skilled with the VCL, I think adding Boost will make them even more powerful.
In general, Boost is very simple to use, not much different from the STL. The regular expression library is not quite so easy, however, because it has to be compiled separately. I will describe how to use it in detail.
If you don't have Boost yet, you can download the latest version from www.boost.org; the author uses version 1.30. Unzip the downloaded zip package [1] into any directory you like, such as D:\boost.
Compiling the regular expression library
As mentioned earlier, this library must be compiled separately before it can be used. Why isn't it compiled and shipped with the rest? The main reasons are that different compilers need different link library files, and that the link libraries are too large. At the command line, go to the [%boost]\libs\regex\build directory and run make -fbcb6.mak to start compiling. Note that if BCB5 is also installed on your computer, you must make sure the path is set to the directory containing BCB6's bcc32.exe; otherwise you may end up using BCB5's make program, which will compile the library but produce files that cannot be used.
Compilation takes a while, so be patient. When it finishes, a bcb6 directory will have been created under [%boost]\libs\regex\build, containing many lib and DLL files. Copy all the DLL files to the Windows system directory and all the lib files to the BCB6 lib directory. If you don't want the trouble of copying files, you can add the install parameter when compiling, like this: make -fbcb6.mak install. The author prefers the former approach, though, because it shows exactly which files were generated. With compilation complete, you can now see the magic of Boost.
A test program
Create a console program in BCB6 and write the following code:

    #include <deque>
    #include <iostream>
    #include <iterator>
    #include <string>
    #include <algorithm>
    #include <boost/regex.hpp>

    int main()
    {
        using namespace boost;
        using namespace std;
        // Match href="..." and capture the text between the quotes
        regex expression("\\s+href\\s*=\\s*\"([^\"]*)\"", regbase::normal | regbase::icase);
        string s = "<a href=\"index.html\"></a>";
        deque<string> result;
        regex_split(std::back_inserter(result), s, expression);
        copy(result.begin(), result.end(), ostream_iterator<string>(cout, "\n"));
        // Wait for input so the console window stays open
        int c;
        cin >> c;
        return 0;
    }
In the BCB6 project properties, set the Lib path and the Include path to the directory where you installed Boost, then run the program. You will see:
index.html
As you can see, index.html has been extracted from the string. Why is that?
The core part of the code is:
    regex expression("\\s+href\\s*=\\s*\"([^\"]*)\"", regbase::normal | regbase::icase);
This line sets how the string is to be matched. The jumble of characters is hard to follow; if you do not know the rules of regular expressions, it looks completely unintelligible.
regbase::normal | regbase::icase sets the parsing options; the details can be found in the Boost documentation.
Rules for the writing of regular expressions
For the detailed rules, see the Boost documentation. Here is a brief summary:
Symbol | Meaning
. (dot) | Matches any single character, except a newline
* | Closure: the preceding item repeated any finite number of times, including zero
+ | The preceding item repeated a finite number of times, but at least once
{} | Specifies the allowed number of repetitions
Examples | ba* matches b, ba, baa, baaa, etc.; ba+ matches ba, baa, baaaaaaaaa, etc.; ba{1,5} matches ba, baa, baaa, baaaa, baaaaa
\ | The escape character. It has many uses, which vary with the option settings; most commonly it works much as it does in C
\s | Matches whitespace
\w | Matches a word character
\d | Matches a digit
() | Two uses: (1) grouping, for example (ab)* matches ab, abab, ababab, etc.; (2) marking a capture, meaning the characters matched inside the () are the ones finally extracted
With the table above, it is easy to work out what the cryptic expression in the test program means.
A practical example
A while ago there was a post on CSDN asking about a file structured something like this:
@People {
Age=19
Speek= "Hay,,how is You"
}
The question was how to split the text to obtain the name after the @, the property name and property value on either side of each =, and the quoted text inside the {}.
Regular expressions are well suited to this problem.
Based on the analysis, we can construct a matching rule like this:
"@(.*?)\\s*{" matches the word at the beginning, that is, the name after the @. How to construct the other two rules is left for the reader to think about.
In this way the example can easily be taken apart.
Performance analysis
The discussion above has shown how powerful Boost is, but what about its performance? Let's take apart a complex piece of HTML and see how long it takes.
To save space, the HTML is not listed here. It is a 186K HTML file generated by Word, and it uses a lot of <table> tags, so my test extracts the width attribute of every <table> tag. The test code is as follows:
    #include <deque>
    #include <iostream>
    #include <iterator>
    #include <string>
    #include <algorithm>
    #include <boost/regex.hpp>
    #include <vcl.h>

    int main()
    {
        using namespace boost;
        using namespace std;
        TStringList* html = new TStringList();
        html->LoadFromFile("d:\\1.htm");
        regex expression("\\s+width=([^\"]*)\\s+", regbase::normal | regbase::icase);
        DWORD start = GetTickCount();
        // Process the file one line at a time
        for (int n = 0; n < html->Count; n++)
        {
            string s = html->Strings[n].c_str();
            deque<string> result;
            regex_split(std::back_inserter(result), s, expression);
            copy(result.begin(), result.end(), ostream_iterator<string>(cout, "\n"));
            result.clear();
        }
        // Elapsed time in milliseconds
        start = GetTickCount() - start;
        delete html;
        cout << start;
        int c;
        cin >> c;
        return 0;
    }
The output is 671 milliseconds, with 1072 width values extracted. As we can see, Boost is quite efficient: it is still slower than some other languages, but it can meet most programming requirements. Also, the author's computer is not a particularly powerful one; I believe any current mainstream machine will do better than this result.
Conclusion
In fact, what is shown above is only the tip of the iceberg; unless you experience it yourself, you can hardly imagine how powerful Boost is. Boost contains many useful libraries, for formatted output, string splitting, type conversion, and so on. They are easy to use, and you can consult the Boost documentation for details. Two more of these libraries have to be compiled separately: the python and thread libraries. Compiling them requires a dedicated tool, jam, so jam itself has to be built first, and building jam is not a pleasant task. More trouble arises if you have several compilers installed. Interested readers can try it for themselves.
However, BCB6 does not support all of the Boost libraries. As the compiler support table provided by Boost shows [2], quite a few libraries are not supported by BCB6. The best-supported compiler is gcc/g++, though even it does not support everything. Hopefully the next C++ compiler that Borland releases will support more of the C++ standard.
[1] There are in fact other package types, but on Windows it is best to download the zip package.
[2] The compiler support table provided by Boost is for BCB5; the author has not tested BCB6 support in detail. Interested readers can run the test code included with Boost.
http://blog.csdn.net/xiang_521/article/details/8890084
Using the Boost regular expression library with C++ Builder 6