Performance analysis of Boost and C Regex on short target string regular expression matching

Yesterday we summarized the performance of several regular expression libraries on long target strings and concluded that Boost regex performs best. Today I applied it to the project, and the performance loss caused by long-string matching essentially disappeared. The current scale is not very large, but Boost should fully meet the requirements at any scale we can foresee.

However, once long-string matching is no longer the bottleneck for either Boost or C, what remains is short-string matching, and here Boost turns out to be not much better than C. For example, over 10000+ short-string matches (successful or not), the total time for Boost regex plus the system's other modules and the total time for C regex plus the other modules differ very little.

Admittedly, at that point the project was using Boost regex without pre-compilation, recompiling the pattern string on every match.

The tests below show that, although pre-compiling the pattern string made little difference for C regex on the long strings discussed yesterday, the difference cannot be ignored on short strings, and the impact on Boost regex is even greater.

1. Boost regex performance analysis

First, we replace yesterday's test string with a short one, and then measure the performance difference between the non-pre-compiled and pre-compiled pattern in Boost regex.

The test code is as follows. The macro definition #define PRE_COMP (0) selects no pre-compilation, and #define PRE_COMP (1) selects pre-compilation. Both modes perform 10000 matches in the loop.

    /**
     * Program:
     *   This program tests Boost regex performance for a short target string
     * Platform:
     *   Ubuntu 14.04, g++-4.8.2
     * History:
     *   weizheng 2014.11.07 1.0
     */
    #include <boost/regex.hpp>
    #include <sys/time.h>
    #include <cstdio>
    #include <string>

    /** choose whether to pre-compile the pattern string: 1 or 0 */
    #define PRE_COMP (1)

    const int LOOP_COUNT = 10000;

    /* main */
    int main()
    {
    #if PRE_COMP
        /* compile the pattern once, outside the loop */
        boost::regex pattern("commonquery.jsp.*keyName=YwdjywcQueryMain&queryAction=cxywcdj");
        std::string target = "/YGFMISWeb/faces/query/commonquery.jsp?^Query/keyName=YwdjywcQueryMain&queryAction=cxywcdj";
    #endif

        /** record the start time */
        struct timeval tv_start, tv_end;
        gettimeofday(&tv_start, NULL);

        int count = 0;
        for (int i = 0; i < LOOP_COUNT; i++) {
    #if !PRE_COMP
            /* compile the pattern again on every iteration */
            boost::regex pattern("commonquery.jsp.*keyName=YwdjywcQueryMain&queryAction=cxywcdj");
            std::string target = "/YGFMISWeb/faces/query/commonquery.jsp?^Query/keyName=YwdjywcQueryMain&queryAction=cxywcdj";
    #endif
            if (boost::regex_search(target, pattern)) {
                count++;
            }
        }

        /** record the end time */
        gettimeofday(&tv_end, NULL);
        unsigned long time_used = ((tv_end.tv_sec * 1000000 + tv_end.tv_usec)
                                 - (tv_start.tv_sec * 1000000 + tv_start.tv_usec)) / 1000;

        printf("used: %lu ms\n", time_used);
        printf("matched %d times\n", count);
        return 0;
    }

Without pre-compilation, the result is as follows:

    ~/test$ ./boost_regex_main
    used: 38 ms
    matched 10000 times

With pre-compilation, the result is much better:

    ~/test$ ./boost_regex_main
    used: 11 ms
    matched 10000 times

Therefore, with Boost regex you should always pre-compile the pattern string. Do not wait until match time to compile it.
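
Both test programs in this article measure elapsed time from two gettimeofday() samples. As a side note, that calculation can be factored into a small helper. The following is only a sketch in C; the name elapsed_ms is hypothetical and does not appear in the original programs.

```c
#include <sys/time.h>

/* Hypothetical helper (not from the original article): milliseconds
 * elapsed between two gettimeofday() samples. The seconds and
 * microseconds are subtracted separately, so a smaller tv_usec in
 * `end` than in `start` is handled correctly. */
long elapsed_ms(const struct timeval *start, const struct timeval *end)
{
    long long usec = (long long)(end->tv_sec - start->tv_sec) * 1000000LL
                   + (long long)(end->tv_usec - start->tv_usec);
    return (long)(usec / 1000);
}
```

Called as elapsed_ms(&tv_start, &tv_end) after the loop, it would yield the value that the test programs print as "used: ... ms".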

2. C regex performance analysis

The test is similar to the Boost one: a macro selects whether the pattern string is pre-compiled. The test code is as follows. For details about match_pre_comp() and match(), see the previous article.

    /**
     * Program:
     *   This program tests C regex performance for a short target string
     * Platform:
     *   Ubuntu 14.04, gcc-4.8.2
     * History:
     *   weizheng 2014.11.07 1.0
     */
    #include <sys/time.h>
    #include <stdio.h>
    #include "regex.h"

    /** choose whether to pre-compile the pattern string: 1 or 0 */
    #define PRE_COMP (0)
    #define LOOP_COUNT (10000)

    /* main */
    int main(void)
    {
        char pattern[] = "commonquery.jsp.*keyName=YwdjywcQueryMain&queryAction=cxywcdj";
        char target[] = "/YGFMISWeb/faces/query/commonquery.jsp?^Query/keyName=YwdjywcQueryMain&queryAction=cxywcdj";

    #if PRE_COMP
        /* compile the pattern once, outside the loop */
        regex_t oRegex;
        if (regcomp(&oRegex, pattern, 0))
            printf("regex compile error\n");
    #endif

        /** record the start time */
        struct timeval tv_start, tv_end;
        gettimeofday(&tv_start, NULL);

        /** matching */
        int count = 0;
        for (int i = 0; i < LOOP_COUNT; i++) {
    #if PRE_COMP
            if (match_pre_comp(&oRegex, target))
    #endif
    #if !PRE_COMP
            if (match(pattern, target))
    #endif
            {
                count++;
            }
        }

        /** record the end time */
        gettimeofday(&tv_end, NULL);
        unsigned long time_used = ((tv_end.tv_sec * 1000000 + tv_end.tv_usec)
                                 - (tv_start.tv_sec * 1000000 + tv_start.tv_usec)) / 1000;

    #if PRE_COMP
        regfree(&oRegex);
    #endif

        printf("used: %lu ms\n", time_used);
        printf("matched %d times\n", count);
        return 0;
    }

Without pre-compilation, the result is as follows:

    ~/test$ ./c_regex_main
    used: 52 ms
    matched 10000 times

With pre-compilation:

    ~/test$ ./c_regex_main
    used: 42 ms
    matched 10000 times

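
The C test program calls match() and match_pre_comp(), which are defined in the previous article and not reproduced here. As a rough illustration only, helpers of that shape might look like the sketch below, built on the POSIX regcomp(), regexec() and regfree() calls. The bodies, the return convention (1 for a match, 0 otherwise) and the REG_NOSUB flag are assumptions, not the author's actual code.

```c
#include <regex.h>   /* POSIX regcomp, regexec, regfree */
#include <stddef.h>

/* Assumed shape of match(): compile the pattern on every call, run one
 * match, then free the compiled regex. Returns 1 on a match, 0 on no
 * match or on a compile error. */
int match(const char *pattern, const char *target)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return 0;
    int rc = regexec(&re, target, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* Assumed shape of match_pre_comp(): the caller compiled the regex once
 * with regcomp(), so only the matching step runs here. */
int match_pre_comp(const regex_t *re, const char *target)
{
    return regexec(re, target, 0, NULL, 0) == 0;
}
```

The only difference between the two paths is where regcomp() and regfree() run: inside every call for match(), or once outside the loop for match_pre_comp(). That per-call compile cost is what the two timings above are measuring.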
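Compared with the long-string case, the gap between the two modes here is not small. What matters is the percentage: at a larger scale, if the run without pre-compilation took 10000 ms, the pre-compiled run would take about 8000 ms. We can also see that Boost and C regex do not differ much when neither pre-compiles the pattern string (relative, of course, to the gap seen on long strings).

3. Comparison of Boost regex and C regex

The following table compares Boost regex and C regex, using the timings above (10000 matches on the short target string):

    Library        Without pre-compilation    With pre-compilation
    Boost regex    38 ms                      11 ms
    C regex        52 ms                      42 ms
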
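
The percentage argument can be checked with a few lines of C, using the four timings reported in the test output above. The function names are hypothetical, and the projection assumes that the ratio between the two modes stays constant as the workload grows, which the article does not measure.

```c
/* Percentage of run time saved by pre-compiling, given the timing
 * without pre-compilation and the timing with it (both in ms). */
double saved_percent(double no_precomp_ms, double precomp_ms)
{
    return (no_precomp_ms - precomp_ms) / no_precomp_ms * 100.0;
}

/* Projected pre-compiled run time for a larger workload.
 * ASSUMPTION: the measured ratio precomp_ms / no_precomp_ms
 * stays the same at the larger scale. */
double projected_precomp_ms(double big_no_precomp_ms,
                            double no_precomp_ms, double precomp_ms)
{
    return big_no_precomp_ms * (precomp_ms / no_precomp_ms);
}
```

With the C regex timings, saved_percent(52, 42) is roughly 19%, and projected_precomp_ms(10000, 52, 42) is roughly 8077 ms, in line with the 8000 ms estimate. With the Boost timings, saved_percent(38, 11) is roughly 71%, which is why pre-compilation matters even more there.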
Our final conclusion, therefore, is to use the Boost regex library with a pre-compiled pattern string.

