PHP-PCRE regular expression performance

Some items that may appear in a PCRE pattern are more efficient than others. For example, a character class such as [aeiou] is more efficient than a set of alternatives such as (a|e|i|o|u). In general, the simplest construction that provides the required behaviour is usually the most efficient. Jeffrey Friedl's book Mastering Regular Expressions contains a lot of discussion about optimizing regular expressions for performance.
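The difference can be observed with a rough timing comparison. The following is a minimal sketch, not a rigorous benchmark: the subject string and the variable names are illustrative assumptions, and the measured times will vary with the PHP version, the PCRE version, and whether the PCRE JIT is enabled.

```php
<?php
// Illustrative subject; any reasonably long text would do.
$subject = str_repeat('The quick brown fox jumps over the lazy dog. ', 2000);

$patterns = [
    'character class' => '/[aeiou]/',
    'alternation'     => '/(a|e|i|o|u)/',
];

$counts = [];
foreach ($patterns as $label => $pattern) {
    $start = microtime(true);
    // preg_match_all() returns the number of full-pattern matches.
    $counts[$label] = preg_match_all($pattern, $subject);
    $elapsedMs = (microtime(true) - $start) * 1000;
    printf("%-16s %d matches in %.2f ms\n", $label, $counts[$label], $elapsedMs);
}
```

Both patterns find the same matches; only the work needed to find them differs.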

When a pattern begins with .* and the PCRE_DOTALL option is set (the s modifier in PHP), PCRE implicitly anchors the pattern, because it can match only at the start of the subject string. If PCRE_DOTALL is not set, PCRE cannot make this optimization, because the . metacharacter does not then match a newline. If the subject string contains newlines, the pattern may match from the character immediately following one of them rather than from the very start. For example, the pattern (.*) second matches the subject "first\nand second" (where \n stands for a newline character), and the first captured substring is "and". To find this match, PCRE has to retry the match starting after every newline in the subject.
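The matching behaviour described above can be checked directly with preg_match(). This is a small sketch; the variable names are illustrative.

```php
<?php
$subject = "first\nand second";

// Without the s modifier, . does not match "\n", so the match
// can only begin after the newline.
preg_match('/(.*) second/', $subject, $m);
var_dump($m[1]);   // the text captured on the second line

// With the s modifier (PCRE_DOTALL), . also matches "\n", so the
// match begins at the very start of the subject.
preg_match('/(.*) second/s', $subject, $ms);
var_dump($ms[1]);  // the capture now spans the newline
```

The first call captures "and"; the second captures "first\nand".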

If you use such a pattern with subject strings that do not contain newlines, you get the best performance by setting PCRE_DOTALL or by starting the pattern with ^.* to indicate explicit anchoring. That saves PCRE from having to scan along the subject looking for a newline to restart at.
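As a sketch of the two options, the variants below are applied to a subject without newlines, where all of them give the same result. The subject and the array keys are illustrative assumptions.

```php
<?php
$subject = 'first and second';   // no newline in the subject

$variants = [
    'unanchored'      => '/(.*) second/',
    'PCRE_DOTALL (s)' => '/(.*) second/s',
    'explicit ^'      => '/^(.*) second/',
];

$captures = [];
foreach ($variants as $label => $pattern) {
    preg_match($pattern, $subject, $m);
    $captures[$label] = $m[1];
    printf("%-16s captured: %s\n", $label, $m[1]);
}
```

Note that these forms are interchangeable only when the subject has no newlines. On a subject such as "first\nand second", the ^ variant finds no match at all and the s variant captures across the newline, so the change is an optimization only if the input is known to be a single line.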

Beware of patterns that contain nested indefinite repeats. They can take a long time to run when applied to a string that does not match. Consider the pattern fragment (a+)*.

This pattern can match "aaaa" in 16 different ways (counting every way of dividing a leading run of zero to four characters among the repetitions), and the number increases very rapidly as the string gets longer. (The * repeat can match 0, 1, 2, 3, or 4 times, and for each of those cases other than 0, the + repeats can match different numbers of characters.) When the remainder of the pattern is such that the entire match is going to fail, PCRE has in principle to try every possible variation, and this can take an extremely long time.

An optimization catches some of the simpler cases, such as (a+)*b, where a literal character follows. Before embarking on the standard matching procedure, PCRE checks that there is a "b" later in the subject string, and if there is not, it fails the match immediately. When there is no following literal, however, this optimization cannot be used. You can see the difference by comparing the behaviour of (a+)*\d with the pattern above. Applied to a whole line of "a" characters, (a+)*b reports failure almost instantly, whereas (a+)*\d takes an appreciable time once the subject is longer than about 20 characters.
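The comparison can be tried with a short script. This is a sketch under stated assumptions: timeMatch() is a hypothetical helper written for this example, the subject length of 20 is chosen to keep the run short, and the exact timings and the way the slow case ends depend on the PCRE version, the pcre.backtrack_limit setting, and whether the JIT is enabled.

```php
<?php
// Hypothetical helper: runs one match and reports the result,
// the PCRE error code, and the elapsed time in milliseconds.
function timeMatch(string $pattern, string $subject): array
{
    $start = microtime(true);
    $result = preg_match($pattern, $subject);
    $elapsedMs = (microtime(true) - $start) * 1000;
    return [$result, preg_last_error(), $elapsedMs];
}

$subject = str_repeat('a', 20);  // a line consisting only of "a"

// A literal "b" follows the group, so PCRE can fail early.
[$rLiteral, $eLiteral, $msLiteral] = timeMatch('/(a+)*b/', $subject);

// No literal follows, so PCRE has to explore the alternatives.
[$rDigit, $eDigit, $msDigit] = timeMatch('/(a+)*\d/', $subject);

printf("(a+)*b   result=%s error=%d time=%.2f ms\n",
    var_export($rLiteral, true), $eLiteral, $msLiteral);
printf("(a+)*\\d  result=%s error=%d time=%.2f ms\n",
    var_export($rDigit, true), $eDigit, $msDigit);
```

In PHP the slow case may return false instead of 0 when the backtracking limit is reached, in which case preg_last_error() returns PREG_BACKTRACK_LIMIT_ERROR (or PREG_JIT_STACKLIMIT_ERROR under the JIT). Checking for a false result with === is therefore worthwhile whenever a pattern contains nested repeats.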
