Regular expressions: grouping, subpatterns (submatches), non-capturing subpatterns, and backreferences

Source: Internet
Author: User

Regular expressions provide many quantifiers that specify how many times something may match, and by default a quantifier applies to a single character. Sometimes we need to repeat several characters together as a unit. To do that, we group them: enclose the characters in parentheses to form a subexpression (also called a group). A quantifier placed after the group then applies to the whole subexpression, and other operations can be applied to it as well. In effect, the characters inside the parentheses are treated as a single unit.

Grouping example

For example, to search for strings that contain one or more consecutive occurrences of win, you can do this:

<?php
$str = "this is win winwindows!";
preg_match_all("/(win)+/", $str, $marr);
var_dump($marr); // $marr[0]: full matches; $marr[1]: text captured by (win)

Can we match multiple characters without grouping? The character class we met earlier, [win]+, does match winwin, but it means one or more characters drawn from w, i, and n, with no constraint on their order. So it also matches wwin, www, inw, and so on: any run made up of these three characters counts as a match.
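To make the difference concrete, here is a minimal sketch that runs both patterns against the same sample string. The variable names $classMatches and $groupMatches are chosen for this illustration only.

```php
<?php
$str = "this is win winwindows!";

// Character class: one or more of the characters w, i, n, in any order.
preg_match_all('/[win]+/', $str, $classMatches);

// Group: one or more repetitions of the exact sequence "win".
preg_match_all('/(win)+/', $str, $groupMatches);

print_r($classMatches[0]); // i, i, win, winwin, w
print_r($groupMatches[0]); // win, winwin
```

The character class also picks up the stray i in "this" and "is" and the w in "dows", while the group only matches whole repetitions of win.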

Why does the result of the (win)+ example contain two arrays? This is the subpattern (submatch) at work. By default, parentheses do more than combine several characters into one unit: the text matched by the parenthesized part is also stored in a temporary buffer so that later parts of the regular expression can refer to it. In the example above we never refer to it again. So how do we stop the subexpression from capturing? Just add "?:" immediately after the opening parenthesis, so that (win)+ becomes (?:win)+. This is a non-capturing group.
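A minimal sketch of the non-capturing form, reusing the sample string from the grouping example. The variable names $captured and $uncaptured are illustrative.

```php
<?php
$str = "this is win winwindows!";

// Capturing group: index 1 holds what the last repetition of (win) captured.
preg_match_all('/(win)+/', $str, $captured);

// Non-capturing group: same overall matches, but no index 1 is produced.
preg_match_all('/(?:win)+/', $str, $uncaptured);

var_dump(count($captured));   // int(2): full matches plus group 1
var_dump(count($uncaptured)); // int(1): full matches only
```

Both calls find the same full matches; only the extra capture array disappears.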

What is the advantage of a non-capturing group? As shown above, it reduces the number of captures the engine has to record. So whenever you do not need the captured text, add the non-capturing prefix "?:" to the group; this saves some memory and can make matching slightly faster.

As mentioned above, the text captured by a group is stored in a buffer by default so that it can be used later. When is that useful? This is where references inside a regular expression come in. Each captured submatch is stored in order, numbered from left to right by the position of its opening parenthesis in the pattern. Numbering starts at 1, and the references \1 through \9 are the ones typically used. A later part of the expression can then refer to the stored value; this is called a backreference.
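As a small sketch of the numbering, the pattern below has two capturing groups and refers back to both of them. The sample string and the variable name $m are made up for this illustration.

```php
<?php
$str = "abba noon xyzw";

// Group 1 captures the first character, group 2 the second.
// \2 then \1 require the same two characters again in reverse order,
// so the pattern matches four-character palindromes.
preg_match_all('/(\w)(\w)\2\1/', $str, $m);

print_r($m[0]); // abba, noon
print_r($m[1]); // a, n   (group 1)
print_r($m[2]); // b, o   (group 2)
```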

The following example finds a string in which the word add appears twice, with the two occurrences separated by digits rather than adjacent to each other.

<?php
$str = "add123456addasdf";
preg_match_all('/(add)\d+\1/', $str, $marr); // \1 must repeat what (add) captured
var_dump($marr);

Backreferences are often used for special matching situations, such as finding a string that is repeated but not adjacent to its first occurrence, or finding the content between a pair of HTML tags, which is very common when parsing HTML. (Note: a group that is referenced later must capture, so it cannot carry the "?:" prefix.) A typical use:

<?php
$str = file_get_contents('http://blog.chacuo.net/');
preg_match_all('/<(\S+)[^>]*>[^<]*<\/\1>/', $str, $marr);
var_dump($marr);
// (\S+) captures the tag name: one or more non-whitespace characters.
// [^>]* matches any remaining attributes inside the opening tag, up to ">".
// [^<]* matches the content between <tag ...> and </tag>, which ends at the next "<".
// In the final <\/\1>, "\/" is an escaped "/" and "\1" is a backreference to (\S+).
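To try the same pattern without a network request, here is a minimal sketch that runs it on a short inline string. The markup in $html is made up for this illustration and deliberately includes one mismatched pair (<b> closed by </i>).

```php
<?php
// Sample markup, invented for this illustration.
$html = '<p class="x">Hello</p><b>bold</i><em>text</em>';

// Same pattern as above: the closing tag must repeat the captured tag name.
preg_match_all('/<(\S+)[^>]*>[^<]*<\/\1>/', $html, $marr);

print_r($marr[0]); // <p class="x">Hello</p>, <em>text</em>
print_r($marr[1]); // p, em
```

The mismatched <b>...</i> pair is skipped because \1 requires the closing tag name to equal the captured opening tag name. Since [^<]* excludes "<", this pattern only finds elements that contain no nested tags.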

The above covers three important regular expression features, grouping, backreferences, and non-capturing groups, with explanations and examples. I hope it is helpful to anyone who needs it. Feedback and discussion are welcome!
