How does the regular expression pattern match strings?

Source: Internet
Author: User
Regular expressions (often shortened to "regex") are a concept from computer science. They are commonly used to search for and replace text that conforms to a pattern (rule). Many programming languages support regular expressions for string manipulation. This article covers the basics of matching strings with regular expression patterns.

A feature in a real project needed to parse strings that follow certain specific patterns. The existing code base implemented comparable functionality by checking for specific characters one at a time. That approach has several disadvantages:

It is easy to make logic mistakes.

It is easy to miss boundary-condition checks.

The code is complex and hard to understand and maintain.

Performance is poor.

One .cpp file in the code base runs to about 2000 lines, and a single method in it spends more than 400 lines just parsing a string, comparing one character after another. It is painful to read. Many of the comments are out of date, and the coding style varies so much that the file has clearly passed through many hands. In a situation like this, continuing down the old road is hardly an option, and regular expressions are the natural alternative.

This article summarizes the basics of writing regular expressions to match strings. It is divided into two main sections:

Basic rules for matching strings

Regular match, find and replace

The regular expression grammar described in this article is ECMAScript, and the programming language used is C++. Other grammars and languages are not covered. The snippets assume #include <regex> and using namespace std;.

Basic rules for matching strings

1. Match a fixed string

regex e("abc");

2. Match fixed string, case insensitive

regex e("abc", regex_constants::icase);

3. Match a fixed string followed by any one character, case-insensitive

regex e("abc.", regex_constants::icase);  // .  Any character except newline, exactly one character

4. Match zero or one of the preceding character

regex e("abc?");    // ?  Zero or one of the preceding character (here 'c')

5. Match zero or more of the preceding character

regex e("abc*");    // *  Zero or more of the preceding character (here 'c')

6. Match one or more of the preceding character

regex e("abc+");    // +  One or more of the preceding character (here 'c')

7. Match any character from a set

regex e("ab[cd]*");    // [...]  Any one character inside the square brackets

8. Match any character not in a set

regex e("ab[^cd]*");    // [^...]  Any one character not inside the square brackets

9. Match characters from a set an exact number of times

regex e("ab[cd]{3}");    // {n}  Exactly n (here 3) of the preceding element

10. Match characters from a set a number of times within a range

regex e("ab[cd]{3,}");    // {n,}  n or more (here 3 or more) of the preceding element

regex e("ab[cd]{3,5}");   // {n,m}  Between n and m, inclusive (here 3 to 5), of the preceding element


11. Match one of several alternatives

regex e("abc|de[fg]");    // |  Either the pattern on the left of | or the one on the right

12. Grouping

regex e("(abc)de+");    // ()  Parentheses define a group (sub-match)

13. Refer back to a group (backreference)

regex e("(abc)de+\\1");    // \1  Matches the same text that group 1 matched, at this position

regex e("(abc)c(de+)\\2\\1");  // \2  Matches the same text that group 2 matched, at this position


14. Match at the beginning of a string

regex e("^abc.");    // ^  Beginning of the string: matches only where the string starts with "abc" plus one character


15. Match at the end of a string

regex e("abc.$");    // $  End of the string: matches only where the string ends with "abc" plus one character
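The quantifier, character-set and alternation rules above can be tried out with a small wrapper around regex_match. This is only a sketch: full_match is a hypothetical helper name, not something from the original code base, and it assumes the default ECMAScript grammar.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of the original project): returns true when
// the whole of `text` matches `pattern` under the ECMAScript grammar.
bool full_match(const std::string& pattern, const std::string& text,
                bool ignore_case = false) {
    std::regex_constants::syntax_option_type flags =
        std::regex_constants::ECMAScript;
    if (ignore_case) {
        flags |= std::regex_constants::icase;  // rule 2: case-insensitive
    }
    const std::regex e(pattern, flags);
    return std::regex_match(text, e);
}
```

For instance, full_match("abc?", "ab") and full_match("ab[cd]{3,5}", "abcdcd") both return true, while full_match("abc+", "ab") returns false because + requires at least one 'c'.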


These are the most basic building blocks for writing patterns. A character that has a special meaning in a pattern must be escaped with \ to be matched literally: to match a literal ".", for example, the pattern needs \. , which is written "\\." inside a C++ string literal. If these basic rules do not cover a particular need, consult a fuller reference on the ECMAScript regular expression grammar. With the basic patterns understood, the next step is to use them to match, search, or replace.
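Escaping is easiest to see side by side. The sketch below writes the same pattern twice, once with doubled backslashes in an ordinary string literal and once as a C++11 raw string literal. The function names and the lower-case "name.ext" pattern are made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: does `text` look like "name.ext" in lower-case letters?
// "\\." in an ordinary literal reaches the regex engine as \. (a literal dot).
bool is_dotted_name(const std::string& text) {
    static const std::regex pattern("[a-z]+\\.[a-z]+");
    return std::regex_match(text, pattern);
}

// The same pattern as a raw string literal, so each backslash is written once.
bool is_dotted_name_raw(const std::string& text) {
    static const std::regex pattern(R"([a-z]+\.[a-z]+)");
    return std::regex_match(text, pattern);
}
```

Both functions accept "main.cpp" and reject "mainxcpp". Without the escape, the unescaped . would have matched the 'x' as well.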

Regular match, find and replace

Once the pattern is written, it has to be applied to the target string. There are three ways to do this: match (regex_match), search (regex_search), and replace (regex_replace).

Matching is straightforward: pass the target string and the regex to regex_match, which returns a bool indicating whether the string satisfies the pattern. The pattern must match the entire string str.

bool match = regex_match(str, e);    // Match the entire string str



Searching looks for a substring anywhere in the string that satisfies the pattern. It returns true as long as some part of str matches.

bool match = regex_search(str, e);    // Find a substring of str that matches e
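The difference between the two calls shows up as soon as the target string has extra text around the match. A minimal sketch, reusing the "abc." pattern from rule 3 (the wrapper names are hypothetical):

```cpp
#include <regex>
#include <string>

// Illustrative pattern: "abc" followed by any one character.
static const std::regex kAbcDot("abc.");

// True only when the whole of `str` matches the pattern.
bool matches_whole(const std::string& str) {
    return std::regex_match(str, kAbcDot);
}

// True when any substring of `str` matches the pattern.
bool matches_somewhere(const std::string& str) {
    return std::regex_search(str, kAbcDot);
}
```

With the input "xxabcdxx", matches_somewhere returns true but matches_whole returns false. Both return true for "abcd".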


In many cases, however, a bool is not enough: we need the matching substring itself. To get it, group the parts of interest in the pattern (see rule 12 in "Basic rules for matching strings") and pass an smatch object to regex_search. After a successful search, the smatch holds the substring matched by each group.

smatch m;
bool found = regex_search(str, m, e);
for (size_t n = 0; n < m.size(); ++n) {
    cout << "m[" << n << "].str() = " << m[n].str() << endl;
}
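The same loop can be wrapped so that the captured groups are returned rather than printed. This is a sketch under assumptions: capture_groups is a hypothetical helper, and returning an empty vector on failure is just one possible convention.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns m[0], m[1], ... for the first match of
// `pattern` in `str`, or an empty vector when nothing matches.
std::vector<std::string> capture_groups(const std::string& str,
                                        const std::string& pattern) {
    const std::regex e(pattern);
    std::smatch m;
    std::vector<std::string> groups;
    if (std::regex_search(str, m, e)) {
        for (std::size_t n = 0; n < m.size(); ++n) {
            groups.push_back(m[n].str());  // m[0] is the whole match
        }
    }
    return groups;
}
```

Calling capture_groups("user@example", "([a-z]+)@([a-z]+)") yields three entries: the whole match "user@example", then "user" and "example".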


Replacement is likewise driven by the groups in the pattern: the replacement string can refer to the text matched by group n as $n.

cout << regex_replace(str, e, "$1 is on $2");


In the result, the substrings captured by groups 1 and 2 appear on either side of the word "is".
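As a concrete illustration of group references in a replacement string, the sketch below rewrites "name@domain" text. The pattern and the describe_address name are assumptions chosen for the example, not taken from the original project.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rewrites every "name@domain" in `str` as
// "name is on domain", using $1 and $2 to refer to the two groups.
std::string describe_address(const std::string& str) {
    static const std::regex e("([a-z]+)@([a-z]+)");
    return std::regex_replace(str, e, "$1 is on $2");
}
```

By default regex_replace rewrites every match and copies the unmatched text through unchanged, so a string with no match comes back as it was.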

All three functions have several overloads to suit different situations.
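Two of those variations are sketched below: passing a format flag to regex_replace, and calling regex_search on an iterator range instead of a whole string. The helper names are hypothetical and the digit patterns are only for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces only the first run of digits with '#',
// using the format_first_only flag.
std::string mask_first_number(const std::string& str) {
    static const std::regex digits("[0-9]+");
    return std::regex_replace(str, digits, "#",
                              std::regex_constants::format_first_only);
}

// Hypothetical helper: searches only the part of `str` from `offset` onward,
// using the iterator-range overload of regex_search.
bool has_digit_from(const std::string& str, std::string::size_type offset) {
    static const std::regex digit("[0-9]");
    if (offset > str.size()) {
        return false;  // nothing to search past the end
    }
    const auto first =
        str.cbegin() + static_cast<std::string::difference_type>(offset);
    return std::regex_search(first, str.cend(), digit);
}
```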

A practical example

Requirements: find strings of the form sectionA("sectionB") or sectionA ("sectionB"), and extract sectionA and sectionB. sectionA and sectionB contain no digits, may mix upper-case and lower-case letters, and are each at least one character long.

Analysis: the input breaks down into two parts, sectionA and sectionB, so grouping is needed.

Step one: write a pattern that matches a single section

[a-zA-Z]+

Step two: a space may appear between sectionA and sectionB. Assume there is at most one space

\\s?

Combining the two gives a pattern that covers the overall shape. But how do we split the result into two groups?

[a-zA-Z]+\\s?[a-zA-Z]+

This is not enough on its own. According to the grouping rules, each group must be delimited with ()

regex e("([a-zA-Z]+)\\s?\\(\"([a-zA-Z]+)\"\\)");

Here the \\( and \\) after \\s? match the literal parentheses around sectionB, and each \" puts a literal quotation mark into the C++ string.

With the pattern complete, use regex_match to check the whole string; if it matches, use regex_search to extract the groups

if (regex_match(str, e)) {
    smatch m;
    auto found = regex_search(str, m, e);
    for (size_t n = 0; n < m.size(); ++n) {
        cout << "m[" << n << "].str() = " << m[n].str() << endl;
    }
} else {
    cout << "not matched" << endl;
}

The first element of m, m[0], is the entire substring that matched; it is followed by the substrings matched by group 1 and group 2.
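The whole example can also be packaged as one function. The sketch below is a variation, not the original project's code: parse_sections is a hypothetical name, and it relies on the regex_match overload that takes an smatch, so a single call both validates the string and captures the groups. A pair of empty strings signals "not matched", which is unambiguous here because each section must be at least one character long.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: splits  sectionA("sectionB")  into (sectionA, sectionB).
// Returns a pair of empty strings when `str` does not have that form.
std::pair<std::string, std::string> parse_sections(const std::string& str) {
    static const std::regex e("([a-zA-Z]+)\\s?\\(\"([a-zA-Z]+)\"\\)");
    std::smatch m;
    if (!std::regex_match(str, m, e)) {
        return std::pair<std::string, std::string>();
    }
    return std::make_pair(m[1].str(), m[2].str());
}
```

For the input foo ("Bar") this returns the pair ("foo", "Bar"). An input such as foo1("bar") is rejected because digits are not allowed in either section.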

That covers the basics of matching strings with regular expression patterns. I hope it is helpful.
