[Original] Getting Started with Regular Expression Pattern Matching

Source: Internet
Author: User
Tags: first, string

Introduction

A feature in a real project needed to parse strings that follow certain patterns. In the existing code base, the parts already implemented did this by checking for specific characters one at a time. This approach has several disadvantages:

    • It is easy to make logic mistakes.
    • It is easy to miss boundary-condition checks.
    • The code is complex and hard to understand and maintain.
    • Performance is poor.

One .cpp file in the code base is 2000 lines long, and a single method in it spends more than 400 lines just parsing strings! It compares characters one after another, which is painful to read. Many of the comments are out of date, and the coding style varies so much that the code has clearly passed through many hands.

With code like this there was little point in continuing down the old road, so regular expressions were the natural choice. I had no practical experience with regular expressions, though, and only a smattering of knowledge about writing matching rules. My first step was to look for material online to get a general picture, but the Baidu results were very disappointing. (Whenever I look for more specialized knowledge, the Baidu results are frustrating because they are all copies of the same text, although Baidu is still fine for everyday use.) So I gave up on the Baidu results and searched outside the firewall (FQ), where I also found some fairly basic videos (which also require FQ).

This article summarizes the basics of writing regular expressions that match strings. It is divided into two main parts:

    1. Basic rules for matching strings
    2. Regular match, find and replace

The regular expression grammar described in this article is ECMAScript, and the programming language used is C++. Other grammars and languages are not covered.

Basic rules for matching strings

1. Match a fixed string

regex e("abc");

2. Match a fixed string, case-insensitive

regex e("abc", regex_constants::icase);

3. Match a fixed string plus one more character, case-insensitive

regex e("abc.", regex_constants::icase); // . matches any character except a newline, exactly 1 character
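As a minimal sketch of rules 1 to 3, the functions below wrap each pattern in std::regex_match, which requires the whole input to match. The function names are made up for illustration and are not from the original code.

```cpp
#include <regex>
#include <string>

// Rule 1: the whole input must be exactly "abc" (case-sensitive).
bool is_abc(const std::string& s) {
    static const std::regex e("abc");
    return std::regex_match(s, e);
}

// Rule 2: same pattern, but icase ignores letter case.
bool is_abc_any_case(const std::string& s) {
    static const std::regex e("abc", std::regex_constants::icase);
    return std::regex_match(s, e);
}

// Rule 3: "abc" in any case, then exactly one more non-newline character.
bool is_abc_plus_one(const std::string& s) {
    static const std::regex e("abc.", std::regex_constants::icase);
    return std::regex_match(s, e);
}
```

For instance, is_abc("ABC") is false while is_abc_any_case("ABC") is true, and is_abc_plus_one("abc") is false because the . needs one more character.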

4. Match 0 or 1 of a character

regex e("abc?"); // ? zero or one of the preceding character (the c before the ?)

5. Match 0 or more of a character

regex e("abc*"); // * zero or more of the preceding character (the c before the *)

6. Match 1 or more of a character

regex e("abc+"); // + one or more of the preceding character (the c before the +)
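The three quantifiers in rules 4 to 6 differ only in how many copies of the preceding c they accept. A small sketch, with hypothetical function names, that makes the difference testable:

```cpp
#include <regex>
#include <string>

// Rule 4: "ab" followed by zero or one 'c'.
bool ab_then_optional_c(const std::string& s) {
    return std::regex_match(s, std::regex("abc?"));
}

// Rule 5: "ab" followed by zero or more 'c'.
bool ab_then_any_c(const std::string& s) {
    return std::regex_match(s, std::regex("abc*"));
}

// Rule 6: "ab" followed by one or more 'c'.
bool ab_then_some_c(const std::string& s) {
    return std::regex_match(s, std::regex("abc+"));
}
```

With these, "ab" satisfies rules 4 and 5 but not rule 6, and "abcc" satisfies rules 5 and 6 but not rule 4.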

7. Match characters from a specific set

regex e("ab[cd]*"); // [...] any one character inside the square brackets

8. Match characters not in a specific set

regex e("ab[^cd]*"); // [^...] any one character not inside the square brackets

9. Match characters from a specific set, a fixed number of times

regex e("ab[cd]{3}"); // {n} the item before {} must appear exactly 3 times

10. Match characters from a specific set, with a count range

regex e("ab[cd]{3,}"); // {n,} the item before {} must appear 3 or more times
regex e("ab[cd]{3,5}"); // {n,m} the item before {} must appear 3 to 5 times, inclusive
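Rules 7 to 10 combine a character set with a repeat count. The sketch below checks each pattern against a whole string with std::regex_match. The function names are illustrative only.

```cpp
#include <regex>
#include <string>

// Rule 7: "ab" followed by any number of c or d.
bool only_cd_after_ab(const std::string& s) {
    return std::regex_match(s, std::regex("ab[cd]*"));
}

// Rule 8: "ab" followed by any number of characters that are neither c nor d.
bool no_cd_after_ab(const std::string& s) {
    return std::regex_match(s, std::regex("ab[^cd]*"));
}

// Rule 9: "ab" followed by exactly three characters from {c, d}.
bool exactly_three_cd(const std::string& s) {
    return std::regex_match(s, std::regex("ab[cd]{3}"));
}

// Rule 10: "ab" followed by three or more characters from {c, d}.
bool at_least_three_cd(const std::string& s) {
    return std::regex_match(s, std::regex("ab[cd]{3,}"));
}

// Rule 10: "ab" followed by three to five characters from {c, d}.
bool three_to_five_cd(const std::string& s) {
    return std::regex_match(s, std::regex("ab[cd]{3,5}"));
}
```

For example, "abcdc" passes exactly_three_cd, while "abcd" (two characters after ab) and "abcdcd" (four) do not.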

11. Match either of two rules

regex e("abc|de[fg]"); // | matches either one of the rules on its two sides

12. Match groups

regex e("(abc)de+"); // () marks a sub-group

13. Match the content of sub-groups

regex e("(abc)de+\\1"); // () marks a sub-group; \1 matches the content of the first group at this position
regex e("(abc)c(de+)\\2\\1"); // \2 matches the content of the second group at this position
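Alternation and back-references are easier to see with concrete inputs. The following is a sketch under the assumption that the whole string must match. The function names are hypothetical.

```cpp
#include <regex>
#include <string>

// Rule 11: either "abc", or "de" followed by f or g.
bool abc_or_def(const std::string& s) {
    return std::regex_match(s, std::regex("abc|de[fg]"));
}

// Rule 13: \1 must repeat exactly what group 1 captured ("abc" here).
bool repeats_first_group(const std::string& s) {
    return std::regex_match(s, std::regex("(abc)de+\\1"));
}

// Rule 13 with two groups: \2 repeats group 2, then \1 repeats group 1.
bool repeats_both_groups(const std::string& s) {
    return std::regex_match(s, std::regex("(abc)c(de+)\\2\\1"));
}
```

In the last pattern, group 2 can capture "de", "dee", and so on, and \2 must repeat the same text: "abccdeedeeabc" matches, but "abccdeedeabc" does not because "dee" is followed by "de".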

14. Match the beginning of a string

regex e("^abc."); // ^ beginning of the string: the match must start at the beginning, with abc

15. Match the end of a string

regex e("abc.$"); // $ end of the string: the match must end at the end, with abc plus one character
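Anchors only make a visible difference when searching inside a longer string, so this sketch uses std::regex_search rather than std::regex_match. The function names are made up for the example.

```cpp
#include <regex>
#include <string>

// Rule 14: true when s begins with "abc" plus one more character.
bool starts_with_abc_x(const std::string& s) {
    return std::regex_search(s, std::regex("^abc."));
}

// Rule 15: true when s ends with "abc" plus one more character.
bool ends_with_abc_x(const std::string& s) {
    return std::regex_search(s, std::regex("abc.$"));
}
```

Here "abcd and more" satisfies only the first function and "xx abcd" only the second.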

The above are the most basic ways of writing a matching pattern. If you want to match a character that has a special meaning in a pattern, you need to escape it with \. For example, to match a literal "." in a string, put a \ in front of the . in the pattern (written as \\. inside a C++ string literal). If the basic rules above do not meet your specific needs, you can refer to this link. With the basic patterns understood, the next step is to use regular expressions to match, search, or replace.
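To illustrate the escaping rule, here is a small sketch comparing an unescaped and an escaped dot. The patterns and function names are my own examples, not taken from the project.

```cpp
#include <regex>
#include <string>

// Unescaped: '.' is a wildcard, so this accepts "a.b" and also "axb".
bool a_any_b(const std::string& s) {
    return std::regex_match(s, std::regex("a.b"));
}

// Escaped: "\\." in the C++ literal reaches the regex as \. (a literal dot).
bool a_dot_b(const std::string& s) {
    return std::regex_match(s, std::regex("a\\.b"));
}
```

The double backslash is needed because the C++ compiler consumes one level of escaping before the regex engine sees the pattern.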

Regular match, find and replace

Once the pattern is written, it has to be applied to the string you want to check. There are three operations: match (regex_match), search (regex_search), and replace (regex_replace).

Matching is straightforward: pass the target string and the pattern to regex_match, which returns a bool indicating whether the target string satisfies the pattern. The entire string str must match.

bool match = regex_match(str, e); // Match the entire string str

Searching looks for a substring anywhere in the string that satisfies the pattern. That is, it returns true as long as some part of str satisfies the pattern.

bool match = regex_search(str, e); // Find a substring of str that matches the rules of e
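The difference between the two calls can be sketched with the same pattern applied both ways. The function names are illustrative.

```cpp
#include <regex>
#include <string>

// regex_match: the pattern must cover the entire string.
bool is_exactly_abc(const std::string& str) {
    return std::regex_match(str, std::regex("abc"));
}

// regex_search: it is enough for the pattern to occur somewhere in the string.
bool contains_abc(const std::string& str) {
    return std::regex_search(str, std::regex("abc"));
}
```

For "xxabcxx", is_exactly_abc returns false and contains_abc returns true.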

However, in many cases a bool result is not enough and we need the matched substrings themselves. In that case, group the parts of the pattern you want to capture (see rule 12 in "Basic rules for matching strings"). Once an smatch is passed to regex_search, you get the string matched by each sub-group.

smatch m;
bool found = regex_search(str, m, e);

for (size_t n = 0; n < m.size(); ++n)
{
    cout << "m[" << n << "].str() = " << m[n].str() << endl;
}
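A self-contained version of the loop above might look like the following sketch, which collects the sub-matches into a vector instead of printing them. The function name find_groups and the pattern in the usage note are assumptions for illustration.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Returns m[0] (the whole match) followed by one entry per capture group,
// or an empty vector when nothing in str matches e.
std::vector<std::string> find_groups(const std::string& str, const std::regex& e) {
    std::smatch m;
    std::vector<std::string> out;
    if (std::regex_search(str, m, e)) {
        for (std::size_t n = 0; n < m.size(); ++n) {
            out.push_back(m[n].str());
        }
    }
    return out;
}
```

Calling find_groups("xx abcdee yy", std::regex("(abc)(de+)")) yields "abcdee", "abc", and "dee", in that order.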

Replacement also works from a pattern that contains groups.

cout << regex_replace(str, e, "$1 is on $2");

Here, "is on" is placed between the strings that matched group 1 and group 2.
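As a sketch of replacement with groups, the function below rewrites a simple address-like string. The pattern (\w+)@(\w+)\.com and the function name are assumptions chosen for the example, since the article does not show the pattern it used here.

```cpp
#include <regex>
#include <string>

// Rewrites every "name@domain.com" in text as "name is on domain".
// $1 and $2 in the format string stand for capture groups 1 and 2.
std::string describe_addresses(const std::string& text) {
    static const std::regex e("(\\w+)@(\\w+)\\.com");
    return std::regex_replace(text, e, "$1 is on $2");
}
```

With this, describe_addresses("mail: user@example.com!") gives "mail: user is on example!". Text outside the match is copied through unchanged.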

All three functions have many overloads to cover different situations.

In practice

Requirements: find strings of the form sectionA("sectionB") or sectionA ("sectionB"), and extract sectionA and sectionB. Neither sectionA nor sectionB contains digits; the letters may be upper or lower case, and each section has at least one character.

Analysis: according to the requirements, the string divides into two parts, sectionA and sectionB. This calls for grouping.

Step one: write a pattern that matches one section

[a-zA-Z]+

Step two: a space may appear between sectionA and sectionB. Assume there is at most 1 space

\\s?

Combining the two gives a pattern for the text we need. But how do we organize it into two groups?

[a-zA-Z]+\\s?[a-zA-Z]+

This alone will not do. According to the grouping rules, groups must be marked with ()

regex e("([a-zA-Z]+)\\s?\\(\"([a-zA-Z]+)\"\\)");

Here the \\( and \" after \\s?, and the matching \" and \\) at the end, are escapes needed to match the parentheses and quotation marks around sectionB.

Once the pattern is written, use regex_match to test the string. If it matches, go on to use regex_search to extract the substrings.

if (regex_match(str, e))
{
    smatch m;
    auto found = regex_search(str, m, e);
    for (size_t n = 0; n < m.size(); ++n)
    {
        cout << "m[" << n << "].str() = " << m[n].str() << endl;
    }
}
else
{
    cout << "not matched" << endl;
}

The first element of m is the entire matched substring, followed by the substrings that matched group 1 and group 2.
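Putting the pieces together, a compact sketch of the whole exercise could look like this. It uses the regex_match overload that takes an smatch, so one call both validates the string and captures the groups. That is a simplification of the two-call approach above, and the function name parse_sections is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Parses sectionA("sectionB") or sectionA ("sectionB").
// Returns {sectionA, sectionB}, or an empty vector when str does not match.
std::vector<std::string> parse_sections(const std::string& str) {
    static const std::regex e("([a-zA-Z]+)\\s?\\(\"([a-zA-Z]+)\"\\)");
    std::smatch m;
    if (!std::regex_match(str, m, e)) {
        return {};
    }
    // m[0] is the whole match; m[1] and m[2] are the two groups.
    return {m[1].str(), m[2].str()};
}
```

For example, parse_sections("Foo (\"Bar\")") returns "Foo" and "Bar", while an input with a digit in sectionA or with two spaces before the parenthesis returns an empty vector.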
