Regular expressions (also called regex or regexp) are a concept from computer science. They are commonly used to search for and replace text that matches a pattern (rule), and many programming languages support them for string manipulation. This article shares the basics of matching strings with regular expression patterns.
In a real project, I needed to implement a feature that parses strings following certain patterns. The part of this functionality already in the existing code base works by checking for specific characters one at a time. This approach has several drawbacks:
The logic is error-prone.
It is easy to miss boundary-condition checks.
The code is complex, hard to understand, and hard to maintain.
Poor performance
The code base has a .cpp file of about 2,000 lines, and one method in it spends more than 400 lines just parsing strings by comparing characters one after another. It is really unpleasant to read. Many of the comments are out of date, and the coding style varies so much that the file has clearly passed through many hands. There was little point continuing down that road, so regular expressions were the natural choice.
This article summarizes the basics of writing regular expressions to match strings. It is divided into two main sections:
Basic rules for matching strings
Matching, searching, and replacing with regular expressions
The regular expression grammar described in this article is ECMAScript, which is the default grammar of std::regex. The programming language is C++. Other grammars are not covered.
Basic rules for matching strings
1. Match a fixed string
std::regex e("abc");
2. Match a fixed string, case-insensitive
std::regex e("abc", std::regex_constants::icase);
3. Match a fixed string followed by any one character, case-insensitive
std::regex e("abc.", std::regex_constants::icase); // . matches any single character except a newline
4. Match zero or one occurrence of a character
std::regex e("abc?"); // ? matches zero or one of the preceding character
5. Match zero or more occurrences of a character
std::regex e("abc*"); // * matches zero or more of the preceding character
6. Match one or more occurrences of a character
std::regex e("abc+"); // + matches one or more of the preceding character
7. Match any character from a specific set
std::regex e("ab[cd]*"); // [...] matches any one character inside the square brackets
8. Match any character not in a specific set
std::regex e("ab[^cd]*"); // [^...] matches any one character NOT inside the square brackets
9. Match characters from a set an exact number of times
std::regex e("ab[cd]{3}"); // {n} matches the preceding element exactly n times (here 3)
10. Match characters from a set a range of times
std::regex e("ab[cd]{3,}");  // {n,} matches the preceding element n or more times (here 3 or more)
std::regex e("ab[cd]{3,5}"); // {n,m} matches it between n and m times, inclusive (here 3 to 5)
11. Match one of several alternatives
std::regex e("abc|de[fg]"); // | matches either the expression on its left or the one on its right
12. Grouping
std::regex e("(abc)de+"); // () defines a sub-group (capture group)
13. Refer back to a sub-group
std::regex e("(abc)de+\\1"); // \1 matches the same text that the first group matched
std::regex e("(abc)c(de+)\\2\\1"); // \2 matches the text of the second group at this position
14. Match at the beginning of a string
std::regex e("^abc."); // ^ anchors at the beginning of the string: "abc" plus one character must start the string
15. Match at the end of a string
std::regex e("abc.$"); // $ anchors at the end of the string: "abc" plus one character must end the string
These are the most basic matching patterns. To match a character that has a special meaning, escape it with a backslash. For example, to match a literal "." the pattern must contain \. (written "\\." inside a C++ string literal). If the basic rules above do not cover your needs, you can refer to this link. Once you understand the basic patterns, the next step is to use them to match, search, or replace.
Matching, searching, and replacing with regular expressions
Once the pattern string is written, it has to be applied to the target string. There are three ways to do this: matching (std::regex_match), searching (std::regex_search), and replacing (std::regex_replace).
Matching is the most direct. Pass the target string and the regex to std::regex_match, and it returns a bool indicating whether the string satisfies the pattern. The match must cover the entire string str.
bool match = std::regex_match(str, e); // match the entire string str
Searching looks for a substring of str that satisfies the pattern. It returns true as long as any part of str matches.
bool found = std::regex_search(str, e); // find a substring of str that matches e
In many cases a bool is not enough, because we also need the matched substrings. To get them, define groups in the pattern string (see rule 12 under "Basic rules for matching strings"), then pass a std::smatch object to std::regex_search. After the call, the smatch object holds the whole match and the text matched by each group.
std::smatch m;
bool found = std::regex_search(str, m, e);
for (std::size_t n = 0; n < m.size(); ++n) {
    std::cout << "m[" << n << "].str() = " << m[n].str() << std::endl;
}
Replacement also relies on the groups defined in the pattern string.
std::cout << std::regex_replace(str, e, "$1 is $2");
Here each match is replaced by the text of group 1, then "is", then the text of group 2. In the format string, $1 and $2 refer to the groups.
Each of these three functions has many overloads that cover different situations.
Putting it into practice
Requirement: find strings of the form sectionA("sectionB") or sectionA ("sectionB"), and extract sectionA and sectionB. Neither sectionA nor sectionB contains digits, letters may be upper or lower case, and each has at least one character.
Analysis: the string naturally splits into two parts, sectionA and sectionB, so this calls for groups.
Step 1: write a pattern that matches one section (one or more letters of either case)
[a-zA-Z]+
Step 2: whitespace may appear between sectionA and the opening parenthesis. Assume there is at most one space.
\\s?
Combining these two pieces gives the core of the pattern we need. But how do we organize it into two groups?
[a-zA-Z]+\\s?[a-zA-Z]+
This clearly does not work yet. Following the grouping rules, each part must be wrapped in (), and the parentheses and quotation marks around sectionB must be matched as well:
std::regex e("([a-zA-Z]+)\\s?\\(\"([a-zA-Z]+)\"\\)");
Here the \\( and \\) after \\s? are escaped so that they match the literal parentheses around sectionB, and \" matches its quotation marks.
With the pattern complete, use std::regex_match to check the whole string. If it matches, use std::regex_search to extract the groups:
if (std::regex_match(str, e)) {
    std::smatch m;
    std::regex_search(str, m, e);
    for (std::size_t n = 0; n < m.size(); ++n) {
        std::cout << "m[" << n << "].str() = " << m[n].str() << std::endl;
    }
} else {
    std::cout << "not matched" << std::endl;
}
The first element of m (m[0]) is the whole substring that satisfies the pattern. It is followed by the substrings matched by group 1 (m[1]) and group 2 (m[2]).
That covers the basics of matching strings with regular expression patterns. I hope it is helpful.