Basics of matching strings with regular expressions
Introduction
A feature in a real project needs to parse strings that follow certain specific patterns. In the existing code base, some functions do this by checking for specific characters one at a time. This approach has several drawbacks:
- The logic is error-prone
- Checks on boundary conditions are easy to miss
- The code is complex, hard to understand, and hard to maintain
- Poor performance
The code base contains one .cpp file with more than two thousand lines of code, and a single method in it spends more than 400 lines parsing strings, comparing characters one by one. It is painful to read. In addition, many of the comments are out of date and the coding style varies from place to place, which suggests that many people have worked on it.
Continuing down this old road was basically not an option, so I naturally thought of regular expressions. I had no practical experience with them, especially with writing matching rules, so I first searched the Internet to get a general idea. The results from Baidu were disappointing, though. (Whenever I look for technical knowledge there, the results are heartbreaking: copies of the same article over and over. For everyday queries Baidu is still acceptable.) In the end I gave up on Baidu's results and found some introductory videos on sites outside the firewall (a proxy is required).
This article is a summary of the basics of writing regular expressions to match strings. It consists of the following two parts:
- Basic rules for matching strings
- Regular Expression matching, search, and substitution
The regular expression grammar described in this article is ECMAScript, and the programming language used is C++. Other grammars and languages are not covered.
Basic rules for matching strings
1. Match fixed strings
regex e("abc");
2. Match fixed strings, case insensitive
regex e("abc", regex_constants::icase);
3. Match a fixed string followed by any one character, case insensitive
regex e("abc.", regex_constants::icase); // . matches any one character except newline
4. Match 0 or 1 occurrence of a character
regex e("abc?"); // ? matches zero or one of the preceding character
5. Match 0 or more occurrences of a character
regex e("abc*"); // * matches zero or more of the preceding character
6. Match 1 or more occurrences of a character
regex e("abc+"); // + matches one or more of the preceding character
7. Match characters from a specific set
regex e("ab[cd]*"); // [...] matches any one character inside the square brackets
8. Match characters not in a specific set
regex e("ab[^cd]*"); // [^...] matches any one character not inside the square brackets
9. Match characters from a set a specific number of times
regex e("ab[cd]{3}"); // {n} matches the preceding item exactly n times, here 3
10. Match characters from a set a number of times within a range
regex e("ab[cd]{3,}"); // {n,} matches the preceding item n or more times, here 3 or more
regex e("ab[cd]{3,5}"); // {n,m} matches the preceding item n to m times, here 3 to 5 (closed interval)
11. Match either of two alternatives
regex e("abc|de[fg]"); // | matches the pattern on either side of it
12. Groups
regex e("(abc)de+"); // () defines a sub-group
13. Back-references to sub-groups
regex e("(abc)de+\\1"); // () defines a sub-group; \1 matches the content of the first group again at this position
regex e("(abc)c(de+)\\2\\1"); // \2 matches the content of the second group again at this position
14. Match the start of a string
regex e("^abc."); // ^ is the beginning of the string: finds a match that starts the string with abc
15. Match the end of a string
regex e("abc.$"); // $ is the end of the string: finds abc plus one character at the very end
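As a quick sanity check of the rules above, here is a minimal sketch that tries a few of the patterns against sample strings. The helper names full_match, found_in, and basic_rules_hold are made up for this illustration, and so are the sample strings; only regex, regex_match, regex_search, and regex_constants::icase come from the standard library.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the whole of `text` matches `pattern`.
bool full_match(const std::string& pattern, const std::string& text,
                std::regex::flag_type flags = std::regex_constants::ECMAScript) {
    return std::regex_match(text, std::regex(pattern, flags));
}

// Hypothetical helper: true when some substring of `text` matches `pattern`.
bool found_in(const std::string& pattern, const std::string& text) {
    return std::regex_search(text, std::regex(pattern));
}

// Spot checks for some of the numbered rules; returns true when all of them hold.
bool basic_rules_hold() {
    return full_match("abc.", "ABCx", std::regex_constants::icase)  // rule 3: icase, . is any one character
        && full_match("abc?", "ab")                 // rule 4: the c is optional
        && !full_match("abc?", "abcc")              //         but at most one c is allowed
        && full_match("ab[cd]{3,5}", "abcdc")       // rule 10: three to five of c or d
        && !full_match("ab[cd]{3,5}", "abcd")       //          two is too few
        && full_match("abc|de[fg]", "def")          // rule 11: the second alternative
        && full_match("(abc)de+\\1", "abcdeeabc")   // rule 13: \1 repeats group 1
        && found_in("^abc.", "abcd ef")             // rule 14: match at the start
        && !found_in("^abc.", "xabcd")              //          abc is not at the start
        && found_in("abc.$", "xxabcd");             // rule 15: match at the end
}
```

Calling basic_rules_hold() from main should return true; changing a sample string is an easy way to see which rule rejects it.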
These are the most basic matching patterns. To match a character that has a special meaning in a pattern, escape it with \. For example, to match a literal "." you put \ in front of it, and because \ is itself an escape character in a C++ string literal, the pattern string is written "\\.". If these basic rules are not enough, you can refer to this link. Once the pattern string is written, the regular expression is used for matching, searching, or replacement.
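The escaping rule can be sketched as follows. The function names and the file-name pattern are assumptions chosen only for illustration. The second function uses a C++11 raw string literal, which is not discussed in this article but is one way to avoid doubling the backslash.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when `name` looks like "<letters>.<letters>".
bool looks_like_file_name(const std::string& name) {
    // "\\." in the C++ literal becomes \. in the pattern, i.e. a literal dot.
    static const std::regex e("[a-z]+\\.[a-z]+");
    return std::regex_match(name, e);
}

// The same pattern written as a raw string literal: no doubled backslash.
bool looks_like_file_name_raw(const std::string& name) {
    static const std::regex e(R"([a-z]+\.[a-z]+)");
    return std::regex_match(name, e);
}
```

With the dot escaped, "main.cpp" is accepted and "mainxcpp" is rejected; with an unescaped "." both would be accepted.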
Regular Expression matching, search, and substitution
After writing a pattern string, you apply it to the string you want to examine. There are three ways to do this: match (regex_match), search (regex_search), and replace (regex_replace).
Matching is simple. Pass the string to be examined and the regular expression to regex_match; it returns a bool indicating whether the string satisfies the pattern. The pattern must match the entire string str.
bool match = regex_match(str, e); // match the entire string str
Searching looks for a substring that satisfies the pattern anywhere in the string. That is, if str contains a substring that matches the pattern, true is returned.
bool match = regex_search(str, e); // search str for a substring that matches e
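The difference between the two calls is easiest to see side by side. This is a small sketch; the function names and the pattern "abc" are chosen only for the illustration.

```cpp
#include <regex>
#include <string>

// True only when the whole string is exactly a match for the pattern.
bool whole_string_is_abc(const std::string& str) {
    static const std::regex e("abc");
    return std::regex_match(str, e);
}

// True when the pattern matches anywhere inside the string.
bool contains_abc(const std::string& str) {
    static const std::regex e("abc");
    return std::regex_search(str, e);
}
```

For "xxabcxx", whole_string_is_abc returns false because of the surrounding characters, while contains_abc returns true.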
However, in many cases a bool result is not enough: we need the matched substrings themselves. To get them, group the parts of interest in the pattern string (see rules 12 and 13 under "Basic rules for matching strings"), then pass an smatch object to regex_search to obtain the substring matched by each sub-group.
smatch m;
bool found = regex_search(str, m, e);
for (size_t n = 0; n < m.size(); ++n) {
    cout << "m[" << n << "].str()=" << m[n].str() << endl;
}
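To make the contents of the smatch concrete, here is a sketch that copies the groups of the first match into a vector. The helper name first_match_groups is hypothetical, and the pattern in the usage note is only an example.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns m[0], m[1], ... for the first match of `e`
// in `str`, or an empty vector when nothing matches.
std::vector<std::string> first_match_groups(const std::string& str,
                                            const std::regex& e) {
    std::vector<std::string> groups;
    std::smatch m;
    if (std::regex_search(str, m, e)) {
        // m[0] is the whole match; m[1] onward are the sub-groups in order.
        for (std::size_t n = 0; n < m.size(); ++n) {
            groups.push_back(m[n].str());
        }
    }
    return groups;
}
```

For the string "xx abcdee yy" and the pattern "(abc)(de+)", the result is "abcdee", "abc", "dee".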
Replacement also relies on the groups defined in the pattern string.
cout << regex_replace(str, e, "$1 is on $2");
Here, " is on " is inserted between the substrings matched by group 1 and group 2.
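A minimal sketch of such a replacement, assuming a pattern of two lower-case words separated by one space. The function name put_on and the pattern are made up for this example; the article does not say which pattern e holds here.

```cpp
#include <regex>
#include <string>

// Hypothetical example: rewrites every "<word> <word>" pair as "<word> is on <word>".
std::string put_on(const std::string& str) {
    static const std::regex e("([a-z]+) ([a-z]+)");
    // $1 and $2 stand for the text matched by group 1 and group 2.
    return std::regex_replace(str, e, "$1 is on $2");
}
```

put_on("book table") yields "book is on table". Text that does not match the pattern is copied through unchanged, and every match in the string is replaced.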
All three functions have many overloads to cover different situations.
Practice
Requirement: Find strings of the form sectionA("sectionB") or sectionA ("sectionB"), and extract sectionA and sectionB separately. sectionA and sectionB contain no digits, may use upper- or lower-case letters, and each contains at least one character.
Analysis: According to the requirement, the string divides roughly into two parts, sectionA and sectionB. This calls for grouping.
Step 1: Write the pattern string that matches a single section
[a-zA-Z]+
Step 2: A space may appear between sectionA and sectionB. For the moment, assume there is at most one space.
\\s?
Combining the two pieces gives a pattern string that roughly meets our needs. But how do we split it into two groups?
[a-zA-Z]+\\s?[a-zA-Z]+
This pattern is not yet correct: it does not account for the parentheses and quotation marks around sectionB, and it has no groups. According to the grouping rules, each section must be wrapped in ().
regex e("([a-zA-Z]+)\\s?\\(\"([a-zA-Z]+)\"\\)");
Here, the \\( and \" that follow \\s?, together with the \" and \\) at the end, escape the parentheses and quotation marks that surround sectionB. The parentheses are escaped for the regular expression, and the quotation marks are escaped for the C++ string literal.
After these steps, you can use regex_match to check whether the whole string matches. If it does, use regex_search to extract the groups.
if (regex_match(str, e)) {
    smatch m;
    auto found = regex_search(str, m, e);
    for (size_t n = 0; n < m.size(); ++n) {
        cout << "m[" << n << "].str()=" << m[n].str() << endl;
    }
} else {
    cout << "Not matched" << endl;
}
The first element of m, m[0], is the entire substring that matched; it is followed by the substrings matched by group 1 and group 2.
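The practice steps can also be wrapped into one function. This is only a sketch: the name parse_sections and the pair-based output are assumptions made for the illustration. It uses the regex_match overload that takes an smatch, so a single call both validates the whole string and fills in the groups, instead of calling regex_match and then regex_search.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: on success stores sectionA and sectionB in `out`
// and returns true; returns false when `str` does not fit the pattern.
bool parse_sections(const std::string& str,
                    std::pair<std::string, std::string>& out) {
    static const std::regex e("([a-zA-Z]+)\\s?\\(\"([a-zA-Z]+)\"\\)");
    std::smatch m;
    if (!std::regex_match(str, m, e)) {
        return false;
    }
    out = std::make_pair(m[1].str(), m[2].str());  // group 1 and group 2
    return true;
}
```

Both foo("bar") and foo ("bar") are accepted and give sectionA = foo, sectionB = bar. A string with two spaces before the parenthesis, or with a digit in either section, is rejected.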
Summary
This article described the basics of matching strings with regular expressions. I hope it is helpful to you.