Comments
Every programming language provides comments; without them, much code is hard to understand. A regular expression that runs to many lines with no comments is painful to read.
The format of an inline comment is (?#comment), where comment is the comment text. For example, ABC\d*(?#zero or more digits) is equivalent to ABC\d*. The comment must sit outside a character class, because inside [] the characters ( ? # are treated as literals.
However, inline comments like this make the pattern harder to read. More often you use a second style, which requires passing RegexOptions.IgnorePatternWhitespace to the function.
With that option, whitespace in the pattern is ignored and everything from # to the end of the line is a comment.
ABC   # This is the start
[\d]  # This means digit
*     # This means zero or more times
The specific usage is:
string pattern = @"ABC # This is the start
[\d] # This means digit
*";
string source = "This contains ABC123 and other ";
string result = Regex.Match(source, pattern, RegexOptions.IgnorePatternWhitespace).Value;
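The two comment styles can be compared side by side. The sketch below is illustrative rather than definitive: the variable names are made up for the example, the quantifier is kept on the same line as \d for readability, and the code is written as C# top-level statements, which assumes C# 9 or later.

```csharp
using System;
using System.Text.RegularExpressions;

string source = "This contains ABC123 and other ";

// Style 1: inline comment, no option needed.
string inlinePattern = @"ABC\d*(?#zero or more digits)";
string inlineResult = Regex.Match(source, inlinePattern).Value;

// Style 2: end-of-line comments, which need IgnorePatternWhitespace.
string verbosePattern = @"ABC   # literal prefix
\d*                             # zero or more digits";
string verboseResult = Regex.Match(
    source, verbosePattern, RegexOptions.IgnorePatternWhitespace).Value;

// Uncommented baseline for comparison.
string plainResult = Regex.Match(source, @"ABC\d*").Value;

Console.WriteLine($"{inlineResult} {verboseResult} {plainResult}");
```

All three calls should return the same substring, since comments and ignored whitespace do not change what the pattern matches.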
Branch
As mentioned above, inside parentheses the | symbol means "either one or the other". In fact, | can also be used at the top level of the whole regular expression to express branch conditions.
For example, AB[3-8]*|cd[1-3]+|EF[0-1]* has three branches, which are tried from left to right; if an earlier branch matches, the later branches are not tried. This is the same as || in logical tests in C#.
For example, given two Boolean variables a and b in if (a || b), when a is true, b is never evaluated.
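The left-to-right rule is easiest to see when one branch is a prefix of another. The following is a minimal sketch; the input strings and variable names are assumptions chosen for illustration, and the code uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// The first branch that matches wins, even if a later branch could match more.
string shortFirst = Regex.Match("abc", "ab|abc").Value;
string longFirst = Regex.Match("abc", "abc|ab").Value;

// The three-branch pattern from the text: only the second branch fits here.
string branch = Regex.Match("xx cd123", @"AB[3-8]*|cd[1-3]+|EF[0-1]*").Value;

Console.WriteLine($"{shortFirst} / {longFirst} / {branch}");
```

Because of this behavior, it is usually safer to list the longer or more specific branch first.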
Group
We use braces {} to specify how many times a single character occurs. For example, w{3} means that w appears three times in a row. So how do we repeat a whole string of characters? Think of operator precedence in arithmetic: multiplication and division have higher priority than addition and subtraction, and to make an addition or subtraction happen first you enclose it in parentheses. In the same way, we can enclose a string of characters in parentheses.
For example, (Arwen){3} means that Arwen is repeated three times, and (AB[CDE]*){4} means that AB[CDE]* is repeated four times. Whatever is enclosed in () is treated as a single unit. The technical term for it is a group.
In addition, in a replacement string you can use $ followed by a number to reference an earlier group. For example:
string str = @"123abc";
str = Regex.Replace(str, @"(\d+)\w+", @"$1 + 456"); // The result is 123 + 456
(\d+) matches 123 and captures it as a group. $1 references this group.
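A short sketch can show both points: that the quantifier applies to the whole group, and that $1 reuses the captured text. The variable names and the test string are illustrative assumptions; the code uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

string tripled = "ArwenArwenArwen";

// With parentheses, {3} repeats the whole word.
bool groupRepeats = Regex.IsMatch(tripled, @"^(Arwen){3}$");

// Without parentheses, {3} applies only to the final letter n.
bool lastLetterOnly = Regex.IsMatch(tripled, @"^Arwen{3}$");

// $1 in the replacement string stands for what group 1 captured.
string replaced = Regex.Replace("123abc", @"(\d+)\w+", "$1 + 456");

Console.WriteLine($"{groupRepeats} {lastLetterOnly} {replaced}");
```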
Backreferences
The name "backreference" sounds awkward, but the idea is simple: a group is used again later in the same match. What if I want to use a group again in a later part of the pattern? The plain way is to write the group out again.
For example, (Arwen){2}123abc(Arwen){3}. But that looks redundant, so a lazier approach exists: each group gets a name, and the name is used to refer to the group. If you do not name a group yourself, groups are named 1, 2, 3, ... by default, in order from left to right. In the preceding example, the first (Arwen) is named 1 and the second (Arwen) is named 2. If another group (James) followed, it would be 3.
The preceding example can be abbreviated as (Arwen){2}123abc\1{3}. The format for referencing a group is a backslash \ followed by the group name. Strictly speaking, \1 matches the exact text that group 1 captured, not the group's pattern.
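The numbered form can be tried out as follows. This is a sketch under stated assumptions: the input strings and variable names are invented for illustration, and the code uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// \1 stands for the text captured by the first group.
string pattern = @"(Arwen){2}123abc\1{3}";
bool abbreviated = Regex.IsMatch("ArwenArwen123abcArwenArwenArwen", pattern);

// A backreference repeats the captured text, not the pattern:
// (\d)\1 needs the same digit twice.
bool sameDigit = Regex.IsMatch("11", @"(\d)\1");
bool differentDigit = Regex.IsMatch("12", @"(\d)\1");

// A common use: finding an accidentally doubled word.
string doubled = Regex.Match("it was the the best", @"\b(\w+) \1\b").Value;

Console.WriteLine($"{abbreviated} {sameDigit} {differentDigit} {doubled}");
```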
You can also name each group explicitly. The format is (?<name>exp) or (?'name'exp), where name is the group name and exp is the content of the group. The ? is just a marker. To give the group (Arwen) the name a, write (?<a>Arwen). A named group is referenced with \k<a>, so the earlier example can also be written as (?<a>Arwen){2}123abc\k<a>{3}.
Besides using the default name or an explicit name, you can also leave a group without any name by using the format (?:exp). For example, changing the group (Arwen) to (?:Arwen) means that the group has no name, so you cannot reference it later. What is the point of this? Group names are useful beyond backreferences. After calling Regex.Match(string input, string pattern), you can analyze the result in more detail, and the group names are used there as well. If you want to ignore the group information at some position, you can explicitly mark that group as unnamed.
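Named and unnamed groups can be inspected on the Match object. The sketch below is illustrative only: the group name a comes from the text, while the input strings and variable names are assumptions. It uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Named group, referenced later in the pattern with \k<a>.
Match named = Regex.Match(
    "ArwenArwen123abcArwenArwenArwen",
    @"(?<a>Arwen){2}123abc\k<a>{3}");
string capturedByName = named.Groups["a"].Value;

// (?:...) groups without capturing, so (James) becomes group 1.
Match mixed = Regex.Match("Arwen James", @"(?:Arwen) (James)");
int groupCount = mixed.Groups.Count;   // group 0 (whole match) plus group 1
string firstGroup = mixed.Groups[1].Value;

Console.WriteLine($"{capturedByName} {groupCount} {firstGroup}");
```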
Zero-width assertions
In the simple matches so far, we described the substring to be matched directly, using metacharacters and qualifiers, and then matched it. That is the direct way of matching. You can also match indirectly, by describing what comes before or after the substring.
Hence the strange term "zero-width assertion". Zero width means that the expression consumes no characters, so its width is 0, like ^ and $ for the beginning and end of a string. It only denotes a position, such as where something starts or ends. An assertion is a term from logic; to put it simply, it states that something is true.
Zero-width positive lookahead (?=exp) asserts that the current position is followed by exp. For example, .*(?=fool) matches the text that is followed by fool. Given the string "you are a fool", the matched substring is "you are a " (including the trailing space).
Zero-width positive lookbehind (?<=exp) asserts that the current position is preceded by exp. For example, (?<=fuck).* matches the text that comes after fuck. Given the string fuckyou, the matching result is you.
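Both positive assertions can be checked with a few calls. This is a minimal sketch; the price string and the variable names are assumptions made up for the example, and the code uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Lookahead: match everything that is followed by "fool".
// The word "fool" itself is not part of the match.
string beforeFool = Regex.Match("you are a fool", @".*(?=fool)").Value;

// Lookbehind: match digits that are preceded by a dollar sign.
// The "$" itself is not part of the match.
string amount = Regex.Match("price: $42", @"(?<=\$)\d+").Value;

Console.WriteLine($"[{beforeFool}] [{amount}]");
```

In both cases the asserted text must be present, but it is left out of the returned value, which is what "zero width" means in practice.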
Since we can require that some text comes before or after, we can naturally also require that some text does not come before or after. This is the opposite case. The syntax is similar, except that = is replaced by !.
Zero-width negative lookahead (?!exp) asserts that the current position is not followed by exp. For example, Hao(?!\d) matches Hao only when it is not followed by a digit. For the string Hao123 there is no match. For the string "Hao 123" it matches Hao.
Zero-width negative lookbehind (?<!exp) asserts that the current position is not preceded by exp. For example, (?<!\d)Hao matches Hao only when there is no digit before it. For the string 123Hao the match fails. For the string "123 Hao" it matches Hao.
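The four cases described above can be verified directly. The sketch below uses the strings from the text; the variable names are illustrative, and the code uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Negative lookahead: Hao must not be followed by a digit.
bool digitAfter = Regex.IsMatch("Hao123", @"Hao(?!\d)");
string spaceAfter = Regex.Match("Hao 123", @"Hao(?!\d)").Value;

// Negative lookbehind: Hao must not be preceded by a digit.
bool digitBefore = Regex.IsMatch("123Hao", @"(?<!\d)Hao");
string spaceBefore = Regex.Match("123 Hao", @"(?<!\d)Hao").Value;

Console.WriteLine($"{digitAfter} {spaceAfter} {digitBefore} {spaceBefore}");
```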
In fact, terms such as "zero-width assertion" are confusing, and you can set them aside. Just think of the four cases in plain words: followed by, preceded by, not followed by, and not preceded by.
Greed and laziness
Because matching conditions are often loose, more than one result may be possible. For example, the string anrwen can be matched with .*n, and both an and anrwen satisfy the pattern.
Stopping as soon as one match is found is naturally very lazy, so it is called a lazy match. Specifically, it means matching as few characters as possible.
The opposite is a greedy match, which matches as many characters as possible.
The default is greedy matching. Therefore, the matching result in the example above is anrwen.
If we want to get the result an, we write .*?n, adding a question mark.
When is this ? used? Naturally, it is used after a qualifier that repeats something multiple times.
For example, the qualifiers *, +, and {n} can be followed by it, becoming *?, +?, and {n}?.
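The difference between the two modes shows up as soon as the pattern's last character occurs more than once in the input. Here is a small sketch; the HTML-like string and the variable names are assumptions added for illustration, and the code uses C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Greedy (default): .* takes as much as it can before the final n.
string greedy = Regex.Match("anrwen", @".*n").Value;

// Lazy: .*? takes as little as it can before the first n.
string lazy = Regex.Match("anrwen", @".*?n").Value;

// The same contrast on a tagged string.
string tags = "<b>one</b><b>two</b>";
string greedyTag = Regex.Match(tags, @"<b>.*</b>").Value;
string lazyTag = Regex.Match(tags, @"<b>.*?</b>").Value;

Console.WriteLine($"{greedy} {lazy} {greedyTag} {lazyTag}");
```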