Regular Expressions: Excluding Specific Strings

Source: Internet
Author: User

Additional knowledge point one: extracting links with a regular expression in ASP.NET

using System.Text.RegularExpressions;
// Group 1 captures the href value, group 2 the link text; html holds the page source.
Regex reg = new Regex("href=\"([^\"]+)\"[^>]*>([^<]+)</a>", RegexOptions.IgnoreCase);
MatchCollection matches = reg.Matches(html);
foreach (Match match in matches)
{
    if (match.Success)
    {
        string urls = match.Groups[1].ToString();
        string name = match.Groups[2].ToString();
    }
}

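Additional knowledge point two: filtering text with regular expressions in ASP.NET

// Strip whitespace (including blank lines) at the start of each line.
html = Regex.Replace(html, "^\\s*", string.Empty, RegexOptions.Compiled | RegexOptions.Multiline);
// Filter line breaks.
html = Regex.Replace(html, "\\r\\n", string.Empty, RegexOptions.Compiled | RegexOptions.Multiline);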

1. Example: find lines that do not begin with baidu.
baidu.com
sina.com.cn

Regex: ^(?!baidu).*$ matches line 2; line 1 is excluded.
This uses the zero-width assertion (?!exp), a negative lookahead. Compare it with the positive lookahead syntax (?=exp).
(?=exp) matches the position right before exp. Replacing the equals sign with an exclamation mark gives the negated form: the matched position must not be followed by exp.
In practice, (?!exp) usually has to be combined with an anchor, such as the start or end of a line. The example above works like this:
^(?!baidu).*$ matches the position at the start of a line and requires that this position not be followed by the string baidu. On line 1, the text right after ^ is baidu, so the match fails and the line is excluded.
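A minimal sketch of example 1, using Python's re module only as a convenient, testable engine (the pattern uses basic lookahead syntax that .NET's Regex also supports). The two sample lines are the ones above; the names NOT_BAIDU and kept are illustrative.

```python
import re

# ^ pins the lookahead to the start of the line; (?!baidu) then
# requires that the text right after that position is not "baidu".
NOT_BAIDU = re.compile(r"^(?!baidu).*$")

lines = ["baidu.com", "sina.com.cn"]

# re.match only tries position 0, which is where ^ anchors anyway.
kept = [line for line in lines if NOT_BAIDU.match(line)]
print(kept)  # ['sina.com.cn']
```

Because the assertion is pinned to the start, a line such as www.baidu.com still matches: baidu appears in it, but not at the beginning.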

2. Example: find lines that do not end with com.
www.sina.com.cn
www.114369.cn
www.114345.cn
www.114380.com
www.i029.com

The regex ^.*?(?<!com)$ matches the first three lines.
To find lines that do end with com, use ^.*?(?<=com)$ or simply ^.*?com$.
How ^.*?(?<!com)$ works:
The engine first matches the start of the line. Next, .*? is a lazy (non-greedy) quantifier: it matches as few characters as possible and takes one more only when the rest of the pattern fails. (?<!com) is a negative lookbehind: the text immediately before the current position must not be com. Finally, $ must match the end of the line. Take www.123.com: after ^, the lazy .*? first matches nothing. (?<!com) succeeds because nothing precedes that position, but $ fails, so the engine backtracks and lets .*? consume one w. (?<!com) again succeeds at the position after the first w, because the preceding text is not com, and $ fails again. This repeats until .*? has consumed all of www.123.com. The position is now after the final m, the text before it is com, and (?<!com) fails. .*? cannot consume any more characters, so the overall match fails and www.123.com is excluded. Replacing .*? with the greedy .* gives the same result.
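A quick check of example 2 on the five sample hosts, again with Python's re as a stand-in engine (its fixed-width lookbehind handles (?<!com) and (?<=com)). The variable names are illustrative.

```python
import re

# Lazy .*? followed by a negative lookbehind tested at the end of the line.
NOT_COM = re.compile(r"^.*?(?<!com)$")
# The two equivalent ways to find lines that DO end in "com".
ENDS_COM_LOOKBEHIND = re.compile(r"^.*?(?<=com)$")
ENDS_COM_PLAIN = re.compile(r"^.*?com$")

hosts = [
    "www.sina.com.cn",
    "www.114369.cn",
    "www.114345.cn",
    "www.114380.com",
    "www.i029.com",
]

not_com = [h for h in hosts if NOT_COM.match(h)]
ends_com = [h for h in hosts if ENDS_COM_LOOKBEHIND.match(h)]
ends_com_plain = [h for h in hosts if ENDS_COM_PLAIN.match(h)]

print(not_com)   # ['www.sina.com.cn', 'www.114369.cn', 'www.114345.cn']
print(ends_com)  # ['www.114380.com', 'www.i029.com']
```

Note that www.sina.com.cn is kept: it contains com, but (?<!com) is only tested where $ can match, at the end of the line.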

3. Example: find lines that do not contain if.
if (a>b)
printf("Hello");
else if (a<b)
printf("Hello2");
else
printf("Hello3");

Regex: ^([^f]|[^i]f)+$
This is also an exclusion match, but it differs from the previous two. The string if need not be at the start or end of the line; it can sit in the middle, which makes matching harder, because regular expressions have no built-in "exclude this string" feature. The first idea that comes to mind is this:
^[^if]+$ looks right, but the negated character class [^if] excludes the individual characters i and f, not the string if. The regex therefore matches only lines that contain neither an i nor an f. Yet lines that contain one or more i characters, one or more f characters, or both i and f without the two being adjacent are exactly the lines we want to match: we want to reject lines containing the string if, not lines containing the character i or f. This is a serious flaw.
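The flaw of ^[^if]+$ is easy to see on the sample C lines. This sketch uses Python's re; the list name code_lines is illustrative.

```python
import re

# Negated character class: rejects any line containing an 'i' OR an 'f'.
CHAR_CLASS = re.compile(r"^[^if]+$")

code_lines = [
    'if (a>b)',
    'printf("Hello");',
    'else if (a<b)',
    'printf("Hello2");',
    'else',
    'printf("Hello3");',
]

matched = [line for line in code_lines if CHAR_CLASS.match(line)]
print(matched)  # ['else']
```

The three printf lines contain no if, but they are rejected anyway because printf contains both i and f.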

^.*(?!if).*$ uses a zero-width assertion. On the surface it seems to say "any characters + not if + any characters make up the whole string", but tracing the matching process shows that this is wrong. (?!if) matches a position, so it can succeed on a string such as aifb, which is exactly the kind of string we do not want. For aifb, the engine first matches the start of the line. The first .* is greedy (it matches as much as possible), so it runs to the end of the string, leaving the engine at the position after b. (?!if) then only needs that position not to be followed by if, and nothing follows it, so the assertion succeeds. The second .* matches the empty string, $ matches the end of the line, and the whole match succeeds.
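To see where (?!if) was actually tested, this sketch wraps the two .* parts in capture groups (added only for illustration; they do not change what matches) and runs the pattern on aifb with Python's re.

```python
import re

# Capture the two .* parts separately to see where (?!if) was tested.
NAIVE = re.compile(r"^(.*)(?!if)(.*)$")

m = NAIVE.match("aifb")
print(m is not None)     # True: the "exclusion" excluded nothing
print(repr(m.group(1)))  # 'aifb': the greedy .* ran to the end
print(repr(m.group(2)))  # '': nothing left for the second .*
```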

In general, suppose we want to exclude the string abc. For the string helloworldabchelloworld, (?!abc) can succeed at the positions before h, e, l, l, o, w, o, r, l, d, and so on; the match never has to test the position where abc starts, so (?!abc) succeeds and excludes nothing. Examples 1 and 2 worked because the assertion's position was pinned by the start or end of the line. For example, to match lines that do not start with abc, use ^(?!abc): here the position where the engine tests (?!abc) is fixed at the start of the line, so the match fails for every string that starts with abc.
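A side-by-side sketch of an unpinned versus a pinned lookahead, using Python's re. The pattern names are illustrative.

```python
import re

# Unpinned: the greedy .* can move the assertion to any position.
UNPINNED = re.compile(r"^.*(?!abc).*$")
# Pinned: (?!abc) is tested only at the start of the line.
PINNED = re.compile(r"^(?!abc).*$")

results = {}
for s in ["helloworldabchelloworld", "abcxyz"]:
    results[s] = (bool(UNPINNED.match(s)), bool(PINNED.match(s)))
    print(s, *results[s])
# helloworldabchelloworld True True
# abcxyz True False
```

The unpinned form accepts both strings. The pinned form rejects only the one that starts with abc, which is all it promises.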

So for ^.*(?!abc).*$ applied to helloworldabcxxx, the question becomes how to stop the first .* from consuming helloworld (or more) and letting (?!abc) succeed at a position that skips over abc.

For the problem above, our answer is ^([^f]|[^i]f)+$. It splits every line into one of two cases. In the first case, the line contains no f, so it cannot contain if. In the second case, the line does contain f, and we require that the character before each f is not i. Since every line either contains an f or does not, the two cases together are meant to cover all situations.
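A sketch that applies ^([^f]|[^i]f)+$ to the sample C lines with Python's re, plus two extra inputs, for (;;) and iff, chosen here (not from the original) to probe the edges of the two-case reasoning.

```python
import re

# Case 1: [^f]  -> any character except f.
# Case 2: [^i]f -> an f together with a preceding character that is not i.
NO_IF_TWO_CASES = re.compile(r"^([^f]|[^i]f)+$")

code_lines = [
    'if (a>b)',
    'printf("Hello");',
    'else if (a<b)',
    'printf("Hello2");',
    'else',
    'printf("Hello3");',
]

kept = [line for line in code_lines if NO_IF_TWO_CASES.match(line)]
print(kept)
# ['printf("Hello");', 'printf("Hello2");', 'else', 'printf("Hello3");']

# Edge cases beyond the sample lines:
print(bool(NO_IF_TWO_CASES.match("for (;;)")))  # False, though it has no "if"
print(bool(NO_IF_TWO_CASES.match("iff")))       # True, though it contains "if"
```

On the sample lines the result is correct. The two extra inputs show that a leading f, or an if followed by another f, falls outside what the two cases actually handle.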

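This answer is still not perfect. It works because the excluded string if has only two characters, i and f. To exclude a longer string such as helloworld, the approach is clearly impractical: how many cases would you have to enumerate? It also has edge cases even for if. A line such as for (;;) is rejected although it contains no if, because a leading f cannot be matched by either alternative. A string such as iff is accepted, because its second f is consumed together with the first f by [^i]f.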

In that case, use ^(?!.*helloworld).*$, which moves the first .* inside the zero-width assertion. The engine matches the position at the start of the line and requires that this position not be followed by .*helloworld. In plain terms, the text from the start of the line must not look like xxxxxxxxxxhelloworld for any run of characters xxxxxxxxxx. This excludes helloworld anywhere after the start of the line.
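A minimal sketch of the ^(?!.*word).*$ technique as a reusable builder in Python. The helper name line_without is hypothetical; re.escape is used so that any regex metacharacters in the excluded word are treated literally.

```python
import re


def line_without(word: str) -> re.Pattern:
    """Hypothetical helper: compile ^(?!.*WORD).*$ for a literal WORD.

    The lookahead is pinned at ^ and its .* scans the rest of the line,
    so WORD is rejected wherever it appears on that line.
    """
    return re.compile(r"^(?!.*" + re.escape(word) + r").*$")


NO_HELLOWORLD = line_without("helloworld")
NO_IF = line_without("if")

print(bool(NO_HELLOWORLD.match("xxxxhelloworldxxxx")))  # False
print(bool(NO_HELLOWORLD.match("hello world")))         # True
print(bool(NO_IF.match("for (;;)")))                    # True
print(bool(NO_IF.match("else if (a<b)")))               # False
```

Because . does not match a newline by default, this works line by line. Apply it to each line separately, or compile with re.MULTILINE when scanning a whole block of text.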

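An expression I wrote earlier:

a((?!virtual).)+b

It matches a string that starts with a, ends with b, and contains no virtual in between. To require the whole line to have this form, anchor it: ^a((?!virtual).)+b$.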
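A sketch of the a((?!virtual).)+b pattern in Python. Here re.fullmatch stands in for the ^...$ anchors; the pattern name and test strings are illustrative.

```python
import re

# Each character between a and b is consumed by '.' only after
# (?!virtual) confirms that "virtual" does not start at that position.
A_TO_B_NO_VIRTUAL = re.compile(r"a((?!virtual).)+b")

for s in ["axyzb", "avirtualb", "axvirtualyb"]:
    print(s, bool(A_TO_B_NO_VIRTUAL.fullmatch(s)))
# axyzb True
# avirtualb False
# axvirtualyb False

# Without anchoring, a shorter substring can still match:
print(A_TO_B_NO_VIRTUAL.search("avirtualb").group(0))  # alb
```

The unanchored search finds alb inside avirtualb, which is why the anchored form is the one to use when the whole line must qualify.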
