Use regular expressions to exclude specific strings

Source: Internet
Author: User
1. Search for strings that do not start with baidu.
baidu.com
sina.com.cn

Regex: ^(?!baidu).*$ matches only the 2nd line; that is, the 1st line is excluded.
This uses the zero-width assertion (?!exp), the negative lookahead. It is the counterpart of the positive lookahead (?=exp).
(?=exp) matches the [position] immediately before exp. Replacing the equals sign with an exclamation mark negates it: (?!exp) matches a position that is not immediately followed by exp.
(?!exp) is generally combined with an anchor such as ^ (start of line) or $ (end of line). The example above works as follows:
^(?!baidu).*$ first matches the position at the start of the line and then requires that this position is not followed by the string baidu. For the first line the match fails, because what follows ^ is baidu, so that line is excluded.
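
The behaviour described above can be checked with a short script. This is a minimal sketch using Python's re module (the article itself does not name a regex engine, so the choice of Python is an assumption); the variable names are illustrative only.

```python
import re

lines = ["baidu.com", "sina.com.cn"]

# ^ pins the assertion to the start of the string;
# (?!baidu) fails when "baidu" follows that position immediately.
pattern = re.compile(r"^(?!baidu).*$")

kept = [line for line in lines if pattern.match(line)]
print(kept)  # ['sina.com.cn']
```

Because the lookahead is zero-width, it consumes nothing: the .* that follows still matches the whole line from its first character.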

2. Search for strings that do not end with com.
www.sina.com.cn
www.educ.org
www.hao.cc
www.baidu.com
www.123.com

Regex: ^.*?(?<!com)$ matches the first three lines.
To match the lines that do end with com instead, use ^.*?(?<=com)$ or simply ^.*?com$
How ^.*?(?<!com)$ works:
The pattern first matches the start of the line. Then comes .*?, which is lazy (ignore-first): it starts by matching no characters at all and consumes more only when forced to. (?<!com) is a negative lookbehind: the text immediately before the current position must not be com. Finally, $ matches the end of the line.
Take www.123.com. After the start of the line is matched, .*? matches nothing and (?<!com) is tested at the position before the first w. Nothing precedes that position, so the lookbehind succeeds, but $ then fails because this is not the end of the line. The engine backtracks and lets .*? consume one w; (?<!com) is tested at the position after it, finds that the preceding text is not com, and succeeds, but $ fails again. This repeats one character at a time until .*? has consumed the whole of www.123.com. Now (?<!com) is tested at the position after the final m; the text before this position is com, so the lookbehind fails. .*? has nothing left to consume, so the overall match fails and www.123.com is excluded. The greedy form ^.*(?<!com)$, without the question mark, gives the same result.
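
The three variants discussed above can be compared side by side. This is a minimal sketch in Python's re module, assumed here only for illustration; the names not_com, ends_com and greedy_not_com are hypothetical.

```python
import re

hosts = [
    "www.sina.com.cn",
    "www.educ.org",
    "www.hao.cc",
    "www.baidu.com",
    "www.123.com",
]

not_com = re.compile(r"^.*?(?<!com)$")        # lazy, negative lookbehind
ends_com = re.compile(r"^.*?(?<=com)$")       # lazy, positive lookbehind
greedy_not_com = re.compile(r"^.*(?<!com)$")  # greedy variant

without_com = [h for h in hosts if not_com.match(h)]
with_com = [h for h in hosts if ends_com.match(h)]
greedy_result = [h for h in hosts if greedy_not_com.match(h)]

print(without_com)  # ['www.sina.com.cn', 'www.educ.org', 'www.hao.cc']
print(with_com)     # ['www.baidu.com', 'www.123.com']
print(greedy_result == without_com)  # True
```

Lazy and greedy quantifiers differ only in the order in which the engine tries candidate positions; since $ can succeed only at the end of the line, both end up testing the lookbehind there.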

3. Search for lines that do not contain if in the following example
if (a > b)
printf("hello");
else if (a < b)
printf("hello2");
else
printf("hello3");

Regex: ^([^f]|[^i]f)+$
This is also a string-exclusion match, but it differs from the two cases above: here the if is not necessarily at the start or end of the line and may appear in the middle of the string. That is harder to match, and regular expressions provide no direct "does not contain this string" construct. The following are the most common incorrect attempts:
^[^if]+$ looks plausible, but a negated character class excludes the individual characters i and f, not the string if. This regex therefore matches only strings that contain neither i nor f. A string that contains one or more i characters, or one or more f characters, or both i and f but not adjacent as if, is a string we need to match, yet this regex rejects it. What we want to exclude is lines that contain the string if, not lines that contain the character i or f, and this mistake is very easy to make.
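
To see how far ^[^if]+$ is from the intended result, it can be run against the six sample lines. This is a minimal sketch in Python (an assumed choice of engine); the list named wanted is computed with a plain substring test purely as a reference for comparison.

```python
import re

code_lines = [
    'if (a > b)',
    'printf("hello");',
    'else if (a < b)',
    'printf("hello2");',
    'else',
    'printf("hello3");',
]

# Negated character class: rejects any line containing i OR f.
char_class = re.compile(r"^[^if]+$")

matched = [line for line in code_lines if char_class.match(line)]
wanted = [line for line in code_lines if "if" not in line]

print(matched)  # ['else']
print(wanted)
```

Every printf line is rejected because it contains both i and f, even though the two characters never appear together as if.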

^.*(?!if).*$ uses a zero-width assertion. On the surface it reads as "any characters + not if + any characters", but a careful look at the matching process shows that it is wrong. (?!if) matches a position, so the string aifb also matches, which is exactly what we do not want. For aifb, the regex first matches the start of the line; then the greedy .* (match-first) consumes everything up to the end of the string, leaving the engine positioned just before $. (?!if) must now match a position that is not followed by if. The current position is right after the character b, which satisfies the condition; the second .* matches nothing, $ matches the end of the line, and the overall match succeeds.

In other words, suppose we want to exclude strings containing abc and match against helloworld abc helloworld. (?!abc) succeeds at the positions before h, e, l, l, o, w, o, r, l, d and so on. The engine only needs one position where the assertion holds, so the assertion never has to be tested at the position in front of abc, and nothing is excluded. Examples 1 and 2 work because the position of the assertion is pinned by the start or end of the line. For instance, to match lines that do not start with abc we write ^(?!abc); here the position at which (?!abc) is tested is fixed at the start of the line, so the match fails for every string that starts with abc.
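
The floating assertion can be observed directly. In this minimal Python sketch the capturing groups around each .* are an addition for illustration (they are not part of the article's regex) so that what each part consumed becomes visible.

```python
import re

# Groups added only to expose what each .* consumed.
naive = re.compile(r"^(.*)(?!if)(.*)$")

m = naive.match("aifb")
print(m.group(1), repr(m.group(2)))  # aifb ''

# The same mistake with abc: the line is not excluded.
m2 = re.match(r"^(.*)(?!abc)(.*)$", "helloworld abc helloworld")
print(m2 is not None)  # True

# Pinned to the start of the line, the assertion does exclude.
anchored = re.compile(r"^(?!abc).*$")
print(anchored.match("abcdef"))  # None
print(anchored.match("helloworld abc") is not None)  # True
```

The first group swallows the entire string, so the lookahead is evaluated at the very end, where nothing can follow.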

For the regex ^.*(?!abc).*$, how could we force the first .* to match exactly helloworld in helloworldabcxxx?

Our answer to the question above is ^([^f]|[^i]f)+$. It splits every match into two cases. In the first case a character is not f; a string with no f at all naturally cannot contain if. In the second case an f does appear, but we require that the character before it is not i. With both cases taken into account, this regex handles the sample lines. One gap remains: an f at the very start of a line has no preceding character for [^i] to match, so such a line is rejected even though it contains no if.
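
The alternation can be tried on the sample lines as follows. This is a minimal sketch in Python (an assumed engine). The extra line for (;;) does not come from the original example; it is added here as an assumption to probe a line that begins with f.

```python
import re

code_lines = [
    'if (a > b)',
    'printf("hello");',
    'else if (a < b)',
    'printf("hello2");',
    'else',
    'printf("hello3");',
]

# Each step consumes either one non-f character,
# or a non-i character followed by f.
no_if = re.compile(r"^([^f]|[^i]f)+$")

kept = [line for line in code_lines if no_if.match(line)]
print(kept)
# ['printf("hello");', 'printf("hello2");', 'else', 'printf("hello3");']

# Hypothetical extra line: a leading f has no character before it
# for [^i] to consume, so the line is rejected.
print(no_if.match("for (;;)"))  # None
```

In printf the f is consumed together with the preceding t by the [^i]f branch, which is why those lines pass.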

In fact this answer is far from ideal. It is workable when the excluded string has only two characters, i and f, but if we want to exclude the string helloworld, the approach is clearly impractical: how many cases would we have to enumerate?

For that case we use ^(?!.*helloworld).*$. The first .* has been moved inside the zero-width assertion. The regex first matches the position at the start of the line and then requires that this position is not followed by anything that .*helloworld can match. Put plainly, the start of the line must not be followed by a string of the form xxxxxxxxxxxxxxxxhelloworld, which rules out every line that contains helloworld anywhere after its start.
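
The same pattern works for any excluded word. The sketch below, in Python, wraps it in a helper named lines_without; that name and the use of re.escape to neutralise special characters in the word are assumptions added for illustration, not part of the original article.

```python
import re

def lines_without(word, lines):
    """Hypothetical helper: keep the lines that do not contain `word`."""
    # The lookahead is pinned to the line start and scans the whole line.
    pattern = re.compile(r"^(?!.*" + re.escape(word) + r").*$")
    return [line for line in lines if pattern.match(line)]

samples = ["helloworldabcxxx", "xxx helloworld", "hello world"]
print(lines_without("helloworld", samples))  # ['hello world']

code_lines = [
    'if (a > b)',
    'printf("hello");',
    'else if (a < b)',
    'printf("hello2");',
    'else',
    'printf("hello3");',
]
print(lines_without("if", code_lines))
# ['printf("hello");', 'printf("hello2");', 'else', 'printf("hello3");']
```

Unlike the alternation approach, the length of the excluded word does not change the shape of the regex.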
