Regular expressions: matching tags that do not contain an attribute

Source: Internet
Author: User

Find all IMG tags that have no alt (description) attribute:

Pattern: <img(?![^<>]*?alt[^<>]*?>).*?>
Example:

As an extension, to find <a> tags that have no title attribute, use:

Pattern: <a(?![^<>]*?title[^<>]*?>).*?>
Example: <a src="" alt=""> <a src=""> <a src="" title=""> <a src="" id=""> <a src="" title="" alt="">
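
Both patterns can be checked with a short script. The sketch below uses Python's re module as a stand-in for whatever regex tool you use (an assumption; the negative-lookahead syntax is the same in .NET and Python). It runs the <a> pattern over the example above and the <img> pattern over a made-up sample; the constant names are illustrative. Like the patterns themselves, it matches attribute names case-sensitively.

```python
import re

# Patterns from the text (constant names are illustrative).
# The negative lookahead rejects a tag if "alt"/"title" appears before its closing ">".
IMG_WITHOUT_ALT = re.compile(r'<img(?![^<>]*?alt[^<>]*?>).*?>')
A_WITHOUT_TITLE = re.compile(r'<a(?![^<>]*?title[^<>]*?>).*?>')

# The <a> sample is the example from the text; the <img> sample is made up.
links = '<a src="" alt=""> <a src=""> <a src="" title=""> <a src="" id=""> <a src="" title="" alt="">'
images = '<img src="x.png"> <img src="y.png" alt="y">'

print(A_WITHOUT_TITLE.findall(links))   # ['<a src="" alt="">', '<a src="">', '<a src="" id="">']
print(IMG_WITHOUT_ALT.findall(images))  # ['<img src="x.png">']
```
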

Using a regular expression to find words that do not contain the consecutive string ABC

[^ABC] matches any single character other than A, B, or C. How do you write an expression that excludes the whole string ABC instead?

For me, the simplest (if lazy) solution is to use a programming language to find the words that contain ABC; whatever remains does not contain it. However, I was writing a tutorial, and not every reader has a programming background. Some of them just use tools to extract information from TXT documents, so the question had to be answered with a regular expression alone.

So I opened regextester and started experimenting. First I tried ((?'Test'ABC)|.)*(?(Test)(?!)). (Meaning: match either ABC or any single character; when ABC is found, store it in the group named Test; at the end, if Test has captured anything, force the match to fail. See the tutorial for this syntax.) The result was that "ABC", "AABC", "ABCD", and "AA" all matched: by the time the end is reached, the group Test no longer holds a capture, so this approach does not work.

Next I tried (.(?!ABC))* (every character that is not followed by ABC). "ABC" and "ABCD" both passed the test, and for "AABC" it matched only the trailing "ABC", which is clearly wrong.

Then I tightened the condition: ((?<!ABC).(?!ABC))* (every character that is neither preceded nor followed by ABC). Now, for every string containing ABC, only "ABC" itself was matched, while strings without ABC passed in full.
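
The two attempts can be reproduced with a small sketch, assuming Python's re module in place of regextester (both support these lookarounds). It lists the non-empty matches each pattern finds in the test strings, which shows the partial matches described above; the helper name nonempty_matches is illustrative.

```python
import re

# The two intermediate attempts from the text, with the closing parentheses restored.
ATTEMPT_1 = re.compile(r'(.(?!ABC))*')          # each character not followed by ABC
ATTEMPT_2 = re.compile(r'((?<!ABC).(?!ABC))*')  # ...and not preceded by ABC either

def nonempty_matches(pattern, text):
    """Illustrative helper: every non-empty match of pattern in text, left to right."""
    return [m.group(0) for m in pattern.finditer(text) if m.group(0)]

for word in ["ABC", "AABC", "ABCD", "AA"]:
    print(word, nonempty_matches(ATTEMPT_1, word), nonempty_matches(ATTEMPT_2, word))

# Output:
# ABC ['ABC'] ['ABC']
# AABC ['ABC'] ['ABC']
# ABCD ['ABCD'] ['ABC']
# AA ['AA'] ['AA']
```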

This is getting a bit confusing. How do we reject a string that contains ABC somewhere inside it? In other words, how do we make the expression match the whole thing rather than a part of it? Here we need to pin down what the user actually wants: to find words, add \b at both ends of the expression; to find lines, add ^ and $. The user's question did not say which, so I assumed words.

That gives \b((?<!ABC).(?!ABC))*\b. In testing, this expression matches every word that does not contain ABC, but it also matches the word ABC itself.
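
To see the "whole, not part" effect without a tool, the sketch below tests one word at a time with Python's re.fullmatch, which requires the pattern to cover the entire word. This is an assumption standing in for a word-by-word search, and the helper name whole_word_matches is illustrative.

```python
import re

# The word-level attempt from the text (\b at both ends).
WORD_ATTEMPT = re.compile(r'\b((?<!ABC).(?!ABC))*\b')

def whole_word_matches(word):
    """Illustrative helper: True if the pattern matches the entire word, not just a part."""
    return WORD_ATTEMPT.fullmatch(word) is not None

for word in ["ABC", "AABC", "ABCD", "AA"]:
    print(word, whole_word_matches(word))

# Output:
# ABC True
# AABC False
# ABCD False
# AA True
```

The word ABC still gets through because no character inside it is preceded or followed by a complete ABC.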

How can the word ABC be excluded? After some thought, I found it most convenient to check whether the word starts with A: \b(A(?!BC)|[^A](?!ABC))((?<!ABC).(?!ABC))*\b (the word starts either with an A that is not followed by BC, or with a character other than A that is not followed by ABC; every character after the first must be neither preceded nor followed by ABC). In my tests it fully met the requirements. Bingo!

So, to search for words that do not contain the consecutive string ABC, the final regular expression is: \b(A(?!BC)|[^A](?!ABC))((?<!ABC).(?!ABC))*\b
----------------
Update: following a comment from maple, a more concise method is \b((?!ABC)\w)+\b
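
The final pattern and maple's shorter one can be compared side by side. This is a sketch in Python's re module (an assumption; both patterns use only syntax that Python and .NET share). It tests whole words with re.fullmatch and then searches a made-up sample sentence with the shorter pattern.

```python
import re

# Final pattern from the text, and the shorter one suggested in the comments.
LONG = re.compile(r'\b(A(?!BC)|[^A](?!ABC))((?<!ABC).(?!ABC))*\b')
SHORT = re.compile(r'\b((?!ABC)\w)+\b')

for word in ["ABC", "AABC", "ABCD", "AA", "A", "XYZ"]:
    print(word, LONG.fullmatch(word) is not None, SHORT.fullmatch(word) is not None)

# Output:
# ABC False False
# AABC True False
# ABCD False False
# AA True True
# A True True
# XYZ True True

# Searching running text with the shorter pattern; group(0) is the whole word.
text = "AA ABC AABC XYZ ABCD"
print([m.group(0) for m in SHORT.finditer(text)])  # ['AA', 'XYZ']
```

In this sketch the longer pattern still accepts AABC: the ABC that starts at the second character is never tested, because a leading A only rules out a following BC, and the lookbehind runs only before each consumed character, never after the last one. The shorter pattern checks (?!ABC) before every character, so it rejects AABC as well.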

Regular expression: matching text that does not contain a string

A frequent need when using regular expressions is to match text that does not contain a particular substring. For example, suppose I want the part of "eabcdfgh" that comes before "cd". Some people might write:

([^cd]*)

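This is completely wrong: [] denotes a character set, so [^cd] means "any single character that is neither c nor d", not "anything other than the string cd". For example, the following program runs it on "eabcfgh", which does not contain "cd" at all, yet it still matches only eab: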

using System;
using System.Text.RegularExpressions;

string s = "([^cd]*)";
Match m = Regex.Match("eabcfgh", s);
Console.WriteLine(m.Value);           // eab
Console.WriteLine(m.Groups[1].Value); // eab

Most people will not make the mistake above. In a special case, the expression can be written as follows, which is relatively efficient:

([\s\S]*cd)

First, [\s\S] matches any character. The special case is that I know cd is guaranteed to occur in the string. Now suppose I want to match the part that does not contain cd (for ease of description, only the part before cd). That means that when cd does not occur, the entire string should be returned:

using System;
using System.Text.RegularExpressions;

string s = "((.(?!cd))*.)";
// string s = @"([\s\S]*cd)";
Match m = Regex.Match("eabcdfgh", s);
Console.WriteLine(m.Value);           // eab
Console.WriteLine(m.Groups[1].Value); // eab

This version finally meets the requirement, though it is worth noting that it is less efficient than the previous one.
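
The behavior described above (everything before cd when it occurs, the whole string when it does not) can be checked with this Python sketch. It assumes Python's re module in place of .NET, and the helper name prefix_before_cd is illustrative; str.partition is shown only as a non-regex point of comparison.

```python
import re

# Pattern from the text: each character except the last must not be followed by "cd".
BEFORE_CD = re.compile(r'((.(?!cd))*.)')

def prefix_before_cd(text):
    """Illustrative helper: what the pattern matches at the start of text ("" if nothing)."""
    m = BEFORE_CD.match(text)
    return m.group(1) if m else ""

print(prefix_before_cd("eabcdfgh"))  # eab      ("cd" present: the part before it)
print(prefix_before_cd("eabcfgh"))   # eabcfgh  ("cd" absent: the whole string)
print("eabcfgh".partition("cd")[0])  # eabcfgh  (plain string method, for comparison)
```

One edge case: because the lookahead is only checked after a character has been consumed, a string that starts with cd (for example "cdxx") is returned whole, whereas str.partition gives an empty prefix. Also, by default . does not match a newline.
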
A review of the related syntax:
(?:subexpression) defines a non-capturing group.

using System;
using System.Text.RegularExpressions;

// Define a non-capturing group
string s = "e(?:ab)(.*)";
Match m = Regex.Match("eabcd", s);
Console.WriteLine(m.Value);           // eabcd
Console.WriteLine(m.Groups[1].Value); // cd

ab is matched but not captured as a group, so Groups[1] is cd.

(?=subexpression) is a zero-width positive lookahead assertion.

using System;
using System.Text.RegularExpressions;

// Zero-width positive lookahead
// string s = "b(cd|de)(.*)";
string s = "b(?=cd|de)(.*)";
Match m = Regex.Match("eabcdfg", s);
Console.WriteLine(m.Value);           // bcdfg
Console.WriteLine(m.Groups[1].Value); // cdfg (the commented-out pattern gives cd)

The difference from the commented-out version is that the lookahead is zero-width: it checks that cd or de follows without consuming it, and it is not captured, that is, it does not occupy a group. As a result, Groups[1] is the (.*) group, cdfg, instead of cd.

(?!subexpression) is a zero-width negative lookahead assertion.

The ! means the subexpression must not follow; like (?=...), it is zero-width and does not capture.
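
Since no example is given for (?!...), here is a minimal sketch in Python's re module (an assumption; the syntax is the same in .NET) that negates the (?=cd|de) example. It also shows that the lookahead adds no capturing group.

```python
import re

# Negated version of the (?=cd|de) example: a "b" NOT followed by "cd" or "de".
pattern = re.compile(r'b(?!cd|de)(.*)')

print(pattern.search("eabcdfg"))  # None: the only "b" is followed by "cd"
m = pattern.search("eabxyz")
print(m.group(0), m.group(1))     # bxyz xyz
print(pattern.groups)             # 1: only (.*) captures; the lookahead does not
```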

(?<=subexpression) is a zero-width positive lookbehind assertion.

Example: (?<=19)\d{2}\b

In "1851 1999 1950 1905" it matches "99", "50", and "05".

(?<!subexpression) is a zero-width negative lookbehind assertion.

Example: (?<!19)\d{2}\b

In "1851 1999 1950 1905 2003" it matches "51" and "03".
