Use regular expressions to find words that do not contain the consecutive string abc

Source: Internet
Author: User
I wrote an article, "Getting started with regular expressions in 30 minutes", and a reader asked:
[^abc] matches a character that is not a, b, or c. How can I write an expression that means "does not contain the string abc"?

For me, the simplest solution would be to combine the regex with a programming language: find the strings that contain abc and discard them. But I had written a tutorial, and not every reader has a programming background. Some of them just use a tool to extract information from a txt document, so the question has to be answered with a regular expression alone.
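As a rough illustration of the programming-language route mentioned above, here is a minimal Python sketch. The function name words_without and the sample text are made up for this example and do not come from the original article.

```python
import re


def words_without(text, needle="abc"):
    """Return the words in text that do not contain needle."""
    # \w+ splits the text into runs of letters, digits and underscores.
    words = re.findall(r"\w+", text)
    # A plain substring test does the filtering; no lookaround is needed.
    return [word for word in words if needle not in word]


print(words_without("abc aabc abcd aa cab xabcx"))  # ['aa', 'cab']
```

The regex only has to split the text into words here; the "does not contain" part is left to the language, which is why this route is simple for programmers but unavailable in a plain search tool.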

So I opened RegexTester and started experimenting. First I tried ((?'test'abc)|.)*(?(test)(?!)) (meaning: match either abc or any single character; if abc is found, store it in the group named test; at the end, check whether the group test has any content, and if it does, fail the match; see the tutorial for details). The result: "abc", "aabc", "abcd", and "aa" all passed, so checking at the end whether the group test exists does not seem to work.

Then I tried (.(?!abc))* (match every character that is not followed by abc). The result: "abc" and "abcd" passed the test, while "aabc" matched only its trailing "abc". Clearly not right.

Next I strengthened the condition: ((?<!abc).(?!abc))* (match every character that is neither preceded nor followed by abc). The result: every string containing abc matched only the "abc" part, and strings without abc passed in full.
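The two lookaround attempts can be replayed roughly as follows. This is only a sketch: Python's re module stands in for RegexTester (an assumption, since the two engines may differ in details), and the helper nonempty_matches and the constant names are invented for this example. Empty matches are dropped because a starred group can always match nothing.

```python
import re


def nonempty_matches(pattern, text):
    """Return every non-empty match of pattern in text, left to right."""
    return [m.group(0) for m in re.finditer(pattern, text) if m.group(0)]


# Second attempt: each character must not be followed by abc.
LOOKAHEAD_ONLY = r"(.(?!abc))*"
# Third attempt: each character must be neither preceded nor followed by abc.
BOTH_SIDES = r"((?<!abc).(?!abc))*"

for sample in ["abc", "abcd", "aabc", "aa"]:
    print(sample,
          nonempty_matches(LOOKAHEAD_ONLY, sample),
          nonempty_matches(BOTH_SIDES, sample))
```

With the lookahead alone, "abcd" is matched whole and "aabc" yields only "abc". With the lookbehind added, "abcd" also shrinks to "abc", because the d is preceded by abc. In both cases the engine returns a part of the string, which is the problem the next paragraph addresses.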

This is getting a bit confusing. How can we reject the whole string when abc appears anywhere inside it? In other words, how do we match the whole rather than a part? At this point we need to clarify the user's requirement: to find a word, add \b at both ends of the expression; to find a line, add ^ and $. Since the reader's question did not say which, I will assume a word.

That gives the expression \b((?<!abc).(?!abc))*\b. In testing, this expression matches every word that does not contain abc, but it also matches the word abc itself.
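The behaviour just described can be checked one word at a time. The sketch below assumes Python's re module in place of RegexTester and uses fullmatch so that each sample word must be matched as a whole; the name passes and the word list are hypothetical.

```python
import re

# Word-bounded version of the "neither preceded nor followed by abc" attempt.
BOUNDED = re.compile(r"\b((?<!abc).(?!abc))*\b")


def passes(word):
    """True if the whole word is matched by the bounded expression."""
    return BOUNDED.fullmatch(word) is not None


for word in ["aa", "cab", "abc", "aabc", "abcd", "xabcx"]:
    print(word, passes(word))
```

Under these assumptions "aa" and "cab" pass and "aabc", "abcd", and "xabcx" are rejected, but "abc" passes too: inside the bare word abc, no character has abc in front of it or after it, so neither lookaround ever fires.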

How do we exclude the word abc? After some thought, I found it most convenient to check whether the word starts with a: \b(a(?!bc)|[^a](?!abc))((?<!abc).(?!abc))*\b (the word starts either with an a that is not followed by bc, or with a character other than a; every character after the first must be neither preceded nor followed by abc). In my tests it fully met the requirement. Bingo!

So, to search for words that do not contain the consecutive string abc with a regular expression, the final result is \b(a(?!bc)|[^a](?!abc))((?<!abc).(?!abc))*\b
----------------
Update: following maple's comment, a more concise method is: \b((?!abc)\w)+\b
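A quick way to try the concise expression is sketched below, again assuming Python's re module rather than RegexTester; the function name find_words and the sample text are invented for this example. The expression checks (?!abc) before consuming each word character, so no position inside a matched word can be the start of abc.

```python
import re

# Each word character is consumed only if abc does not start at that position.
CONCISE = re.compile(r"\b((?!abc)\w)+\b")


def find_words(text):
    """Return the words in text that do not contain the string abc."""
    # group(0) is the whole match; group(1) would hold only the last character.
    return [m.group(0) for m in CONCISE.finditer(text)]


print(find_words("abc aabc abcd aa cab xabcx"))  # ['aa', 'cab']
```

Because \w replaces the dot, a match cannot run across spaces, and the leading \b keeps the engine from starting in the middle of a rejected word (for example at the bc of abc).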
