Using regular expressions to find entries that do not contain a specific string

Source: Internet
Author: User
Tags: regular expression, expression engine

Log analysis often means dealing with thousands of log entries, and finding data that matches a specific pattern in that volume often requires writing fairly complex regular expressions: for example, listing the entries in a log file that do not contain a particular string, or finding the entries that do not begin with a particular string.

Using negative lookahead

Regular expressions have the concepts of lookahead and lookbehind, which describe the matching behavior of the regex engine quite vividly. Note that "ahead" and "behind" mean something slightly different here than in everyday usage. For a piece of text, we usually call the beginning the "front" and the end the "back". A regex engine, however, parses text from head to tail (some engines offer an option to reverse this direction), so the direction toward the tail, which the engine has not reached yet, is called "ahead", while the direction toward the head, which the engine has already passed, is called "behind". As shown in the following illustration:

Lookahead means that when the regex has matched up to a certain character, the engine checks the not-yet-parsed text in advance to see whether it matches (or does not match) a pattern; lookbehind checks the text the engine has already matched. Requiring that the text matches or does not match the pattern is called positive or negative matching, respectively.
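The four combinations can be tried directly. The sketch below is a minimal illustration assuming Python's `re` module, which uses the same `(?=...)`, `(?!...)`, `(?<=...)` and `(?<!...)` syntax; the sample string is hypothetical. Each assertion only looks at the text, it does not consume any characters.

```python
import re

# Hypothetical sample text used only to illustrate the four assertions.
text = "price: 100 USD, cost: 200 EUR"

# Positive lookahead (?=...): digits that are followed by " USD".
usd = re.findall(r"\d+(?= USD)", text)

# Negative lookahead (?!...): whole digit runs NOT followed by " USD".
# The \b anchors stop the engine from backtracking to a shorter run like "10".
not_usd = re.findall(r"\b\d+\b(?! USD)", text)

# Positive lookbehind (?<=...): digits preceded by "price: ".
after_price = re.findall(r"(?<=price: )\d+", text)

# Negative lookbehind (?<!...): digit runs NOT preceded by "price: ".
not_after_price = re.findall(r"(?<!price: )\b\d+", text)

print(usd, not_usd, after_price, not_after_price)
```

Note that Python, like many engines, only accepts fixed-width patterns inside a lookbehind, which is one reason lookbehind is less flexible than lookahead.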

Modern regular expression engines generally support lookahead, but lookbehind support is less universal, so here we use negative lookahead to meet our needs.

Implementation

Test data:


2009-07-07 04:38:44 127.0.0.1 GET /robots.txt
2009-07-07 04:38:44 127.0.0.1 GET /posts/robotfile.txt
2009-07-08 04:38:44 127.0.0.1 GET /

Here are a few simple log entries. We want to achieve two goals:

1. Filter out the entries from the 8th (2009-07-08).
2. Find the entries that do not contain the string robots.txt (any entry whose URL contains robots.txt is filtered out).

The negative lookahead syntax is:

(?!pattern)

We first implement the first goal: matching entries that do not begin with a specific string.

Here we want to exclude a single contiguous string, so the pattern is simple: 2009-07-08. The implementation is:


^(?!2009-07-08).*?$
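The same pattern can be checked outside Expresso. This is a minimal sketch assuming Python's `re` module and assuming the three sample entries are held in one string, so `re.MULTILINE` is needed for `^` and `$` to anchor at every line rather than only at the start and end of the whole string.

```python
import re

# The three sample log entries, stored as one multi-line string.
log = """2009-07-07 04:38:44 127.0.0.1 GET /robots.txt
2009-07-07 04:38:44 127.0.0.1 GET /posts/robotfile.txt
2009-07-08 04:38:44 127.0.0.1 GET /"""

# With re.MULTILINE, ^ and $ match at each line boundary; the negative
# lookahead rejects any line whose first characters are 2009-07-08.
pattern = re.compile(r"^(?!2009-07-08).*?$", re.MULTILINE)

# findall returns the full text of every matching line.
kept = pattern.findall(log)
for line in kept:
    print(line)
```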

Testing it in Expresso, we can see that the entry from the 8th is indeed filtered out.

Next, let's implement the second goal: excluding entries that contain a specific string.

Following the same approach, a first attempt is:


^.*?(?!robots\.txt).*?$

In plain words, this regex reads: at the start, any characters, then a position not followed by the contiguous string robots.txt, then any characters, then the end of the string.
Running the test shows:

This does not have the effect we wanted. Why not? To debug it, we wrap the two .*? parts of the regex above in capture groups:


^(.*?)(?!robots\.txt)(.*?)$

Test results:

The first group matches nothing, and the second group matches the entire string. Let's analyze the regex again. When the engine reaches region A (the first (.*?) group), it immediately starts the lookahead check at region B (the (?!robots\.txt) assertion). Because .*? is lazy and is allowed to match the empty string, region A first matches nothing, and at that position the lookahead condition is already satisfied: what follows region A is the string "2009", not "robots.txt". The second group then consumes the rest of the line, so the whole match succeeds and every entry matches.
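The failure can be reproduced with a short script. This sketch assumes Python's `re` module and the three sample entries; it applies the grouped version of the first attempt to each line and records what the two groups capture.

```python
import re

log_lines = [
    "2009-07-07 04:38:44 127.0.0.1 GET /robots.txt",
    "2009-07-07 04:38:44 127.0.0.1 GET /posts/robotfile.txt",
    "2009-07-08 04:38:44 127.0.0.1 GET /",
]

# The first attempt, with both lazy parts wrapped in capture groups.
naive = re.compile(r"^(.*?)(?!robots\.txt)(.*?)$")

results = []
for line in log_lines:
    m = naive.match(line)
    # Group 1 stays empty: the lazy .*? first tries zero characters, and at
    # position 0 the text starts with "2009", not "robots.txt", so the
    # lookahead succeeds at once and group 2 takes the whole line.
    results.append((m.group(1), m.group(2)))
    print(repr(m.group(1)), "|", m.group(2))
```

Every line matches, including the one that contains robots.txt, which is exactly the symptom described above.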

Having found the cause, we revise the regex by moving the .*? into the lookahead expression:


^(?!.*?robots\.txt).*$
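A minimal check of the corrected approach, again assuming Python's `re` module. The sketch uses `^(?!.*?robots\.txt).*$`, with the dot escaped so that only the literal string robots.txt is excluded; on this test data it gives the same result as a lookahead on plain `robots`.

```python
import re

log_lines = [
    "2009-07-07 04:38:44 127.0.0.1 GET /robots.txt",
    "2009-07-07 04:38:44 127.0.0.1 GET /posts/robotfile.txt",
    "2009-07-08 04:38:44 127.0.0.1 GET /",
]

# The .*? now sits inside the lookahead, so at position 0 the engine asks
# "does robots.txt appear anywhere after the start?" and fails the whole
# match if it does. The lazy .*? can no longer sidestep the check.
fixed = re.compile(r"^(?!.*?robots\.txt).*$")

kept = [line for line in log_lines if fixed.match(line)]
print(kept)
```

The entry with /posts/robotfile.txt is kept because it contains "robotfile.txt", not "robots.txt".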

Test results:

Done.

Implementing "does not contain a string" in PHP with a regular expression

preg_match("/^((?!abc).)*$/is", $str);

Complete code example


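<?php
// Subject string; "abc" appears near the end.
$str = "dfadfadf765577abc55fd";
// ^((?!abc).)*$ : at every position, "abc" must not start there.
// i = case-insensitive, s = "." also matches newlines.
$pattern_url = "/^((?!abc).)*$/is";
if (preg_match($pattern_url, $str))
{
    echo "does not contain abc!";
}
else
{
    echo "contains abc!";
}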

Result: the string contains abc, so preg_match() finds no match (it returns 0) and the script prints "contains abc!".
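For comparison, here is a sketch of the same check in Python, assuming `re.IGNORECASE` and `re.DOTALL` as the equivalents of PHP's `/i` and `/s` modifiers. It also runs the anchored form `^(?!.*abc).*$` (the style used for the log entries) on the same inputs; the extra sample strings are hypothetical.

```python
import re

FLAGS = re.IGNORECASE | re.DOTALL  # same meaning as PHP's /i and /s

# Per-character form from the PHP example: each "." may only be consumed
# at a position where "abc" does not start.
tempered = re.compile(r"^((?!abc).)*$", FLAGS)

# Anchored form: a single lookahead at position 0 scans for "abc".
anchored = re.compile(r"^(?!.*abc).*$", FLAGS)

def lacks_abc(s: str) -> bool:
    return tempered.match(s) is not None

samples = ["dfadfadf765577abc55fd", "dfadfadf765577ab55fd", "line one\nline ABC two"]
for s in samples:
    # Both forms give the same answer for these inputs.
    print(repr(s), lacks_abc(s), anchored.match(s) is not None)
```

The anchored form runs one lookahead scan, while the per-character form starts a lookahead at every position, which matters for long inputs.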

A regular expression that matches strings containing "abc" but not containing "xyz":

preg_match("/^(?=.*abc)((?!xyz).)*$/is", $str);
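The combination of a positive and a negative condition can be sketched in Python as follows, assuming `re` flags that mirror `/is` and using a hypothetical helper name `has_abc_without_xyz`: a positive lookahead at the start requires "abc" somewhere, and the per-character negative lookahead forbids "xyz" anywhere.

```python
import re

# (?=.*abc)      : "abc" must occur somewhere (checked once, at position 0)
# ((?!xyz).)*$   : no position in the string may start "xyz"
pattern = re.compile(r"^(?=.*abc)((?!xyz).)*$", re.IGNORECASE | re.DOTALL)

def has_abc_without_xyz(s: str) -> bool:
    return pattern.match(s) is not None

for s in ["123abc456", "abc then xyz", "xyz then abc", "neither"]:
    print(repr(s), has_abc_without_xyz(s))
```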

This approach works. The variant I use is:

(?:(?!<\/div>).|\n)*?   matches a string that does not contain </div> (the |\n alternative lets it span lines when "." does not match newlines)
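The token can be tested on its own. This sketch assumes Python's `re` module, where the `/` does not need escaping because Python patterns have no `/.../` delimiters; the helper name `lacks_close_div` is hypothetical. `fullmatch` forces the token to cover the whole string, so it succeeds only when no `</div>` occurs anywhere.

```python
import re

# Each step consumes one character (or a newline, since "." excludes "\n"
# without re.DOTALL), but only where "</div>" does not start.
no_close_div = re.compile(r"(?:(?!</div>).|\n)*?")

def lacks_close_div(s: str) -> bool:
    # fullmatch makes the lazy token expand until it reaches the end.
    return no_close_div.fullmatch(s) is not None

print(lacks_close_div("<div>text\nmore"), lacks_close_div("<div>text</div>"))
```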

In practice, however, this method turned out to be extremely inefficient. It is acceptable for very short text (a dozen or so characters, a few dozen at most), but do not use it on long passages, especially if you need to apply the replacement in many places. In those cases consider another approach: first use a regex to extract the paragraphs that match, then check separately whether each one contains the string. A regular expression is not an efficient way to match text segments that must not contain a specific string.
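The two-step alternative can be sketched in Python as follows. It assumes the segments are `<div>` blocks; the helper name `blocks_without` and the sample HTML are hypothetical. A simple lazy regex extracts each block, and a plain substring test replaces the per-character negative lookahead.

```python
import re
from typing import List

def blocks_without(html: str, needle: str) -> List[str]:
    # Step 1: extract candidate blocks with a simple lazy pattern;
    # re.DOTALL lets "." cross newlines inside a block.
    blocks = re.findall(r"<div>(.*?)</div>", html, re.DOTALL)
    # Step 2: a plain substring test instead of a lookahead at every character.
    return [b for b in blocks if needle not in b]

html = "<div>keep\nthis</div><div>drop: secret</div><div>keep too</div>"
print(blocks_without(html, "secret"))
```

The substring test scans each block once, which avoids starting a lookahead at every character.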
