http://www.imkevinyang.com/2009/08/%E4%BD%BF%E7%94%A8%E6%AD%A3%E5%88%99%E8%A1%A8%E8%BE%BE%E5%BC%8F%E6%89%BE%E5%87%ba%e4%b8%8d%e5%8c%85%e5%90%ab%e7%89%b9%e5%ae%9a%e5%ad%97%e7%ac%a6%e4%b8%b2%e7%9a%84%e6%9d%a1%e7%9b%ae.html
A helpful article for understanding lookahead and lookbehind in regular expressions:
Overview
Log analysis often means dealing with thousands of log entries. To find data that fits a specific pattern in such a large volume, you often need to write fairly complex regular expressions: for example, to list the entries in a log file that do not contain a particular string, or to find the entries that do not begin with a particular string.
Using negative lookahead
Regular expressions have the concepts of lookahead and lookbehind, which describe the matching behavior of the regex engine quite vividly. Note that "ahead" and "behind" in regular expressions differ somewhat from everyday usage. For a piece of text, we usually call the direction toward its beginning the "front" and the direction toward its end the "back". The regex engine, however, scans the text from head to tail (some engines let you change the direction with an option). For the engine, the direction toward the tail is therefore "ahead", because it has not reached that part yet, and the direction toward the head is "behind", because it has already passed it. (The original post illustrates this with a figure.)
Lookahead means that when the engine reaches a certain position, it peeks at the text it has not yet consumed to check whether that text matches, or does not match, a given pattern. Lookbehind performs the same check against the text the engine has already passed. Requiring the pattern to match is called a positive assertion; requiring it not to match is called a negative assertion.
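As a rough sketch of the four kinds of assertion, the snippet below records where each one matches. It assumes Python's re module (the article itself tests with Expresso), and the sample string and helper name are made up for illustration.

```python
import re

# Made-up sample text, not from the article.
text = "foobar foobaz quxbaz"


def match_starts(pattern):
    """Return the start offset of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]


positive_lookahead = match_starts(r"foo(?=bar)")    # "foo" followed by "bar"
negative_lookahead = match_starts(r"foo(?!bar)")    # "foo" not followed by "bar"
positive_lookbehind = match_starts(r"(?<=foo)bar")  # "bar" preceded by "foo"
negative_lookbehind = match_starts(r"(?<!foo)baz")  # "baz" not preceded by "foo"

print(positive_lookahead, negative_lookahead,
      positive_lookbehind, negative_lookbehind)
```

None of the assertions consumes text: each match covers only "foo", "bar" or "baz", and the asserted neighbor stays outside the match.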
Modern regex engines generally support lookahead, while support for lookbehind is less widespread, so here we use negative lookahead to meet our needs.
Implementation
Test data:
2009-07-07 04:38:44 127.0.0.1 GET /robots.txt
2009-07-07 04:38:44 127.0.0.1 GET /posts/robotfile.txt
2009-07-08 04:38:44 127.0.0.1 GET /
Here are a few simple log entries. We want to achieve two goals:
1. Filter out the entries dated the 8th.
2. Find the entries that do not contain the string robots.txt (any entry whose URL contains robots.txt should be filtered out).
The syntax for negative lookahead is:
(?!pattern)
Let's start with the first goal: matching entries that do not start with a particular string.
Here we want to exclude one contiguous string, so the pattern inside the lookahead is simply 2009-07-08. The implementation is:
^(?!2009-07-08).*?$
Testing with Expresso shows that the entries from the 8th are indeed filtered out.
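The same check can be sketched in Python instead of Expresso. This is only an illustration: the re module, the variable names, and the exact spelling of the log lines (for example "GET /robots.txt") are assumptions, and each entry is tested on its own rather than as one multi-line text.

```python
import re

# Test data, one entry per string (the spelling of each entry is assumed).
log_lines = [
    "2009-07-07 04:38:44 127.0.0.1 GET /robots.txt",
    "2009-07-07 04:38:44 127.0.0.1 GET /posts/robotfile.txt",
    "2009-07-08 04:38:44 127.0.0.1 GET /",
]

# The lookahead is checked at the start of the line: the match fails
# whenever the line begins with 2009-07-08.
not_on_the_8th = re.compile(r"^(?!2009-07-08).*?$")

kept = [line for line in log_lines if not_on_the_8th.match(line)]

for line in kept:
    print(line)
```

Only the two entries dated 2009-07-07 remain in kept.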
Next, let's implement the second goal: excluding entries that contain a particular string.
Following the same approach, my first attempt was:
^.*?(?!robots\.txt).*?$
In plain words, this regex says: start with any characters, not followed by the contiguous string robots.txt, then any characters, up to the end of the string.
Running the test shows that this does not have the effect we wanted. Why? Let's debug the regex above by adding two capture groups:
^(.*?)(?!robots\.txt)(.*?)$
Test results:
We see that the first group matches nothing and the second group matches the entire string. Let's go back and analyze the regex. Call the text matched by the first group region A and the text matched by the second group region B. As soon as the engine has a candidate for region A, it already performs the lookahead at the start of region B. It finds that an empty region A is enough: .*? is allowed to match the empty string, and the lookahead is satisfied because what follows region A is the string "2009", not robots.txt. So the match succeeds for every entry.
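The debugging step can be reproduced with a small sketch. It assumes Python's re module and the reconstructed log lines; the variable names are illustrative only.

```python
import re

log_lines = [
    "2009-07-07 04:38:44 127.0.0.1 GET /robots.txt",
    "2009-07-07 04:38:44 127.0.0.1 GET /posts/robotfile.txt",
    "2009-07-08 04:38:44 127.0.0.1 GET /",
]

# First attempt, with two capture groups added for debugging.
naive = re.compile(r"^(.*?)(?!robots\.txt)(.*?)$")

m = naive.match(log_lines[0])
region_a, region_b = m.group(1), m.group(2)
print(repr(region_a))  # what the first group captured
print(repr(region_b))  # what the second group captured

# The pattern matches every entry, so nothing is filtered out.
matched_all = all(naive.match(line) for line in log_lines)
```

Here region_a is the empty string and region_b is the whole entry, even though the entry contains robots.txt: the lookahead was only ever tested at position 0.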
Having found the cause, we revise the regex above by moving the .* inside the lookahead:
^(?!.*?robots).*$
Test results:
Bingo!
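A minimal Python sketch of the corrected pattern, under the same assumptions as before (re module, reconstructed log lines, illustrative names):

```python
import re

log_lines = [
    "2009-07-07 04:38:44 127.0.0.1 GET /robots.txt",
    "2009-07-07 04:38:44 127.0.0.1 GET /posts/robotfile.txt",
    "2009-07-08 04:38:44 127.0.0.1 GET /",
]

# With .*? inside the lookahead, the assertion scans the whole line from
# the start and fails if "robots" appears anywhere in it.
no_robots = re.compile(r"^(?!.*?robots).*$")

kept = [line for line in log_lines if no_robots.match(line)]

for line in kept:
    print(line)
```

The robotfile.txt entry is kept because it contains "robot" but not "robots"; only the robots.txt entry is dropped.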
-- Kevin Yang
^(?!.*exclude).*$
This expression matches "abcincludexyz" but does not match "abcexcludexyz".
Note that the leading "^" must not be omitted. Without it, the expression matches the "xcludexyz" in "abcexcludexyz".
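The effect of the anchor can be checked with a short sketch, assuming Python's re module and using search so that the engine is free to try every start position:

```python
import re

anchored = re.compile(r"^(?!.*exclude).*$")
unanchored = re.compile(r"(?!.*exclude).*$")

# With "^", the lookahead is only tried at the start of the string.
include_match = anchored.search("abcincludexyz")
exclude_match = anchored.search("abcexcludexyz")

# Without "^", the engine retries at later positions until it reaches one
# from which "exclude" no longer lies ahead.
partial_match = unanchored.search("abcexcludexyz")

print(include_match.group(0))
print(exclude_match)
print(partial_match.group(0))
```

The unanchored pattern fails at offsets 0 through 3, where "exclude" is still ahead, and succeeds at offset 4, which is why it returns "xcludexyz".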