Java Regular Expressions: Matching Entries That Do Not Contain a Given String

Last Update:2015-12-16 Source: Internet

Author: User

Tags expression engine

Overview

Log analysis often means dealing with thousands of log entries. Finding the entries that follow a particular pattern in that much data often requires fairly complex regular expressions. Two common cases are finding the entries that do not start with a specific string and finding the entries that do not contain a specific string.

Using negative lookahead

Regular expressions have the concepts of lookahead and lookbehind. These two terms describe the matching behavior of the regular expression engine. Note that "ahead" and "behind" here differ a little from everyday usage. For a piece of text, we usually call the beginning the "front" and the end the "back". The regular expression engine, however, normally scans from the start of the text toward the end (some engines have an option to control the scan direction). From the engine's point of view, the text toward the end is "ahead", because the engine has not reached it yet, and the text toward the start is "behind", because the engine has already passed over it.

While matching at a given position, the engine can look ahead at text it has not yet consumed to check whether it matches a pattern, or look behind at text it has already passed to make the same check. A check that requires the pattern to match is called a positive assertion, and a check that requires the pattern not to match is called a negative assertion.

Most modern regular expression engines support lookahead, while lookbehind is less widely supported. We therefore use negative lookahead to meet our needs.
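To make the four kinds of assertion concrete, here is a minimal Java sketch using java.util.regex, which supports both lookahead and lookbehind. The class name LookaroundDemo, the helper firstMatchIndex, and the sample text "foobar foobaz" are made up for illustration and do not come from the original article.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // Returns the start index of the first match of regex in text, or -1 if there is none.
    static int firstMatchIndex(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        String text = "foobar foobaz"; // illustrative sample text

        // Positive lookahead: "foo" only where "bar" follows.
        System.out.println(firstMatchIndex("foo(?=bar)", text));  // 0
        // Negative lookahead: "foo" only where "bar" does not follow.
        System.out.println(firstMatchIndex("foo(?!bar)", text));  // 7
        // Positive lookbehind: "baz" only where "foo" precedes.
        System.out.println(firstMatchIndex("(?<=foo)baz", text)); // 10
        // Negative lookbehind: "ba" only where "foo" does not precede.
        System.out.println(firstMatchIndex("(?<!foo)ba", text));  // -1
    }
}
```

None of the assertions consume text: in each case the reported match covers only "foo", "baz", or "ba", not the text that was inspected ahead or behind.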

Implementation

Test data:

2009-07-07 04:38:44 127.0.0.1 GET /robots.txt
2009-07-07 04:38:44 127.0.0.1 GET /posts/robotfile.txt
2009-07-08 04:38:44 127.0.0.1 GET /

For example, we want to achieve the following two objectives on these simple log entries:

1. Filter out the entries from the 8th.

2. Find the entries that do not contain the string robots.txt (only entries whose URL contains robots.txt should be filtered out).

The syntax of negative lookahead is:

(?!pattern)

Let's start with the first goal: match entries that do not start with a specific string.

Because we want to exclude a fixed literal string, the pattern inside the lookahead is simply 2009-07-08. The implementation is as follows:

^(?!2009-07-08).*?$
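As a rough sketch of how this pattern could be applied from Java, the following filters the three test entries line by line. The class name PrefixFilter and the method keep are illustrative names, not part of the original article, and the sketch assumes each log entry is already available as a separate string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class PrefixFilter {
    // The pattern from the article: reject lines that start with 2009-07-08.
    static final Pattern NOT_JULY_8 = Pattern.compile("^(?!2009-07-08).*?$");

    // Keeps only the lines that match the pattern in full.
    static List<String> keep(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (NOT_JULY_8.matcher(line).matches()) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "2009-07-07 04:38:44 127.0.0.1 GET /robots.txt",
                "2009-07-07 04:38:44 127.0.0.1 GET /posts/robotfile.txt",
                "2009-07-08 04:38:44 127.0.0.1 GET /");
        // Prints the two entries from 2009-07-07; the entry from the 8th is dropped.
        for (String line : keep(lines)) {
            System.out.println(line);
        }
    }
}
```

To run the pattern over a whole file held in one string instead, compiling it with Pattern.MULTILINE and calling find() in a loop would let ^ and $ match at each line boundary.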

Testing this in Expresso shows that the result does filter out the entries from the 8th.

Next, the second goal: exclude entries that contain a specific string.

Following the same approach as above, a first attempt is:

^.*?(?!robots\.txt).*?$

In plain words, this regular expression reads: start at the beginning of the line, match any characters, require that the text that follows is not robots.txt, then match any characters up to the end of the line.

Running the test shows that every entry still matches, so we did not get the result we wanted. Why? We can add two capture groups to the regular expression for debugging:

^(.*?)(?!robots\.txt)(.*?)$

The test results show that the first group matches nothing, while the second group matches the entire string. Let's look at the regular expression again. Because .*? is allowed to match the empty string, the engine first tries the first group as empty and immediately evaluates the lookahead that follows it, at the very start of the line. The text at that position is "2009", not robots.txt, so the lookahead condition is satisfied. The second group then matches the rest of the line, and as a result every entry matches.
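The same debugging step can be reproduced in Java by printing the two capture groups. This is a minimal sketch; the class name GroupDebug and the method groups are illustrative names, not part of the original article.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDebug {
    // The first attempt from the article, with two capture groups added.
    static final Pattern FIRST_ATTEMPT =
            Pattern.compile("^(.*?)(?!robots\\.txt)(.*?)$");

    // Returns {group 1, group 2} for a matching line, or null if the line does not match.
    static String[] groups(String line) {
        Matcher m = FIRST_ATTEMPT.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String line = "2009-07-07 04:38:44 127.0.0.1 GET /robots.txt";
        String[] g = groups(line);
        // Group 1 is empty and group 2 holds the whole line, even though
        // the line contains robots.txt.
        System.out.println("group 1: [" + g[0] + "]");
        System.out.println("group 2: [" + g[1] + "]");
    }
}
```

The lookahead is only checked at the one position where the first group ends, which is why a lazy first group lets it pass at the start of the line.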

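Having found the cause, we can fix the regular expression by moving the .*? inside the lookahead, so that the lookahead scans the whole line from its start:

^(?!.*?robots).*$

Note that this pattern tests for robots rather than robots\.txt, so it also excludes entries that contain robots anywhere else in the line. To exclude only entries containing robots.txt, use ^(?!.*?robots\.txt).*$ instead.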
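A possible Java version of this final filter is sketched here. The class name ContainsFilter, the method keep, and the stricter NO_ROBOTS_TXT variant (which spells out robots\.txt in full) are assumptions added for illustration. The /robots/index.html entry is a made-up line used only to show how the two patterns differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class ContainsFilter {
    // The pattern from the article: reject lines containing "robots" anywhere.
    static final Pattern NO_ROBOTS = Pattern.compile("^(?!.*?robots).*$");
    // Assumed stricter variant: reject only lines containing "robots.txt".
    static final Pattern NO_ROBOTS_TXT = Pattern.compile("^(?!.*?robots\\.txt).*$");

    // Keeps only the lines that match the given pattern in full.
    static List<String> keep(Pattern pattern, List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (pattern.matcher(line).matches()) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "2009-07-07 04:38:44 127.0.0.1 GET /robots.txt",
                "2009-07-07 04:38:44 127.0.0.1 GET /posts/robotfile.txt",
                "2009-07-08 04:38:44 127.0.0.1 GET /");
        // Keeps the robotfile.txt entry and the entry from the 8th.
        for (String line : keep(NO_ROBOTS, lines)) {
            System.out.println(line);
        }

        // Hypothetical entry: contains "robots" but not "robots.txt".
        String other = "2009-07-09 04:38:44 127.0.0.1 GET /robots/index.html";
        System.out.println(NO_ROBOTS.matcher(other).matches());     // false
        System.out.println(NO_ROBOTS_TXT.matcher(other).matches()); // true
    }
}
```

The two objectives could also be combined by placing both lookaheads after the ^ anchor, since each lookahead is checked at the same position without consuming any text.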

This article is an English version of an article originally published in Chinese on aliyun.com.
