Java Regular Expressions: Matching Entries That Do Not Contain a Specific String

Source: Internet
Author: User
Tags expression engine

Overview

Log analysis often means dealing with thousands of log entries. Finding data that fits a specific pattern in that volume often requires fairly complex regular expressions, for example to find entries that do not contain a specific string, or entries that do not start with a specific string.

Using negative lookahead

Regular expressions have the concepts of lookahead and lookbehind. These two terms describe the matching behavior of the regular expression engine. Note that "ahead" and "behind" here differ a little from everyday usage. For a piece of text, we usually think of the beginning as the "front" and the end as the "back". The regex engine, however, scans from the start of the text toward the end (some engines offer an option to reverse the scan direction). For the engine, the direction toward the end of the text is therefore "ahead", because it has not reached that part yet, and the direction toward the start of the text is "behind", because it has already passed that part.

When the engine is at a given position, it can look ahead at the text it has not yet consumed to check whether that text matches (or does not match) a pattern, or look behind at the text it has already passed and run the same check. Requiring the pattern to match is called a positive assertion; requiring it not to match is called a negative assertion.

Modern regular expression engines generally support lookahead, whereas lookbehind is less widely supported. We therefore use negative lookahead to meet our needs.
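As a point of reference, java.util.regex writes the four assertions as (?=...), (?!...), (?<=...) and (?<!...). The following is a minimal sketch of how each behaves on one sample string; the class name LookaroundDemo and the helper firstMatch are illustrative names, not part of the original article.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundDemo {
    // Returns the first substring that the regex finds in input, or null if there is none.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "GET /robots.txt";

        // Positive lookahead: "robots" only if ".txt" follows; ".txt" itself is not consumed.
        System.out.println(firstMatch("robots(?=\\.txt)", text)); // robots

        // Negative lookahead: "robots" only if ".txt" does NOT follow.
        System.out.println(firstMatch("robots(?!\\.txt)", text)); // null

        // Positive lookbehind: "robots" only if "/" comes immediately before it.
        System.out.println(firstMatch("(?<=/)robots", text));     // robots

        // Negative lookbehind: "robots" only if "/" does NOT come immediately before it.
        System.out.println(firstMatch("(?<!/)robots", text));     // null
    }
}
```

In every case the assertion only checks the surrounding text; the matched substring never includes what the assertion inspected.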

Implementation

Test data:

2009-07-07 04:38:44 127.0.0.1 GET /robots.txt
2009-07-07 04:38:44 127.0.0.1 GET /posts/robotfile.txt
2009-07-08 04:38:44 127.0.0.1 GET /

Suppose we want to achieve the following two goals for these simple log entries:

1. Filter out the entries from the 8th.

2. Find the entries that do not contain the string robots.txt (only entries whose URL contains robots.txt should be filtered out).

The syntax of negative lookahead is:

(?!pattern)

Let's achieve the first goal first: match entries that do not start with a specific string.

Because we want to exclude one fixed, contiguous string, the pattern inside the lookahead is simply 2009-07-08. The implementation is as follows:

^(?!2009-07-08).*?$
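The same expression can be tried from Java. This is a minimal sketch under a few assumptions: the three test entries are held as separate strings, and the class name ExcludeByPrefix and the method keep are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class ExcludeByPrefix {
    // The expression from the article: the lookahead at ^ rejects lines that start with 2009-07-08.
    private static final Pattern NOT_JULY_8 = Pattern.compile("^(?!2009-07-08).*?$");

    // Returns only the lines that the pattern matches in full.
    public static List<String> keep(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (NOT_JULY_8.matcher(line).matches()) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "2009-07-07 04:38:44 127.0.0.1 GET /robots.txt",
                "2009-07-07 04:38:44 127.0.0.1 GET /posts/robotfile.txt",
                "2009-07-08 04:38:44 127.0.0.1 GET /");
        // Prints the two entries dated 2009-07-07; the entry from the 8th is dropped.
        keep(log).forEach(System.out::println);
    }
}
```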

Testing with Expresso confirms that the entries from the 8th are filtered out.

Next, let's achieve the second goal: exclude entries that contain a specific string.

Following the same approach as above, a first attempt looks like this:

^.*?(?!robots\.txt).*?$

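In plain words, this expression is meant to say: start at the beginning of the line, allow any characters, require that robots.txt does not follow, then allow any characters up to the end of the line.

Running the test shows that every entry still matches, including the one that contains robots.txt.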

We didn't achieve what we wanted. Why? We can add two capture groups to the above regular expression for debugging:

^(.*?)(?!robots\.txt)(.*?)$

The test results show that the first group matches nothing, while the second group matches the entire line. Look at the expression again. The first group, (.*?), is lazy and is allowed to match the empty string, so the engine evaluates the lookahead right away, at the very start of the line. At that position the text that follows is "2009...", not "robots.txt", so the negative lookahead succeeds, and the second group then consumes the rest of the line. The lookahead only tests the single position where the first group stops; it never inspects the rest of the line. As a result, every entry matches.

Having found the cause, we can fix the expression by moving the .*? inside the lookahead, so that the assertion scans the whole line from its start:
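The debugging step can be reproduced in Java by printing what each capture group holds. This is a minimal sketch; the class name LookaheadDebug and the method groups are hypothetical names used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDebug {
    // The flawed expression with two capture groups added for inspection.
    private static final Pattern DEBUG =
            Pattern.compile("^(.*?)(?!robots\\.txt)(.*?)$");

    // Returns {group 1, group 2} for a line, or null if the line does not match.
    public static String[] groups(String line) {
        Matcher m = DEBUG.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String line = "2009-07-07 04:38:44 127.0.0.1 GET /robots.txt";
        String[] g = groups(line);
        // Group 1 is empty: the lazy .*? stops at position 0, where the lookahead already succeeds.
        System.out.println("group 1 = [" + g[0] + "]");
        // Group 2 holds the whole line, robots.txt included.
        System.out.println("group 2 = [" + g[1] + "]");
    }
}
```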

^(?!.*?robots).*$
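A minimal Java sketch of this final expression applied to the test data follows. The class name ExcludeContaining and the method keep are hypothetical names. Note that the expression as written tests for the substring robots rather than the full file name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class ExcludeContaining {
    // The article's final expression: the lookahead scans the whole line for "robots"
    // before anything is consumed, and rejects the line if it is found.
    private static final Pattern NO_ROBOTS = Pattern.compile("^(?!.*?robots).*$");

    // Returns only the lines that do not contain "robots".
    public static List<String> keep(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (NO_ROBOTS.matcher(line).matches()) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "2009-07-07 04:38:44 127.0.0.1 GET /robots.txt",
                "2009-07-07 04:38:44 127.0.0.1 GET /posts/robotfile.txt",
                "2009-07-08 04:38:44 127.0.0.1 GET /");
        // Prints the robotfile.txt entry and the entry from the 8th; the robots.txt entry is dropped.
        keep(log).forEach(System.out::println);
    }
}
```

To exclude only the exact file name, the lookahead body could be written as .*?robots\.txt instead. For an arbitrary literal, Pattern.quote can produce the escaped form.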
