Java Regular Expression advanced usage (grouping and capturing)

Source: Internet
Author: User

 

Regular expressions are often used in string processing. Anyone with some programming experience will understand their basic usage, so the basics are not covered here. This article focuses on advanced usage of regular expressions in Java: grouping and capturing.

Repeating a single character is simple: add a quantifier after the character. For example, a+ matches one or more a, and a? matches zero or one a. The quantifiers are as follows:

 

X?      X, once or not at all
X*      X, zero or more times
X+      X, one or more times
X{n}    X, exactly n times
X{n,}   X, at least n times
X{n,m}  X, at least n but not more than m times
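The quantifiers in the table can be tried out with a small program. This is a minimal sketch; the class name QuantifierDemo and the helper fullMatch are illustrative names, not part of the original article. Pattern.matches requires the whole input to match the expression.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns true when the entire input matches the regex (hypothetical helper).
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("a?", ""));          // true: zero occurrences are allowed
        System.out.println(fullMatch("a*", "aaa"));       // true: zero or more
        System.out.println(fullMatch("a+", ""));          // false: at least one is required
        System.out.println(fullMatch("a{2}", "aa"));      // true: exactly two
        System.out.println(fullMatch("a{2,}", "a"));      // false: fewer than two
        System.out.println(fullMatch("a{2,3}", "aaaa"));  // false: more than three
    }
}
```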

 

But what if we want to repeat multiple characters? For that we need grouping. We can use parentheses "()" to delimit a subexpression and then apply a quantifier to it. For example, (ABC)? matches zero or one occurrence of ABC. The expression inside the parentheses is a group.
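The difference between quantifying a group and quantifying a single character can be seen in a short sketch. The class name GroupRepeatDemo and the helper fullMatch are illustrative assumptions.

```java
import java.util.regex.Pattern;

class GroupRepeatDemo {
    // Returns true when the entire input matches the regex (hypothetical helper).
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // The quantifier applies to the whole group ABC.
        System.out.println(fullMatch("(ABC)?", ""));        // true
        System.out.println(fullMatch("(ABC)?", "ABC"));     // true
        System.out.println(fullMatch("(ABC)?", "ABCABC"));  // false: at most one ABC
        // Without parentheses the quantifier applies only to C.
        System.out.println(fullMatch("ABC?", "AB"));        // true
        System.out.println(fullMatch("(ABC)+", "ABCABC"));  // true
    }
}
```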

 

Groups are divided into capturing groups and non-capturing groups.

 

Capture Group 

Capturing groups are numbered by counting their opening parentheses from left to right. For example, in the expression ((A)(B(C))) there are four such groups:

1 ((A)(B(C)))
2 (A)
3 (B(C))
4 (C)

Group zero always represents the entire expression.

Capturing groups are so named because, during a match, each subsequence of the input sequence that matches such a group is saved. The captured subsequence can be used later in the expression through a back reference, and it can also be retrieved from the matcher once the match operation is complete.
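The numbering can be checked by matching ((A)(B(C))) against the input ABC and reading each group back from the matcher. This is a minimal sketch; the class name GroupNumberingDemo and the helper groups are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // Returns groups 0..groupCount() when the whole input matches, else an empty array.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] g = groups("((A)(B(C)))", "ABC");
        for (int i = 0; i < g.length; i++) {
            System.out.println("group " + i + " = " + g[i]);
        }
        // group 0 = ABC, group 1 = ABC, group 2 = A, group 3 = BC, group 4 = C
    }
}
```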

 

A back reference means that, later in the same expression, we can use the group number to refer to the text captured by an earlier group (the captured text, not the regular expression that matched it).

 

For example, (["']).*\1 uses one group, and \1 is a reference to the text captured by that group. It matches any string enclosed in a pair of double quotes or a pair of single quotes, for example "ABC", "'", or '"', but note that it does not match "A' or 'A". As described above, a back reference refers to the captured text rather than to the expression.
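The quote-matching expression can be exercised as follows. This is a sketch under the assumption that the whole input should be a quoted string, so Matcher.matches is used; the class name BackReferenceDemo and the helper isQuoted are illustrative. In a Java string literal the back reference \1 is written as \\1.

```java
import java.util.regex.Pattern;

class BackReferenceDemo {
    // Group 1 captures the opening quote; \1 requires the same character at the end.
    static final Pattern QUOTED = Pattern.compile("([\"']).*\\1");

    static boolean isQuoted(String s) {
        return QUOTED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isQuoted("\"ABC\""));  // true: double quotes on both sides
        System.out.println(isQuoted("'\"'"));     // true: single quotes around a double quote
        System.out.println(isQuoted("\"A'"));     // false: opening and closing quotes differ
        System.out.println(isQuoted("'A\""));     // false
    }
}
```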

 

Non-capturing Group 

Groups that begin with (? are pure non-capturing groups: they do not capture text and do not count toward the group total. In other words, a group that starts with (? captures no text and has no group number, so it cannot be the target of a back reference. (The exception is the named capturing group (?<name>X), available since Java 7, which does capture.)
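The effect on group numbering can be sketched with the plain non-capturing group (?:X), which Java supports although this article does not list it. The class name NonCapturingDemo and the helpers groupCount and group1 are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonCapturingDemo {
    // Number of capturing groups in the pattern (group 0 is not counted).
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Text of group 1 when the whole input matches, else null.
    static String group1(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(groupCount("(abc)+(\\d)"));           // 2
        System.out.println(groupCount("(?:abc)+(\\d)"));         // 1: (?:abc) is not numbered
        System.out.println(group1("(abc)+(\\d)", "abcabc7"));    // abc
        System.out.println(group1("(?:abc)+(\\d)", "abcabc7"));  // 7: the digit group is now group 1
    }
}
```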

In Java, the following types of non-capturing groups are supported:

 
   
 
(?=X)   X, via zero-width positive lookahead
(?!X)   X, via zero-width negative lookahead
(?<=X)  X, via zero-width positive lookbehind
(?<!X)  X, via zero-width negative lookbehind

 

 

These four non-capturing groups test whether expression X matches at a position, but the text matched by X is not included in the match result.

(?=X) Zero-width positive lookahead assertion. The match continues only if the subexpression X matches to the right of the current position. For example, \w+(?=\d) matches a word that is followed by a digit, without including the digit in the match. This construct does not backtrack.
(?!X) Zero-width negative lookahead assertion. The match continues only if the subexpression X does not match to the right of the current position. For example, \w+(?!\d) matches a run of word characters that is not immediately followed by a digit, without consuming the following character.
(?<=X) Zero-width positive lookbehind assertion. The match continues only if the subexpression X matches to the left of the current position. For example, (?<=19)99 matches instances of 99 that follow 19. This construct does not backtrack.
(?<!X) Zero-width negative lookbehind assertion. The match continues only if the subexpression X does not match to the left of the current position. For example, (?<!19)99 matches instances of 99 that do not follow 19.
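The four assertions can be compared side by side by listing every match together with its start index. This is a minimal sketch; the class name LookaroundDemo, the helper findAll, and the sample inputs are illustrative assumptions. Note that \w also matches digits, and that in 1999 the 99 starting at index 1 is preceded only by the single character 1, so the negative lookbehind accepts it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // Collects every match as "text@startIndex" (hypothetical helper).
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group() + "@" + m.start());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Positive lookahead: the digit must follow but is not part of the match.
        System.out.println(findAll("\\w+(?=\\d)", "abc1 def"));  // [abc@0]
        // Negative lookahead: the next character must not be a digit.
        System.out.println(findAll("\\w+(?!\\d)", "abc1 def"));  // [abc1@0, def@5]
        // Positive lookbehind: 99 must be directly preceded by 19.
        System.out.println(findAll("(?<=19)99", "1999 2099"));   // [99@2]
        // Negative lookbehind: 99 must not be directly preceded by 19.
        System.out.println(findAll("(?<!19)99", "1999 2099"));   // [99@1, 99@7]
    }
}
```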

In general, lookahead and lookbehind assertions are what you use when you need to negate a combination of characters, that is, a whole string. For example, suppose the text to the left must not be the string 1302 or 1301. A character class cannot negate a string as a whole (note: [^(1301)|(1302)] only says that the next character is not 1, 3, 0, or 2, nor a parenthesis or vertical bar; it does not mean "not 1302"). Only a lookahead or lookbehind assertion can treat the string as a unit. For example, (?<!1302|1301)456789 matches 456789 only when it is not immediately preceded by 1302 or 1301.
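The contrast between a negated character class and a negative lookbehind can be sketched as follows. The lookbehind is placed directly before the literal text it guards, which is an assumption about the intended usage; the class name NegateStringDemo, the helper found, and the sample inputs are illustrative.

```java
import java.util.regex.Pattern;

class NegateStringDemo {
    // A character class only excludes single characters: 1, 3, 0, 2, '(', ')' and '|'.
    static final Pattern CHAR_CLASS = Pattern.compile("[^(1301)|(1302)]");
    // The lookbehind rejects 456789 when the four characters before it are 1302 or 1301.
    static final Pattern LOOKBEHIND = Pattern.compile("(?<!1302|1301)456789");

    // True when the pattern matches somewhere in the input (hypothetical helper).
    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found(CHAR_CLASS, "1303"));        // false: every character is excluded, although 1303 is not 1302
        System.out.println(found(CHAR_CLASS, "5"));           // true
        System.out.println(found(LOOKBEHIND, "1302456789"));  // false
        System.out.println(found(LOOKBEHIND, "1301456789"));  // false
        System.out.println(found(LOOKBEHIND, "1303456789"));  // true
    }
}
```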

 

Example:

 

The above is the theory. The following examples illustrate it in practice:

1. Test matching with (?<!4)56(?=9). This means that the character before the matched text 56 must not be 4, and the character after it must be 9. Therefore it matches the 56 in 5569, but not in 4569.
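Example 1 can be verified with a few lines of Java. This is a minimal sketch; the class name LookaroundTestDemo and the helper firstMatchStart are illustrative names, and the input 5568 is an extra case added here to show the lookahead failing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundTestDemo {
    static final Pattern P = Pattern.compile("(?<!4)56(?=9)");

    // Returns the start index of the first match, or -1 when there is none.
    static int firstMatchStart(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(firstMatchStart("5569"));  // 1: 56 is preceded by 5 and followed by 9
        System.out.println(firstMatchStart("4569"));  // -1: 56 is preceded by 4
        System.out.println(firstMatchStart("5568"));  // -1: 56 is not followed by 9
    }
}
```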

 

2. Extract from the string da12bka3434bdca4343bdca234bm the numbers that sit between the characters a and b, with the additional conditions that the character before a must not be c and the character after b must be d.

 

In this string, only the number 3434 meets the requirements. So how can we extract it?

First, we write the expression for this extraction: (?<!c)a(\d+)bd. It contains only one capturing group, (\d+).

The Java code snippets are as follows:

  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  Pattern p = Pattern.compile("(?<!c)a(\\d+)bd");
  Matcher m = p.matcher("da12bka3434bdca4343bdca234bm");
  while (m.find()) {
      System.out.println(m.group(1)); // Only capturing group 1 holds the digits. Prints 3434
      System.out.println(m.group(0)); // Group 0 is the entire match; the (?<!c) lookbehind contributes no text. Prints a3434bd
  }

As you can see, non-capturing groups do not return results because they do not capture text.

 

Regular expressions are very powerful. This has been only a brief discussion of their advanced usage. If you are interested, explore further on your own.
