Java Regular Expressions: Non-Capturing Groups in Detail

Source: Internet
Author: User

Over the past few days I have been reading about regular expressions, and this post summarizes what I learned about non-capturing groups.
It covers five constructs in total, arranged as one on its own plus two pairs:
(?:X)   (?=X)   (?<=X)   (?!X)   (?<!X)

I. Starting with (?:), the non-capturing group
The following example shows what a non-capturing group is for.

Suppose there are two amounts, 8899¥ and 6688$: the first is 8899 RMB (yuan) and the second is 6688 US dollars. I need a regular expression that extracts both the amount and the currency type. It can be written as (\d+)([¥$])$; since it is tested in Java, each '\' must be escaped as '\\' inside the string literal.
The test program is as follows:

  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  Pattern p = Pattern.compile("(\\d+)([¥$])$");
  String str = "8899¥";
  Matcher m = p.matcher(str);
  if (m.matches()) {
      System.out.println("Currency amount: " + m.group(1));
      System.out.println("Currency type: " + m.group(2));
  }

Output:
Currency amount: 8899
Currency type: ¥

Here the regular expression has two groups, (\d+) and ([¥$]): the first matches the currency amount and the second matches the currency type.
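To see both sample amounts from the start of this section go through the same pattern, here is a minimal, self-contained sketch. The class `CurrencyGroups` and its helper `split` are hypothetical names chosen for illustration. Inside the character class `[¥$]` the `$` is a literal dollar sign, while the final `$` is the end-of-input anchor; `matches()` requires the whole string to match.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CurrencyGroups {
    // Group 1: the digits of the amount; group 2: the currency symbol.
    // Inside [...] the '$' is literal; the trailing '$' anchors at the end of input.
    // The yen sign is written as a Unicode escape so the file compiles under any source encoding.
    static final Pattern AMOUNT = Pattern.compile("(\\d+)([\u00A5$])$");

    /** Hypothetical helper: returns [amount, currency] if the whole string matches, else an empty list. */
    static List<String> split(String s) {
        Matcher m = AMOUNT.matcher(s);
        if (!m.matches()) {
            return List.of();
        }
        return List.of(m.group(1), m.group(2));
    }

    public static void main(String[] args) {
        for (String s : new String[] {"8899\u00A5", "6688$"}) {
            System.out.println(s + " -> " + split(s));
        }
    }
}
```

An input with a decimal point, such as 8899.56¥, returns an empty list with this pattern, because `\d+` cannot cover the dot.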

Now I want the regular expression to match decimal amounts as well, for example 8899.56¥. Fractions of a yuan hardly matter, so I want to ignore the decimal part and still extract 8899 and ¥.
The regular expression is as follows:
"(\\d+)(\\.?)(\\d+)([¥$])$"
It contains four parenthesized groups, so printing the integer part of the amount and the currency type now requires group(1) and group(4). If the output code is kept separate from the regular expression, I would rather change only the regular expression and leave the output code, which uses group(1) and group(2), untouched. This is exactly what the non-capturing group (?:) is for.
Modify the preceding regular expression:
"(\\d+)(?:\\.?)(?:\\d+)([¥$])$"
Now group(1) and group(2) can still be used for the output, and they print 8899 and ¥.
The two middle groups of this regular expression use the non-capturing syntax (?:), which can be understood as grouping without capturing: they are not numbered, so group(2) now refers to the currency symbol.
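A minimal sketch comparing the two expressions, assuming a hypothetical class `NonCapturingGroups` with a helper `groups` that lists `group(1)` through `group(groupCount())` after a full match. The capturing version reports four values and the non-capturing version only two, so code that reads `group(1)` and `group(2)` keeps working.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonCapturingGroups {
    // Four capturing groups: integer part, optional dot, digits after it, currency symbol.
    // The yen sign is written as a Unicode escape so the file compiles under any source encoding.
    static final Pattern CAPTURING =
            Pattern.compile("(\\d+)(\\.?)(\\d+)([\u00A5$])$");
    // Same structure, but the middle two groups are non-capturing (?:...),
    // so only the integer part and the currency symbol are numbered.
    static final Pattern NON_CAPTURING =
            Pattern.compile("(\\d+)(?:\\.?)(?:\\d+)([\u00A5$])$");

    /** Returns group(1)..group(groupCount()) for a full match, or an empty list. */
    static List<String> groups(Pattern p, String s) {
        Matcher m = p.matcher(s);
        List<String> out = new ArrayList<>();
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "8899.56\u00A5";
        System.out.println(groups(CAPTURING, str));     // four captured values
        System.out.println(groups(NON_CAPTURING, str)); // only the amount and the currency
    }
}
```

One caveat worth checking: because the second `\d+` is mandatory, an integer-only input such as 8899¥ still matches, but the engine backtracks one digit out of group 1, which then holds 889. An expression such as `(\\d+)(?:\\.\\d+)?([¥$])$`, which makes the dot and the fraction optional together, avoids this; treat it as one possible alternative rather than the author's version.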

II. (?=) and (?<=)
Some sources call them positive lookahead and positive lookbehind;
others use names such as positive forward lookaround and positive reverse lookaround.

1. Never mind the names for now; let's look at the following example:

  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  Pattern p = Pattern.compile("[0-9a-z]{2}(?=aa)");
  String str = "12332aa438aaf";
  Matcher m = p.matcher(str);
  while (m.find()) {
      System.out.println(m.group());
  }

This program prints 32 and 38.
The regular expression matches two characters (digits or lowercase letters) that are immediately followed by aa.
Analysis:
The substring 32aa satisfies this condition, so it is matched; but because (?=) does not capture, only 32 is output, without aa. Similarly, 38aa matches the regular expression and only 38 is output.

Let's take a closer look:
After the first successful match on str prints 32, the program goes on searching for further matches. Where does the next search start: after the whole 32aa, or right after 32? That is, at index 7 or at index 5? Some people may expect it to resume after 32aa, since 32aa is what satisfied the regular expression, so the search would continue at the character 4. In fact it resumes right after 32, at the first a. The reason is that (?=) does not consume the text it checks.
The Pattern API documentation describes it as:

(?=X)    X, via zero-width positive lookahead

This is what "zero-width" means: the lookahead checks that X follows, but the characters it checks are not part of the match, so the match ends right after 32.
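The start and end indices make the zero-width behaviour visible. This is a minimal sketch with a hypothetical class `LookaheadPositions` and helper `spans` that records each match as `text@start-end`; it compares the lookahead pattern with a consuming variant `[0-9a-z]{2}aa`, introduced here only for contrast. `Matcher.end()` is the offset just past the last matched character, and the next `find()` resumes scanning from there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadPositions {
    /** Records every find() of regex in s as "text@start-end". */
    static List<String> spans(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            // end() is the offset just past the match; the next find() resumes there.
            out.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "12332aa438aaf";
        // Lookahead: "aa" is checked but not consumed, so the first match ends at index 5.
        System.out.println(spans("[0-9a-z]{2}(?=aa)", str));
        // Consuming variant, for contrast: "aa" is part of the match, which ends at index 7.
        System.out.println(spans("[0-9a-z]{2}aa", str));
    }
}
```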

It gets more interesting with the string str = "aaaaaaaa";
The program now prints aa three times.
Analysis:
The string has eight characters.
The first match is easy to find: the first four a's. The third and fourth a are only checked by the lookahead, not captured, so the first and second are output;
The search continues from the third a: the third through sixth a's satisfy the pattern, so the third and fourth are output;
The search continues from the fifth a: the fifth through eighth a's satisfy the pattern, so the fifth and sixth are output;
The next search starts at the seventh a. The seventh and eighth a are not followed by aa, so there is no further match.
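The overlap can be checked directly. This sketch (the class `LookaheadOverlap` and helper `firstGroups` are hypothetical names) collects group 1 and its start index for every match in "aaaaaaaa", once with the lookahead and once with a consuming `aa` added only for comparison. Because the lookahead consumes nothing, successive matches start two characters apart instead of four.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadOverlap {
    /** Records group 1 and its start index ("text@start") for every find() of regex in s. */
    static List<String> firstGroups(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1) + "@" + m.start(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "aaaaaaaa";
        // Zero-width lookahead: each match consumes only two characters.
        System.out.println(firstGroups("([0-9a-z]{2})(?=aa)", str));
        // Consuming "aa", for contrast: each match uses up four characters.
        System.out.println(firstGroups("([0-9a-z]{2})aa", str));
    }
}
```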
Let's go a step further: so far (?=) has come after the captured part. What happens if it is placed first?
Here is a different example:

  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  Pattern p = Pattern.compile("(?=hopeful)hope");
  String str = "hopeful";
  Matcher m = p.matcher(str);
  while (m.find()) {
      System.out.println(m.group());
  }

The output is hope.
The regular expression checks whether the text at the current position is hopeful; if it is, the hope at the start of hopeful is captured. The search for further matches then continues from the f.
Comparing the two shows that (?=hopeful)hope and hope(?=ful) have exactly the same effect.
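A quick way to check the equivalence is to run both forms over a few inputs and compare what `find()` returns. The class `LookaheadPlacement` and helper `findAll` are hypothetical, and the inputs other than hopeful are made up for the comparison. Both forms require hopeful at the match position and consume only hope, so they report the same matches at the same offsets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadPlacement {
    /** Records every find() of regex in s as "text@start". */
    static List<String> findAll(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group() + "@" + m.start());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"hopeful", "hopeless", "a hopeful hope"}) {
            // Both forms require "hopeful" at the match position but consume only "hope".
            System.out.println(s + ": " + findAll("(?=hopeful)hope", s)
                    + " vs " + findAll("hope(?=ful)", s));
        }
    }
}
```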

2. Now for (?<=)
Change the regular expression and the string, keeping the same loop:
  Pattern p = Pattern.compile("(?<=aa)[0-9a-z]{2}");
  String str = "12332aa438aaf";
The output is 43.

This regular expression matches two characters (digits or lowercase letters) that are immediately preceded by aa.
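A sketch that runs the lookbehind and the earlier lookahead side by side on the same string; the class `LookbehindDemo` and helper `findAll` are hypothetical names. The lookbehind finds only 43: the second aa is followed by just one character, f, which is too short for `[0-9a-z]{2}`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindDemo {
    /** Returns the text of every find() of regex in s. */
    static List<String> findAll(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "12332aa438aaf";
        // Lookbehind: two characters immediately preceded by "aa".
        System.out.println(findAll("(?<=aa)[0-9a-z]{2}", str));
        // The earlier lookahead: two characters immediately followed by "aa".
        System.out.println(findAll("[0-9a-z]{2}(?=aa)", str));
    }
}
```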

Again, let's take a deeper look: change the string to str = "aaaaaaaa" and see what is printed.
Analysis:
The first match involves the first four a's: the first and second satisfy the lookbehind, and the third and fourth are output;
The search continues from the fifth a. The fifth and sixth a match, because they are two characters and are preceded by two a's (the third and fourth, even though the previous match already output them), so the fifth and sixth are output;
The search continues from the seventh a. The seventh and eighth a match, because they are two characters and are preceded by two a's (the fifth and sixth), so the seventh and eighth are output. The search is complete.
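The point worth noticing is that a lookbehind may inspect characters that the previous match already consumed. This sketch (the class `LookbehindOverlap` and helper `firstGroups` are hypothetical) prints the start index of each match on "aaaaaaaa" and compares it with a consuming prefix `aa([0-9a-z]{2})`, added only for contrast, which cannot reuse those characters and so finds fewer matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindOverlap {
    /** Records group 1 and its start index ("text@start") for every find() of regex in s. */
    static List<String> firstGroups(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1) + "@" + m.start(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "aaaaaaaa";
        // The lookbehind may re-read "aa" that the previous match consumed.
        System.out.println(firstGroups("(?<=aa)([0-9a-z]{2})", str));
        // A consuming "aa" prefix, for contrast, cannot reuse those characters.
        System.out.println(firstGroups("aa([0-9a-z]{2})", str));
    }
}
```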

III. (?!) and (?<!)
They look very similar to the previous pair; the only difference is that '=' becomes '!',
so the meaning is reversed.
[0-9a-z]{2}(?!aa) matches two characters that are not followed by aa.
(?<!aa)[0-9a-z]{2} matches two characters that are not preceded by aa.
They are used in the same way as the previous pair.
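A minimal sketch of the two negative forms on the original string 12332aa438aaf; the class `NegativeLookaround` and helper `findAll` are hypothetical names. The negative lookahead skips 38 because aa follows it, and the search then resumes one character later with 8a; the negative lookbehind accepts the second aa because it is preceded by 38 rather than aa.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegativeLookaround {
    /** Returns the text of every find() of regex in s. */
    static List<String> findAll(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "12332aa438aaf";
        // Two characters NOT immediately followed by "aa".
        System.out.println(findAll("[0-9a-z]{2}(?!aa)", str));
        // Two characters NOT immediately preceded by "aa".
        System.out.println(findAll("(?<!aa)[0-9a-z]{2}", str));
    }
}
```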
