Java Regular Expression application Summary

Source: Internet
Author: User
Tags character classes

 

08:45:17

Copyright notice: this is an original work and may not be reprinted. Violations will be pursued under the law.

 

I. Overview

 

Regular expressions are an important tool for processing strings and text in Java.

 

Java's regular expression support is concentrated in the following two classes:

java.util.regex.Pattern: the pattern class, which represents a compiled regular expression.

java.util.regex.Matcher: the matcher class, which matches a character sequence against a Pattern and holds the result of that match.

(Unfortunately, the Javadoc does not give a short statement of the responsibility of each of these two classes.)

 

A simple example:

Java code

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Regular expression example
 *
 * @author leizhimin 2009-7-17 9:02:53
 */
public class TestRegx {
    public static void main(String[] args) {
        // Reluctant quantifier: group 1 is the shortest text between 'f' and 'k'.
        Pattern p = Pattern.compile("f(.+?)k");
        Matcher m = p.matcher("fckfkkfkf");
        while (m.find()) {
            String s0 = m.group();   // the whole match
            String s1 = m.group(1);  // the first parenthesized group
            System.out.println(s0 + "|" + s1);
        }
        System.out.println("---------");
        // Reuse the same Matcher on a new input.
        m.reset("fucking!");
        while (m.find()) {
            System.out.println(m.group());
        }

        Pattern p1 = Pattern.compile("f(.+?)i(.+?)h");
        Matcher m1 = p1.matcher("finishabigfishfrish");
        while (m1.find()) {
            String s0 = m1.group();
            String s1 = m1.group(1);
            String s2 = m1.group(2);
            System.out.println(s0 + "|" + s1 + "|" + s2);
        }

        System.out.println("---------");
        // Dates: \\2 is a back-reference, so both separators must be the same character.
        Pattern p3 = Pattern.compile("(19|20)\\d\\d([- /.])(0[1-9]|1[012])\\2(0[1-9]|[12][0-9]|3[01])");
        Matcher m3 = p3.matcher("1900-01-01 2007/08/13 1900.01.01 1900 01 01 1900-01.01 1900 13 01 1900 02 31");
        while (m3.find()) {
            System.out.println(m3.group());
        }
    }
}

 

Output result:

fck|c
fkk|k
---------
fuck
finish|in|s
fishfrish|ishfr|s
---------
1900-01-01
2007/08/13
1900.01.01
1900 01 01
1900 02 31

 

Process finished with exit code 0

 

II. Some confusing points

 

1. How Java handles the backslash

In other languages, \\ means: insert a plain literal backslash into the regular expression, with no special meaning.

In Java, \\ means: insert a regular-expression backslash, so that the character after it has a special meaning.

 

See the API documentation:

Predefined character classes
.    Any character (may or may not match line terminators)
\d   A digit: [0-9]
\D   A non-digit: [^0-9]
\s   A whitespace character: [ \t\n\x0B\f\r]
\S   A non-whitespace character: [^\s]
\w   A word character: [a-zA-Z_0-9]
\W   A non-word character: [^\w]

 

Comparing this table with the program above, however, shows a difference: \d is written as \\d in actual Java source code, because the Java compiler first processes the backslash inside the string literal.

To match a literal \ character with a Java regular expression, you must write \\\\ in the string literal, because the API documentation below defines \\ as the backslash character in the regular expression itself.

Line breaks and similar characters do not need an extra backslash, however. For example, a carriage return \r is simply written as \r.
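The escaping rules above can be checked with a small sketch. The class name BackslashDemo and its method names are made up for this illustration; only Pattern.matches is a real API call.

```java
import java.util.regex.Pattern;

public class BackslashDemo {
    // The regex \d (one digit) needs a doubled backslash in Java source.
    static boolean isDigit(String s) {
        return Pattern.matches("\\d", s);
    }

    // The regex \\ (one literal backslash) needs four backslashes in Java source.
    static boolean isSingleBackslash(String s) {
        return Pattern.matches("\\\\", s);
    }

    // "\r" is converted to a carriage return by the compiler, while "\\r" reaches
    // the regex engine as the escape \r. Both patterns match a carriage return.
    static boolean bothMatchCarriageReturn() {
        return Pattern.matches("\r", "\r") && Pattern.matches("\\r", "\r");
    }

    public static void main(String[] args) {
        System.out.println(isDigit("7"));              // true
        System.out.println(isSingleBackslash("\\"));   // true: the input is one backslash
        System.out.println(bothMatchCarriageReturn()); // true
    }
}
```

The string literal "\\" passed to isSingleBackslash is itself a single backslash character at run time, which is why the pattern needs four backslashes to describe it.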

 

Characters
x        The character x
\\       The backslash character
\0n      The character with octal value 0n (0 <= n <= 7)
\0nn     The character with octal value 0nn (0 <= n <= 7)
\0mnn    The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh     The character with hexadecimal value 0xhh
\uhhhh   The character with hexadecimal value 0xhhhh
\t       The tab character ('\u0009')
\n       The newline (line feed) character ('\u000A')
\r       The carriage-return character ('\u000D')
\f       The form-feed character ('\u000C')
\a       The alert (bell) character ('\u0007')
\e       The escape character ('\u001B')
\cx      The control character corresponding to x

 

2. Matcher.find(): attempts to find the next subsequence of the input sequence that matches the pattern. The search starts at the beginning of the input sequence (more precisely, of the matcher's region). If a previous call to this method succeeded and the matcher has not been reset since, the search instead starts at the first character not matched by the previous match. In other words, each successful call resumes searching right after the subsequence found by the call before it.
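A minimal sketch of this resume-after-the-last-match behavior, reusing the first pattern from the example program. FindDemo and findAll are names invented for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindDemo {
    // Collects "start-end:text" for every successive find() on the input.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            // end() of one match is where the next find() starts searching.
            hits.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // prints [0-3:fck, 3-6:fkk]
        System.out.println(findAll("f(.+?)k", "fckfkkfkf"));
    }
}
```

The second match starts at index 3, exactly where the first one ended, and the trailing "fkf" yields nothing because no k follows the last f.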

 

3. Matcher.matches(): determines whether the entire input sequence matches the pattern. To check several strings in succession with the same Matcher object, you can use

Matcher.reset(): resets the matcher, discarding all of its explicit state information and setting its append position to zero.

Alternatively, Matcher.reset(CharSequence input) resets the matcher with a new input sequence.
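As a sketch of checking several strings with one Matcher, the following reuses a single matcher through reset(CharSequence). The class ResetDemo, the method countFullMatches, and the date-like pattern are assumptions made for the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResetDemo {
    // Counts the inputs that match the whole pattern, reusing one Matcher.
    static int countFullMatches(String regex, String... inputs) {
        Matcher m = Pattern.compile(regex).matcher("");
        int count = 0;
        for (String input : inputs) {
            // reset(input) returns the same Matcher, now bound to the new input.
            if (m.reset(input).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Only the first input matches in its entirety, so this prints 1.
        System.out.println(countFullMatches("\\d{4}-\\d{2}-\\d{2}",
                "2009-07-17", "2009-07-17 09:02", "17/07/2009"));
    }
}
```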

 

4. The concept of groups. This concept is very important. A group is a part of the regular expression delimited by parentheses, and it can be referenced by number. Group numbers start from 0. Each pair of parentheses defines one group, and groups can be nested; they are numbered by the position of their opening parenthesis, from left to right. Group 0 stands for the entire expression, group 1 for the first parenthesized group, and so on.

For example, the regular expression A(B)C(D)E has three groups: group 0 is ABCDE, group 1 is B, and group 2 is D.

The regular expression A((B)C)(D)E has four groups: group 0 is ABCDE, group 1 is BC, group 2 is B, and group 3 is D.

 

int groupCount(): returns the number of capturing groups in this matcher's pattern, not counting group 0.

String group(): returns group 0 of the previous match operation (such as find()), that is, the whole matched subsequence.

String group(int group): returns the subsequence captured by the given group during the previous match operation. If the match succeeded but the given group did not match any part of the input sequence, null is returned.

int start(int group): returns the start index of the subsequence captured by the given group during the previous match operation.

int end(int group): returns the index of the last character of the subsequence captured by the given group during the previous match operation, plus 1.
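The group methods above can be tried on the nested expression A((B)C)(D)E. This is only a sketch: GroupDemo and describeGroups are hypothetical names, and the output format is chosen for the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Describes every group of a full match as "index=text[start,end)".
    static String describeGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder();
        // groupCount() excludes group 0, so the loop runs from 0 to groupCount() inclusive.
        for (int i = 0; i <= m.groupCount(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(i).append('=').append(m.group(i))
              .append('[').append(m.start(i)).append(',').append(m.end(i)).append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints 0=ABCDE[0,5) 1=BC[1,3) 2=B[1,2) 3=D[3,4)
        System.out.println(describeGroups("A((B)C)(D)E", "ABCDE"));
        // An optional group that took no part in the match: group(1) is null, start/end are -1.
        // prints 0=ac[0,2) 1=null[-1,-1)
        System.out.println(describeGroups("a(b)?c", "ac"));
    }
}
```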

 

5. Control the matching range

The oddest method here is lookingAt(). Its name is confusing, so read the API documentation carefully.

start() returns the start index of the previous match.

end() returns the offset after the last character matched.

public boolean lookingAt() attempts to match the input sequence, starting at the beginning of the region, against the pattern.

Like the matches method, this method always starts at the beginning of the region; unlike matches, it does not require that the entire region be matched.

If the match succeeds, more information can be obtained via the start, end, and group methods.

Return value:

true if, and only if, a prefix of the input sequence matches this matcher's pattern.
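To make the difference concrete, the sketch below runs matches(), lookingAt(), and find() with the same pattern and input. LookingAtDemo and compare are names made up for this example.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookingAtDemo {
    // Returns {matches(), lookingAt(), find()} for one pattern and input.
    static boolean[] compare(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        boolean whole = m.matches();    // the entire input must match
        boolean prefix = m.lookingAt(); // only a prefix must match
        m.reset();                      // otherwise find() would resume after the prefix match
        boolean anywhere = m.find();    // a match anywhere in the input is enough
        return new boolean[] {whole, prefix, anywhere};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(compare("\\d+", "123")));    // [true, true, true]
        System.out.println(Arrays.toString(compare("\\d+", "123abc"))); // [false, true, true]
        System.out.println(Arrays.toString(compare("\\d+", "abc123"))); // [false, false, true]
    }
}
```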

 

6. Pattern flags

Static method of the Pattern class:

static Pattern compile(String regex, int flags)

Compiles the given regular expression into a pattern with the given flags.

The flags parameter is a bit mask of the Pattern flags below, which matter a great deal in some cases.

Pattern.CANON_EQ
Enables canonical equivalence.

Pattern.CASE_INSENSITIVE
Enables case-insensitive matching.

Pattern.COMMENTS
Permits whitespace and comments in the pattern.

Pattern.DOTALL
Enables dotall mode, in which . also matches line terminators.

Pattern.LITERAL
Enables literal parsing of the pattern, so that metacharacters lose their special meaning.

Pattern.MULTILINE
Enables multiline mode, in which ^ and $ also match at the start and end of each line.

Pattern.UNICODE_CASE
Enables Unicode-aware case folding.

Pattern.UNIX_LINES
Enables Unix lines mode, in which only \n is recognized as a line terminator.
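The effect of a few of these flags can be sketched as follows. FlagsDemo and its method names are invented for the illustration; flags are plain int constants and can be combined with the | operator.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlagsDemo {
    // CASE_INSENSITIVE: "java" also matches "JAVA".
    static boolean caseInsensitive(String input) {
        return Pattern.compile("java", Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    // DOTALL decides whether '.' may match the newline in "a\nb".
    static boolean dotMatchesNewline(int flags) {
        return Pattern.compile("a.b", flags).matcher("a\nb").matches();
    }

    // MULTILINE decides whether '^' matches at the start of every line.
    static int countLineStarts(int flags) {
        Matcher m = Pattern.compile("^\\w+", flags).matcher("one\ntwo\nthree");
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // LITERAL: the pattern "a.b" matches only the text "a.b", not "axb".
    static boolean literalFind(String input) {
        return Pattern.compile("a.b", Pattern.LITERAL).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(caseInsensitive("JAVA"));            // true
        System.out.println(dotMatchesNewline(0));               // false
        System.out.println(dotMatchesNewline(Pattern.DOTALL));  // true
        System.out.println(countLineStarts(0));                 // 1
        System.out.println(countLineStarts(Pattern.MULTILINE)); // 3
        System.out.println(literalFind("axb"));                 // false
        System.out.println(literalFind("a.b"));                 // true
    }
}
```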

 

III. String replacement

 

String.replace(char oldChar, char newChar)

Returns a new string in which every occurrence of oldChar in this string is replaced with newChar.

String.replace(CharSequence target, CharSequence replacement)

Replaces each substring of this string that matches the literal target sequence with the given literal replacement sequence.

String.replaceAll(String regex, String replacement)

Replaces each substring of this string that matches the given regular expression with the given replacement.

String.replaceFirst(String regex, String replacement)

Replaces the first substring of this string that matches the given regular expression with the given replacement.

StringBuffer.replace(int start, int end, String str)

Replaces the characters in a substring of this sequence with the characters in the given String.

StringBuilder.replace(int start, int end, String str)

Replaces the characters in a substring of this sequence with the characters in the given String.

Matcher.replaceAll(String replacement)

Replaces every subsequence of the input sequence that matches the pattern with the given replacement string.

Matcher.replaceFirst(String replacement)

Replaces the first subsequence of the input sequence that matches the pattern with the given replacement string.
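A common source of surprises is that String.replace treats its target literally while String.replaceAll treats it as a regular expression. The sketch below contrasts the two and also shows group references in a replacement string. ReplaceDemo and its method names are assumptions for the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceDemo {
    // String.replace: the target "." is literal text.
    static String literalReplace(String s) {
        return s.replace(".", "-");
    }

    // String.replaceAll: the first argument is a regex, so "." matches every character.
    static String regexReplace(String s) {
        return s.replaceAll(".", "-");
    }

    // In a replacement string, $1, $2, ... refer to the captured groups.
    static String swapDate(String s) {
        return s.replaceAll("(\\d{4})-(\\d{2})-(\\d{2})", "$3/$2/$1");
    }

    // Matcher.quoteReplacement keeps '$' and '\' in the replacement from being interpreted.
    static String insertPrice(String s) {
        return s.replaceAll("PRICE", Matcher.quoteReplacement("$5"));
    }

    // Matcher.replaceFirst touches only the first match.
    static String collapseFirstGap(String s) {
        return Pattern.compile("\\s+").matcher(s).replaceFirst(" ");
    }

    public static void main(String[] args) {
        System.out.println(literalReplace("a.b.c"));       // a-b-c
        System.out.println(regexReplace("a.b.c"));         // -----
        System.out.println(swapDate("2009-07-17"));        // 17/07/2009
        System.out.println(insertPrice("cost: PRICE"));    // cost: $5
        System.out.println(collapseFirstGap("a   b   c")); // a b   c
    }
}
```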

 

IV. String splitting

 

String[] split(String regex)

Splits this string around matches of the given regular expression.

String[] split(String regex, int limit)

Splits this string around matches of the given regular expression; limit controls how many times the pattern is applied and therefore the length of the resulting array.

There is also the StringTokenizer class, which can be used to split strings, but Sun no longer recommends it.

Looked at from another angle, regular expression matching itself can also be used to cut a string into segments.
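The two split overloads can be compared with a short sketch. SplitDemo and its method names are hypothetical; the behavior of the limit parameter follows the String.split documentation.

```java
import java.util.Arrays;

public class SplitDemo {
    // The separator is a regex: a comma with optional surrounding whitespace.
    static String[] splitOnCommas(String s) {
        return s.split("\\s*,\\s*");
    }

    // limit > 0 caps the number of pieces, limit < 0 keeps trailing empty strings,
    // and limit == 0 (the one-argument form) drops trailing empty strings.
    static String[] splitWithLimit(String s, int limit) {
        return s.split(",", limit);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnCommas("a , b,c")));     // [a, b, c]
        System.out.println(Arrays.toString(splitWithLimit("a,b,c,,", 0)));  // [a, b, c]
        System.out.println(Arrays.toString(splitWithLimit("a,b,c,,", 2)));  // [a, b,c,,]
        System.out.println(Arrays.toString(splitWithLimit("a,b,c,,", -1))); // [a, b, c, , ]
    }
}
```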

 

V. Not covered here

 

The regular expression API is simple and easy to use, and there is little about it that is complicated. The real difficulty with regular expressions lies in becoming fluent at writing them.

The regular expression syntax is described in detail, and well organized, in the API documentation of the Pattern class.
