A detailed description of the Java regular expression

Source: Internet
Author: User

Java provides a powerful regular expression API in the java.util.regex package. This tutorial shows you how to use it.

Regular expressions

A regular expression is a text pattern used to search text. In other words, you search a text for the places where the pattern occurs. For example, you can use regular expressions to search a web page for e-mail addresses or hyperlinks.

Regular expression example

The following is a simple Java regular expression example that searches a text for the string http://

import java.util.regex.Pattern;

String text = "This is the text to be searched " +
        "for occurrences of the http:// pattern.";
String pattern = ".*http://.*";

// Pattern.matches() returns true only if the whole text matches the pattern.
boolean matches = Pattern.matches(pattern, text);
System.out.println("matches = " + matches);

The sample code does not check whether the http:// it finds is part of a legitimate hyperlink, with a domain name and a suffix (.com, .net, and so on). It simply checks whether the string http:// appears in the text.

API for regular expressions in Java 6

This tutorial describes the regular expression API as of Java 6.

Pattern (java.util.regex.Pattern)

The class java.util.regex.Pattern is the main entry point of the Java regular expression API. Whenever you need to use regular expressions, you start with the Pattern class.

Pattern.matches()

The most straightforward way to check whether a regular expression pattern matches a text is to call the static method Pattern.matches(), as shown in the following example:

String text = "This is the text to be searched " +
        "for occurrences of the pattern.";
String pattern = ".*is.*";

boolean matches = Pattern.matches(pattern, text);
System.out.println("matches = " + matches);

The above code checks whether the word "is" appears in the variable text. The .* at each end of the pattern allows zero or more characters before and after "is".

The Pattern.matches() method is fine if you only need to check a pattern against a text a single time and the default settings of the Pattern class are appropriate.

If you need to find multiple occurrences, extract the matched text, or use non-default settings, you need to obtain a Pattern instance through the Pattern.compile() method.

Pattern.compile()

If you need to match a regular expression against text more than once, create a Pattern object with the Pattern.compile() method, as in the following example:

String text = "This is the text to be searched " +
        "for occurrences of the http:// pattern.";
String patternString = ".*http://.*";

Pattern pattern = Pattern.compile(patternString);

You can also pass special flags to the compile() method:

Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);

The Pattern class contains a number of flags (int constants) that control how the pattern is matched. The flag in the above code makes pattern matching ignore case.
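To make the effect of a flag concrete, here is a minimal sketch that compiles the same expression with and without Pattern.CASE_INSENSITIVE. The class name FlagDemo, the helper method and the sample text are made up for this illustration and are not part of the original tutorial.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the sample text are illustrative only.
class FlagDemo {

    // Returns true if the whole text matches the regex compiled with the given flags.
    static boolean matchesWithFlags(String regex, int flags, String text) {
        return Pattern.compile(regex, flags).matcher(text).matches();
    }

    public static void main(String[] args) {
        String text = "Visit HTTP://example.com today";
        String regex = ".*http://.*";

        // No flags (0): matching is case-sensitive, so "HTTP://" does not count.
        System.out.println(matchesWithFlags(regex, 0, text));
        // With CASE_INSENSITIVE, "http://" in the pattern also matches "HTTP://".
        System.out.println(matchesWithFlags(regex, Pattern.CASE_INSENSITIVE, text));
    }
}
```

Several flags can be combined with the bitwise OR operator, for example Pattern.CASE_INSENSITIVE | Pattern.MULTILINE.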

Pattern.matcher()

Once you have a Pattern object, you can obtain a Matcher object from it. The Matcher instance is used to find matches of the pattern in a text, as in the example below:

Matcher matcher = pattern.matcher(text);

The Matcher class has a matches() method that checks whether the text matches the pattern. Here is a complete example using Matcher:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "This is the text to be searched " +
        "for occurrences of the http:// pattern.";
String patternString = ".*http://.*";

Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(text);

boolean matches = matcher.matches();
System.out.println("matches = " + matches);

Pattern.split()

The split() method of the Pattern class uses the regular expression as a delimiter to split a text into an array of String. Example:

String text = "A sep Text sep With sep Many sep Separators";
String patternString = "sep";
Pattern pattern = Pattern.compile(patternString);

String[] split = pattern.split(text);
System.out.println("split.length = " + split.length);

for (String element : split) {
    System.out.println("element = " + element);
}

In the example above, the text is split into an array containing 5 strings.

Pattern.pattern()

The pattern() method of the Pattern class returns the regular expression string that the Pattern object was compiled from, as in this example:

String patternString = "sep";
Pattern pattern = Pattern.compile(patternString);

String pattern2 = pattern.pattern();

The value of pattern2 in the above code is sep, the same as the patternString variable.

Matcher (java.util.regex.Matcher)

The java.util.regex.Matcher class is used to find multiple occurrences of a regular expression in a text. A Matcher can also be reused to match the same regular expression against several texts.

Matcher has many useful methods; refer to the official Javadoc for the details. Only the core methods are described here.

The following code shows how to use a Matcher:

String text = "This is the text to be searched " +
        "for occurrences of the http:// pattern.";
String patternString = ".*http://.*";

Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(text);

boolean matches = matcher.matches();

First a Pattern is created, then a Matcher is obtained from it, and then the matches() method is called. It returns true if the text matches the pattern and false otherwise.

You can do much more with a Matcher.

Creating a Matcher

You create a Matcher by calling the matcher() method of a Pattern.

String text = "This is the text to be searched " +
        "for occurrences of the http:// pattern.";
String patternString = ".*http://.*";

Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(text);

matches()

The matches() method of the Matcher class matches the regular expression against the whole text:

boolean matches = matcher.matches();

If the whole text matches the regular expression, matches() returns true. Otherwise it returns false.

The matches() method cannot be used to find multiple occurrences of a regular expression in a text. For that, use the find(), start() and end() methods.
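The difference between matching the whole text and searching inside it can be sketched as follows. The class MatchesVsFindDemo, its helper methods and the sample text are hypothetical names chosen for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample text are illustrative only.
class MatchesVsFindDemo {

    // True only if the regex matches the text from its first to its last character.
    static boolean wholeTextMatches(String regex, String text) {
        return Pattern.compile(regex).matcher(text).matches();
    }

    // Counts the non-overlapping occurrences of the regex anywhere in the text.
    static int countOccurrences(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "this is it";
        // "is" on its own does not cover the whole text.
        System.out.println(wholeTextMatches("is", text));
        // The .* on both sides absorbs the surrounding characters.
        System.out.println(wholeTextMatches(".*is.*", text));
        // find() locates "is" inside "this" and the separate word "is".
        System.out.println(countOccurrences("is", text));
    }
}
```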

lookingAt()

lookingAt() is similar to the matches() method. The main difference is that lookingAt() only matches the regular expression against the beginning of the text.

matches() matches the regular expression against the entire text. In other words, if the regular expression matches the beginning of the text but not the whole text, lookingAt() returns true and matches() returns false. Example:

String text = "This is the text to be searched " +
        "for occurrences of the http:// pattern.";
String patternString = "This is the";

Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(text);

System.out.println("lookingAt = " + matcher.lookingAt());
System.out.println("matches   = " + matcher.matches());

The above example matches the regular expression "This is the" against the beginning of the text and against the entire text. Matching against the beginning of the text (lookingAt()) returns true.

Matching against the entire text (matches()) returns false, because the text contains extra characters after "This is the", and matches() requires the regular expression to match the whole text with nothing left over.

find() + start() + end()

The find() method searches for occurrences of the regular expression in the text that was passed to Pattern.matcher(text) when the Matcher was created. If the text contains several matches, the first call to find() finds the first one, and each subsequent call moves on to the next.

start() and end() return the indexes in the text where the current match starts and ends. More precisely, end() returns the index of the character just after the end of the match, so the return values of start() and end() can be passed directly to String.substring().

String text = "This is the text which is to be searched " +
        "for occurrences of the word 'is'.";
String patternString = "is";

Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(text);

int count = 0;
while (matcher.find()) {
    count++;
    System.out.println("found: " + count + " : "
            + matcher.start() + " - " + matcher.end());
}

This example finds the pattern "is" 4 times in the text and outputs the following:

found: 1 : 2 - 4
found: 2 : 5 - 7
found: 3 : 23 - 25
found: 4 : 70 - 72
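Since end() points just past the match, the two indexes can be handed straight to String.substring(). The following is a small sketch of that idea; the class StartEndDemo, its method and the sample text are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample text are illustrative only.
class StartEndDemo {

    // Collects every matched piece of text by cutting it out with substring(start, end).
    static List<String> matchedSubstrings(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        List<String> result = new ArrayList<>();
        while (matcher.find()) {
            // end() is exclusive, exactly like the second argument of substring().
            result.add(text.substring(matcher.start(), matcher.end()));
        }
        return result;
    }

    public static void main(String[] args) {
        // "c.t" means: c, any one character, t.
        System.out.println(matchedSubstrings("c.t", "cat, cot and cut"));
    }
}
```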

reset()

The reset() method resets the matching state kept inside the Matcher. As find() works through the text, the Matcher internally records how far it has searched. Calling reset() makes the search start again from the beginning of the text.

There is also a reset(CharSequence) method. It resets the Matcher and replaces the text the Matcher was created with by the new character sequence passed as the parameter.
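As a rough sketch of both variants, the code below counts how many matches are still ahead of the Matcher at three points: after one find(), after reset(), and after reset(CharSequence). The class ResetDemo and its helper methods are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample texts are illustrative only.
class ResetDemo {

    // Counts the matches that find() still reports from the Matcher's current position.
    static int countRemaining(Matcher matcher) {
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    // Returns the counts seen after one find(), after reset(), and after reset(newText).
    static int[] demoCounts() {
        Matcher matcher = Pattern.compile("is").matcher("This is it");
        matcher.find();                              // consume the first match (inside "This")
        int afterOneFind = countRemaining(matcher);  // only the word "is" is left
        matcher.reset();                             // search again from the beginning
        int afterReset = countRemaining(matcher);
        matcher.reset("is is is");                   // same pattern, new input text
        int afterNewText = countRemaining(matcher);
        return new int[] {afterOneFind, afterReset, afterNewText};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demoCounts()));
    }
}
```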

group()

Suppose you are searching a text for URLs and want to extract the URLs you find. You could do that with the start() and end() methods, but it is easier with the group() method.

Groups are marked with parentheses in a regular expression, for example:

(John)

This regular expression matches the text John. The parentheses are not part of the text to match; they define a group. When a match is found in a text, you can access the part of the match that falls inside the group.

Use the group(int groupNo) method to access a group. A regular expression can have more than one group, each marked by its own pair of parentheses. To access the text matched by a particular group, pass the number of the group to group(int groupNo).

group(0) always refers to the whole regular expression. Groups marked with parentheses are numbered starting from 1.

String text = "John writes about this, and John writes about that," +
        " and John writes about everything.";
String patternString1 = "(John)";

Pattern pattern = Pattern.compile(patternString1);
Matcher matcher = pattern.matcher(text);

while (matcher.find()) {
    System.out.println("found: " + matcher.group(1));
}

The above code searches the text for the word John. For each match it extracts group 1, which is the part of the match marked by the parentheses. The output is as follows:

found: John
found: John
found: John
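Returning to the URL scenario mentioned at the start of this section, here is a minimal sketch of the difference between group(0) and group(1). The pattern url=(\S+) is a deliberately simplistic assumption made for this example and not a real URL validator; the class GroupDemo and the sample text are likewise made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the pattern and sample text are illustrative only.
class GroupDemo {

    // Collects the text of the given group number for every match in the text.
    static List<String> extractGroup(String regex, String text, int groupNo) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        List<String> result = new ArrayList<>();
        while (matcher.find()) {
            result.add(matcher.group(groupNo));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "url=http://example.com and url=https://example.org/docs";
        String regex = "url=(\\S+)";   // group 1 is the run of non-whitespace after "url="

        // Group 0 is the whole match, including the "url=" prefix.
        System.out.println(extractGroup(regex, text, 0));
        // Group 1 is only the part inside the parentheses.
        System.out.println(extractGroup(regex, text, 1));
    }
}
```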

Multiple groups

As mentioned above, a regular expression can have multiple groups, for example:

(John) (.+?) 

This expression matches the text John, followed by a space, followed by one or more characters, followed by another space. The final space is at the very end of the expression, so it is easy to overlook.

The expression contains a few characters with special meaning. The . (dot) means any character. The + means one or more times, so .+ together means any character, one or more times. The ? after the + makes the quantifier match as little text as possible.

The complete code is as follows:

String text = "John writes about this, and John Doe writes on that," +
        " and John Wayne writes about everything.";
String patternString1 = "(John) (.+?) ";

Pattern pattern = Pattern.compile(patternString1);
Matcher matcher = pattern.matcher(text);

while (matcher.find()) {
    System.out.println("found: " + matcher.group(1) +
            " " + matcher.group(2));
}

Notice how the groups are referenced in the code. The output is as follows:

found: John writes
found: John Doe
found: John Wayne

Nested groups

Groups can be nested inside each other in a regular expression, for example:

((John) (.+?)) 

This is the expression from the previous example, now wrapped in one large group. (Again, there is a space at the end of the expression.)

When groups are nested, they are numbered by the order of their opening parentheses. In the example above, group 1 is the large outer group, group 2 is the group containing John, and group 3 is the group containing .+?. This is important to know when you refer to groups through group(int groupNo).

The following code shows how to use nested groups:

String text = "John writes about this, and John Doe writes on that," +
        " and John Wayne writes about everything.";
String patternString1 = "((John) (.+?)) ";

Pattern pattern = Pattern.compile(patternString1);
Matcher matcher = pattern.matcher(text);

while (matcher.find()) {
    System.out.println("found: <" + matcher.group(1) +
            "> <" + matcher.group(2) +
            "> <" + matcher.group(3) + ">");
}

The output is as follows:

found: <John writes> <John> <writes>
found: <John Doe> <John> <Doe>
found: <John Wayne> <John> <Wayne>

replaceAll() + replaceFirst()

The replaceAll() and replaceFirst() methods replace parts of the text the Matcher searches. replaceAll() replaces every match of the regular expression, and replaceFirst() replaces only the first match.

Before doing any matching, both methods reset the Matcher first, so matching always starts from the beginning of the text.

Here is an example:

String text = "John writes about this, and John Doe writes on that," +
        " and John Wayne writes about everything.";
String patternString1 = "((John) (.+?)) ";

Pattern pattern = Pattern.compile(patternString1);
Matcher matcher = pattern.matcher(text);

String replaceAll = matcher.replaceAll("Joe Blocks ");
System.out.println("replaceAll   = " + replaceAll);

String replaceFirst = matcher.replaceFirst("Joe Blocks ");
System.out.println("replaceFirst = " + replaceFirst);

The output is as follows:

replaceAll   = Joe Blocks about this, and Joe Blocks writes on that,
    and Joe Blocks writes about everything.
replaceFirst = Joe Blocks about this, and John Doe writes on that,
    and John Wayne writes about everything.

The line breaks and indents in the output are added for readability.

Notice that in the first string every occurrence of John followed by a word is replaced by Joe Blocks. In the second string, only the first occurrence is replaced.

appendReplacement() + appendTail()

The appendReplacement() and appendTail() methods are used to replace matched parts of the input text and append the resulting text to a StringBuffer.

When find() has found a match, you can call appendReplacement(). It copies the input text from the end of the previous match up to the start of the current match into the StringBuffer, and then appends the replacement in place of the matched text.

appendReplacement() keeps track of how much of the input has been copied into the StringBuffer, so you can keep calling find() and appendReplacement() until no more matches are found.

After the last match, part of the input text remains uncopied: the characters from the end of the last match to the end of the text. Calling appendTail() copies this remainder into the StringBuffer.

String text = "John writes about this, and John Doe writes on that," +
        " and John Wayne writes about everything.";
String patternString1 = "((John) (.+?)) ";

Pattern pattern = Pattern.compile(patternString1);
Matcher matcher = pattern.matcher(text);
StringBuffer stringBuffer = new StringBuffer();

while (matcher.find()) {
    matcher.appendReplacement(stringBuffer, "Joe Blocks ");
    System.out.println(stringBuffer.toString());
}
matcher.appendTail(stringBuffer);
System.out.println(stringBuffer.toString());

Note that appendReplacement() is called inside the while loop, and appendTail() is called once after the loop has finished. The output is as follows:

Joe Blocks
Joe Blocks about this, and Joe Blocks
Joe Blocks about this, and Joe Blocks writes on that, and Joe Blocks
Joe Blocks about this, and Joe Blocks writes on that, and Joe Blocks writes about everything.

Java regular expression syntax

To use regular expressions effectively, you need to understand their syntax. The syntax is rich and lets you write very advanced expressions. Mastering all of it takes a good deal of practice.

In this text we will go through the basics of the syntax with examples. The focus is on the core concepts you need in order to work with regular expressions, not on every detail. For a full explanation, see the Javadoc of the Pattern class.

Basic syntax

Before introducing more advanced features, let's take a quick look at the basic syntax of regular expressions.

Characters

The most basic and most frequently used form of a regular expression is one that simply matches certain characters. For example:

John

This simple expression matches the text John in an input text.

You can use any plain character in an expression. You can also specify a character by its octal, hexadecimal or Unicode code. For example:

\0101
\x41
\u0041

All three expressions match the uppercase character A. The first uses the octal code (101), the second the hexadecimal code (41), and the third the Unicode code (0041).
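The three notations can be checked with a few lines of code. This is only a sketch; the class CharCodeDemo and its method are hypothetical. Note that inside a Java string literal each backslash of the regular expression has to be doubled.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name is illustrative only.
class CharCodeDemo {

    // True if the given regex matches the one-character text "A".
    static boolean matchesUpperA(String regex) {
        return Pattern.matches(regex, "A");
    }

    public static void main(String[] args) {
        System.out.println(matchesUpperA("\\0101"));  // octal 101 is decimal 65, the letter A
        System.out.println(matchesUpperA("\\x41"));   // hexadecimal 41 is decimal 65
        System.out.println(matchesUpperA("\\u0041")); // Unicode code point 0041
        System.out.println(matchesUpperA("\\x42"));   // hexadecimal 42 is B, so no match
    }
}
```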

Character classes

A character class is a construct that matches one of several characters rather than just one specific character. In other words, a character class matches a single character in the input text against a set of allowed characters. For example, to match any one of the characters a, b or c, the expression is:

[abc]

Character classes are written inside a pair of square brackets []. The brackets themselves are not part of what is matched.

Character classes can be used for many things. For example, this expression matches the word John with either an uppercase or a lowercase first letter:

[Jj]ohn

The character class [Jj] matches either J or j, and the rest of the expression, ohn, matches the characters ohn exactly.

Predefined character classes

There are some predefined character classes that you can use in regular expressions. For example, \d matches any digit, \s matches any whitespace character, and \w matches any word character.

Predefined character classes do not have to be enclosed in square brackets, but they can be if you want to combine them:

\d
[\d\s]

The first expression matches any digit. The second matches any digit or any whitespace character.
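A quick way to get a feel for the predefined classes is to test them against single characters. The sketch below does that; PredefinedClassDemo and its helper method are made-up names for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name is illustrative only.
class PredefinedClassDemo {

    // True if the regex matches the given text as a whole.
    static boolean matches(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        System.out.println(matches("\\d", "7"));        // a digit
        System.out.println(matches("\\d", "x"));        // not a digit
        System.out.println(matches("[\\d\\s]", " "));   // digit or whitespace: a space qualifies
        System.out.println(matches("\\w", "_"));        // the underscore counts as a word character
        System.out.println(matches("\\w", "-"));        // the hyphen does not
        // Classes combine with quantifiers: digits, one whitespace character, then word characters.
        System.out.println(matches("\\d+\\s\\w+", "42 apples"));
    }
}
```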

A complete list of the predefined character classes is given at the end of this article.

Boundary matching

Regular expressions also support matching boundaries, such as word boundaries or the beginning and end of the input or of a line. For example, \b matches a word boundary, ^ matches the beginning of a line, and $ matches the end of a line.

^This is a single line$

The above expression matches a line that contains only the text This is a single line. Note the beginning-of-line and end-of-line markers: they mean that nothing may come before or after the text on that line.
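The following sketch tries out the line anchors and the word boundary. The class BoundaryDemo, its methods and the sample texts are assumptions made for this example. Pattern.quote() is used so that the word being searched for is always treated as literal text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample texts are illustrative only.
class BoundaryDemo {

    // True if the text consists of exactly the line "This is a single line".
    static boolean isTheSingleLine(String text) {
        return Pattern.compile("^This is a single line$").matcher(text).find();
    }

    // Counts occurrences of the word as a whole word, using \b on both sides.
    static int countWholeWord(String word, String text) {
        Matcher matcher = Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(isTheSingleLine("This is a single line"));
        // Extra text before the sentence means ^ can no longer match where the sentence starts.
        System.out.println(isTheSingleLine("Note: This is a single line"));
        // The "is" inside "This" is not preceded by a word boundary, so only the word "is" counts.
        System.out.println(countWholeWord("is", "This is it"));
    }
}
```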

A complete list of boundary matchers is given at the end of this article.

Quantifiers

Quantifiers let you match an expression that occurs several times. For example, the following expression matches the letter a zero or more times:

a*

The quantifier * means zero or more times, + means one or more times, and ? means zero or one time. There are more quantifiers; see the list later in this article.

Quantifiers come in three modes: reluctant, greedy and possessive. A reluctant quantifier matches as little text as possible. A greedy quantifier matches as much text as possible. A possessive quantifier also matches as much text as possible, even if that makes the rest of the expression fail to match.

The following examples show the difference between the reluctant, greedy and possessive modes. Assume the following input text:

John went for a walk, and John fell, and John hurt his knee.

First, an expression with a reluctant quantifier:

John.*?

This expression matches John followed by zero or more characters. The . means any character, and the * means zero or more times. The ? after the * makes the quantifier reluctant.

A reluctant quantifier matches as few characters as possible, which here means zero characters. The expression therefore matches just the word John, which occurs 3 times in the input text.

If we change the quantifier to greedy, the expression looks like this:

John.*

A greedy quantifier matches as many characters as possible. The expression now matches the first John together with all the remaining characters of the input text, so there is only one match.

Finally, we change the quantifier to possessive:

John.*+hurt

The + after the * makes the quantifier possessive.

This expression has no match in the input text, even though both John and hurt occur in it. Why is that? Because .*+ is possessive. A greedy quantifier matches as much text as it can and then gives characters back if that is needed for the whole expression to match. A possessive quantifier also matches as much as it can, but it never gives anything back, regardless of whether the rest of the expression can then match.

The .*+ consumes all characters after John, which leaves nothing for the hurt part of the expression to match. If you change the quantifier to greedy, there is a match. The expression looks like this:

John.*hurt
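The four expressions from this section can be run against the sample sentence to compare the number of matches. This is a minimal sketch; the class QuantifierDemo and its findAll() helper are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class QuantifierDemo {

    static final String TEXT =
            "John went for a walk, and John fell, and John hurt his knee.";

    // Returns the text of every match of the regex in the input text.
    static List<String> findAll(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        List<String> result = new ArrayList<>();
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Reluctant: .*? takes zero characters, so each match is just "John" (3 matches).
        System.out.println(findAll("John.*?", TEXT));
        // Greedy: one match running from the first John to the end of the text.
        System.out.println(findAll("John.*", TEXT));
        // Possessive: .*+ keeps everything it consumed, so "hurt" can never match (no matches).
        System.out.println(findAll("John.*+hurt", TEXT));
        // Greedy with a tail: .* gives characters back until "hurt" fits (1 match).
        System.out.println(findAll("John.*hurt", TEXT));
    }
}
```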

Logical operators

Regular expressions support a small set of logical operations (and, or, not).

The and operation is implicit. The expression John means J and o and h and n, in that order.

The or operation has to be written explicitly, using the | character. For example, the expression John|hurt matches either John or hurt.
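As a small illustration of the or operator, the sketch below counts how often either alternative occurs in the sample sentence used earlier. The class AlternationDemo and its method are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class AlternationDemo {

    // Counts the matches of the regex in the text.
    static int countMatches(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "John went for a walk, and John fell, and John hurt his knee.";
        System.out.println(countMatches("John", text));       // 3 occurrences of John
        System.out.println(countMatches("hurt", text));       // 1 occurrence of hurt
        System.out.println(countMatches("John|hurt", text));  // either one: 4 matches
    }
}
```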

Characters

.        Any character
\\       The backslash character. A single backslash is the escape character used together with other special characters, so to match a backslash itself you have to escape it with a second backslash.
\0n      The character with octal value 0n. n is between 0 and 7.
\0nn     The character with octal value 0nn. Each n is between 0 and 7.
\0mnn    The character with octal value 0mnn. m is between 0 and 3, each n between 0 and 7.
\xhh     The character with hexadecimal value 0xhh.
\uhhhh   The character with hexadecimal value 0xhhhh, that is, the corresponding Unicode character.
\t       The tab character
\n       The newline (line feed) character ('\u000A')
\r       The carriage-return character ('\u000D')
\f       The form-feed character ('\u000C')
\a       The alert (bell) character ('\u0007')
\e       The escape character ('\u001B')
\cx      The control character corresponding to x

Character classes

[abc]            Matches a, b or c
[^abc]           Matches any character except a, b or c (negation)
[a-zA-Z]         Matches a through z or A through Z, inclusive (range)
[a-d[m-p]]       Matches a through d, or m through p (union)
[a-z&&[def]]     Matches d, e or f (intersection of the range a-z and the characters def)
[a-z&&[^bc]]     Matches a through z, except b and c (subtraction)
[a-z&&[^m-p]]    Matches a through z, except m through p (subtraction)
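The union, intersection and subtraction forms from the table can be tried out on single characters. The following is a sketch only; SetOpsDemo and its helper are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name is illustrative only.
class SetOpsDemo {

    // True if the character class matches the given one-character text.
    static boolean matches(String characterClass, String oneChar) {
        return Pattern.matches(characterClass, oneChar);
    }

    public static void main(String[] args) {
        // Union: a-d or m-p.
        System.out.println(matches("[a-d[m-p]]", "n"));     // n lies in m-p
        System.out.println(matches("[a-d[m-p]]", "e"));     // e lies in neither range
        // Intersection: in a-z and also one of d, e, f.
        System.out.println(matches("[a-z&&[def]]", "e"));
        System.out.println(matches("[a-z&&[def]]", "a"));
        // Subtraction: in a-z but not b or c.
        System.out.println(matches("[a-z&&[^bc]]", "b"));
        System.out.println(matches("[a-z&&[^bc]]", "d"));
        // Subtraction of a range: in a-z but not in m-p.
        System.out.println(matches("[a-z&&[^m-p]]", "n"));
        System.out.println(matches("[a-z&&[^m-p]]", "q"));
    }
}
```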

Built-in character classes

.     Matches any single character. Whether it also matches line terminators depends on the flags the Pattern was created with.
\d    Matches any digit [0-9]
\D    Matches any non-digit [^0-9]
\s    Matches any whitespace character (space, tab, line break, carriage return)
\S    Matches any non-whitespace character
\w    Matches any word character
\W    Matches any non-word character

Boundary matchers

^     Matches the beginning of a line
$     Matches the end of a line
\b    Matches a word boundary
\B    Matches a non-word boundary
\A    Matches the beginning of the input text
\G    Matches the end of the previous match
\Z    Matches the end of the input text, except for the final line terminator, if any
\z    Matches the end of the input text
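The differences between ^ and \A, and between \Z and \z, only show up when the text contains line terminators. The sketch below uses a two-line text that ends with a line break; AnchorsDemo, its helper and the sample text are assumptions made for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample text are illustrative only.
class AnchorsDemo {

    static final String TEXT = "first line\nlast line\n";

    // True if the regex, compiled with the given flags, is found somewhere in the text.
    static boolean found(String regex, int flags, String text) {
        return Pattern.compile(regex, flags).matcher(text).find();
    }

    public static void main(String[] args) {
        // \A only matches at the very start of the input.
        System.out.println(found("\\Afirst", 0, TEXT));
        // Without MULTILINE, ^ also only matches at the start of the input.
        System.out.println(found("^last", 0, TEXT));
        // With MULTILINE, ^ matches at the start of every line.
        System.out.println(found("^last", Pattern.MULTILINE, TEXT));
        // \z is the absolute end of the input, which here comes after the final line break.
        System.out.println(found("line\\z", 0, TEXT));
        // \Z ignores one final line terminator, so it matches right after "last line".
        System.out.println(found("line\\Z", 0, TEXT));
    }
}
```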

Quantifiers

Greedy    Reluctant   Possessive   Matches
X?        X??         X?+          X, zero or one time
X*        X*?         X*+          X, zero or more times
X+        X+?         X++          X, one or more times
X{n}      X{n}?       X{n}+        X, exactly n times
X{n,}     X{n,}?      X{n,}+       X, at least n times
X{n,m}    X{n,m}?     X{n,m}+      X, at least n but at most m times
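The counted quantifiers follow the same greedy and reluctant rules as * and +. Here is a short sketch using X{n,m}; the class CountedQuantifierDemo and its helper method are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class CountedQuantifierDemo {

    // Returns the text of the first match of the regex, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // Greedy: takes as many as allowed, up to 3 of the 4 letters.
        System.out.println(firstMatch("a{2,3}", "aaaa"));
        // Reluctant: stops as soon as the minimum of 2 is reached.
        System.out.println(firstMatch("a{2,3}?", "aaaa"));
        // A single a is below the minimum of 2, so there is no match.
        System.out.println(firstMatch("a{2,3}", "a"));
    }
}
```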

