Getting started with Java regular expressions

Source: Internet
Author: User


First, when should you use regular expressions?

During program development, you often need to match, search, extract, replace, and validate text repeatedly. Implementing these operations with hand-written string-handling code is time-consuming. A regular expression, often called a pattern, is a string that describes or matches a set of strings that follow a certain syntactic rule. For example, regular expressions can extract the useful information from an LRC lyrics file, which mixes timestamps and tags with the lyric text, so that the information can be organized and saved.
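To make the LRC case concrete, here is a minimal sketch. It assumes the common timed-line layout [mm:ss.xx]lyric text and converts each timed line into a millisecond offset plus the lyric. The class name LrcExtractor and its output format are hypothetical; lines without a leading timestamp, such as metadata tags, are skipped, and lines carrying several timestamps are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LrcExtractor {
    // Assumed timed-line layout: [mm:ss] or [mm:ss.xx] followed by the lyric text.
    // Group 1 = minutes, group 2 = seconds, group 3 = optional fraction, group 4 = lyric.
    private static final Pattern TIMED_LINE =
            Pattern.compile("^\\[(\\d{2}):(\\d{2})(?:\\.(\\d{1,3}))?\\](.*)$");

    // Turns each timed line into "<offset in ms><TAB><lyric>".
    // Lines without a leading timestamp (e.g. metadata tags like [ar:...]) are skipped.
    public static List<String> extract(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            Matcher m = TIMED_LINE.matcher(line);
            if (!m.matches()) {
                continue;
            }
            int minutes = Integer.parseInt(m.group(1));
            int seconds = Integer.parseInt(m.group(2));
            // group(3) is null when the optional fraction is absent.
            String fraction = m.group(3) == null ? "" : m.group(3);
            // Read the fraction as decimal digits of a second: "5" -> 500 ms, "34" -> 340 ms.
            int millis = Integer.parseInt((fraction + "000").substring(0, 3));
            long offset = (minutes * 60L + seconds) * 1000L + millis;
            result.add(offset + "\t" + m.group(4).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines =
                Arrays.asList("[ar:Unknown]", "[00:12.34]First line", "[01:05]Second line");
        for (String entry : extract(lines)) {
            System.out.println(entry); // 12340<TAB>First line, then 65000<TAB>Second line
        }
    }
}
```

Using matches() rather than find() here forces the timestamp to sit at the very start of the line.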

Second, using regular expressions in Java

In Java, regular expressions are handled mainly by two classes in the java.util.regex package:

Pattern: a compiled representation of a regular expression, that is, a rule that has been compiled so that it can be applied to character sequences.

Matcher: the tool that matches character sequences against the rule defined by a Pattern.

To use a popular analogy: imagine a pile of fruit that includes apples, pears, bananas, and so on, and a rule that tells you to look for apples. The Pattern is that rule, packaged as [apple], like a photo of an apple taken with your camera. Comparing the photo against each piece of fruit to pick out the apples is the job of the Matcher.


java.util.regex.Pattern is the compiled representation of a regular expression. A regular expression specified as a string must first be compiled into an instance of this class. The resulting pattern can then be used to create a Matcher object that can match arbitrary character sequences against the regular expression. All of the state involved in performing a match resides in the matcher, so many matchers can share the same pattern.
A typical invocation sequence is therefore:


Pattern p = Pattern.compile("a*b");
Matcher m = p.matcher("aaaaab");
boolean b = m.matches();
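The three lines above can be fleshed out into a runnable sketch. It compiles the pattern once and creates a fresh Matcher for each input, which is safe because Pattern instances are immutable and thread-safe, while Matcher instances are not. The class and method names are illustrative only; Pattern.matches(regex, input) is the library's one-shot shortcut for the same check.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TypicalInvocation {
    // Compiled once; an immutable Pattern can be shared by many matchers.
    private static final Pattern A_STAR_B = Pattern.compile("a*b");

    // Each call creates its own Matcher, which holds all the state of this match.
    public static boolean fullMatch(String input) {
        Matcher m = A_STAR_B.matcher(input);
        return m.matches();
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("aaaaab")); // true
        System.out.println(fullMatch("b"));      // true: a* allows zero a's
        System.out.println(fullMatch("aaac"));   // false
        // One-off convenience: compiles the regex and matches in a single call.
        System.out.println(Pattern.matches("a*b", "aaaaab")); // true
    }
}
```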

Commonly used methods of Pattern:

compile() compiles the given regular expression into a pattern.

matcher() creates a matcher that will match the given input against this pattern.

split() splits the given input sequence around matches of this pattern.
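A short sketch of compile() and split() together, assuming a made-up input format in which fields are separated by commas or semicolons with optional surrounding spaces (the class and method names are hypothetical):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDemo {
    // Separator: a comma or semicolon, optionally surrounded by whitespace.
    private static final Pattern SEPARATOR = Pattern.compile("\\s*[,;]\\s*");

    // Splits the input around every separator match.
    public static String[] splitFields(String input) {
        return SEPARATOR.split(input);
    }

    public static void main(String[] args) {
        // prints [apple, pear, banana]
        System.out.println(Arrays.toString(splitFields("apple, pear ;banana")));
    }
}
```

Folding the optional whitespace into the separator pattern means the resulting fields need no extra trimming.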

java.util.regex.Matcher is the engine that performs match operations on a character sequence by interpreting a Pattern.

A matcher is created from a pattern by invoking the pattern's matcher() method. Once created, a matcher can be used to perform three different kinds of match operations:

matches() attempts to match the entire input sequence against the pattern.

lookingAt() attempts to match the input sequence, starting at the beginning, against the pattern.

find() scans the input sequence looking for the next subsequence that matches the pattern.
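The difference between the three operations shows up clearly on a few inputs. In this sketch (the class and method names are hypothetical), a fresh Matcher is created for each operation so that the state left by one cannot affect the next:

```java
import java.util.regex.Pattern;

public class MatchOperations {
    private static final Pattern FOO = Pattern.compile("foo");

    // Reports the outcome of matches(), lookingAt() and find() for one input.
    public static String compare(String input) {
        boolean whole = FOO.matcher(input).matches();     // entire input must match
        boolean prefix = FOO.matcher(input).lookingAt();  // match must start at index 0
        boolean anywhere = FOO.matcher(input).find();     // match may start anywhere
        return "matches=" + whole + " lookingAt=" + prefix + " find=" + anywhere;
    }

    public static void main(String[] args) {
        System.out.println(compare("foo"));    // matches=true lookingAt=true find=true
        System.out.println(compare("foobar")); // matches=false lookingAt=true find=true
        System.out.println(compare("barfoo")); // matches=false lookingAt=false find=true
        System.out.println(compare("bar"));    // matches=false lookingAt=false find=false
    }
}
```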


start() returns the start index of the previous match.

end() returns the offset after the last character matched.

group(int group) returns the input subsequence captured by the given group during the previous match operation.
For a matcher m, input sequence s, and group index g, the expressions m.group(g) and s.substring(m.start(g), m.end(g)) are equivalent.
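The sketch below walks through every match with find(), records the captured groups with their positions, and checks the group/substring equivalence stated above at run time. The key=value input format and the names GroupDemo and describe are assumptions made for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Hypothetical key=value pattern: group 1 is the key, group 2 is the value.
    private static final Pattern PAIR = Pattern.compile("(\\w+)=(\\d+)");

    // Describes every match as "key:value@start-end" and verifies that
    // m.group(g) equals input.substring(m.start(g), m.end(g)) for each group.
    public static List<String> describe(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            for (int g = 0; g <= m.groupCount(); g++) {
                if (!m.group(g).equals(input.substring(m.start(g), m.end(g)))) {
                    throw new IllegalStateException("group/substring mismatch");
                }
            }
            out.add(m.group(1) + ":" + m.group(2) + "@" + m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [x:10@0-4, yy:7@5-9]
        System.out.println(describe("x=10 yy=7"));
    }
}
```

Group 0 always stands for the whole match, which is why the check loop starts at 0 and runs up to groupCount().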

Common expression constructs:
Alternation
| A vertical bar separates alternatives. For example, "gray|grey" matches gray or grey.
Quantifiers
A quantifier after a character limits how many times the preceding character may occur. The most common quantifiers are "+", "?" and "*" (a character with no quantifier must appear exactly once):
+ The plus sign means the preceding character must appear at least once (one or more times). For example, "goo+gle" matches google, gooogle, goooogle, and so on;
? The question mark means the preceding character may appear at most once (zero or one time). For example, "colou?r" matches color or colour;
* The asterisk means the preceding character may be absent or may appear one or more times (zero, one, or more times). For example, "0*42" matches 42, 042, 0042, 00042, and so on.
Grouping
Parentheses define the scope and precedence of operators. For example, "gr(a|e)y" is equivalent to "gray|grey", and "(grand)?father" matches both father and grandfather.
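These constructs can be tried directly with Pattern.matches, which succeeds only if the whole input matches. The examples mirror the ones in the text; the helper name fullyMatches is hypothetical:

```java
import java.util.regex.Pattern;

public class ConstructsDemo {
    // Returns true only if the entire input matches the regex.
    public static boolean fullyMatches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullyMatches("gray|grey", "grey"));             // true  (alternation)
        System.out.println(fullyMatches("goo+gle", "gooogle"));            // true  (+ : one or more)
        System.out.println(fullyMatches("goo+gle", "gogle"));              // false (needs at least "oo")
        System.out.println(fullyMatches("colou?r", "color"));              // true  (? : zero or one)
        System.out.println(fullyMatches("0*42", "00042"));                 // true  (* : zero or more)
        System.out.println(fullyMatches("gr(a|e)y", "gray"));              // true  (grouping)
        System.out.println(fullyMatches("(grand)?father", "grandfather")); // true
    }
}
```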


Complete regular expression syntax reference

Character Description
\ Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, "n" matches the character "n", while "\n" matches a newline. The sequence "\\" matches "\", and "\(" matches "(".
^ Matches the start of the input string. If multiline mode is enabled (the multiline property of the RegExp object, or the Pattern.MULTILINE flag in Java), ^ also matches the position after "\n" or "\r".
$ Matches the end of the input string. In multiline mode, $ also matches the position before "\n" or "\r".
* Matches the preceding subexpression zero or more times. For example, "zo*" matches "z" and "zoo". * is equivalent to {0,}.
+ Matches the preceding subexpression one or more times. For example, "zo+" matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
? Matches the preceding subexpression zero times or once. For example, "do(es)?" matches "do" or "does". ? is equivalent to {0,1}.
{n} n is a non-negative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob", but matches the two o's in "food".
{n,} n is a non-negative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob", but matches all the o's in "foooood". "o{1,}" is equivalent to "o+", and "o{0,}" is equivalent to "o*".
{n,m} m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". "o{0,1}" is equivalent to "o?". Note that there must be no spaces between the comma and the numbers.
? When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much as possible. For example, in the string "oooo", "o+?" matches a single "o", while "o+" matches all the "o"s.
. Matches any single character except a line terminator such as "\n". To match any character including "\n", use a pattern such as "(.|\n)"; in Java, the Pattern.DOTALL flag also makes "." match line terminators.
(pattern) Matches pattern and captures the match. The captured text can be retrieved from the resulting Matches collection (the SubMatches collection in VBScript, the $0...$9 properties in JScript, or Matcher.group(n) in Java). To match a literal parenthesis, use "\(" or "\)".
(?:pattern) Matches pattern but does not capture the match: this is a non-capturing group, and the match is not stored for later use. It is useful when combining parts of a pattern with the "or" character "|". For example, "industr(?:y|ies)" is a more compact expression than "industry|industries".
(?=pattern) Positive lookahead: matches the search string at any position where a string matching pattern begins. This is a non-capturing match, so the match is not stored for later use. For example, "Windows(?=95|98|NT|2000)" matches the "Windows" in "Windows2000", but not the "Windows" in "Windows3.1". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that make up the lookahead.
(?!pattern) Negative lookahead: matches the search string at any position where a string not matching pattern begins. This is a non-capturing match, so the match is not stored for later use. For example, "Windows(?!95|98|NT|2000)" matches the "Windows" in "Windows3.1", but not the "Windows" in "Windows2000". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that make up the lookahead.
(?<=pattern) Positive lookbehind: similar to positive lookahead, but in the opposite direction. For example, "(?<=95|98|NT|2000)Windows" matches the "Windows" in "2000Windows", but not the "Windows" in "3.1Windows".
(?<!pattern) Negative lookbehind: similar to negative lookahead, but in the opposite direction. For example, "(?<!95|98|NT|2000)Windows" matches the "Windows" in "3.1Windows", but not the "Windows" in "2000Windows".
x|y Matches x or y. For example, "z|food" matches "z" or "food", and "(z|f)ood" matches "zood" or "food".
[xyz] A character set. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain".
[^xyz] A negated character set. Matches any character that is not enclosed. For example, "[^abc]" matches the "p" in "plain".
[a-z] A character range. Matches any character in the specified range. For example, "[a-z]" matches any lowercase letter from "a" to "z".
[^a-z] A negated character range. Matches any character not in the specified range. For example, "[^a-z]" matches any character that is not in the range "a" to "z".
\b Matches a word boundary, that is, the position between a word character and a non-word character. For example, "er\b" matches the "er" in "never", but not the "er" in "verb".
\B Matches a non-word boundary. "er\B" matches the "er" in "verb", but not the "er" in "never".
\cx Matches the control character indicated by x. For example, \cM matches Control-M, that is, a carriage return. The value of x must be in the range A-Z or a-z; otherwise, c is treated as a literal "c" character.
\d Matches a digit character. Equivalent to [0-9].
\D Matches a non-digit character. Equivalent to [^0-9].
\f Matches a form feed. Equivalent to \x0c and \cL.
\n Matches a newline (line feed). Equivalent to \x0a and \cJ.
\r Matches a carriage return. Equivalent to \x0d and \cM.
\s Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v] (written [ \t\n\x0B\f\r] in Java).
\S Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t Matches a tab. Equivalent to \x09 and \cI.
\v Matches a vertical tab. Equivalent to \x0b and \cK. In Java 8 and later, \v instead matches any vertical whitespace character, [\n\x0B\f\r\x85\u2028\u2029].
\w Matches any word character, including the underscore. Equivalent to "[A-Za-z0-9_]".
\W Matches any non-word character. Equivalent to "[^A-Za-z0-9_]".
\xn Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, "\x41" matches "A", and "\x041" is equivalent to "\x04" followed by "1". This allows ASCII codes to be used in regular expressions.
\num Matches num, where num is a positive integer: a backreference to a captured match. For example, "(.)\1" matches two consecutive identical characters.
\n Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.
\nm Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal m. If neither condition holds and n and m are both octal digits (0-7), \nm matches the octal escape value nm.
\nml If n is an octal digit (0-3) and m and l are both octal digits (0-7), matches the octal escape value nml. Note that Java requires octal escapes to begin with \0 (\0n, \0nn, \0mnn) and always treats \1 through \9 as backreferences.
\un Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
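Several rows of the table can be checked in Java with a small find-all helper (findAll is a hypothetical name, not a library method). Inside a Java string literal every regex backslash must be doubled, so \1 is written "\\1" and \b is written "\\b":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SyntaxTableDemo {
    // Returns every substring of input found by the regex, in order.
    public static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Greedy vs. non-greedy quantifiers.
        System.out.println(findAll("o+", "oooo"));  // [oooo]
        System.out.println(findAll("o+?", "oooo")); // [o, o, o, o]
        // Lookahead: "Windows" only when followed by 95, 98, NT or 2000 (not consumed).
        System.out.println(findAll("Windows(?=95|98|NT|2000)", "Windows2000 Windows3.1")); // [Windows]
        // Lookbehind: "Windows" only when preceded by 95, 98, NT or 2000.
        System.out.println(findAll("(?<=95|98|NT|2000)Windows", "2000Windows 3.1Windows")); // [Windows]
        // Backreference: \1 repeats whatever group 1 matched.
        System.out.println(findAll("(.)\\1", "aabcdd")); // [aa, dd]
        // Word boundary: "er" only at the end of a word.
        System.out.println(findAll("er\\b", "never verb")); // [er]
    }
}
```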


Practicing regular expressions
The following examples give a deeper understanding of how regular expressions are used in Java.
In Java, regular expressions typically serve four purposes:
1. Find
2. Extract
3. Split
4. Replace (delete)
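A compact sketch of the four uses, assuming a hypothetical input in which the interesting parts are runs of digits. Deletion is simply replacement with an empty string. The class and method names are illustrative:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FourUses {
    // Hypothetical pattern: a run of one or more digits.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // 1. Find: does the text contain at least one number?
    public static boolean containsNumber(String text) {
        return DIGITS.matcher(text).find();
    }

    // 2. Extract: collect every number in the text.
    public static List<String> extractNumbers(String text) {
        List<String> numbers = new ArrayList<>();
        Matcher m = DIGITS.matcher(text);
        while (m.find()) {
            numbers.add(m.group());
        }
        return numbers;
    }

    // 3. Split: break the text apart at every number (trailing empty strings are dropped).
    public static String[] splitOnNumbers(String text) {
        return DIGITS.split(text);
    }

    // 4. Replace, or delete when the replacement is "": substitute every number.
    public static String replaceNumbers(String text, String replacement) {
        return DIGITS.matcher(text).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String text = "a1b22c333";
        System.out.println(containsNumber(text));                  // true
        System.out.println(extractNumbers(text));                  // [1, 22, 333]
        System.out.println(Arrays.toString(splitOnNumbers(text))); // [a, b, c]
        System.out.println(replaceNumbers(text, "#"));             // a#b#c#
        System.out.println(replaceNumbers(text, ""));              // abc
    }
}
```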

Here is a simple test to get familiar with the Pattern and Matcher classes and what their methods do:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternTest {
    private static final String TEST =
            "Kelvin Li and Kelvin Chan are both working in Kelvin Chen's KelvinSoftShop company";

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("Kelvin");
        Matcher matcher = pattern.matcher(TEST);
        StringBuffer sb = new StringBuffer();
        boolean result = matcher.find();
        while (result) {
            // Append the text before this match, then the replacement string, to sb.
            matcher.appendReplacement(sb, "Kevin");
            result = matcher.find();
        }
        // Append the rest of the input after the last match.
        matcher.appendTail(sb);
        System.out.println("The final result is: " + sb.toString());
    }
}

Printed result: The final result is: Kevin Li and Kevin Chan are both working in Kevin Chen's KevinSoftShop company
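The appendReplacement/appendTail loop earns its keep when each replacement is computed from the match itself. This sketch (the class and method names are hypothetical) upper-cases every whole word "kelvin" regardless of case, and also shows that for a fixed replacement string Matcher.replaceAll gives the same result as the loop in the example above:

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ComputedReplacement {
    // Replaces each whole word "kelvin" (any case) with its upper-case form,
    // using the same appendReplacement/appendTail loop as the example above.
    public static String shoutNames(String input) {
        Pattern pattern = Pattern.compile("\\bkelvin\\b", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            String replacement = matcher.group().toUpperCase(Locale.ROOT);
            // quoteReplacement guards against '$' or '\' in the computed text.
            matcher.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String test = "Kelvin Li and Kelvin Chan are both working in Kelvin Chen's KelvinSoftShop company";
        // KelvinSoftShop is left alone: \b does not match between "n" and "S".
        System.out.println(shoutNames(test));
        // Fixed replacement: one call does the same work as the find/appendReplacement loop.
        System.out.println(Pattern.compile("Kelvin").matcher(test).replaceAll("Kevin"));
    }
}
```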

