Getting Started with Java Regular Expressions

Source: Internet
Author: User

First, when to use regular expressions

During program development, we often need to match, search, extract, replace, and validate text. Writing this logic by hand every time is time-consuming. A regular expression, often called a pattern, describes or matches a set of strings that conform to a certain syntactic rule. With regular expressions it becomes practical, for example, to extract the useful information from LRC lyrics files in various formats and then classify and save it.

Second, the use of regular expressions

In Java, regular expressions are handled mainly through two classes:

Pattern can be viewed as the compiled form of a matching rule: a compiled regular expression that is then used to process character sequences according to that rule.

Matcher is the tool that matches a character sequence against a given Pattern.

A simple analogy: suppose there is a pile of fruit containing apples, pears, bananas and so on, and a rule tells you to look for apples. The Pattern corresponds to the rule "apple". You then take a photograph of an apple and go through the pile comparing each fruit with the photograph; that comparison step corresponds to the Matcher.
java.util.regex.Pattern is the compiled representation of a regular expression. A regular expression, specified as a string, must first be compiled into an instance of this class. The resulting pattern can then be used to create a Matcher object that can match arbitrary character sequences against the regular expression. All of the state involved in performing a match resides in the matcher, so many matchers can share the same pattern.
Therefore, the typical invocation order is:


Pattern p = Pattern.compile("a*b");
Matcher m = p.matcher("aaaaab");
boolean b = m.matches();

Commonly used Pattern methods:

compile() compiles the given regular expression into a pattern

matcher() creates a matcher that will match the given input against this pattern

split() splits the given input sequence around matches of this pattern
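
As a minimal sketch of the three Pattern methods listed above (the class name PatternBasics, its helper methods, and the sample inputs are made up for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternBasics {

    // compile() + matcher() + matches(): true only if the whole input fits the regex.
    public static boolean fullMatch(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(input);
        return m.matches();
    }

    // split(): cut the input at every match of the regex.
    public static List<String> splitAround(String regex, String input) {
        return Arrays.asList(Pattern.compile(regex).split(input));
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("a*b", "aaaaab"));   // true
        System.out.println(fullMatch("a*b", "aaaaabc"));  // false: trailing "c" is not covered
        System.out.println(splitAround("\\s*,\\s*", "apple , pear,banana")); // [apple, pear, banana]
    }
}
```

Compiling once and reusing the Pattern is usually preferable when the same expression is applied to many inputs, since compilation is the comparatively expensive step.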

java.util.regex.Matcher is the engine that performs match operations on a character sequence by interpreting a Pattern.

A matcher is created from a pattern by invoking the pattern's matcher() method. Once created, a matcher can be used to perform three different kinds of match operations:

matches() attempts to match the entire input sequence against the pattern.

lookingAt() attempts to match the input sequence against the pattern, starting at the beginning of the input.

find() scans the input sequence looking for the next subsequence that matches the pattern.

start() returns the start index of the previous match

end() returns the offset after the last character matched

group(int group) returns the input subsequence captured by the given group during the previous match operation
For a matcher m, an input sequence s, and a group index g, the expressions m.group(g) and s.substring(m.start(g), m.end(g)) are equivalent.
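
The difference between matches(), lookingAt() and find(), and the group/substring equivalence, can be seen in a small sketch. The class name MatcherBasics, the helper methods, and the sample text are assumptions chosen for illustration only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherBasics {

    // Collects every match found by find() as "text@start-end".
    public static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return found;
    }

    // Checks, for the first match, that group(g) equals substring(start(g), end(g)).
    public static boolean groupEqualsSubstring(String regex, String input, int g) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() && m.group(g).equals(input.substring(m.start(g), m.end(g)));
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("(\\d+)-(\\d+)");
        String s = "range 10-25 and 7-9";

        System.out.println(p.matcher(s).matches());             // false: whole input must match
        System.out.println(p.matcher("10-25").matches());       // true
        System.out.println(p.matcher("10-25 etc").lookingAt()); // true: match starts at index 0
        System.out.println(p.matcher(s).lookingAt());           // false: input starts with "range"
        System.out.println(findAll("(\\d+)-(\\d+)", s));        // [10-25@6-11, 7-9@16-19]
        System.out.println(groupEqualsSubstring("(\\d+)-(\\d+)", s, 1)); // true
    }
}
```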

Basic structure of regular expressions:
Alternation
| The vertical bar separates alternatives. For example, "gray|grey" matches gray or grey.
Quantifiers
A quantifier after a character limits how many times that character may occur. The most common quantifiers are "+", "?" and "*" (without a quantifier, the character must occur exactly once):
+ The plus sign means the preceding character must occur at least once (one or more times). For example, "goo+gle" matches google, gooogle, goooogle, and so on.
? The question mark means the preceding character may occur at most once (zero times or once). For example, "colou?r" matches color or colour.
* The asterisk means the preceding character may be absent or may occur any number of times (zero, one, or many times). For example, "0*42" matches 42, 042, 0042, 00042, and so on.
Grouping
Parentheses define the scope and precedence of operators. For example, "gr(a|e)y" is equivalent to "gray|grey", and "(grand)?father" matches father and grandfather.
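
The examples above can be checked directly with Pattern.matches, which tests whether an entire string matches an expression. This is only a sketch; the class name BasicSyntaxDemo and its helper are illustrative:

```java
import java.util.regex.Pattern;

public class BasicSyntaxDemo {

    // True if the whole input matches the regex.
    public static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Alternation: either spelling is accepted.
        System.out.println(matches("gray|grey", "grey"));             // true
        // Grouping limits the alternation to a single letter.
        System.out.println(matches("gr(a|e)y", "gray"));              // true
        // + requires at least one more "o" after "go".
        System.out.println(matches("goo+gle", "gooogle"));            // true
        System.out.println(matches("goo+gle", "gogle"));              // false
        // ? makes the "u" optional.
        System.out.println(matches("colou?r", "color"));              // true
        // * allows any number of leading zeros, including none.
        System.out.println(matches("0*42", "00042"));                 // true
        // A quantifier after a group applies to the whole group.
        System.out.println(matches("(grand)?father", "grandfather")); // true
        System.out.println(matches("(grand)?father", "father"));      // true
    }
}
```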


Regular expression syntax reference

Character Description
\ Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, "n" matches the character "n", while "\n" matches a newline character. The sequence "\\" matches "\" and "\(" matches "(".
^ Matches the start of the input string. In multiline mode (the Multiline property of the RegExp object; Pattern.MULTILINE in Java), ^ also matches the position after a line terminator such as "\n" or "\r".
$ Matches the end of the input string. In multiline mode (the Multiline property of the RegExp object; Pattern.MULTILINE in Java), $ also matches the position before a line terminator such as "\n" or "\r".
* Matches the preceding subexpression zero or more times. For example, "zo*" matches "z" and "zoo". * is equivalent to {0,}.
+ Matches the preceding subexpression one or more times. For example, "zo+" matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
? Matches the preceding subexpression zero times or once. For example, "do(es)?" matches "do" or "does".
{n} n is a non-negative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob", but matches the two o's in "food".
{n,} n is a non-negative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob", but matches all the o's in "foooood". "o{1,}" is equivalent to "o+". "o{0,}" is equivalent to "o*".
{n,m} m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". "o{0,1}" is equivalent to "o?". Note that there must be no space between the comma and either number.
? When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much as possible. For example, in the string "oooo", "o+?" matches a single "o", while "o+" matches all the o's.
. Matches any single character except "\n". To match any character including "\n", use a pattern such as "(.|\n)".
(pattern) Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection: with the SubMatches collection in VBScript, or the $0...$9 properties in JScript. In Java, use Matcher.group(int). To match a parenthesis character, use "\(" or "\)".
(?:pattern) Matches pattern but does not capture the match; this is a non-capturing group whose result is not stored for later use. It is useful for combining parts of a pattern with the "or" character (|). For example, "industr(?:y|ies)" is a more compact expression than "industry|industries".
(?=pattern) Positive lookahead. Matches the search string at any position where a string matching pattern begins. This is a non-capturing match, so the match is not stored for later use. For example, "Windows(?=95|98|NT|2000)" matches "Windows" in "Windows2000", but not "Windows" in "Windows3.1". Lookahead does not consume characters: after a match, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?!pattern) Negative lookahead. Matches the search string at any position where a string not matching pattern begins. This is a non-capturing match, so the match is not stored for later use. For example, "Windows(?!95|98|NT|2000)" matches "Windows" in "Windows3.1", but not "Windows" in "Windows2000". Lookahead does not consume characters: after a match, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?<=pattern) Positive lookbehind. Like positive lookahead, but in the opposite direction. For example, "(?<=95|98|NT|2000)Windows" matches "Windows" in "2000Windows", but not "Windows" in "3.1Windows".
(?<!pattern) Negative lookbehind. Like negative lookahead, but in the opposite direction. For example, "(?<!95|98|NT|2000)Windows" matches "Windows" in "3.1Windows", but not "Windows" in "2000Windows".
x|y Matches x or y. For example, "z|food" matches "z" or "food". "(z|f)ood" matches "zood" or "food".
[xyz] Character set. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain".
[^xyz] Negated character set. Matches any character that is not enclosed. For example, "[^abc]" matches the "p" in "plain".
[a-z] Character range. Matches any character within the specified range. For example, "[a-z]" matches any lowercase letter in the range "a" through "z".
[^a-z] Negated character range. Matches any character that is not in the specified range. For example, "[^a-z]" matches any character that is not in the range "a" through "z".
\b Matches a word boundary, that is, the position between a word character and a non-word character such as a space. For example, "er\b" matches the "er" in "never", but not the "er" in "verb".
\B Matches a non-word boundary. "er\B" matches the "er" in "verb", but not the "er" in "never".
\cx Matches the control character indicated by x. For example, \cM matches Control-M, the carriage return character. The value of x must be in A-Z or a-z. Otherwise, c is treated as a literal "c" character.
\d Matches a digit. Equivalent to [0-9].
\D Matches a non-digit character. Equivalent to [^0-9].
\f Matches a form feed character. Equivalent to \x0c and \cL.
\n Matches a newline character. Equivalent to \x0a and \cJ.
\r Matches a carriage return character. Equivalent to \x0d and \cM.
\s Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \t\n\x0B\f\r].
\S Matches any non-whitespace character. Equivalent to [^ \t\n\x0B\f\r].
\t Matches a tab character. Equivalent to \x09 and \cI.
\v Matches a vertical tab. Equivalent to \x0b and \cK. Note that in Java 8 and later, \v instead matches any vertical whitespace character, which includes the vertical tab.
\w Matches any word character, including the underscore. Equivalent to "[A-Za-z0-9_]".
\W Matches any non-word character. Equivalent to "[^A-Za-z0-9_]".
\xn Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, "\x41" matches "A". "\x041" is equivalent to "\x04" followed by "1". ASCII codes can be used in regular expressions this way.
\num Matches num, where num is a positive integer: a backreference to a captured match. For example, "(.)\1" matches two consecutive identical characters.
\n Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.
\nm Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal m. If neither condition holds and both n and m are octal digits (0-7), \nm matches the octal escape value nm.
\nml If n is an octal digit (0-3) and both m and l are octal digits (0-7), matches the octal escape value nml. In Java, octal escapes are written with a leading zero (\0n, \0nn, \0mnn).
\un Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
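
A few of the less obvious entries in the reference above (non-greedy quantifiers, lookahead and lookbehind, backreferences, non-capturing groups) are easier to follow when run. The following is a rough sketch; the class name AdvancedSyntaxDemo and the helper firstMatch are hypothetical names introduced here:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AdvancedSyntaxDemo {

    // Returns the first substring of input matched by regex, or null if there is none.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Greedy vs. non-greedy quantifier
        System.out.println(firstMatch("o+", "oooo"));   // oooo
        System.out.println(firstMatch("o+?", "oooo"));  // o

        // Positive lookahead: the version number is checked but not part of the match
        System.out.println(firstMatch("Windows(?=95|98|NT|2000)", "Windows2000")); // Windows
        System.out.println(firstMatch("Windows(?=95|98|NT|2000)", "Windows3.1"));  // null

        // Negative lookahead
        System.out.println(firstMatch("Windows(?!95|98|NT|2000)", "Windows3.1"));  // Windows

        // Positive lookbehind
        System.out.println(firstMatch("(?<=95|98|NT|2000)Windows", "2000Windows")); // Windows

        // Backreference: \1 repeats whatever group 1 captured
        System.out.println(firstMatch("(.)\\1", "hello")); // ll

        // Non-capturing group used only to scope the alternation
        System.out.println(firstMatch("industr(?:y|ies)", "two industries")); // industries
    }
}
```

Note that inside a Java string literal every backslash of the regular expression has to be doubled, which is why the backreference is written "\\1".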


Regular expressions in practice
The following example gives a closer look at how Java's regular expressions are used.
In Java, regular expressions are commonly used for four different tasks:
1. Find
2. Extract
3. Split
4. Replace (delete)
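
The four tasks listed above might look roughly like this in code. This is a sketch under assumptions: the class name FourUsesDemo, the method names, and the choice of "\d+" as the pattern are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FourUsesDemo {

    // One or more digits; compiled once and shared by all methods.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // 1. Find: does the input contain a match anywhere?
    public static boolean containsNumber(String input) {
        return DIGITS.matcher(input).find();
    }

    // 2. Extract: collect the text of every match.
    public static List<String> extractNumbers(String input) {
        List<String> numbers = new ArrayList<>();
        Matcher m = DIGITS.matcher(input);
        while (m.find()) {
            numbers.add(m.group());
        }
        return numbers;
    }

    // 3. Split: use the matches as separators.
    public static List<String> splitOnNumbers(String input) {
        return Arrays.asList(DIGITS.split(input));
    }

    // 4. Replace (delete): replacing with the empty string removes the matches.
    public static String deleteNumbers(String input) {
        return DIGITS.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "a1b22c333d";
        System.out.println(containsNumber(input));  // true
        System.out.println(extractNumbers(input));  // [1, 22, 333]
        System.out.println(splitOnNumbers(input));  // [a, b, c, d]
        System.out.println(deleteNumbers(input));   // abcd
    }
}
```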

Here is a simple test to get familiar with the Pattern and Matcher classes and what their methods do:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternTest {

    private static final String TEST = "Kelvin Li and Kelvin Chan are both working in Kelvin Chen's Kelvinsoftshop Company";

    /**
     * @param args command-line arguments (unused)
     */
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("Kelvin");

        Matcher matcher = pattern.matcher(TEST);
        StringBuffer sb = new StringBuffer();
        boolean result = matcher.find();
        while (result) {
            // Append the text before the match, then "Kevin" in place of the matched text.
            matcher.appendReplacement(sb, "Kevin");
            result = matcher.find();
        }
        // Append the rest of the input after the last match.
        matcher.appendTail(sb);
        System.out.println("The final result is:" + sb.toString());
    }
}


Printed result: The final result is:Kevin Li and Kevin Chan are both working in Kevin Chen's Kevinsoftshop Company
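
When every match receives the same replacement text, the find/appendReplacement/appendTail loop in the test above can usually be shortened to a single Matcher.replaceAll call. A minimal sketch, with the class name ReplaceAllDemo and the method replaceKelvin made up for illustration:

```java
import java.util.regex.Pattern;

public class ReplaceAllDemo {

    // Replaces every occurrence of "Kelvin" with "Kevin" in one call.
    public static String replaceKelvin(String input) {
        return Pattern.compile("Kelvin").matcher(input).replaceAll("Kevin");
    }

    public static void main(String[] args) {
        System.out.println(replaceKelvin("Kelvin Li and Kelvin Chan")); // Kevin Li and Kevin Chan
    }
}
```

The explicit loop remains useful when the replacement has to be computed separately for each match.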
