Java Regular Expressions

Source: Internet
Author: User
Tags stringbuffer

Regular expression examples
A plain string is itself a simple regular expression: the regular expression Hello World matches the string "Hello World".
The . (dot) is also a regular expression. It matches any single character, such as "a" or "1".
The following table lists some regular expressions and their descriptions:
Regular expression
Description
This is text
Matches the string "This is text".
This\s+is\s+text
Note the \s+ in the expression. The \s+ after the word "This" matches one or more whitespace characters, then "is" is matched, then another \s+ matches one or more whitespace characters, followed by "text". This can match, for example: "This is     text".
^\d+(\.\d+)?
^ anchors the match at the start of the input. \d+ matches one or more digits. \. matches a literal ".". ? makes the group in parentheses optional. This can match, for example: "5", "1.5", and "2.21".
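The table entries above can be checked directly with Pattern.matches. The following is a minimal sketch; the class name TableExamples and both helper methods are illustrative names, not part of the original text. Note that every backslash is doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

class TableExamples {
    // Hypothetical helper: true if the whole input matches This\s+is\s+text.
    static boolean matchesSentence(String input) {
        return Pattern.matches("This\\s+is\\s+text", input);
    }

    // Hypothetical helper: true if the whole input is digits with an optional fraction.
    static boolean matchesNumber(String input) {
        return Pattern.matches("^\\d+(\\.\\d+)?", input);
    }

    public static void main(String[] args) {
        System.out.println(matchesSentence("This   is text")); // true
        System.out.println(matchesNumber("2.21"));             // true
        System.out.println(matchesNumber("2."));               // false
    }
}
```

Pattern.matches requires the entire input to match, which is why "2." is rejected even though it starts with a digit.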
Java regular expressions are most similar to those of Perl.
The java.util.regex package consists mainly of the following three classes:
Pattern class:
A Pattern object is a compiled representation of a regular expression. The Pattern class has no public constructor. To create a Pattern object, call one of its public static compile methods, which return a Pattern object. These methods take a regular expression as their first argument.
Matcher class:
A Matcher object is the engine that interprets the pattern and performs match operations on an input string. Like Pattern, Matcher has no public constructor. You obtain a Matcher object by calling the matcher method of a Pattern object.
PatternSyntaxException:
PatternSyntaxException is an unchecked exception class that indicates a syntax error in a regular expression pattern.
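As a rough illustration of how the three classes fit together, the sketch below compiles a pattern, obtains a matcher from it, and catches the exception raised by an invalid pattern. The class name ThreeClassesDemo and its helper methods are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class ThreeClassesDemo {
    // Returns the first run of digits in the input, or null if there is none.
    static String firstNumber(String input) {
        Pattern p = Pattern.compile("\\d+"); // static factory; Pattern has no public constructor
        Matcher m = p.matcher(input);        // a Matcher is obtained from the Pattern
        return m.find() ? m.group() : null;
    }

    // Returns true if the regex compiles, false if it has a syntax error.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) { // unchecked, so catching it is optional
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("order 66 shipped")); // 66
        System.out.println(isValidRegex("(unclosed"));       // false
    }
}
```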
The following example uses the regular expression .*runoob.* to check whether a string contains the substring runoob:
Example
import java.util.regex.*;

class RegexExample1 {
    public static void main(String[] args) {
        String content = "I am noob " +
            "from runoob.com.";

        // .* on both sides lets "runoob" appear anywhere in the string
        String pattern = ".*runoob.*";

        boolean isMatch = Pattern.matches(pattern, content);
        System.out.println("Does the string contain the 'runoob' substring? " + isMatch);
    }
}
The output of the example is:
Does the string contain the 'runoob' substring? true
Capturing groups
A capturing group treats multiple characters as a single unit. It is created by enclosing the characters in parentheses.
For example, the regular expression (dog) creates a single group containing "d", "o", and "g".
Capturing groups are numbered by counting their opening parentheses from left to right. For example, the expression ((A)(B(C))) has four such groups:
((A)(B(C)))
(A)
(B(C))
(C)
You can find out how many groups an expression has by calling the groupCount method of the Matcher object. groupCount returns an int giving the number of capturing groups in the matcher's pattern.
There is also a special group, group(0), which always represents the entire expression. This group is not included in the count returned by groupCount.
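The numbering rule can be observed by matching the input "ABC" against ((A)(B(C))) and printing every group. This is a small sketch; the class name GroupNumbering and the groups helper are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumbering {
    // Matches "ABC" against ((A)(B(C))) and returns group 0 through groupCount().
    static String[] groups() {
        Matcher m = Pattern.compile("((A)(B(C)))").matcher("ABC");
        if (!m.matches()) {
            throw new IllegalStateException("pattern should match ABC");
        }
        String[] result = new String[m.groupCount() + 1]; // +1 because group 0 is not counted
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] g = groups();
        for (int i = 0; i < g.length; i++) {
            System.out.println("group(" + i + ") = " + g[i]);
        }
    }
}
```

Here groupCount() returns 4, while group(0) through group(4) yield "ABC", "ABC", "A", "BC", and "C".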
Example
The following example shows how to find a string of digits in a given string:
RegexMatches.java file code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    public static void main(String[] args) {
        // Search the string for the given pattern
        String line = "This order was placed for QT3000! OK?";
        String pattern = "(\\D*)(\\d+)(.*)";

        // Create a Pattern object
        Pattern r = Pattern.compile(pattern);

        // Now create a Matcher object
        Matcher m = r.matcher(line);
        if (m.find()) {
            System.out.println("Found value: " + m.group(0));
            System.out.println("Found value: " + m.group(1));
            System.out.println("Found value: " + m.group(2));
            System.out.println("Found value: " + m.group(3));
        } else {
            System.out.println("NO MATCH");
        }
    }
}
The output of the above example is:
Found value: This order was placed for QT3000! OK?
Found value: This order was placed for QT
Found value: 3000
Found value: ! OK?
Regular expression syntax
In other languages, \\ means: I want to insert a plain (literal) backslash into the regular expression, so give it no special meaning.
In Java, \\ means: I want to insert a regular expression backslash, so the character after it has a special meaning.
Therefore, where a single backslash \ is enough to form an escape in other languages (such as Perl), a Java string literal needs two backslashes to produce the same escape. Put simply, \\ in a Java regular expression literal stands for one \ in other languages. This is why the regular expression for a single digit is written \\d, and a literal backslash is written \\\\.
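The doubling rule can be checked in a few lines. The sketch below is illustrative only; the class name BackslashDemo and its helpers are not from the original text.

```java
import java.util.regex.Pattern;

class BackslashDemo {
    // The Java literal "\\d" holds the two characters \d, which the regex engine reads as "a digit".
    static boolean isSingleDigit(String s) {
        return Pattern.matches("\\d", s);
    }

    // The Java literal "\\\\" holds the two characters \\, which the regex engine reads as one literal backslash.
    static boolean isSingleBackslash(String s) {
        return Pattern.matches("\\\\", s);
    }

    public static void main(String[] args) {
        System.out.println("\\d".length());          // 2
        System.out.println(isSingleDigit("7"));      // true
        System.out.println(isSingleBackslash("\\")); // true
    }
}
```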
Character
Description
\
Marks the next character as a special character, a literal, a back reference, or an octal escape. For example, "n" matches the character "n", while "\n" matches a newline. The sequence "\\" matches "\", and "\(" matches "(".
^
Matches the start of the input string. If the MULTILINE flag (Pattern.MULTILINE) is set, ^ also matches the position after a line terminator such as "\n" or "\r".
$
Matches the end of the input string. If the MULTILINE flag is set, $ also matches the position before a line terminator such as "\n" or "\r".
*
Matches the preceding character or subexpression zero or more times. For example, "zo*" matches "z" and "zoo". * is equivalent to {0,}.
+
Matches the preceding character or subexpression one or more times. For example, "zo+" matches "zo" and "zoo", but does not match "z". + is equivalent to {1,}.
?
Matches the preceding character or subexpression zero or one time. For example, "do(es)?" matches "do" and "does". ? is equivalent to {0,1}.
{n}
n is a non-negative integer. Matches exactly n times. For example, "o{2}" does not match the "o" in "Bob", but matches the two "o"s in "food".
{n,}
n is a non-negative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob", but matches all the "o"s in "foooood". "o{1,}" is equivalent to "o+", and "o{0,}" is equivalent to "o*".
{n,m}
m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three "o"s in "fooooood". "o{0,1}" is equivalent to "o?". Note: you cannot put a space between the comma and the numbers.
?
When this character follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is "non-greedy". A non-greedy pattern matches as short a string as possible, while the default greedy pattern matches as long a string as possible. For example, in the string "oooo", "o+?" matches a single "o", while "o+" matches all the "o"s.
.
Matches any single character except a line terminator such as "\r" or "\n". To match any character including line terminators, use a pattern such as "[\s\S]".
(pattern)
Matches pattern and captures the matched subexpression. In Java, the captured text is retrieved with the group method of the Matcher object, and is referenced as $1, $2, and so on in replacement strings. To match a literal parenthesis, use "\(" or "\)".
(?:pattern)
Matches pattern but does not capture the match; that is, it is a non-capturing group and nothing is stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, "industr(?:y|ies)" is a more economical expression than "industry|industries".
(?=pattern)
Positive lookahead: matches at a position where the text that follows matches pattern. It is a non-capturing match, so nothing is stored for later use. For example, "Windows (?=95|98|NT|2000)" matches the "Windows" in "Windows 2000" but not the "Windows" in "Windows 3.1". Lookahead does not consume characters: after a match, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?!pattern)
Negative lookahead: matches at a position where the text that follows does not match pattern. It is a non-capturing match, so nothing is stored for later use. For example, "Windows (?!95|98|NT|2000)" matches the "Windows" in "Windows 3.1" but not the "Windows" in "Windows 2000". Lookahead does not consume characters: after a match, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
x|y
Matches x or y. For example, "z|food" matches "z" or "food". "(z|f)ood" matches "zood" or "food".
[xyz]
Character set. Matches any one of the enclosed characters. For example, "[abc]" matches the "a" in "plain".
[^xyz]
Negated character set. Matches any character that is not enclosed. For example, "[^abc]" matches the "p", "l", "i", and "n" in "plain".
[a-z]
Character range. Matches any character in the specified range. For example, "[a-z]" matches any lowercase letter from "a" to "z".
[^a-z]
Negated character range. Matches any character that is not in the specified range. For example, "[^a-z]" matches any character that is not in the range "a" to "z".
\b
Matches a word boundary, that is, the position between a word character and a non-word character such as a space. For example, "er\b" matches the "er" in "never", but not the "er" in "verb".
\B
Matches a non-word boundary. "er\B" matches the "er" in "verb", but not the "er" in "never".
\cx
Matches the control character indicated by x. For example, \cM matches Control-M, the carriage return character. Use an uppercase letter A-Z for x.
\d
Matches a digit character. Equivalent to [0-9].
\D
Matches a non-digit character. Equivalent to [^0-9].
\f
Matches a form feed. Equivalent to \x0c and \cL.
\n
Matches a newline. Equivalent to \x0a and \cJ.
\r
Matches a carriage return. Equivalent to \x0d and \cM.
\s
Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\x0b].
\S
Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\x0b].
\t
Matches a tab. Equivalent to \x09 and \cI.
\v
Matches a vertical tab (\x0b, \cK). From Java 8 onward, \v matches any vertical whitespace character, which includes the vertical tab.
\w
Matches any word character, including the underscore. Equivalent to "[A-Za-z0-9_]".
\W
Matches any non-word character. Equivalent to "[^A-Za-z0-9_]".
\xn
Matches n, where n is a hexadecimal escape code exactly two digits long. For example, "\x41" matches "A". "\x041" is equivalent to "\x04" followed by "1". This allows ASCII codes to be used in regular expressions.
\num
Matches num, where num is a positive integer: a back reference to a captured match. For example, "(.)\1" matches two consecutive identical characters.
\n
Identifies a back reference, where n is a digit from 1 to 9. In Java, \1 through \9 are always interpreted as back references.
\nm
Identifies a back reference. If at least nm capturing groups precede \nm, then nm is a back reference. Otherwise, \n is a back reference followed by the literal character m.
\0nml
Matches the octal escape code nml, where n is an octal digit (0-3) and m and l are octal digits (0-7). In Java, an octal escape must begin with \0; the shorter forms \0n and \0nn are also accepted.
\un
Matches n, where n is a Unicode character written as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
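A few of the table entries (greedy and non-greedy quantifiers, lookahead, and back references) can be tried out as follows. This is a sketch under the assumption that a small firstMatch helper is acceptable; the class and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the first match of regex in input, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("o+", "oooo"));     // oooo (greedy)
        System.out.println(firstMatch("o+?", "oooo"));    // o (non-greedy)
        System.out.println(firstMatch("(.)\\1", "hello")); // ll (back reference)

        // Lookahead checks what follows without including it in the match.
        System.out.println(firstMatch("Windows (?=95|98|NT|2000)", "Windows 2000") != null); // true
        System.out.println(firstMatch("Windows (?!95|98|NT|2000)", "Windows 2000") != null); // false
    }
}
```

Because the lookahead consumes nothing, the match returned for "Windows (?=95|98|NT|2000)" is "Windows " (including the space written in the pattern) and never contains "2000".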
According to the Java Language Specification, backslashes in string literals within Java source code are interpreted as Unicode escapes or other character escapes. You must therefore use two backslashes in a string literal to represent a single backslash that reaches the regular expression engine, protecting it from interpretation by the Java compiler. For example, the string literal "\b" matches a single backspace character when interpreted as a regular expression, while "\\b" matches a word boundary. The string literal "\(hello\)" is illegal and causes a compile-time error; to match the string (hello), you must use the string literal "\\(hello\\)".
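The difference between "\b" and "\\b" described above can be seen in a short sketch. The class name LiteralEscapes and its helpers are assumptions made for illustration.

```java
import java.util.regex.Pattern;

class LiteralEscapes {
    // "\\bcat\\b" reaches the regex engine as \bcat\b: "cat" as a whole word.
    static boolean containsWordCat(String s) {
        return Pattern.compile("\\bcat\\b").matcher(s).find();
    }

    // "\\(hello\\)" reaches the regex engine as \(hello\): the literal text (hello).
    static boolean isParenthesizedHello(String s) {
        return Pattern.matches("\\(hello\\)", s);
    }

    public static void main(String[] args) {
        System.out.println((int) "\b".charAt(0));            // 8, the backspace character
        System.out.println(containsWordCat("a cat sat"));    // true
        System.out.println(containsWordCat("concatenate"));  // false
        System.out.println(isParenthesizedHello("(hello)")); // true
    }
}
```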
Methods of the Matcher class
Index methods
The index methods return index values that show precisely where a match was found in the input string:
No. Method and description
1 public int start()
Returns the start index of the previous match.
2 public int start(int group)
Returns the start index of the subsequence captured by the given group during the previous match operation.
3 public int end()
Returns the offset after the last character matched.
4 public int end(int group)
Returns the offset after the last character of the subsequence captured by the given group during the previous match operation.
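As an illustration of the four index methods, the sketch below locates a key=value pair and reports the indexes of the whole match and of group 1. The class name, the helper, and the sample input are all made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IndexMethodsDemo {
    // Returns { start(), end(), start(1), end(1) } for the first key=value pair found.
    static int[] indexes(String input) {
        Matcher m = Pattern.compile("\\w+=(\\w+)").matcher(input);
        if (!m.find()) {
            return new int[0];
        }
        return new int[] { m.start(), m.end(), m.start(1), m.end(1) };
    }

    public static void main(String[] args) {
        String input = "id: x=42;";
        int[] idx = indexes(input);
        // substring(start, end) gives back exactly the matched text
        System.out.println(input.substring(idx[0], idx[1])); // x=42
        System.out.println(input.substring(idx[2], idx[3])); // 42
    }
}
```

Because end() is the offset after the last matched character, the pair (start, end) can be passed straight to String.substring.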
Search methods
The search methods examine the input string and return a boolean indicating whether the pattern was found:
No. Method and description
1 public boolean lookingAt()
Attempts to match the input sequence, starting at the beginning of the region, against the pattern.
2 public boolean find()
Attempts to find the next subsequence of the input sequence that matches the pattern.
3 public boolean find(int start)
Resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index.
4 public boolean matches()
Attempts to match the entire region against the pattern.
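The following sketch contrasts the search methods on the same inputs and shows find(int start). Each call uses a fresh matcher so that an earlier search does not affect the next one; the class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SearchMethodsDemo {
    // Returns { lookingAt(), find(), matches() }, each evaluated on a fresh matcher.
    static boolean[] check(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        return new boolean[] {
            p.matcher(input).lookingAt(), // must match at the start of the input
            p.matcher(input).find(),      // may match anywhere in the input
            p.matcher(input).matches()    // must match the whole input
        };
    }

    // Returns the start index of the first match at or after fromIndex, or -1.
    static int findFrom(String regex, String input, int fromIndex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find(fromIndex) ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(check("\\d+", "42 apples"))); // [true, true, false]
        System.out.println(Arrays.toString(check("\\d+", "buy 42")));    // [false, true, false]
        System.out.println(findFrom("a", "banana", 2));                  // 3
    }
}
```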
Replacement methods
The replacement methods replace text in the input string:
No. Method and description
1 public Matcher appendReplacement(StringBuffer sb, String replacement)
Implements a non-terminal append-and-replace step.
2 public StringBuffer appendTail(StringBuffer sb)
Implements a terminal append-and-replace step.
3 public String replaceAll(String replacement)
Replaces every subsequence of the input sequence that matches the pattern with the given replacement string.
4 public String replaceFirst(String replacement)
Replaces the first subsequence of the input sequence that matches the pattern with the given replacement string.
5 public static String quoteReplacement(String s)
Returns a literal replacement string for the specified string. The returned string works as a literal replacement when passed to the appendReplacement method of the Matcher class.
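Replacement strings give special meaning to $ (a group reference) and \ (an escape), which is why quoteReplacement exists. The sketch below shows both a group reference and a quoted literal; the class name, the helpers, and the PRICE placeholder are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplacementStringDemo {
    // Swaps the first two words by referring to groups 1 and 2 in the replacement.
    static String swapFirstTwoWords(String input) {
        Matcher m = Pattern.compile("(\\w+) (\\w+)").matcher(input);
        return m.replaceFirst("$2 $1");
    }

    // Inserts the price literally; quoteReplacement escapes any $ or \ it contains.
    static String fillPrice(String template, String price) {
        Matcher m = Pattern.compile("PRICE").matcher(template);
        return m.replaceAll(Matcher.quoteReplacement(price));
    }

    public static void main(String[] args) {
        System.out.println(swapFirstTwoWords("hello world again")); // world hello again
        System.out.println(fillPrice("Total: PRICE", "$5"));        // Total: $5
    }
}
```

Without quoteReplacement, the replacement "$5" would be read as a reference to capturing group 5, which does not exist in this pattern.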
start and end methods
Here is an example that counts the occurrences of the word "cat" in the input string:
RegexMatches.java file code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    private static final String REGEX = "\\bcat\\b";
    private static final String INPUT = "cat cat cat cattie cat";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        Matcher m = p.matcher(INPUT); // get a Matcher object
        int count = 0;

        while (m.find()) {
            count++;
            System.out.println("Match number " + count);
            System.out.println("start(): " + m.start());
            System.out.println("end(): " + m.end());
        }
    }
}
The output of the above example is:
Match number 1
start(): 0
end(): 3
Match number 2
start(): 4
end(): 7
Match number 3
start(): 8
end(): 11
Match number 4
start(): 19
end(): 22
You can see that this example uses word boundaries to ensure that the letters "c" "a" "t" are not merely a substring of a longer word. It also gives some useful information about where in the input string each match occurred.
The start method returns the start index of the previous match (or, with a group argument, of the subsequence captured by that group), and the end method returns the index of the last matched character plus one.
matches and lookingAt methods
The matches and lookingAt methods both attempt to match an input sequence against a pattern. The difference is that matches requires the entire sequence to match, while lookingAt does not.
lookingAt does not need the whole input to match, but the match must begin at the first character.
Both methods always start at the beginning of the input string.
The following example illustrates this:
RegexMatches.java file code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    private static final String REGEX = "foo";
    private static final String INPUT = "fooooooooooooooooo";
    private static final String INPUT2 = "ooooofoooooooooooo";
    private static Pattern pattern;
    private static Matcher matcher;
    private static Matcher matcher2;

    public static void main(String[] args) {
        pattern = Pattern.compile(REGEX);
        matcher = pattern.matcher(INPUT);
        matcher2 = pattern.matcher(INPUT2);

        System.out.println("Current REGEX is: " + REGEX);
        System.out.println("Current INPUT is: " + INPUT);
        System.out.println("Current INPUT2 is: " + INPUT2);

        System.out.println("lookingAt(): " + matcher.lookingAt());
        System.out.println("matches(): " + matcher.matches());
        System.out.println("lookingAt(): " + matcher2.lookingAt());
    }
}
The output of the above example is:
Current REGEX is: foo
Current INPUT is: fooooooooooooooooo
Current INPUT2 is: ooooofoooooooooooo
lookingAt(): true
matches(): false
lookingAt(): false
replaceFirst and replaceAll methods
The replaceFirst and replaceAll methods replace text that matches a regular expression. The difference is that replaceFirst replaces only the first match, while replaceAll replaces all matches.
The following example illustrates this:
RegexMatches.java file code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    private static String REGEX = "dog";
    private static String INPUT = "The dog says meow. " +
        "All dogs say meow.";
    private static String REPLACE = "cat";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        // get a matcher object
        Matcher m = p.matcher(INPUT);
        INPUT = m.replaceAll(REPLACE);
        System.out.println(INPUT);
    }
}
The output of the above example is:
The cat says meow. All cats say meow.
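For comparison, the sketch below applies both replaceFirst and replaceAll to the same sentence, so the difference is visible side by side. The class name ReplaceFirstDemo and its helpers are illustrative.

```java
import java.util.regex.Pattern;

class ReplaceFirstDemo {
    private static final Pattern DOG = Pattern.compile("dog");

    // Replaces only the first occurrence of "dog".
    static String first(String input) {
        return DOG.matcher(input).replaceFirst("cat");
    }

    // Replaces every occurrence of "dog".
    static String all(String input) {
        return DOG.matcher(input).replaceAll("cat");
    }

    public static void main(String[] args) {
        String input = "The dog says meow. All dogs say meow.";
        System.out.println(first(input)); // The cat says meow. All dogs say meow.
        System.out.println(all(input));   // The cat says meow. All cats say meow.
    }
}
```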
appendReplacement and appendTail methods
The Matcher class also provides the appendReplacement and appendTail methods for text replacement.
The following example illustrates this:
RegexMatches.java file code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    private static String REGEX = "a*b";
    private static String INPUT = "aabfooaabfooabfoob";
    private static String REPLACE = "-";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        // get a Matcher object
        Matcher m = p.matcher(INPUT);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, REPLACE);
        }
        m.appendTail(sb);
        System.out.println(sb.toString());
    }
}
The output of the above example is:
-foo-foo-foo-
Methods of the PatternSyntaxException class
PatternSyntaxException is an unchecked exception class that indicates a syntax error in a regular expression pattern.
The PatternSyntaxException class provides the following methods to help determine what went wrong.
No. Method and description
1 public String getDescription()
Retrieves the description of the error.
2 public int getIndex()
Retrieves the error index.
3 public String getPattern()
Retrieves the erroneous regular expression pattern.
4 public String getMessage()
Returns a multi-line string containing the description of the syntax error and its index, the erroneous regular expression pattern, and a visual indication of the error index within the pattern.
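A minimal sketch of these accessors in use follows. It deliberately compiles a pattern with an unclosed character class; the class name SyntaxErrorDemo and the tryCompile helper are hypothetical. The exact description text and index depend on the JDK, so no expected output is shown.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxErrorDemo {
    // Compiles the regex and returns the caught exception, or null if it compiled.
    static PatternSyntaxException tryCompile(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        PatternSyntaxException e = tryCompile("[a-z"); // the character class is never closed
        System.out.println("description: " + e.getDescription());
        System.out.println("index: " + e.getIndex());
        System.out.println("pattern: " + e.getPattern());
        System.out.println(e.getMessage());
    }
}
```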
Notes
Given a regular expression and another string, we can:

  1. Check whether the string conforms to the filtering logic of the regular expression (this is called a "match");
  2. Extract the specific part we want from the string by means of the regular expression.

Regular expressions are characterized as follows:

  1. They are very flexible, logical, and functional;
  2. They can achieve complex control over strings quickly and in a very simple way;
  3. For people who have just come into contact with them, they are rather obscure.

Note: once a regular expression is written, a match does not say what is right or wrong with the input; the result is only true or false.
Example: validate a QQ number, which must be 5 to 15 digits long and must not start with 0. The code without a regular expression:
public class Regex {
    public static void main(String[] args) {
        checkQQ("0123134");
    }

    public static void checkQQ(String qq) {
        int len = qq.length();
        if (len >= 5 && len <= 15) {
            if (!qq.startsWith("0")) {
                try {
                    // parseLong throws NumberFormatException if qq is not a valid long
                    long l = Long.parseLong(qq);
                    System.out.println("QQ: " + l);
                } catch (NumberFormatException e) {
                    System.out.println("Illegal characters found");
                }
            } else {
                System.out.println("Cannot start with 0");
            }
        } else {
            System.out.println("QQ number length error");
        }
    }
}
The code using a regular expression:
public class Regex {
    public static void main(String[] args) {
        checkQQ2("0123134");
    }

    public static void checkQQ2(String qq) {
        // first digit 1-9, followed by 4 to 14 more digits
        String reg = "[1-9][0-9]{4,14}";
        System.out.println(qq.matches(reg) ? "Legal QQ" : "Illegal QQ");
    }
}
Parenthesized groups: a group's number is given by the position of its left parenthesis, counting from the left.
Pattern p = Pattern.compile("(\\d{2})([a-z]{2,3})");
Matcher m = p.matcher("33aa-32sdy-29ssc");
while (m.find()) {
    System.out.println(m.group(2)); // each match prints the content of the second group
}
/*
Results:
aa
sdy
ssc
*/

Back references:
Pattern p = Pattern.compile("(\\d(\\d))\\2");
Matcher matcher = p.matcher("211");
System.out.println(matcher.matches());
Result: true
Explanation: "\\2" refers to the value matched by the preceding second group.
matcher.appendReplacement(sb, replaceContent) and matcher.appendTail(sb):
appendReplacement method: sb is a StringBuffer and replaceContent is the replacement string. The method replaces the matched content with replaceContent. It also takes the text between the end of the previous match and the start of the current match, and appends that text followed by the replacement to the StringBuffer (for the first match, the text taken starts at the beginning of the input).
appendTail method: sb is a StringBuffer. The method appends the text remaining after the last match to the StringBuffer.
The two methods can be used together to replace all matches or only the first:
Replace all:
while (matcher.find()) {
    matcher.appendReplacement(sb, replaceContent);
}
matcher.appendTail(sb);
Replace the first:
if (matcher.find()) {
    matcher.appendReplacement(sb, replaceContent);
}
matcher.appendTail(sb);
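The two loops above can be wrapped into complete methods, as in the sketch below. The class name AppendDemo and the method names replaceEvery and replaceOnce are assumptions for illustration; the pattern and input are reused from the earlier appendReplacement example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendDemo {
    // Replaces every match, which gives the same result as replaceAll.
    static String replaceEvery(String regex, String input, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, replacement); // text before the match, then the replacement
        }
        m.appendTail(sb);                         // whatever follows the last match
        return sb.toString();
    }

    // Replaces only the first match, which gives the same result as replaceFirst.
    static String replaceOnce(String regex, String input, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        if (m.find()) {
            m.appendReplacement(sb, replacement);
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceEvery("a*b", "aabfooaabfooabfoob", "-")); // -foo-foo-foo-
        System.out.println(replaceOnce("a*b", "aabfooaabfooabfoob", "-"));  // -fooaabfooabfoob
    }
}
```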
