Java regular expressions: a roundup of error-prone points

Source: Internet
Author: User
Tags: character classes

I. Overview
Regular expressions are an important tool for processing strings and text in Java.
Java's regular expression support is concentrated in the following two classes:
java.util.regex.Pattern   pattern class: represents a compiled regular expression.
java.util.regex.Matcher   matcher class: represents the result of matching a pattern against a string.
(Unfortunately, the Javadoc does not spell out the responsibilities of these two classes.)
Here is a simple example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Regular expression example
 *
 * @author leizhimin 2009-7-17 9:02:53
 */
public class TestRegx {
    public static void main(String[] args) {
        // Reluctant quantifier: capture as little as possible between 'f' and 'k'.
        Pattern p = Pattern.compile("f(.+?)k");
        Matcher m = p.matcher("fckfkkfkf");
        while (m.find()) {
            String s0 = m.group();
            String s1 = m.group(1);
            System.out.println(s0 + "||" + s1);
        }
        System.out.println("---------");
        // Reuse the same matcher with a new input sequence.
        m.reset("fucking!");
        while (m.find()) {
            System.out.println(m.group());
        }

        Pattern p1 = Pattern.compile("f(.+?)i(.+?)h");
        Matcher m1 = p1.matcher("finishabigfishfrish");
        while (m1.find()) {
            String s0 = m1.group();
            String s1 = m1.group(1);
            String s2 = m1.group(2);
            System.out.println(s0 + "||" + s1 + "||" + s2);
        }
        System.out.println("---------");

        // Group 2 captures the separator ('-', ' ', '/' or '.');
        // the back-reference \\2 requires the same separator again.
        Pattern p3 = Pattern.compile("(19|20)\\d\\d([- /.])(0[1-9]|1[012])\\2(0[1-9]|[12][0-9]|3[01])");
        Matcher m3 = p3.matcher("1900-01-01 2007/08/13 1900.01.01 1900 01 01 1900-01.01 1900 13 01 1900 02 31");
        while (m3.find()) {
            System.out.println(m3.group());
        }
    }
}

Output:
fck||c
fkk||k
---------
fuck
finish||in||s
fishfrish||ishfr||s
---------
1900-01-01
2007/08/13
1900.01.01
1900 01 01
1900 02 31

Process finished with exit code 0
II. Some easily confused points
1. How Java handles backslashes
In other languages, \\ means "insert a literal backslash into the regular expression";
in Java, \\ in a string literal means "insert a regular expression backslash here, so that the following character has special meaning".
A. Predefined character classes
.  Any character (may or may not match line terminators)
\d A digit: [0-9]
\D A non-digit: [^0-9]
\s A whitespace character: [ \t\n\x0B\f\r]
\S A non-whitespace character: [^\s]
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]
Compare this table with the program above and the difference is easy to see:
\d is written as \\d when it is actually used in Java source.
To match a literal \ character, the regular expression must be written as \\\\ in Java source: according to the API documentation the regex \\ denotes a single backslash, and each of those two backslashes must itself be escaped in a Java string literal.
However, to represent a carriage return or a newline in a regular expression, no extra backslash is needed. For a carriage return, writing \r is enough, because the compiler turns it into the actual carriage return character.
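The escaping rules above can be checked with a few one-line matches. This is a minimal sketch; the class name BackslashDemo and its helper methods are made up for illustration, while Pattern.matches is the standard java.util.regex call.

```java
import java.util.regex.Pattern;

class BackslashDemo {
    // "\\d" in Java source is the two-character regex \d (one digit).
    static boolean isDigit(String s) {
        return Pattern.matches("\\d", s);
    }

    // "\\\\" in Java source is the regex \\ which matches one literal backslash.
    static boolean isSingleBackslash(String s) {
        return Pattern.matches("\\\\", s);
    }

    // "\r" becomes a real carriage return at compile time and is matched literally;
    // "\\r" reaches the regex engine as the escape \r and matches the same character.
    static boolean isCarriageReturn(String s) {
        return Pattern.matches("\r", s) && Pattern.matches("\\r", s);
    }

    public static void main(String[] args) {
        System.out.println(isDigit("7"));            // true
        System.out.println(isSingleBackslash("\\")); // true ("\\" is a one-character string)
        System.out.println(isCarriageReturn("\r"));  // true
    }
}
```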
B. Characters
x      The character x
\\     The backslash character
\0n    The character with octal value 0n (0 <= n <= 7)
\0nn   The character with octal value 0nn (0 <= n <= 7)
\0mnn  The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh   The character with hexadecimal value 0xhh
\uhhhh The character with hexadecimal value 0xhhhh
\t     The tab character ('\u0009')
\n     The newline (line feed) character ('\u000A')
\r     The carriage return character ('\u000D')
\f     The form feed character ('\u000C')
\a     The alert (bell) character ('\u0007')
\e     The escape character ('\u001B')
\cx    The control character corresponding to x
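As a quick check of the table above, the following sketch spells the same character in several of the listed forms. The class and method names are illustrative only; each backslash is doubled because the patterns are written as Java string literals.

```java
import java.util.regex.Pattern;

class CharEscapeDemo {
    // Three spellings of the letter 'A' (code point 0x41, octal 101).
    static boolean allMatchA() {
        return Pattern.matches("\\x41", "A")    // two-digit hexadecimal form
            && Pattern.matches("\\u0041", "A")  // four-digit hexadecimal form
            && Pattern.matches("\\0101", "A");  // octal form \0mnn
    }

    // The tab character written as \t and as the control character \cI (Ctrl-I).
    static boolean tabForms() {
        return Pattern.matches("\\t", "\t")
            && Pattern.matches("\\cI", "\t");
    }

    public static void main(String[] args) {
        System.out.println(allMatchA()); // true
        System.out.println(tabForms());  // true
    }
}
```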
2. Matcher.find(): Attempts to find the next subsequence of the input sequence that matches the pattern. The search starts at the beginning of the input sequence or, if the previous call succeeded and the matcher has not been reset since, at the first character not matched by the previous match. In other words, if a matching substring was found last time, the next search begins right after it.
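A minimal sketch of this behavior: each call to find() resumes after the previous match, so a loop visits every match in turn. The class FindDemo and the helper findAll are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindDemo {
    // Records "startIndex:matchedText" for every match, in the order find() returns them.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> hits = new ArrayList<>();
        while (m.find()) {
            hits.add(m.start() + ":" + m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // prints [1:12, 4:345, 8:6]
        System.out.println(findAll("\\d+", "a12b345c6"));
    }
}
```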
3. Matcher.matches(): Determines whether the entire input sequence matches the pattern. When checking several strings in a row with the same Matcher object, you can use:
Matcher.reset(): Resets the matcher, discards all of its explicit state information, and sets its append position to zero.
Matcher.reset(CharSequence input): Resets the matcher with a new input sequence so that it can be reused.
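The sketch below reuses one Matcher for several inputs and shows that reset() rewinds the search. ResetDemo and its two methods are illustrative names, not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResetDemo {
    // Counts the inputs that match the pattern in full, reusing a single Matcher.
    static int countFullMatches(String regex, String... inputs) {
        Matcher m = Pattern.compile(regex).matcher("");
        int count = 0;
        for (String input : inputs) {
            m.reset(input);          // swap in the next input sequence
            if (m.matches()) {       // true only if the whole input matches
                count++;
            }
        }
        return count;
    }

    // Advances past two matches, resets, and returns what find() reports next.
    static String firstAfterReset(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        m.find();
        m.find();
        m.reset();                   // the search starts over from the beginning
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(countFullMatches("\\d{4}", "2009", "20090", "abcd", "1900")); // 2
        System.out.println(firstAfterReset("\\d", "1a2b3"));                             // 1
    }
}
```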
4. The concept of groups. This concept is important: a group is a part of a regular expression delimited by parentheses, and groups are referred to by number. Group numbers start at 0, and each pair of parentheses defines one group. Groups can be nested, in which case they are numbered by the position of their opening parenthesis, from left to right. Group 0 represents the entire expression, group 1 the first parenthesized group, and so on.
For example, the regular expression A(B)C(D)E has three groups: group 0 is ABCDE, group 1 is B, and group 2 is D.
A((B)C)(D)E has four groups: group 0 is ABCDE, group 1 is BC, group 2 is B, and group 3 is D.
int groupCount(): Returns the number of capturing groups in this matcher's pattern, not counting group 0.
String group(): Returns group 0 of the previous match operation (for example, find()).
String group(int group): Returns the input subsequence captured by the given group during the previous match operation. If the match succeeded but the given group did not match any part of the input sequence, null is returned.
int start(int group): Returns the start index of the subsequence captured by the given group during the previous match operation.
int end(int group): Returns the index of the last character, plus one, of the subsequence captured by the given group during the previous match operation.
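The group numbering and the accessor methods can be seen together in a short sketch. The class GroupDemo and the helper describeGroups are made-up names; the output format is only an illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // For a full match, lists each group as "number=text[start,end)".
    static List<String> describeGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        if (m.matches()) {
            // groupCount() excludes group 0, so iterate from 0 through groupCount() inclusive.
            for (int i = 0; i <= m.groupCount(); i++) {
                out.add(i + "=" + m.group(i) + "[" + m.start(i) + "," + m.end(i) + ")");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [0=ABCDE[0,5), 1=BC[1,3), 2=B[1,2), 3=D[3,4)]
        System.out.println(describeGroups("A((B)C)(D)E", "ABCDE"));
        // A group that took no part in the match yields null and index -1:
        // prints [0=b[0,1), 1=null[-1,-1), 2=b[0,1)]
        System.out.println(describeGroups("(a)|(b)", "b"));
    }
}
```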
5. Controlling the scope of a match
The oddest of these methods is lookingAt(): its name is confusing, so read the API documentation carefully.
start() returns the start index of the previous match.
end() returns the offset after the last character matched.
public boolean lookingAt() attempts to match the input sequence, starting at the beginning of the region, against the pattern.
Like the matches method, this method always starts at the beginning of the region; unlike matches, it does not require the entire region to match.
If the match succeeds, more information can be obtained through the start, end, and group methods.
Returns:
true if, and only if, a prefix of the input sequence matches this matcher's pattern.
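To make the difference concrete, the following sketch runs matches(), lookingAt(), and find() against the same input. LookingAtDemo and compare are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookingAtDemo {
    // Summarizes how the three matching methods treat the same pattern and input.
    static String compare(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        boolean whole = m.matches();             // the entire input must match
        m.reset();
        boolean prefix = m.lookingAt();          // only a prefix must match
        int prefixEnd = prefix ? m.end() : -1;   // offset after the matched prefix
        m.reset();
        boolean anywhere = m.find();             // a match anywhere in the input
        return "matches=" + whole + " lookingAt=" + prefix
                + " end=" + prefixEnd + " find=" + anywhere;
    }

    public static void main(String[] args) {
        System.out.println(compare("\\d+", "123abc")); // matches=false lookingAt=true end=3 find=true
        System.out.println(compare("\\d+", "abc123")); // matches=false lookingAt=false end=-1 find=true
        System.out.println(compare("\\d+", "123"));    // matches=true lookingAt=true end=3 find=true
    }
}
```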
This article has collected the points that are most easily confused, but the list is not comprehensive; the rest has to be built up through study and practice. The biggest difficulty with regular expressions is writing them fluently, so it pays to start from the difficult points covered here.