Core Java Volume II notes (3): Regular expressions

Source: Internet
Author: User

Regular expression syntax

A regular expression describes the rules (a pattern) that a string must follow. If a string conforms exactly to the pattern, the string is said to match the expression. The regular expression syntax defines how such patterns are written. Regular expressions resemble the glob patterns covered in the previous article, but are more complex and more powerful.

Only the commonly used syntax is listed here; the more complex constructs are covered in detailed regular expression references.

The elements of a regular expression:

  • Characters
    • Ordinary characters match themselves
    • \unnnn \xnn \0n \0nn \0nnn the character with the given hexadecimal or octal code
    • \t \n \r \f \a \e control characters: tab, line feed, carriage return, form feed, alert (bell), escape
    • \cC the control character corresponding to the character C
  • Character classes
    • [C1C2C3...] any one of the characters represented by the Ci. Each Ci can be a single character, a range of characters (a-z, A-Z, 0-9), or another character class. To include - itself, make it the first or last item; to include ], make it the first item; to include ^, put it anywhere but first. Only [ and \ need to be escaped.
    • [^...] the complement of the character class
    • [...&&...] the intersection of two character classes
  • Predefined character classes
    • . any character except line terminators (any character at all when the DOTALL flag is set)
    • \d a digit, equivalent to [0-9]
    • \D a non-digit, equivalent to [^0-9]
    • \s a whitespace character, equivalent to [ \t\n\r\f\x0B]
    • \S a non-whitespace character
    • \w a word character, equivalent to [a-zA-Z0-9_]
    • \W a non-word character
    • \p{name} a named character class; name can take the following values
      • Lower ASCII lowercase letter [a-z]
      • Upper ASCII uppercase letter [A-Z]
      • Alpha ASCII letter [a-zA-Z]
      • Digit ASCII digit [0-9]
      • Alnum ASCII letter or digit [0-9a-zA-Z]
      • XDigit hexadecimal digit [0-9a-fA-F]
      • Graph visible ASCII character [\x21-\x7E]; Print is the same set plus the space character
      • Punct punctuation, that is, a visible non-alphanumeric ASCII character [\p{Graph}&&\P{Alnum}]
      • ASCII any ASCII character [\x00-\x7F]
      • Cntrl ASCII control character [\x00-\x1F\x7F]
      • Blank space or tab [ \t]
      • Space whitespace character [ \t\n\r\f\x0B]
      • javaLowerCase a character for which Character.isLowerCase() returns true
      • javaUpperCase a character for which Character.isUpperCase() returns true
      • javaWhitespace a character for which Character.isWhitespace() returns true
      • javaMirrored a character for which Character.isMirrored() returns true
      • InXxxx a character in a Unicode block; Xxxx is the block name with spaces removed, such as Arrows or Latin-1Supplement
      • IsXxxx a character in a Unicode script; Xxxx is the script name with spaces removed, such as Common
      • Yyyy | IsYyyy a character in a Unicode category; Yyyy is the category name, such as L (letter) or Sc (currency symbol)
      • IsZzzz a character with a Unicode property (Alphabetic, Ideographic, Letter, Lowercase, Uppercase, Titlecase, Punctuation, Control, White_Space, Digit, Hex_Digit, Noncharacter_Code_Point, Assigned)
    • \P{name} the complement of the named character class
  • Boundary matchers
    • ^ $ start and end of input; in MULTILINE mode, start and end of a line
    • \b a word boundary
    • \B a non-word boundary
    • \A start of input
    • \z end of input
    • \Z end of input except for the final line terminator
    • \G end of the previous match
  • Quantifiers
    • X? X occurs 0 or 1 times
    • X* X occurs 0 or more times
    • X+ X occurs 1 or more times
    • X{n} X{n,} X{n,m} X occurs exactly n times, at least n times, between n and m times
  • Quantifier suffixes
    • ? turns the default (greedy) match into a reluctant match
    • + turns the default (greedy) match into a possessive match
  • Set operations
    • XY any match of X followed by a match of Y
    • X|Y any match of X or of Y
  • Grouping
    • (X) defines the subexpression X as a group and captures the string it matches. Groups are numbered from 1 in the order of their opening parentheses; group 0 is the entire match
    • \n a backreference to the string captured by group n
  • Escapes and other constructs
    • \ escapes the following character so that it is matched literally
    • \Q...\E quotes ... so that it is matched literally
    • (?...) special constructs
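
A few of the elements listed above can be tried out with a short program. This is only a minimal sketch: the class name RegexSyntaxDemo and the helper fullMatch are made up for illustration, and the sample patterns are not from the original notes. It relies only on Pattern.matches from java.util.regex.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original notes.
class RegexSyntaxDemo {
    // Returns true when the whole input conforms to the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Character class with a range, and its complement.
        System.out.println(fullMatch("[a-c]+", "abcab"));               // true
        System.out.println(fullMatch("[^a-c]+", "abc"));                // false
        // Intersection: lowercase letters that are not vowels.
        System.out.println(fullMatch("[a-z&&[^aeiou]]+", "xyz"));       // true
        // Predefined classes \d and \w with an exact-count quantifier.
        System.out.println(fullMatch("\\d{3}-\\w+", "123-ab_9"));       // true
        // Named classes: one uppercase letter, then lowercase letters.
        System.out.println(fullMatch("\\p{Upper}\\p{Lower}*", "Java")); // true
        // Backreference: \1 must repeat what group 1 captured.
        System.out.println(fullMatch("(\\w+) \\1", "hello hello"));     // true
        // \Q...\E quotes the enclosed text, so + is an ordinary character.
        System.out.println(fullMatch("\\Q1+1\\E=2", "1+1=2"));          // true
    }
}
```

Note that every backslash of the regular expression has to be doubled inside a Java string literal.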

The matching modes of quantifiers

    • Default (greedy): matches the largest number of repetitions that still lets the overall match succeed; [a-z]*ab matches caab
    • Reluctant: matches the smallest number of repetitions that lets the overall match succeed
    • Possessive: matches the largest number of repetitions and never gives any back, even if that makes the overall match fail; [a-z]*+ab does not match caab
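
The three modes can be compared side by side. The following is a small sketch, assuming a made-up helper firstMatch and an illustrative input string; only the [a-z]*ab patterns come from the notes above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original notes.
class QuantifierModes {
    // Returns the first substring of input matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<b>bold</b>";
        // Greedy: .+ takes as much as possible, then backs off to the last '>'.
        System.out.println(firstMatch("<.+>", input));   // <b>bold</b>
        // Reluctant: .+? takes as little as possible, stopping at the first '>'.
        System.out.println(firstMatch("<.+?>", input));  // <b>
        // Possessive: .++ swallows the rest of the input and never gives '>' back.
        System.out.println(firstMatch("<.++>", input));  // null

        // The patterns from the list above, tested against the whole input.
        System.out.println(Pattern.matches("[a-z]*ab", "caab"));   // true
        System.out.println(Pattern.matches("[a-z]*+ab", "caab"));  // false
    }
}
```

Possessive quantifiers are mostly useful when backtracking can never help, because they let a failing match give up early.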

Using regular expressions in Java

Regular expressions are used mainly through the Pattern and Matcher classes in java.util.regex; the String class also has convenience methods for common cases. The input to be matched is typed as the CharSequence interface, which is implemented by CharBuffer, String, StringBuffer, and StringBuilder (a char[] array can be wrapped with CharBuffer.wrap).
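
Because the input is a CharSequence, the same pattern can be applied to several kinds of text containers without copying them into a String first. A minimal sketch, assuming a made-up class CharSequenceInputDemo and helper isWord:

```java
import java.nio.CharBuffer;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original notes.
class CharSequenceInputDemo {
    // Compiled once and reused; Pattern objects are immutable.
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Accepts any CharSequence and tests whether all of it is word characters.
    static boolean isWord(CharSequence input) {
        return WORD.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isWord("hello"));                           // true
        System.out.println(isWord(new StringBuilder("hello world")));  // false
        // A char[] is not a CharSequence, but CharBuffer.wrap adapts it.
        char[] raw = {'a', 'b', '1'};
        System.out.println(isWord(CharBuffer.wrap(raw)));              // true
    }
}
```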

  Pattern

    • static Pattern compile(String regex)
    • static Pattern compile(String regex, int flags) multiple flags are combined with the bitwise OR operator, for example Pattern.compile(regexStr, Pattern.DOTALL | Pattern.MULTILINE). The common flags are as follows
        • CASE_INSENSITIVE ignores case (US-ASCII letters only by default)
        • UNICODE_CASE used together with CASE_INSENSITIVE, matches case according to Unicode letter case
        • MULTILINE ^ and $ match the beginning and end of a line instead of the beginning and end of the entire input
        • UNIX_LINES only \n is recognized as a line terminator
        • DOTALL . matches all characters, including line terminators
        • CANON_EQ takes canonical equivalence of Unicode characters into account, so that, for example, u followed by a combining diaeresis matches ü
    • static String quote(String s) returns a pattern string that matches s literally, with all special characters stripped of their meaning (the result is s wrapped in \Q...\E)
    • String pattern() / toString() returns the regular expression string from which the pattern was compiled
    • int flags() returns the match flags of the pattern
    • Matcher matcher(CharSequence input) creates a Matcher that matches the input against this pattern
    • static boolean matches(String regex, CharSequence input) returns whether the entire input matches the regex
    • String[] split(CharSequence input, int limit) splits the input around the matches of this pattern. A positive limit n produces at most n pieces (the pattern is applied at most n-1 times), so a limit of 1 does not split at all; a limit of 0 splits as many times as possible and discards trailing empty strings; a negative limit splits as many times as possible and keeps them
    • String[] split(CharSequence input) same as split(input, 0)
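
The Pattern methods above might be exercised as follows. This is a sketch only: the class PatternApiDemo, the helper splitCsv, and the sample inputs are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original notes.
class PatternApiDemo {
    // Splits a line on commas, swallowing whitespace around each comma.
    static String[] splitCsv(String line, int limit) {
        return Pattern.compile("\\s*,\\s*").split(line, limit);
    }

    public static void main(String[] args) {
        // Flags are bit masks, so they are combined with |.
        Pattern p = Pattern.compile("^hello.world$",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        System.out.println(p.matcher("HELLO\nWORLD").matches());    // true
        System.out.println((p.flags() & Pattern.DOTALL) != 0);      // true

        // quote() makes the dot literal, so "axb" no longer matches.
        System.out.println(Pattern.quote("a.b"));                          // \Qa.b\E
        System.out.println(Pattern.matches(Pattern.quote("a.b"), "axb"));  // false

        // limit 0: split everywhere, drop trailing empty strings.
        System.out.println(Arrays.toString(splitCsv("a, b ,c,,", 0)));   // [a, b, c]
        // limit 2: at most two pieces, the remainder is left untouched.
        System.out.println(Arrays.toString(splitCsv("a, b ,c,,", 2)));   // [a, b ,c,,]
        // negative limit: split everywhere, keep trailing empty strings.
        System.out.println(splitCsv("a, b ,c,,", -1).length);            // 5
    }
}
```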

  Matcher

    • boolean matches() whether the entire input matches the pattern
    • boolean lookingAt() whether the beginning of the input matches the pattern
    • boolean find() tries to find the next match and returns true if one is found, otherwise false. After a successful find, use methods such as start, end, and group to obtain details of the match
    • boolean find(int start) resets the matcher and finds the next match starting at the given offset in the input
    • int start() the index of the first character of the current match
    • int end() the index just past the last character of the current match
    • String group() returns the currently matched substring, equivalent to group(0)
    • int groupCount() returns the number of capturing groups (not counting group 0, which is the entire match)
    • int start(int groupIdx) returns the start index of the given group
    • int end(int groupIdx) returns the index just past the end of the given group
    • String group(int groupIdx) returns the string captured by the given group
    • String group(String name) returns the string captured by the named group
    • String replaceAll(String replacement) replaces every match; the replacement can contain $n as a reference to group n, and \$ for a literal $ character
    • String replaceFirst(String replacement) replaces only the first match
    • static String quoteReplacement(String s) preprocesses a replacement string (puts a \ before every \ and $) so that it is treated literally
    • Matcher reset() resets the matcher state so that matching starts again from the beginning
    • Matcher reset(CharSequence input) switches to a new input string and resets the state
    • Matcher usePattern(Pattern newPattern) switches to a different pattern
    • Matcher appendReplacement(StringBuffer sb, String replacement) used together with appendTail; during replacement the results are appended directly to the StringBuffer
    • StringBuffer appendTail(StringBuffer sb)
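
A short sketch of how several of the Matcher methods fit together. The class MatcherApiDemo, the date pattern, and the helpers locate and toDayFirst are hypothetical names chosen for this example; named groups of the form (?<name>...) require Java 7 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original notes.
class MatcherApiDemo {
    private static final Pattern DATE =
            Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");

    // Collects "start-end:text" for every match found in the input.
    static List<String> locate(String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = DATE.matcher(input);
        while (m.find()) {
            hits.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return hits;
    }

    // Rewrites every year-month-day date as day/month/year using $n references.
    static String toDayFirst(String input) {
        return DATE.matcher(input).replaceAll("$3/$2/$1");
    }

    public static void main(String[] args) {
        String input = "from 2020-01-31 to 2021-12-01";
        System.out.println(locate(input));      // [5-15:2020-01-31, 19-29:2021-12-01]
        System.out.println(toDayFirst(input));  // from 31/01/2020 to 01/12/2021

        Matcher m = DATE.matcher(input);
        System.out.println(m.lookingAt());      // false: the input starts with "from"
        if (m.find()) {
            // Named group lookup, and the number of capturing groups.
            System.out.println(m.group("year") + " " + m.groupCount());  // 2020 3
        }
        // quoteReplacement makes "$1" a literal replacement text.
        System.out.println(DATE.matcher(input)
                .replaceFirst(Matcher.quoteReplacement("$1")));  // from $1 to 2021-12-01
    }
}
```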

 

The String class also provides convenience methods that accept regular expressions

      • boolean matches(String regex)
      • String replaceFirst(String regex, String replacement)
      • String replaceAll(String regex, String replacement)
      • String[] split(String regex, int limit)
      • String[] split(String regex)
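
For one-off operations these String methods save the explicit Pattern and Matcher objects. A minimal sketch, with the class StringRegexDemo and the helper normalizeSpaces invented for illustration:

```java
import java.util.Arrays;

// Hypothetical demo class; not part of the original notes.
class StringRegexDemo {
    // Trims the ends and collapses each run of whitespace into a single space.
    static String normalizeSpaces(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        // matches() tests the whole string, not a substring.
        System.out.println("2024".matches("\\d+"));                   // true
        System.out.println("v2024".matches("\\d+"));                  // false
        System.out.println(normalizeSpaces("  a \t b\n c "));         // a b c
        System.out.println("a1b2c3".replaceFirst("\\d", "#"));        // a#b2c3
        System.out.println(Arrays.toString("k=v=w".split("=", 2)));   // [k, v=w]
        System.out.println(Arrays.toString("k=v=w".split("=")));      // [k, v, w]
    }
}
```

Each of these calls compiles the regular expression again, so a precompiled Pattern is usually the better choice inside a loop.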

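import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Group 1 is the whole time, group 2 the hours, group 3 the minutes.
String patternString = "((\\d{1,2}):(\\d{1,2}))+";
String inputString = "start 12:43, end 05:43";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(inputString);
// matches() tests the entire input, so this prints false.
System.out.println("Total match:" + matcher.matches());
matcher.reset(); // search again from the beginning of the input
int c = 0;
while (matcher.find()) {
    System.out.println("Match" + (++c) + ": " + matcher.group());
    for (int i = 1; i <= matcher.groupCount(); i++)
        System.out.println("\tgroup" + i + ": " + matcher.group(i));
}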

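import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile("cat");
Matcher m = p.matcher("one cat, cats in the yard");
StringBuffer sb = new StringBuffer();
while (m.find()) {
    // Append the text before the match, then the replacement.
    m.appendReplacement(sb, "dog");
}
// Append whatever follows the last match.
m.appendTail(sb);
System.out.println(sb.toString());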
