Java regular expressions and syntax table


First, a recommendation: myregexp, an Eclipse plug-in for working with Java regular expressions.

Download it from http://myregexp.com/eclipsePlugin.html


Reposted from http://dev.csdn.net/htmls/85/85006.html


Regular expressions are used to specify string patterns. You can use a regular expression to locate strings that match a particular pattern. For example, one sample program locates all hyperlinks in an HTML file by searching for strings of the form <a href="...">.

Of course, the notation ... is not precise enough to specify a pattern. You need to state exactly which sequences of characters are legal matches, and describing a pattern requires a special syntax.
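As an illustration, the hyperlink search could be sketched with java.util.regex.Pattern and Matcher. The pattern, the class name HrefFinder, and the method findLinks are assumptions made for this sketch, not the original program: the pattern accepts a double-quoted or unquoted href value and ignores case, but it does not try to handle every valid HTML form (for example, other attributes placed before href).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefFinder {
    // Hypothetical pattern for this sketch: "<a", whitespace, "href", optional
    // spaces around "=", then a double-quoted value or an unquoted run of
    // characters, then ">".
    private static final Pattern HREF = Pattern.compile(
            "<a\\s+href\\s*=\\s*(\"[^\"]*\"|[^\\s>]*)\\s*>",
            Pattern.CASE_INSENSITIVE);

    // Returns the full text of every <a href=...> tag found in the input.
    public static List<String> findLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {          // scan for successive, non-overlapping matches
            links.add(matcher.group());   // group() with no argument is the whole match
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"http://example.com\">Example</a> and "
                + "<A HREF = index.html>Home</A></p>";
        for (String link : findLinks(html)) {
            System.out.println(link);
        }
    }
}
```

Using find() in a loop, rather than matches(), is what lets the matcher report every occurrence inside a larger document.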

Here is a simple example. The regular expression
[Jj]ava.+
matches any string of the following form:

  • The first letter is J or j.
  • The next three letters are ava.
  • The remainder of the string consists of one or more arbitrary characters.

For example, the string "javanese" matches this particular regular expression, but the string "Core Java" does not.
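A quick way to check this in Java is String.matches, which succeeds only if the entire string matches the pattern. The class and method names below are hypothetical, chosen for this sketch.

```java
public class JavaPatternDemo {
    // "[Jj]ava.+": J or j, then "ava", then at least one more character.
    // String.matches requires the whole string to match the pattern.
    public static boolean isJavaWord(String s) {
        return s.matches("[Jj]ava.+");
    }

    public static void main(String[] args) {
        System.out.println(isJavaWord("javanese"));   // true
        System.out.println(isJavaWord("Core Java"));  // false: does not start with J or j
        System.out.println(isJavaWord("Java"));       // false: .+ needs at least one more character
    }
}
```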

As you can see, you need to know a bit of syntax to understand what a regular expression means. Fortunately, for most purposes, a small number of simple constructs is sufficient.

  • A character class is a set of character alternatives enclosed in brackets, such as [Jj], [0-9], [A-Za-z], or [^0-9]. Here - denotes a range (all characters whose Unicode values fall between the two bounds), and ^ denotes the complement (all characters except those specified).
  • There are many predefined character classes, such as \d (digits) or \p{Sc} (Unicode currency symbols); see Tables 12-8 and 12-9.
  • Most characters match themselves, such as the ava characters in the example above.
  • The . symbol matches any character (except possibly line terminators, depending on flag settings).
  • \ is used as an escape character. For example, \. matches a period and \\ matches a backslash.
  • ^ and $ match the beginning and end of the input (or of a line, in multiline mode), respectively.
  • If X and Y are regular expressions, then XY means "any match for X followed by a match for Y", and X|Y means "any match for X or Y".
  • You can apply quantifiers to an expression X: X+ means X repeated one or more times, X* means zero or more times, and X? means zero times or once.
  • By default, a quantifier matches the largest number of repetitions that still lets the overall match succeed. You can change this behavior with the suffix ? (reluctant match: match the smallest number of repetitions) or + (possessive match: match the largest number of repetitions even if that makes the overall match fail).


For example, the string cab matches [a-z]*ab but does not match [a-z]*+ab. In the first case, [a-z]* ends up matching only the character c, so the characters ab match the rest of the pattern. The possessive version [a-z]*+, however, matches all of cab and gives nothing back, so the rest of the pattern, ab, cannot match and the overall match fails.
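The three quantifier flavors can be compared directly. This sketch (the class name QuantifierModes and helper firstMatch are hypothetical) repeats the cab example with greedy, reluctant, and possessive quantifiers, then shows how greedy and reluctant differ when searching inside a longer string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierModes {
    // Returns the first substring of input matched by regex, or null if none.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Whole-string matching of "cab" with the three quantifier flavors.
        System.out.println("cab".matches("[a-z]*ab"));   // true: greedy * backs off to "c"
        System.out.println("cab".matches("[a-z]*?ab"));  // true: reluctant *? stops after "c"
        System.out.println("cab".matches("[a-z]*+ab"));  // false: possessive *+ keeps "cab"

        // Greedy vs. reluctant when searching inside a longer string.
        String html = "<b>bold</b>";
        System.out.println(firstMatch("<.+>", html));    // <b>bold</b>
        System.out.println(firstMatch("<.+?>", html));   // <b>
    }
}
```

Possessive quantifiers never backtrack, which can make some patterns fail to match but can also make matching faster when backtracking would be pointless.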
  • You can use groups to define subexpressions. Enclose groups in parentheses, for example ([+-]?)([0-9]+). You can then ask the pattern matcher to return the match of each group, or refer back to a group with \n, where n is the group number (starting with \1).
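Here is a sketch of both uses of groups, assuming the hypothetical class GroupDemo: Matcher.group(n) retrieves what group n captured, and the back-reference \1 inside a pattern requires the same text to appear again (here, to find a doubled word).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Group 1 captures an optional sign, group 2 captures the digits.
    private static final Pattern SIGNED = Pattern.compile("([+-]?)([0-9]+)");

    // Returns {sign, digits} if the whole input is a signed integer, else null.
    public static String[] splitSigned(String input) {
        Matcher m = SIGNED.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    // \1 refers back to whatever group 1 matched, so this finds a word
    // followed by a space and the same word again.
    public static boolean hasDoubledWord(String text) {
        return Pattern.compile("\\b(\\w+) \\1\\b").matcher(text).find();
    }

    public static void main(String[] args) {
        String[] parts = splitSigned("-42");
        System.out.println(parts[0] + " | " + parts[1]);           // - | 42
        System.out.println(hasDoubledWord("this is the the end")); // true
        System.out.println(hasDoubledWord("this is the end"));     // false
    }
}
```

Note that an optional group that matches nothing, such as the sign in "42", returns the empty string rather than null.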

Here is a slightly more complex but useful regular expression, which describes decimal and hexadecimal integers:
[+-]?[0-9]+|0[Xx][0-9A-Fa-f]+
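To see which strings this expression accepts, it can be compiled once and applied with matches(), which requires one of the two alternatives to consume the entire input. The names IntegerPattern and isInteger are hypothetical.

```java
import java.util.regex.Pattern;

public class IntegerPattern {
    // Optionally signed decimal integer, or hexadecimal integer with a 0x/0X prefix.
    private static final Pattern INTEGER =
            Pattern.compile("[+-]?[0-9]+|0[Xx][0-9A-Fa-f]+");

    // matches() requires the entire string to be consumed by one of the alternatives.
    public static boolean isInteger(String s) {
        return INTEGER.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "123", "-45", "+7", "0x1F", "0XffA0", "0x", "12abc", "1.5" };
        for (String s : samples) {
            // Only the first five samples are accepted.
            System.out.println(s + " -> " + isInteger(s));
        }
    }
}
```

For "0x1F", the decimal alternative matches only the leading 0, so the matcher backtracks and tries the hexadecimal alternative, which succeeds.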

Unfortunately, regular expression syntax is not fully standardized among the various programs and libraries that use it. There is consensus on the basic constructs, but there are many maddening differences in the details. The Java regular expression classes use a syntax that is similar to, but not the same as, that of Perl. Table 12-8 shows all constructs of the Java syntax. For more information about regular expressions, see the API documentation of the Pattern class or Jeffrey E. F. Friedl's book Mastering Regular Expressions (O'Reilly and Associates, 1997). (Translator's note: I just checked at a bookstore; Southeast University Press has published a photocopy reprint of the second edition.)

Table 12-8 Regular expression syntax

Syntax                             Explanation

Characters
c                                  The character c
\unnnn, \xnn, \0n, \0nn, \0nnn     The code unit with the given hexadecimal or octal value:
                                     \0n      octal 0n (0 <= n <= 7)
                                     \0nn     octal 0nn (0 <= n <= 7)
                                     \0mnn    octal 0mnn (0 <= m <= 3, 0 <= n <= 7)
                                     \xhh     hexadecimal 0xhh
                                     \uhhhh   hexadecimal 0xhhhh
\t, \n, \r, \f, \a, \e             The control characters tab, newline, carriage return, form feed, alert (bell), and escape
\cc                                The control character corresponding to c

Character classes
[C1C2...]                          Any of the characters represented by C1, C2, ...; each Ci is a character, a character range (c1-c2), or a character class
[^...]                             Complement of a character class
[...&&...]                         Intersection of two character classes

Predefined character classes
.                                  Any character except line terminators (or any character if the DOTALL flag is set)
\d                                 A digit [0-9]
\D                                 A non-digit [^0-9]
\s                                 A whitespace character [ \t\n\r\f\x0B]
\S                                 A non-whitespace character
\w                                 A word character [a-zA-Z0-9_]
\W                                 A non-word character
\p{name}                           A named character class (see Table 12-9)
\P{name}                           The complement of a named character class

Boundary matchers
^ $                                Beginning and end of input (or of a line, in multiline mode)
\b                                 A word boundary
\B                                 A non-word boundary
\A                                 Beginning of input
\z                                 End of input
\Z                                 End of input except for the final line terminator
\G                                 End of the previous match

Quantifiers
X?                                 Optional X (X may or may not appear)
X*                                 X, 0 or more times
X+                                 X, 1 or more times
X{n} X{n,} X{n,m}                  X exactly n times, at least n times, between n and m times

Quantifier suffixes
?                                  Turn the default (greedy) match into a reluctant match
+                                  Turn the default (greedy) match into a possessive match

Set operations
XY                                 Any string from X, followed by any string from Y
X|Y                                Any string from X or Y

Grouping
(X)                                Capture the string matching X in an automatically numbered group
\n                                 The match of the nth group

Escapes
\c                                 The character c (must not be an alphabetic character)
\Q...\E                            Quote ... verbatim
(?...)                             Special construct; see the API documentation of the Pattern class
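A few of the table's constructs can be tried out with a small counting helper. In this sketch the class SyntaxTableDemo and method count are hypothetical; note that every backslash in a regular expression must be doubled inside a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SyntaxTableDemo {
    // Counts how many non-overlapping matches of regex occur in input.
    public static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Predefined class: \d is a digit.
        System.out.println(count("\\d", "a1b22"));                 // 3
        // Intersection: lowercase letters that are not vowels.
        System.out.println(count("[a-z&&[^aeiou]]", "regex"));     // 3 (r, g, x)
        // Named class \p{Sc}: Unicode currency symbols ($ and the euro sign).
        System.out.println(count("\\p{Sc}", "$5 or \u20AC4"));     // 2
        // Word boundaries: "cat" only as a whole word, not inside "concat".
        System.out.println(count("\\bcat\\b", "cat concat cat.")); // 2
        // \Q...\E quotes the metacharacter + so it is matched literally.
        System.out.println(count("\\Q1+1\\E", "1+1=2, 11"));      // 1
        // Bounded quantifier: exactly three digits.
        System.out.println("555".matches("\\d{3}"));               // true
    }
}
```

When the text to be quoted comes from user input, Pattern.quote builds an equivalent \Q...\E string for you.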
