Regular Expression Summary

Source: Internet
Author: User


Regular expression: an expression that follows a set of rules.

Role: dedicated to manipulating strings.

It uses specific symbols to stand for coded operations, which shortens what you have to write. Learning regular expressions is therefore largely a matter of learning how to use those special symbols.

Benefit: simplifies complex operations on strings.

Disadvantage: the more symbols a pattern defines, the longer it gets and the harder it is to read.

The four operations:

1. Matching: the String matches() method tests the entire string against the rule. As soon as any part fails to match, matching stops and the method returns false.

2. Splitting: the String split() method cuts the string wherever the rule matches. To reuse the result of part of a rule, wrap that part in a group with (). Groups are numbered automatically, starting from 1, and an existing group is referenced with \n, where n is the group number. For example, "(.)\\1+" splits on any run of a repeated character.

3. Replacing: the String replaceAll() method. For example, it can replace each run of a repeated character with a single character. In the replacement string, $n refers to the text captured by group n.

4. Getting: extract the substrings that match the rule.
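The first three operations can be sketched as follows (getting is covered by the longer example further on). This is only an illustration: the class name, method names, and sample strings are made up here and are not part of the original notes.

```java
import java.util.Arrays;

public class FourOperationsSketch {
    // 1. Matching: the whole string must satisfy the rule (one or more digits).
    static boolean isDigits(String s) {
        return s.matches("\\d+");
    }

    // 2. Splitting: cut wherever a character is immediately repeated (group 1, then itself again).
    static String[] splitOnRepeats(String s) {
        return s.split("(.)\\1+");
    }

    // 3. Replacing: collapse each run of a repeated character to one copy; $1 is group 1.
    static String collapseRepeats(String s) {
        return s.replaceAll("(.)\\1+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(isDigits("12345"));                                     // true
        System.out.println(isDigits("12a45"));                                     // false
        System.out.println(Arrays.toString(splitOnRepeats("erkktyqqquizzzzzo"))); // [er, ty, ui, o]
        System.out.println(collapseRepeats("erkktyqqquizzzzzo"));                  // erktyquizo
    }
}
```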

Operation steps:

A. Encapsulate the regular expression in a Pattern object.

B. Associate the Pattern object with the string to operate on.

C. The association yields the regex matching engine, a Matcher.

D. Use the engine to operate on the substrings that match the rule, for example to extract them.

Example of getting:

import java.util.regex.*;

class RegexDemo2
{
    public static void main(String[] args)
    {
        getDemo();
    }

    public static void getDemo()
    {
        String str = "ni hao ma ye xu CEng jing de ni yao Hao yi Xie!";
        System.out.println(str);

        // Three-letter words: word boundary, exactly three letters, word boundary.
        String regex = "\\b[a-zA-Z]{3}\\b";

        Pattern p = Pattern.compile(regex); // encapsulate the regular expression as an object
        Matcher m = p.matcher(str); // associate the regex object with the string to operate on

        // find() advances to the next match; group() returns it, start() and end() give its position.
        while (m.find())
        {
            System.out.println(m.group());
            System.out.println(m.start() + "----" + m.end());
        }
    }
}

Summary of regular-expression constructs

Construct             Matches

Characters

x                     The character x
\\                    The backslash character
\0n                   The character with octal value 0n (0 <= n <= 7)
\0nn                  The character with octal value 0nn (0 <= n <= 7)
\0mnn                 The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh                  The character with hexadecimal value 0xhh
\uhhhh                The character with hexadecimal value 0xhhhh
\t                    The tab character ('\u0009')
\n                    The newline (line feed) character ('\u000A')
\r                    The carriage-return character ('\u000D')
\f                    The form-feed character ('\u000C')
\a                    The alert (bell) character ('\u0007')
\e                    The escape character ('\u001B')
\cx                   The control character corresponding to x

Character classes

[abc]                 a, b, or c (simple class)
[^abc]                Any character except a, b, or c (negation)
[a-zA-Z]              a through z or A through Z, inclusive (range)
[a-d[m-p]]            a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]          d, e, or f (intersection)
[a-z&&[^bc]]          a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]         a through z, and not m through p: [a-lq-z] (subtraction)
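A quick way to check the union, intersection, and subtraction rows is to test single characters against each class. The sketch below does that; the class and helper names are hypothetical.

```java
public class CharClassSketch {
    // True when the single character c belongs to the character class written as regex.
    static boolean inClass(String regex, char c) {
        return String.valueOf(c).matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(inClass("[a-d[m-p]]", 'n'));     // union: true
        System.out.println(inClass("[a-d[m-p]]", 'e'));     // union: false
        System.out.println(inClass("[a-z&&[def]]", 'e'));   // intersection: true
        System.out.println(inClass("[a-z&&[def]]", 'a'));   // intersection: false
        System.out.println(inClass("[a-z&&[^bc]]", 'b'));   // subtraction: false
        System.out.println(inClass("[a-z&&[^m-p]]", 'q'));  // subtraction: true
    }
}
```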

Predefined character classes

.                     Any character (may or may not match line terminators)
\d                    A digit: [0-9]
\D                    A non-digit: [^0-9]
\s                    A whitespace character: [ \t\n\x0B\f\r]
\S                    A non-whitespace character: [^\s]
\w                    A word character: [a-zA-Z_0-9]
\W                    A non-word character: [^\w]

POSIX character classes (US-ASCII only)

\p{Lower}             A lowercase alphabetic character: [a-z]
\p{Upper}             An uppercase alphabetic character: [A-Z]
\p{ASCII}             All ASCII: [\x00-\x7F]
\p{Alpha}             An alphabetic character: [\p{Lower}\p{Upper}]
\p{Digit}             A decimal digit: [0-9]
\p{Alnum}             An alphanumeric character: [\p{Alpha}\p{Digit}]
\p{Punct}             Punctuation: one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph}             A visible character: [\p{Alnum}\p{Punct}]
\p{Print}             A printable character: [\p{Graph}\x20]
\p{Blank}             A space or a tab: [ \t]
\p{Cntrl}             A control character: [\x00-\x1F\x7F]
\p{XDigit}            A hexadecimal digit: [0-9a-fA-F]
\p{Space}             A whitespace character: [ \t\n\x0B\f\r]

java.lang.Character classes (simple Java character type)

\p{javaLowerCase}     Equivalent to java.lang.Character.isLowerCase()
\p{javaUpperCase}     Equivalent to java.lang.Character.isUpperCase()
\p{javaWhitespace}    Equivalent to java.lang.Character.isWhitespace()
\p{javaMirrored}      Equivalent to java.lang.Character.isMirrored()

Classes for Unicode blocks and categories

\p{InGreek}           A character in the Greek block (simple block)
\p{Lu}                An uppercase letter (simple category)
\p{Sc}                A currency symbol
\P{InGreek}           Any character except one in the Greek block (negation)
[\p{L}&&[^\p{Lu}]]    Any letter except an uppercase letter (subtraction)

Boundary matchers

^                     The beginning of a line
$                     The end of a line
\b                    A word boundary
\B                    A non-word boundary
\A                    The beginning of the input
\G                    The end of the previous match
\Z                    The end of the input but for the final terminator, if any
\z                    The end of the input
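The difference between \b and \B, and between \Z and \z, is easiest to see by counting matches. A minimal sketch, with a hypothetical helper and made-up sample text:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundarySketch {
    // Counts how many times regex is found in input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "cat concat cat's category";
        // "cat" as a whole word: the first word and the one before the apostrophe.
        System.out.println(countMatches("\\bcat\\b", text));  // 2
        // "cat" ending a word but not starting one: only inside "concat".
        System.out.println(countMatches("\\Bcat\\b", text));  // 1
        // \Z tolerates one final line terminator, \z does not.
        System.out.println(countMatches("line\\Z", "line\n")); // 1
        System.out.println(countMatches("line\\z", "line\n")); // 0
    }
}
```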

Greedy quantifiers

X?                    X, once or not at all
X*                    X, zero or more times
X+                    X, one or more times
X{n}                  X, exactly n times
X{n,}                 X, at least n times
X{n,m}                X, at least n but not more than m times

Reluctant quantifiers

X??                   X, once or not at all
X*?                   X, zero or more times
X+?                   X, one or more times
X{n}?                 X, exactly n times
X{n,}?                X, at least n times
X{n,m}?               X, at least n but not more than m times

Possessive quantifiers

X?+                   X, once or not at all
X*+                   X, zero or more times
X++                   X, one or more times
X{n}+                 X, exactly n times
X{n,}+                X, at least n times
X{n,m}+               X, at least n but not more than m times
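The three quantifier families accept the same repetition counts and differ only in how much they take and whether they give anything back. The sketch below runs the same input through one quantifier of each kind; the class name, helper name, and input string are chosen here for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierSketch {
    // Returns the first substring of input matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: .* takes everything, then backs off just enough for "foo" to match.
        System.out.println(firstMatch(".*foo", input));   // xfooxxxxxxfoo
        // Reluctant: .*? takes as little as possible, so the match stops at the first "foo".
        System.out.println(firstMatch(".*?foo", input));  // xfoo
        // Possessive: .*+ takes everything and never backs off, so "foo" can never match.
        System.out.println(firstMatch(".*+foo", input));  // null
    }
}
```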

Logical operators

XY                    X followed by Y
X|Y                   Either X or Y
(X)                   X, as a capturing group

Back references

\n                    Whatever the nth capturing group matched

Quotation

\                     Nothing, but quotes the following character
\Q                    Nothing, but quotes all characters until \E
\E                    Nothing, but ends quoting started by \Q
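Quoting matters when the text to match contains metacharacters. The sketch below compares an unquoted pattern, a \Q...\E pattern, and Pattern.quote(), which builds a quoted pattern from a literal string. The class name, helper name, and sample strings are hypothetical.

```java
import java.util.regex.Pattern;

public class QuoteSketch {
    // True when the whole input matches regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Unquoted, the first + is a quantifier: "one or more 1s, then 1=2".
        System.out.println(fullMatch("1+1=2", "1+1=2"));                // false
        System.out.println(fullMatch("1+1=2", "11=2"));                 // true
        // Between \Q and \E every character is taken literally.
        System.out.println(fullMatch("\\Q1+1=2\\E", "1+1=2"));          // true
        // Pattern.quote() produces a literal pattern for the given string.
        System.out.println(fullMatch(Pattern.quote("1+1=2"), "1+1=2")); // true
    }
}
```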

Special constructs (non-capturing)

(?:X)                 X, as a non-capturing group
(?idmsux-idmsux)      Nothing, but turns match flags i d m s u x on - off
(?idmsux-idmsux:X)    X, as a non-capturing group with the given flags i d m s u x on - off
(?=X)                 X, via zero-width positive lookahead
(?!X)                 X, via zero-width negative lookahead
(?<=X)                X, via zero-width positive lookbehind
(?<!X)                X, via zero-width negative lookbehind
(?>X)                 X, as an independent, non-capturing group
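A few of these constructs in use, as a rough sketch. The lookarounds check what surrounds a match without including it in the match. The class name, helper names, and sample strings are assumptions made for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpecialConstructSketch {
    // Returns the first substring of input matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Number of capturing groups in regex.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        String prices = "100 dollars and 200 euros";
        // Positive lookahead: digits that are followed by " dollars".
        System.out.println(firstMatch("\\d+(?= dollars)", prices));         // 100
        // Negative lookahead: a whole number that is not followed by " dollars".
        System.out.println(firstMatch("\\b\\d+\\b(?! dollars)", prices));   // 200

        String order = "cost: $45, qty 3";
        // Positive lookbehind: digits preceded by a dollar sign.
        System.out.println(firstMatch("(?<=\\$)\\d+", order));              // 45
        // Negative lookbehind: a number that does not start right after a dollar sign.
        System.out.println(firstMatch("(?<!\\$)\\b\\d+", order));           // 3

        // Inline flag: (?i) turns on case-insensitive matching.
        System.out.println(firstMatch("(?i)hello", "Say HeLLo"));           // HeLLo
        // (?:...) groups without capturing, so only (c) counts as a group.
        System.out.println(groupCount("(?:ab)+(c)"));                       // 1
    }
}
```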

Backslashes, escapes, and quoting

The backslash character ('\') introduces the escaped constructs defined in the table above, and also quotes characters that would otherwise be interpreted as unescaped constructs. Thus the expression \\ matches a single backslash, and \{ matches a left brace.

It is an error to use a backslash before any alphabetic character that does not denote an escaped construct; these are reserved for future extensions to the regular-expression language. A backslash may be used before a non-alphabetic character regardless of whether that character is part of an unescaped construct.

Backslashes within string literals in Java source code are interpreted, as required by the Java Language Specification, as either Unicode escapes or other character escapes. Backslashes in a string literal that represents a regular expression must therefore be doubled to protect them from interpretation by the Java compiler. For example, the string literal "\b" matches a single backspace character when interpreted as a regular expression, while "\\b" matches a word boundary. The string literal "\(hello\)" is illegal and causes a compile-time error; to match the string (hello), the string literal "\\(hello\\)" must be used.
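The two levels of escaping (first the Java compiler, then the regex engine) can be checked directly. A small sketch, with a hypothetical class and helper:

```java
import java.util.regex.Pattern;

public class BackslashSketch {
    // True when the whole input matches regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // The compiler turns "\b" into one character (backspace); "\\b" is backslash plus b.
        System.out.println("\b".length());                      // 1
        System.out.println("\\b".length());                     // 2
        // As a regex, the backspace character simply matches itself.
        System.out.println(fullMatch("\b", "\b"));              // true
        // Four backslashes in source = two in the regex = one literal backslash matched.
        System.out.println(fullMatch("\\\\", "\\"));            // true
        // Escaped parentheses match literal parentheses.
        System.out.println(fullMatch("\\(hello\\)", "(hello)")); // true
    }
}
```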

Character classes

Character classes may appear within other character classes, and may be combined with the union operator (implicit) and the intersection operator (&&). The union operator denotes a class that contains every character that is in at least one of its operand classes. The intersection operator denotes a class that contains every character that is in both of its operand classes.

The precedence of the character-class operators is as follows, from highest to lowest:

1    Literal escape    \x
2    Grouping          [...]
3    Range             a-z
4    Union             [a-e][i-u]
5    Intersection      [a-z&&[aeiou]]

Note that a different set of metacharacters is in effect inside a character class than outside one. For instance, the regular expression . loses its special meaning inside a character class, while the expression - becomes a range-forming metacharacter.
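The change in metacharacter meaning inside brackets can be seen with a dot and a hyphen. A brief sketch, using a hypothetical helper:

```java
public class MetaInClassSketch {
    // True when the single character c belongs to the character class written as regex.
    static boolean inClass(String regex, char c) {
        return String.valueOf(c).matches(regex);
    }

    public static void main(String[] args) {
        // Inside a class, . is an ordinary dot rather than "any character".
        System.out.println(inClass("[.]", 'a'));     // false
        System.out.println(inClass("[.]", '.'));     // true
        // Inside a class, - between two characters forms a range.
        System.out.println(inClass("[a-c]", 'b'));   // true
        System.out.println(inClass("[a-c]", '-'));   // false
        // Escaping the hyphen makes it a literal member of the class.
        System.out.println(inClass("[a\\-c]", '-')); // true
    }
}
```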

Line terminators

A line terminator is a one- or two-character sequence that marks the end of a line of the input character sequence. The following are recognized as line terminators: a newline (line feed) character ('\n'), a carriage-return character followed immediately by a newline character ("\r\n"), a standalone carriage-return character ('\r'), a next-line character ('\u0085'), a line-separator character ('\u2028'), or a paragraph-separator character ('\u2029').

If UNIX_LINES mode is activated, the newline character is the only line terminator recognized.

The regular expression . matches any character except a line terminator, unless the DOTALL flag is specified.

By default, the regular expressions ^ and $ ignore line terminators and match only at the beginning and the end, respectively, of the entire input sequence. If MULTILINE mode is activated, ^ matches at the beginning of the input and after any line terminator except at the end of the input. In MULTILINE mode, $ matches just before a line terminator or at the end of the input sequence.
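The effect of the MULTILINE and DOTALL flags can be sketched by counting matches with and without them. The class name, helper name, and sample text below are made up for illustration; Pattern.MULTILINE and Pattern.DOTALL are the flag constants of java.util.regex.Pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineTerminatorSketch {
    // Counts how many times regex, compiled with the given flags, is found in input.
    static int countMatches(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "one\ntwo\nthree";
        // Default: ^ and $ anchor to the whole input, and \w+ cannot cross a newline.
        System.out.println(countMatches("^\\w+$", 0, text));                 // 0
        // MULTILINE: ^ and $ also match at line boundaries, so each line matches.
        System.out.println(countMatches("^\\w+$", Pattern.MULTILINE, text)); // 3
        // Default: . does not match a line terminator.
        System.out.println(countMatches("a.b", 0, "a\nb"));                  // 0
        // DOTALL: . matches line terminators too.
        System.out.println(countMatches("a.b", Pattern.DOTALL, "a\nb"));     // 1
    }
}
```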

Groups and capturing

Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups:

1    ((A)(B(C)))
2    (A)
3    (B(C))
4    (C)

Group zero always stands for the entire expression.

Capturing groups are so named because, during a match, each subsequence of the input sequence that matches such a group is saved. The captured subsequence may be used later in the expression through a back reference, and may also be retrieved from the matcher once the match operation is complete.

The captured input associated with a group is always the subsequence that the group most recently matched. If a group is evaluated a second time because of quantification, its previously captured value, if any, is retained when the second evaluation fails. For example, matching the string "aba" against the expression (a(b)?)+ leaves group two set to "b". All captured input is discarded at the beginning of each match.

Groups beginning with (? are pure, non-capturing groups that do not capture text and do not count towards the group total.
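Group numbering and the "most recent match" rule can be checked by listing every group after a successful match. A minimal sketch, with a hypothetical class and helper:

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupSketch {
    // Matches the whole input against regex and returns groups 0..groupCount(), or null on failure.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Groups are numbered by their opening parenthesis, left to right; group 0 is the whole match.
        System.out.println(Arrays.toString(groups("((A)(B(C)))", "ABC"))); // [ABC, ABC, A, BC, C]
        // On the second pass (b)? matches nothing, so group 2 keeps "b" from the first pass.
        System.out.println(Arrays.toString(groups("(a(b)?)+", "aba")));    // [aba, a, b]
    }
}
```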

Example: checking a QQ number

class RegexDemo
{
    public static void main(String[] args)
    {
        checkQQ();
    }

    public static void checkQQ()
    {
        String qq = "534946o910"; // contains the letter 'o', so the check below fails

        // First digit 1-9, then 4 to 14 more digits: 5 to 15 digits in total.
        String regex = "[1-9][0-9]{4,14}";

        boolean flag = qq.matches(regex);
        if (flag)
        {
            System.out.println("qq=" + qq + " is ok!");
        }
        else
        {
            System.out.println("The QQ number you entered, " + qq + ", is illegal");
        }
    }
}

/*
Requirement:
 * Turn the stuttered string below into: 我要学编程 ("I want to learn programming").
 * Which of the four operations should be used? One, or several?
 * How to decide:
 * 1. To find out only whether a string is valid, use matching.
 * 2. To turn an existing string into another string, use replacing.
 * 3. To cut a string into several strings in a custom way, use splitting; it gets the substrings outside the rule.
 * 4. To get the substrings that meet a requirement, use getting; it gets the substrings that match the rule.
 */
class RegexTest1
{
    public static void main(String[] args)
    {
        test_1();
    }

    public static void test_1()
    {
        String str = "我我...我我...我要..要要...要要...学学学....学学...编编编...编程..程.程程...程...程";

        // Step 1: remove every run of dots.
        String regex = "\\.+";
        str = str.replaceAll(regex, "");

        // Step 2: collapse each run of a repeated character to one copy; $1 is group 1.
        str = str.replaceAll("(.)\\1+", "$1");

        System.out.println(str);
    }
}

/*
Requirement: validate an email address.
*/
public static void checkMail()
{
    String mail = "abc12@sina.com";
    mail = "1@1.1";

    String reg = "[a-zA-Z0-9_]+@[a-zA-Z0-9]+(\\.[a-zA-Z]+)+"; // a more exact match
    reg = "\\w+@\\w+(\\.\\w+)+"; // a relatively loose match

    // mail.indexOf("@") != -1 would only check that an '@' is present.

    System.out.println(mail.matches(reg));
}

Web crawler (spider):

import java.io.*;
import java.util.regex.*;
import java.net.*;
import java.util.*;

class RegexTest2
{
    public static void main(String[] args) throws Exception
    {
        getMails_1();
    }

    /*
    Gets the e-mail addresses in a web page.
    */
    public static void getMails_1() throws Exception
    {
        URL url = new URL("http://192.168.1.254:8080/myweb/mail.html");
        URLConnection conn = url.openConnection();
        BufferedReader bufIn = new BufferedReader(new InputStreamReader(conn.getInputStream()));

        String line = null;
        String mailReg = "\\w+@\\w+(\\.\\w+)+";
        Pattern p = Pattern.compile(mailReg);

        // Read the page line by line and print every address found on each line.
        while ((line = bufIn.readLine()) != null)
        {
            Matcher m = p.matcher(line);
            while (m.find())
            {
                System.out.println(m.group());
            }
        }
        bufIn.close();
    }

    /*
    Gets the e-mail addresses in the specified document.
    Uses the get operation: Pattern and Matcher.
    */
    public static void getMails() throws Exception
    {
        BufferedReader bufr =
            new BufferedReader(new FileReader("mail.txt"));

        String line = null;
        String mailReg = "\\w+@\\w+(\\.\\w+)+";
        Pattern p = Pattern.compile(mailReg);

        while ((line = bufr.readLine()) != null)
        {
            Matcher m = p.matcher(line);
            while (m.find())
            {
                System.out.println(m.group());
            }
        }
        bufr.close();
    }
}
