Common regular expressions in Java

Source: Internet
Author: User

Since JDK 1.4 introduced the java.util.regex package, Java has had a solid built-in platform for working with regular expressions.
Because regular expressions are a complex subject, I only introduce some basic concepts here. For more information, consult related books and experiment on your own.
\\ Backslash
\t Tab ('\u0009')
\n Line feed ('\u000A')
\r Carriage return ('\u000D')
\d A digit; equivalent to [0-9]
\D A non-digit; equivalent to [^0-9]
\s A whitespace character [ \t\n\x0B\f\r]
\S A non-whitespace character [^ \t\n\x0B\f\r]
\w A word character [a-zA-Z_0-9]
\W A non-word character [^a-zA-Z_0-9]
\f Form feed
\e Escape
\b A word boundary
\B A non-word boundary
\G The end of the previous match
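As a small sketch of how these predefined classes behave (the class name EscapeDemo, its helper methods, and the sample strings are made up for illustration). Note that inside a Java string literal each backslash must itself be escaped, so \d is written "\\d":

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // \d matches a digit, so \d+ accepts only strings made entirely of digits.
    static boolean isDigits(String s) {
        return Pattern.matches("\\d+", s);
    }

    // \W is the complement of \w ([a-zA-Z_0-9]); remove every such character.
    static String stripNonWord(String s) {
        return s.replaceAll("\\W", "");
    }

    // \s matches whitespace such as space, tab and line feed.
    static String[] splitOnWhitespace(String s) {
        return s.split("\\s+");
    }

    // \b is a word boundary, so "cat" inside "concat" is left alone.
    static String replaceWholeWord(String s) {
        return s.replaceAll("\\bcat\\b", "dog");
    }

    public static void main(String[] args) {
        System.out.println(isDigits("2007"));                       // true
        System.out.println(isDigits("20a7"));                       // false
        System.out.println(stripNonWord("a-b_c!1"));                // ab_c1
        System.out.println(splitOnWhitespace("a \t b\nc").length);  // 3
        System.out.println(replaceWholeWord("cat concat"));         // dog concat
    }
}
```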
^ Restricts the match to the beginning of the input.
^Java The match must start with "Java"
$ Restricts the match to the end of the input.
Java$ The match must end with "Java"
. Matches any single character except a line terminator such as \n.
Java.. Matches "Java" followed by any two characters other than line breaks
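The anchors and the dot can be tried out with a sketch like the following. The class name AnchorDemo and the helper found() are hypothetical; find() is used instead of matches() so that the anchors, rather than the method, decide where the match may occur:

```java
import java.util.regex.Pattern;

public class AnchorDemo {
    // Returns true when the pattern occurs somewhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("^Java", "Java rocks"));   // true: input starts with Java
        System.out.println(found("^Java", "I like Java"));  // false: Java is not at the start
        System.out.println(found("Java$", "I like Java"));  // true: input ends with Java
        System.out.println(found("Java..", "JavaSE"));      // true: two more characters follow
        System.out.println(found("Java..", "Java\n\n"));    // false: '.' skips line breaks by default
    }
}
```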
To restrict a position to a set of characters, use "[]"
[a-z] Matches one character in the lowercase range a to z.
[A-Z] Matches one character in the uppercase range A to Z.
[a-zA-Z] Matches one character in the lowercase range a to z or the uppercase range A to Z.
[0-9] Matches one digit in the range 0 to 9.
[0-9a-z] Matches one character that is a digit 0 to 9 or a lowercase letter a to z.
[0-9[a-z]] The same set written with a nested class. In Java this is the union of 0-9 and a-z; an intersection is written with &&, as in [a-z&&[^aeiou]].
To negate a set, add ^ right after the opening bracket: "[^]"
[^a-z] Matches one character that is not in the lowercase range a to z.
[^A-Z] Matches one character that is not in the uppercase range A to Z.
[^a-zA-Z] Matches one character that is neither a lowercase nor an uppercase letter.
[^0-9] Matches one character that is not a digit 0 to 9.
[^0-9a-z] Matches one character that is neither a digit 0 to 9 nor a lowercase letter a to z.
[^0-9[a-z]] Intended to match one character outside both 0-9 and a-z. How negation combines with a nested class has differed between Java versions, so [^0-9a-z] is the safer spelling.
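A brief sketch of how these character classes behave in Java. The class name CharClassDemo and the helper one() are invented for this illustration, and the && intersection line is an addition for contrast with the nested (union) form:

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // Returns true when the whole input is exactly one character from the class.
    static boolean one(String charClass, String input) {
        return Pattern.matches(charClass, input);
    }

    public static void main(String[] args) {
        System.out.println(one("[a-z]", "j"));            // true
        System.out.println(one("[a-z]", "J"));            // false
        System.out.println(one("[0-9[a-z]]", "7"));       // true: nested class is a union
        System.out.println(one("[0-9[a-z]]", "q"));       // true
        System.out.println(one("[^0-9a-z]", "Q"));        // true: outside both ranges
        System.out.println(one("[^0-9a-z]", "q"));        // false
        System.out.println(one("[a-z&&[^aeiou]]", "b"));  // true: lowercase AND not a vowel
        System.out.println(one("[a-z&&[^aeiou]]", "e"));  // false
    }
}
```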
When a specific character may appear 0 or more times, use "*"
J* Zero or more J
.* Zero or more arbitrary characters
J.*D Zero or more arbitrary characters between J and D
When a specific character must appear 1 or more times, use "+"
J+ One or more J
.+ One or more arbitrary characters
J.+D One or more arbitrary characters between J and D
When a specific character may appear 0 or 1 time, use "?"
JA? Matches J or JA
To require an exact number of consecutive occurrences, use "{}"
J{2} JJ
J{3} JJJ
For a or more occurrences, use "{a,}"
J{3,} JJJ, JJJJ, JJJJJ, ... (three or more consecutive J)
For at least a and at most b occurrences, use "{a,b}"
J{3,5} JJJ, JJJJ, or JJJJJ
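The quantifiers can be checked with a short sketch such as this one (QuantifierDemo and whole() are hypothetical names; Pattern.matches requires the entire input to match, which makes the counts easy to see):

```java
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Returns true when the entire input matches the regular expression.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(whole("J*", ""));            // true: zero J is allowed
        System.out.println(whole("J+", ""));            // false: at least one J required
        System.out.println(whole("JA?", "J"));          // true
        System.out.println(whole("JA?", "JA"));         // true
        System.out.println(whole("JA?", "JAA"));        // false: at most one A
        System.out.println(whole("J{3}", "JJJ"));       // true
        System.out.println(whole("J{3,}", "JJJJJJ"));   // true: three or more
        System.out.println(whole("J{3,5}", "JJJJJJ"));  // false: six is more than five
        System.out.println(whole("J.*D", "JD"));        // true: nothing in between is fine
        System.out.println(whole("J.+D", "JD"));        // false: something must be in between
    }
}
```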
To choose between two alternatives, use "|"
J|A J or A
Java|Hello Java or Hello
To group part of a pattern (and capture what it matches), use "()".
For example, to extract the text between <a href> and </a> in <a href="index.html">index</a>, write <a.*href=".*">(.+?)</a>
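As a sketch of the grouping example in Java (the class name GroupDemo and the method linkText() are assumptions made for this illustration), the parenthesized part of the pattern is read back with group(1):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Returns the text captured by the first group, or null when nothing matches.
    static String linkText(String html) {
        Pattern pattern = Pattern.compile("<a.*href=\".*\">(.+?)</a>");
        Matcher matcher = pattern.matcher(html);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(linkText("<a href=\"index.html\">index</a>")); // index
        System.out.println(linkText("no link here"));                     // null
        // Alternation: the whole input must be either "Java" or "Hello".
        System.out.println(Pattern.matches("Java|Hello", "Hello"));       // true
    }
}
```

group(0), or group() without arguments, would return the whole matched tag instead of only the captured text.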
When using the Pattern.compile function, you can pass flags that control the matching behavior of the regular expression:
Pattern.compile(String regex, int flags)
The flags can take the following values (several can be combined with |):
Pattern.CANON_EQ Two characters are considered a match only when their full canonical decompositions match. For example, with this flag the expression "a\u030A" matches "\u00E5". By default, canonical equivalence is not considered.
Pattern.CASE_INSENSITIVE (?i) Enables case-insensitive matching. By default, case-insensitive matching applies only to the US-ASCII character set. To match Unicode characters case-insensitively, combine this flag with UNICODE_CASE.
Pattern.COMMENTS (?x) In this mode, whitespace written in the regular expression itself (spaces, tabs, carriage returns, and so on; not the "\s" escape) is ignored. A comment starts with # and runs to the end of the line. This mode can also be enabled through the embedded flag (?x).
Pattern.DOTALL (?s) In this mode, the expression '.' matches any character, including a line terminator. By default, '.' does not match line terminators.
Pattern.MULTILINE (?m) In this mode, '^' and '$' match at the start and end of each line, in addition to the start and end of the whole input. By default, these two expressions match only at the start and end of the whole input.
Pattern.UNICODE_CASE (?u) In this mode, if the CASE_INSENSITIVE flag is also enabled, case-insensitive matching is applied to Unicode characters. By default, case-insensitive matching applies only to the US-ASCII character set.
Pattern.UNIX_LINES (?d) In this mode, only '\n' is recognized as a line terminator in the behavior of '.', '^', and '$'.
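The effect of several of these flags can be seen in a minimal sketch like the one below. FlagDemo and find() are hypothetical names, and the sample inputs are made up; passing 0 means "no flags":

```java
import java.util.regex.Pattern;

public class FlagDemo {
    // Compiles the regex with the given flags and searches the input for it.
    static boolean find(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String twoLines = "Hello\nWorld";
        System.out.println(find("java", 0, "JAVA"));                           // false
        System.out.println(find("java", Pattern.CASE_INSENSITIVE, "JAVA"));    // true
        System.out.println(find("(?i)java", 0, "JAVA"));                       // true: embedded flag
        System.out.println(find("^World", 0, twoLines));                       // false
        System.out.println(find("^World", Pattern.MULTILINE, twoLines));       // true
        System.out.println(find("Hello.World", 0, twoLines));                  // false
        System.out.println(find("Hello.World", Pattern.DOTALL, twoLines));     // true
        // Whitespace and the trailing comment inside the pattern are ignored.
        System.out.println(find("J a v a  # comment", Pattern.COMMENTS, "Java")); // true
        // Non-ASCII letters need UNICODE_CASE in addition to CASE_INSENSITIVE.
        int both = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        System.out.println(find("\u00e9", Pattern.CASE_INSENSITIVE, "\u00c9")); // false
        System.out.println(find("\u00e9", both, "\u00c9"));                     // true
    }
}
```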
Leaving the abstract concepts aside, let's write a few simple use cases for Java regular expressions:
◆ For example, verifying what a string contains
// Match any string that starts with "Java" and ends with anything
Pattern pattern = Pattern.compile("^Java.*");
Matcher matcher = pattern.matcher("Java is not a person");
boolean b = matcher.matches();
// Prints true if the condition is met; otherwise false
System.out.println(b);
◆ Splitting a string on several delimiters
// Split on any run of commas, spaces, or vertical bars
Pattern pattern = Pattern.compile("[, |]+");
String[] strs = pattern.split("Java Hello World Java,Hello,World|Sun");
for (int i = 0; i < strs.length; i++) {
    System.out.println(strs[i]);
}
◆ Text replacement (first occurrence only)
Pattern pattern = Pattern.compile("regular expression");
Matcher matcher = pattern.matcher("regular expression Hello World, regular expression Hello World");
// Replace the first piece of text that matches the pattern
System.out.println(matcher.replaceFirst("Java"));
◆ Text replacement (all occurrences)
Pattern pattern = Pattern.compile("regular expression");
Matcher matcher = pattern.matcher("regular expression Hello World, regular expression Hello World");
// Replace every piece of text that matches the pattern
System.out.println(matcher.replaceAll("Java"));
◆ Text replacement (replacing match by match)
Pattern pattern = Pattern.compile("regular expression");
Matcher matcher = pattern.matcher("regular expression Hello World, regular expression Hello World");
StringBuffer sbr = new StringBuffer();
while (matcher.find()) {
    // Copy the text before the match, then the replacement
    matcher.appendReplacement(sbr, "Java");
}
// Copy whatever remains after the last match
matcher.appendTail(sbr);
System.out.println(sbr.toString());
◆ Verifying whether a string is an email address
String str = "ceponline@yahoo.com.cn";
Pattern pattern = Pattern.compile("[\\w\\.\\-]+@([\\w\\-]+\\.)+[\\w\\-]+", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
System.out.println(matcher.matches());
◆ Removing HTML tags
Pattern pattern = Pattern.compile("<.+?>", Pattern.DOTALL);
Matcher matcher = pattern.matcher("<a href=\"index.html\">homepage</a>");
String string = matcher.replaceAll("");
System.out.println(string);
◆ Searching HTML for a string that meets a condition
Pattern pattern = Pattern.compile("href=\"(.+?)\"");
Matcher matcher = pattern.matcher("<a href=\"index.html\">homepage</a>");
if (matcher.find()) {
    // group(1) is the text captured by the parentheses
    System.out.println(matcher.group(1));
}
◆ Extracting http:// addresses
// Extract the URL
Pattern pattern = Pattern.compile("(http://|https://){1}[\\w\\.\\-/:]+");
Matcher matcher = pattern.matcher("dsdsds<http://dsds//gfgffdfd>fdf");
StringBuffer buffer = new StringBuffer();
while (matcher.find()) {
    buffer.append(matcher.group());
    buffer.append("\r\n");
}
System.out.println(buffer.toString());
◆ Replacing the specified {} placeholders
String str = "Java's development history so far spans the years {0}-{1}";
// Each entry is a pair: the escaped placeholder pattern and its replacement
String[][] object = {new String[]{"\\{0\\}", "1995"}, new String[]{"\\{1\\}", "2007"}};
System.out.println(replace(str, object));
public static String replace(final String sourceString, Object[] object) {
    String temp = sourceString;
    for (int i = 0; i < object.length; i++) {
        String[] result = (String[]) object[i];
        Pattern pattern = Pattern.compile(result[0]);
        Matcher matcher = pattern.matcher(temp);
        temp = matcher.replaceAll(result[1]);
    }
    return temp;
}
◆ Querying files in a specified directory with a regular expression
import java.io.File;
import java.io.FileFilter;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FilesAnalyze {
    // Used to cache the file list
    private ArrayList<File> files = new ArrayList<File>();
    // Used to hold the directory path
    private String _path;
    // Used to hold the uncompiled regular expression
    private String _regexp;

    class MyFileFilter implements FileFilter {
        /**
         * Accepts a file when its name matches the regular expression
         */
        public boolean accept(File file) {
            try {
                Pattern pattern = Pattern.compile(_regexp);
                Matcher match = pattern.matcher(file.getName());
                return match.matches();
            } catch (Exception e) {
                // If the pattern cannot be used, accept the file
                return true;
            }
        }
    }

    /**
     * Collects the files in a directory whose names match a regular expression
     * @param path directory to search
     * @param regexp regular expression for the file name
     */
    FilesAnalyze(String path, String regexp) {
        getFileName(path, regexp);
    }

    /**
     * Lists the directory and adds the matching files
     * @param path directory to search
     * @param regexp regular expression for the file name
     */
    private void getFileName(String path, String regexp) {
        _path = path;
        _regexp = regexp;
        File directory = new File(_path);
        File[] filesFile = directory.listFiles(new MyFileFilter());
        if (filesFile == null) return;
        for (int j = 0; j < filesFile.length; j++) {
            files.add(filesFile[j]);
        }
    }

    /**
     * Prints the path of every collected file
     * @param out stream to print to
     */
    public void print(PrintStream out) {
        Iterator<File> elements = files.iterator();
        while (elements.hasNext()) {
            File file = elements.next();
            out.println(file.getPath());
        }
    }

    public static void output(String path, String regexp) {
        FilesAnalyze fileGroup1 = new FilesAnalyze(path, regexp);
        fileGroup1.print(System.out);
    }

    public static void main(String[] args) {
        // Inside a character class, | and . are literal characters
        output("C:\\", "[A-z|.]*");
    }
}
Java regular expressions have many other uses. In fact, for almost any character-processing task, there is little that regular expressions cannot do. (Of course, interpreting a regular expression takes time...)
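One common way to limit that cost is to compile a pattern once and reuse it, instead of recompiling it on every call. The following is only a sketch under that assumption; the class name PrecompiledDemo, the constant SEPARATORS, and the method countTokens() are invented for illustration:

```java
import java.util.regex.Pattern;

public class PrecompiledDemo {
    // Compiled once when the class is loaded, then reused for every call.
    private static final Pattern SEPARATORS = Pattern.compile("[, |]+");

    // Counts the pieces obtained by splitting on commas, spaces and vertical bars.
    static int countTokens(String line) {
        return SEPARATORS.split(line).length;
    }

    public static void main(String[] args) {
        System.out.println(countTokens("Java,Hello|World")); // 3
        System.out.println(countTokens("a, b"));             // 2
    }
}
```

By contrast, convenience methods such as String.split(String) or String.matches(String) take the regular expression as text on each call.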

**************************************** **************************************** **************************************** **********

Regular expressions can be used directly on a String, but behind the scenes the work is done by java.util.regex.Pattern and java.util.regex.Matcher. When you call the matches() method of String, it actually calls the static matches() method of Pattern. This method returns a boolean value indicating whether the String conforms to the regular expression.
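A minimal sketch of that equivalence (StringMatchesDemo and its two helper methods are hypothetical names): both calls give the same answer, and both require the whole string to match, not just a part of it.

```java
import java.util.regex.Pattern;

public class StringMatchesDemo {
    // Uses the convenience method on String.
    static boolean viaString(String input, String regex) {
        return input.matches(regex);
    }

    // Uses the static method on Pattern; note the reversed argument order.
    static boolean viaPattern(String input, String regex) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String regex = "0939-\\d{6}";
        System.out.println(viaString("0939-100391", regex));      // true
        System.out.println(viaPattern("0939-100391", regex));     // true
        // Extra text around the number makes the whole-string match fail.
        System.out.println(viaString("tel 0939-100391", regex));  // false
        System.out.println(viaPattern("tel 0939-100391", regex)); // false
    }
}
```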
If you want to treat a regular expression as an object that can be reused, compile it with the static Pattern method compile(). The compile() method returns a Pattern instance that represents the regular expression. You can then call the matcher() method of the Pattern instance to obtain a Matcher instance, which represents the matching of that regular expression against a particular input. The Matcher instance provides methods for finding the parts of the input that satisfy the regular expression. Example 6.11 demonstrates this.
** Example 6.11 UsePatternMatcher.java
import java.util.regex.*;

public class UsePatternMatcher {
    public static void main(String[] args) {
        String phones1 =
            "Justin's mobile phone number: 0939-100391\n" +
            "Momor mobile phone number: 0939-666888\n";
        // Compile once; the same Pattern is reused for both inputs
        Pattern pattern = Pattern.compile(".*0939-\\d{6}");
        Matcher matcher = pattern.matcher(phones1);
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
        String phones2 =
            "Mobile phone number of caterpillar: 0952-600391\n" +
            "Bush's mobile phone number: 0939-550391";
        matcher = pattern.matcher(phones2);
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}
Example 6.11 looks for numbers starting with "0939". When there is more than one source to search (such as phones1 and phones2), you can compile the regular expression once into a Pattern object and reuse that object, calling matcher() to obtain a Matcher instance for each input. The find() method reports whether another matching string exists, and the group() method returns the string that was matched. The execution result of the program is as follows:
Justin's mobile phone number: 0939-100391
Momor mobile phone number: 0939-666888
Bush's mobile phone number: 0939-550391
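If only the numbers themselves are wanted rather than the whole line, the leading ".*" can be dropped from the pattern. The following sketch makes that change and collects the matches into a list; the class PhoneNumbers, the constant NUMBER, and the method extract() are assumptions for illustration and are not part of Example 6.11:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhoneNumbers {
    // Matches only the number itself: "0939-" followed by six digits.
    private static final Pattern NUMBER = Pattern.compile("0939-\\d{6}");

    // Returns every 0939 number found in the text, in order of appearance.
    static List<String> extract(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = NUMBER.matcher(text);
        while (matcher.find()) {        // advance to the next match, if any
            result.add(matcher.group()); // the text of the current match
        }
        return result;
    }

    public static void main(String[] args) {
        String phones =
            "Justin's mobile phone number: 0939-100391\n" +
            "Momor mobile phone number: 0939-666888\n" +
            "Mobile phone number of caterpillar: 0952-600391\n";
        System.out.println(extract(phones)); // [0939-100391, 0939-666888]
    }
}
```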
The following uses Pattern and Matcher to rewrite example 6.9, so that the program returns the strings that match the regular expression rather than the strings that do not match it.
** Example 6.12 RegularExpressionDemo2.java
import java.util.regex.*;

public class RegularExpressionDemo2 {
    public static void main(String[] args) {
        String text = "abcdebcadxbc";
        // Any single character followed by "bc"
        Pattern pattern = Pattern.compile(".bc");
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
        System.out.println();
    }
}
Execution result:
abc
ebc
xbc

 
