Matching multi-line strings with Java regular expressions
By default, the dot (.) matches any character except a line terminator such as \n. If the string to be matched contains carriage returns or line feeds (that is, it spans several lines), the match stops at the first line break, so text that contains line breaks is not matched correctly. There are two ways to solve this:
1. Use Pattern and Matcher objects
Set the Pattern mode to Pattern.DOTALL.
2. Use String.replaceAll()
Enable the same mode inside the regular expression with the embedded flag (?s):
String reg = "(?s)'.*'";
The following example uses both approaches to replace text that contains line breaks.
static String teststr = "UAPPROJECT_ID = '402894cb4833decf014833e04fd70002;\n\r */'select";

/** Handles text containing line breaks with Pattern and Matcher. */
public void testa() {
    Pattern wp = Pattern.compile("'.*?'", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    Matcher m = wp.matcher(teststr);
    String result = m.replaceAll("");
    System.out.println("result:" + result);
}

/** Handles text containing line breaks with String.replaceAll and the (?s) flag. */
public void testb() {
    String result = teststr.replaceAll("(?s)'.*?'", "");
    System.out.println("result:" + result);
}
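To make the effect of DOTALL visible, here is a minimal sketch that runs the same pattern with and without the flag. The class name DotAllDemo, the helper stripQuoted, and the sample text are made up for this illustration; only Pattern and its DOTALL constant come from java.util.regex.

```java
import java.util.regex.Pattern;

class DotAllDemo {
    // Hypothetical helper: removes every single-quoted span, optionally in DOTALL mode.
    static String stripQuoted(String input, boolean dotAll) {
        int flags = dotAll ? Pattern.DOTALL : 0;
        return Pattern.compile("'.*?'", flags).matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String text = "id = 'abc;\n\r */' select";
        // Without DOTALL the dot cannot cross the line break, so nothing is removed.
        System.out.println(stripQuoted(text, false).equals(text));
        // With DOTALL the quoted span is matched across the line break and removed.
        System.out.println(stripQuoted(text, true));
    }
}
```

Without the flag the input comes back unchanged, because no quoted span can be completed on a single line.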
Reference:
Java Regular Expression Functions and Applications (reprinted)
A regular expression is a pattern that describes a class of strings. It is made up of ordinary characters and metacharacters. Ordinary characters include uppercase and lowercase letters and digits, while metacharacters have special meanings. Regular expressions mean essentially the same thing on the .NET platform and on the Java platform. The rest of this article looks at what Java regular expressions can do and how to apply them, and is offered for reference.
Since JDK 1.4 introduced the java.util.regex package, Java has had a solid platform for working with regular expressions. Because Java regular expressions form a fairly large system, only the basic constructs are listed here.
\\  backslash
\t  tab ('\u0009')
\n  line feed ('\u000A')
\r  carriage return ('\u000D')
\d  a digit; equivalent to [0-9]
\D  a non-digit; equivalent to [^0-9]
\s  a whitespace character: [ \t\n\x0B\f\r]
\S  a non-whitespace character: [^ \t\n\x0B\f\r]
\w  a word character: [a-zA-Z_0-9]
\W  a non-word character: [^a-zA-Z_0-9]
\f  form feed
\e  escape
\b  a word boundary
\B  a non-word boundary
\G  the end of the previous match
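A few of the escapes above can be tried out with a short sketch. The class EscapeDemo and its helper names are made up for this illustration. Note that inside a Java string literal every backslash of the regular expression has to be doubled.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // True when the whole input consists of digits (\d).
    static boolean isNumber(String input) {
        return Pattern.matches("\\d+", input);
    }

    // Collapses every run of whitespace (\s) into a single space.
    static String collapseWhitespace(String input) {
        return input.replaceAll("\\s+", " ");
    }

    // Replaces "cat" only where it stands as a whole word (\b on both sides).
    static String replaceWholeWord(String input) {
        return input.replaceAll("\\bcat\\b", "dog");
    }

    public static void main(String[] args) {
        System.out.println(isNumber("2007"));
        System.out.println(isNumber("20a7"));
        System.out.println(collapseWhitespace("a \t\n b"));
        System.out.println(replaceWholeWord("cat concat cat"));
    }
}
```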
^  anchors the match at the start of the input
^Java  the input must start with "Java"
$  anchors the match at the end of the input
Java$  the input must end with "Java"
.  matches any single character except \n
Java..  "Java" followed by any two characters other than line breaks
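The anchors and the dot can be checked with a small sketch like the following. AnchorDemo and found are hypothetical names; found simply reports whether Matcher.find() locates the pattern anywhere in the input.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // True when the pattern is found somewhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("^Java", "Java rocks"));   // starts with Java
        System.out.println(found("^Java", "I like Java"));  // does not start with Java
        System.out.println(found("Java$", "I like Java"));  // ends with Java
        System.out.println(found("Java..", "Java 8"));      // two characters follow
        System.out.println(found("Java..", "Java\n8"));     // the dot rejects the line break
    }
}
```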
Restricting a position to a set of characters: "[]"
[a-z]  one character in the lowercase range a to z
[A-Z]  one character in the uppercase range A to Z
[a-zA-Z]  one character in the range a to z or A to Z
[0-9]  one digit in the range 0 to 9
[0-9a-z]  one character in the range 0 to 9 or a to z
[0-9[a-z]]  one character in the range 0 to 9 or a to z (union)
Negating the set by adding ^ inside the brackets: "[^]"
[^a-z]  one character that is not in the lowercase range a to z
[^A-Z]  one character that is not in the uppercase range A to Z
[^a-zA-Z]  one character that is in neither a to z nor A to Z
[^0-9]  one character that is not a digit 0 to 9
[^0-9a-z]  one character that is in neither 0 to 9 nor a to z
[^0-9[a-z]]  intended as one character that is in neither 0 to 9 nor a to z (nested form; [^0-9a-z] is the simpler spelling)
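The character classes above can be exercised as follows. This is only a sketch; CharClassDemo and its method names are invented for the example.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Removes every character that is neither a digit nor a lowercase letter.
    static String keepDigitsAndLowercase(String input) {
        return input.replaceAll("[^0-9a-z]", "");
    }

    // Removes every ASCII letter.
    static String dropLetters(String input) {
        return input.replaceAll("[a-zA-Z]", "");
    }

    // Nested class: the union of 0-9 and a-z.
    static boolean inUnion(String oneChar) {
        return Pattern.matches("[0-9[a-z]]", oneChar);
    }

    public static void main(String[] args) {
        System.out.println(keepDigitsAndLowercase("a1B2-c3"));
        System.out.println(dropLetters("a1B2-c3"));
        System.out.println(inUnion("7") + " " + inUnion("q") + " " + inUnion("Q"));
    }
}
```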
When a character may occur zero or more times, use "*"
J*  zero or more J
.*  zero or more arbitrary characters
J.*D  zero or more arbitrary characters between J and D
When a character must occur one or more times, use "+"
J+  one or more J
.+  one or more arbitrary characters
J.+D  one or more arbitrary characters between J and D
When a character may occur zero times or once, use "?"
JA?  J or JA
To require an exact number of consecutive occurrences, use "{}"
J{2}  JJ
J{3}  JJJ
At least a occurrences: "{a,}"
J{3,}  JJJ, JJJJ, JJJJJ, ... (three or more consecutive J)
At least a and at most b occurrences: "{a,b}"
J{3,5}  JJJ, JJJJ, or JJJJJ
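The quantifiers can be verified against whole strings with Pattern.matches, as in this small sketch. QuantifierDemo and whole are hypothetical names used only here.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // True when the entire input matches the pattern.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(whole("J*", ""));          // zero J is allowed
        System.out.println(whole("J+", ""));          // at least one J is required
        System.out.println(whole("JA?", "J"));        // the A is optional
        System.out.println(whole("JA?", "JAA"));      // but at most one A
        System.out.println(whole("J{3,5}", "JJJJ"));  // between three and five J
        System.out.println(whole("J{3,5}", "JJ"));    // too few
        System.out.println(whole("J.*D", "JD"));      // nothing between J and D is fine
        System.out.println(whole("J.+D", "JD"));      // something is required in between
    }
}
```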
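Either of two alternatives: "|"
J|A  J or A
Java|Hello  Java or Hello
Grouping part of the pattern: "()"
For example, to capture the text inside a link whose label is "index", place (.+?) between the patterns for the opening and closing tags; the captured text is then available as group 1.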
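A minimal sketch of alternation and a capturing group follows. The pattern <a[^>]*>(.+?)</a> and the sample markup are assumptions chosen for this illustration, not taken from the original article, and GroupDemo with its methods is a hypothetical helper class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns the text captured by group 1 (the link label), or null if no link is found.
    static String linkText(String html) {
        Matcher m = Pattern.compile("<a[^>]*>(.+?)</a>").matcher(html);
        return m.find() ? m.group(1) : null;
    }

    // True when the whole input is either "Java" or "Hello".
    static boolean javaOrHello(String input) {
        return Pattern.matches("Java|Hello", input);
    }

    public static void main(String[] args) {
        System.out.println(linkText("<a href=\"index.html\">index</a>"));
        System.out.println(javaOrHello("Hello"));
        System.out.println(javaOrHello("World"));
    }
}
```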
When calling Pattern.compile, you can pass flags that control how a Java regular expression matches:
Pattern.compile(String regex, int flags)
The flags argument accepts the following values:
Pattern.CANON_EQ
Two characters match only if their full canonical decompositions are identical. With this flag, for example, the expression "a\u030A" matches "\u00E5" (å). By default, canonical equivalence is not taken into account.
Pattern.CASE_INSENSITIVE (?i)
This flag makes matching ignore case. By default, case-insensitive matching applies only to the US-ASCII character set. To ignore case for Unicode characters as well, combine this flag with UNICODE_CASE.
Pattern.COMMENTS (?x)
In this mode, whitespace in the pattern (spaces, tabs, line breaks) is ignored during matching; this refers to literal whitespace in the expression, not to "\s". A comment starts with # and runs to the end of the line. This mode can also be enabled through the embedded flag (?x).
Pattern.DOTALL (?s)
In this mode, the expression '.' matches any character, including a line terminator. By default, '.' does not match line terminators.
Pattern.MULTILINE (?m)
In this mode, '^' and '$' match at the start and end of each line. '^' still matches at the start of the input, and '$' still matches at the end of the input. By default, these two expressions match only at the start and end of the whole input.
Pattern.UNICODE_CASE (?u)
In this mode, if CASE_INSENSITIVE is also enabled, case-insensitive matching is applied to Unicode characters. By default, case-insensitive matching applies only to the US-ASCII character set.
Pattern.UNIX_LINES (?d)
In this mode, only '\n' is recognized as a line terminator in the behavior of '.', '^', and '$'.
Leaving the abstract concepts aside, here are a few simple Java regular expression use cases:
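As a first illustration, the sketch below compares some of the flags described above on one small input. The class name FlagDemo, the helper countMatches, and the sample text are made up for this example; only Pattern, Matcher, and the flag constants come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagDemo {
    // Counts how many times the pattern is found in the input under the given flags.
    static int countMatches(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "java one\njava two\nJAVA three";
        // Default: ^ matches only at the very start of the input.
        System.out.println(countMatches("^java", 0, text));
        // MULTILINE: ^ also matches after each line break.
        System.out.println(countMatches("^java", Pattern.MULTILINE, text));
        // MULTILINE combined with CASE_INSENSITIVE also finds "JAVA".
        System.out.println(countMatches("^java", Pattern.MULTILINE | Pattern.CASE_INSENSITIVE, text));
        // The same two flags written as embedded flags.
        System.out.println(countMatches("(?im)^java", 0, text));
        // COMMENTS: whitespace and the # comment inside the pattern are ignored.
        System.out.println(countMatches("^ java  # leading keyword", Pattern.COMMENTS, text));
    }
}
```

Flags are combined with the bitwise OR operator, and the embedded form such as (?im) is handy when only a string can be passed, for example to String.replaceAll.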
◆ Checking whether a string matches a pattern
// Match a string that starts with "Java" and ends with any characters
Pattern pattern = Pattern.compile("^Java.*");
Matcher matcher = pattern.matcher("Java is not a person");
boolean b = matcher.matches(); // true if the whole input matches, otherwise false
System.out.println(b);
◆ Splitting a string on several delimiters
Pattern pattern = Pattern.compile("[, |]+");
String[] strs = pattern.split("Java Hello World Java, Hello, World | Sun");
for (int i = 0; i < strs.length; i++) {
    System.out.println(strs[i]);
}
◆ Text replacement (first occurrence only)
Pattern pattern = Pattern.compile("regular expression");
Matcher matcher = pattern.matcher("regular expression Hello World, regular expression Hello World");
// Replace only the first match
System.out.println(matcher.replaceFirst("Java"));
◆ Text replacement (all occurrences)
Pattern pattern = Pattern.compile("regular expression");
Matcher matcher = pattern.matcher("regular expression Hello World, regular expression Hello World");
// Replace every match
System.out.println(matcher.replaceAll("Java"));
◆ Text replacement (replacing match by match)
Pattern pattern = Pattern.compile("regular expression");
Matcher matcher = pattern.matcher("regular expression Hello World, regular expression Hello World");
StringBuffer sbr = new StringBuffer();
while (matcher.find()) {
    // Copy the text before the match, then the replacement
    matcher.appendReplacement(sbr, "Java");
}
// Copy whatever follows the last match
matcher.appendTail(sbr);
System.out.println(sbr.toString());
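The appendReplacement loop becomes useful when the replacement depends on each match. The following sketch upper-cases every match; AppendReplacementDemo and upperCaseMatches are hypothetical names. Matcher.quoteReplacement is used so that any '$' or '\' in the computed text is treated literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendReplacementDemo {
    // Replaces every match of the pattern with its upper-case form.
    static String upperCaseMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String replacement = m.group().toUpperCase();
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb); // copy the text after the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(upperCaseMatches("java", "java and javascript"));
    }
}
```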
◆ Checking whether a string is an email address
String str = "ceponline@yahoo.com.cn";
Pattern pattern = Pattern.compile("[\\w\\.\\-]+@([\\w\\-]+\\.)+[\\w\\-]+", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
System.out.println(matcher.matches());
◆ Removing HTML tags
Pattern pattern = Pattern.compile("<.+?>", Pattern.DOTALL);
// Sample input: an illustrative anchor tag around the text "Homepage"
Matcher matcher = pattern.matcher("<a href=\"index.html\">Homepage</a>");
String string = matcher.replaceAll("");
System.out.println(string);
◆ Finding the value of an attribute in HTML
Pattern pattern = Pattern.compile("href=\"(.+?)\"");
// Sample input: an illustrative anchor tag around the text "Homepage"
Matcher matcher = pattern.matcher("<a href=\"index.html\">Homepage</a>");
if (matcher.find()) {
    // Group 1 holds the text captured between the quotes
    System.out.println(matcher.group(1));
}
◆ Extracting http:// addresses
// Extract the URLs
Pattern pattern = Pattern.compile("(http://|https://){1}[\\w\\.\\-/:]+");
// Sample input: the URL in this string is illustrative
Matcher matcher = pattern.matcher("dsdsds <http://example.com/index.html> fdf");
StringBuffer buffer = new StringBuffer();
while (matcher.find()) {
    buffer.append(matcher.group());
    buffer.append("\r\n");
}
System.out.println(buffer.toString());
◆ Replacing {} placeholders in text
String str = "Java's history so far spans the years {0}-{1}";
String[][] object = {new String[]{"\\{0\\}", "1995"}, new String[]{"\\{1\\}", "2007"}};
System.out.println(replace(str, object));

public static String replace(final String sourceString, Object[] object) {
    String temp = sourceString;
    for (int i = 0; i < object.length; i++) {
        // Each entry is a {pattern, replacement} pair
        String[] result = (String[]) object[i];
        Pattern pattern = Pattern.compile(result[0]);
        Matcher matcher = pattern.matcher(temp);
        temp = matcher.replaceAll(result[1]);
    }
    return temp;
}
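The manual escaping of the braces can be avoided with Pattern.quote, which turns an arbitrary string into a literal pattern. The sketch below is one possible variant, not the article's method; PlaceholderDemo and fill are hypothetical names, and the placeholders are assumed to be numbered {0}, {1}, and so on.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderDemo {
    // Replaces {0}, {1}, ... in the template with the corresponding values.
    static String fill(String template, String... values) {
        String result = template;
        for (int i = 0; i < values.length; i++) {
            // Pattern.quote makes the braces literal, so no hand-written escapes are needed.
            String placeholder = Pattern.quote("{" + i + "}");
            // quoteReplacement keeps '$' and '\' in the value from being interpreted.
            result = result.replaceAll(placeholder, Matcher.quoteReplacement(values[i]));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fill("Java's history so far spans the years {0}-{1}", "1995", "2007"));
    }
}
```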
◆ Listing the files in a directory whose names match a regular expression
import java.io.File;
import java.io.FileFilter;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FilesAnalyze {
    // Caches the list of matching files
    private ArrayList<File> files = new ArrayList<File>();
    // Holds the directory path
    private String _path;
    // Holds the regular expression (not yet compiled)
    private String _regexp;

    class MyFileFilter implements FileFilter {
        /**
         * Accepts files whose names match the regular expression.
         */
        public boolean accept(File file) {
            try {
                Pattern pattern = Pattern.compile(_regexp);
                Matcher match = pattern.matcher(file.getName());
                return match.matches();
            } catch (Exception e) {
                // If the pattern cannot be used, every file is accepted
                return true;
            }
        }
    }

    /**
     * Collects the files under the given directory that match the pattern.
     * @param path directory to scan
     * @param regexp regular expression applied to each file name
     */
    FilesAnalyze(String path, String regexp) {
        getFileName(path, regexp);
    }

    /**
     * Scans the directory and adds the matching files to the list.
     * @param path directory to scan
     * @param regexp regular expression applied to each file name
     */
    private void getFileName(String path, String regexp) {
        _path = path;
        _regexp = regexp;
        File directory = new File(_path);
        File[] filesFile = directory.listFiles(new MyFileFilter());
        if (filesFile == null) return;
        for (int j = 0; j < filesFile.length; j++) {
            files.add(filesFile[j]);
        }
    }

    /**
     * Prints the path of every collected file.
     * @param out stream to print to
     */
    public void print(PrintStream out) {
        Iterator<File> elements = files.iterator();
        while (elements.hasNext()) {
            File file = elements.next();
            out.println(file.getPath());
        }
    }

    public static void output(String path, String regexp) {
        FilesAnalyze fileGroup1 = new FilesAnalyze(path, regexp);
        fileGroup1.print(System.out);
    }

    public static void main(String[] args) {
        // Inside [...] the '|' and '.' are literal characters
        output("C:\\", "[A-z|.]*");
    }
}
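A more compact variant of the same idea is sketched below, assuming Java 8 or later. It compiles the pattern once and passes a lambda as the FileFilter. RegexFileLister and matchingNames are hypothetical names, and the result is sorted only to make the output order predictable.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

class RegexFileLister {
    // Returns the sorted names of the entries in the directory whose names match the regex.
    static List<String> matchingNames(File directory, String regex) {
        Pattern pattern = Pattern.compile(regex); // compiled once, reused for every file
        File[] found = directory.listFiles(f -> pattern.matcher(f.getName()).matches());
        List<String> names = new ArrayList<>();
        if (found != null) { // listFiles returns null if the path is not a readable directory
            for (File f : found) {
                names.add(f.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // Lists the .txt files of the current working directory.
        System.out.println(matchingNames(new File("."), ".*\\.txt"));
    }
}
```

Unlike the class above, an invalid pattern here raises a PatternSyntaxException immediately instead of silently accepting every file.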
Java regular expressions can do much more than this. In practice, almost any character-processing task can be handled with a regular expression.