Java regular expression matches multiple lines

Source: Internet
Author: User


By default, the "." in ".*" matches any character except \n. If the string to be matched contains carriage returns or newlines (that is, it spans multiple lines), the match stops at the first line break, so a pattern that should cover the multi-line text does not match correctly. There are two workarounds:

1. Using Pattern and Matcher objects

Set the pattern mode to Pattern.DOTALL.

2. Using String.replaceAll()

Write the regular expression with the embedded flag:

String reg = "(?s)'.*'";

The following example replaces text that contains a carriage return and a newline, once with each approach.

static String testStr = "uapproject_id= ' 402894cb4833decf014833e04fd70002; \n\r */' SELECT ';";

/**
 * Handles text containing carriage return and newline characters
 * using Pattern and Matcher
 */
public void testA() {
    Pattern wp = Pattern.compile("'.*?'", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    Matcher m = wp.matcher(testStr);
    String result = m.replaceAll("");
    System.out.println("result:" + result);
}

/**
 * Handles text containing carriage return and newline characters
 * using String.replaceAll()
 */
public void testB() {
    String result = testStr.replaceAll("(?s)'.*?'", "");
    System.out.println("result:" + result);
}
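The difference between the default mode and DOTALL can be checked with a small self-contained sketch. The class name DotAllDemo, its method names, and the sample text are assumptions made for illustration; only Pattern.DOTALL, the (?s) flag, and replaceAll come from the article.

```java
import java.util.regex.Pattern;

public class DotAllDemo {
    // Default mode: '.' does not match line terminators, so a quoted
    // span that crosses a line break is not found and nothing is removed.
    static String stripDefault(String input) {
        return Pattern.compile("'.*?'").matcher(input).replaceAll("");
    }

    // Same pattern compiled with Pattern.DOTALL: '.' also matches \n and \r.
    static String stripDotAll(String input) {
        return Pattern.compile("'.*?'", Pattern.DOTALL).matcher(input).replaceAll("");
    }

    // Same behavior as stripDotAll, using the embedded (?s) flag.
    static String stripEmbedded(String input) {
        return input.replaceAll("(?s)'.*?'", "");
    }

    public static void main(String[] args) {
        String text = "id='line one\nline two' end";
        System.out.println(stripDefault(text));   // text is printed unchanged
        System.out.println(stripDotAll(text));    // prints "id= end"
        System.out.println(stripEmbedded(text));  // prints "id= end"
    }
}
```

Both workarounds give the same result; the embedded flag is convenient when only a String method such as replaceAll is available.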






Reference:

Java regular expression functions and applications

Since JDK 1.4 introduced the java.util.regex package, Java has offered a solid platform for working with regular expressions, which form a fairly complex system.

A regular expression is a formula that uses a pattern to match a class of strings. It consists of ordinary characters and metacharacters. Ordinary characters include uppercase and lowercase letters and digits, while metacharacters have special meanings. These meanings are largely the same on the .NET platform and the Java platform. The rest of this article covers what Java regular expressions can do and how to apply them, and is intended for reference only.
\\ backslash
\t tab ('\u0009')
\n newline ('\u000A')
\r carriage return ('\u000D')
\d a digit, equivalent to [0-9]
\D a non-digit, equivalent to [^0-9]
\s a whitespace character [ \t\n\x0B\f\r]
\S a non-whitespace character [^ \t\n\x0B\f\r]
\w a word character [a-zA-Z_0-9]
\W a non-word character [^a-zA-Z_0-9]
\f form feed
\e escape
\b a word boundary
\B a non-word boundary
\G the end of the previous match
^ anchors the match at the start
^java the text must start with "java"
$ anchors the match at the end
java$ the text must end with "java"
. any single character except a line terminator such as \n
java.. "java" followed by any two characters other than line breaks
Add a specific restriction with "[]"
[a-z] one character in the lowercase range a to z
[A-Z] one character in the uppercase range A to Z
[a-zA-Z] one character in the range a to z or A to Z
[0-9] one digit in the range 0 to 9
[0-9a-z] one character in the range 0 to 9 or a to z
[0-9[a-z]] one character in the range 0 to 9 or a to z (union; intersection is written with &&)
Add ^ inside [] to negate the restriction: "[^]"
[^a-z] one character outside the lowercase range a to z
[^A-Z] one character outside the uppercase range A to Z
[^a-zA-Z] one character outside both a to z and A to Z
[^0-9] one character outside the range 0 to 9
[^0-9a-z] one character outside both 0 to 9 and a to z
[^0-9[a-z]] a negated class with a nested class; how the negation combines with the nested class has differed between Java versions, so [^0-9a-z] is the safer way to write "neither 0 to 9 nor a to z"
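The union and intersection forms of character classes can be tried out with a short sketch. The class name CharClassDemo and the helper accepts are hypothetical names chosen for this illustration; Pattern.matches is the standard java.util.regex call.

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // True when the input is exactly one character accepted by the class.
    static boolean accepts(String charClass, String ch) {
        return Pattern.matches(charClass, ch);
    }

    public static void main(String[] args) {
        // Union: a nested class adds its characters to the outer class.
        System.out.println(accepts("[0-9[a-z]]", "7"));
        System.out.println(accepts("[0-9[a-z]]", "q"));
        // Intersection is written with &&: lowercase letters that are not vowels.
        System.out.println(accepts("[a-z&&[^aeiou]]", "b"));
        System.out.println(accepts("[a-z&&[^aeiou]]", "e"));
        // Plain negation: anything that is neither a digit nor a lowercase letter.
        System.out.println(accepts("[^0-9a-z]", "Q"));
    }
}
```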
Use "*" when a character may occur zero or more times
J* zero or more J
.* zero or more of any character
J.*D zero or more characters between J and D
Use "+" when a character must occur one or more times
J+ one or more J
.+ one or more of any character
J.+D one or more characters between J and D
Use "?" when a character may occur zero times or once
JA? J or JA
Use "{a}" to require exactly the specified number of consecutive occurrences
J{2} JJ
J{3} JJJ
Use "{a,}" to require a or more occurrences
J{3,} JJJ, JJJJ, JJJJJ, ... (three or more consecutive J)
Use "{a,b}" to require at least a and at most b occurrences
J{3,5} JJJ, JJJJ, or JJJJJ
Use "|" to choose between two alternatives
J|A J or A
Java|Hello Java or Hello
Use "()" to specify a group
For example, to extract the text between <a href="index.html"> and </a> in <a href="index.html">index</a>, you can write <a.*href=\".*\">(.+?)</a>
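A capturing group combined with a reluctant quantifier can be sketched as follows. The class LinkTextDemo, the helper capture, and the sample HTML are assumptions for illustration, and the pattern is a simplified variant of the one in the article (it uses [^"]* for the attribute value instead of .*).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkTextDemo {
    // Collects group(1) of every match of regex found in html.
    static List<String> capture(String regex, String html) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(html);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<a href=\"a.html\">A</a><a href=\"b.html\">B</a>";
        // Reluctant (.+?) stops at the first </a>, so each link is captured separately.
        System.out.println(capture("<a href=\"[^\"]*\">(.+?)</a>", html));
        // Greedy (.+) runs to the last </a>, so both links end up in a single capture.
        System.out.println(capture("<a href=\"[^\"]*\">(.+)</a>", html));
    }
}
```

The reluctant form is usually what is wanted when the same delimiter can appear several times in the input.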
When calling Pattern.compile, you can pass flags that control how the regular expression is matched:
Pattern Pattern.compile(String regex, int flags)
The flags can take the following values:
Pattern.CANON_EQ
Two characters are considered to match only if their full canonical decompositions are identical. For example, with this flag the expression "a\u030A" matches "\u00E5". By default, canonical equivalence is not taken into account.
Pattern.CASE_INSENSITIVE (?i)

This flag makes the expression match regardless of case. By default, case-insensitive matching applies only to the US-ASCII character set. For Unicode-aware case-insensitive matching, combine UNICODE_CASE with this flag.

Pattern.COMMENTS (?x)
In this mode, whitespace in the regular expression is ignored (translator's note: this means literal spaces, tabs, and line breaks in the expression, not "\\s"). Comments start with # and run to the end of the line. This mode can also be enabled with the embedded flag.
Pattern.DOTALL (?s)
In this mode, the expression '.' matches any character, including line terminators. By default, '.' does not match line terminators.
Pattern.MULTILINE (?m)
In this mode, '^' and '$' match at the start and end of each line. '^' still matches at the beginning of the string and '$' still matches at the end of the string. By default, these two expressions match only at the beginning and end of the whole string.
Pattern.UNICODE_CASE (?u)
In this mode, if CASE_INSENSITIVE is also enabled, case-insensitive matching covers Unicode characters. By default, case-insensitive matching applies only to the US-ASCII character set.
Pattern.UNIX_LINES (?d)
In this mode, only '\n' is recognized as a line terminator in the behavior of '.', '^', and '$'.

Setting the theory aside, here are a few simple use cases for Java regular expressions:
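As a first, flag-focused case, the sketch below counts matches of the same expression under different flag combinations. The class FlagDemo, the helper count, and the sample text are assumptions for illustration; the flags themselves are the standard Pattern constants.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlagDemo {
    // Counts how many times the pattern is found in the input.
    static int count(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "java one\nJava two\nJAVA three";
        // No flags: ^ matches only at the start of the whole string.
        System.out.println(count(Pattern.compile("^java"), text));
        // MULTILINE plus CASE_INSENSITIVE: ^ matches at every line start, case ignored.
        int flags = Pattern.MULTILINE | Pattern.CASE_INSENSITIVE;
        System.out.println(count(Pattern.compile("^java", flags), text));
        // The same flags written as embedded flags inside the expression.
        System.out.println(count(Pattern.compile("(?mi)^java"), text));
        // COMMENTS: literal whitespace and the trailing # comment are ignored.
        System.out.println(count(Pattern.compile("^ java  # word at line start",
                flags | Pattern.COMMENTS), text));
    }
}
```

Flags are combined with the bitwise OR operator, and the embedded form gives the same result without a second argument to Pattern.compile.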
Checking whether a string matches

// Check for a string that starts with "Java" and ends with anything
Pattern pattern = Pattern.compile("^Java.*");
Matcher matcher = pattern.matcher("Java is not human");
boolean b = matcher.matches(); // true when the whole input matches, otherwise false
System.out.println(b);
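Because matches() succeeds only when the entire input fits the pattern, it can help to compare it with lookingAt() and find(). The sketch below does that; the class MatchModesDemo and its three helper names are hypothetical and exist only for this illustration.

```java
import java.util.regex.Pattern;

public class MatchModesDemo {
    // The whole input must match the pattern.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // The pattern must match at the start of the input, but not necessarily all of it.
    static boolean prefixMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    // The pattern may match anywhere in the input.
    static boolean anywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "Java is not human";
        System.out.println(wholeMatch("Java", input));    // false: trailing text is not covered
        System.out.println(prefixMatch("Java", input));   // true: input starts with "Java"
        System.out.println(anywhere("human", input));     // true: found at the end
        System.out.println(wholeMatch("^Java.*", input)); // true: .* covers the rest
    }
}
```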

Splitting a string on several delimiters

// Split on runs of commas, spaces, and vertical bars
Pattern pattern = Pattern.compile("[, |]+");
String[] strs = pattern.split("Java Hello World Java,Hello,,World|Sun");
for (int i = 0; i < strs.length; i++) {
    System.out.println(strs[i]);
}

Text replacement (first occurrence only)

Pattern pattern = Pattern.compile("regular expression");
Matcher matcher = pattern.matcher("regular expression Hello World, regular expression Hello World");
// Replace only the first text that matches the pattern
System.out.println(matcher.replaceFirst("Java"));

Text replacement (all occurrences)

Pattern pattern = Pattern.compile("regular expression");
Matcher matcher = pattern.matcher("regular expression Hello World, regular expression Hello World");
// Replace every text that matches the pattern
System.out.println(matcher.replaceAll("Java"));

Text replacement (match by match)

Pattern pattern = Pattern.compile("regular expression");
Matcher matcher = pattern.matcher("regular expression Hello World, regular expression Hello World");
StringBuffer sbr = new StringBuffer();
// Append the text before each match followed by the replacement
while (matcher.find()) {
    matcher.appendReplacement(sbr, "Java");
}
// Append whatever remains after the last match
matcher.appendTail(sbr);
System.out.println(sbr.toString());
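The appendReplacement/appendTail loop is most useful when the replacement depends on what was matched, which replaceAll with a fixed string cannot express. A minimal sketch, assuming a hypothetical class ComputedReplaceDemo with a method doubleNumbers that doubles every integer in the text:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ComputedReplaceDemo {
    // Replaces every run of digits with twice its numeric value.
    static String doubleNumbers(String input) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int doubled = Integer.parseInt(m.group()) * 2;
            // Copies the text since the previous match, then the computed replacement.
            m.appendReplacement(sb, Integer.toString(doubled));
        }
        // Copies the remainder of the input after the last match.
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("3 apples and 12 pears")); // 6 apples and 24 pears
    }
}
```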

Validating an e-mail address

String str = "[email protected]";
Pattern pattern = Pattern.compile("[\\w\\.\\-]+@([\\w\\-]+\\.)+[\\w\\-]+", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
System.out.println(matcher.matches());

Removing HTML tags

Pattern pattern = Pattern.compile("<.+?>", Pattern.DOTALL);
Matcher matcher = pattern.matcher("<a href=\"index.html\">Home</a>");
String string = matcher.replaceAll("");
System.out.println(string);

Finding a matching substring in HTML

Pattern pattern = Pattern.compile("href=\"(.+?)\"");
Matcher matcher = pattern.matcher("<a href=\"index.html\">Home</a>");
if (matcher.find()) {
    // group(1) is the text captured by the parentheses
    System.out.println(matcher.group(1));
}

Extracting http:// addresses

// Extract URLs
Pattern pattern = Pattern.compile("(http://|https://){1}[\\w\\.\\-/:]+");
Matcher matcher = pattern.matcher("dsdsds"); // the rest of this input string is missing in the source
StringBuffer buffer = new StringBuffer();
while (matcher.find()) {
    buffer.append(matcher.group());
    buffer.append("\r\n");
}
System.out.println(buffer.toString());

Replacing the text inside {} placeholders

String str = "The current history of Java runs from {0} to {1}";
String[][] object = {new String[]{"\\{0\\}", "1995"}, new String[]{"\\{1\\}", "2007"}};
System.out.println(replace(str, object));

// Applies each {regex, replacement} pair to the source string in turn
public static String replace(final String sourceString, Object[] object) {
    String temp = sourceString;
    for (int i = 0; i < object.length; i++) {
        String[] result = (String[]) object[i];
        Pattern pattern = Pattern.compile(result[0]);
        Matcher matcher = pattern.matcher(temp);
        temp = matcher.replaceAll(result[1]);
    }
    return temp;
}
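A possible single-pass variant of the placeholder replacement is sketched below. It is not the article's method: the class PlaceholderDemo and the method fill are hypothetical, and the sketch assumes placeholders of the form {n} that index into the supplied values. Matcher.quoteReplacement keeps characters such as $ and \ in a value from being read as group references.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderDemo {
    // Replaces each {n} with values[n]; placeholders without a value are left as they are.
    static String fill(String template, String... values) {
        Matcher m = Pattern.compile("\\{(\\d+)\\}").matcher(template);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int index = Integer.parseInt(m.group(1));
            String replacement = index < values.length ? values[index] : m.group();
            // quoteReplacement makes the value a literal replacement string.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fill("from {0} to {1}", "1995", "2007")); // from 1995 to 2007
        System.out.println(fill("price: {0}", "$5"));                // price: $5
    }
}
```

Compiling one pattern and scanning the text once avoids recompiling a pattern per placeholder, at the cost of fixing the placeholder syntax in advance.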

Listing the files in a directory that match a regular expression

import java.io.File;
import java.io.FileFilter;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FilesAnalyze {
    // Caches the list of matching files
    private ArrayList<File> files = new ArrayList<File>();
    // Holds the directory path
    private String _path;
    // Holds the regular expression used to match file names
    private String _regexp;

    class MyFileFilter implements FileFilter {
        /**
         * Matches the file name against the regular expression
         */
        public boolean accept(File file) {
            try {
                Pattern pattern = Pattern.compile(_regexp);
                Matcher match = pattern.matcher(file.getName());
                return match.matches();
            } catch (Exception e) {
                // An invalid expression accepts every file
                return true;
            }
        }
    }

    /**
     * Collects the files under path whose names match regexp
     * @param path directory to scan
     * @param regexp regular expression for file names
     */
    FilesAnalyze(String path, String regexp) {
        getFileName(path, regexp);
    }

    /**
     * Scans the directory and adds the matching files
     * @param path directory to scan
     * @param regexp regular expression for file names
     */
    private void getFileName(String path, String regexp) {
        _path = path;
        _regexp = regexp;
        File directory = new File(_path);
        File[] filesFile = directory.listFiles(new MyFileFilter());
        if (filesFile == null) return;
        for (int j = 0; j < filesFile.length; j++) {
            files.add(filesFile[j]);
        }
    }

    /**
     * Prints the collected file paths
     * @param out stream to print to
     */
    public void print(PrintStream out) {
        Iterator<File> elements = files.iterator();
        while (elements.hasNext()) {
            File file = elements.next();
            out.println(file.getPath());
        }
    }

    public static void output(String path, String regexp) {
        FilesAnalyze fileGroup1 = new FilesAnalyze(path, regexp);
        fileGroup1.print(System.out);
    }

    public static void main(String[] args) {
        output("C:\\", "[a-z|.]*");
    }
}
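The same idea can be written more compactly with a lambda passed to File.listFiles. This is a sketch under assumptions, not the article's class: RegexFileLister, its list method, the temporary directory, and the file names are all made up for the demonstration.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

public class RegexFileLister {
    // Returns the sorted names of entries in dir whose whole name matches regex.
    static List<String> list(File dir, String regex) {
        Pattern pattern = Pattern.compile(regex); // compiled once, reused for every entry
        // The two-argument lambda is a java.io.FilenameFilter.
        File[] found = dir.listFiles((d, name) -> pattern.matcher(name).matches());
        List<String> names = new ArrayList<>();
        if (found == null) {
            return names; // dir does not exist or is not a directory
        }
        for (File f : found) {
            names.add(f.getName());
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample files in a fresh temporary directory.
        Path dir = Files.createTempDirectory("regex-demo");
        Files.createFile(dir.resolve("notes.txt"));
        Files.createFile(dir.resolve("data.csv"));
        System.out.println(list(dir.toFile(), "[a-z]+\\.txt")); // [notes.txt]
    }
}
```

Compiling the pattern once outside the filter avoids recompiling it for every file, which the inner-class version does on each call to accept.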

Java regular expressions can do a great deal. In practice, for almost any character-processing task, there is little that a regular expression cannot handle.
