Java Regular Expressions: Functions and Applications

Source: Internet
Author: User

A regular expression is a pattern that describes a class of strings. It is built from ordinary characters and metacharacters. Ordinary characters include uppercase and lowercase letters and digits, while metacharacters carry special meanings. Regular expressions mean the same thing on the .NET platform and on the Java platform. This article looks at what Java regular expressions can do and how they are applied in practice; it is offered for reference only.
JDK 1.4 introduced the java.util.regex package, which gives Java a solid platform for working with regular expressions. Because regular expressions form a large and complex system, only the basic concepts are listed here.
\\        Backslash
\t        Tab ('\u0009')
\n        Line feed ('\u000A')
\r        Carriage return ('\u000D')
\d        A digit; equivalent to [0-9]
\D        A non-digit; equivalent to [^0-9]
\s        A whitespace character: [ \t\n\x0B\f\r]
\S        A non-whitespace character: [^ \t\n\x0B\f\r]
\w        A word character: [a-zA-Z_0-9]
\W        A non-word character: [^a-zA-Z_0-9]
\f        Form feed
\e        Escape
\b        A word boundary
\B        A non-word boundary
\G        The end of the previous match
^         Anchors the match at the beginning
^Java     Matches only text that begins with "Java"
$         Anchors the match at the end
Java$     Matches only text that ends with "Java"
.         Any single character except a line terminator such as \n
Java..    "Java" followed by any two characters other than a line break
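
A minimal sketch of how a few of the escapes and anchors listed above behave. The class name EscapeDemo, the helper found, and the sample strings are made up for illustration. Note that inside a Java string literal every backslash of the pattern has to be doubled.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original article.
public class EscapeDemo {
    // Returns true if the pattern is found anywhere in the input.
    public static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("\\d", "JDK 1.4"));            // true: the input contains a digit
        System.out.println(found("^Java", "Java rocks"));       // true: the input begins with Java
        System.out.println(found("Java$", "I like Java"));      // true: the input ends with Java
        System.out.println(found("\\bJava\\b", "JavaScript"));  // false: no word boundary after "Java"
        System.out.println(found("Java..", "Java8"));           // false: only one character follows "Java"
    }
}
```
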
Use "[]" to restrict a position to a specific set of characters:
[a-z]          One lowercase letter from a to z
[A-Z]          One uppercase letter from A to Z
[a-zA-Z]       One letter, either lowercase a to z or uppercase A to Z
[0-9]          One digit from 0 to 9
[0-9a-z]       One digit from 0 to 9 or one lowercase letter from a to z
[0-9[a-z]]     One digit from 0 to 9 or one lowercase letter from a to z (union of two classes)
Put "^" right after the opening "[" to negate the set:
[^a-z]         One character that is not a lowercase letter from a to z
[^A-Z]         One character that is not an uppercase letter from A to Z
[^a-zA-Z]      One character that is neither a lowercase nor an uppercase letter
[^0-9]         One character that is not a digit from 0 to 9
[^0-9a-z]      One character that is neither a digit from 0 to 9 nor a lowercase letter from a to z
[^0-9[a-z]]    Intended as the negation of the union above; how "^" applies to a nested class has differed between Java versions, so [^0-9a-z] is the safer spelling
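
The character classes above can be tried out with a short sketch. The class name CharClassDemo and the helper isOneOf are hypothetical. The "&&" operator in the last two calls is not covered in the list above; it is Java's syntax for a true intersection and is shown here only for contrast with the nested union.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original article.
public class CharClassDemo {
    // Returns true if the whole input is one character accepted by the class.
    public static boolean isOneOf(String charClass, String input) {
        return Pattern.matches(charClass, input);
    }

    public static void main(String[] args) {
        System.out.println(isOneOf("[a-z]", "q"));            // true
        System.out.println(isOneOf("[a-z]", "Q"));            // false: uppercase
        System.out.println(isOneOf("[0-9[a-z]]", "7"));       // true: union accepts digits
        System.out.println(isOneOf("[0-9[a-z]]", "k"));       // true: union accepts letters
        System.out.println(isOneOf("[^0-9a-z]", "Q"));        // true: neither digit nor lowercase
        System.out.println(isOneOf("[a-z&&[^aeiou]]", "k"));  // true: lowercase and not a vowel
        System.out.println(isOneOf("[a-z&&[^aeiou]]", "e"));  // false: vowels are excluded
    }
}
```
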
Use "*" when a character may occur zero or more times:
J*        Zero or more J
.*        Zero or more of any character
J.*D      Zero or more of any character between J and D
Use "+" when a character must occur one or more times:
J+        One or more J
.+        One or more of any character
J.+D      One or more of any character between J and D
Use "?" when a character may occur zero times or once:
JA?       J or JA
Use "{a}" to require exactly a consecutive occurrences:
J{2}      JJ
J{3}      JJJ
Use "{a,}" to require a or more occurrences:
J{3,}     JJJ, JJJJ, JJJJJ, ... (three or more J in a row)
Use "{a,b}" to require at least a and at most b occurrences:
J{3,5}    JJJ, JJJJ, or JJJJJ
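
A minimal sketch of the quantifiers above in action. The class name QuantifierDemo and the helper firstMatch are hypothetical. The last two calls contrast the greedy ".*" with its reluctant form ".*?", which takes as few characters as possible.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original article.
public class QuantifierDemo {
    // Returns the first substring matched by regex, or null if nothing matches.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("J*", "JJJava"));      // JJJ
        System.out.println(firstMatch("J+", "ava"));         // null: at least one J is required
        System.out.println(firstMatch("JA?", "JAVA"));       // JA
        System.out.println(firstMatch("J{3,5}", "JJJJJJJ")); // JJJJJ: at most five are taken
        System.out.println(firstMatch("J.*D", "JxDyD"));     // JxDyD: greedy, runs to the last D
        System.out.println(firstMatch("J.*?D", "JxDyD"));    // JxD: reluctant, stops at the first D
    }
}
```
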
Use "|" to accept either of two alternatives:
J|A           J or A
Java|Hello    Java or Hello
Use "()" to group part of a pattern:
For example, to extract the text between <a href="index.html"> and </a> in <a href="index.html">index</a>, you can write <a.*href=\".*\">(.+?)</a> (shown as it appears inside a Java string literal).
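
A minimal sketch of alternation and capturing groups. The class name GroupDemo, the helper linkTexts, and the sample HTML are hypothetical. Unlike the pattern quoted above, the sketch uses the reluctant ".*?" throughout, on the assumption that a line may hold several links; with the greedy ".*" two links on one line would be consumed as a single match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original article.
public class GroupDemo {
    // Collects the text captured by group 1 for every link in the fragment.
    public static List<String> linkTexts(String html) {
        Pattern p = Pattern.compile("<a.*?href=\".*?\">(.+?)</a>");
        Matcher m = p.matcher(html);
        List<String> texts = new ArrayList<String>();
        while (m.find()) {
            texts.add(m.group(1)); // group(1) is the first parenthesized part
        }
        return texts;
    }

    public static void main(String[] args) {
        // Alternation: the whole input must be either "Java" or "Hello".
        System.out.println(Pattern.matches("Java|Hello", "Hello")); // true
        String html = "<a href=\"index.html\">index</a> <a href=\"about.html\">about</a>";
        System.out.println(linkTexts(html)); // [index, about]
    }
}
```
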
When calling Pattern.compile, you can pass flags that control how the Java regular expression is matched:
Pattern Pattern.compile(String regex, int flags)
The flags argument accepts the following values:
Pattern.CANON_EQ
Two characters match if and only if their full canonical decompositions are identical. With this flag, for example, the expression "a\u030A" matches the string "\u00E5". By default, canonical equivalence is not taken into account.
Pattern.CASE_INSENSITIVE (?i)
This flag makes matching ignore case. By default, case-insensitive matching applies only to the US-ASCII character set; to match Unicode characters case-insensitively, combine this flag with UNICODE_CASE.
Pattern.COMMENTS (?x)
In this mode, whitespace in the regular expression is ignored (that is, the literal spaces, tabs, and line breaks in the expression, not the "\\s" escape). Comments start with # and run to the end of the line. This mode can also be enabled with the embedded flag (?x).
Pattern.DOTALL (?s)
In this mode, the expression '.' matches any character, including a line terminator. By default, '.' does not match line terminators.
Pattern.MULTILINE (?m)
In this mode, '^' and '$' match at the start and end of each line, respectively. In addition, '^' still matches at the start of the input and '$' at the end of the input. By default, these two expressions match only at the start and end of the entire input.
Pattern.UNICODE_CASE (?u)
In this mode, when the CASE_INSENSITIVE flag is also enabled, case-insensitive matching is performed on Unicode characters. By default, case-insensitive matching applies only to the US-ASCII character set.
Pattern.UNIX_LINES (?d)
In this mode, only '\n' is recognized as a line terminator for '.', '^', and '$'.
Setting the abstract concepts aside, here are a few simple Java regular expression use cases:
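
As a warm-up, the matching flags described above can be exercised with a short sketch. The class name FlagDemo, the helper found, and the sample inputs are hypothetical; the flag constants and embedded forms are the ones listed above.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original article.
public class FlagDemo {
    // Returns true if regex, compiled with the given flags, is found in input.
    public static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String twoLines = "hello\nworld";
        System.out.println(found("java", 0, "JAVA"));                         // false
        System.out.println(found("java", Pattern.CASE_INSENSITIVE, "JAVA"));  // true
        System.out.println(found("(?i)java", 0, "JAVA"));                     // true: embedded flag
        System.out.println(found("^world$", 0, twoLines));                    // false
        System.out.println(found("^world$", Pattern.MULTILINE, twoLines));    // true
        System.out.println(found("hello.world", 0, twoLines));                // false
        System.out.println(found("hello.world", Pattern.DOTALL, twoLines));   // true
        System.out.println(found("ja va # comment", Pattern.COMMENTS, "java")); // true
        // Flags are bit masks, so several can be combined with "|".
        int both = Pattern.CASE_INSENSITIVE | Pattern.MULTILINE;
        System.out.println(found("^WORLD$", both, twoLines));                 // true
    }
}
```
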
For example, checking whether a string matches a pattern

// Check for a string that starts with "Java" and ends with anything
Pattern pattern = Pattern.compile("^Java.*");
Matcher matcher = pattern.matcher("Java is not a person");
boolean b = matcher.matches(); // true if the whole input matches, otherwise false
System.out.println(b);

Splitting a string on several delimiters

// Split on any run of commas, spaces, or vertical bars
Pattern pattern = Pattern.compile("[, |]+");
String[] strs = pattern.split("Java Hello World Java,Hello,,World|Sun");
for (int i = 0; i < strs.length; i++) {
    System.out.println(strs[i]);
}

Text substitution (first occurrence only)

Pattern pattern = Pattern.compile("regular expression");
Matcher matcher = pattern.matcher("regular expression Hello World, regular expression Hello World");
// Replace only the first match of the pattern
System.out.println(matcher.replaceFirst("Java"));

Text substitution (all occurrences)

Pattern pattern = Pattern.compile("regular expression");
Matcher matcher = pattern.matcher("regular expression Hello World, regular expression Hello World");
// Replace every match of the pattern
System.out.println(matcher.replaceAll("Java"));

Text substitution (replacing match by match)

Pattern pattern = Pattern.compile("regular expression");
Matcher matcher = pattern.matcher("regular expression Hello World, regular expression Hello World");
StringBuffer sbr = new StringBuffer();
// Copy the text before each match, then the replacement, into the buffer
while (matcher.find()) {
    matcher.appendReplacement(sbr, "Java");
}
// Copy whatever follows the last match
matcher.appendTail(sbr);
System.out.println(sbr.toString());
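
The find/appendReplacement loop above pays off mainly when each replacement depends on the text that was matched. A minimal sketch under that assumption follows; the class name AppendReplacementDemo, the helper doubleNumbers, and the sample sentence are hypothetical. Matcher.quoteReplacement is used so that the computed text is inserted literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original article.
public class AppendReplacementDemo {
    // Replaces every run of digits with twice its numeric value.
    // Assumes the numbers are small enough to fit in an int.
    public static String doubleNumbers(String text) {
        Matcher m = Pattern.compile("\\d+").matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int n = Integer.parseInt(m.group());
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(n * 2)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("3 apples and 10 pears")); // 6 apples and 20 pears
    }
}
```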

Verifying that a string is an email address

String str = "ceponline@yahoo.com.cn";
// Local part, "@", then one or more dot-separated domain labels
Pattern pattern = Pattern.compile("[\\w\\.\\-]+@([\\w\\-]+\\.)+[\\w\\-]+", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
System.out.println(matcher.matches());

Removing HTML tags

// "<.+?>" matches one tag at a time because ".+?" is reluctant
Pattern pattern = Pattern.compile("<.+?>", Pattern.DOTALL);
Matcher matcher = pattern.matcher("<a href=\"index.html\">Homepage</a>");
String string = matcher.replaceAll("");
System.out.println(string);

Finding a string that meets a condition in HTML

// Capture the value of the href attribute in group 1
Pattern pattern = Pattern.compile("href=\"(.+?)\"");
Matcher matcher = pattern.matcher("<a href=\"index.html\">Homepage</a>");
if (matcher.find()) {
    System.out.println(matcher.group(1));
}

Extracting http:// addresses

// Extract URLs
Pattern pattern = Pattern.compile("(http://|https://){1}[\\w\\.\\-/:]+");
// The sample text is cut off after "dsdsds" in the source; use any text that contains a URL
Matcher matcher = pattern.matcher("dsdsds");
StringBuffer buffer = new StringBuffer();
while (matcher.find()) {
    buffer.append(matcher.group());
    buffer.append("\r\n");
}
System.out.println(buffer.toString());

Replacing the text inside specified {} placeholders

String str = "The history of Java so far runs from {0} to {1}";
// Each entry pairs a regular expression with its replacement text
String[][] object = {new String[]{"\\{0\\}", "1995"}, new String[]{"\\{1\\}", "2007"}};
System.out.println(replace(str, object));

public static String replace(final String sourceString, Object[] object) {
    String temp = sourceString;
    for (int i = 0; i < object.length; i++) {
        String[] result = (String[]) object[i];
        Pattern pattern = Pattern.compile(result[0]);
        Matcher matcher = pattern.matcher(temp);
        temp = matcher.replaceAll(result[1]);
    }
    return temp;
}
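
The example above escapes the braces by hand. As a hedged alternative, Pattern.quote can turn a literal placeholder into a pattern, and Matcher.quoteReplacement keeps "$" and "\" in the value from being treated specially. The class name PlaceholderDemo and the helper fill are hypothetical, and the sketch assumes placeholders are numbered from 0 in the order of the supplied values.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original article.
public class PlaceholderDemo {
    // Replaces each literal placeholder "{i}" with values[i].
    public static String fill(String template, String... values) {
        String result = template;
        for (int i = 0; i < values.length; i++) {
            String placeholder = "{" + i + "}";
            result = Pattern.compile(Pattern.quote(placeholder))
                    .matcher(result)
                    .replaceAll(Matcher.quoteReplacement(values[i]));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fill("from {0} to {1}", "1995", "2007")); // from 1995 to 2007
    }
}
```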

Querying a directory for files whose names match a regular expression

import java.io.File;
import java.io.FileFilter;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FilesAnalyze {
    // Caches the list of matching files
    private List<File> files = new ArrayList<File>();
    // Holds the directory path
    private String _path;
    // Holds the regular expression in uncompiled form
    private String _regexp;

    class MyFileFilter implements FileFilter {
        /**
         * Matches the file name against the regular expression
         */
        public boolean accept(File file) {
            try {
                Pattern pattern = Pattern.compile(_regexp);
                Matcher match = pattern.matcher(file.getName());
                return match.matches();
            } catch (Exception e) {
                // If the pattern cannot be compiled, every file is accepted
                return true;
            }
        }
    }

    /**
     * Collects the files in a directory whose names match the expression
     * @param path directory to search
     * @param regexp regular expression the file names must match
     */
    FilesAnalyze(String path, String regexp) {
        getFileName(path, regexp);
    }

    /**
     * Lists the directory and adds the matching files
     */
    private void getFileName(String path, String regexp) {
        // Directory
        _path = path;
        _regexp = regexp;
        File directory = new File(_path);
        File[] filesFile = directory.listFiles(new MyFileFilter());
        // listFiles returns null if the path is not a readable directory
        if (filesFile == null) return;
        for (int j = 0; j < filesFile.length; j++) {
            files.add(filesFile[j]);
        }
    }

    /**
     * Prints the collected file paths
     * @param out stream to print to
     */
    public void print(PrintStream out) {
        Iterator<File> elements = files.iterator();
        while (elements.hasNext()) {
            File file = elements.next();
            out.println(file.getPath());
        }
    }

    public static void output(String path, String regexp) {
        FilesAnalyze fileGroup1 = new FilesAnalyze(path, regexp);
        fileGroup1.print(System.out);
    }

    public static void main(String[] args) {
        output("c:\\", "[a-z|.]*");
    }
}

Java regular expressions can do a great deal more than this. In fact, wherever text has to be processed, there is very little that regular expressions cannot do.
