Introduction to Java Regular Expressions (a must-read for beginners)

Source: Internet
Author: User

A regular expression is a pattern specification used for matching and substitution. It is a literal pattern made up of ordinary characters (such as the letters a to z) and special characters called metacharacters, which together describe one or more strings to look for when searching a body of text. The regular expression serves as a template that is matched against the string being searched.

As we all know, development inevitably involves matching, finding, replacing, and validating strings. Some of these tasks are fairly complex, and solving them with hand-written string-handling code often wastes a programmer's time and energy. Learning and using regular expressions is the main way to resolve this problem.


Since JDK 1.4 introduced the java.util.regex package, Java has had a solid built-in platform for working with regular expressions.

Regular expressions form a large and complex system, so this article only walks through a few introductory examples of the core concepts; for more, consult dedicated books and experiment on your own.

Backslash sequences
\t tab ('\u0009')
\n newline ('\u000A')
\r carriage return ('\u000D')
\d a digit, equivalent to [0-9]
\D a non-digit, equivalent to [^0-9]
\s a whitespace character, [ \t\n\x0B\f\r]
\S a non-whitespace character, [^\s]
\w a word character, [a-zA-Z_0-9]
\W a non-word character, [^\w]
\f form feed
\e escape
\b a word boundary
\B a non-word boundary
\G the end of the previous match
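To see a few of these sequences in action, here is a minimal sketch. The class name EscapeDemo, its findAll helper and the sample sentence are illustrative assumptions, not part of the original article. In Java source code each backslash in a regex must itself be escaped, so \d is written "\\d" inside a string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeDemo {
    // Hypothetical helper: collects every substring of input that matches regex, left to right.
    public static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "Order 66 shipped on day 7, total_cost 120";
        System.out.println(findAll("\\d+", text));      // [66, 7, 120]
        System.out.println(findAll("\\w+", text));      // [Order, 66, shipped, on, day, 7, total_cost, 120]
        System.out.println(findAll("\\bday\\b", text)); // [day]: "day" only as a whole word
        // Tab and space are both \s, so this split yields three pieces
        System.out.println("a\tb c".split("\\s").length); // 3
    }
}
```

The doubled backslash matters most for \b: in a Java string literal "\b" is the backspace character, so a word boundary must be written "\\b".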

^ anchors the match to the start
^java matches text that begins with "java"
$ anchors the match to the end
java$ matches text that ends with "java"
. matches any single character except a line terminator such as \n
java.. matches "java" followed by any two characters other than a newline
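A minimal sketch of the anchors and the dot, assuming a hypothetical AnchorDemo class whose contains helper uses Matcher.find(), which searches anywhere in the input; the sample strings are made up for illustration.

```java
import java.util.regex.Pattern;

public class AnchorDemo {
    // Hypothetical helper: find() searches anywhere in s; only ^ and $ pin the pattern to the start or end.
    public static boolean contains(String regex, String s) {
        return Pattern.compile(regex).matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(contains("^java", "java rocks"));  // true: begins with "java"
        System.out.println(contains("^java", "I like java")); // false: "java" is not at the start
        System.out.println(contains("java$", "I like java")); // true: ends with "java"
        System.out.println(contains("java..", "javaXY"));     // true: two characters follow "java"
        System.out.println(contains("java..", "java\nX"));    // false: . does not match \n by default
    }
}
```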

Square brackets "[]" restrict a position to a set of characters
[a-z] one character in the lowercase range a to z
[A-Z] one character in the uppercase range A to Z
[a-zA-Z] one character in the lowercase range a to z or the uppercase range A to Z
[0-9] one digit from 0 to 9
[0-9a-z] one character that is a digit from 0 to 9 or a lowercase letter from a to z
[0-9[a-z]] one character that is a digit or a lowercase letter (a nested class forms a union, so this equals [0-9a-z]; an intersection is written with &&, as in [a-z&&[def]])
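The following sketch tests single characters against some of these classes, including the union and intersection forms. CharClassDemo and its oneChar helper are hypothetical names; Pattern.matches requires the whole input to match, so each call checks exactly one character.

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // Hypothetical helper: true when the one-character string ch belongs to the class cls.
    public static boolean oneChar(String cls, String ch) {
        return Pattern.matches(cls, ch);
    }

    public static void main(String[] args) {
        System.out.println(oneChar("[a-z]", "q"));        // true
        System.out.println(oneChar("[a-z]", "Q"));        // false: uppercase is outside a-z
        System.out.println(oneChar("[a-zA-Z]", "Q"));     // true
        System.out.println(oneChar("[0-9[a-z]]", "7"));   // true: union of digits and lowercase letters
        System.out.println(oneChar("[a-z&&[def]]", "e")); // true: intersection keeps only d, e, f
        System.out.println(oneChar("[a-z&&[def]]", "a")); // false
    }
}
```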

Putting ^ right after the opening bracket, "[^]", negates the set
[^a-z] one character that is not a lowercase letter a to z
[^A-Z] one character that is not an uppercase letter A to Z
[^a-zA-Z] one character that is neither a lowercase nor an uppercase letter
[^0-9] one character that is not a digit
[^0-9a-z] one character that is neither a digit nor a lowercase letter
[^0-9[a-z]] intended as one character that is neither a digit nor a lowercase letter; the flat form [^0-9a-z] says the same thing more clearly and avoids the version-dependent handling of ^ with nested classes in older JDKs
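A small sketch of negated classes used for cleanup and counting. NegatedClassDemo, both helper methods and the sample string are hypothetical; String.replaceAll and Matcher.find are standard.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegatedClassDemo {
    // Hypothetical helper: deletes every character that is NOT a digit or a lowercase letter.
    public static String keepDigitsAndLowercase(String s) {
        return s.replaceAll("[^0-9a-z]", "");
    }

    // Hypothetical helper: counts characters that are neither lowercase nor uppercase ASCII letters.
    public static int countNonLetters(String s) {
        Matcher m = Pattern.compile("[^a-zA-Z]").matcher(s);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(keepDigitsAndLowercase("Java 8, jdk1.4!")); // ava8jdk14
        System.out.println(countNonLetters("Java 8, jdk1.4!"));       // 8
    }
}
```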

Use "*" to allow a character to occur zero or more times
j* zero or more j
.* zero or more arbitrary characters
j.*d zero or more arbitrary characters between j and d

Use "+" to require a character to occur one or more times
j+ one or more j
.+ one or more arbitrary characters
j.+d one or more arbitrary characters between j and d

Use "?" to allow a character to occur zero times or once
ja? matches j or ja
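The difference between "*", "+" and "?" shows up most clearly at the edges (zero or one occurrence). This sketch assumes a hypothetical QuantifierDemo class whose whole helper wraps Pattern.matches, which must match the entire string.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Hypothetical helper: true when regex matches all of s.
    public static boolean whole(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        // * : zero or more
        System.out.println(whole("j*", ""));      // true: zero j's is allowed
        System.out.println(whole("j.*d", "jd"));  // true: nothing between j and d
        // + : one or more
        System.out.println(whole("j+", ""));      // false: at least one j is required
        System.out.println(whole("j.+d", "jd"));  // false: needs at least one character in between
        System.out.println(whole("j.+d", "jxd")); // true
        // ? : zero or one
        System.out.println(whole("ja?", "j"));    // true
        System.out.println(whole("ja?", "ja"));   // true
        System.out.println(whole("ja?", "jaa"));  // false: at most one a
    }
}
```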

Use "{n}" for exactly n consecutive occurrences
j{2} jj
j{3} jjj
Use "{n,}" for n or more occurrences
j{3,} jjj, jjjj, jjjjj, ... (three or more j)
Use "{n,m}" for at least n and at most m occurrences
j{3,5} jjj, jjjj, or jjjjj
Use "|" to accept either alternative
j|a j or a
java|hello java or hello
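A minimal sketch of counted repetition and alternation. RepeatAlternationDemo, its two helpers and the sample sentence are hypothetical; whole requires a full match, while findAll collects non-overlapping matches with find().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatAlternationDemo {
    // Hypothetical helper: true when regex matches all of s.
    public static boolean whole(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    // Hypothetical helper: all non-overlapping matches of regex in s, left to right.
    public static List<String> findAll(String regex, String s) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(whole("j{2}", "jj"));       // true: exactly two
        System.out.println(whole("j{3,}", "jjjjjj"));  // true: three or more
        System.out.println(whole("j{3,5}", "jjjjjj")); // false: six exceeds the upper bound
        System.out.println(findAll("java|hello", "hello world, java and hello")); // [hello, java, hello]
    }
}
```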

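Parentheses "()" group several elements into one unit and capture the text they match
For example, to extract the link text between <a href="index.html"> and </a> in <a href="index.html">index</a>, you can write <a.*href=".*">(.+?)</a>; the group (.+?) captures "index".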
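Here is a hedged sketch of the grouping example in Java, reusing the pattern from the text. GroupDemo and linkText are hypothetical names; Matcher.group(1) returns the text captured by the first pair of parentheses.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Pattern from the text; the parentheses around .+? form capture group 1.
    static final Pattern LINK = Pattern.compile("<a.*href=\".*\">(.+?)</a>");

    // Hypothetical helper: returns the captured link text, or null when nothing matches.
    public static String linkText(String html) {
        Matcher m = LINK.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(linkText("<a href=\"index.html\">index</a>")); // index
        System.out.println(linkText("no link here"));                     // null
    }
}
```

The greedy .* parts are fine for a single link, but with several links on one line they can run across link boundaries; the lazy form .*? is the safer choice there.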

When you call Pattern.compile, you can pass flags that control how the regular expression matches:
static Pattern compile(String regex, int flags)

The possible values of flags are:

Pattern.CANON_EQ: two characters are considered a match only if their full canonical decompositions are identical. For example, with this flag the expression "a\u030A" matches "\u00E5" (å). By default, canonical equivalence is not taken into account.

Pattern.CASE_INSENSITIVE (?i): by default, case-insensitive matching applies only to the US-ASCII character set. This flag makes the expression ignore case. To match Unicode characters case-insensitively, combine it with the UNICODE_CASE flag.

Pattern.COMMENTS (?x): in this mode, whitespace in the pattern is ignored (translator's note: this means literal spaces, tabs, newlines and the like written in the expression, not the \s class). Comments start with # and run to the end of the line. This mode can also be enabled with the embedded flag (?x).

Pattern.DOTALL (?s): in this mode, the expression "." matches any character, including a line terminator. By default, "." does not match line terminators.

Pattern.MULTILINE (?m): in this mode, "^" and "$" also match just after and just before a line terminator, that is, at the start and end of each line. "^" still matches the start of the input and "$" the end of the input. By default, these two expressions match only at the start and end of the entire input.

Pattern.UNICODE_CASE (?u): in this mode, if CASE_INSENSITIVE is also enabled, case-insensitive matching follows Unicode case rules. By default, case-insensitive matching applies only to the US-ASCII character set.

Pattern.UNIX_LINES (?d): in this mode, only "\n" is recognized as a line terminator in the behavior of ".", "^", and "$".
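A minimal sketch of how several of these flags change what is found. FlagDemo, its count helper and the three-line sample text are hypothetical, chosen only to make the differences visible. Flags are combined with the bitwise | operator, and 0 means no flags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlagDemo {
    // Hypothetical helper: how many times regex, compiled with the given flags, is found in input.
    public static int count(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "Java one\njava two\nJAVA three";
        System.out.println(count("java", 0, text));                         // 1: case-sensitive by default
        System.out.println(count("java", Pattern.CASE_INSENSITIVE, text));  // 3
        System.out.println(count("(?i)java", 0, text));                     // 3: embedded flag form
        System.out.println(count("^java", Pattern.CASE_INSENSITIVE, text)); // 1: ^ only at the input start
        System.out.println(count("^java", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE, text)); // 3
        System.out.println(count("one.java", 0, text));                     // 0: . stops at \n
        System.out.println(count("one.java", Pattern.DOTALL, text));        // 1
        // COMMENTS: spaces and the # comment are ignored, leaving the regex "two"
        System.out.println(count("t w o  # the word two", Pattern.COMMENTS, text)); // 1
    }
}
```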

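Setting the abstract concepts aside, here are a few simple Java regex examples. The snippets assume import java.util.regex.Matcher; and import java.util.regex.Pattern; at the top of the file.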

For example, checking whether a string satisfies a condition

// Match a string that starts with "Java" and ends with anything
Pattern pattern = Pattern.compile("^Java.*");
Matcher matcher = pattern.matcher("Java is not a person");
boolean b = matcher.matches();
// true when the condition is satisfied, otherwise false
System.out.println(b);

Splitting a string on several delimiters

// Split on any run of commas, spaces, or vertical bars
Pattern pattern = Pattern.compile("[, |]+");
String[] strs = pattern.split("Java Hello World java,hello,,world| Sun ");
for (int i = 0; i < strs.length; i++) {
    System.out.println(strs[i]);
}

Text replacement (first occurrence only)

Pattern pattern = Pattern.compile("regular expression");
Matcher matcher = pattern.matcher("regular expression Hello World, regular expression Hello World");
// Replace the first match only
System.out.println(matcher.replaceFirst("Java"));

Text replacement (all occurrences)

Pattern pattern = Pattern.compile("regular expression");
Matcher matcher = pattern.matcher("regular expression Hello World, regular expression Hello World");
// Replace every match
System.out.println(matcher.replaceAll("Java"));

Text replacement (match by match with appendReplacement)

Pattern pattern = Pattern.compile("regular expression");
Matcher matcher = pattern.matcher("regular expression Hello World, regular expression Hello World");
StringBuffer sbr = new StringBuffer();
// Copy the text before each match, then the replacement, into sbr
while (matcher.find()) {
    matcher.appendReplacement(sbr, "Java");
}
// Copy whatever follows the last match
matcher.appendTail(sbr);
System.out.println(sbr.toString());

Checking whether a string is an e-mail address

String str = "ceponline@yahoo.com.cn";
Pattern pattern = Pattern.compile("[\\w\\.\\-]+@([\\w\\-]+\\.)+[\\w\\-]+", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
System.out.println(matcher.matches());

Removing HTML tags

// <.+?> lazily matches one tag at a time; DOTALL lets a tag span several lines
Pattern pattern = Pattern.compile("<.+?>", Pattern.DOTALL);
Matcher matcher = pattern.matcher("<a href=\"index.html\">Homepage</a>");
String string = matcher.replaceAll("");
System.out.println(string);

Finding a specific value in HTML

// Group 1 captures the value of the href attribute
Pattern pattern = Pattern.compile("href=\"(.+?)\"");
Matcher matcher = pattern.matcher("<a href=\"index.html\">Homepage</a>");
if (matcher.find()) {
    System.out.println(matcher.group(1));
}

Extracting http:// addresses

// Pattern for extracting URLs
Pattern pattern = Pattern.compile("(http://|https://){1}[\\w\\.\\-/:]+");
Matcher matcher = pattern.matcher("dsdsds 
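The input string of the URL example is cut off in the source, so the rest of that snippet cannot be recovered. As a hedged sketch, here is the same kind of pattern applied to a made-up input; UrlExtractDemo, extractUrls and the sample text are hypothetical. The {1} quantifier is dropped because a group matches exactly once by default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlExtractDemo {
    // A scheme, then a run of word characters, dots, hyphens, slashes and colons.
    static final Pattern URL = Pattern.compile("(http://|https://)[\\w.\\-/:]+");

    // Hypothetical helper: every URL-like substring of text, in order of appearance.
    public static List<String> extractUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical input, not the article's original string
        String text = "see <http://example.com/a-b> or https://example.org:8080/x now";
        System.out.println(extractUrls(text)); // [http://example.com/a-b, https://example.org:8080/x]
    }
}
```

Because the character class also admits "." and ":", a URL at the end of a sentence will pick up the trailing period.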
  

Replacing placeholders written in {}

String str = "Java's history so far spans the years {0}-{1}";
// Each pair is {regex for the placeholder, replacement}; the braces are escaped because { is a metacharacter
String[][] object = {new String[]{"\\{0\\}", "1995"}, new String[]{"\\{1\\}", "2007"}};
System.out.println(replace(str, object));

public static String replace(final String sourceString, Object[] object) {
    String temp = sourceString;
    for (int i = 0; i < object.length; i++) {
        String[] result = (String[]) object[i];
        Pattern pattern = Pattern.compile(result[0]);
        Matcher matcher = pattern.matcher(temp);
        temp = matcher.replaceAll(result[1]);
    }
    return temp;
}

Finding files in a directory whose names match a regular expression

import java.io.File;
import java.io.FileFilter;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class FilesAnalyze {
    private final List<File> files = new ArrayList<>(); // caches the matching files
    private String _path;   // directory to search
    private String _regexp; // regular expression the file names must match

    class MyFileFilter implements FileFilter {
        // Accepts a file when its whole name matches the regular expression
        public boolean accept(File file) {
            try {
                return Pattern.compile(_regexp).matcher(file.getName()).matches();
            } catch (Exception e) {
                return true; // as in the original: an invalid pattern accepts every file
            }
        }
    }

    public FilesAnalyze(String path, String regexp) {
        getFileName(path, regexp);
    }

    // Lists the directory and keeps only the files accepted by the filter
    private void getFileName(String path, String regexp) {
        _path = path;
        _regexp = regexp;
        File[] filesFile = new File(_path).listFiles(new MyFileFilter());
        if (filesFile == null) return; // not a directory, or it could not be read
        for (File f : filesFile) files.add(f);
    }

    // Prints the path of every matching file
    public void print(PrintStream out) {
        for (File file : files) out.println(file.getPath());
    }

    public static void output(String path, String regexp) {
        new FilesAnalyze(path, regexp).print(System.out);
    }

    public static void main(String[] args) {
        // Inside [] the | is a literal character, so this class is lowercase letters, '|' and '.'
        output("c:\\", "[a-z|.]*");
    }
}

Java's regex support offers much more than this. In fact, there is hardly any character-processing task that regular expressions cannot handle (although, of course, interpreting a regular expression is more time-consuming than other approaches).
