Java gets regular expressions

Source: Internet
Author: User


As we all know, development inevitably involves matching, finding, replacing, and validating strings. These cases are sometimes complex, and handling them with hand-written code alone often wastes a programmer's time and energy. Learning and using regular expressions has therefore become the main way to resolve this problem.
A regular expression is a specification for pattern matching and substitution: a text pattern made up of ordinary characters (such as the letters a through z) and special characters (metacharacters). It describes one or more strings to match when searching a body of text, and acts as a template that is matched against the searched string.
Since JDK 1.4 introduced the java.util.regex package, Java has provided a good platform for working with regular expressions.

Since regular expressions are a very complex system, I only cite some introductory concepts here. For more, please refer to the relevant books and explore on your own.

Backslash
\t tab ('\u0009')
\n line feed ('\u000A')
\r carriage return ('\u000D')
\d a digit, equivalent to [0-9]
\D a non-digit, equivalent to [^0-9]
\s a whitespace character [ \t\n\x0B\f\r]
\S a non-whitespace character [^ \t\n\x0B\f\r]
\w a word character [a-zA-Z_0-9]
\W a non-word character [^a-zA-Z_0-9]
\f form feed
\e escape
\b a word boundary
\B a non-word boundary
\G the end of the previous match
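
In Java source code the backslash is also the escape character of string literals, so each backslash in a pattern has to be written twice inside a string. The following is a minimal sketch of that rule; the class name EscapeDemo, its helper methods, and the sample inputs are made up for illustration.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Hypothetical helper: true when the whole input consists of digits only.
    static boolean isAllDigits(String input) {
        // The regex \d+ is written "\\d+" in a Java string literal.
        return Pattern.matches("\\d+", input);
    }

    // Hypothetical helper: collapses each run of whitespace into one space.
    static String collapseWhitespace(String input) {
        // The regex \s+ is written "\\s+" in a Java string literal.
        return input.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("2007"));             // true
        System.out.println(isAllDigits("jdk1.4"));           // false
        System.out.println(collapseWhitespace("a \t\n b"));  // a b
    }
}
```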

^ anchors the match at the beginning
^java matches text that starts with "java"
$ anchors the match at the end
java$ matches text that ends with "java"
. matches any single character except a line terminator such as \n
java.. matches "java" followed by any two characters other than a line break
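
As a rough illustration of the anchors and the dot, the sketch below uses Matcher.find(), which looks for the pattern anywhere in the input, so only the anchors tie a match to the start or the end. The class name AnchorDemo and the sample strings are assumptions chosen for this example.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // Returns true when the pattern occurs somewhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("^java", "java regex"));  // true
        System.out.println(found("^java", "learn java"));  // false
        System.out.println(found("java$", "learn java"));  // true
        System.out.println(found("java..", "java11"));     // true
        System.out.println(found("java..", "java1"));      // false, only one character follows
    }
}
```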


Character classes "[]"
[a-z] one character in the lowercase range a to z
[A-Z] one character in the uppercase range A to Z
[a-zA-Z] one character in lowercase a to z or uppercase A to Z
[0-9] one digit in the range 0 to 9
[0-9a-z] one character in 0 to 9 or lowercase a to z
[0-9[a-z]] one character in 0 to 9 or lowercase a to z (union)
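
A short sketch of how these character classes behave, using Pattern.matches, which requires the whole input to match. The class name CharClassDemo and the test characters are illustrative assumptions.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // True when the entire input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[a-z]", "q"));       // true
        System.out.println(matches("[a-z]", "Q"));       // false
        System.out.println(matches("[a-zA-Z]", "Q"));    // true
        System.out.println(matches("[0-9a-z]", "7"));    // true
        // A nested class forms a union with the rest of the outer class.
        System.out.println(matches("[0-9[a-z]]", "x"));  // true
        System.out.println(matches("[0-9[a-z]]", "X"));  // false
    }
}
```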

Adding ^ after the opening [ negates the class: "[^]"
[^a-z] one character outside the lowercase range a to z
[^A-Z] one character outside the uppercase range A to Z
[^a-zA-Z] one character that is neither a lowercase nor an uppercase letter
[^0-9] one character that is not a digit 0 to 9
[^0-9a-z] one character outside both 0 to 9 and lowercase a to z
[^0-9[a-z]] one character outside the union of 0 to 9 and lowercase a to z (how ^ applies to a nested class changed in Java 9, so the flat form [^0-9a-z] is safer)

"*" matches the preceding item 0 or more times
J* 0 or more J
.* 0 or more arbitrary characters
J.*D 0 or more arbitrary characters between J and D

"+" matches the preceding item 1 or more times
J+ 1 or more J
.+ 1 or more arbitrary characters
J.+D 1 or more arbitrary characters between J and D

"?" matches the preceding item 0 or 1 times
JA? J or JA

"{a}" matches exactly a consecutive occurrences
J{2} JJ
J{3} JJJ
"{a,}" matches a or more occurrences
J{3,} JJJ, JJJJ, JJJJJ, ... (J appears 3 or more times)
"{a,b}" matches at least a and at most b occurrences
J{3,5} JJJ or JJJJ or JJJJJ
"|" matches either of two alternatives
J|A J or A
Java|Hello Java or Hello
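
The negated classes, quantifiers, and alternation listed above can be checked with a small sketch like the following. It relies on Pattern.matches, which tests the whole input; the class name QuantifierDemo and the inputs are assumptions made for the example.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // True when the entire input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[^0-9]", "7"));        // false
        System.out.println(matches("[^0-9a-z]", "Q"));     // true
        System.out.println(matches("J*", ""));             // true, zero J is allowed
        System.out.println(matches("J+", ""));             // false, at least one J is needed
        System.out.println(matches("J.*D", "JD"));         // true
        System.out.println(matches("J.+D", "JD"));         // false
        System.out.println(matches("JA?", "J"));           // true
        System.out.println(matches("J{3,5}", "JJ"));       // false
        System.out.println(matches("J{3,5}", "JJJJ"));     // true
        System.out.println(matches("Java|Hello", "Hello")); // true
    }
}
```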

"()" specifies a group
Example: to extract the text between the opening <a href> tag and </a> in <a href="index.html">index</a>, you can write <a.*href=".*">(.+?)</a>
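
One way to read the captured group back in Java is Matcher.group(1). This is a minimal sketch under the assumption that the input holds a single link; the class name GroupDemo and the helper linkText are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns the text captured by group 1, or null when nothing matches.
    static String linkText(String html) {
        // Inside a Java string literal the double quotes of the regex are escaped as \".
        Pattern pattern = Pattern.compile("<a.*href=\".*\">(.+?)</a>");
        Matcher matcher = pattern.matcher(html);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(linkText("<a href=\"index.html\">index</a>")); // index
        System.out.println(linkText("no link here"));                     // null
    }
}
```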

When calling Pattern.compile, you can pass an extra parameter that controls the matching behavior of the regular expression:
Pattern Pattern.compile(String regex, int flags)

The flags value can be any of the following (several flags can be combined with |):
Pattern.CANON_EQ Two characters are considered a match if and only if their canonical decompositions are exactly the same. For example, with this flag the expression "a\u030A" matches "\u00E5" (å). By default, canonical equivalence is not taken into account.
Pattern.CASE_INSENSITIVE (?i) Lets the expression ignore case when matching. By default, case-insensitive matching only applies to the US-ASCII character set. To match Unicode characters case-insensitively, combine this flag with UNICODE_CASE.
Pattern.COMMENTS (?x) In this mode, whitespace in the pattern is ignored (translator's note: this does not mean "\\s" in an expression, but the literal spaces, tabs, carriage returns and so on in the expression). Comments start with # and run to the end of the line. This mode can also be enabled through the embedded flag.
Pattern.DOTALL (?s) In this mode, the expression '.' matches any character, including a line terminator. By default, '.' does not match line terminators.
Pattern.MULTILINE (?m) In this mode, '^' and '$' match the beginning and end of each line. '^' still matches the beginning of the string and '$' still matches the end of the string. By default, these two expressions match only the beginning and end of the whole string.
Pattern.UNICODE_CASE (?u) In this mode, if the CASE_INSENSITIVE flag is also enabled, Unicode characters are matched case-insensitively. By default, case-insensitive matching applies only to the US-ASCII character set.
Pattern.UNIX_LINES (?d) In this mode, only '\n' is recognized as a line terminator when matching '.', '^', and '$'.
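
The effect of a few of these flags can be seen in a sketch like the one below. It covers only CASE_INSENSITIVE, UNICODE_CASE, DOTALL, MULTILINE, and the embedded (?i) form; the class name FlagDemo, its helpers, and the inputs are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagDemo {
    // True when the entire input matches the regex compiled with the given flags.
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    // Counts how many times the regex is found in the input.
    static int count(String regex, int flags, String input) {
        Matcher matcher = Pattern.compile(regex, flags).matcher(input);
        int n = 0;
        while (matcher.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(matches("java", 0, "JAVA"));                         // false
        System.out.println(matches("java", Pattern.CASE_INSENSITIVE, "JAVA"));  // true
        System.out.println(matches("(?i)java", 0, "JAVA"));                     // true, embedded flag
        // Non-ASCII letters need UNICODE_CASE in addition to CASE_INSENSITIVE.
        int both = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        System.out.println(matches("\u00e9", Pattern.CASE_INSENSITIVE, "\u00c9")); // false
        System.out.println(matches("\u00e9", both, "\u00c9"));                     // true
        System.out.println(matches("a.b", 0, "a\nb"));                          // false
        System.out.println(matches("a.b", Pattern.DOTALL, "a\nb"));             // true
        System.out.println(count("^java", 0, "java\njava"));                    // 1
        System.out.println(count("^java", Pattern.MULTILINE, "java\njava"));    // 2
    }
}
```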


Setting the abstract concepts aside, here are a few simple use cases for Java regular expressions:

For example, validating the contents of a string

// Find a string that starts with Java and ends with anything
Pattern pattern = Pattern.compile("^Java.*");
Matcher matcher = pattern.matcher("Java is not human");
boolean b = matcher.matches();
// true when the condition is satisfied, otherwise false
System.out.println(b);


Splitting a string on multiple delimiters
Pattern pattern = Pattern.compile("[, |]+");
String[] strs = pattern.split("Java Hello World Java,Hello,,World|Sun");
for (int i = 0; i < strs.length; i++) {
    System.out.println(strs[i]);
}

Text replacement (first occurrence)
Pattern pattern = Pattern.compile("regular expression");
Matcher matcher = pattern.matcher("regular expression Hello World, regular expression Hello World");
// Replace the first piece of data that matches the regular expression
System.out.println(matcher.replaceFirst("Java"));

Text replacement (all occurrences)
Pattern pattern = Pattern.compile("regular expression");
Matcher matcher = pattern.matcher("regular expression Hello World, regular expression Hello World");
// Replace all data that matches the regular expression
System.out.println(matcher.replaceAll("Java"));


Text replacement (replacing match by match)
Pattern pattern = Pattern.compile("regular expression");
Matcher matcher = pattern.matcher("regular expression Hello World, regular expression Hello World");
StringBuffer sbr = new StringBuffer();
while (matcher.find()) {
    // Append the text before the match, then the replacement
    matcher.appendReplacement(sbr, "Java");
}
// Append whatever follows the last match
matcher.appendTail(sbr);
System.out.println(sbr.toString());

Validating an e-mail address

String str = "[email protected]"; // the address is obscured in the source; substitute the address to check
Pattern pattern = Pattern.compile("[\\w\\.\\-]+@([\\w\\-]+\\.)+[\\w\\-]+", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
System.out.println(matcher.matches());

Removing HTML tags
Pattern pattern = Pattern.compile("<.+?>", Pattern.DOTALL);
Matcher matcher = pattern.matcher("<a href=\"index.html\">Home</a>");
String string = matcher.replaceAll("");
System.out.println(string);

Finding a matching string in HTML
Pattern pattern = Pattern.compile("href=\"(.+?)\"");
Matcher matcher = pattern.matcher("<a href=\"index.html\">Home</a>");
if (matcher.find()) {
    System.out.println(matcher.group(1));
}

Extracting http:// addresses
// Extract URLs
Pattern pattern = Pattern.compile("(http://|https://){1}[\\w\\.\\-/:]+");
Matcher matcher = pattern.matcher("dsdsds"); // the sample text is cut off in the source; use text that contains a URL
StringBuffer buffer = new StringBuffer();
while (matcher.find()) {
    buffer.append(matcher.group());
    buffer.append("\r\n");
}
System.out.println(buffer.toString());

Replacing the text inside the specified {}

String str = "Java's history so far runs from {0} to {1}";
String[][] object = {new String[]{"\\{0\\}", "1995"}, new String[]{"\\{1\\}", "2007"}};
System.out.println(replace(str, object));

public static String replace(final String sourceString, Object[] object) {
    String temp = sourceString;
    for (int i = 0; i < object.length; i++) {
        // Each entry holds a regex and its replacement text
        String[] result = (String[]) object[i];
        Pattern pattern = Pattern.compile(result[0]);
        Matcher matcher = pattern.matcher(temp);
        temp = matcher.replaceAll(result[1]);
    }
    return temp;
}


Finding files in a specified folder whose names match a regular expression

import java.io.File;
import java.io.FileFilter;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FilesAnalyze {

    // Caches the list of matching files
    private ArrayList<File> files = new ArrayList<File>();
    // Holds the folder path
    private String _path;
    // Holds the uncompiled regular expression
    private String _regexp;

    class MyFileFilter implements FileFilter {

        /**
         * Called for each file to test whether its name matches
         */
        public boolean accept(File file) {
            try {
                Pattern pattern = Pattern.compile(_regexp);
                Matcher match = pattern.matcher(file.getName());
                return match.matches();
            } catch (Exception e) {
                // An invalid regular expression accepts every file
                return true;
            }
        }
    }

    /**
     * Builds the file list for a folder and a regular expression
     * @param path
     * @param regexp
     */
    FilesAnalyze(String path, String regexp) {
        getFileName(path, regexp);
    }

    /**
     * Analyzes the file names and adds the matching files
     * @param path
     * @param regexp
     */
    private void getFileName(String path, String regexp) {
        // Folder
        _path = path;
        _regexp = regexp;
        File directory = new File(_path);
        File[] filesFile = directory.listFiles(new MyFileFilter());
        if (filesFile == null) return;
        for (int j = 0; j < filesFile.length; j++) {
            files.add(filesFile[j]);
        }
    }

    /**
     * Displays the output information
     * @param out
     */
    public void print(PrintStream out) {
        Iterator<File> elements = files.iterator();
        while (elements.hasNext()) {
            File file = elements.next();
            out.println(file.getPath());
        }
    }

    public static void output(String path, String regexp) {
        FilesAnalyze fileGroup1 = new FilesAnalyze(path, regexp);
        fileGroup1.print(System.out);
    }

    public static void main(String[] args) {
        output("c:\\", "[a-z|.]*");
    }
}

Java regular expressions can do much more than this. In fact, when it comes to character processing, there is hardly anything they cannot do. (Of course, regular expressions do take more time to evaluate.)
