Application Analysis of Regular Expressions in Java Programs

Source: Internet
Author: User
In regular expression syntax, '\' is the escape character. Many characters have special meanings as regular expression metacharacters and therefore lose their literal meaning. For example, '*' means "match the preceding subexpression zero or more times". To match a literal '*' character, you must write '\*'. In a Java source string, however, '\' is first interpreted by the Java compiler as the start of a Unicode escape or another character escape defined by the Java language. To represent a literal '*' in a regular expression written in Java source code, you therefore need two backslashes: '\\*'. The first '\' escapes the second one, turning it into an ordinary backslash character in the string, and that backslash then serves as the escape character of the regular expression.
To represent a literal '\' character in a regular expression written in Java source code, write '\\\\'. The first and third '\' are escape characters for the Java compiler, so the string actually contains two backslashes. Of these two, the first is the escape character of the regular expression, and it escapes the second into a literal '\' in the regular expression.
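The two levels of escaping described above can be checked with a short sketch. The class and member names here (EscapeDemo, LITERAL_STAR, LITERAL_BACKSLASH) are hypothetical and chosen only for illustration.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Java source "\\*" is the two-character string \* , which the regex engine
    // reads as "a literal asterisk".
    static final String LITERAL_STAR = "\\*";

    // Java source "\\\\" is the two-character string \\ , which the regex engine
    // reads as "a literal backslash".
    static final String LITERAL_BACKSLASH = "\\\\";

    static boolean isStar(String s) {
        return Pattern.matches(LITERAL_STAR, s);
    }

    static boolean isBackslash(String s) {
        return Pattern.matches(LITERAL_BACKSLASH, s);
    }

    public static void main(String[] args) {
        System.out.println(LITERAL_STAR.length());      // 2
        System.out.println(LITERAL_BACKSLASH.length()); // 2
        System.out.println(isStar("*"));                // true
        System.out.println(isBackslash("\\"));          // true (the input is one backslash)
    }
}
```

When the text to be matched is entirely literal, the standard method Pattern.quote(String) can be used instead of escaping each metacharacter by hand.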

1. Matching and searching
In Java, two classes are commonly used for matching operations: java.util.regex.Pattern and java.util.regex.Matcher.
Pattern is the compiled representation of a regular expression. A regular expression specified as a string must first be compiled into an instance of this class.
Matcher is defined as "an engine that performs match operations on a character sequence by interpreting a Pattern".
In layman's terms, compiling a regular expression string into a Pattern instance is like building a mine detector. Calling the matcher method of that Pattern instance (its argument is like the location to be searched) is like handing the detector to a person. Calling the instance method matches() on the resulting Matcher, which returns a boolean indicating whether the input matches the pattern, is like that person carrying the detector to the location and reading it to find out whether a mine is there.
Suppose we want to determine whether a file name contains the .class suffix. Because the suffix may appear in different combinations of upper and lower case, we can proceed as follows:
1> First, construct a regular expression.
// This regular expression requires that ".class" is either the end of the input or is followed by "," or a whitespace character, after which any other text may follow.
// "." matches any character; [^...] matches any character except those listed; "+" matches one or more times; "*" matches zero or more times; {0,1} matches zero or one time.
// [cC] matches an uppercase or lowercase c and is equivalent to (c|C), where "|" means "or"; () groups a complete subexpression.
String regEx = ".*[^\\s]+\\.[cC][lL][aA][sS][sS]([,\\s].*){0,1}";
Pattern p = Pattern.compile(regEx);
Matcher m = p.matcher("file1.class,test.txt");
boolean b = m.matches();
This regular expression matches strings such as "XX.class,", "XX.class", "XX.class ", "XX.class, XXX", "XX.class XXX" and many other forms.
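As a rough check of the kinds of input listed above, the following sketch compiles the pattern once and tests several file names. It assumes the pattern reads .*[^\s]+\.[cC][lL][aA][sS][sS]([,\s].*){0,1}; the class and method names (ClassSuffixCheck, hasClassSuffix) are hypothetical.

```java
import java.util.regex.Pattern;

class ClassSuffixCheck {
    // Assumed form of the pattern discussed in the text: some non-whitespace
    // text, then ".class" in any letter case, then either the end of the input
    // or a comma/whitespace character followed by anything.
    private static final Pattern CLASS_FILE =
            Pattern.compile(".*[^\\s]+\\.[cC][lL][aA][sS][sS]([,\\s].*){0,1}");

    static boolean hasClassSuffix(String input) {
        // matches() requires the whole input to match the pattern.
        return CLASS_FILE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasClassSuffix("file1.class,test.txt")); // true
        System.out.println(hasClassSuffix("Foo.CLASS"));            // true
        System.out.println(hasClassSuffix("Foo.class bar"));        // true
        System.out.println(hasClassSuffix("Foo.classes"));          // false
        System.out.println(hasClassSuffix("Foo.txt"));              // false
    }
}
```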

Another pair of useful anchors is "^" and "$". Used together with "+" and other constructs, they can match strings that follow a specific layout, for example:
// ^ matches the beginning of the line and $ matches the end of the line; \b (not used in this pattern) matches a word boundary.
// The literals 第 and 章 are the Chinese characters that enclose a chapter number, as in 第1章 ("Chapter 1").
String regEx = "^\\s*第[1-9]{1}\\d*章\\s+[^\\s]+.*[^\\s]*\\s*$";
Pattern p = Pattern.compile(regEx);
Matcher m = p.matcher("第1章 南昌起义");
boolean b = m.matches();
This expression matches chapter titles such as "第1章 南昌起义" ("Chapter 1 Nanchang Uprising"), with or without extra surrounding whitespace. Its constraints are: the line may begin only with whitespace followed by the character 第; that character must be followed by a positive integer without a leading zero; the integer must be followed by the character 章 and then one or more whitespace characters; and the whitespace must be followed by a chapter name containing at least one non-whitespace character. For other anchors and delimiters, see the related documentation.

In addition, Pattern predefines some constants for common matching modes. For example, Pattern.CASE_INSENSITIVE makes matching case insensitive. It is passed to an overloaded form of the compile method:
String regEx = "\\.class";
Pattern p = Pattern.compile(regEx, Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(".Class");
boolean b = m.matches(); // The value of b is true.

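Once an instance of the Pattern class has been compiled, it cannot be changed: Pattern instances are immutable and thread-safe, so they can be shared by concurrent operations. Matcher instances, by contrast, are not thread-safe, so each thread should obtain its own Matcher from the shared Pattern.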
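A minimal sketch of sharing one compiled Pattern across threads is shown below. It uses a parallel stream as the source of concurrency, which is an assumption made for brevity; the class and method names (SharedPatternDemo, classFiles) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SharedPatternDemo {
    // Compiled once and shared by all threads; Pattern instances are immutable.
    private static final Pattern CLASS_SUFFIX =
            Pattern.compile(".*\\.class", Pattern.CASE_INSENSITIVE);

    static List<String> classFiles(List<String> names) {
        return names.parallelStream()
                // Each call to matcher() creates a fresh Matcher, so no Matcher
                // is ever shared between threads.
                .filter(name -> CLASS_SUFFIX.matcher(name).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("A.class", "b.txt", "C.CLASS");
        System.out.println(classFiles(names)); // [A.class, C.CLASS]
    }
}
```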
For a pattern that is used only once, the Pattern class provides the static convenience method matches(String regex, CharSequence input). This method compiles the expression and matches the input sequence against it in a single call. The statement
boolean b = Pattern.matches("a*b", "aaaaab");
is equivalent to the three-statement sequence of compile, matcher and matches used in the examples above.
However, for repeated matching it is less efficient, because it does not allow the compiled pattern to be reused.
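The equivalence can be sketched as follows, with the one-shot form next to the reusable form. The class and method names (OneShotVsCompiled, oneShot, compiled) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OneShotVsCompiled {
    // One-shot form: the expression is compiled again on every call.
    static boolean oneShot(String input) {
        return Pattern.matches("a*b", input);
    }

    // Reusable form: the expression is compiled once and kept.
    private static final Pattern A_STAR_B = Pattern.compile("a*b");

    static boolean compiled(String input) {
        Matcher m = A_STAR_B.matcher(input);
        return m.matches();
    }

    public static void main(String[] args) {
        System.out.println(oneShot("aaaaab"));   // true
        System.out.println(compiled("aaaaab"));  // true
        System.out.println(compiled("aaaaabc")); // false: the whole input must match
    }
}
```

Both forms give the same result; the second simply avoids recompiling the expression when the same pattern is applied to many inputs.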

Another way to perform a matching check is to use the matches(String regex) method of String.
String s = ".Class";
boolean b = s.matches("\\.[cC][lL][aA][sS][sS]");
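One point worth illustrating is that String.matches, like Matcher.matches(), succeeds only when the entire string matches. The sketch below contrasts it with Matcher.find(), which searches for a match anywhere in the input. It assumes the intended pattern is \.[cC][lL][aA][sS][sS]; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class StringMatchesDemo {
    // Assumed pattern: ".class" in any combination of upper and lower case.
    static final String SUFFIX = "\\.[cC][lL][aA][sS][sS]";

    // True only if the whole string is ".class" (in any letter case).
    static boolean wholeStringMatches(String s) {
        return s.matches(SUFFIX);
    }

    // True if ".class" (in any letter case) occurs anywhere in the string.
    static boolean containsMatch(String s) {
        return Pattern.compile(SUFFIX).matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeStringMatches(".Class"));    // true
        System.out.println(wholeStringMatches("Foo.Class")); // false
        System.out.println(containsMatch("Foo.Class"));      // true
    }
}
```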

2. String replacement and splitting with the String class
The String class has two replacement methods and two splitting methods that accept regular expressions.
1> String replaceAll(String regex, String replacement): replaces every substring of this string that matches the given regular expression with the given replacement string.
2> String replaceFirst(String regex, String replacement): replaces the first substring of this string that matches the given regular expression with the given replacement string.
3> String[] split(String regex): splits this string around matches of the given regular expression.
4> String[] split(String regex, int limit): splits this string around matches of the given regular expression. The limit parameter controls the length of the returned string array.

These methods all take a regular expression written in the same way. For example, given the pattern "\\.[cC][lL][aA][sS][sS]" and the replacement string ".class", the first three methods do the following, in order: replace every case-insensitive occurrence of ".class" in the string, replace only the first such occurrence, and split the string at each case-insensitive occurrence of ".class".
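A brief sketch of the first three methods follows, assuming the pattern is \.[cC][lL][aA][sS][sS] and the replacement string is ".class". The input strings and the class name ReplaceSplitDemo are invented for illustration.

```java
import java.util.Arrays;

class ReplaceSplitDemo {
    // Assumed pattern: ".class" in any combination of upper and lower case.
    static final String REGEX = "\\.[cC][lL][aA][sS][sS]";

    // Replace every case variant of ".class" with the lowercase form.
    static String normalizeAll(String s) {
        return s.replaceAll(REGEX, ".class");
    }

    // Replace only the first case variant of ".class".
    static String normalizeFirst(String s) {
        return s.replaceFirst(REGEX, ".class");
    }

    // Split the string at every case variant of ".class".
    static String[] splitOnSuffix(String s) {
        return s.split(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(normalizeAll("A.CLASS B.Class"));   // A.class B.class
        System.out.println(normalizeFirst("A.CLASS B.CLASS")); // A.class B.CLASS
        // The trailing empty string after the last delimiter is dropped.
        System.out.println(Arrays.toString(splitOnSuffix("A.class B.CLASS"))); // [A,  B]
    }
}
```

Note that in replaceAll and replaceFirst the replacement string gives special meaning to '\' and '$'; the replacement ".class" contains neither, so it is used literally.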

The split(String regex, int limit) method is slightly special because of its additional limit parameter.
The documentation explains it as follows:
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero, the pattern is applied at most n-1 times, the array's length is no greater than n, and the array's last entry contains all input beyond the last matched delimiter. If n is non-positive, the pattern is applied as many times as possible and the array can have any length. If n is zero, the pattern is applied as many times as possible, the array can have any length, and trailing empty strings are discarded.
For example:
"testatestatestatest".split("A|a", 2); // The result is "test", "testatestatest"
"testatestatestatest".split("A|a", 3); // The result is "test", "test", "testatest"

In addition, this method has the same effect as the split method of Pattern:
Pattern.compile("A|a").split("testatestatestatest", 2);
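The effect of positive, zero and negative limits can be compared in one small sketch. The second input string, which ends with a delimiter, and the names SplitLimitDemo and splitOnA are assumptions added for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitLimitDemo {
    // Split on an uppercase or lowercase "a" with the given limit.
    static String[] splitOnA(String input, int limit) {
        return input.split("A|a", limit);
    }

    public static void main(String[] args) {
        String s = "testatestatestatest";
        System.out.println(Arrays.toString(splitOnA(s, 2))); // [test, testatestatest]
        System.out.println(Arrays.toString(splitOnA(s, 3))); // [test, test, testatest]

        // An input that ends with a delimiter yields a trailing empty string.
        // A limit of 0 discards it; a negative limit keeps it.
        System.out.println(Arrays.toString(splitOnA("testatesta", 0)));  // [test, test]
        System.out.println(Arrays.toString(splitOnA("testatesta", -1))); // [test, test, ]

        // Pattern.split behaves the same way as String.split.
        System.out.println(Arrays.toString(Pattern.compile("A|a").split(s, 2))); // [test, testatestatest]
    }
}
```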

Sometimes, to match line breaks in text taken from certain text editors or databases, use the pattern \n|\r\n, which covers both Unix-style and Windows-style line endings.
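As a small sketch of this pattern in use, the following splits text with mixed line endings into lines. The sample text and the names LineBreakDemo and lines are invented for illustration.

```java
import java.util.Arrays;

class LineBreakDemo {
    // Split on either "\n" or "\r\n". In Java source the regex \n|\r\n is
    // written with doubled backslashes.
    static String[] lines(String text) {
        return text.split("\\n|\\r\\n");
    }

    public static void main(String[] args) {
        String text = "line1\r\nline2\nline3";
        System.out.println(Arrays.toString(lines(text))); // [line1, line2, line3]
    }
}
```

Since Java 8, the predefined linebreak matcher \R (written "\\R" in source code) can be used for the same purpose and also covers other Unicode line terminators.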
