This article describes how to use regular expressions in Java to work with text data. A regular expression is itself a string, but unlike an ordinary string it is an abstraction of a set of similar strings, such as the following:
a98b c0912d c10b a12345678d ab
Looking at the five strings above, we see that they share a common feature: the first character must be 'a' or 'c', the last character must be 'b' or 'd', and the middle is any number of digits (including zero digits). Abstracting this common feature produces a regular expression: [ac]\d*[bd] (written "[ac]\\d*[bd]" as a Java string literal). Based on this regular expression, we can write an unlimited number of strings that satisfy the condition.
There are many ways to use regular expressions in Java, and the simplest is to use them through strings. Four methods of the String class accept regular expressions: matches, split, replaceAll, and replaceFirst.
First, the matches method
The matches method determines whether the current string matches a given regular expression. It returns true if the string matches and false otherwise. The matches method is defined as follows:
public boolean matches(String regex)
We can verify the regular expression given above with the following program.
String[] ss = new String[]{"a98b", "c0912d", "c10b", "a12345678d", "ab"};
for (String s : ss)
    System.out.println(s.matches("[ac]\\d*[bd]"));
Output results:
true
true
true
true
true
Here is a brief explanation of what this regular expression means. Anyone who has studied lexical analysis in a compiler course will find it easy to understand, because the notation is similar. A character class [...] is equivalent to joining the characters inside it with "|": [abcd] is equivalent to a|b|c|d, that is, a or b or c or d. The regular expression above begins with [ac], which means the string must begin with a or c; the [bd] at the end means the string must end with b or d. The \d in the middle stands for a digit from 0 to 9. Because \ is also the escape character in Java string literals, it must be written as \\ in source code to express a single \. Finally, * means zero or more repetitions (called the * closure in lexical analysis); since * follows \d, the expression allows zero or more digits in the middle.
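As a minimal sketch of the points above, the following program shows that matches tests the whole string against the expression, not just a part of it. The class name MatchesDemo, the helper check, and the extra sample strings are illustrative assumptions and do not come from the original article.

```java
public class MatchesDemo {
    // The regular expression [ac]\d*[bd], written as a Java string literal.
    static final String REGEX = "[ac]\\d*[bd]";

    // Hypothetical helper: true only if the WHOLE string matches REGEX.
    static boolean check(String s) {
        return s.matches(REGEX);
    }

    public static void main(String[] args) {
        String[] samples = {"a98b", "ab", "a98bx", "e98b", "a9x8b"};
        for (String s : samples) {
            System.out.println(s + " -> " + check(s));
        }
    }
}
```

Only the first two samples match: "a98bx" has an extra trailing character, "e98b" starts with the wrong letter, and "a9x8b" has a non-digit in the middle.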
Second, the split method
The split method uses a regular expression to split the string and returns the result as a String array. split has two overloaded forms, defined as follows:
public String[] split(String regex)
public String[] split(String regex, int limit)
The following code will use the first overloaded form of split to split the first line of the HTTP request header, as follows:
String s = "GET /index.html HTTP/1.1";
String ss[] = s.split(" +");
for (String str : ss)
    System.out.println(str);
Output results:
GET
/index.html
HTTP/1.1
When using the first overloaded form of split, note that empty strings at the end of the result are discarded. If you use the regular expression \d to split the string a0b1c3456, the resulting array has a length of 3 instead of 7.
The second overloaded form of split has a limit parameter, which behaves differently in three cases:
1. Greater than 0: if the value of limit is n, the regular expression is applied at most n-1 times, as in the following code:
String s = "a0b1c3456";
String ss[] = s.split("\\d", 3);
for (String str : ss)
    System.out.println(str);
Output:
a
b
c3456
The result shows that the program applies the regular expression to "a0b1c3456" only twice. After the character '1' has been scanned, the rest of the string is returned as a whole as the last element of the array, whether or not it contains further matches.
2. Less than 0: the regular expression is applied as many times as possible and empty strings at the end are not discarded. For the example above, the length of the returned array is 7 instead of 3.
3. Equal to 0: this is the default, equivalent to the first overloaded form of split.
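The three cases can be compared side by side with a short sketch. The class name SplitLimitDemo and the helper splitDigits are illustrative assumptions; only String.split itself comes from the article.

```java
import java.util.Arrays;

public class SplitLimitDemo {
    // Hypothetical helper: split on single digits with the given limit.
    static String[] splitDigits(String s, int limit) {
        return s.split("\\d", limit);
    }

    public static void main(String[] args) {
        String s = "a0b1c3456";
        // limit > 0: at most 3 elements, the remainder is kept whole.
        System.out.println(Arrays.toString(splitDigits(s, 3))); // [a, b, c3456]
        // limit < 0: trailing empty strings are kept.
        System.out.println(splitDigits(s, -1).length);          // 7
        // limit == 0: trailing empty strings are discarded.
        System.out.println(splitDigits(s, 0).length);           // 3
    }
}
```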
Third, the replaceAll and replaceFirst methods
The two methods are defined as follows:
public String replaceAll(String regex, String replacement)
public String replaceFirst(String regex, String replacement)
Both methods replace the parts of the current string that match regex with the replacement string: replaceAll replaces every match, while replaceFirst replaces only the first one. They are simple to use and are not described in detail here; interested readers can refer to the relevant documentation.
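For a quick feel of the difference between the two methods, here is a small sketch. The class name ReplaceDemo, the helper names, and the choice of \d+ as the pattern are illustrative assumptions rather than part of the original text.

```java
public class ReplaceDemo {
    // Hypothetical helper: replace every run of digits with '#'.
    static String maskAll(String s) {
        return s.replaceAll("\\d+", "#");
    }

    // Hypothetical helper: replace only the first run of digits with '#'.
    static String maskFirst(String s) {
        return s.replaceFirst("\\d+", "#");
    }

    public static void main(String[] args) {
        String s = "a0b1c3456";
        System.out.println(maskAll(s));   // a#b#c#
        System.out.println(maskFirst(s)); // a#b1c3456
    }
}
```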
In Java, tasks such as finding out whether a given string contains a particular character or substring, splitting a string, or replacing or deleting some of its characters are typically implemented with if-else and for, as follows:
public class Test {
    public static void main(String[] args) {
        String str = "@Shang Hai Hong Qiao Fei Ji Chang";
        boolean rs = false;
        // Scan character by character until 'a' or 'F' is found.
        for (int i = 0; i < str.length(); i++) {
            char z = str.charAt(i);
            if ('a' == z || 'F' == z) {
                rs = true;
                break;
            } else {
                rs = false;
            }
        }
        System.out.println(rs);
    }
}
This approach is simple and intuitive, but it copes poorly with complex tasks: the amount of code grows quickly, which makes maintenance harder.
In such cases we can use regular expressions to implement the same functions with code that is easier to maintain. Some common regular expression operations on strings in Java are shown below (they use the java.util.regex package):
1. Find a character or a substring in a string
String s = "@Shang Hai Qiao Fei Ji Chang";
String regEx = "a|F"; // matches a or F
Pattern pat = Pattern.compile(regEx);
Matcher mat = pat.matcher(s);
boolean rs = mat.find();
If s contains a match for regEx, rs is true; otherwise it is false.
If you want to ignore case when searching, you can write Pattern pat = Pattern.compile(regEx, Pattern.CASE_INSENSITIVE);
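The effect of the CASE_INSENSITIVE flag can be seen in a short sketch. The class name FindDemo, the helper contains, and the search term "HAI" are illustrative assumptions.

```java
import java.util.regex.Pattern;

public class FindDemo {
    // Hypothetical helper: does s contain a match for regEx?
    static boolean contains(String s, String regEx, boolean ignoreCase) {
        int flags = ignoreCase ? Pattern.CASE_INSENSITIVE : 0;
        return Pattern.compile(regEx, flags).matcher(s).find();
    }

    public static void main(String[] args) {
        String s = "@Shang Hai Qiao Fei Ji Chang";
        System.out.println(contains(s, "HAI", false)); // false: case differs
        System.out.println(contains(s, "HAI", true));  // true: case is ignored
    }
}
```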
2. Extract part of a string, such as the file name in a path
String regEx = ".+/(.+)$";
String s = "c:/test.txt";
Pattern pat = Pattern.compile(regEx);
Matcher mat = pat.matcher(s);
boolean rs = mat.find();
for (int i = 1; i <= mat.groupCount(); i++) {
    System.out.println(mat.group(i));
}
The output is test.txt. The extracted strings are stored in mat.group(i), where the maximum value of i is mat.groupCount().
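The same idea extends to several capture groups. The sketch below collects all groups into a list and calls group only after find has succeeded, since calling it without a successful match throws an IllegalStateException. The class name GroupDemo, the helper groups, and the two-group pattern are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Hypothetical helper: capture groups of the first match, or an empty list.
    static List<String> groups(String regEx, String s) {
        List<String> result = new ArrayList<>();
        Matcher mat = Pattern.compile(regEx).matcher(s);
        if (mat.find()) { // group(i) is only valid after a successful match
            for (int i = 1; i <= mat.groupCount(); i++) {
                result.add(mat.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Two groups: directory part and file name.
        System.out.println(groups("(.+)/(.+)$", "c:/docs/test.txt")); // [c:/docs, test.txt]
        System.out.println(groups(".+/(.+)$", "no-slash-here"));      // []
    }
}
```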
3. Split a string
String regEx = ":";
Pattern pat = Pattern.compile(regEx);
String[] rs = pat.split("aa:bb:cc");
After execution, rs is {"aa", "bb", "cc"}.
For a simple split like the one above, we would generally use the following shorter method:
String s = "aa:bb:cc";
String[] rs = s.split(":");
4. Replace or delete
String regEx = "@+"; // matches one or more @
Pattern pat = Pattern.compile(regEx);
Matcher mat = pat.matcher("@@aa@b cc@@");
String s = mat.replaceAll("#");
The result is "#aa#b cc#": because the pattern is @+, each run of consecutive @ characters is replaced by a single #.
If you want to delete the @ characters in the string, you can replace them with an empty string:
String s = mat.replaceAll("");
The result is "aab cc".
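The difference between the patterns @+ and @ is easy to miss, so here is a small sketch that runs both. The class name ReplaceAtDemo, the helper replace, and the input string "@@aa@b cc@@" are illustrative assumptions.

```java
import java.util.regex.Pattern;

public class ReplaceAtDemo {
    // Hypothetical helper: replace every match of regEx in input.
    static String replace(String regEx, String input, String replacement) {
        return Pattern.compile(regEx).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String input = "@@aa@b cc@@";
        System.out.println(replace("@+", input, "#")); // #aa#b cc#   (one # per run of @)
        System.out.println(replace("@", input, "#"));  // ##aa#b cc## (one # per @)
        System.out.println(replace("@+", input, ""));  // aab cc      (all @ deleted)
    }
}
```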
Note: about the Pattern class
1. public final class java.util.regex.Pattern is the compiled representation of a regular expression. The following statement creates a Pattern object and assigns it to the reference pat:
Pattern pat = Pattern.compile(regEx);
Interestingly, Pattern is a final class and its constructor is private. The reasons belong to the topic of design patterns, which you can look up separately. The conclusion here is that the Pattern class cannot be inherited from, and we cannot create Pattern objects with new. Instead, the Pattern class provides two overloaded static compile methods whose return value is (a reference to) a Pattern object, such as:
public static Pattern compile(String regex) {
    return new Pattern(regex, 0);
}
Of course, we can still declare a reference of type Pattern, such as Pattern pat = null;
2. pat.matcher(str) creates a matcher that matches the string str against the pattern; its return value is a reference to a Matcher object. We can simply write:
boolean rs = Pattern.compile(regEx).matcher(str).find();
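Because compile is the only way to obtain a Pattern, a common arrangement is to compile an expression once and reuse the resulting object for many inputs. The following is a sketch of that arrangement, not something prescribed by the article; the class name PatternReuseDemo, the constant SEPARATOR, and the helper methods are illustrative assumptions.

```java
import java.util.regex.Pattern;

public class PatternReuseDemo {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern SEPARATOR = Pattern.compile(":");

    // Hypothetical helper: number of ':'-separated fields in a line.
    static int countFields(String line) {
        return SEPARATOR.split(line).length;
    }

    // Hypothetical helper: does the line contain the separator at all?
    static boolean hasSeparator(String line) {
        return SEPARATOR.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(countFields("aa:bb:cc")); // 3
        System.out.println(hasSeparator("aa"));      // false
    }
}
```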
Attached: commonly used regular expressions
Matching specific numbers:
^[1-9]\d*$ // matches a positive integer
^-[1-9]\d*$ // matches a negative integer
^-?[1-9]\d*$ // matches an integer
^[1-9]\d*|0$ // matches a non-negative integer (positive integer + 0)
^-[1-9]\d*|0$ // matches a non-positive integer (negative integer + 0)
^[1-9]\d*\.\d*|0\.\d*[1-9]\d*$ // matches a positive floating-point number
^-([1-9]\d*\.\d*|0\.\d*[1-9]\d*)$ // matches a negative floating-point number
^-?([1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0)$ // matches a floating-point number
^[1-9]\d*\.\d*|0\.\d*[1-9]\d*|0?\.0+|0$ // matches a non-negative floating-point number (positive floating-point number + 0)
^(-([1-9]\d*\.\d*|0\.\d*[1-9]\d*))|0?\.0+|0$ // matches a non-positive floating-point number (negative floating-point number + 0)
Commentary: these are useful when processing large amounts of data, but they need to be corrected for the specific application. In particular, in a pattern of the form ^A|B$ each anchor applies to only one alternative; write ^(A|B)$ when the whole input must match.
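One correction that is often needed concerns anchors combined with alternation. The sketch below compares the non-negative integer pattern as listed with a parenthesized variant, using find; the class name AnchorDemo, the constant names, and the variant STRICT are illustrative assumptions, not part of the original list.

```java
import java.util.regex.Pattern;

public class AnchorDemo {
    // As listed: ^ binds only to the first alternative, $ only to the second.
    static final Pattern LOOSE = Pattern.compile("^[1-9]\\d*|0$");
    // Assumed correction: both anchors apply to the whole alternation.
    static final Pattern STRICT = Pattern.compile("^([1-9]\\d*|0)$");

    static boolean looseFind(String s) {
        return LOOSE.matcher(s).find();
    }

    static boolean strictFind(String s) {
        return STRICT.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(looseFind("12abc"));  // true: "12" matches ^[1-9]\d*
        System.out.println(strictFind("12abc")); // false
        System.out.println(strictFind("120"));   // true
    }
}
```

When the pattern is used through String.matches the difference disappears, because matches always requires the whole string to match.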
Matching specific strings:
^[A-Za-z]+$ // matches a string consisting of the 26 English letters
^[A-Z]+$ // matches a string consisting of the 26 uppercase English letters
^[a-z]+$ // matches a string consisting of the 26 lowercase English letters
^[A-Za-z0-9]+$ // matches a string consisting of digits and the 26 English letters
^\w+$ // matches a string consisting of digits, the 26 English letters, or underscores
The validation tasks and their expressions when using the RegularExpressionValidator validation control are listed below:
Only digits can be entered: "^[0-9]*$"
Only n digits can be entered: "^\d{n}$"
At least n digits must be entered: "^\d{n,}$"
Only m to n digits can be entered: "^\d{m,n}$"
Only zero or a number not beginning with zero can be entered: "^(0|[1-9][0-9]*)$"
Only positive real numbers with two decimal places can be entered: "^[0-9]+(\.[0-9]{2})?$"
Only positive real numbers with 1 to 3 decimal places can be entered: "^[0-9]+(\.[0-9]{1,3})?$"
Only non-zero positive integers can be entered: "^\+?[1-9][0-9]*$"
Only non-zero negative integers can be entered: "^-[1-9][0-9]*$"
Only strings of length 3 can be entered: "^.{3}$"
Only strings consisting of the 26 English letters can be entered: "^[A-Za-z]+$"
Only strings consisting of the 26 uppercase English letters can be entered: "^[A-Z]+$"
Only strings consisting of the 26 lowercase English letters can be entered: "^[a-z]+$"
Only strings consisting of digits and the 26 English letters can be entered: "^[A-Za-z0-9]+$"
Only strings consisting of digits, the 26 English letters, or underscores can be entered: "^\w+$"
Verify a user password: "^[a-zA-Z]\w{5,17}$". The correct format begins with a letter, has a length of 6 to 18, and contains only letters, digits, and underscores.
Verify whether the input contains characters such as ^%&',;=?$\": "[^%&',;=?$\x22]+"
Only Chinese characters can be entered: "^[\u4e00-\u9fa5]{0,}$"
Verify an email address: "^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$"
Verify an Internet URL: "^http://([\w-]+\.)+[\w-]+(/[\w-./?%&=]*)?$"
Verify a phone number: "^(\(\d{3,4}\)|\d{3,4}-)?\d{7,8}$". The correct formats are "XXXX-XXXXXXX", "XXXX-XXXXXXXX", "XXX-XXXXXXX", "XXX-XXXXXXXX", "XXXXXXX", and "XXXXXXXX".
Verify an ID number (15 or 18 digits): "^(\d{15}|\d{18})$"
Verify the 12 months of a year: "^(0?[1-9]|1[0-2])$". The correct format is "01" to "09" and "1" to "12".
Verify the 31 days of a month: "^((0?[1-9])|((1|2)[0-9])|30|31)$". The correct format is "01" to "09" and "1" to "31".
Regular expression matching Chinese characters: [\u4e00-\u9fa5]
Matching double-byte characters (including Chinese characters): [^\x00-\xff]
Regular expression matching a blank line: \n[\s| ]*\r
Regular expression matching HTML tags: /<(.*)>.*<\/\1>|<(.*) \/>/
Regular expression matching leading and trailing whitespace: (^\s*)|(\s*$)
Regular expression matching an email address: \w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
Regular expression matching a URL: http://([\w-]+\.)+[\w-]+(/[\w-./?%&=]*)?
Refer to the JDK documentation for details of Java regular expressions.