Processing Text Data with Regular Expressions in Java
This article describes how to use regular expressions in Java to process text data. A regular expression is itself a string, but unlike an ordinary string, it is an abstraction of a group of similar strings. Consider the following five strings:
a98b c0912d c10b a12345678d ab
Looking closely at these five strings, we can see that they share a common pattern: the first character is 'a' or 'c', the last character is 'b' or 'd', and between them there is any number of digits (possibly none). Abstracting this common pattern produces the regular expression [ac]\d*[bd]. From this single expression we can generate infinitely many strings that satisfy the same conditions.
Java offers several ways to use regular expressions. The simplest is through the String class, which has four methods that accept a regular expression: matches, split, replaceAll, and replaceFirst.
I. matches Method
The matches method determines whether the entire current string matches the given regular expression, returning true if it does and false otherwise. It is declared as follows:
public boolean matches(String regex)
We can verify the regular expression above with the following program:
String[] ss = new String[]{"a98b", "c0912d", "c10b", "a12345678d", "ab"};
for (String s : ss)
    System.out.println(s.matches("[ac]\\d*[bd]"));
Output:
true
true
true
true
true
Here is a brief explanation of this regular expression. Readers who have studied lexical analysis in compiler theory will find it easy to follow, because regular expressions are written much like the patterns used in lexical analysis. Square brackets [...] express a choice among characters, much like "|": for example, [abcd] is equivalent to a|b|c|d, that is, one of a, b, c, or d. The first part of the expression, [ac], means the string must start with a or c, and the last part, [bd], means it must end with b or d. The \d in the middle matches a single digit from 0 to 9. In Java source code it is written "\\d" because the backslash is an escape character in Java string literals, so it must itself be escaped for the regex engine to receive \d. Finally, * means the preceding element may occur zero or more times (called the * closure, or Kleene closure, in lexical analysis). Because * follows \d, the middle of the string may consist of zero or more digits.
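The strings above all match, so it can help to look at some that do not. The sketch below checks a few near misses against the same pattern; the class name `MatchesDemo`, the helper `fits`, and the sample strings are illustrative choices, not part of the original article.

```java
public class MatchesDemo {
    // The article's pattern: 'a' or 'c', then zero or more digits, then 'b' or 'd'.
    static final String REGEX = "[ac]\\d*[bd]";

    // String.matches returns true only if the ENTIRE string fits the pattern.
    static boolean fits(String s) {
        return s.matches(REGEX);
    }

    public static void main(String[] args) {
        // Illustrative inputs: one valid string and several near misses.
        String[] samples = {"cd", "a12x", "b12d", "xa98b", "a9 8b"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + fits(s));
        }
    }
}
```

Only "cd" prints true. "xa98b" fails even though it contains a98b, because matches requires the whole string to match, not just a part of it.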
II. split Method
The split method splits a string around the matches of a regular expression and returns the pieces as a String array. It has two overloads, declared as follows:
public String[] split(String regex)
public String[] split(String regex, int limit)
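The following code uses the first overload of split to break the first line of an HTTP request (the request line) into its parts. The regular expression " +" matches one or more spaces, so a run of spaces counts as a single separator: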
String s = "GET /index.html HTTP/1.1";
String[] ss = s.split(" +");
for (String str : ss)
    System.out.println(str);
Output:
GET
/index.html
HTTP/1.1
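The choice of " +" rather than a single " " matters when fields are separated by more than one space. A minimal sketch, assuming a hypothetical request line with two spaces after GET; the class and method names are also illustrative:

```java
import java.util.Arrays;

public class SplitSpacesDemo {
    // Split on exactly one space: two adjacent spaces produce an empty string.
    static String[] splitSingle(String line) {
        return line.split(" ");
    }

    // Split on one or more spaces: a run of spaces acts as one separator.
    static String[] splitRun(String line) {
        return line.split(" +");
    }

    public static void main(String[] args) {
        // Hypothetical input with two spaces between "GET" and the path.
        String line = "GET  /index.html HTTP/1.1";
        System.out.println(Arrays.toString(splitSingle(line))); // [GET, , /index.html, HTTP/1.1]
        System.out.println(Arrays.toString(splitRun(line)));    // [GET, /index.html, HTTP/1.1]
    }
}
```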
When using the first overload of split, note that trailing empty strings are discarded from the result. For example, splitting the string a0b1c3456 with the regular expression \d yields an array of length 3 rather than 7.
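The sketch below reproduces that example and adds one more case to show that only trailing empty strings are dropped: an empty string between two adjacent separators is kept. The class name `TrailingEmptyDemo` and the helper `splitDigits` are hypothetical.

```java
public class TrailingEmptyDemo {
    // One-argument split on single digits, as in the article's example.
    static String[] splitDigits(String s) {
        return s.split("\\d");
    }

    public static void main(String[] args) {
        // \d matches '0', '1', '3', '4', '5', '6', which would give seven pieces:
        // "a", "b", "c", "", "", "", "". The four trailing empty strings are dropped.
        String[] parts = splitDigits("a0b1c3456");
        System.out.println(parts.length); // 3

        // The empty string between the two adjacent '0' separators is not trailing,
        // so it is kept: "a", "", "b".
        String[] inner = splitDigits("a00b");
        System.out.println(inner.length); // 3
    }
}
```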
The second overload takes an additional limit parameter, whose behavior falls into three cases:
1. Greater than 0: if limit is n, the pattern is applied at most n-1 times, so the array has at most n elements, and the last element contains all of the input after the last matched separator. For example:
String s = "a0b1c3456";
String[] ss = s.split("\\d", 3);
for (String str : ss)
    System.out.println(str);
Output:
a
b
c3456
The output shows that the pattern was applied only twice to "a0b1c3456". After the match at the character '1', the rest of the string is returned as the last element of the array, whether or not it contains further matches.
2. Less than 0: the pattern is applied as many times as possible, and trailing empty strings are kept. In the a0b1c3456 example, the returned array would therefore have length 7, not 3.
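3. Equal to 0: the pattern is applied as many times as possible, and trailing empty strings are discarded. This is the same behavior as the first overload, since split(regex) works like split(regex, 0).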
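To compare the three cases side by side, the sketch below splits the same string with a positive, a negative, and a zero limit. The class name `SplitLimitDemo` and the helper `parts` are illustrative.

```java
import java.util.Arrays;

public class SplitLimitDemo {
    // Split the article's sample string on single digits with the given limit.
    static String[] parts(int limit) {
        return "a0b1c3456".split("\\d", limit);
    }

    public static void main(String[] args) {
        // 3: at most 3 elements, the last one holds the unsplit remainder.
        // -1: every match is used and trailing empty strings are kept.
        // 0: every match is used and trailing empty strings are dropped.
        for (int limit : new int[]{3, -1, 0}) {
            String[] p = parts(limit);
            System.out.println(limit + " -> length " + p.length + ": " + Arrays.toString(p));
        }
    }
}
```

With limit 3 the result is [a, b, c3456] (length 3); with -1 it has length 7, ending in four empty strings; with 0 it is [a, b, c], identical to the one-argument form.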
III. replaceAll and replaceFirst Methods
The two methods are defined as follows:
public String replaceAll(String regex, String replacement)
public String replaceFirst(String regex, String replacement)
Both methods replace the parts of the current string that match regex with replacement: replaceAll replaces every match, while replaceFirst replaces only the first one. They are straightforward to use and are not covered in detail here; interested readers can consult the relevant documentation.
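A short illustration of the difference between the two methods follows; the class name, method names, and inputs are hypothetical. It also shows one detail worth knowing: in the replacement string, $n refers to capturing group n and a backslash escapes the next character, so replacement text that contains $ or \ literally should first be passed through Matcher.quoteReplacement.

```java
import java.util.regex.Matcher;

public class ReplaceDemo {
    // Replace every run of digits with "#".
    static String replaceEveryNumber(String s) {
        return s.replaceAll("\\d+", "#");
    }

    // Replace only the first run of digits with "#".
    static String replaceFirstNumber(String s) {
        return s.replaceFirst("\\d+", "#");
    }

    // In the replacement, $1, $2, $3 refer to the three capturing groups.
    static String reorderDate(String isoDate) {
        return isoDate.replaceAll("(\\d+)-(\\d+)-(\\d+)", "$3/$2/$1");
    }

    // quoteReplacement escapes '$' and '\' so the literal text is inserted as-is.
    static String replaceWithLiteral(String s, String literal) {
        return s.replaceAll("\\d+", Matcher.quoteReplacement(literal));
    }

    public static void main(String[] args) {
        System.out.println(replaceEveryNumber("a12b345c6"));        // a#b#c#
        System.out.println(replaceFirstNumber("a12b345c6"));        // a#b345c6
        System.out.println(reorderDate("2024-01-31"));              // 31/01/2024
        System.out.println(replaceWithLiteral("cost: 5", "$5.00")); // cost: $5.00
    }
}
```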