JDK 1.4 introduced built-in regular expressions, and the String class gained several regex-related methods, such as matches, replaceAll, and split, which are very convenient in everyday use.
In my work, I found that mastering some less commonly used skills can often greatly improve efficiency. The following is my summary:
1. Make proper use of embedded flags
In some cases, we may encounter the following situation:
We need to match a string such as abcdefg, where abc must be lowercase, def may be in any case, and g must be lowercase.
The common approach to this kind of problem is to compile a pattern as follows:
Pattern p = Pattern.compile("abcdefg", Pattern.CASE_INSENSITIVE);
However, this does not meet our requirements, because the flag makes the whole pattern case-insensitive. In this case, an embedded flag can be used.
Check the JDK documentation. Under "Special constructs (non-capturing)" you will find:
(?idmsux-idmsux) Nothing, but turns match flags on - off
(?idmsux-idmsux:X) X, as a non-capturing group with the given flags on - off
The meaning of each letter is:
Embedded flag   Construction flag          Meaning
i               Pattern.CASE_INSENSITIVE   Enables case-insensitive matching.
d               Pattern.UNIX_LINES         Enables Unix lines mode.
m               Pattern.MULTILINE          Enables multiline mode.
s               Pattern.DOTALL             Enables "." to match line terminators.
u               Pattern.UNICODE_CASE       Enables Unicode-aware case folding.
x               Pattern.COMMENTS           Permits whitespace and comments in the pattern.
---             Pattern.CANON_EQ           Enables canonical equivalence.
With this, we can write a pattern string that meets the requirements:
Pattern p = Pattern.compile("abc(?i)def(?-i)g");
In this pattern, we turn CASE_INSENSITIVE on just before the def substring and turn it off again afterwards. This ensures that abc and g must be lowercase, while def may be in any case.
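The behavior described above can be checked with a small sketch. The class name EmbeddedFlagDemo and the helper matches are illustrative names, not part of the JDK.

```java
import java.util.regex.Pattern;

class EmbeddedFlagDemo {
    // (?i) turns case-insensitive matching on; (?-i) turns it off again.
    static final Pattern P = Pattern.compile("abc(?i)def(?-i)g");

    // Returns true only if the whole string matches the pattern.
    static boolean matches(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("abcdefg")); // true
        System.out.println(matches("abcDEFg")); // true: def may be in any case
        System.out.println(matches("ABCdefg")); // false: abc must be lowercase
        System.out.println(matches("abcdefG")); // false: g must be lowercase
    }
}
```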
Some people may think there is no need for anything so complicated, and that writing the pattern directly as
Pattern p = Pattern.compile("abc[dD][eE][fF]g");
solves the problem just as well.
Indeed, in this case the two approaches have the same effect. However, sometimes we cannot know the content of the string in advance (for example, code that extracts a node, that is, the content between a pair of tags, where the tag is determined dynamically at runtime). In that case, only a match flag will do.
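As a rough sketch of the runtime-tag case, the helper below builds the pattern from a tag name that is only known at runtime. The class TagExtractor and the method extract are hypothetical names. The sketch also relies on Pattern.quote, which was added in Java 5 (later than JDK 1.4), and it assumes simple, non-nested tags without attributes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagExtractor {
    // Hypothetical helper: returns the text between <tag> and </tag>,
    // ignoring the case of the tag name, or null if the tag is not found.
    static String extract(String text, String tag) {
        // Quote the tag so that regex metacharacters in it are taken literally.
        String quoted = Pattern.quote(tag);
        // (?i): case-insensitive tag names; (?s): "." also matches line breaks.
        Pattern p = Pattern.compile("(?is)<" + quoted + ">(.*?)</" + quoted + ">");
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The tag name is supplied at runtime, so a hand-written [tT][iI]... class is not an option.
        System.out.println(extract("<TITLE>Hello\nWorld</title>", "title"));
    }
}
```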
If a match flag is set only once, it takes effect from that point to the end of the pattern. That is, the following two patterns
Pattern p1 = Pattern.compile("(?is)abcdefg");
Pattern p2 = Pattern.compile("abcdefg", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
serve the same purpose.
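A minimal sketch of this equivalence follows. It uses the pattern a.c instead of abcdefg (an assumption made here so that DOTALL has a visible effect), and the class name FlagEquivalenceDemo is illustrative.

```java
import java.util.regex.Pattern;

class FlagEquivalenceDemo {
    // Flags embedded at the start of the pattern...
    static final Pattern INLINE = Pattern.compile("(?is)a.c");
    // ...and the same flags passed as the int argument.
    static final Pattern EXTERNAL =
            Pattern.compile("a.c", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // No flags at all, for comparison.
    static final Pattern PLAIN = Pattern.compile("a.c");

    public static void main(String[] args) {
        String input = "A\nC"; // uppercase letters around a line break
        System.out.println(INLINE.matcher(input).matches());   // true
        System.out.println(EXTERNAL.matcher(input).matches()); // true
        System.out.println(PLAIN.matcher(input).matches());    // false
    }
}
```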
Match flags can be combined. For example, (?is) means that matching is case-insensitive and that "." also matches line breaks. They can also be used together with the external int flags, for example:
Pattern pattern = Pattern.compile("(?-i:[A-Z])[A-Z]*", Pattern.CASE_INSENSITIVE);
This matches a word whose first character is an uppercase English letter. Note that the first character is inside a non-capturing group, so pay special attention to the group count.
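The point about the group count can be illustrated with a short sketch. The class GroupCountDemo and the CAPTURING pattern are added here only for comparison and do not come from the original text.

```java
import java.util.regex.Pattern;

class GroupCountDemo {
    // The first letter is matched case-sensitively inside a non-capturing group.
    static final Pattern WORD =
            Pattern.compile("(?-i:[A-Z])[A-Z]*", Pattern.CASE_INSENSITIVE);
    // Comparison only: an ordinary capturing group around the first letter.
    static final Pattern CAPTURING =
            Pattern.compile("([A-Z])[A-Z]*", Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        System.out.println(WORD.matcher("Hello").matches()); // true
        System.out.println(WORD.matcher("hello").matches()); // false: first letter must be uppercase
        // (?-i:...) does not capture, so it adds nothing to the group count.
        System.out.println(WORD.matcher("Hello").groupCount());      // 0
        System.out.println(CAPTURING.matcher("Hello").groupCount()); // 1
    }
}
```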
In general, we call the matches method of the String class to validate a string. In more complicated situations, sensible use of match flags lets us get twice the result with half the effort.
2. Use split
Before split was introduced, splitting a string required the StringTokenizer class, which is cumbersome and inflexible (its delimiter argument is treated as a set of single characters, so it cannot split on a multi-character string or on a pattern). JDK 1.4 introduced the split method, which makes it easy to split strings with a regular expression.
In the past, we might have written:
StringTokenizer st = new StringTokenizer("This is a test");
while (st.hasMoreTokens()) {
    System.out.println(st.nextToken());
}
Now you only need:
String[] words = "This is a test".split("\\s");
for (String s : words) {
    System.out.println(s);
}
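One detail worth checking when switching to split: with "\\s", each whitespace character is a separate delimiter, so consecutive spaces produce empty strings in the result. The sketch below, using the illustrative class name SplitDemo, compares it with "\\s+" and with a delimiter pattern.

```java
import java.util.Arrays;

class SplitDemo {
    // Splits on runs of whitespace, so repeated spaces yield no empty tokens.
    static String[] words(String s) {
        return s.split("\\s+");
    }

    public static void main(String[] args) {
        // "\\s" treats every whitespace character as its own delimiter.
        System.out.println(Arrays.toString("a  b".split("\\s")));  // [a, , b]
        // "\\s+" treats a run of whitespace as a single delimiter.
        System.out.println(Arrays.toString(words("This  is a\ttest"))); // [This, is, a, test]
        // The delimiter can be any pattern, here a comma with optional spaces.
        System.out.println(Arrays.toString("a, b,c".split(",\\s*"))); // [a, b, c]
    }
}
```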
3. About replaceAll
JDK 1.4 added the replaceAll method to String to make string replacement easier.
String source = "abcdefg";
String result = source.replaceAll("bcd", "BCD");
System.out.println("result is: " + result);
The result is:
result is: aBCDefg
Note that the first parameter of replaceAll is a regular expression, and the replacement string is not taken literally either: $ and \ have special meaning in it.
str.replaceAll(regex, repl) is equivalent to
Pattern.compile(regex).matcher(str).replaceAll(repl)
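A short sketch of these two points, with the illustrative class name ReplaceAllDemo. Matcher.quoteReplacement, used for the literal "$", was added in Java 5 and is therefore not available in JDK 1.4 itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceAllDemo {
    // The String form.
    static String viaString(String str, String regex, String repl) {
        return str.replaceAll(regex, repl);
    }

    // The equivalent Pattern/Matcher form.
    static String viaPattern(String str, String regex, String repl) {
        return Pattern.compile(regex).matcher(str).replaceAll(repl);
    }

    public static void main(String[] args) {
        // "." is a regex metacharacter, so every character is replaced.
        System.out.println(viaString("a.b.c", ".", "-"));    // -----
        // Escaping the dot replaces only the literal dots.
        System.out.println(viaPattern("a.b.c", "\\.", "-")); // a-b-c
        // "$" in the replacement refers to a group; quote it to use it literally.
        System.out.println(viaString("price: 5", "\\d+", Matcher.quoteReplacement("$10"))); // price: $10
    }
}
```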
In some cases, we need a back reference when replacing: the content to be replaced is not known in advance, and the replacement depends on what was matched. This is where regular expressions show their value.
For example, suppose a text contains many strings in the format "class-student ID" (for example, 172-04) and they must be changed to the format "student ID-class" (04-172). We can use the $ symbol in the replacement string to refer back to parts of the matched content:
String source = "172-04";
System.out.println(source.replaceAll("(\\d+)-(\\d+)", "$2-$1"));
Result:
04-172
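The same replacement also applies to every occurrence in a longer text. The following is a minimal sketch; the class SwapDemo, the method swap, and the sample sentence are made up for illustration.

```java
class SwapDemo {
    // Swaps the two numbers around each hyphen: $1 is the first group, $2 the second.
    static String swap(String text) {
        return text.replaceAll("(\\d+)-(\\d+)", "$2-$1");
    }

    public static void main(String[] args) {
        System.out.println(swap("172-04"));                   // 04-172
        System.out.println(swap("Students: 172-04, 173-11")); // Students: 04-172, 11-173
    }
}
```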