Java String.replaceAll() and Backreferences
Problem
Yesterday I saw a blog post about a Java interview question:
Delete every "*" in a string, except that a "*" at the very start or the very end of the string is kept.
For example:
* --> *
** --> **
*** --> **
*ab**de** --> *abde*
I thought a regular expression was the right tool for this, but I could not work out how to write it.
Solution 1
Someone replying to that blog post gave the following answer:
str.replaceAll("(^\\*)|(\\*$)|\\*", "$1$2");
I ran it and the answer is correct, but I did not understand why the regular expression was written this way, so I asked about it on Stack Overflow.
The following is my understanding. If anything in it is wrong, please point it out:
replaceAll() is a method of the Java String class:
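A minimal sketch of that check, assuming the example inputs are `*`, `**`, `***` and `*ab**de**` written without spaces. The class name `StarTrimCheck` and the helper `strip` are my own names for illustration, not part of the original answer.

```java
public class StarTrimCheck {
    // Solution 1: keep a leading or trailing '*', delete every other '*'.
    static String strip(String s) {
        return s.replaceAll("(^\\*)|(\\*$)|\\*", "$1$2");
    }

    public static void main(String[] args) {
        String[] inputs = {"*", "**", "***", "*ab**de**"};
        for (String in : inputs) {
            // Print each input next to its result for comparison with the examples.
            System.out.println(in + " --> " + strip(in));
        }
    }
}
```

Running `main` prints one line per input, which can be compared with the expected results listed in the problem statement.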
public String replaceAll(String regex, String replacement)
Replaces each substring of this string that matches the given regular expression with the given replacement.
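Both parameters have their own syntax, which is easy to overlook. The sketch below is only an illustration with made-up inputs: the class and method names are hypothetical, while `Matcher.quoteReplacement` is the standard helper for making a replacement string literal.

```java
import java.util.regex.Matcher;

public class ReplaceAllParams {
    // First parameter is a regex: an unescaped "." matches any character.
    static String dotAsRegex(String s) {
        return s.replaceAll(".", "-");
    }

    // Escaping the dot makes it match only a literal ".".
    static String dotAsLiteral(String s) {
        return s.replaceAll("\\.", "-");
    }

    // Second parameter treats "$" and "\" specially; quoteReplacement
    // turns the replacement into plain literal text.
    static String replaceWithLiteral(String s, String regex, String literal) {
        return s.replaceAll(regex, Matcher.quoteReplacement(literal));
    }

    public static void main(String[] args) {
        System.out.println(dotAsRegex("a.b"));
        System.out.println(dotAsLiteral("a.b"));
        System.out.println(replaceWithLiteral("cost: X", "X", "$1"));
    }
}
```

Without `quoteReplacement`, a replacement such as "$1" would be read as a reference to capturing group 1 rather than as the two characters "$" and "1".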
(Note that the first parameter of this method is a regular expression. I had been tripped up by the first parameter before; this time I was tripped up by the second one.)
Explanation of "(^\\*)|(\\*$)|\\*":
(^\\*): capturing group 1, matches a * at the start of the string
(\\*$): capturing group 2, matches a * at the end of the string
\\*: matches a * at any position
- Because "*" is a special character in regular expressions, it must be escaped with "\". However, "\" is itself a special character in Java string literals, so it must be escaped with another "\", which is why two backslashes appear in front of "*".
- Parentheses "()" turn their contents into a capturing group so that it can be backreferenced later. For more information about capturing groups, see here.
- "|" means "or" between the expressions on its left and right.
- "\\*" on its own matches a "*" at any position in the string. In the expression above, however, the alternatives are tried from left to right, so a "*" at the start or end is matched first by "(^\\*)" or "(\\*$)".
Therefore, the expression above matches a "*" at the start of the string,
or a "*" at the end of the string,
or a "*" at any other position in the string. In other words, every "*" in the string is matched.
Explanation of "$1$2":
$1: backreference to the first capturing group
$2: backreference to the second capturing group
The contents of "$1" and "$2" in this parameter replace whatever the regular expression in the first parameter matched. A group that did not take part in the match contributes an empty string.
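Take the string "*ab**de**" as an example: the leading "*" is matched by group 1, so "$1$2" puts it back; the two "*" in the middle and the second-to-last "*" are matched by the plain "\\*" alternative, where both groups are empty, so they are deleted; the final "*" is matched by group 2 and put back. The result is "*abde*".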
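To see which alternative handles each "*", the matches can be listed one by one. This is a sketch for illustration: the class `MatchTrace` and its `trace` method are hypothetical names, and the input is assumed to be `*ab**de**` without spaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchTrace {
    // Records, for every match, its start index and which group (if any) captured it.
    static List<String> trace(String input) {
        Matcher m = Pattern.compile("(^\\*)|(\\*$)|\\*").matcher(input);
        List<String> steps = new ArrayList<>();
        while (m.find()) {
            String which = m.group(1) != null ? "group 1"
                         : m.group(2) != null ? "group 2"
                         : "no group";
            steps.add(m.start() + ":" + which);
        }
        return steps;
    }

    public static void main(String[] args) {
        // Each entry is "index:group"; "no group" means the bare \* alternative matched.
        for (String step : trace("*ab**de**")) {
            System.out.println(step);
        }
    }
}
```

Matches reported as "no group" are the ones that "$1$2" turns into an empty string.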
Note: Inside a regular expression, a backreference is written as a backslash followed by a number, for example \1 or \2. In the replacement string, however, Java writes a backreference as a dollar sign followed by a number, for example $1 or $2. This convention is said to come from Perl. If you are interested, see this post.
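The two notations can appear side by side in one call. The following sketch is an illustration unrelated to the interview question; the class and method names are made up. It collapses a doubled character into a single one.

```java
public class BackrefSyntax {
    // In the regex, \1 (written "\\1" in Java source) must repeat what group 1 captured.
    // In the replacement, $1 inserts what group 1 captured.
    static String collapseDoubles(String s) {
        return s.replaceAll("(.)\\1", "$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseDoubles("aabbc"));
        System.out.println(collapseDoubles("hello"));
    }
}
```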
Solution 2
The second approach is to use a regular expression that matches a "*" only when it is neither at the start nor at the end of the string, and to replace it with the empty string. The idea is natural, but it is not easy to implement.
String repl = str.replaceAll("(?<!^)\\*+(?!$)", "");
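A small sketch for checking this expression, again assuming the inputs `*`, `**`, `***` and `*ab**de**` without spaces. `LookaroundStrip` and `strip` are names chosen here for illustration.

```java
public class LookaroundStrip {
    // Solution 2: delete runs of '*' that neither start at the beginning
    // of the string nor end at the end of the string.
    static String strip(String s) {
        return s.replaceAll("(?<!^)\\*+(?!$)", "");
    }

    public static void main(String[] args) {
        String[] inputs = {"*", "**", "***", "*ab**de**"};
        for (String in : inputs) {
            System.out.println(in + " --> " + strip(in));
        }
    }
}
```

Because nothing is captured, the replacement is simply the empty string and no backreference is needed.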
Explanation of the regular expression:
(?<!^)   # not preceded by the start of the line
\\*+     # match one or more *
(?!$)    # not followed by the end of the line
"(?<!" is a negative lookbehind and "(?!" is a negative lookahead. For more information, see here and here.
Solution 3
String repl = str.replaceAll("(^\\*)|(\\*$)|\\*+", "$1$2");
This answer came from the same person as the second one, but it has a problem: if the string ends with two or more "*", all of them are replaced with the empty string.
For example, the input "*ab**de**" produces "*abde"; the final "*" is missing.
This is because quantifiers in regular expressions are greedy by default and match as many characters as possible. "\\*+" matches one or more "*". At the second-to-last "*", "(\\*$)" fails because that "*" is not at the end, so "\\*+" is tried; it could match one "*" or two, but being greedy it matches both, and "$1$2", which is empty for this alternative, replaces them with nothing.
Making the quantifier lazy solves the problem. Appending "?" to it gives the lazy form "\\*+?":
String repl = str.replaceAll("(^\\*)|(\\*$)|\\*+?", "$1$2");
For more information about Greediness and Laziness, see here.
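The difference between the two variants can be checked directly. This is a sketch with hypothetical names (`GreedyVsLazy`, `greedy`, `lazy`), using the input `*ab**de**` without spaces.

```java
public class GreedyVsLazy {
    // Greedy \*+ swallows both trailing stars before (\*$) gets a chance at the last one.
    static String greedy(String s) {
        return s.replaceAll("(^\\*)|(\\*$)|\\*+", "$1$2");
    }

    // Lazy \*+? takes one star at a time, so the last star is left for (\*$).
    static String lazy(String s) {
        return s.replaceAll("(^\\*)|(\\*$)|\\*+?", "$1$2");
    }

    public static void main(String[] args) {
        System.out.println(greedy("*ab**de**"));
        System.out.println(lazy("*ab**de**"));
    }
}
```

Since "\\*+?" is the last thing in its alternative, it always stops after a single "*", so the lazy variant behaves like the plain "\\*" used in Solution 1.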
Regular Expression Efficiency
This website can test regular expressions and explain them in detail. It also reports the number of steps a match takes, which you can use to compare the efficiency of different expressions. By that measure, the second method is the most efficient.
Reference