Java String.replaceAll() and backreferences

Source: Internet
Author: User

Problem

I saw a blog post yesterday that discussed a Java interview question:

Given a string, delete every "*" in the middle of the string, but keep a "*" that is the first or last character of the string.

Here are a few examples (input to the left of the arrow, output to the right):

* -> *

** -> **

*ab**de** -> *abde*

I think it should be handled with regular expressions, but I can't figure out how to write the regular expression.

First solution

The following answer was given in a reply to the post:

str.replaceAll("(^\\*)|(\\*$)|\\*", "$1$2");

I verified on my computer that the answer is correct, but I did not understand why the regular expression was written that way. I asked about it on StackOverflow and now roughly understand what is going on. I did not phrase my question clearly at the time, so the exchange was somewhat confused as well.

Here is my understanding; please point out anything I got wrong:

replaceAll() is a method of the Java String class:
 public String replaceAll(String regex, String replacement)
Replaces each substring of this string that matches the given regular expression with the given replacement.
(Note in particular that the first parameter of this method is a regular expression. I have tripped over the first parameter before; this time I tripped over the second.)
Explanation of "(^\\*)|(\\*$)|\\*":
    • (^\\*): capturing group 1, matches a "*" at the beginning of the string.
    • (\\*$): capturing group 2, matches a "*" at the end of the string.
    • \\*: matches a "*" at any position.
    • Because "*" is a special character in regular expressions, it must be escaped with "\". But "\" is also a special character in Java string literals, so it must itself be escaped with another "\"; that is why "*" is preceded by two "\".
    • The parentheses "()" turn the enclosed content into a capturing group, in preparation for the later backreference. See "Capturing Groups" in the reference links.
    • "|" means that the expressions to its left and right are alternatives ("or").
    • "\\*" used on its own would match a "*" anywhere in the string. In the expression above, however, the alternatives are tried from left to right, so a "*" at the beginning or end is matched first by "(^\\*)" or "(\\*$)".
So the expression matches a "*" at the beginning of the string, or a "*" at the end of the string, or a "*" anywhere else. In other words, every "*" in the string is matched.
Explanation of "$1$2":
    • $1: backreference to the first capturing group.
    • $2: backreference to the second capturing group.

The contents of "$1" and "$2" in this parameter replace the string matched by the first parameter.
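
The escaping rule described above can be checked with a small sketch. The class and method names here are made up for illustration; the point is only that replaceAll() treats its first argument as a regular expression, while String.replace() treats it as literal text.

```java
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not part of the original post.
class ReplaceAllVsReplace {
    // replaceAll: the first argument is a regex, so a literal "*" is written "\\*".
    static String stripStarsRegex(String s) {
        return s.replaceAll("\\*", "");
    }

    // replace: the first argument is a literal character sequence, no escaping needed.
    static String stripStarsLiteral(String s) {
        return s.replace("*", "");
    }

    // An unescaped "*" is not a valid regex; returns true if replaceAll rejects it.
    static boolean unescapedStarIsRejected() {
        try {
            "a*b".replaceAll("*", "");
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(stripStarsRegex("a*b*c"));
        System.out.println(stripStarsLiteral("a*b*c"));
        System.out.println(unescapedStarIsRejected());
    }
}
```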

Take the string "*ab**de**" as an example:

    1. The first "*" matches and is replaced using "$1$2". "$1" contains "*" and "$2" is empty, so the first "*" is replaced by itself.

    2. Next, "a" and "b" do not match, so they are skipped and matching continues.

    3. The second "*" matches and is replaced using "$1$2". Both "$1" and "$2" are empty, so this "*" is replaced with the empty string.

    4. The third "*", like the second, is also replaced with the empty string.

    5. The following "d" and "e" do not match, so matching continues.

    6. The fourth "*" matches and, like the second and third, is replaced with the empty string.

    7. The last "*" matches and is replaced using "$1$2". This time "$1" is empty and "$2" contains "*", so the last "*" is replaced by itself.

    8. The final result is: "*abde*"
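
The walkthrough above can be reproduced with java.util.regex.Matcher, which exposes what each capturing group holds for every match. This is a minimal sketch with a made-up class name; a group that did not take part in a match is reported as null by group(n) and contributes nothing to the replacement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original post.
class FirstSolutionTrace {
    static final Pattern STARS = Pattern.compile("(^\\*)|(\\*$)|\\*");

    // One entry per match: start index, then the contents of group 1 and group 2
    // (null means the group did not participate in that match).
    static List<String> trace(String input) {
        List<String> lines = new ArrayList<>();
        Matcher m = STARS.matcher(input);
        while (m.find()) {
            lines.add(m.start() + ": $1=" + m.group(1) + " $2=" + m.group(2));
        }
        return lines;
    }

    // Same replacement as str.replaceAll(regex, "$1$2"), using the precompiled pattern.
    static String apply(String input) {
        return STARS.matcher(input).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        for (String line : trace("*ab**de**")) {
            System.out.println(line);
        }
        System.out.println(apply("*ab**de**"));
    }
}
```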

One thing to note here: inside a regular expression, a backreference is written as a backslash followed by a number, for example \1 and \2. In the replacement string, however, Java writes a backreference as a dollar sign followed by a number, for example $1 and $2. This is said to have been borrowed from Perl. If you are interested, see the post "Backreferences Syntax in Replacement Strings (why Dollar sign?)" in the reference links.
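
To make the two notations concrete, here is a small sketch with made-up method names. The first method uses \1 inside the pattern and $1 in the replacement; the second uses $1 and $2 to swap two captured groups.

```java
// Hypothetical demo class, not part of the original post.
class BackreferenceSyntax {
    // In the pattern, \1 refers back to group 1, so (\w)\1 matches a doubled character.
    // In the replacement, $1 inserts what group 1 captured, leaving a single character.
    static String collapseDoubles(String s) {
        return s.replaceAll("(\\w)\\1", "$1");
    }

    // $1 and $2 in the replacement refer to the first and second capturing groups.
    static String swapPair(String s) {
        return s.replaceAll("(\\w+)=(\\w+)", "$2=$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseDoubles("aabbcd"));
        System.out.println(swapPair("key=value"));
    }
}
```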

Second solution

Another way to use regular expressions is to replace a "*" with the empty string only if it is not at the beginning or the end. The idea is natural, but it is not easy to implement.
String repl = str.replaceAll("(?<!^)\\*+(?!$)", "");
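
As a quick check of this expression, the sketch below wraps it in a made-up helper method and runs it on a few inputs, including the edge cases of a string that consists only of "*".

```java
// Hypothetical demo class, not part of the original post.
class LookaroundSolution {
    // Removes runs of "*" that neither start at the beginning nor stop at the end.
    static String stripInnerStars(String s) {
        return s.replaceAll("(?<!^)\\*+(?!$)", "");
    }

    public static void main(String[] args) {
        System.out.println(stripInnerStars("*ab**de**"));
        System.out.println(stripInnerStars("*"));
        System.out.println(stripInnerStars("**"));
        System.out.println(stripInnerStars("ab*cd"));
    }
}
```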

Explanation of the regular expression:

(?<!^)   # only if this position is not the beginning of the line
\\*+     # matches one or more *
(?!$)    # only if the following position is not the end of the line

"(?<!" denotes a negative lookbehind, and "(?!" denotes a negative lookahead. See "Lookahead and Lookbehind zero-length assertions" in the reference links for details.

Third solution
String repl = str.replaceAll("(^\\*)|(\\*$)|\\*+", "$1$2");

This answer comes from the same person who gave the second solution, but it has a problem: if the string ends with two or more "*", all of them are replaced with the empty string.

For example, if the input is "*ab**de**", the output is "*abde", and the last "*" is missing.

This is because regular expression matching is greedy by default and matches as many characters as possible. "\*+" can match one or more "*". At the second-to-last "*", matching either one "*" or two "*" would succeed. Because the match is greedy, both trailing "*" are matched and then replaced by "$1$2", which is empty.

Changing the quantifier to lazy matching solves this problem. Adding a "?" after the quantifier makes it lazy: "\*+?".

String repl = str.replaceAll("(^\\*)|(\\*$)|\\*+?", "$1$2");

See here for greediness and laziness.
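
The difference between the greedy and lazy variants can be seen side by side in the following sketch. The class and method names are made up; the two expressions are the ones shown above.

```java
// Hypothetical demo class, not part of the original post.
class GreedyVsLazy {
    // Greedy: "\\*+" swallows a whole run of "*", including a trailing one.
    static String greedy(String s) {
        return s.replaceAll("(^\\*)|(\\*$)|\\*+", "$1$2");
    }

    // Lazy: "\\*+?" takes one "*" at a time, so the last "*" is left for group 2.
    static String lazy(String s) {
        return s.replaceAll("(^\\*)|(\\*$)|\\*+?", "$1$2");
    }

    public static void main(String[] args) {
        System.out.println(greedy("*ab**de**"));
        System.out.println(lazy("*ab**de**"));
        // The same contrast in isolation: one greedy match versus three lazy matches.
        System.out.println("***".replaceAll("\\*+", "x"));
        System.out.println("***".replaceAll("\\*+?", "x"));
    }
}
```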

Regular Expression Efficiency

The Regular Expressions 101 site (listed in the reference links) can test a regular expression and explain it in detail. It also reports the number of steps a match takes, which can be used to compare the efficiency of expressions. According to that site, the second solution is the most efficient.

Reference Links
    • Huawei Machine Questions-it's time to wake up, Sao
    • Lookahead and Lookbehind zero-length assertions
    • Java String.replaceAll() with back reference
    • Backreferences Syntax in Replacement Strings (why Dollar sign?)
    • Java docs on java.lang.String.replaceAll()
    • Regular Expressions 101
    • Capturing Groups
    • Regular expression 30-minute introductory tutorial (highly recommended)
