The trouble and pitfalls of a backslash (\) in a Python regular expression

Source: Internet
Author: User

Here is a bit of experience to share. Using backslashes in regular expressions creates a double-escaping problem, for two reasons:
(1) When Python processes a string literal, the backslash is used to escape characters.

(2) The regular expression engine also uses the backslash to escape characters.

    How should a regular expression be written to match one backslash in a string? Is "\\" OK? Try it and the re module throws an exception: as a Python string, "\\" is a single backslash, and to the regular expression parser that single backslash is an escape character with nothing after it, so it is naturally an error. Three backslashes, "\\\", certainly will not work either, because the string literal is not even terminated. Try four, "\\\\", and it matches perfectly.
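The failed first attempt can be reproduced with a short sketch. It assumes Python 3, and the names `one`, `four` and `compiled_ok` are illustrative only.

```python
import re

one = "\\"       # Python string holding ONE backslash character
four = "\\\\"    # Python string holding TWO backslash characters

# A lone backslash is an incomplete escape for the regex parser,
# so compiling it is expected to raise re.error.
try:
    re.compile(one)
    compiled_ok = True
except re.error:
    compiled_ok = False

print(len(one), len(four))        # sizes of the strings the regex engine receives
print(compiled_ok)
print(re.findall(four, "a\\b"))   # search the three characters a\b
```

The point of the sketch is that the regex engine never sees the source code, only the string that remains after Python's own escaping.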
The working four-backslash code is as follows:
import re
re_str_patt = "\\\\"
reobj = re.compile(re_str_patt)
str_test = "abc\\cd\\hh"
print(reobj.findall(str_test))

Output: ['\\', '\\']
Remarks:
1. The second line of code uses an ordinary (non-raw) Python string, so it represents one backslash in the regular expression (four backslashes become one).
2. Because the backslash in a Python string means escape, the string on the fourth line represents: abc, followed by one backslash, then cd, then another backslash, then hh.
3. The output is a list with two elements. Each element is shown as a Python string literal, so the first element actually represents one backslash, and so does the second.
4. Conceptually the output is a list of two single-backslash strings. It cannot literally be written as [r'\', r'\'], because a raw string literal cannot end in a single backslash, but the meaning is the same.

The code is then changed as follows:
import re
re_str_patt = r"\\\\"
reobj = re.compile(re_str_patt)
str_test = "abc\\cd\\hh"
print(reobj.findall(str_test))

Output: []
Remarks:
1. The second line of code is changed to a raw string, so the regular expression now matches two consecutive backslashes (four backslashes become two).
2. The string on the fourth line still represents: abc, followed by one backslash, then cd, then another backslash, then hh.
3. So nothing matches, and the output is an empty list.
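The difference between the two versions can be checked side by side. This is a minimal sketch assuming Python 3; the variable names `plain`, `raw`, `plain_hits` and `raw_hits` are made up for illustration.

```python
import re

text = "abc\\cd\\hh"     # the characters abc\cd\hh (two single backslashes)

plain = "\\\\"           # ordinary string: Python's escaping leaves 2 backslashes
raw = r"\\\\"            # raw string: all 4 backslashes reach the regex engine

plain_hits = re.findall(plain, text)   # regex matches ONE literal backslash
raw_hits = re.findall(raw, text)       # regex matches TWO backslashes in a row

print(len(plain), plain_hits)
print(len(raw), raw_hits)
```

Printing `len()` of each pattern is a quick way to see how many characters survive the string layer before the regex layer does its own escaping.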
The first piece of code can be understood this way. The first conversion is the string's own escaping, so "\\\\" actually represents two backslashes (two characters). That string is then passed to the regular expression parser, where the backslash is still the escape character, so a second conversion happens and two backslashes represent one backslash. That is why it matches one backslash. To match two consecutive backslashes, the regular expression has to be written with "\" eight times, which is quite a sight.
In a regular expression, \d+ matches one or more consecutive digit characters. But what if you want to match a backslash, followed by the letter d, followed by a plus sign? How do you write that string? (Answer: "\\\\d\\+") The code is as follows:
import re
re_str_patt = "\\\\d\\+"
print(re_str_patt)
reobj = re.compile(re_str_patt)
print(reobj.findall("\\d+"))

Output:
\\d\+
['\\d+']
Writing re_str_patt = "\\\\d\+" also works, because \+ has no escape meaning in a Python string, so the backslash is kept as a backslash. Note that newer Python 3 versions emit a warning for unrecognized escape sequences such as \+, so "\\+" is the safer spelling.
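As an alternative to counting backslashes by hand, the standard library's `re.escape` can build the regex layer of escaping for a literal string. This is a technique not used in the original article, shown here only as a hedged sketch; the names `literal`, `pattern` and `hits` are illustrative.

```python
import re

literal = "\\d+"                  # the three characters \d+
pattern = re.escape(literal)      # backslash-escapes the regex-special characters

print(pattern)                    # the pattern text handed to the regex engine
print(pattern == "\\\\d\\+")      # compare with the hand-written answer

hits = re.findall(pattern, "x\\d+y and 123")
print(hits)                       # only the literal \d+ is found, not the digits
```

`re.escape` only handles the regular expression layer. The Python string layer still applies to any literal you type in source code.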
What about the raw string, the form most often used when writing regular expressions in Python? With a raw string there is only one conversion: the string layer does no escaping, and only the regular expression layer does. A regular expression that matches one backslash can therefore be written as re_str_patt = r"\\".
Some people will think this also makes Windows file paths convenient: just write path = r"c:\myforder\xx" and be done. That statement is indeed fine, but path = r"c:\myforder\xx\" is a syntax error. Why? Although the backslash does not act as an escape character in a raw string, it still affects a quotation mark (single or double) that follows it, so the quote is not treated as the end of the string. Python then expects more characters and a closing quote, finds none, and reports an error.
This is also how a quotation mark can be written inside a raw string: path = r"\\123\"xxx" is accepted, and both the backslash and the quote stay in the resulting string. So raw strings do have a limitation. They were designed mainly to support regular expressions, where the backslash is an escape character and so cannot appear alone at the end of a pattern. For that reason it is recommended not to use raw strings in other places.
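The raw-string behaviour described above can be sketched as follows, assuming Python 3. The built-in `compile()` is used only so that the syntax error can be caught at run time, and the names `bad_source`, `trailing_ok`, `path` and `quoted` are illustrative. Appending the last backslash from an ordinary string is one common workaround, not something the original article prescribes.

```python
import re

# With a raw string, only the regex layer interprets the backslashes.
single = re.findall(r"\\", "abc\\cd\\hh")
print(single)

# Source text of an assignment whose raw string ends in a backslash.
bad_source = 'path = r"c:\\myforder\\xx\\"'
try:
    compile(bad_source, "<demo>", "exec")
    trailing_ok = True
except SyntaxError:
    trailing_ok = False
print(trailing_ok)

# Workaround: add the trailing backslash from an ordinary string.
path = r"c:\myforder\xx" + "\\"
print(path)

# A backslash before a quote keeps the quote, but the backslash stays too.
quoted = r"\\123\"xxx"
print(len(quoted), quoted)
```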
