Python Regular Expressions (Escaping Problems)

Source: Internet
Author: User
First, a rather embarrassing story: while writing a preview-and-download tool for Xiami Music, I ran into a problem. The saved files are named after the song titles, so whenever a title contained characters that are illegal in file names (yes, Windows, I'm looking at you →_→), such as a "/", saving failed. So I borrowed the solution used by Xunlei (Thunder): replace every illegal character with an underscore.

That is where regular expressions come in. After a cursory search, I wrote the following function:

import re

def sanitize_filename(filename):
    return re.sub('[\/:*?<>|]', '_', filename)

I recently realized that this function has several problems:

    • Unlike in a shell, the backslash is an escape character in Python strings whether they use single or double quotes. The nasty part is that Python leaves a meaningless escape such as \/ intact instead of complaining.
    • Even so, sanitize_filename('\\/:*?<>|') returns \_______, so not every character is replaced with an underscore: the backslash survives.

So I sat down and read the documentation.
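Both observations can be checked directly. The following is a minimal sketch, not the original code verbatim: the pattern is spelled with an explicit double backslash, '[\\/:*?<>|]', which yields exactly the same string as '[\/:*?<>|]' but avoids the invalid-escape warning that recent Python 3 releases emit. The names BUGGY_PATTERN and buggy_sanitize_filename are made up for illustration.

```python
import re

# Same characters as the original '[\/:*?<>|]' literal: Python keeps the
# backslash, so the pattern text that reaches re is  [\/:*?<>|]
BUGGY_PATTERN = '[\\/:*?<>|]'

def buggy_sanitize_filename(filename):
    # Inside the character class, re reads \/ as an escaped '/',
    # so a literal backslash is never part of the set.
    return re.sub(BUGGY_PATTERN, '_', filename)

print(BUGGY_PATTERN)                         # [\/:*?<>|]
print(buggy_sanitize_filename('\\/:*?<>|'))  # \_______
```

The input has eight characters; the seven after the backslash are replaced, and the backslash itself passes through untouched.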

Raw strings

After reading the documentation, I realized that the re module does its own escaping, independently of Python's string escaping. For example, to match a single backslash character, the pattern argument has to be written as '\\\\':

First, Python escapes the string literal: \\\\ becomes \\
Then the re module receives \\ and interprets it as a regular expression; by the regular expression escaping rules it becomes \
Given this hassle, raw strings help a great deal. As the name implies, a raw string is a string whose contents are not escaped (the one exception is that it cannot end in an odd number of backslashes). So to match a single backslash character you can write r'\\'.
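The two escaping layers described above can be observed separately. This is a small sketch; the sample path in text is made up for illustration.

```python
import re

# Layer 1: Python string escaping.
# Both literals denote the same 2-character string: \\
assert '\\\\' == r'\\'
print(len(r'\\'))                  # 2

# Layer 2: re escaping.
# The 2-character pattern \\ matches ONE literal backslash.
text = 'C:\\temp\\a.txt'           # the text  C:\temp\a.txt
found = re.findall(r'\\', text)
print(len(found))                  # 2 (two backslashes in the text)
print(len(found[0]))               # 1 (each match is a single character)
```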

So the sanitize_filename function above becomes:

def sanitize_filename(filename):
    return re.sub(r'[\\/:*?<>|]', '_', filename)
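A quick check of the revised function, plus one possible alternative. The alternative is an assumption and not part of the original post: it lets re.escape do the escaping, so the list of illegal characters can be kept as a plain string. ILLEGAL_CHARS, ILLEGAL_CLASS and sanitize_filename_escaped are hypothetical names.

```python
import re

def sanitize_filename(filename):
    # Raw string: \\ reaches re unchanged and matches one backslash.
    return re.sub(r'[\\/:*?<>|]', '_', filename)

# Alternative sketch: build the character class with re.escape
# instead of escaping the characters by hand.
ILLEGAL_CHARS = '\\/:*?<>|'
ILLEGAL_CLASS = re.compile('[' + re.escape(ILLEGAL_CHARS) + ']')

def sanitize_filename_escaped(filename):
    return ILLEGAL_CLASS.sub('_', filename)

print(sanitize_filename('\\/:*?<>|'))         # ________
print(sanitize_filename_escaped('a\\b/c:d'))  # a_b_c_d
```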

Regex and Match

So let's take a serious look at the re module. What follows is a quick rundown for the impatient.

The two main objects in Python's regular expression module re are:

Regular expression: RegexObject
Match: MatchObject
RegexObject is a regular expression object, and operations such as match and sub are all methods of it. It is created by re.compile(pattern, flags).

>>> import re
>>> email_pattern = re.compile(r'\w+@\w+\.\w+')
>>> email_pattern.findall('My email is abc@def.com and his is user@example.com')
['abc@def.com', 'user@example.com']

Its methods include:

search scans from any position in the string and returns a MatchObject or None
match matches only at the start of the string and returns a MatchObject or None
split returns the list of pieces obtained by splitting at each match
findall returns a list of all matches
finditer returns an iterator over MatchObjects
sub returns the string with matches replaced
subn returns a tuple (new string, number of replacements)
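The methods listed above can be tried side by side on one input. This is a small sketch; the pattern \d+ and the sample text are chosen only for illustration.

```python
import re

digits = re.compile(r'\d+')
text = 'a1b22c333'

print(digits.search(text).group())  # 1    (first match anywhere)
print(digits.match(text))           # None (text does not START with a digit)
print(digits.split(text))           # ['a', 'b', 'c', '']
print(digits.findall(text))         # ['1', '22', '333']
print([m.span() for m in digits.finditer(text)])  # [(1, 2), (3, 5), (6, 9)]
print(digits.sub('#', text))        # a#b#c#
print(digits.subn('#', text))       # ('a#b#c#', 3)
```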
The functions provided at module level, such as re.sub, re.match and re.findall, can be thought of as shortcuts that save you from creating a regular expression object yourself. A RegexObject, on the other hand, can be reused, which is its advantage over these shortcut functions.
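To make the comparison concrete, here is a short sketch of both styles; the sample lines are invented for illustration. Note that the re module also keeps an internal cache of recently compiled patterns, so the benefit of compiling explicitly is mostly readability and reuse rather than a large speed-up.

```python
import re

text = 'My email is abc@def.com and his is user@example.com'

# Shortcut function: the pattern is passed as a string on every call.
print(re.findall(r'\w+@\w+\.\w+', text))
# ['abc@def.com', 'user@example.com']

# Explicit object: compile once, then reuse it for many inputs.
email_pattern = re.compile(r'\w+@\w+\.\w+')
lines = [text, 'no address here', 'ping root@localhost.localdomain']
results = [email_pattern.findall(line) for line in lines]
print(results[1])  # []
print(results[2])  # ['root@localhost.localdomain']
```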

MatchObject is a match object that represents the result of one regular expression match. It is returned by some of the RegexObject methods. A match object always evaluates as True, and it offers a whole bunch of methods for getting information about the groups in the regular expression.
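Because a failed search returns None while a match object is always true, the usual idiom is a plain if test. The sketch below also shows a few of the group-related methods; the named groups user and host and the sample string are assumptions made for illustration.

```python
import re

m = re.search(r'(?P<user>\w+)@(?P<host>\w+\.\w+)', 'contact: user@example.com')

if m:                         # a match object is always true
    print(m.group(0))         # user@example.com (the whole match)
    print(m.group(1))         # user
    print(m.group('host'))    # example.com
    print(m.groups())         # ('user', 'example.com')
    print(m.span(1))          # (9, 13)

no_match = re.search(r'@', 'none here')
print(no_match)               # None, which is false in an if test
```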

>>> for m in re.finditer(r'(\w+)@\w+\.\w+', 'My email is abc@def.com and his is user@example.com'):
...     print('%d-%d %s %s' % (m.start(0), m.end(0), m.group(1), m.group(0)))
...
12-23 abc abc@def.com
35-51 user user@example.com

Reference

    • The Python Standard Library, re module: http://docs.python.org/2/library/re.html