Python Regular Expressions (Escape Problems)

Source: Internet
Author: User


Let me start with a slightly embarrassing story. I was writing a Xiami Music preview-download package and ran into a problem. Saved files are named after the song title. So whenever a title contains characters that are illegal in file names, such as the slash in "logging handler/out border" (hmph, I'm looking at you →_→ Windows), the file fails to save. I borrowed the solution Thunder (Xunlei) uses: replace every invalid character with an underscore.

That is where regular expressions came in. After some searching, I wrote the following function:

```python
import re

def sanitize_filename(filename):
    # Original (buggy) version: '\/' is not a Python escape sequence
    return re.sub('[\/:*?<>|]', '_', filename)
```

Recently I realized this function has several problems:

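  • Unlike the shell, Python treats the backslash as an escape character in both single- and double-quoted strings. \/ is not a recognized escape sequence in Python, so it is left unchanged: the string keeps both the backslash and the slash.
  • Even so, sanitize_filename('\\/:*?<>|') still returns \_______ (a backslash followed by seven underscores). Not every character is replaced, because re reads \/ as an escaped slash, so the backslash itself never makes it into the character class.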
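A minimal Python 3 sketch reproduces both points. Recent Python versions warn about the invalid escape '\/', so here the buggy pattern is spelled '[\\/:*?<>|]', which produces exactly the same string. The name buggy_sanitize is hypothetical and stands in for the original function.

```python
import re

# Same string as the original '[\/:*?<>|]': Python keeps the backslash
# of the unrecognized escape \/, so the pattern text is [\/:*?<>|].
BUGGY_PATTERN = '[\\/:*?<>|]'
assert BUGGY_PATTERN == r'[\/:*?<>|]'


def buggy_sanitize(filename):
    """Hypothetical copy of the original, buggy sanitize_filename."""
    # re reads \/ as an escaped '/', so no backslash ends up in the class.
    return re.sub(BUGGY_PATTERN, '_', filename)


print(len('\\/'))                    # 2: backslash and slash both survive
print(buggy_sanitize('\\/:*?<>|'))   # \_______
```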

So I decided it was time to read the documentation properly.

Raw strings

After reading the documentation, I realized that the re module does its own escape processing, independently of Python's string escaping. For example, to match a single backslash character, the pattern argument has to be written as '\\\\':

  • Python's string escaping turns \\\\ into \\.
  • The re module receives \\ and interprets it as a regular expression. Under the regex escape rules, \\ matches a single \.
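The two layers can be checked separately in a short Python 3 sketch. It first shows what Python's own escaping leaves behind, then what re does with it. The sample string is made up for illustration.

```python
import re

pattern = '\\\\'           # four backslashes in the source code
print(len(pattern))        # 2: Python's escaping leaves \\ for re

text = 'a\\b'              # three characters: a, backslash, b
print(re.findall(pattern, text))  # ['\\']: re matched one backslash
```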
This is where raw strings come in handy. As the name suggests, a raw string is a string in which backslashes are not treated as escapes. The one exception is that it cannot end in a single backslash. So instead of '\\\\' you can write r'\\' to match one backslash character.
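Here is a small Python 3 sketch of raw strings in use. It also touches the trailing-backslash limitation, which is shown indirectly: r'\' cannot even be written, and a lone backslash is rejected by re when compiling. The path string is illustrative only.

```python
import re

# A raw string keeps every backslash exactly as written.
assert r'\\' == '\\\\'
assert len(r'\n') == 2     # a backslash and an 'n', not a newline

path = 'C:\\dir\\file'     # contains two backslashes
print(re.findall(r'\\', path))  # ['\\', '\\']

# r'\' would be a syntax error, and a lone backslash is not a valid
# pattern either: re raises re.error when compiling it.
try:
    re.compile('\\')
except re.error as exc:
    print('re.error:', exc)
```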

So sanitize_filename above becomes:

```python
import re

def sanitize_filename(filename):
    # r'\\' hands re two backslashes, which match one literal backslash
    return re.sub(r'[\\/:*?<>|]', '_', filename)
```
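The snippet below quickly checks the corrected function. It also sketches an alternative that is not from the original article: let re.escape build the character class, so the forbidden characters can be listed without hand-escaping. FORBIDDEN, FORBIDDEN_RE and sanitize_filename_escaped are hypothetical names.

```python
import re


def sanitize_filename(filename):
    # Corrected version: the raw string passes \\ to re, i.e. one backslash.
    return re.sub(r'[\\/:*?<>|]', '_', filename)


# Alternative sketch: list the characters plainly and let re.escape
# add whatever escaping the character class needs.
FORBIDDEN = '\\/:*?<>|'    # the same eight characters as above
FORBIDDEN_RE = re.compile('[' + re.escape(FORBIDDEN) + ']')


def sanitize_filename_escaped(filename):
    return FORBIDDEN_RE.sub('_', filename)


print(sanitize_filename('\\/:*?<>|'))       # ________
print(sanitize_filename_escaped('a/b:c?'))  # a_b_c_
```

The re.escape variant is convenient if the list of forbidden characters ever changes, since nobody has to work out by hand which ones need a backslash.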

Regex and Match

While I was at it, I took a proper look at the re module. What follows is a quick rundown, mostly as notes for future reference.

The main objects in Python's re module are really just these two:

  • RegexObject: the regular expression object
  • MatchObject: the match object
A RegexObject is a compiled regular expression. Operations such as match and sub are its methods. It is created by re.compile(pattern, flags).

```python
>>> import re
>>> email_pattern = re.compile(r'\w+@\w+\.\w+')
>>> email_pattern.findall('My e-mail is abc@def.com and his is user@example.com')
['abc@def.com', 'user@example.com']
```
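re.compile also takes the flags argument mentioned above. The sketch below uses two standard flags, re.IGNORECASE and re.VERBOSE. The patterns and sample strings are illustrative only.

```python
import re

# re.IGNORECASE: letters match regardless of case.
word_re = re.compile(r'python', re.IGNORECASE)
print(word_re.findall('Python, PYTHON and python'))  # ['Python', 'PYTHON', 'python']

# re.VERBOSE: whitespace in the pattern is ignored and # starts a comment,
# which makes longer patterns easier to read.
email_re = re.compile(r'''
    \w+        # user name
    @
    \w+\.\w+   # domain
''', re.VERBOSE)
print(email_re.findall('abc@def.com'))  # ['abc@def.com']
```

Several flags can be combined with |, for example re.IGNORECASE | re.VERBOSE.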

Its main methods are:

  • search: looks for a match anywhere in the string; returns a MatchObject or None
  • match: matches only at the beginning of the string; returns a MatchObject or None
  • split: returns a list of the pieces of the string separated by the matches
  • findall: returns a list of all matches (or of the captured groups, if the pattern has any)
  • finditer: returns an iterator of MatchObjects
  • sub: returns the string with the matches replaced
  • subn: returns a tuple (new string, number of replacements)
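The following sketch shows these methods side by side on a made-up string. The \d+ pattern is only an illustration.

```python
import re

pattern = re.compile(r'\d+')
text = 'a1b22c333'

print(pattern.search(text).group())  # 1     (first match anywhere)
print(pattern.match(text))           # None  (text does not start with a digit)
print(pattern.split(text))           # ['a', 'b', 'c', '']
print(pattern.findall(text))         # ['1', '22', '333']
print([m.span() for m in pattern.finditer(text)])  # [(1, 2), (3, 5), (6, 9)]
print(pattern.sub('#', text))        # a#b#c#
print(pattern.subn('#', text))       # ('a#b#c#', 3)
```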
The module-level functions, such as re.sub, re.match and re.findall, can be seen as shortcuts that create a regular expression object and call its method in one step. A RegexObject, by contrast, can be reused, which is its advantage over these shortcut functions.
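Here is a minimal sketch of that relationship. The shortcut and the compiled object give the same result, and the compiled object is built once and then reused in a loop. The sample lines are made up.

```python
import re

EMAIL = r'\w+@\w+\.\w+'
text = 'abc@def.com, user@example.com'

# The shortcut and the compiled object produce the same result.
assert re.findall(EMAIL, text) == re.compile(EMAIL).findall(text)

# Compile once, then reuse the same object for every line.
email_re = re.compile(EMAIL)
lines = ['mail abc@def.com', 'no address here', 'user@example.com']
found = []
for line in lines:
    m = email_re.search(line)
    if m:                      # search returns None when nothing matches
        found.append(m.group())
print(found)  # ['abc@def.com', 'user@example.com']
```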

A MatchObject represents the result of a successful match and is returned by RegexObject methods such as search, match and finditer. A match object always evaluates as true. It offers many methods for getting at the groups of the regular expression, such as group, start and end.

```python
>>> import re
>>> for m in re.finditer(r'(\w+)@\w+\.\w+', 'My email is abc@def.com and his is user@example.com'):
...     print('%d-%d %s %s' % (m.start(0), m.end(0), m.group(1), m.group(0)))
...
12-23 abc abc@def.com
35-51 user user@example.com
```
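The sketch below covers a few more match-object features: the truth test, groups, span, and named groups. The (?P<name>...) named-group syntax is standard re syntax, but the group names user and domain are arbitrary choices for this illustration.

```python
import re

m = re.search(r'(?P<user>\w+)@(?P<domain>\w+\.\w+)', 'contact: abc@def.com')
if m:                        # a match object is always true
    print(m.group(0))        # abc@def.com
    print(m.groups())        # ('abc', 'def.com')
    print(m.group('user'))   # abc
    print(m.span())          # (9, 20)
    print(m.groupdict())     # {'user': 'abc', 'domain': 'def.com'}

print(re.search(r'\d', 'no digits'))  # None: failed searches return None
```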

Reference
  • The Python Standard Library: http://docs.python.org/2/library/re.html
