Python standard library 01: regular expressions (the re package)

Last Update:2016-12-24 Source: Internet

Author: User

Author: Vamei. Source: http://www.cnblogs.com/vamei. Reprints are welcome; please keep this statement. Thank you!

I will start the tour of Python's standard library with regular expressions. Regular expressions are a common text-processing tool and require no extra knowledge of the operating system. The system-related packages are covered later.

The main job of a regular expression is to search a string for what you want by describing it with a pattern.

Syntax

Earlier, we introduced the string-processing functions. They can handle simple searches, such as finding the substring "you" in the string "I love you". Sometimes, though, we have only a fuzzy idea of the target and cannot name it as precisely as "you". For example, we may want to find the digits in a string, where each digit can be anything from 0 to 9. A fuzzy target like this can be written as a regular expression and passed to Python, so that Python knows what we are looking for.

(Official documentation)

Using regular expressions in Python requires the re package from the standard library.

import re
m = re.search('[0-9]', 'abcd4ef')
print(m.group(0))

re.search() takes two arguments. The first, '[0-9]', is the regular expression. It tells Python, "Listen, I am looking for one digit character, from 0 to 9, in the string."

If re.search() finds a matching substring in the second argument, it returns an object m, and the m.group() method gives access to the result. If nothing matches, re.search() returns None.
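
Because re.search() can return None, it is usually worth checking the result before calling m.group(). A minimal sketch of that check follows; the helper name first_digit is made up for this illustration and is not part of the re package.

```python
import re

def first_digit(text):
    """Return the first digit character in text, or None if there is none.

    first_digit is a hypothetical helper used only for this example.
    """
    m = re.search('[0-9]', text)
    if m is None:        # re.search() found no digit
        return None
    return m.group(0)    # the matched substring

print(first_digit('abcd4ef'))   # 4
print(first_digit('abcdef'))    # None
```

Calling m.group(0) directly on a failed search would raise an AttributeError, since None has no group() method.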

If you are familiar with Linux or Perl, patterns like this should already look familiar. In a Linux shell we can use similar patterns to find or delete the files we want (strictly speaking, these are shell wildcard patterns, which share the bracket syntax with regular expressions), for example:

$ rm book[0-9][0-9].txt

This deletes files such as book02.txt. The pattern book[0-9][0-9].txt describes a file name that begins with "book", followed by two digit characters, followed by ".txt". A file name that does not fit the pattern, such as:

bo12.txt

book1.txt

book99.text

will not be selected.

Perl has regular expressions built into the language, and its regular expression support is said to be the strongest of any system, which is one reason why Perl is such a powerful tool for system administrators.

Regular expression functions

m = re.search(pattern, string)  # search the whole string until a matching substring is found
m = re.match(pattern, string)   # check whether the string matches the pattern starting from its very first character

Choose either of these two functions to search. In the example above, re.match() would return None, because the string starts with 'a', which does not satisfy '[0-9]'.

For the returned m, we call m.group() to get the result. (m.group() is explained in more detail later.)
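
The difference between the two functions can be seen by running both on the same string. This is a small sketch; the variable names and the second sample string '4abcd' are chosen for illustration.

```python
import re

text = 'abcd4ef'

found_by_search = re.search('[0-9]', text)    # scans the whole string
found_by_match = re.match('[0-9]', text)      # only tries at the first character
found_at_start = re.match('[0-9]', '4abcd')   # here a digit is the first character

print(found_by_search.group(0))   # 4
print(found_by_match)             # None
print(found_at_start.group(0))    # 4
```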

We can also replace the substrings that a search finds:

new_string = re.sub(pattern, replacement, string)
# Searches string using the regular expression pattern and replaces each matched substring with the string replacement. Returns the resulting string.
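
As a small illustration of re.sub(), the sketch below masks every digit in a string. The sample text and variable names are made up for the example.

```python
import re

original = 'room 101, floor 3'
masked = re.sub('[0-9]', '#', original)   # replace every digit with '#'

print(masked)     # room ###, floor #
print(original)   # room 101, floor 3 (the input string is not modified)
```

re.sub() returns a new string; Python strings are immutable, so the original is left as it was.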

Other commonly used regular expression functions are:

re.split()    # splits a string at the matches of a regular expression and returns the pieces in a list

re.findall()  # searches a string with a regular expression and returns all matching substrings in a list

(Once you are familiar with the functions above, have a look at re.compile(), which can make repeated searches more efficient.)
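
The following sketch tries out these three functions on made-up sample strings. The patterns are only examples; re.compile() returns a pattern object whose methods (search, findall, and so on) take the string as their argument.

```python
import re

# Split on any run of commas, semicolons, or spaces.
words = re.split('[,; ]+', 'alpha, beta;gamma  delta')
print(words)     # ['alpha', 'beta', 'gamma', 'delta']

# Collect every run of digits.
numbers = re.findall('[0-9]+', 'a1 b22 c333')
print(numbers)   # ['1', '22', '333']

# Compile the pattern once, then reuse the compiled object.
digits = re.compile('[0-9]+')
first = digits.search('a1 b22 c333').group(0)
print(first)     # 1
```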

Writing regular expressions

The key is knowing how to write what we want as a regular expression. Here is the commonly used syntax:

1) Single characters:

.       any single character (by default, anything except a newline)

a|b     the character a or the character b

[afg]   one character that is a, f, or g

[0-4]   one character in the range 0-4

[a-f]   one character in the range a-f

[^m]    one character that is not m

\s      one whitespace character (space, tab, newline, ...)

\S      one non-whitespace character

\d      [0-9]

\D      [^0-9]

\w      [0-9a-zA-Z_]

\W      [^0-9a-zA-Z_]
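
A quick way to see what each single-character pattern matches is to run re.findall() over one sample string. The sketch below uses a made-up string. Note that the lowercase forms \s, \d, \w and the uppercase forms \S, \D, \W are opposites, and that \w also accepts the underscore. The patterns are written as raw strings (r'...') so that Python passes the backslashes through to re unchanged.

```python
import re

sample = 'Room 7_b!'

digits = re.findall(r'\d', sample)        # digit characters
spaces = re.findall(r'\s', sample)        # whitespace characters
word_chars = re.findall(r'\w', sample)    # letters, digits, and underscore
non_word = re.findall(r'\W', sample)      # everything \w rejects
a_to_f = re.findall('[a-f]', sample)      # lowercase a through f only
o_or_7 = re.findall('o|7', sample)        # the character o or the character 7

print(digits)       # ['7']
print(spaces)       # [' ']
print(word_chars)   # ['R', 'o', 'o', 'm', '7', '_', 'b']
print(non_word)     # [' ', '!']
print(a_to_f)       # ['b']
print(o_or_7)       # ['o', 'o', '7']
```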

2) Repetition

A repetition marker follows a single character and stands for several such characters in a row:

*       repeat >= 0 times

+       repeat >= 1 times

?       repeat 0 or 1 times

{m}     repeat exactly m times. For example, a{4} is equivalent to aaaa, and [1-3]{2} is equivalent to [1-3][1-3].

{m,n}   repeat m to n times. For example, a{2,5} means a repeated 2 to 5 times; fewer than m or more than n repetitions do not meet the criteria. Write the braces without spaces: Python does not treat a{2, 5} as a repetition.

Regular expression    Example of a matching string

[0-9]{3,5}            9678

a?b                   b

a+b                   aaaaab
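
The repetition examples can be checked in Python. This sketch uses re.fullmatch(), which is not mentioned in the text above; it is chosen here because it succeeds only when the whole string matches the pattern, which makes too many or too few repetitions easy to detect.

```python
import re

# (pattern, string) pairs taken from the repetition examples.
checks = [
    ('[0-9]{3,5}', '9678'),
    ('a?b', 'b'),
    ('a+b', 'aaaaab'),
    ('a{4}', 'aaaa'),
    ('[1-3]{2}', '31'),
]
results = [re.fullmatch(pattern, text) is not None for pattern, text in checks]
print(results)      # [True, True, True, True, True]

too_short = re.fullmatch('a{2,5}', 'a')        # only 1 repetition
too_long = re.fullmatch('a{2,5}', 'aaaaaa')    # 6 repetitions
print(too_short)    # None
print(too_long)     # None
```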

3) Position

^       start of the string

$       end of the string

Regular expression    Matching string    Non-matching string

^ab.*c$               abeec              cabeec (re.search() finds nothing in it)
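
The effect of the position markers can be seen by searching with and without them. A small sketch, with variable names chosen for illustration:

```python
import re

anchored = '^ab.*c$'

hit = re.search(anchored, 'abeec')        # starts with ab and ends with c
miss = re.search(anchored, 'cabeec')      # does not start with ab
unanchored = re.search('ab.*c', 'cabeec') # without ^ and $, a substring is enough

print(hit.group(0))          # abeec
print(miss)                  # None
print(unanchored.group(0))   # abeec
```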

4) Controlling what is returned

We can further refine what a search returns. Take the following regular expression:

output_(\d{4})

Here parentheses () enclose a smaller regular expression, \d{4}. This smaller expression picks the wanted information out of the overall match (here, the four digits). A part of a regular expression enclosed in parentheses is called a group.
We can query a group with m.group(number). group(0) is the result for the whole regular expression, group(1) is the first group, and so on.

import re
m = re.search(r"output_(\d{4})", "output_1986.txt")
print(m.group(1))

We can also name a group, which makes m.group() queries easier to read:

import re
m = re.search(r"output_(?P<year>\d{4})", "output_1986.txt")   # (?P<name>...) gives the group a name
print(m.group("year"))
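
When a pattern has several groups, the match object can also return them all at once. The sketch below adds a second named group, ext, purely for illustration; m.groups() returns all groups as a tuple and m.groupdict() returns the named groups as a dictionary.

```python
import re

# The group name "ext" is an addition made for this example.
m = re.search(r'output_(?P<year>\d{4})\.(?P<ext>\w+)', 'output_1986.txt')

print(m.group(0))        # output_1986.txt
print(m.group(1))        # 1986
print(m.group('ext'))    # txt
print(m.groups())        # ('1986', 'txt')
print(m.groupdict())     # {'year': '1986', 'ext': 'txt'}
```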

Practice
There is a file named output_1981.10.21.txt. Using Python, read the date from the file name and work out which day of the week it falls on. Then rename the file to output_YYYY-MM-DD-W.txt (YYYY: four-digit year, MM: two-digit month, DD: two-digit day, W: one-digit day of the week, with Monday as the first day of the week).
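
One possible approach is sketched below; try the exercise yourself first. The sketch only computes the new name and assumes the input name has exactly the form output_YYYY.MM.DD.txt. The function name new_name is made up for this example, and the day of the week comes from date.isoweekday() in the standard datetime module, which numbers Monday as 1 and Sunday as 7.

```python
import re
from datetime import date

def new_name(filename):
    """Build the output_YYYY-MM-DD-W.txt name for an output_YYYY.MM.DD.txt file.

    new_name is a hypothetical helper for this exercise.
    """
    m = re.match(r'output_(\d{4})\.(\d{2})\.(\d{2})\.txt$', filename)
    if m is None:
        raise ValueError('unexpected file name: ' + filename)
    year, month, day = (int(part) for part in m.groups())
    weekday = date(year, month, day).isoweekday()   # Monday is 1, Sunday is 7
    return 'output_{:04d}-{:02d}-{:02d}-{}.txt'.format(year, month, day, weekday)

print(new_name('output_1981.10.21.txt'))   # output_1981-10-21-3.txt
```

The rename itself could then be done with os.rename(old_name, new_name(old_name)) once the file exists on disk.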

Summary

re.search() re.match() re.sub() re.findall()

How to write regular expressions

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
