Python Standard Library 01: Regular Expressions (the re Package)


Author: Vamei. Source: http://www.cnblogs.com/vamei. Reprints are welcome; please keep this statement. Thank you!

 

I will begin our tour of the Python standard library with regular expressions. Regular expressions are a common text-processing tool, and they do not require any prior knowledge of the operating system. We will cover the system-related packages later.

 

The main job of a regular expression is to search a string for content that matches a specific pattern.

1. Regular expression syntax

Previously, we introduced string-processing functions, which can implement simple searches: for example, finding the substring "you" in the string "I love you". But sometimes we have only a fuzzy idea of what we want to find, rather than an exact target such as "you". For example, if I want to find the digits contained in a string, each digit can be any character from 0 to 9. We can write this kind of fuzzy target as a regular expression and pass it to Python, telling Python what we are looking for.

(See the official Python documentation for the re module.)

Using regular expressions in Python requires the re package from the standard library.

    import re
    m = re.search('[0-9]', 'abcd4ef')
    print(m.group(0))

re.search() takes two arguments. The first, '[0-9]', is what we call a regular expression. What it tells Python is: "Listen, what I want to find in the string is a digit character from 0 to 9."

If re.search() finds the required substring in the second argument, it returns a match object m, and m.group() lets us view the search result. If no match is found, re.search() returns None.
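Because re.search() returns None when nothing matches, calling .group() on a failed search raises an AttributeError. The sketch below checks for None first. The helper name first_digit is hypothetical and exists only for this illustration.

```python
import re

def first_digit(text):
    """Return the first digit character in text, or None if there is none.

    first_digit is a hypothetical helper written for this example.
    """
    m = re.search('[0-9]', text)
    if m is None:          # no digit anywhere in the string
        return None
    return m.group(0)      # group(0) is the whole matched substring

print(first_digit('abcd4ef'))  # 4
print(first_digit('abcdef'))   # None
```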

 

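If you are familiar with Linux or Perl, regular expressions should already look familiar. In the Linux shell, we can use similar patterns to find or delete the files we want (strictly speaking, the shell uses wildcard "glob" patterns, which share the [0-9] bracket syntax with regular expressions), for example: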

$ rm book[0-9][0-9].txt

This deletes files such as book02.txt. The pattern book[0-9][0-9].txt describes file names that start with book, followed by two digit characters, followed by .txt. File names that do not meet these conditions, for example:

bo12.txt

book1.txt

book99.text

are not selected.
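The same filter can be written with Python's re module. This is a sketch, assuming the file names are already in a hard-coded list (a real script might read them with os.listdir()). It uses re.fullmatch(), which requires the whole name to match, and escapes the dot as \. because an unescaped . in a regular expression matches any character. The r'...' prefix makes a raw string, so Python passes the backslash through to re unchanged.

```python
import re

# Hypothetical file names; a real script might get these from os.listdir().
names = ['book02.txt', 'bo12.txt', 'book1.txt', 'book99.text']

# "book", then exactly two digits, then a literal ".txt"
pattern = r'book[0-9][0-9]\.txt'

selected = [name for name in names if re.fullmatch(pattern, name)]
print(selected)  # ['book02.txt']
```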

Perl has very strong regular expression support, said to be the most powerful of all regular expression systems. This is one of the reasons Perl became a favorite tool of system administrators.

 

2. Regular Expression Functions

 
    m = re.search(pattern, string)  # Search the whole string until a matching substring is found.
    m = re.match(pattern, string)   # Check whether the string matches the pattern from the beginning; the match must start at the first character.

Choose whichever of the two suits your search. In the example above, re.match() would return None, because the string begins with 'a', which does not satisfy '[0-9]'.
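A quick side-by-side of the two functions on the string from the earlier example:

```python
import re

text = 'abcd4ef'

# search() scans the whole string and finds the '4'
print(re.search('[0-9]', text))   # a match object for '4'

# match() only tries at position 0, where 'a' is not a digit
print(re.match('[0-9]', text))    # None

# When the string does begin with a digit, match() succeeds
print(re.match('[0-9]', '4abc').group())  # 4
```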

For the returned m, we use m.group() to get the result. (m.group() is explained in more detail later.)

 

We can also replace the substrings that a search finds:

 
    new_string = re.sub(pattern, replacement, string)
    # Search string for pattern and replace every matched substring with replacement. Returns the new string.
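A small illustration of re.sub() on a made-up string. By default every non-overlapping match is replaced; the optional count argument limits how many replacements are made.

```python
import re

s = 'room 12, floor 3'

# Replace every digit with '#'
masked = re.sub('[0-9]', '#', s)
print(masked)       # room ##, floor #

# count=1 replaces only the first match
first_only = re.sub('[0-9]', '#', s, count=1)
print(first_only)   # room #2, floor 3
```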

 

Other commonly used regular expression functions include:

re.split()    # Split the string wherever the regular expression matches, put the substrings in a list, and return it.

re.findall()  # Search the string for the regular expression, put all matched substrings in a list, and return it.
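A sketch of both functions on a made-up line of text. The patterns use * (zero or more) and + (one or more), which are described with the repetition syntax in the next section.

```python
import re

line = 'apples 3, pears 15; plums 7'

# split() breaks the string wherever ',' or ';' (plus any following spaces) appears
parts = re.split('[,;] *', line)
print(parts)    # ['apples 3', 'pears 15', 'plums 7']

# findall() returns every matched substring; [0-9]+ is a run of one or more digits
numbers = re.findall('[0-9]+', line)
print(numbers)  # ['3', '15', '7']
```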

 

(Once you are familiar with the functions above, take a look at re.compile(), which can improve search efficiency when the same pattern is reused.)
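re.compile() turns a pattern string into a pattern object whose methods (search, match, sub, findall, split) mirror the module-level functions, minus the pattern argument. Here is a sketch of reusing one compiled pattern across several made-up strings. Note that the re module also caches recently used patterns internally, so the speed gain from compiling by hand may be modest. Often the bigger benefit is readability when one pattern is used in many places.

```python
import re

# Compile the pattern once and reuse the resulting object
digit = re.compile('[0-9]')

lines = ['abcd4ef', 'no digits here', '7 up']
found = []
for line in lines:
    m = digit.search(line)   # same as re.search('[0-9]', line)
    if m:                    # skip lines that contain no digit
        found.append(m.group())

print(found)  # ['4', '7']
```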

 

3. Writing regular expressions

The key skill is writing the information you are looking for as a regular expression. Let's first look at common regular expression syntax:

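1) Single characters:

.       any one character (except a newline, by default)
a|b     character a or character b
[afg]   one character: a, f, or g
[0-4]   one character in the range 0-4
[a-f]   one character in the range a-f
[^m]    one character that is not m
\s      one whitespace character
\S      one non-whitespace character
\d      one digit, same as [0-9]
\D      one non-digit, same as [^0-9]
\w      one word character, same as [0-9a-zA-Z_]
\W      one non-word character, same as [^0-9a-zA-Z_]

(In Python 3, \d and \w also match non-ASCII digits and letters unless the re.ASCII flag is used; the bracket equivalents above hold for ASCII text.)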
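To see these classes in action, the sketch below runs re.findall() on a short made-up string. Patterns containing backslashes are written as raw strings (r'...') so that Python does not treat \d or \s as string escapes. Note that \w also matches the underscore.

```python
import re

text = 'a1 b2_c'

digits     = re.findall(r'\d', text)     # ['1', '2']
non_digits = re.findall(r'\D', text)     # ['a', ' ', 'b', '_', 'c']
spaces     = re.findall(r'\s', text)     # [' ']
word_chars = re.findall(r'\w', text)     # ['a', '1', 'b', '2', '_', 'c']
a_to_b     = re.findall(r'[a-b]', text)  # ['a', 'b']
not_a      = re.findall(r'[^a]', 'aab')  # ['b']

print(word_chars)
```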

 

2) Repetition

Placed after a single character, these symbols mean that the character is repeated:

*       repeated 0 or more times
+       repeated 1 or more times
?       repeated 0 or 1 time
{m}     repeated exactly m times. For example, a{4} is equivalent to aaaa, and [1-3]{2} is equivalent to [1-3][1-3].
{m,n}   repeated m to n times (no space after the comma). For example, a{2,5} means a repeated two to five times; fewer than two a's do not match, and at most five are matched.

 

Regular expression    Example of a matching string
[0-9]{3,5}            9678
a?b                   b
a+b                   aaaaab
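The sketch below checks the repetition patterns with re.fullmatch(), which only succeeds when the whole string fits the pattern. The sample strings are illustrative.

```python
import re

# (pattern, sample string) pairs that should match completely
examples = [(r'[0-9]{3,5}', '9678'), (r'a?b', 'b'), (r'a?b', 'ab'), (r'a+b', 'aaaaab')]
for pattern, s in examples:
    print(pattern, s, bool(re.fullmatch(pattern, s)))  # True for every pair

# Outside the allowed count, fullmatch fails
print(bool(re.fullmatch(r'[0-9]{3,5}', '12')))      # False: only 2 digits
print(bool(re.fullmatch(r'[0-9]{3,5}', '123456')))  # False: 6 digits
```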

 

3) Position

^       the start of the string
$       the end of the string

Regular expression    Matching string    Non-matching string
^ab.*c$               abeec              cabeec (even re.search() finds no match here)
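A sketch of how the anchors change what re.search() finds:

```python
import re

pattern = r'^ab.*c$'

print(re.search(pattern, 'abeec'))   # a match object for 'abeec'
print(re.search(pattern, 'cabeec'))  # None: the string does not start with 'ab'

# Without the anchors, search() finds 'abeec' inside 'cabeec'
print(re.search(r'ab.*c', 'cabeec').group())  # abeec
```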


4) Controlling what is returned

We may want to narrow the search result further. Consider the following regular expression:

output_(\d{4})

This regular expression wraps a smaller regular expression, \d{4}, in parentheses (). The smaller expression picks out the information we actually want from the result (here, a four-digit number). A part of a regular expression enclosed in parentheses is called a group.
We can query a group with m.group(number): group(0) is the match of the whole regular expression, group(1) is the first group, and so on.

 
    import re
    m = re.search(r"output_(\d{4})", "output_1986.txt")
    print(m.group(1))

 

We can also give a group a name, which makes querying it with m.group() more convenient:

 
    import re
    m = re.search(r"output_(?P<year>\d{4})", "output_1986.txt")  # (?P<name>...) names a group
    print(m.group("year"))
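Several groups can be used at once. The sketch below pulls the year, month, and day out of a made-up file name with three named groups. Named groups are still numbered; m.groups() returns all groups as a tuple, and m.groupdict() returns the named ones as a dictionary.

```python
import re

m = re.search(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})', 'report_2012-06-30.csv')

print(m.group(0))      # 2012-06-30  (the whole match)
print(m.group(1))      # 2012        (named groups are also numbered)
print(m.groups())      # ('2012', '06', '30')
print(m.groupdict())   # {'year': '2012', 'month': '06', 'day': '30'}
```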

 

4. Exercise
There is a file named output_1981.10.21.txt. Use Python to read the date from the file name and work out the day of the week. Then rename the file to output_YYYY-MM-DD-W.txt (YYYY: four-digit year, MM: two-digit month, DD: two-digit day, W: one-digit day of the week, assuming Monday is the first day of the week).
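One possible approach to the exercise, sketched under some assumptions. The regular expression expects names shaped like output_YYYY.MM.DD.txt. The day of the week comes from datetime.date.isoweekday(), where Monday is 1, matching the exercise's convention. The helper new_name is hypothetical. The sketch only builds the new name; actually renaming the file on disk would take a call such as os.rename(old, new), which is left out here.

```python
import datetime
import re

def new_name(old):
    """Turn 'output_1981.10.21.txt' into 'output_1981-10-21-W.txt'.

    new_name is a hypothetical helper written for this exercise.
    """
    m = re.fullmatch(r'output_(\d{4})\.(\d{1,2})\.(\d{1,2})\.txt', old)
    if m is None:
        raise ValueError('unexpected file name: ' + old)
    year, month, day = (int(g) for g in m.groups())
    # isoweekday(): Monday == 1 ... Sunday == 7
    weekday = datetime.date(year, month, day).isoweekday()
    # :02d pads month and day to two digits
    return f'output_{year:04d}-{month:02d}-{day:02d}-{weekday}.txt'

print(new_name('output_1981.10.21.txt'))  # output_1981-10-21-3.txt
```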

 

 

Summary:

re.search(), re.match(), re.sub(), re.findall()

How to construct regular expressions

 
