Python basics 3: regular expressions and the re module

A regular expression (or RE) is a small, highly specialized language embedded in Python and made available through the re module. Its only job is to describe patterns that match text.

1. Character Type:

Patterns are built from ordinary characters and metacharacters.

1) Ordinary characters: most letters and other characters simply match themselves, for example:

import re
t2 = re.findall('ahh', 'yahhkkgggtngslahh')  # result: ['ahh', 'ahh']

2) Metacharacters: . ^ $ * + ? {} [] | () \

  • . Matches any character except a line break.

  • ^ Matches the start of the string. For example, '^hello' matches 'helloworld' but does not match 'aaaahellobbb'.

  • $ Matches the end of the string. It works like ^, but at the other end.

  • [] Matches any one character from a specified character set.

  • | Means "or". It matches either the expression on its left or the one on its right, trying them from left to right. If | is not enclosed in (), its scope is the entire regular expression.

  • * Repeats the preceding item zero or more times.

  • + Repeats the preceding item one or more times.

  • ? Repeats the preceding item zero or one time.

  • {n} repeats exactly n times, {n,} repeats n or more times, and {n,m} repeats n to m times.

  • \ Escape character. Before a metacharacter, it removes that character's special meaning. Before certain ordinary characters, it gives them a special meaning. Before a number, it refers to the string matched by the group with that number. The main special sequences are:

      • \A matches only at the start of the string.

      • \Z matches only at the end of the string.

      • \b matches the empty string at the beginning or end of a word.

      • \B matches the empty string that is not at the beginning or end of a word.

      • \d is equivalent to [0-9] (for ASCII text).

      • \D is equivalent to [^0-9].

      • \s matches any whitespace character: [ \t\n\r\f\v]

      • \S matches any non-whitespace character: [^ \t\n\r\f\v]

      • \w matches any letter, digit, or underscore: [a-zA-Z0-9_]

      • \W matches any character that is not a letter, digit, or underscore: [^a-zA-Z0-9_]
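
The scope of | and the three roles of the backslash are easier to see in a small experiment. The following is only a sketch; the sample strings and variable names are made up for illustration.

```python
import re

# Without parentheses, | splits the whole pattern: "gray" OR "grey".
whole = re.findall(r'gray|grey', 'gray grey groy')
print(whole)      # ['gray', 'grey']

# Inside (), | only chooses between the alternatives in the group.
# (?:...) groups without capturing, so findall still returns whole matches.
grouped = re.findall(r'gr(?:a|e)y', 'gray grey groy')
print(grouped)    # ['gray', 'grey']

# A backslash before a metacharacter removes its special meaning.
literal_dot = re.findall(r'1\.5', '1.5 and 1x5')
print(literal_dot)  # ['1.5']

# A backslash before a number refers back to that group's match,
# here a letter followed by the same letter again.
doubled = re.search(r'(\w)\1', 'hello').group()
print(doubled)    # ll
```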

Usage cases:

Case 1: . matches any character except a line break. One . stands for exactly one character, and several . stand for that many characters
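
A minimal sketch of Case 1, with sample strings invented for illustration:

```python
import re

# One dot stands for exactly one character, but never for '\n'.
one_dot = re.findall(r'a.c', 'abc a-c a\nc ac')
print(one_dot)      # ['abc', 'a-c']

# Three dots stand for exactly three characters.
three_dots = re.findall(r'a...e', 'apple angle axe')
print(three_dots)   # ['apple', 'angle']
```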

Case 2: ^ the string must start with the given pattern
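
Case 2 can be tried with the strings used earlier in this article. This is a sketch; the variable names are illustrative.

```python
import re

# ^ anchors the pattern to the start of the string.
at_start = re.findall(r'^hello', 'helloworld')
print(at_start)     # ['hello']

# 'hello' is present here, but not at the start, so nothing matches.
not_at_start = re.findall(r'^hello', 'aaaahellobbb')
print(not_at_start)  # []
```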

Case 3: $ matches the end of the string, so the string must end with the given pattern
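
A short sketch of Case 3, again with made-up sample strings. The last call shows one detail worth knowing: $ also matches just before a newline that ends the string.

```python
import re

# $ anchors the pattern to the end of the string.
at_end = re.findall(r'world$', 'helloworld')
print(at_end)        # ['world']

not_at_end = re.findall(r'world$', 'worldhello')
print(not_at_end)    # []

# $ also matches just before a single trailing newline.
before_newline = re.findall(r'world$', 'helloworld\n')
print(before_newline)  # ['world']
```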

Case 4: * matches the character before it zero or more times. For example, abc* matches ab, abc, abcc, abcccc, ...
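
The abc* example from Case 4 can be checked directly. A minimal sketch:

```python
import re

# c* allows zero or more 'c', so 'ab' alone already matches.
star = re.findall(r'abc*', 'ab abc abcc abcccc a')
print(star)   # ['ab', 'abc', 'abcc', 'abcccc']
```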

Case 5: + matches the character before it one or more times
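
For comparison with *, a sketch of Case 5 on an invented sample string:

```python
import re

# c+ requires at least one 'c', so 'ab' no longer matches.
plus = re.findall(r'abc+', 'ab abc abcc abcccc')
print(plus)   # ['abc', 'abcc', 'abcccc']
```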

Case 6: ? matches the character before it zero or one time
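
A sketch of Case 6. The sample strings are assumptions chosen to show both outcomes.

```python
import re

# c? allows at most one 'c'; the second 'c' of 'abcc' is left over.
optional_c = re.findall(r'abc?', 'ab abc abcc')
print(optional_c)   # ['ab', 'abc', 'abc']

# u? makes the 'u' optional, so both spellings match.
spelling = re.findall(r'colou?r', 'color colour colouur')
print(spelling)     # ['color', 'colour']
```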

Case 7: {m} matches the character before it exactly m times
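
A minimal sketch of Case 7, with illustrative sample strings:

```python
import re

# a{3} consumes exactly three 'a'; from 'aaaa' it takes the first three.
exact = re.findall(r'a{3}', 'a aa aaa aaaa')
print(exact)        # ['aaa', 'aaa']

# Combined with \b, {4} finds numbers of exactly four digits.
four_digits = re.findall(r'\b\d{4}\b', '12 1234 123456')
print(four_digits)  # ['1234']
```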

Case 8: {m,n} matches the character before it m to n times. If m is omitted, it repeats 0 to n times. If n is omitted, it repeats m or more times.
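
The three forms in Case 8 can be compared side by side. This sketch uses re.fullmatch, which succeeds only if the whole string fits the pattern; the word list is made up for illustration.

```python
import re

words = ['', 'a', 'aa', 'aaa', 'aaaa']

# {2,3}: between two and three repetitions.
two_to_three = [w for w in words if re.fullmatch(r'a{2,3}', w)]
print(two_to_three)  # ['aa', 'aaa']

# {,2}: lower bound omitted, so zero to two repetitions.
up_to_two = [w for w in words if re.fullmatch(r'a{,2}', w)]
print(up_to_two)     # ['', 'a', 'aa']

# {2,}: upper bound omitted, so two or more repetitions.
two_or_more = [w for w in words if re.fullmatch(r'a{2,}', w)]
print(two_or_more)   # ['aa', 'aaa', 'aaaa']
```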

Case 9: [...] character sets and escape sequences

The characters in a set can be listed one by one or given as a range, for example [abc] or [a-c]. If the first character is ^, the set is negated: [^abc] matches any character other than a, b, or c. Metacharacters such as . and * lose their special meaning inside a set. To use ], -, or ^ literally in a set, put a backslash in front of it, or place ] first, place - first or last, and place ^ anywhere but first.
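
The rules for character sets can be sketched as follows. The sample strings and variable names are illustrative only.

```python
import re

listed = re.findall(r'[abc]', 'abcdef')     # characters listed one by one
ranged = re.findall(r'[a-c]', 'abcdef')     # the same set written as a range
negated = re.findall(r'[^abc]', 'abcdef')   # leading ^ negates the set
print(listed, ranged, negated)
# ['a', 'b', 'c'] ['a', 'b', 'c'] ['d', 'e', 'f']

# . and * are ordinary characters inside a set.
literal = re.findall(r'[.*]', 'a.b*c')
print(literal)      # ['.', '*']

# ], - and ^ taken literally: escaped with a backslash ...
escaped = re.findall(r'[\]\-\^]', 'a]b-c^d')
# ... or by position: ] first, ^ not first, - last.
positioned = re.findall(r'[]^-]', 'a]b-c^d')
print(escaped, positioned)
# [']', '-', '^'] [']', '-', '^']
```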

Other escape sequences:

  • \d matches any decimal digit; it is equivalent to the class [0-9].

  • \D matches any non-digit character; it is equivalent to the class [^0-9].

  • \s matches any whitespace character; it is equivalent to the class [ \t\n\r\f\v].

  • \S matches any non-whitespace character; it is equivalent to the class [^ \t\n\r\f\v].

  • \w matches any letter, digit, or underscore; it is equivalent to the class [a-zA-Z0-9_].

  • \W matches any character that is not a letter, digit, or underscore, that is, [^\w].

  • \b matches a word boundary, that is, the position between a \w character and a \W character (such as a space or punctuation mark), or between a \w character and the start or end of the string. For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb".

  • \B matches a position that is not a word boundary, the opposite of \b. For example, 'er\B' matches the 'er' in "verb" but not the 'er' in "never".

Case 10: \b matches 'I' only when it stands alone as a word, not when it is a letter inside another word
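
A sketch of Case 10, together with the never/verb example. The sentence is an invented sample.

```python
import re

sentence = 'I think INSIDE is what I mean'

# Without \b, every capital I is found, including those inside INSIDE.
any_i = re.findall(r'I', sentence)
print(len(any_i))    # 4

# With \b on both sides, only the standalone word I is found.
whole_word = re.findall(r'\bI\b', sentence)
print(whole_word)    # ['I', 'I']

# er\b needs a boundary after 'er'; er\B needs the absence of one.
print(bool(re.search(r'er\b', 'never')))  # True
print(bool(re.search(r'er\b', 'verb')))   # False
print(bool(re.search(r'er\B', 'verb')))   # True
```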


2. Main functions:

  • match(pattern, string, flags=0) tries to match at the start of the string. If the match succeeds, a match object is returned; otherwise None is returned. The difference from search is that match only tries at the first position. Use it together with group() to read the result.

  • search(pattern, string, flags=0) scans the whole string and returns a match object for the first place where the pattern matches. If nothing matches, None is returned.

  • findall(pattern, string, flags=0) scans the whole string, collects every match of the pattern, and returns them in a list. If nothing matches, an empty list is returned. Text that has already been matched is not used again in the next match.

  • split(pattern, string, maxsplit=0, flags=0) splits the string at every match of the pattern and returns the pieces as a list. maxsplit limits the number of splits.

  • sub(pattern, repl, string, count=0, flags=0) replaces the matched text with repl and returns the new string. count limits the number of replacements.

  • subn(pattern, repl, string, count=0, flags=0) does the same as sub but returns both the new string and the number of replacements made, so two variables can receive them separately.

  • re.compile(pattern[, flags]) compiles a regular expression into a pattern object that can be reused. The flags include:
    re.I: case-insensitive matching
    re.L: the special sequences \w, \W, \b, \B, \s, \S depend on the current locale
    re.M: multiline mode
    re.S: '.' matches any character including line breaks (by default '.' does not match a line break)
    re.U: the special sequences \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character properties database
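
The difference between match and search, and the effect of a few flags, can be sketched as follows. The sample strings and variable names are assumptions made for illustration.

```python
import re

# match only tries at position 0; search keeps scanning forward.
m1 = re.match(r'\d+', 'abc123')
m2 = re.search(r'\d+', 'abc123')
print(m1)           # None
print(m2.group())   # 123

# A compiled pattern can be reused; re.I ignores case.
hello = re.compile(r'hello', re.I)
hits = hello.findall('Hello HELLO hello')
print(hits)         # ['Hello', 'HELLO', 'hello']

# By default '.' stops at a line break; re.S lifts that restriction.
without_s = re.findall(r'a.b', 'a\nb')
with_s = re.findall(r'a.b', 'a\nb', re.S)
print(without_s, with_s)   # [] ['a\nb']

# re.M makes ^ (and $) work at the start (and end) of every line.
first_words = re.findall(r'^\w+', 'one two\nthree four', re.M)
print(first_words)  # ['one', 'three']
```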

Groups: grouping with () extracts specific parts of what was matched.

  • group() returns the whole matched text, whether or not the pattern contains groups. Passing a group number or name returns just that group.

  • groups() returns the results of all groups in the pattern as a tuple, that is, only the grouped parts of the matched string.

  • groupdict() returns the results of the named groups in the pattern as a dictionary. Only groups that were given a key (a name) in the pattern are included.


Case 11: match and group
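
A sketch of Case 11. The pattern and sample string are invented for illustration.

```python
import re

m = re.match(r'h(\w+)\s(\w+)', 'hello world 2024')

whole = m.group()      # the entire match, groups or not
first = m.group(1)     # text captured by the first ()
second = m.group(2)    # text captured by the second ()
both = m.group(1, 2)   # several groups at once, as a tuple
print(whole)           # hello world
print(first, second)   # ello world
print(both)            # ('ello', 'world')

# group() also works when the pattern has no groups at all.
plain = re.match(r'h\w+', 'hello world').group()
print(plain)           # hello
```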

Case 12: groups
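
For Case 12, a sketch of what groups() returns in three situations. The date string is a made-up sample.

```python
import re

# groups() returns only the captured parts, as a tuple.
date = re.search(r'(\d{4})-(\d{2})-(\d{2})', 'released on 2017-08-15')
parts = date.groups()
print(parts)       # ('2017', '08', '15')

# A pattern without groups gives an empty tuple.
no_groups = re.search(r'\d+', 'abc 42').groups()
print(no_groups)   # ()

# A group that took no part in the match is reported as None.
optional = re.match(r'(a)(b)?', 'a').groups()
print(optional)    # ('a', None)
```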

Case 13: groupdict
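
A sketch of Case 13. Groups get a key through the (?P<name>...) syntax; the names first and last are hypothetical, chosen only for this example.

```python
import re

# Two named groups and one unnamed group.
m = re.match(r'(?P<first>\w+) (?P<last>\w+) (\d+)', 'Ada Lovelace 1815')

named = m.groupdict()    # only the groups that have a name
print(named)             # {'first': 'Ada', 'last': 'Lovelace'}

everything = m.groups()  # all groups, named or not
print(everything)        # ('Ada', 'Lovelace', '1815')

by_name = m.group('first')  # a named group can be read by its key
print(by_name)           # Ada
```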

Case 14: findall with groups
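
Groups change what findall returns, which Case 14 is about. A minimal sketch on an invented sample string:

```python
import re

text = 'tom=3, ann=12'

# No group: the list holds the whole matches.
no_group = re.findall(r'\w+=\d+', text)
print(no_group)     # ['tom=3', 'ann=12']

# One group: the list holds only what the group captured.
one_group = re.findall(r'\w+=(\d+)', text)
print(one_group)    # ['3', '12']

# Several groups: the list holds one tuple per match.
two_groups = re.findall(r'(\w+)=(\d+)', text)
print(two_groups)   # [('tom', '3'), ('ann', '12')]

# (?:...) groups without capturing, so whole matches come back.
non_capturing = re.findall(r'(?:\w+)=\d+', text)
print(non_capturing)  # ['tom=3', 'ann=12']
```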

Case 15: split with groups
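
A sketch of Case 15, using a made-up string. When the pattern contains a capturing group, the separators themselves are kept in the result.

```python
import re

text = 'a1b22c333d'

# Without a group the separators are dropped.
plain = re.split(r'\d+', text)
print(plain)        # ['a', 'b', 'c', 'd']

# With a capturing group the separators appear in the list too.
with_group = re.split(r'(\d+)', text)
print(with_group)   # ['a', '1', 'b', '22', 'c', '333', 'd']

# maxsplit limits the number of splits.
limited = re.split(r'\d+', text, maxsplit=1)
print(limited)      # ['a', 'b22c333d']
```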

Case 16: replacing with sub
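
A sketch of Case 16, with subn shown alongside for comparison. The sample strings are illustrative.

```python
import re

text = 'cat bat rat'

# Replace every match.
all_replaced = re.sub(r'[cbr]at', 'dog', text)
print(all_replaced)   # dog dog dog

# count limits how many matches are replaced.
first_only = re.sub(r'[cbr]at', 'dog', text, count=1)
print(first_only)     # dog bat rat

# subn returns the new string and the number of replacements.
new_text, n = re.subn(r'[cbr]at', 'dog', text)
print(new_text, n)    # dog dog dog 3

# The replacement can refer to groups with \1, \2, ...
swapped = re.sub(r'(\w+)@(\w+)', r'\2@\1', 'user@host')
print(swapped)        # host@user
```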

 

Source URL: https://www.jianshu.com/p/df2e26c5b2b5


Program link: https://pan.baidu.com/s/1nuDdaGh password: dvzj

Copyright Disclaimer: This article is an original article by the blogger and cannot be reproduced without the permission of the blogger.
