Regular expressions and the Python re module

Source: Internet
Author: User

0 Regular Expressions

0.1 Common metacharacters

.: Matches any single character except the newline \n (in Python, . also matches \n when the re.S flag is set)

*: Matches the preceding subexpression zero or more times, for example zz* matches z, zz, or zzzzzzzzzz

+: Matches the preceding subexpression one or more times, for example hh+ matches hh, hhh, or hhhhhhhhhhhhhhh

{n}: Matches the preceding subexpression exactly n times, for example h{5} matches hhhhh

{n,}: Matches the preceding subexpression at least n times, for example h{5,} matches hhhhh, hhhhhhhh, or hhhhhhhhhhhhhhhhhh

{n,m}: Matches the preceding subexpression n to m times

?: Matches the preceding subexpression 0 or 1 times, equivalent to {0,1}. When it follows * + ? {n} {n,} {n,m}, it switches that quantifier to non-greedy mode. For example, against hhhhh, h+ gives a single result, hhhhh, while h+? gives 5 results, each a single h.

\: Escape character, for example \\n matches the two characters \n, \n matches a line break, \\ matches \, and \( matches (

^: Matches the beginning of the string, for example ^hello matches the hello in hello world

$: Matches the end of the string, for example scut$ matches I am from scut

[...]: Character set, matches any one character in the brackets. - represents a range and a leading ^ negates the set. To include ], - or ^ as literal characters, precede them with a \, or put ] or - in the first position and ^ in any position but the first. For example, [^0-9] matches a non-digit

|: Or, matches either the left or the right expression. The left expression is tried first, and if it succeeds the right one is not tried. For example, convex|function matches convex or function

(...): Group, the enclosed expression is matched as a whole, for example (really){5} matches reallyreallyreallyreallyreally

(?P<name>...): Group that is also given an alias

(?P=name): Matches the same string that the group with alias name matched

\<number>: Matches the same string that the group with the given number matched

\d: Matches a digit, equivalent to [0-9] for ASCII text

\D: Matches a non-digit, equivalent to [^0-9]

\s: Matches a whitespace character, equivalent to [ \f\n\r\t\v]. \t is a tab (\x09), \n is a newline (\x0a), \v is a vertical tab (\x0b), \f is a form feed (\x0c), \r is a carriage return (\x0d)

\S: Matches a non-whitespace character, equivalent to [^\s]

\w: Matches any word character, including the underscore (Unicode character set); similar but not equivalent to [A-Za-z0-9_]

\W: Matches any non-word character, equivalent to [^\w]
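
As a rough illustration of a few of the metacharacters above, here is a minimal sketch assuming Python 3. The sample strings and variable names are made up for the example and do not come from the original article.

```python
import re

text = "hhhhh"

# Greedy: h+ consumes as many h characters as it can, giving one match.
greedy = re.findall(r"h+", text)

# Non-greedy: h+? stops as early as possible, giving five one-character matches.
lazy = re.findall(r"h+?", text)

# Negated character set: [^0-9]+ matches runs of non-digits.
non_digits = re.findall(r"[^0-9]+", "ab12cd345")

# Named group plus a backreference: (?P=word) must repeat what (?P<word>...) matched.
repeated = re.search(r"\b(?P<word>\w+) (?P=word)\b", "this is is a test")

# Numbered backreference: \1 refers to the first group.
doubled = re.search(r"(\w)\1", "hello")

print(greedy)                  # ['hhhhh']
print(lazy)                    # ['h', 'h', 'h', 'h', 'h']
print(non_digits)              # ['ab', 'cd']
print(repeated.group("word"))  # is
print(doubled.group())         # ll
```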

1 Regular expressions in Python

1.1 Backslash

When regular expressions are used inside a programming language, matching a literal \ takes four backslashes: '\\\\' matches \. The language first unescapes '\\\\' in the string literal to \\, and the regular expression engine then unescapes \\ to \. With Python's raw string prefix r you can write two fewer backslashes: r'\\' matches \, r'\\d' matches the literal text \d, and r'\d' matches a digit.
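
The two levels of escaping can be checked directly. This is a small sketch assuming Python 3; the sample text is made up for illustration.

```python
import re

text = "a\\d7"   # four characters: a, \, d, 7

# Four backslashes in a normal string literal become two in the pattern,
# which the regex engine reads as one literal backslash.
normal = re.findall("\\\\", text)

# With a raw string, two backslashes are enough.
raw = re.findall(r"\\", text)

# r'\\d' matches the literal text \d, while r'\d' matches a digit.
literal_backslash_d = re.findall(r"\\d", text)
digit = re.findall(r"\d", text)

print(normal == raw)         # True
print(literal_backslash_d)   # ['\\d']
print(digit)                 # ['7']
```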

1.2 Use of the re module

import re

# First compile the regular expression into a Pattern object
pattern = re.compile('Keras')

# Use the match method of the Pattern object to match the text. It matches from
# the start of the text but does not require a complete match; add $ at the end
# of the pattern for an exact match. Returns a Match object or None.
match = pattern.match('Keras is a high-level neural networks API')

# You can also use the search method of the Pattern object, which looks for a
# matching substring anywhere in the text and returns a Match object or None.
match = pattern.search('Keras is a high-level neural networks API')

# Output the matching result using the Match object's methods
if match:
    print(match.group())

# You can also call the re functions directly, which saves the compile line,
# but the pattern cannot be reused.
re.match('TensorFlow', 'TensorFlow is an open-source library for machine intelligence')
re.search('machine intelligence', 'TensorFlow is an open-source library for machine intelligence')

1.3 re.compile(pattern, flags=0)

Returns: a Pattern object

pattern: A regular expression in string form, built from the metacharacters above and ordinary characters

flags: The matching mode. Several flags can be combined with |. The available flags are:

re.I or re.IGNORECASE: Ignore case

re.L or re.LOCALE: Use the current locale. (Python's locale module represents different languages, regions, and character sets.) In Python 3 this flag is accepted only with bytes patterns

re.U or re.UNICODE: Match according to Unicode character properties (the default for str patterns in Python 3)

re.M or re.MULTILINE: ^ and $ also match at the beginning and end of each line

re.S or re.DOTALL: Make . match line breaks as well

re.X or re.VERBOSE: Ignore whitespace in the pattern and allow comments
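
The effect of the most common flags can be seen in a short sketch. It assumes Python 3, and the sample text and the `version` pattern are made up for illustration.

```python
import re

text = "Keras\nkeras backend\nTensorFlow"

# re.I ignores case, so both spellings are found.
ignore_case = re.findall(r"keras", text, re.I)

# re.M lets ^ match at the start of every line, not only the start of the string.
line_starts = re.findall(r"^\w+", text, re.M)

# Without re.S the dot stops at a newline; with re.S it crosses it.
no_dotall = re.match(r".+", text).group()
dotall = re.match(r".+", text, re.S).group()

# re.X allows whitespace and comments inside the pattern; flags combine with |.
version = re.compile(r"""
    (\d+)   # major version
    \.      # literal dot
    (\d+)   # minor version
""", re.X | re.I)

print(ignore_case)                           # ['Keras', 'keras']
print(line_starts)                           # ['Keras', 'keras', 'TensorFlow']
print(no_dotall)                             # Keras
print(dotall == text)                        # True
print(version.search("Keras 2.4").groups())  # ('2', '4')
```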

1.4 Pattern Object

The Pattern object represents a compiled regular expression. It has the following methods, each with a corresponding function in the re module whose parameters differ slightly; both forms are given below.

match(string[, pos[, endpos]]) | re.match(pattern, string, flags=0): Matches from the beginning of the string and returns a Match object or None.

search(string[, pos[, endpos]]) | re.search(pattern, string, flags=0): Looks for a match anywhere in the string and returns a Match object or None.

split(string, maxsplit=0) | re.split(pattern, string, maxsplit=0, flags=0): Splits string wherever the pattern matches; maxsplit is the maximum number of splits.

findall(string[, pos[, endpos]]) | re.findall(pattern, string, flags=0): Searches the string and returns a list of all substrings that match.

finditer(string[, pos[, endpos]]) | re.finditer(pattern, string, flags=0): Searches the string and returns an iterator of Match objects.

sub(repl, string, count=0) | re.sub(pattern, repl, string, count=0, flags=0): Replaces each matching substring in string with repl and returns the resulting string; count is the maximum number of replacements.

subn(repl, string, count=0) | re.subn(pattern, repl, string, count=0, flags=0): Like sub, but also returns the number of replacements made, as a tuple (new string, count).
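
The methods listed above can be tried on a single compiled pattern. The following is only a sketch, assuming Python 3; the pattern and sample text are chosen purely for illustration.

```python
import re

pattern = re.compile(r"\d+")
text = "a1b22c333"

# match() anchors at the start (or at pos), search() scans the whole string.
at_start = pattern.match(text)       # None, because the text starts with 'a'
at_pos = pattern.match(text, 1)      # matches '1' when started at index 1
anywhere = pattern.search(text)      # first match anywhere: '1'

# split() cuts the string at each match; maxsplit limits the number of cuts.
pieces = pattern.split(text)
first_cut = pattern.split(text, maxsplit=1)

# findall() returns strings, finditer() yields Match objects.
found = pattern.findall(text)
spans = [m.span() for m in pattern.finditer(text)]

# sub() replaces matches; subn() also reports how many replacements were made.
replaced = pattern.sub("#", text)
replaced_twice = pattern.sub("#", text, count=2)
replaced_n = pattern.subn("#", text)

print(pieces)        # ['a', 'b', 'c', '']
print(first_cut)     # ['a', 'b22c333']
print(found)         # ['1', '22', '333']
print(spans)         # [(1, 2), (3, 5), (6, 9)]
print(replaced_n)    # ('a#b#c#', 3)
```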

1.5 Match Object

The Match object represents the matching result and contains information about the match.

It has the following attributes: string, re (the Pattern object used for the match), pos, endpos, lastindex (index of the last matched group), lastgroup (alias of the last matched group)

The following methods are included:

group([group1, ...]):

Returns the substrings matched by one or more groups. The default is group(0), which represents the entire matched substring. With several arguments, a tuple is returned.

group1 can be a group number or a group alias.

A group that did not take part in the match returns None, and a group that matched several substrings returns the last one.

groups(default=None): Returns a tuple of the substrings matched by all groups. Groups that did not match any substring return default.

groupdict(default=None): Returns a dictionary whose keys are group aliases and whose values are the substrings those groups matched. Groups without aliases are not included.

start(group=0): Returns the start index in string of the substring matched by the given group, or -1 if the group did not take part in the match

end(group=0): Returns the end index in string of the substring matched by the given group, or -1 if the group did not take part in the match

span(group=0): Returns (start(group), end(group))
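
To make the Match object concrete, here is a minimal sketch assuming Python 3. The pattern, the group aliases (name, num, extra) and the sample text are hypothetical and serve only as an illustration.

```python
import re

# The optional group "extra" does not take part in this match.
m = re.search(r"(?P<name>[a-z]+)-(?P<num>\d+)(?P<extra>!)?", "id: keras-24 ok")

print(m.group())               # keras-24
print(m.group("name", 2))      # ('keras', '24')
print(m.groups())              # ('keras', '24', None)
print(m.groups(default=""))    # ('keras', '24', '')
print(m.groupdict())           # {'name': 'keras', 'num': '24', 'extra': None}
print(m.span("num"))           # (10, 12)
print(m.start("extra"))        # -1
print(m.lastindex, m.lastgroup)  # 2 num

# A group that matches several times keeps only its last match.
last = re.match(r"(\d)+", "123").group(1)
print(last)                    # 3
```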

1.6 Unicode Encoding

Regular expressions for Python's re module are best written as Unicode strings, in the form u'...'. (In Python 3, ordinary string literals are already Unicode.)

Because I had not been using Unicode before, I got incorrect matches when matching Chinese text.

My guess is that the methods in the re module work with Unicode by default.
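
The following sketch shows how Unicode matching behaves, assuming Python 3, where str patterns are matched as Unicode by default. The sample text is made up, and the range \u4e00-\u9fff is just one common way to approximate Chinese characters, not a complete definition.

```python
import re

text = "深度学习 Keras"

# With a str pattern, \w matches Unicode word characters, including Chinese.
words = re.findall(r"\w+", text)

# re.ASCII restricts \w to [A-Za-z0-9_], so the Chinese word is skipped.
ascii_words = re.findall(r"\w+", text, re.ASCII)

# An explicit range for common CJK ideographs (an approximation).
chinese = re.findall(r"[\u4e00-\u9fff]+", text)

print(words)        # ['深度学习', 'Keras']
print(ascii_words)  # ['Keras']
print(chinese)      # ['深度学习']
```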
