Python Regular Expressions

1. Iterator: an object that implements the __iter__() and __next__() methods internally, so it can be stepped through by calling next().
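As a minimal sketch of the iterator protocol described above (the Countdown class is a hypothetical name invented for illustration, not something from the original notes):

```python
class Countdown:
    """Iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__().
        return self

    def __next__(self):
        # Signal exhaustion with StopIteration, as the protocol requires.
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


it = Countdown(3)
first = next(it)   # 3
rest = list(it)    # [2, 1]
```

Because the object is its own iterator, it can only be traversed once; list(it) consumes whatever next() has not already taken.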

II. Python regular expressions

1. Python supports regular expressions through the re module.

2. To see which Python modules are available on the current system: help('modules')

help(): the built-in help system. It can be invoked in two ways: interactively, or as a function call.

Example: interactive invocation

>>> help()

Welcome to Python 3.5's help utility!

If this is your first time using Python, you should definitely check out
the tutorial on the Internet at http://docs.python.org/3.5/tutorial/.

Enter the name of any module, keyword, or topic to get help on writing
Python programs and using Python modules.  To quit this help utility and
return to the interpreter, just type "quit".

To get a list of available modules, keywords, symbols, or topics, type
"modules", "keywords", "symbols", or "topics".  Each module also comes
with a one-line summary of what it does; to list the modules whose name
or summary contain a given string such as "spam", type "modules spam".

help> modules

Function-call invocation

help('modules')

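3. Regular expression metacharacters

\s  : a whitespace character;
\S  : a non-whitespace character;
[\s\S]  : any character, including a newline;
[\s\S]*  : zero or more of any character;
[\s\S]*?  : the non-greedy form; it matches as few characters as possible, so on its own it matches zero characters (the empty position before any character);

\d: a digit;

\D: a non-digit;

\w: a word character; in ASCII mode this is equivalent to [a-zA-Z0-9_];

\W: a non-word character;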
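The character classes above can be tried out with re.findall. This is a small sketch; the sample text is made up for illustration:

```python
import re

text = "room 42\tfloor_7"

digits = re.findall(r"\d", text)       # ['4', '2', '7']
non_space = re.findall(r"\S+", text)   # ['room', '42', 'floor_7']
words = re.findall(r"\w+", text)       # ['room', '42', 'floor_7']

# [\s\S] matches any character, including a newline, without needing re.S.
whole = re.match(r"[\s\S]*", "a\nb").group()   # 'a\nb'
# The non-greedy form matches as little as possible: here, nothing at all.
empty = re.match(r"[\s\S]*?", "a\nb").group()  # ''
```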

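Rules:

.  matches any single character (except a newline, unless re.S is set);

*  matches the preceding character 0 or more times;

+  matches the preceding character 1 or more times;

?  matches the preceding character 0 or 1 times;

{m} matches the preceding character exactly m times;

{m,n} matches the preceding character m to n times;

{m,} matches the preceding character at least m times, with no upper limit;

{,n} matches the preceding character 0 to n times;

\  escape character;

[...] character set, e.g. [a-z];

*? +? ?? {m,n}? make *, +, ? and {m,n} non-greedy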
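The difference between greedy and non-greedy quantifiers is easiest to see side by side. A short sketch, with sample strings chosen only for illustration:

```python
import re

html = "<b>one</b><b>two</b>"

# Greedy: .* runs to the last possible </b>.
greedy = re.findall(r"<b>.*</b>", html)   # ['<b>one</b><b>two</b>']
# Non-greedy: .*? stops at the first </b> it can.
lazy = re.findall(r"<b>.*?</b>", html)    # ['<b>one</b>', '<b>two</b>']

exact = re.fullmatch(r"a{3}", "aaa") is not None   # True
ranged = re.findall(r"a{2,3}", "a aa aaa aaaa")    # ['aa', 'aaa', 'aaa']
```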

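Boundary matching (does not consume any characters of the string being matched)

^: matches the start of the string; in multiline mode, it also matches the start of each line;

$: matches the end of the string; in multiline mode, it also matches the end of each line;

\b: matches a word boundary. It matches no characters, only a position: one side of the position is a word character, and the other side is a non-word character or the start or end of the string. \b is zero-width. (A "word" here is a run of characters matched by \w.) \b is equivalent to: (?<!\w)(?=\w)|(?<=\w)(?!\w);

\B: matches a position that is not a word boundary, the opposite of \b (note that inside a character class, \b means the backspace character, so [^\b] is not equivalent);

\A: matches only at the start of the string;

\Z: matches only at the end of the string;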
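A brief sketch of how these zero-width anchors behave, using a made-up two-line string:

```python
import re

text = "cat concat\ncat"

whole_words = re.findall(r"\bcat\b", text)       # ['cat', 'cat']: 'concat' is skipped
inside = re.findall(r"\Bcat", text)              # ['cat']: only the one inside 'concat'
starts_default = re.findall(r"^cat", text)       # ['cat']: start of string only
starts_multi = re.findall(r"^cat", text, re.M)   # ['cat', 'cat']: start of each line
only_start = re.findall(r"\Acat", text, re.M)    # ['cat']: \A ignores re.M
```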

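Grouping:

|  alternation: matches either the left or the right expression. The left side is tried first, and if it matches, the right side is skipped. If | is not enclosed in (), its scope is the entire regular expression.

()  group: counting from the left of the expression, each opening parenthesis increases the group number by 1. A group is treated as a single unit and can be followed by a quantifier. A | inside a group applies only within that group. Examples: (abc){3} (abc|def)123 (abc|def){3}123

\number  refers to the string matched by the group numbered number. Example: (\d)([a-z])\1\2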
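The example patterns above can be checked directly. This sketch uses input strings invented for illustration:

```python
import re

# \1 and \2 must repeat exactly what groups 1 and 2 captured.
m = re.match(r"(\d)([a-z])\1\2", "1a1a")
groups = m.groups()                                # ('1', 'a')
no_match = re.match(r"(\d)([a-z])\1\2", "1a2b")    # None

# A quantified group keeps only its last capture.
repeated = re.fullmatch(r"(abc|def){3}123", "abcdefabc123")
last_capture = repeated.group(1)                   # 'abc'
```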

Lookaround (lookahead and lookbehind)

(?=...) : positive lookahead

(?!...) : negative lookahead

(?<=...) : positive lookbehind

(?<!...) : negative lookbehind
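The four lookaround forms can be sketched as follows. The sample strings are hypothetical, and the extra \b and (?<!\d) guards are there only to stop the engine from backtracking into a shorter number:

```python
import re

sizes = "10px 20em 30px"
prices = "USD100 EUR200 USD300"

# Positive lookahead: digits that are followed by 'px'.
px_values = re.findall(r"\d+(?=px)", sizes)                  # ['10', '30']
# Negative lookahead: whole numbers not followed by 'px'.
non_px = re.findall(r"\b\d+(?!\d|px)", sizes)                # ['20']
# Positive lookbehind: digits that are preceded by 'USD'.
usd = re.findall(r"(?<=USD)\d+", prices)                     # ['100', '300']
# Negative lookbehind: whole numbers not preceded by 'USD'.
non_usd = re.findall(r"(?<!USD)(?<!\d)\d+", prices)          # ['200']
```

In every case the lookaround text itself is not part of the result, because lookarounds are zero-width.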

4. Use the functions built into the re module to apply regular expressions

5. match and match objects:

match(pattern, string, flags=0)

Try to apply the pattern at the start of the string, returning a match object, or None if no match was found.

m = re.match('a', 'abc')

All attributes and methods of the match object:

m.end m.group m.lastgroup m.re m.start

m.endpos m.groupdict m.lastindex m.regs m.string

m.expand m.groups m.pos m.span

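group([group1, ...]):

Returns the string captured by one or more groups; with several arguments, the result is a tuple. group1 may be a group number or a group name. Number 0 stands for the whole matched substring, and calling group() with no arguments returns group(0). A group that captured nothing returns None; a group that captured several times returns its last capture.

groups([default]):

Returns a tuple of the strings captured by all groups, equivalent to calling group(1, 2, ..., last). default is the value used for groups that captured nothing; it defaults to None.

m.pos (pos: position): the index in the string at which the search started

m.endpos: the index in the string at which the search stops

m.start(): the start index, in the original string, of the substring matched by the pattern (or by a given group)

m.end(): the end index, in the original string, of the substring matched by the pattern (or by a given group); the end index is exclusive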
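The match object methods above can be exercised as in this sketch. The pattern and input are made up for illustration, with an optional third group included to show the None and default behaviour:

```python
import re

m = re.match(r"(\w+)@(\w+)(\.\w+)?", "user@example")

whole = m.group()               # 'user@example' (same as m.group(0))
pair = m.group(1, 2)            # ('user', 'example')
missing = m.group(3)            # None: the optional group captured nothing
filled = m.groups("")           # ('user', 'example', '')
span = (m.start(), m.end())     # (0, 12)
group2_span = m.span(2)         # (5, 12)
search_window = (m.pos, m.endpos)   # (0, 12)
```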

6. search: scans the string for the pattern and returns a match object for the first match only

search(pattern, string, flags=0)

Scan through string looking for a match to the pattern, returning a match object, or None if no match was found.

m.group()

m.groups()
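The practical difference from match is that search is not tied to the start of the string. A minimal sketch, with an invented sample string:

```python
import re

text = "order 66 and order 99"

at_start = re.match(r"\d+", text)    # None: the string does not begin with a digit
first = re.search(r"\d+", text)      # match object for the first run of digits

# Both functions can return None, so check before calling group().
result = first.group() if first else None   # '66'
position = first.span() if first else None  # (6, 8)
```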

7. findall: finds every match and returns them as a list

findall(pattern, string, flags=0)

Return a list of all non-overlapping matches in the string. If one or more capturing groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.

The result is an ordinary list and can be printed directly.
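How the shape of the findall result depends on the number of capturing groups can be sketched as follows (the input string is made up for illustration):

```python
import re

text = "x=1, y=22, z=333"

no_groups = re.findall(r"\w=\d+", text)        # whole matches
one_group = re.findall(r"\w=(\d+)", text)      # only the group's text
two_groups = re.findall(r"(\w)=(\d+)", text)   # a tuple per match

print(no_groups)    # ['x=1', 'y=22', 'z=333']
print(one_group)    # ['1', '22', '333']
print(two_groups)   # [('x', '1'), ('y', '22'), ('z', '333')]
```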

8. finditer (used less often)

finditer(pattern, string, flags=0)

Return an iterator over all non-overlapping matches in the string.  For each match, the iterator returns a match object. Empty matches are included in the result.
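Unlike findall, finditer yields match objects, so positions are available as well as text. A small sketch with an invented input:

```python
import re

text = "x=1, y=22"

# Each item is a match object, so span() can be read alongside group().
spans = [(m.group(), m.span()) for m in re.finditer(r"\d+", text)]
# [('1', (2, 3)), ('22', (7, 9))]
```

Since the matches are produced lazily, this form may also be preferable when scanning very large strings.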

9. split

split(pattern, string, maxsplit=0, flags=0)

Split the source string by the occurrences of the pattern, returning a list containing the resulting substrings.  If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.  If maxsplit is nonzero, at most maxsplit splits occur, and the remainder of the string is returned as the final element of the list.

Example: a = re.split(r'\.', 'www.baidu.com') (the dot must be escaped; an unescaped '.' matches any character, so every character would be treated as a separator)

The result is an ordinary list and can be printed directly.
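The following sketch shows the escaped and unescaped dot side by side, plus the effect of a capturing group and of maxsplit:

```python
import re

host = "www.baidu.com"

parts = re.split(r"\.", host)                  # ['www', 'baidu', 'com']
# Unescaped, '.' matches every character, leaving only empty strings.
unescaped = re.split(r".", host)
# A capturing group keeps the separators in the result.
with_sep = re.split(r"(\.)", host)             # ['www', '.', 'baidu', '.', 'com']
# maxsplit limits the number of splits; the rest stays in the last element.
limited = re.split(r"\.", host, maxsplit=1)    # ['www', 'baidu.com']
```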

10. sub: find and replace

sub(pattern, repl, string, count=0, flags=0)

Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl.  repl can be either a string or a callable; if a string, backslash escapes in it are processed.  If it is a callable, it's passed the match object and must return a replacement string to be used.

Example:

In [47]: re.sub('baidu','BAIDU','www.baidu.com')

Out[47]: 'www.BAIDU.com'

11. subn: find and replace, and also report the number of replacements made

Example:

In [48]: re.subn('baidu','BAIDU','www.baidu.com')

Out[48]: ('www.BAIDU.com', 1)
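Beyond plain string replacement, repl can use backreferences or be a callable, and count limits the number of replacements. A short sketch with made-up inputs:

```python
import re

# Backreferences in the replacement string reorder captured groups.
swapped = re.sub(r"(\w+)@(\w+)", r"\2@\1", "user@host")   # 'host@user'

# A callable receives each match object and returns the replacement text.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "a1 b20")   # 'a2 b40'

# subn returns a (new_string, number_of_replacements) tuple.
all_replaced = re.subn(r"o", "0", "foo boo")              # ('f00 b00', 4)
first_only = re.subn(r"o", "0", "foo boo", count=1)       # ('f0o boo', 1)
```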

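flags:

re.I or IGNORECASE: ignore letter case

re.M or MULTILINE: multiline matching (^ and $ also match at the start and end of each line)

re.A or ASCII: make \w, \W, \b, \B, \d, \D, \s and \S match ASCII characters only

re.U or UNICODE: make \w, \W and the other classes follow Unicode rules; this is already the default for str patterns in Python 3

re.S (DOTALL): "." matches any character at all, including the newline.

re.X (VERBOSE): Ignore whitespace and comments for nicer looking RE's. Comments can be added inside the pattern, and whitespace in the pattern is ignored unless it is escaped or inside a character class.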
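Each flag can be seen in isolation in the sketch below. The sample strings are invented for illustration:

```python
import re

ignore_case = re.findall(r"python", "Python PYTHON python", re.I)  # all three spellings
multi = re.findall(r"^\w+$", "one\ntwo", re.M)       # ['one', 'two']
dot_default = re.match(r"a.b", "a\nb")               # None: '.' skips newlines
dot_all = re.match(r"a.b", "a\nb", re.S)             # matches with DOTALL

# With re.X, layout whitespace and '#' comments inside the pattern are ignored.
verbose = re.compile(r"""
    (\d{4})   # year
    -         # literal separator
    (\d{2})   # month
""", re.X)
year_month = verbose.match("2024-05").groups()       # ('2024', '05')

# re.A restricts \w to ASCII, so the non-ASCII letter splits the word.
unicode_words = re.findall(r"\w+", "na\u00efve")          # one word
ascii_words = re.findall(r"\w+", "na\u00efve", re.A)      # ['na', 've']
```

Flags can be combined with |, for example re.I | re.M.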

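12. Non-capturing groups:

xxx(?:...)xxx

?: : placed at the start of a group, makes the group non-capturing, so it groups the expression without saving the matched text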
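A small sketch of what changes when a group is made non-capturing (patterns and inputs are made up for illustration):

```python
import re

capturing = re.match(r"(ab)+(c)", "ababc").groups()         # ('ab', 'c')
non_capturing = re.match(r"(?:ab)+(c)", "ababc").groups()   # ('c',)

# findall returns only captured groups, so (?:...) changes its output too.
hosts = re.findall(r"(?:www|ftp)\.(\w+)", "www.baidu ftp.example")
# ['baidu', 'example']
```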

?P<name> : named groups

(?P<name>...)

Similar to regular parentheses, but the substring matched by the group is accessible via the symbolic group name name. Group names must be valid Python identifiers, and each group name must be defined only once within a regular expression. A symbolic group is also a numbered group, just as if the group were not named.

Named groups can be referenced in three contexts. If the pattern is (?P<quote>['"]).*?(?P=quote) (i.e. matching a string quoted with either single or double quotes):

Context of reference to group "quote": ways to reference it

in the same pattern itself: (?P=quote) (as shown), \1

when processing match object m: m.group('quote'), m.end('quote') (etc.)

in a string passed to the repl argument of re.sub(): \g<quote>, \g<1>, \1
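The three reference contexts can be sketched with the quote pattern from the text. The input strings and the year/month and first/second group names are assumptions added for illustration:

```python
import re

# 1. Inside the pattern itself: (?P=quote) must repeat the opening quote.
pattern = re.compile(r"""(?P<quote>['"]).*?(?P=quote)""")
m = pattern.search("say 'hi' now")

# 2. On the match object: by name, or by number since named groups are numbered too.
quote_char = m.group("quote")    # "'"
same_char = m.group(1)           # "'"
quote_end = m.end("quote")       # 5
whole = m.group()                # "'hi'"

# 3. In the repl argument of re.sub(): \g<name>.
swapped = re.sub(r"(?P<first>\w+) (?P<second>\w+)",
                 r"\g<second> \g<first>", "hello world")   # 'world hello'

# groupdict() collects all named groups into a dictionary.
parts = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})", "2024-05").groupdict()
```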
