Python Basics: Regular Expressions (Part 2)

Last updated: 2017-06-15. Source: Internet


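Regex functions

Python provides the re module, which contains all of its regular expression functionality.

Ordinary Python string literals also use \ as an escape character, so backslashes in a pattern need care:

s = "ABC\\-001"

The regular expression held by this string is: ABC\-001

With Python's r prefix (a raw string), you do not have to think about string escaping:

s = r'ABC\-001'

The regular expression is again: ABC\-001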
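To make the escaping rule concrete, here is a minimal sketch that compares the two spellings of the same pattern. The variable names are chosen only for illustration.

```python
import re

escaped = "ABC\\-001"   # ordinary string: the backslash must be doubled
raw = r"ABC\-001"       # raw string: the backslash is written once

# Both literals produce the same 8-character pattern text.
same_pattern = escaped == raw

# In the regex, \- stands for a literal hyphen, so the pattern matches "ABC-001".
matched = re.match(raw, "ABC-001") is not None

print(same_pattern, len(raw), matched)  # True 8 True
```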

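match() checks whether the pattern matches at the start of the string. If it does, a match object is returned; otherwise None is returned.

import re

test = "user input string"
if re.match(r'regular expression', test):
    print("OK")
else:
    print("failed")

Result: failed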
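Because match() returns None on failure, calling .group() directly on the result raises AttributeError when nothing matched. The sketch below shows one way to guard against that. The helper name leading_digits is hypothetical and not part of the re module.

```python
import re

def leading_digits(text):
    """Return the run of digits at the start of text, or None if there is none."""
    m = re.match(r"\d+", text)
    # re.match returns None when the start of the string does not match,
    # so check the result before calling .group().
    return m.group() if m else None

print(leading_digits("123abc"))  # 123
print(leading_digits("abc123"))  # None
```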

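# Regex functions
import re

print("--- re.match only matches at the start of the string; if the start does not fit the regex, the match fails and None is returned")
print(re.match('www', 'wwwcom').group())  # matches at the start
print(re.match('www', 'comwww'))  # no match: 'www' is not at the start
print("--- re.search scans the whole string and returns the first successful match; later matches are not returned")
print(re.search('baidu', 'www.baidu.com').group())
print(re.search('ai', 'www.baidu.com').group())
print("--- re.findall scans the string from left to right and returns the matches in order; if nothing matches, it returns an empty list")
# Returns a list of matches. A pattern that is reused can also be compiled first:
# p = re.compile(r'\d+')
# print(p.findall('one1two2three3four4'))
print(re.findall(r'\d+', 'one1two2three3four4'))
print(re.findall('four', 'one1two2three3four4'))

Result:

--- re.match only matches at the start of the string; if the start does not fit the regex, the match fails and None is returned
www
None
--- re.search scans the whole string and returns the first successful match; later matches are not returned
baidu
ai
--- re.findall scans the string from left to right and returns the matches in order; if nothing matches, it returns an empty list
['1', '2', '3', '4']
['four']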
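The three functions can also be called as methods of a compiled pattern, which may be convenient when one pattern is reused many times. The following is a small sketch under that assumption; the variable names are illustrative only.

```python
import re

digits = re.compile(r"\d+")          # compile once, reuse below
text = "one1two2three3four4"

at_start = digits.match(text)        # None: the text starts with "one", not a digit
first = digits.search(text)          # first run of digits anywhere in the text
all_runs = digits.findall(text)      # every run of digits, left to right

print(at_start)        # None
print(first.group())   # 1
print(all_runs)        # ['1', '2', '3', '4']
```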

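Groups:

Besides simply testing whether a string matches, regular expressions can also extract substrings. Each part of the pattern wrapped in () is a group to extract. For example:

^(\d{3})-(\d{3,8})$ defines two groups, so the area code and the local number can be extracted directly from the matched string:

import re

m = re.match(r"^(\d{3})-(\d{3,8})$", '010-12345')
print(m)
print(m.group(0))
print(m.group(1))
print(m.group(2))

Result (the exact repr of the match object depends on the Python version):

<_sre.SRE_Match object at 0x00000000026360B8>
010-12345
010
12345

If the regular expression defines groups, the group() method of the match object extracts the substrings.

Note that group(0) is always the entire matched text (here the whole string, because the pattern is anchored with ^ and $), while group(1), group(2), ... are the 1st, 2nd, ... captured substrings.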
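As a complement to numbered groups, the sketch below uses groups() to fetch all captured substrings at once and named groups to refer to them by name. The group names area and local are assumptions made for this example.

```python
import re

# Same phone-number pattern, with hypothetical group names "area" and "local".
m = re.match(r"^(?P<area>\d{3})-(?P<local>\d{3,8})$", "010-12345")

all_groups = m.groups()        # tuple of every captured group
area = m.group("area")         # same value as m.group(1)
by_name = m.groupdict()        # mapping from group name to captured text

print(all_groups)  # ('010', '12345')
print(area)        # 010
print(by_name)     # {'area': '010', 'local': '12345'}
```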

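import re

print("--- sub replaces the matches found in a string")
# First argument: the regex; second: the replacement string; third: the string to scan
print(re.sub('g..t', 'abc', 'gaat gbbt gcct'))
print("--- split returns the list of pieces after splitting")
print(re.split(r'\+', '123+456*789'))

Result:

--- sub replaces the matches found in a string
abc abc abc
--- split returns the list of pieces after splitting
['123', '456*789']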
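Two common variations on sub and split, shown as a short sketch: the count argument limits how many matches are replaced, and a character class lets split cut on more than one delimiter.

```python
import re

# Replace only the first match by passing count=1.
only_first = re.sub(r"g..t", "abc", "gaat gbbt gcct", count=1)

# Split on either + or *; inside [...] neither character needs escaping.
parts = re.split(r"[+*]", "123+456*789")

print(only_first)  # abc gbbt gcct
print(parts)       # ['123', '456', '789']
```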

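Exercise 1:

Suppose there is a URL: http://xqtesting.sxl.cn/archive/6688431.html
Extract the file extension from this URL, that is, the .html part.

import re

# The dot is escaped so it matches a literal "."; $ anchors the match at the end of the URL
print(re.findall(r'\.html$', 'http://xqtesting.sxl.cn/archive/6688431.html'))

Result:

['.html']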
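One detail worth checking in this exercise is the dot: in a regex, an unescaped . matches any character, not just a literal period. The sketch below illustrates the difference and also shows one possible way to pull out the extension without hard-coding html. The sample path /docs/xhtml-guide is made up for the illustration.

```python
import re

sample = "/docs/xhtml-guide"   # hypothetical path with no ".html" in it

loose = re.findall(".html", sample)     # "." matches the "x" in "xhtml"
strict = re.findall(r"\.html", sample)  # escaped dot: literal ".html" only

url = "http://xqtesting.sxl.cn/archive/6688431.html"
# A literal dot followed by letters or digits up to the end of the string.
ext = re.search(r"\.[A-Za-z0-9]+$", url).group()

print(loose)   # ['xhtml']
print(strict)  # []
print(ext)     # .html
```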

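Exercise 2:

When matching HTML tags in Python, what is the difference between <.*> and <.*?>? Before answering, try both patterns on
<div><span>test</span></div>

import re

print(re.findall('<.*>', '<div><span>test</span></div>'))
print(re.findall('<.*?>', '<div><span>test</span></div>'))

Result:

['<div><span>test</span></div>']
['<div>', '<span>', '</span>', '</div>']

<.*> is greedy: .* consumes as much as possible, so the match runs from the first < to the last >. <.*?> is non-greedy: .*? stops at the first > it can, so each tag is matched separately.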
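The same difference can be seen with re.search, which returns only the first match. The sketch also includes a negated character class, a commonly used alternative to the non-greedy form; it is offered as an illustration, not as a full HTML parser.

```python
import re

html = "<div><span>test</span></div>"

greedy = re.search(r"<.*>", html).group()   # .* takes as much as it can
lazy = re.search(r"<.*?>", html).group()    # .*? takes as little as it can

# [^>]* matches anything except ">", so each match stops at the first ">".
tags = re.findall(r"<[^>]*>", html)

print(greedy)  # <div><span>test</span></div>
print(lazy)    # <div>
print(tags)    # ['<div>', '<span>', '</span>', '</div>']
```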
