Regex
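A regex is a sequence of special characters used to match, search, and replace text.
A pattern is built from ordinary characters + special characters + quantifiers; the ordinary characters pin down the boundaries of the match.
Approach to modifying strings
Order of preference: string functions > regex > for loops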
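The preference order above can be illustrated with one small task, replacing every comma with a semicolon, done three ways. This is only a sketch; the sample string and the variable names are made up for illustration.

```python
import re

text = "a,b,c"  # hypothetical sample input

# 1. String method: simplest, a good fit when the target is a fixed substring
by_method = text.replace(",", ";")

# 2. Regex: same result here; worth the extra cost only when the target is a pattern
by_regex = re.sub(r",", ";", text)

# 3. Manual loop: the most code for the same result
chars = []
for ch in text:
    chars.append(";" if ch == "," else ch)
by_loop = "".join(chars)

print(by_method, by_regex, by_loop)  # a;b;c a;b;c a;b;c
```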
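Metacharacters: each matches a single character
# The uppercase form of a metacharacter is generally the negation of the lowercase form
1. Digits 0-9: \d; negation: \D
import re

example_str = "Beautiful is better than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
print(re.findall(r"\d", example_str))  # every digit, one per list item
print(re.findall(r"\D", example_str))  # every character that is not a digit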
2. Letters, digits, and underscore: \w; negation: \W
import re

example_str = "Beautiful is better_ than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
print(re.findall(r'\w', example_str))  # word characters
print(re.findall(r'\W', example_str))  # everything else (spaces, $, ^, \r, \n)
3. Whitespace characters (space, \t, \r, \n): \s; negation: \S
import re

example_str = "Beautiful is better_ than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
print(re.findall(r'\s', example_str))  # whitespace characters
print(re.findall(r'\S', example_str))  # non-whitespace characters
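4. Any one character from a set: [], with ranges such as 0-9, a-z, A-Z; negation: [^]
import re

example_str = "Beautiful is better_ than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
print(re.findall(r'[0-9]', example_str))   # any character in the set 0-9
print(re.findall(r'[^0-9]', example_str))  # any character not in the set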
5. Any single character except \n: .
import re

example_str = "Beautiful is better_ than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
print(re.findall(r".", example_str))  # every character except the newline
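As a small illustration of the "except \n" rule for the dot, the sketch below runs the same pattern with and without the `re.DOTALL` flag, which makes `.` match newlines as well. The sample string is made up for illustration.

```python
import re

text = "ab\ncd"  # hypothetical sample containing one newline

# By default "." skips the newline
default_dot = re.findall(r".", text)
print(default_dot)  # ['a', 'b', 'c', 'd']

# With re.DOTALL, "." also matches "\n"
dotall_dot = re.findall(r".", text, re.DOTALL)
print(dotall_dot)   # ['a', 'b', '\n', 'c', 'd']
```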
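Quantifiers: specify how many times the preceding character occurs
1. Greedy and non-greedy
a. Matching is greedy by default: the quantifier takes as many characters as it can and stops only when the pattern can no longer be satisfied (longest possible match).
b. Non-greedy matching: add ? after the quantifier to get the shortest possible match.
c. Mixing up greedy and non-greedy matching is a major source of bugs.
import re

example_str = "Beautiful is better_ than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
print(re.findall(r'.*u', example_str))   # greedy: the longest run that ends in "u"
print(re.findall(r'.*?u', example_str))  # non-greedy: the shortest runs that end in "u"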
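The difference between the two modes is easiest to see on text with repeated delimiters. The following is a minimal sketch on a made-up tag string; it is an illustration, not a recommended way to parse HTML.

```python
import re

html = "<b>one</b><b>two</b>"  # hypothetical sample

# Greedy: ".*" runs to the LAST "</b>", so both elements come back as one match
greedy = re.findall(r"<b>.*</b>", html)
print(greedy)  # ['<b>one</b><b>two</b>']

# Non-greedy: ".*?" stops at the FIRST "</b>" it can, giving one match per element
lazy = re.findall(r"<b>.*?</b>", html)
print(lazy)    # ['<b>one</b>', '<b>two</b>']
```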
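2. Repeat a specified number of times: {n} for exactly n, {n,m} for between n and m (written without a space after the comma)
import re

example_str = "Beautiful is better_ than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
print(re.findall(r'\d{3}', example_str))  # non-overlapping runs of exactly three digits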
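To show how {n} and {n,m} behave on digit runs of different lengths, here is a short sketch. The sample string is made up for illustration.

```python
import re

text = "7 78 789 78966828"  # hypothetical sample with runs of 1, 2, 3 and 8 digits

# {3}: exactly three digits per match; leftover digits that cannot form a group are dropped
exactly_three = re.findall(r"\d{3}", text)
print(exactly_three)  # ['789', '789', '668']

# {2,4}: between two and four digits, taking as many as possible (greedy)
two_to_four = re.findall(r"\d{2,4}", text)
print(two_to_four)    # ['78', '789', '7896', '6828']
```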
3. Zero or more times: *
import re

example_str = "Beautiful is better_ than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
# Because * allows zero repetitions, the result also contains empty strings
print(re.findall(r'.*', example_str))
4. One or more times: +
import re

example_str = "Beautiful is better_ than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
print(re.findall(r'\d+', example_str))  # each run of consecutive digits as one match
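5. Zero or one time: ?  Typical use: collapsing variants that differ by one optional (possibly duplicated) character
import re

example_str = "Beautiful is better_ than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
# ? applies only to the 6: the pattern matches "789" or "7896"
print(re.findall(r'7896?', example_str))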
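A short sketch of the optional quantifier on a made-up spelling example: one pattern accepts both the form with the optional character and the form without it, but not a form where the character appears twice.

```python
import re

text = "color colour colouur"  # hypothetical sample

# "u?" means the "u" may appear zero times or once, but not twice
spellings = re.findall(r"colou?r", text)
print(spellings)  # ['color', 'colour']
```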
Boundary matching
1. Match at the start of the string: ^
2. Match at the end of the string: $
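The two anchors can be sketched as follows. The sample strings are shortened versions of the one used in these notes; the 8-digit check is an illustrative assumption about how the anchors might be combined.

```python
import re

text = "Beautiful is better than ugly"

starts = re.findall(r"^Beautiful", text)  # matches only at the very start
not_at_start = re.findall(r"^ugly", text) # "ugly" exists, but not at the start
ends = re.findall(r"ugly$", text)         # matches only at the very end
print(starts, not_at_start, ends)  # ['Beautiful'] [] ['ugly']

# Using both anchors forces the whole string to fit the pattern
exact = bool(re.search(r"^\d{8}$", "78966828"))
too_long = bool(re.search(r"^\d{8}$", "789668280"))
print(exact, too_long)  # True False
```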
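Regex alternation (OR): |
Matches the regex on either the left or the right of |
import re

example_str = "Beautiful is better_ than ugly 78966828 $ \r \r\n ^Explicit is better than implicit"
# Alternatives are tried left to right at each position
print(re.findall(r'\d+|\w+', example_str))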
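Groups
(): the regex inside the parentheses is treated as a single unit, and the text matched inside the () is what gets returned. A pattern may contain several groups; they combine in an AND relationship, so every group must match.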
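How groups change what findall() returns can be sketched like this. The sample string is the one used for findall() in these notes; the patterns are illustrative.

```python
import re

text = "Hello Python 3.7, 123456789"

# No group: findall returns the whole match
whole = re.findall(r"Python \d\.\d", text)
print(whole)     # ['Python 3.7']

# One group: findall returns only the text matched inside the ()
one_group = re.findall(r"Python (\d\.\d)", text)
print(one_group) # ['3.7']

# Several groups: findall returns one tuple per match, one item per group
many_groups = re.findall(r"(\w+) (\d)\.(\d)", text)
print(many_groups)  # [('Python', '3', '7')]
```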
Python's regex module: re
1. Find every substring that matches the regex: findall()
import re

name = "Hello Python 3.7, 123456789"
total = re.findall(r"\d+", name)  # list of all digit runs
print(total)
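2. Replace the substrings matched by the regex: sub()
import re

name = "Hello Python 3.7, 123456789"  # same sample string as in the findall() example

def replace(value):
    # value is a match object; return the matched digit plus one, as a string
    return str(int(value.group()) + 1)

# count=0 (the default) replaces every match
result_str = re.sub(r"\d", replace, name, count=0)
print(result_str)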
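The effect of the count argument of sub() can be sketched with a plain replacement string instead of a function. The "#" placeholder is an arbitrary choice for illustration.

```python
import re

name = "Hello Python 3.7, 123456789"

# Default count (0): every digit is replaced
all_replaced = re.sub(r"\d", "#", name)
print(all_replaced)    # Hello Python #.#, #########

# count=1: only the first match is replaced
first_replaced = re.sub(r"\d", "#", name, count=1)
print(first_replaced)  # Hello Python #.7, 123456789
```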
Match a single Chinese character: [\u4E00-\u9FA5]
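A minimal sketch of the character range above in use, assuming a made-up mixed-language sample. The range covers the commonly used CJK Unified Ideographs; rarer characters in the extension blocks fall outside it.

```python
import re

text = "Python 正則 regex 表達式 123"  # hypothetical mixed sample

# One Chinese character per match
single_chars = re.findall(r"[\u4E00-\u9FA5]", text)
print(single_chars)  # ['正', '則', '表', '達', '式']

# Adding + groups consecutive Chinese characters into runs
runs = re.findall(r"[\u4E00-\u9FA5]+", text)
print(runs)          # ['正則', '表達式']
```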