#coding=utf-8
import re
import chardet  # module for detecting the encoding of web pages

p = re.compile(r'\d+')
print p.findall('ONE1TWO2THREE3FOUR4')

a = "REWFD231321EWQ21WEQEQW"
# \D+ skips the non-digits between the two captured runs of digits
p = re.compile(r"(\d+)\D+(\d+)", re.S)
b = p.findall(a)
print b

a = u"我爱@糗百,你呢"  # "I love @Qiubai, and you?"
print a
b = re.findall(u"(.+?)@糗百(.+)", a, re.S)
print b
for i in b:
    for j in i:
        print j
Results:
['1', '2', '3', '4']
[('231321', '21')]  # findall returns a list of tuples, [(), ()], when the pattern has several groups; with only one group it returns a list of strings, ["", ""]
我爱@糗百,你呢
[(u'\u6211\u7231', u'\uff0c\u4f60\u5462')]
我爱
,你呢
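The shape of the findall result depends on how many groups the pattern has. The following is a minimal sketch of that rule, written for Python 3 (an assumption; the code in this post is Python 2, where the same rule applies). The variable names are only illustrative.

```python
import re

text = "REWFD231321EWQ21WEQEQW"

# No group: a list of the whole matches.
no_group = re.findall(r"\d+", text)

# One group: a list of strings, one per match (the group's content).
one_group = re.findall(r"(\d+)\D", text)

# Two groups: a list of tuples, one tuple per match.
two_groups = re.findall(r"(\d+)\D+(\d+)", text)

# The same rule applies to Chinese text once it is a Unicode string.
chinese = re.findall(r"(.+?)@糗百(.+)", "我爱@糗百,你呢", re.S)

print(no_group)
print(one_group)
print(two_groups)
print(chinese)
```

In Python 3 every str is already Unicode, so the u"" prefix and the unicode-escaped output seen above are no longer needed.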
——————————————————————————————————————————
If you do not know the encoding of the Chinese text, for example because it was crawled from the Internet (where you usually do not know it), detect the encoding first:
#coding=utf-8
import re
import chardet  # module for detecting the encoding of web pages

a = "我爱@糗百,你呢"  # a byte string; pretend its encoding is unknown
if not isinstance(a, unicode):
    # guess the encoding from the bytes, then decode to unicode
    codesty = chardet.detect(a)
    a = a.decode(codesty['encoding'])
print a
b = re.findall(u"(.+?)@糗百(.+)", a, re.S)
print b
for i in b:
    for j in i:
        print j
The chardet module detects the encoding of the byte string, which is then decoded to Unicode.
Results:
我爱@糗百,你呢
[(u'\u6211\u7231', u'\uff0c\u4f60\u5462')]
我爱
,you呢
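The "decode to Unicode before matching" step can also be sketched without chardet. The helper below is a stand-in, not chardet itself: instead of statistical detection it simply tries a short list of candidate encodings. It is written for Python 3, and both the name to_text and the candidate list ("utf-8", "gbk") are assumptions made for illustration.

```python
import re


def to_text(data, candidates=("utf-8", "gbk")):
    """Return data as a Unicode string (hypothetical helper, not part of chardet)."""
    if isinstance(data, str):
        return data  # already Unicode, nothing to do
    for encoding in candidates:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    raise ValueError("none of the candidate encodings fit the data")


# Simulate crawled bytes whose encoding is "unknown" to the caller.
crawled = "我爱@糗百,你呢".encode("gbk")

text = to_text(crawled)
parts = re.findall(r"(.+?)@糗百(.+)", text, re.S)
print(parts)
```

Trying candidates in order only works when a wrong guess fails loudly, as UTF-8 does for GBK bytes here; for arbitrary crawled pages a real detector such as chardet is the safer choice.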
Of course, if you run the demo on Windows by double-clicking the .py file, encode each string before printing it, i.e. print j.encode("GBK").
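As a rough illustration of what encode("GBK") does to a matched string, here is a small Python 3 sketch. The helper name encode_for_console and the errors="replace" policy are assumptions for illustration; in Python 3, print normally encodes for the console by itself, so this only shows the conversion.

```python
def encode_for_console(text, encoding="gbk"):
    """Encode text for a console; unencodable characters become '?' (hypothetical helper)."""
    return text.encode(encoding, errors="replace")


gbk_bytes = encode_for_console("我爱")

# Decoding with the same encoding gives the original text back.
round_trip = gbk_bytes.decode("gbk")

# A console that only understands ASCII cannot show the characters at all.
ascii_bytes = encode_for_console("我爱", "ascii")

print(gbk_bytes, round_trip, ascii_bytes)
```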
Note: convert Chinese text to Unicode before processing it; do not match against the raw byte string directly. How do you convert a byte string to Unicode? Decode it with its encoding, e.g. a.decode(encoding).
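Why matching on the raw bytes goes wrong can be seen in a short Python 3 sketch (Python 3 is an assumption; the post uses Python 2, where byte strings and Unicode strings mix more silently). On bytes the regex counts bytes, not characters, and Python 3 refuses to mix a text pattern with bytes.

```python
import re

raw = "我爱@糗百,你呢".encode("utf-8")  # bytes, as they might come from a crawler

# Matching on bytes: "." means one byte, so the group is 6 bytes long.
byte_match = re.findall(rb"(.+?)@", raw, re.S)

# Matching on decoded text: "." means one character, so the group is 2 characters.
text_match = re.findall(r"(.+?)@", raw.decode("utf-8"), re.S)

# Mixing a text pattern with bytes is rejected outright.
try:
    re.findall(r"糗百", raw)
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(len(byte_match[0]), len(text_match[0]), mixed_ok)
```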
Matching Chinese text with Python's re module