1. Given a file in which words are separated by spaces, semicolons, commas, or periods, extract all the words.
Solution:
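Use \w to match and extract words, but this misjudges some of them (the apostrophe in "I'm" is not a word character, so the word is cut in two)
Use str.split to separate the string, but it takes only one separator per call and several separators are needed here
Split the string with re.split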
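To make the drawbacks of the first two approaches concrete, here is a small sketch on a sample sentence. The variable names are only illustrative, and the code is written for Python 3.

```python
import re

text = "I'm xj, i love python,,linux; i don't like windows."

# \w+ does not match the apostrophe, so "I'm" and "don't" are broken apart.
by_word_chars = re.findall(r"\w+", text)

# str.split handles one separator per call, so punctuation stays attached.
by_space = text.split(" ")

print(by_word_chars)
print(by_space)
```

Neither result is a clean word list, which is why a pattern covering all the separators at once is the more convenient tool.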
In [4]: help(re.split)
Help on function split in module re:

split(pattern, string, maxsplit=0, flags=0)
    Split the source string by the occurrences of the pattern,
    returning a list containing the resulting substrings.
In [23]: text = "I'm xj, i love python,,linux; i don't like windows."

In [24]: fs = re.split(r"(,|\.|;|\s)+\s*", text)

In [25]: fs
Out[25]: ["I'm", ' ', 'xj', ' ', 'i', ' ', 'love', ' ', 'python', ',', 'linux', ' ', 'i', ' ', "don't", ' ', 'like', ' ', 'windows', '.', '']

In [26]: fs[::2]    # extract the words
Out[26]: ["I'm", 'xj', 'i', 'love', 'python', 'linux', 'i', "don't", 'like', 'windows', '']

In [27]: fs[1::2]   # extract the separators
Out[27]: [' ', ' ', ' ', ' ', ',', ' ', ' ', ' ', ' ', '.']

In [53]: fs = re.findall(r"[^,\.;\s]+", text)

In [54]: fs
Out[54]: ["I'm", 'xj', 'i', 'love', 'python', 'linux', 'i', "don't", 'like', 'windows']

In [55]: fh = re.findall(r'[,\.;\s]', text)

In [56]: fh
Out[56]: [' ', ',', ' ', ' ', ' ', ',', ',', ';', ' ', ' ', ' ', ' ', '.']
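Since the task starts from a file, the same idea can be wrapped in a helper. This is a minimal sketch, not the author's code: `extract_words` and `extract_words_from_file` are hypothetical names, and the pattern uses a non-capturing character class so that the separators are not returned.

```python
import re

# Separators named in the task: whitespace, semicolons, commas, periods.
SEPARATORS = re.compile(r"[,.;\s]+")


def extract_words(text):
    """Split text on runs of separators and drop empty strings."""
    return [word for word in SEPARATORS.split(text) if word]


def extract_words_from_file(path):
    """Read a whole text file and return its words (hypothetical helper)."""
    with open(path) as handle:
        return extract_words(handle.read())


words = extract_words("I'm xj, i love python,,linux; i don't like windows.")
print(words)
```

Filtering out empty strings takes care of the trailing '' that re.split produces when the text ends with a separator.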
2. Given a directory containing a number of files, find all the C source files (.c and .h).
Solution:
Use os.listdir to list the directory
Use str.endswith to test the file extension
In [13]: s = "xj.c"

In [14]: s.endswith(".c")
Out[14]: True

In [15]: s.endswith(".h")
Out[15]: False

In [16]: import os

In [17]: os.listdir("/usr/include/")
Out[17]:
['libmng.h',
 'netipx',
 'ft2build.h',
 'FlexLexer.h',
 'selinux',
 'QtSql',
 'resolv.h',
 'gio-unix-2.0',
 'wctype.h',
 'python2.6',
 'scsi',
 ...
 'QtOpenGL',
 'mysql',
 'byteswap.h',
 'xj.c',
 'mntent.h',
 'semaphore.h',
 'stdio_ext.h',
 'libxml2']

In [21]: for filename in os.listdir("/usr/include"):
   ....:     if filename.endswith(".c"):
   ....:         print(filename)
   ....:
xj.c

In [22]: for filename in os.listdir("/usr/include"):
   ....:     if filename.endswith((".c", ".h")):    # the suffixes in the tuple are OR-ed together
   ....:         print(filename)
   ....:
libmng.h
ft2build.h
FlexLexer.h
nss.h
png.h
utime.h
ieee754.h
features.h
xj.c
...
verto-module.h
semaphore.h
stdio_ext.h

In [23]:
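The listdir plus endswith approach can be collected into one function. The sketch below is only an illustration: `find_c_sources` is a hypothetical name, the extra os.path.isfile check (to skip directories whose names happen to end in .h) is an addition not present in the session, and the demo runs on a temporary directory instead of /usr/include.

```python
import os
import shutil
import tempfile


def find_c_sources(directory, extensions=(".c", ".h")):
    """Return sorted names of regular files in directory with a matching suffix."""
    return sorted(
        name
        for name in os.listdir(directory)
        # str.endswith accepts a tuple and returns True if any suffix matches.
        if name.endswith(extensions)
        and os.path.isfile(os.path.join(directory, name))
    )


# Demo on a throwaway directory so the sketch does not depend on /usr/include.
demo_dir = tempfile.mkdtemp()
try:
    for name in ("xj.c", "stdio_ext.h", "notes.txt"):
        open(os.path.join(demo_dir, name), "w").close()
    os.mkdir(os.path.join(demo_dir, "fake.h"))  # a directory, not a header file
    found = find_c_sources(demo_dir)
finally:
    shutil.rmtree(demo_dir)

print(found)
```

Note that endswith needs a tuple here; passing a list of suffixes raises a TypeError.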
3. The fnmatch module
Supports shell-style wildcard characters
In [24]: help(fnmatch.fnmatch)    # case sensitivity follows the operating system
Help on function fnmatch in module fnmatch:

fnmatch(name, pat)
    Test whether FILENAME matches PATTERN.

    Patterns are Unix shell style:

    *       matches everything
    ?       matches any single character
    [seq]   matches any character in seq
    [!seq]  matches any char not in seq

    An initial period in FILENAME is not special.
    Both FILENAME and PATTERN are first case-normalized
    if the operating system requires it.
    If you don't want this, use fnmatchcase(FILENAME, PATTERN).

In [47]: fnmatch.fnmatch("sba.txt", "*txt")
Out[47]: True

In [48]: fnmatch.fnmatch("sba.txt", "*t")
Out[48]: True

In [49]: fnmatch.fnmatch("sba.txt", "*b")
Out[49]: False

In [50]: fnmatch.fnmatch("sba.txt", "*b*")
Out[50]: True
Case: you have a program that handles files, the file names are entered by the user, and you need to support the same wildcard characters as the shell.
# cat test1.py
#!/usr/local/bin/python2.7
#coding:utf-8
import os
import sys
from fnmatch import fnmatch

ret = [name for name in os.listdir(sys.argv[1]) if fnmatch(name, sys.argv[2])]
print(ret)

# python2.7 test1.py /usr/include/ *.c
['xj.c']
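If the matching should behave the same on every operating system, fnmatch.fnmatchcase can be used in place of fnmatch.fnmatch, since it never normalizes case. The sketch below works on an in-memory list of names (made up for illustration) rather than a real directory.

```python
import fnmatch

# Illustrative file names; not taken from a real directory listing.
names = ["xj.c", "XJ.C", "stdio_ext.h", "sba.txt"]

# fnmatchcase compares case-sensitively regardless of the platform.
c_files = [name for name in names if fnmatch.fnmatchcase(name, "*.c")]

# [ch] is a character set, so this pattern accepts both .c and .h.
c_or_h = [name for name in names if fnmatch.fnmatchcase(name, "*.[ch]")]

print(c_files)
print(c_or_h)
```

When a pattern is passed on a command line, it usually needs to be quoted (for example '*.c'); otherwise the shell may expand it against the current directory before the program sees it.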
4. re.sub() text substitution
In [53]: help(re.sub)
Help on function sub in module re:

sub(pattern, repl, string, count=0, flags=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a string, backslash escapes in it are processed.  If it is
    a callable, it's passed the match object and must return
    a replacement string to be used.
Case: a text contains dates in %m/%d/%Y format, and you need to convert them all to %Y-%m-%d format.
In [55]: text = "Today is 11/08/2016, next class time 11/15/2016"

In [56]: new_text = re.sub(r'(\d+)/(\d+)/(\d+)', r'\3-\1-\2', text)

In [57]: new_text
Out[57]: 'Today is 2016-11-08, next class time 2016-11-15'
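The help text for re.sub notes that repl may also be a callable. A minimal sketch of that variant follows, assuming the first field is the month and the second the day, as the %m/%d/%Y format in the case implies. The names `DATE` and `to_iso` are hypothetical, and zero-padding single-digit fields is an addition for illustration.

```python
import re

# Named groups make the field order explicit: month/day/year.
DATE = re.compile(r"\b(?P<month>\d{1,2})/(?P<day>\d{1,2})/(?P<year>\d{4})\b")


def to_iso(match):
    """Build YYYY-MM-DD from a match object, zero-padding month and day."""
    return "{}-{:02d}-{:02d}".format(
        match.group("year"),
        int(match.group("month")),
        int(match.group("day")),
    )


text = "Today is 11/08/2016, next class time 11/15/2016"
new_text = DATE.sub(to_iso, text)
print(new_text)
```

A callable is more than this task strictly needs, but it leaves room for per-match logic such as padding or validation that a plain replacement string cannot express.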
5. str.format string formatting
Case: You need to create a small template engine that does not require logical control, but needs to use variables to populate the template
In [ ]: help(str.format)
Help on method_descriptor:

format(...)
    S.format(*args, **kwargs) -> string

    Return a formatted version of S, using substitutions from args and kwargs.
    The substitutions are identified by braces ('{' and '}').
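A template engine of the kind described in the case can be sketched in a few lines on top of str.format. The `render` function and the template text are hypothetical; the only library behaviour relied on is that named fields in braces are filled from keyword arguments and that doubled braces produce literal braces.

```python
template = "Hello {name}, your next {course} class is on {date}."


def render(template, **variables):
    """Fill the named {placeholders} in template from keyword arguments."""
    return template.format(**variables)


message = render(template, name="xj", course="python", date="2016-11-15")
print(message)

# Doubled braces are emitted as literal braces instead of being substituted.
literal = render("{{literal}} {x}", x=1)
print(literal)
```

If the template names a variable that is not supplied, str.format raises a KeyError, so a caller may want to catch that and report the missing name.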
This article is from the "Xiexiaojun" blog; please keep this source link when reposting: http://xiexiaojun.blog.51cto.com/2305291/1870832
"Python advanced" 02, Text Processing and IO in-depth understanding