Text in Python (1)
This article records and summarizes what I learned from the chapter on text in the Python standard library.
Working with text is one of the most common tasks in Python. It is usually done with strings, through the built-in string type and the string module, which are among the most basic tools in the standard library.
1.1 Functions
The string module provides the functions capwords() and maketrans().
capwords() capitalizes the first letter of every word in a string.
maketrans() creates a translation table. Passing that table to the translate() method maps one set of characters to another, which is more efficient than calling replace() repeatedly. (string.maketrans() is the Python 2 spelling; in Python 3 the equivalent is the static method str.maketrans().)
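The two functions above can be sketched as follows. This is a minimal illustration in Python 3 syntax, so it uses str.maketrans() rather than string.maketrans(); the sample strings are made up for the example.

```python
import string

text = "the quick brown fox"

# capwords() splits on whitespace, capitalizes each word,
# and joins the words back together with single spaces.
capitalized = string.capwords(text)
print(capitalized)

# Build a translation table mapping a->1, b->2, c->3 and apply it in one pass.
table = str.maketrans("abc", "123")
translated = "a cab back".translate(table)
print(translated)

# The same substitution done with repeated replace() calls:
# one full pass over the string per character.
replaced = "a cab back"
for old, new in zip("abc", "123"):
    replaced = replaced.replace(old, new)
print(replaced == translated)
```

Chained replace() only gives the same answer here because none of the replacement characters are themselves replaced later; translate() does not have that pitfall.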
The string module also has a class called Template, which offers another way to substitute values into a string.
For advanced use, the default syntax of string.Template can be changed. To do that, adjust the regular expression it uses to find variable names in the template body, for example through the delimiter and idpattern class attributes.
##############################################################
# test about maketrans()
# Python 2 syntax: print statement and string.maketrans()
import string

s = 'ask dad for a flag'  # sample input, not defined in the original notes
leet = string.maketrans('asdfghjk', '12345678')
print s.translate(leet)
print s

##############################################################
# test about Template()
values = {'var': 'foo'}

t = string.Template("""
Variable        : $var
Escape          : $$
Variable in text: ${var}iable
""")
print 'TEMPLATE:', t.substitute(values)

# The same output using the % interpolation operator
s = """
Variable        : %(var)s
Escape          : %%
Variable in text: %(var)siable
"""
print 'INTERPOLATION:', s % values
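Changing the Template syntax might look like the sketch below (Python 3 syntax). The subclass name PercentTemplate, the "%" delimiter and the variable names are hypothetical choices for illustration; delimiter and idpattern are the class attributes that string.Template reads when it builds its search pattern. The example uses safe_substitute() so that names not matching idpattern are left untouched instead of raising an error.

```python
import string


class PercentTemplate(string.Template):
    """Hypothetical Template variant: '%' delimiter, names must contain '_'."""
    delimiter = '%'
    idpattern = '[a-z]+_[a-z]+'


t = PercentTemplate('%with_underscore and %notunderscored')

# Only %with_underscore matches idpattern; the other name is left as-is.
result = t.safe_substitute({
    'with_underscore': 'replaced',
    'notunderscored': 'not replaced',
})
print(result)
```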
1.2 textwrap -- format text paragraphs
Purpose: format text by adjusting where the line breaks occur in a paragraph.
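As a minimal sketch of what that means in practice (Python 3 syntax, with made-up sample text and an arbitrary width of 30), textwrap.dedent() removes the common leading indentation and textwrap.fill() re-wraps the paragraph to the given width:

```python
import textwrap

sample = """
    The textwrap module formats text by adjusting where
    the line breaks occur in a paragraph.
    """

# Strip the common indentation, then re-wrap to at most 30 columns.
dedented = textwrap.dedent(sample).strip()
wrapped = textwrap.fill(dedented, width=30)
print(wrapped)
```

The words stay the same; only the positions of the line breaks change.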
1.3 re -- regular expressions
Purpose: search for and modify text using formal patterns.
Regular expressions are text-matching patterns described with a formal syntax.
1.3.1 Searching for patterns in text
# Python 2 syntax
import re

print '-' * 30
# about regular expression search()
# The pattern must occur in the text; otherwise search() returns None
# and match.start() below would fail.
pattern = 'this'
text = 'Does this text match the pattern?'

match = re.search(pattern, text)
s = match.start()
e = match.end()

print 'Found "%s"\nin "%s"\nfrom %d to %d ("%s")' % \
    (match.re.pattern, match.string, s, e, text[s:e])
# The start() and end() methods give the indexes in the string
# where the matched text begins and ends.
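One detail worth spelling out: re.search() returns None when the pattern does not occur in the text, so calling start() or end() on the result without checking raises an AttributeError. A small sketch in Python 3 syntax, where describe() is a hypothetical helper written only for this example:

```python
import re

text = 'Does this text match the pattern?'


def describe(pattern, text):
    """Return a short description of the first match, or a no-match message."""
    match = re.search(pattern, text)
    if match is None:
        return 'no match for %r' % pattern
    return '%r found at %d:%d' % (match.group(0), match.start(), match.end())


print(describe('this', text))
print(describe('eas', text))
```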
1.3.2 Compiling expressions
The re module includes module-level functions for working with regular expressions given as text strings. For expressions that a program uses frequently, it is more efficient to compile them. The compile() function converts an expression string into a compiled pattern object (called RegexObject in the Python 2 documentation).
# Python 2 syntax
import re

print '-' * 30
# about compile()
regexes = [re.compile(p) for p in ['this', 'that']]
text = 'Does this text match the pattern?'

print 'Text: %r\n' % text

for regex in regexes:
    print 'seeking "%s" ->' % regex.pattern
    if regex.search(text):
        print 'match!'
    else:
        print 'no match!'
The module-level functions maintain a cache of compiled expressions, but the size of the cache is limited, and using compiled expressions directly avoids the cache lookup overhead. Another advantage of compiled expressions is that the compilation work can be done up front, for example when the module is loaded, instead of at the moment an expression is first used while the program is running.
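One way to do the compilation up front is to build the patterns once at module level and reuse them everywhere. The sketch below (Python 3 syntax) assumes a hypothetical PATTERNS table and find_all() helper; the names and patterns are illustrative only.

```python
import re

# Compiled once, when the module is imported.
PATTERNS = {
    'word_with_is': re.compile(r'\b\w*is\w*\b'),
    'digits': re.compile(r'\d+'),
}


def find_all(name, text):
    """Look up a precompiled pattern by name and return all of its matches."""
    return PATTERNS[name].findall(text)


words = find_all('word_with_is', 'This is some text')
numbers = find_all('digits', 'room 12, floor 3')
print(words, numbers)
```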
1.3.3 Multiple matches
search() looks for a single occurrence of the pattern in a text string. The findall() function returns all substrings of the input that match the pattern without overlapping.
# Python 2 syntax
import re

print '-' * 30
# about findall()
text = 'bbbbbababbabbababa'
pattern = 'ba'

for match in re.findall(pattern, text):
    print match

print '-' * 30
# about finditer()
# finditer() returns an iterator that produces match instances,
# unlike findall(), which returns the strings directly.
text = 'aaaadaaadadadadadadada'
pattern = 'da'

for match in re.finditer(pattern, text):
    s = match.start()
    e = match.end()
    print 'found "%s" at %d:%d' % (text[s:e], s, e)
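To make "without overlapping" concrete, here is a small sketch in Python 3 syntax with a made-up input. The pattern 'aa' could start at index 0, 1 or 2 of 'aaaa', but each new search resumes where the previous match ended, so only two matches are reported.

```python
import re

text = 'aaaa'
pattern = 'aa'

# findall() returns the matched strings.
matches = re.findall(pattern, text)

# finditer() yields match objects, which also carry the positions.
starts = [m.start() for m in re.finditer(pattern, text)]

print(matches, starts)
```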
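1.3.4 Pattern syntax
Regular expressions support much more powerful patterns than simple literal text strings. A pattern can repeat, can be anchored to a particular position in the input, and can describe sets of characters in a compact form.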
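A few of the most common pieces of pattern syntax can be tried out as follows. This is only a small sample in Python 3 syntax; the input string and the selection of patterns are chosen for illustration.

```python
import re

text = 'abbaaabbbbaaaaa'

examples = [
    ('ab+', 'a followed by one or more b'),
    ('ab{3}', 'a followed by exactly three b'),
    ('^ab', 'ab anchored at the start of the input'),
    ('a+$', 'one or more a anchored at the end of the input'),
    ('[^a]+', 'runs of characters other than a'),
]

results = {}
for pattern, description in examples:
    results[pattern] = re.findall(pattern, text)
    print('%-6s %-45s %s' % (pattern, description, results[pattern]))
```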
1.3.5 Restricting the search
If you know in advance that only a subset of the full input needs to be searched, you can constrain the match further by telling re to limit the search range. For example, the search() method of a compiled pattern accepts optional start and end positions.
# Python 2 syntax
import re

print '-' * 30
# An inefficient reimplementation of what finditer() does.
text = 'This is some text -- with punctuation.'
pattern = re.compile(r'\b\w*is\w*\b')

print 'Text:', text

pos = 0
while True:
    # Start each search where the previous match ended
    match = pattern.search(text, pos)
    print match
    if not match:
        break
    s = match.start()
    e = match.end()
    print s, e
    print '%d : %d = "%s"' % (s, e - 1, text[s:e])
    # Move forward in the text for the next search
    pos = e
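The same idea in Python 3 syntax, as a short sketch: the second and third arguments of a compiled pattern's search() method are the start position (pos) and the end position (endpos). The text and pattern are the ones used above; the variable names are only for illustration.

```python
import re

text = 'This is some text -- with punctuation.'
pattern = re.compile(r'\b\w*is\w*\b')

# No limits: the first match in the whole string.
first = pattern.search(text)

# Start searching where the first match ended.
second = pattern.search(text, first.end())

# Only text[0:3] ('Thi') is visible to the search, so nothing matches.
limited = pattern.search(text, 0, 3)

# finditer() walks the string in the same way without manual bookkeeping.
spans = [m.span() for m in pattern.finditer(text)]

print(first.span(), second.span(), limited, spans)
```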