Python for Informatics, Chapter 1: Regular Expressions (1), pythoninformatics
So far, we have been reading through files, looking for patterns and extracting the parts of each line that interest us. We have been using string methods such as split and find, together with list and string slicing, to pull out portions of a line.
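As a quick reminder of that approach, here is a minimal sketch. The sample line and the variable names are made up for illustration; the line only imitates the style of a mail header and is not read from any file.

```python
# Hypothetical sample line (an assumption, not taken from a real file).
line = 'From alice@example.com Sat Jan  5 09:14:16 2008'

# split() breaks the line into whitespace-separated words.
words = line.split()
email = words[1]

# find() locates the '@' sign; slicing keeps everything after it.
at_pos = email.find('@')
host = email[at_pos + 1:]

print(email)
print(host)
```

Even for this small task we need three separate steps (split, find, slice), which is the kind of work regular expressions can shorten.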
This task of searching and extracting is so common that Python has a very powerful library, called regular expressions, that handles many of these tasks quite elegantly. We did not introduce regular expressions earlier because, although they are very powerful, they are a little complicated and their syntax takes some getting used to.
Regular expressions are almost a small programming language of their own for searching and parsing strings. In fact, entire books have been written about regular expressions. In this chapter, we will only cover the basics. For more information about regular expressions, see:
http://en.wikipedia.org/wiki/Regular_expression
http://docs.python.org/library/re.html
Before you can use regular expressions, you must import the re library into your program. The simplest use of the library is the search() function. The following program demonstrates a trivial use of search():
import re

hand = open('mbox-short.txt')
for line in hand:
    # Strip the trailing newline before matching and printing
    line = line.rstrip()
    # re.search() succeeds if 'From:' appears anywhere in the line
    if re.search('From:', line):
        print(line)
We open the file, loop through each line, and use the regular expression function search() to print only the lines that contain the string "From:". This program does not use the real power of regular expressions, because we could just as easily have used line.find() to get the same result.
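To make that equivalence concrete, here is a small sketch that compares the two approaches. It assumes a few made-up sample lines in place of mbox-short.txt, so the list contents and variable names are hypothetical.

```python
import re

# Hypothetical sample lines standing in for the contents of the file.
lines = [
    'From: alice@example.com',
    'Subject: hello',
    'X-Note: mail From: bob@example.com',
]

# Lines selected with the regular expression search.
with_re = [line for line in lines if re.search('From:', line)]

# Lines selected with the string method; find() returns -1 when
# the substring is not present.
with_find = [line for line in lines if line.find('From:') != -1]

# Both approaches select the same lines for a plain literal pattern.
print(with_re == with_find)
```

Note that both versions also select the third line, because "From:" appears in the middle of it.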
The power of regular expressions shows when we add special characters to the search string, which let us control more precisely which lines match. Adding these special characters to a regular expression allows us to do sophisticated matching and extraction with very little code.
For example, the caret character ^ is used in regular expressions to match the beginning of a line. We only need to put a caret in front of the search string to change the program so that it prints only the lines that start with "From:". The code is as follows:
import re

hand = open('mbox-short.txt')
for line in hand:
    # Strip the trailing newline before matching and printing
    line = line.rstrip()
    # The caret ^ anchors the match to the beginning of the line
    if re.search('^From:', line):
        print(line)
Now we only match the lines that start with "From:". This is still a very simple example, and we could have done the same thing with the startswith() method of the string library. But it introduces the idea that regular expressions contain special characters that give us more control over what matches.
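The comparison with startswith() can be sketched as follows. The sample lines are made up for illustration and are not taken from mbox-short.txt.

```python
import re

# Hypothetical sample lines; only the first one begins with 'From:'.
lines = [
    'From: alice@example.com',
    'X-Note: mail From: bob@example.com',
]

# The caret anchors the pattern to the beginning of each string.
with_caret = [line for line in lines if re.search('^From:', line)]

# The string method performs the same check without a regular expression.
with_startswith = [line for line in lines if line.startswith('From:')]

print(with_caret)
print(with_caret == with_startswith)
```

Here the line that contains "From:" in the middle is no longer selected, which is exactly the extra control the caret provides.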
Note: the original code was written for Python 2.7. Because I use Python 3.4, I changed print line to print(line). I will stop here for today and continue tomorrow.