Python for Informatics Chapter 11: Regular Expressions (1)
A regular expression (often abbreviated as regex, regexp, or RE in code) is a concept from computer science. A regular expression is a single string that describes and matches a whole set of strings following a given syntax rule. Many text editors use regular expressions to search for, and replace, text that matches a pattern.
Note: This article is based on Python for Informatics by Dr. Charles Severance.
So far, we have been reading through files, looking for patterns, and extracting the bits of each line that interest us. We have done this with string methods such as split and find, and with list and string slicing to pull out portions of a line.
Searching and extracting are so common that Python has a very powerful library for them, called regular expressions (the re module), which handles many of these tasks elegantly. We did not introduce regular expressions earlier because, although they are powerful, they are a little complicated and their syntax takes some getting used to.
Regular expressions are almost a small programming language of their own for searching and parsing strings. In fact, entire books have been written about them. In this chapter we cover only the basics. For more detail on regular expressions, see:
http://en.wikipedia.org/wiki/Regular_expression
http://docs.python.org/library/re.html
Before you can use regular expressions, you must import the re library into your program. The simplest use of the library is the search() function. The following program demonstrates a trivial use of search():
import re

hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()  # strip the trailing newline
    # print only lines that contain "From:" anywhere in the line
    if re.search('From:', line):
        print(line)
The program opens the mbox-short.txt file, loops through each line, and uses the regular expression function search() to print only the lines that contain the string "From:". This program does not use the real power of regular expressions, because we could just as easily have used line.find() to get the same result.
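To make that comparison concrete, here is a minimal sketch that applies both tests to a few sample lines held in a list. The sample lines are invented for illustration (the real program reads mbox-short.txt), and the helper names match_with_regex and match_with_find are hypothetical. For a plain search string with no special characters, the two approaches select the same lines.

```python
import re

# Hypothetical sample lines standing in for the contents of mbox-short.txt
sample_lines = [
    "From: stephen.marquard@uct.ac.za",
    "Subject: [sakai] svn commit: r39772",
    "X-Forwarded-From: someone",
    "Received: from murder (mail.umich.edu)",
]

def match_with_regex(lines):
    """Return the lines that contain 'From:' using re.search()."""
    return [line for line in lines if re.search('From:', line)]

def match_with_find(lines):
    """Return the lines that contain 'From:' using str.find()."""
    # find() returns -1 when the substring is absent
    return [line for line in lines if line.find('From:') != -1]

print(match_with_regex(sample_lines))
print(match_with_find(sample_lines))

# re.search() returns a match object when the pattern is found and None
# otherwise, which is why it can be used directly as an if condition
print(re.search('From:', 'Subject: hello'))  # None
```

Both lists contain the "From: ..." line and the "X-Forwarded-From: ..." line; the "Received: from" line is skipped because matching is case-sensitive.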
The power of regular expressions appears when we add special characters to the search string that let us control more precisely which lines match. These special characters let us perform sophisticated matching and extraction while writing very little code.
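As a small taste of what special characters make possible, the sketch below uses re.findall() with the pattern \S+@\S+ (one or more non-whitespace characters, an @ sign, then one or more non-whitespace characters) to pull e-mail-like substrings out of a line. The sample line is invented, and this pattern is a rough illustration, not a complete e-mail address matcher.

```python
import re

# Hypothetical sample line; the pattern below is a rough illustration only
line = "A message from csev@umich.edu to cwen@iupui.edu about office hours"

# \S matches any non-whitespace character, so \S+@\S+ picks out each
# whitespace-delimited chunk that contains an @ sign
addresses = re.findall(r'\S+@\S+', line)
print(addresses)  # ['csev@umich.edu', 'cwen@iupui.edu']
```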
For example, in a regular expression the caret character ^ matches the beginning of a line. By putting a caret in front of the search string, we can change the program so that it prints only the lines that start with "From:":
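import re

hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()  # strip the trailing newline
    # '^' anchors the match, so "From:" must be at the start of the line
    if re.search('^From:', line):
        print(line)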
Now only lines that begin with "From:" will match. This is still a simple example, and we could have done the same thing with the string method startswith(). But it introduces the idea that regular expressions contain special characters that give us more control over what matches.
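The effect of the caret is easiest to see side by side. This minimal sketch runs three filters over a few invented sample lines: 'From:' without a caret, '^From:' with a caret, and str.startswith('From:'). Only the first sample line really begins with "From:".

```python
import re

# Hypothetical sample lines; only the first really starts with "From:"
sample_lines = [
    "From: stephen.marquard@uct.ac.za",
    "X-Forwarded-From: someone",
    "Subject: Re: From: line question",
]

# Without the caret, "From:" may appear anywhere in the line
anywhere = [line for line in sample_lines if re.search('From:', line)]

# With the caret, "From:" must appear at the beginning of the line
at_start = [line for line in sample_lines if re.search('^From:', line)]

# For this simple case, startswith() selects the same lines as '^From:'
with_startswith = [line for line in sample_lines if line.startswith('From:')]

print(anywhere)         # all three lines
print(at_start)         # ['From: stephen.marquard@uct.ac.za']
print(with_startswith)  # ['From: stephen.marquard@uct.ac.za']
```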
Note: the code in the original book was written for Python 2.7. Because I use Python 3.4, I changed print line to print(line).