Python for Informatics, Chapter 11: Regular Expressions (2)
After a busy day, I finally have some quiet time tonight to continue yesterday's work.
11.1 Character matching in regular expressions
We can use a number of other special characters to build even more powerful regular expressions. The most commonly used special character is the period ("."), which matches any single character. In the following example, the regular expression "F..m:" matches any of the strings "From:", "Fxxm:", "F12m:", or "F!@m:", because each period in the expression matches any character.
import re
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    # Keep lines that start with "F", then any two characters, then "m:"
    if re.search('^F..m:', line):
        print(line)
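To see the period wildcard at work without needing the mbox-short.txt file, here is a minimal sketch that tests the same pattern against a few strings. The list `candidates` is made up for illustration: the first four strings come from the paragraph above, and the last two are added as examples that should fail.

```python
import re

# Four strings from the text, plus two hypothetical ones that should not match.
candidates = ['From:', 'Fxxm:', 'F12m:', 'F!@m:', 'Fm:', 'From']

pattern = '^F..m:'

# Keep only the strings where exactly two characters sit between "F" and "m:".
matches = [s for s in candidates if re.search(pattern, s)]
print(matches)
```

"Fm:" fails because each period must consume one character, and "From" fails because the trailing colon is missing.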
Regular expressions become particularly powerful once you can say that a character may be repeated any number of times, using the asterisk ("*") or plus ("+") characters. These special characters apply to the single character immediately before them in the expression: the asterisk matches that character zero or more times, and the plus sign matches it one or more times.
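The difference between "zero or more" and "one or more" is easiest to see side by side. The sketch below uses made-up sample strings and patterns ("ab*c" and "ab+c") that are not part of the original example.

```python
import re

# Hypothetical sample strings: zero, one, and three "b" characters.
samples = ['ac', 'abc', 'abbbc']

# "b*" allows zero or more "b" characters between "a" and "c".
star_hits = [s for s in samples if re.search('^ab*c', s)]

# "b+" requires at least one "b" between "a" and "c".
plus_hits = [s for s in samples if re.search('^ab+c', s)]

print(star_hits)
print(plus_hits)
```

Only the asterisk version accepts "ac", since it is the only one that tolerates a missing "b".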
In the following example, we use a repeated wildcard character to narrow down the lines that we match:
import re
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    # Keep lines that start with "From:", then one or more characters, then "@"
    if re.search('^From:.+@', line):
        print(line)
The search string "^From:.+@" successfully matches lines that start with "From:", followed by one or more arbitrary characters (".+"), followed by an at-sign ("@"). So it will match a line like the following:
From: stephen.marquard@uct.ac.za
You can think of the ".+" wildcard as expanding to match all the characters between the colon and the at-sign.
From:.+@
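It is helpful to think of the plus and asterisk characters as "greedy" (or "pushy"): they expand to match as much of the string as they can. For example, in the following string the ".+" pushes outward all the way to the last at-sign, so the match ends there:
From: stephen.marquard@uct.ac.za, csev@umich.edu, and cwen@iupui.edu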
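A quick way to check this greedy behavior is to print exactly what the pattern matched. This is a small sketch using the example line above; the variable names are arbitrary.

```python
import re

line = 'From: stephen.marquard@uct.ac.za, csev@umich.edu, and cwen@iupui.edu'

# re.search returns a match object; group() gives the matched text.
match = re.search('^From:.+@', line)
matched_text = match.group()
print(matched_text)
```

The printed text runs from "From:" through the at-sign in the third address, not the first, because ".+" takes as many characters as it can while still leaving an "@" to finish the match.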
It is also possible to tell an asterisk or plus sign not to be so greedy by adding another character after it. See the detailed documentation for information on turning off the greedy behavior.
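In Python's re module, the extra character is a question mark placed right after the "+" or "*". The sketch below compares the greedy and non-greedy forms on the example line from above; the names `greedy` and `lazy` are just labels chosen for this illustration.

```python
import re

line = 'From: stephen.marquard@uct.ac.za, csev@umich.edu, and cwen@iupui.edu'

# Greedy: ".+" expands as far as possible, so the match ends at the last "@".
greedy = re.search('^From:.+@', line).group()

# Non-greedy: ".+?" expands as little as possible, so the match ends at the first "@".
lazy = re.search('^From:.+?@', line).group()

print(greedy)
print(lazy)
```

The non-greedy version stops at the at-sign in the first address, which is usually what you want when pulling out a single value from a line.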