I have been a programmer for many years, and this is my first blog. Criticism and comments from fellow programmers are welcome.
I have recently become obsessed with Python. After reading a Python basics tutorial, I picked up "Python Core Programming (third edition)" and began to chew through it.
After reading the first chapter, I found a lot of very good exercises. I tried some of them, but I could not find good answers on the internet, so I decided to write down my own answers as a record. Every answer has been tested by me.
The Python environment is 2.7.13 and the tool is JetBrains PyCharm 2017.1.1 x64.
# Before you copy the code, import the re module
import re
1-1 Recognize the following strings: bat, bit, but, hat, hit, or hut.
# 1-1
# Output: ['bat', 'bit', 'but', 'hat', 'hit', 'hut']
pattern = r'[bh][aiu]t'
string = 'asdfbatkkjbitllwbutpphatoouhitwwhut'
print re.findall(pattern, string)
1-2 Match any pair of words separated by a single space, that is, first and last names.
# 1-2
# Output: ['Tom Jerry', 'Hello Bye', 'House good']
# ('God' is left over because it has no partner word)
pattern = r'[A-Za-z]+ [A-Za-z]+'
string = 'Tom Jerry Hello Bye House good God'
print re.findall(pattern, string)
1-3 Match any word and single letter separated by a single comma and a single space, as in a last name and first initial.
# 1-3
# Output: S.N.Owfall and S.N. Owfall
# (this answer matches initials followed by a last name, with an optional space)
pattern = r'([A-Z]\.)+ ?[A-Z][a-z]+'
string1 = 'S.N.Owfall'
string2 = 'S.N. Owfall'
print re.match(pattern, string1).group()
print re.match(pattern, string2).group()
1-4 Match the set of all valid Python identifiers.
# 1-4
# \w* rather than \w+ so that one-character names such as x also match
pattern = r'[a-zA-Z_]\w*'
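As a quick check of the identifier pattern, here is a minimal sketch that anchors it with \Z (so the whole string has to be an identifier) and also rejects reserved words through the standard keyword module. The helper name is_identifier is hypothetical, and the code is written so that it should run on Python 2.7 as well as Python 3.

```python
import keyword
import re

# \Z anchors the match at the very end of the string
IDENT_RE = re.compile(r'[A-Za-z_]\w*\Z')


def is_identifier(name):
    """Return True if name looks like an identifier and is not a keyword."""
    return bool(IDENT_RE.match(name)) and not keyword.iskeyword(name)


for candidate in ['x', '_private', 'var_1', '1abc', 'for', 'a-b']:
    print('%-10s %s' % (candidate, is_identifier(candidate)))
```

The keyword check is needed because a name such as for fits the pattern but still cannot be used as an identifier.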
1-5 Match a street address according to your local format (keep your regular expression general enough to match any number of street words, including the type designation). For example, American street addresses use the format 1180 Bordeaux Drive. Make your regular expression flexible enough to support multi-word street names, such as 3120 De la Cruz Boulevard.
# 1-5
pattern = r'\d+( [a-zA-Z]+)+'
string1 = '1180 Bordeaux Drive'
string2 = '3120 De la Cruz Boulevard'
print re.match(pattern, string1).group()
print re.match(pattern, string2).group()
1-6 Match simple web domain names that begin with "www." and end with ".com", for example, http://www.yahoo.com/. Extra credit: your regular expression can also support other top-level domain names, such as .edu, .net, etc. (for example, http://www.foothill.edu).
# 1-6
# The dots are escaped so that they only match a literal '.'
pattern = r'((http:|https:)//)?w{3}\.\w+\.(edu|com|net)'
string = 'http://www.foothill.edu'
print re.match(pattern, string).group()
1-7 Match the set of the string representations of all Python integers.
# 1-7
pattern = r'-?(\d+)'
string = '-212312'
print re.match(pattern, string).group()
1-8 Match the set of the string representations of all Python long integers.
# 1-8
# Python 2 accepts both 'l' and 'L' as the long suffix
pattern = r'-?(\d+)[lL]'
string = '-212312L'
print re.match(pattern, string).group()
1-9 Match the set of the string representations of all Python floating-point numbers.
# 1-9
pattern = r'-?\d+\.\d+'
string = '-3.1415926'
print re.match(pattern, string).group()
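The pattern above only covers the digits.digits form. As a rough sketch of how it could be widened, the version below also accepts a leading or trailing decimal point and an exponent. The names FLOAT_RE and looks_like_float are hypothetical, and this is only an approximation of Python's float literals, not a full grammar.

```python
import re

FLOAT_RE = re.compile(r'''
    [-+]?                      # optional sign
    (?:
        (?:\d+\.\d*|\.\d+)     # digits with a decimal point: 1.5, 1., .5
        (?:[eE][-+]?\d+)?      # optional exponent
      |
        \d+[eE][-+]?\d+        # digits with an exponent but no point: 1e10
    )
    \Z                         # nothing may follow the number
''', re.VERBOSE)


def looks_like_float(text):
    """Return True if text has the shape of a float literal."""
    return FLOAT_RE.match(text) is not None


for sample in ['-3.1415926', '1.', '.5', '1e10', '6.02e+23', '42', '1.2.3']:
    print('%-12s %s' % (sample, looks_like_float(sample)))
```

Plain integers such as 42 are deliberately rejected here, since they belong to exercise 1-7.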
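1-10 Match the set of the string representations of all Python complex numbers.
# 1-10
# Each part is digits with an optional fraction; the imaginary part may be added or subtracted
pattern = r'-?\d+\.?\d*[+-]\d+\.?\d*j'
string = '-1.4+1.5j'
print re.match(pattern, string).group()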
1-11 Match the set of all valid e-mail addresses (start with a loose regular expression, and then try to tighten it as much as you can, yet maintain correct functionality).
# 1-11
pattern = r'\w+@\w+\.com'
string = 'abc_abc111@abc111_abc.com'
print re.match(pattern, string).group()
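The exercise asks to start loose and then tighten the expression. One possible tightening, shown only as a sketch, is to allow dots, plus signs, and hyphens in the login, accept any number of dotted domain labels, and require a top-level domain of at least two letters. The names EMAIL_RE and looks_like_email are hypothetical, and real e-mail address rules are far more permissive than this.

```python
import re

EMAIL_RE = re.compile(
    r'[\w.+-]+'                 # login: word characters, dots, plus, hyphen
    r'@'
    r'[A-Za-z0-9-]+'            # first domain label
    r'(?:\.[A-Za-z0-9-]+)*'     # any further labels, e.g. mail.example
    r'\.[A-Za-z]{2,}\Z'         # top-level domain, then end of string
)


def looks_like_email(text):
    """Return True if text has the shape login@domain.tld."""
    return EMAIL_RE.match(text) is not None


samples = ['abc_abc111@abc111.com', 'user.name@mail.example.edu',
           'abc_abc111@abc111_abc.com', 'no-at-sign.com', 'a@b']
for sample in samples:
    print('%-28s %s' % (sample, looks_like_email(sample)))
```

Note that this stricter version rejects the test string used above, because it no longer allows an underscore in the domain name.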
1-12 Match the set of all valid web site addresses (URLs) (start with a loose regular expression, and then try to tighten it as much as you can, yet maintain correct functionality).
# 1-12
pattern = r'((http:|https:)//)?(w{3}\.)?\w+\.\w+'
string = 'http://foothill.edu'
print re.match(pattern, string).group()
1-13 type(). The built-in function type() returns a type object, which is displayed as the following Pythonic-looking string:
>>> type(0)
<type 'int'>
>>> type(.34)
<type 'float'>
>>> type(dir)
<type 'builtin_function_or_method'>
Create a regular expression that can extract the actual type name from the string. Your function should take a string like <type 'int'> and return int (likewise for other types, such as 'float', 'builtin_function_or_method', and so on). Note: the value you are implementing is stored in the __name__ attribute of classes and some built-in types.
1-14 Processing dates. Section 1.2 gave a regular expression pattern that matches a one- or two-digit string representing the months January through September (0?[1-9]). Create a regular expression that represents the remaining three months in the standard calendar.
# 1-14
pattern = r'1[0-2]'
string = '12'
print re.search(pattern, string).group()
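Exercise 1-13 is left without an answer above, so here is one possible sketch. It assumes the text comes from str(type(obj)), which starts with <type on Python 2 and <class on Python 3, so the pattern accepts either word. The names TYPE_RE and type_name are hypothetical.

```python
import re

# Capture whatever sits between the quotes of <type '...'> or <class '...'>
TYPE_RE = re.compile(r"<(?:type|class) '([\w.]+)'>")


def type_name(text):
    """Return the type name found in text, or None if there is none."""
    match = TYPE_RE.search(text)
    return match.group(1) if match else None


for obj in (0, .34, dir):
    print(type_name(str(type(obj))))
```

For built-in types the result should agree with type(obj).__name__, which is the attribute the exercise note refers to.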
1-15 Processing credit card numbers. Section 1.2 also gave a regular expression pattern that matches credit card (CC) numbers ([0-9]{15,16}). However, that pattern does not allow hyphens to separate blocks of digits. Create a regular expression that allows hyphens, but only in the correct locations. For example, 15-digit credit card numbers use a 4-6-5 pattern, meaning four digits, a hyphen, six digits, a hyphen, and five digits, while 16-digit credit card numbers use a 4-4-4-4 pattern. Remember to group the entire string appropriately. Extra credit: there is a standard algorithm for determining whether a credit card number is valid. Write code that recognizes not only correctly formatted numbers, but also valid credit card numbers.
# 1-15
pattern = r'([0-9]{4}-[0-9]{6}-[0-9]{5})|([0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4})'
string = '4444-444465-44446'
print re.search(pattern, string).group()
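For the extra credit part, the standard algorithm is usually taken to be the Luhn checksum. The sketch below is my own reading of it, not code from the book: it first checks the format (4-6-5, 4-4-4-4, or no hyphens at all) and then runs the checksum. The names CC_RE, luhn_ok, and is_valid_card are hypothetical, and the numbers used are well-known published test numbers rather than real cards.

```python
import re

# Hyphens are allowed only in the 4-6-5 or 4-4-4-4 positions, or not at all
CC_RE = re.compile(r'(?:\d{4}-\d{6}-\d{5}|\d{4}-\d{4}-\d{4}-\d{4}|\d{15,16})\Z')


def luhn_ok(digits):
    """Luhn check: double every second digit from the right and sum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9   # same as adding the two digits of the product
        total += value
    return total % 10 == 0


def is_valid_card(number):
    """Return True if number is well formatted and passes the Luhn check."""
    if not CC_RE.match(number):
        return False
    return luhn_ok(number.replace('-', ''))


for number in ['4111-1111-1111-1111', '3782-822463-10005',
               '4111-1111-1111-1112', '4111-11111111-1111']:
    print('%-22s %s' % (number, is_valid_card(number)))
```

Keeping the format check and the checksum in separate functions makes it easy to test each rule on its own.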
Using gendata.py. The following set of exercises (1-16 to 1-27) deals specifically with the data generated by gendata.py. Before attempting exercises 1-17 and 1-18, you should finish exercise 1-16 and all the regular expressions first.
1-16 Update the code of gendata.py so that the data is written directly to redata.txt instead of the screen.
# 1-16
from random import randrange, choice
from string import ascii_lowercase as lc
from sys import maxint
from time import ctime

tlds = ('com', 'edu', 'net', 'org', 'gov')
f = open('redata.txt', 'w')
for i in xrange(1):
    dtint = randrange(maxint)   # random timestamp in seconds
    dtstr = ctime(dtint)
    llen = randrange(4, 8)      # login length
    login = ''.join(choice(lc) for j in xrange(llen))
    dlen = randrange(llen, 13)  # the domain is at least as long as the login
    dom = ''.join(choice(lc) for j in xrange(dlen))
    line = str(dtstr) + '::' + str(login) + '@' + str(dom) + '.' + str(choice(tlds)) + '::' + str(dtint) + '-' + str(llen) + '-' + str(dlen)
    f.write(line + '\n')
f.close()
1-17 Determine how many times each day of the week appears in redata.txt (alternatively, you can count how many times each month of the year appears).
# 1-17
week_list = []
month_list = []
f = open('redata.txt', 'r')
for eachline in f:
    # the first two whitespace-separated fields are the weekday and the month
    week_list.append(re.split(r'\s+', eachline)[0])
    month_list.append(re.split(r'\s+', eachline)[1])
week_day_tmp_list = set(week_list)
month_tmp_list = set(month_list)
print "____________________"
print "Week times:"
for item in week_day_tmp_list:
    print "%s appears %d time(s)" % (item, week_list.count(item))
print "____________________"
print 'Month times:'
for item in month_tmp_list:
    print '%s appears %d time(s)' % (item, month_list.count(item))
f.close()
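The same counting can also be done in one pass with collections.Counter (available since Python 2.7) instead of building lists and calling count(). This is only a sketch of that alternative: the sample lines are made-up data in the gendata.py format, and the names HEAD_RE and count_days_and_months are hypothetical.

```python
import re
from collections import Counter

# Hypothetical sample lines in the gendata.py output format
SAMPLE_LINES = [
    'Thu Jul 22 19:21:19 2004::izsp@dicqdhytvhv.edu::1090549279-4-11',
    'Sun Jul 13 22:42:11 2008::zavsp@dadtgmku.gov::1216014131-5-8',
    'Thu Feb  5 04:10:33 1998::abcd@efghij.com::886651833-4-6',
]

# Group 1 is the weekday, group 2 is the month
HEAD_RE = re.compile(r'(\w{3})\s+(\w{3})\s')


def count_days_and_months(lines):
    """Return two Counters: weekday counts and month counts."""
    days, months = Counter(), Counter()
    for line in lines:
        match = HEAD_RE.match(line)
        if match:
            days[match.group(1)] += 1
            months[match.group(2)] += 1
    return days, months


days, months = count_days_and_months(SAMPLE_LINES)
for name, count in sorted(days.items()):
    print('%s appears %d time(s)' % (name, count))
for name, count in sorted(months.items()):
    print('%s appears %d time(s)' % (name, count))
```

To use it on the real file, the list can be replaced by the open file object, since the function only needs something it can loop over line by line.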
1-18 Ensure that there is no data corruption in redata.txt by confirming that the first integer of the integer field matches the timestamp given at the beginning of each output line.
# 1-18
from time import ctime
num_pattern = r'.+::(\d+)-'
# a ctime() string is 24 characters long
time_stamp_pattern = r'^(.{24})::.+'
try:
    with open('redata.txt', 'r') as f:
        for i, eachline in enumerate(f):
            # get the first integer
            second = re.search(num_pattern, eachline.strip()).group(1)
            time_stamp_str = re.search(time_stamp_pattern, eachline.strip()).group(1)
            # check whether the timestamp is correct
            if time_stamp_str != ctime(int(second)):
                print "Line %d is wrong! Correct timestamp is %s" % (i, ctime(int(second)))
            else:
                print 'This line is ok!'
except ValueError as value_err:
    print "The value is not of type int: " + str(value_err)
except IOError as io_err:
    print 'File Error: ' + str(io_err)
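The check itself can be pulled out into a small function that works on a single line, which makes it testable without a file. A minimal sketch, assuming the line format produced by gendata.py: the name line_is_consistent, the address user@example.com, and the two sample lines are hypothetical. The sample lines are built with ctime() in the same process, so the local time zone does not matter.

```python
import re
from time import ctime

# Group 1: the 24-character timestamp, group 2: the first integer
LINE_RE = re.compile(r'^(.{24})::.+::(\d+)-\d+-\d+$')


def line_is_consistent(line):
    """Return True if the timestamp equals ctime() of the first integer."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return False
    timestamp, seconds = match.groups()
    return timestamp == ctime(int(seconds))


# Hypothetical lines: one consistent, one off by a second
good = '%s::user@example.com::%d-4-7' % (ctime(1000000000), 1000000000)
bad = '%s::user@example.com::%d-4-7' % (ctime(1000000000), 1000000001)
print(line_is_consistent(good))
print(line_is_consistent(bad))
```

A line that does not fit the format at all is treated as corrupted too, instead of raising an exception on a failed match.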
Create the following regular expressions.
1-19 Extract the complete timestamp from each line.
# 1-19
# a ctime() string is 24 characters long
time_stamp_pattern = r'^(.{24})::.+'
try:
    with open('redata.txt', 'r') as f:
        for eachline in f:
            print re.search(time_stamp_pattern, eachline.strip()).group(1)
except IOError as io_err:
    print 'File Error: ' + str(io_err)
1-20 Extract the complete e-mail address from each line.
# 1-20
email_pattern = r'.+::(.+)::.+'
try:
    with open('redata.txt', 'r') as f:
        for eachline in f:
            print re.search(email_pattern, eachline.strip()).group(1)
except IOError as io_err:
    print 'File Error: ' + str(io_err)
1-21 Extract only the month from the timestamp.
# 1-21
month_pattern = r'^\w{3}\s(\w{3}).+'
try:
    with open('redata.txt', 'r') as f:
        for eachline in f:
            print re.search(month_pattern, eachline.strip()).group(1)
except IOError as io_err:
    print 'File Error: ' + str(io_err)
1-22 Extract only the year from the timestamp.
# 1-22
year_pattern = r'.+(\d{4})::.+'
try:
    with open('redata.txt', 'r') as f:
        for eachline in f:
            print re.search(year_pattern, eachline.strip()).group(1)
except IOError as io_err:
    print 'File Error: ' + str(io_err)
1-23 Extract only the time (HH:MM:SS) from the timestamp.
# 1-23
time_pattern = r'.+(\d{2}:\d{2}:\d{2}).+'
try:
    with open('redata.txt', 'r') as f:
        for eachline in f:
            print re.search(time_pattern, eachline.strip()).group(1)
except IOError as io_err:
    print 'File Error: ' + str(io_err)
1-24 Extract only the login and domain names (the main domain name and the top-level domain together) from the e-mail address.
# 1-24
pattern = r'.+::(\w+)@(\w+\.\w+).+'
try:
    with open('redata.txt', 'r') as f:
        for eachline in f:
            m = re.search(pattern, eachline.strip())
            # login name
            print m.group(1)
            # main domain name together with the top-level domain
            print m.group(2)
except IOError as io_err:
    print 'File Error: ' + str(io_err)
1-25 Extract only the login and domain names (the main domain name and the top-level domain separately) from the e-mail address.
# 1-25
pattern = r'.+::(\w+)@(\w+)\.(\w+).+'
try:
    with open('redata.txt', 'r') as f:
        for eachline in f:
            m = re.search(pattern, eachline.strip())
            # login name
            print m.group(1)
            # main domain name
            print m.group(2)
            # top-level domain
            print m.group(3)
except IOError as io_err:
    print 'File Error: ' + str(io_err)
1-26 Replace the e-mail address in each line of data with your own e-mail address.
# 1-26
pattern = r'(.+::)(\w+@\w+\.\w+)(::.+)'
# keep groups 1 and 3, swap in my own address for group 2
my_email = r'\1snowfall_dan@outlook.com\3'
try:
    with open('redata.txt', 'r') as f:
        for eachline in f:
            print re.sub(pattern, my_email, eachline.strip())
except IOError as io_err:
    print 'File Error: ' + str(io_err)
1-27 Extract the month, day, and year from the timestamp and output them in the format "day, month, year", iterating over each line only once. (The exercise as stated in the book has a problem, so I modified it myself.)
# 1-27
# \s+ and \d{1,2} also cover single-digit days, which ctime() pads with a space
pattern = r'(.+)(\w{3})\s+(\d{1,2})(.+)(\d{4})(.+)'
# swap the month (group 2) and the day (group 3), keep everything else
replacement = r'\1\3 \2\4\5\6'
try:
    with open('redata.txt', 'r') as f:
        for eachline in f:
            print re.sub(pattern, replacement, eachline.strip())
except IOError as io_err:
    print 'File Error: ' + str(io_err)
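If only the three fields are wanted instead of the whole rewritten line, they can be captured directly with one pattern per line. The sketch below outputs them in a "Mon Day, Year" form, which is my own choice of layout. The name month_day_year and the sample lines are hypothetical.

```python
import re

# weekday, (month), (day), time, (year): only three fields are captured
DATE_RE = re.compile(r'^\w{3}\s+(\w{3})\s+(\d{1,2})\s+\d{2}:\d{2}:\d{2}\s+(\d{4})')


def month_day_year(line):
    """Return 'Mon Day, Year' for a gendata.py line, or None on no match."""
    match = DATE_RE.match(line)
    if match is None:
        return None
    month, day, year = match.groups()
    return '%s %s, %s' % (month, day, year)


# Hypothetical sample lines, one with a single-digit (space-padded) day
for line in ['Thu Jul 22 19:21:19 2004::izsp@dicqdhytvhv.edu::1090549279-4-11',
             'Thu Feb  5 04:10:33 1998::abcd@efghij.com::886651833-4-6']:
    print(month_day_year(line))
```

Each line is matched exactly once, which is what the "iterate only once" condition of the exercise asks for.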
Processing phone numbers. For exercises 1-28 and 1-29, recall the regular expression \d{3}-\d{3}-\d{4} introduced in Section 1.2, which matches phone numbers but allows an optional area code (prefix). Update the regular expression so that it satisfies the following conditions.
1-28 Area codes (the first set of three digits and the hyphen that follows) are optional, that is, your regular expression should match both 800-555-1212 and 555-1212.
# 1-28
pattern = r'(\d{3}-)?\d{3}-\d{4}'
phone = '555-1212'
print re.match(pattern, phone).group()
1-29 Either parenthesized or hyphenated area codes are supported, and they remain optional; make your regular expression match 800-555-1212, 555-1212, and also (800) 555-1212.
# 1-29
# after a parenthesized area code, a space, a hyphen, or nothing may follow
pattern = r'(\d{3}-|\(\d{3}\)[ -]?)?\d{3}-\d{4}'
phone1 = '(888)-555-1212'
phone2 = '555-1212'
phone3 = '888-555-1212'
print re.match(pattern, phone1).group()
print re.match(pattern, phone2).group()
print re.match(pattern, phone3).group()
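Because re.match only anchors at the start, a number with extra trailing digits would still match. The sketch below adds \Z so that the whole string has to be a phone number, and runs the pattern over a few accepted and rejected forms. The names PHONE_RE and is_phone are hypothetical.

```python
import re

PHONE_RE = re.compile(
    r'(?:\d{3}-|\(\d{3}\)[ -]?)?'   # optional area code: 800- or (800), (800)-, (800) + space
    r'\d{3}-\d{4}\Z'                # local number, then end of string
)


def is_phone(text):
    """Return True if the whole string is a phone number in a supported form."""
    return PHONE_RE.match(text) is not None


for phone in ['800-555-1212', '555-1212', '(800) 555-1212',
              '(800)-555-1212', '800 555-1212', '555-12123']:
    print('%-16s %s' % (phone, is_phone(phone)))
```

The area code alternatives are wrapped in one optional group, so a string either has a complete area code in one of the allowed forms or none at all.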