Python regular expressions and the re module
Python is the first programming language I came into contact with. Although it is simple, getting started with programming was still hard for me, so I only learned basic Python syntax and never went much deeper. I have now been using the language for a year, and over that time I have read about and come to understand many Python features and packages, but regular expressions remained a mystery to me. Some time ago I also helped Cong ke sort out the maintained Chinese edition of Dive Into Python 3, so I am using this National Day holiday to talk about the regular expression module, re.
The first topic is replacing characters in a string. In plain Python this is usually done with the str.replace method; below we compare it with the re.sub method.
Case 1: our goal is to replace ROAD with RD.
The replace method seems to work well and does what I asked. But what happens with a slightly different address?
There is clearly a problem here: the string contains BROAD, which also contains the four characters ROAD, but we do not want that part replaced. It becomes BRD, which is enough to show the limitation of the replace method. Of course, replace can still be used if it is applied more carefully.
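A minimal sketch of str.replace on two sample addresses (the address strings are illustrative assumptions). str.replace substitutes every occurrence of the substring, wherever it appears:

```python
# str.replace substitutes every occurrence of the substring.
simple = '100 NORTH MAIN ROAD'
fixed_simple = simple.replace('ROAD', 'RD.')
print(fixed_simple)   # 100 NORTH MAIN RD.

# Here the ROAD inside BROAD is replaced as well, which is not what we want.
broad = '100 NORTH BROAD ROAD'
fixed_broad = broad.replace('ROAD', 'RD.')
print(fixed_broad)    # 100 NORTH BRD. RD.
```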
Using slicing, we can apply the replacement only to the last four characters. This approach has limitations too: to replace STREET with ST, we would have to slice the last six characters instead. The code has to be changed for every replacement, which is error-prone and troublesome to debug.
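A sketch of the slicing workaround, using the same kind of sample strings (assumed for illustration). Note how the slice width has to match the length of the word being replaced:

```python
s = '100 NORTH BROAD ROAD'
# Replace only inside the last four characters, so BROAD is left alone.
road_fixed = s[:-4] + s[-4:].replace('ROAD', 'RD.')
print(road_fixed)     # 100 NORTH BROAD RD.

# For STREET the slice width must change to 6: the length is hard-coded.
t = '100 NORTH MAIN STREET'
street_fixed = t[:-6] + t[-6:].replace('STREET', 'ST.')
print(street_fixed)   # 100 NORTH MAIN ST.
```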
Let's try the re.sub method.
Note that in the first argument, 'ROAD$', the $ anchors the match to the end of the string, so only a ROAD at the very end is matched. Similarly, ^ anchors a match to the start of the string. I soon found that ROAD is not always at the end of the string, for example in s = '2017 BROAD ROAD apt.3', and then the pattern above no longer works. That is not a problem: we also have \b.
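A short sketch of the $ anchor with re.sub(pattern, replacement, string). The first string is an assumed sample; the second is the one from the text:

```python
import re

at_end = '100 NORTH BROAD ROAD'
# 'ROAD$': $ anchors the match to the end of the string.
r_end = re.sub('ROAD$', 'RD.', at_end)
print(r_end)          # 100 NORTH BROAD RD.

in_middle = '2017 BROAD ROAD apt.3'
# ROAD is not at the end here, so nothing is replaced.
r_middle = re.sub('ROAD$', 'RD.', in_middle)
print(r_middle)       # 2017 BROAD ROAD apt.3
```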
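Amazing, isn't it? \b matches a word boundary: a position between a word character and a non-word character (such as a space), or the start or end of the string. It does not consume any characters itself. So \bROAD\b matches ROAD only when it stands alone as a word, which means only the standalone ROAD is replaced with RD., exactly as we wanted.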
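A sketch of the word-boundary version on the string from the text. It also shows why the pattern should be written as a raw string: in an ordinary string literal, '\b' is the backspace character rather than the regex boundary:

```python
import re

s = '2017 BROAD ROAD apt.3'

# Raw string: r'\b' is the regex word-boundary assertion.
with_boundary = re.sub(r'\bROAD\b', 'RD.', s)
print(with_boundary)  # 2017 BROAD RD. apt.3

# Without the r prefix, '\b' is a backspace character (\x08), so the pattern
# looks for literal backspaces around ROAD and nothing is replaced.
without_raw = re.sub('\bROAD\b', 'RD.', s)
print(without_raw)    # 2017 BROAD ROAD apt.3
```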
Case 2: matching Roman numerals
Roman numerals write numbers using combinations of seven letters:
I = 1
V = 5
X = 10
L = 50
C = 100
D = 500
M = 1000
Here are some general rules for forming Roman numerals:
In most cases, a number is written by adding characters together: I is 1, II is 2, and III is 3. VI is 6 (literally "5 and 1"), VII is 7, and VIII is 8.
The characters I, X, C, and M can each be repeated at most three times. For 4, you must subtract from the next larger "fives" character, 5: 4 is written IV (1 less than 5), not IIII. Likewise, 40 is XL (10 less than 50), 41 is XLI, 42 is XLII, 43 is XLIII, and 44 is XLIV (10 less than 50, then 1 less than 5).
Similarly, for 9 you must subtract from the next larger "tens" character: 8 is VIII, but 9 is IX (1 less than 10), not VIIII (the I character cannot be repeated four times). 90 is XC and 900 is CM.
The "fives" characters (V, L, and D) cannot be repeated. 10 is always X, never VV, and 100 is always C, never LL.
Roman numerals are read from left to right, so the order of the characters matters. DC is 600, while CD is a completely different number, 400 (100 less than 500). CI is 101, but IC is not a valid Roman numeral, because you cannot subtract 1 directly from 100; 99 must be written XCIX (10 less than 100, then 1 less than 10).
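To make the left-to-right rule concrete, here is a minimal converter sketch. The function name roman_to_int is hypothetical and not from the source. It adds each letter's value, except that a letter smaller than the one after it is subtracted. It assumes the input is already well formed: it does not reject strings such as IIII or IC, which is the kind of validation the regular expressions below take care of.

```python
# Value of each of the seven Roman numeral letters (from the table above).
VALUES = {'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100, 'D': 500, 'M': 1000}


def roman_to_int(numeral):
    """Convert a well-formed Roman numeral to an integer.

    Reads left to right: a letter smaller than the letter after it is
    subtracted (the IV, IX, XL, CD, CM cases); otherwise it is added.
    Malformed input such as 'IIII' or 'IC' is NOT rejected.
    """
    total = 0
    for i, ch in enumerate(numeral):
        value = VALUES[ch]
        if i + 1 < len(numeral) and value < VALUES[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total


for s in ['VIII', 'IX', 'XLIV', 'DC', 'CD', 'XCIX']:
    print(s, roman_to_int(s))   # 8, 9, 44, 600, 400, 99
```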
Matching the thousands digit:
The pattern is ^M?M?M?$, where ? makes the preceding character optional, so the pattern consists of three optional M characters between the start and end anchors.
When matching 'M', the regex starts at ^ (the start of the string) and matches one M. The other two M's are optional, so they are skipped, and then $ (the end of the string) matches; the match succeeds and a match object is returned. The second and third tests, with two and three M's, also succeed. The fourth test, with four M's, fails: at most three M's can be matched, and after three M's the pattern expects $ but finds another M, so None is returned.
Note that because all three M's are optional, the empty string also matches.
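A sketch of the thousands check with re.search, which returns a match object on success and None on failure. The test strings are the cases described above (one to four M's, plus the empty string):

```python
import re

pattern = '^M?M?M?$'   # ^ start, three optional M, $ end

print(re.search(pattern, 'M'))      # a match object
print(re.search(pattern, 'MMMM'))   # None

results = {s: re.search(pattern, s) is not None
           for s in ['M', 'MM', 'MMM', 'MMMM', '']}
print(results)
# {'M': True, 'MM': True, 'MMM': True, 'MMMM': False, '': True}
```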
Matching the hundreds digit:
100 = C
200 = CC
300 = CCC
400 = CD
500 = D
600 = DC
700 = DCC
800 = DCCC
900 = CM
So there are four possible patterns:
CM
CD
0 to 3 C characters (0 when the hundreds digit is 0).
D, followed by 0 to 3 C characters.
The last two can be combined into one:
an optional D, followed by 0 to 3 C characters.
The combined pattern, ^M?M?M?(CM|CD|D?C?C?C?)$, covers both the thousands and the hundreds digits. | means "or": the alternatives are tried from left to right, and the first one that lets the rest of the pattern match is used.
Again, the empty string also matches.
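A sketch of the thousands-plus-hundreds pattern on a few sample numerals chosen for illustration: MCM (1900), MD (1500), MMMCCC (3300), the invalid MCMC, and the empty string:

```python
import re

# Thousands (0-3 optional M) followed by one of three hundreds alternatives.
pattern = '^M?M?M?(CM|CD|D?C?C?C?)$'

results = {s: re.search(pattern, s) is not None
           for s in ['MCM', 'MD', 'MMMCCC', 'MCMC', '']}
print(results)
# {'MCM': True, 'MD': True, 'MMMCCC': True, 'MCMC': False, '': True}
```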
The tens and ones digits follow the same rules, so they can be matched in the same way.
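Putting all four digits together gives the sketch below: the tens group mirrors the hundreds group with X, L, and C, and the ones group uses I, V, and X. The sample numerals are assumptions for illustration. One caveat carries over: because every part is optional, the empty string still matches, so a real validator would need to reject '' separately.

```python
import re

# thousands, hundreds, tens, ones
pattern = '^M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$'

samples = ['MCMXCIX', 'MMXVII', 'XCIX', 'IIII', 'IC', 'MMMM', '']
results = {s: re.search(pattern, s) is not None for s in samples}
for s, ok in results.items():
    print(repr(s), ok)
# 'MCMXCIX' True, 'MMXVII' True, 'XCIX' True,
# 'IIII' False, 'IC' False, 'MMMM' False, '' True
```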
Regular expressions are very powerful, but they are not the right answer to every problem. You need to learn enough about them to judge when they are appropriate: sometimes they solve your problem, and sometimes they create new ones.
Questions left for the reader:
1. "115.28.66.99 [port = 8080]". This string means that port 8080 is open on the server whose IP address is 115.28.66.99. Write a program that parses this string and prints "Port ** of the server whose IP address is *** is open".
2. "115.28.66.99 [port = 21, type = ftp]". This string means that port 21 of the server whose IP address is 115.28.66.99 provides the ftp service. If the "type = ftp" part is omitted, the service defaults to http. Write a program that parses this string and prints "The service provided on port ** of the server whose IP address is *** is ***".
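One possible approach to both questions, offered only as a sketch. The pattern, the hypothetical function name parse_server, and the tolerance for optional spaces around = and after the comma are all assumptions. \d{1,3} checks only the dotted shape of the address, not that each part is at most 255. The optional type group yields None when it is absent, which is where the http default comes in.

```python
import re

# Assumed pattern: group 1 = dotted IPv4 address, group 2 = port,
# group 3 = optional service type. \s* tolerates optional whitespace.
PATTERN = re.compile(
    r'^(\d{1,3}(?:\.\d{1,3}){3})\s*'
    r'\[port\s*=\s*(\d+)'
    r'(?:\s*,\s*type\s*=\s*(\w+))?\]$'
)


def parse_server(text):
    """Return (ip, port, service); service defaults to 'http'."""
    m = PATTERN.match(text)
    if m is None:
        raise ValueError('unrecognized format: %r' % text)
    ip, port, service = m.groups()   # service is None if type is omitted
    return ip, int(port), service or 'http'


ip, port, _ = parse_server('115.28.66.99 [port = 8080]')
print('Port %d of the server whose IP address is %s is open' % (port, ip))

ip, port, service = parse_server('115.28.66.99 [port = 21, type = ftp]')
print('The service provided on port %d of the server whose IP address '
      'is %s is %s' % (port, ip, service))
```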