Last time, a lot of friends commented that I should use regular expressions to filter text. It is not that I do not want to use them (I am no expert; I had seen them in crawlers before, but I locate content directly through web tags with BeautifulSoup because that is easy to understand and convenient). Regular expressions are hard to master (one look at a regex table shows how many symbols and rules there are, and how flexibly they combine), so friends who are new to programming can easily waste a lot of time on them. Today I will give a simple introduction to the regex features I use most often. Unless your needs are very special, these basics should cover them.
1. A brief introduction to regular expressions
First, you have to import the regular expression module: import re. A regular expression is a powerful tool for working with strings. It has its own independent processing mechanism, which may not be as efficient as str's built-in methods, but it is far more flexible. The workflow is: define a matching rule ("what you want" plus regex syntax), pass in the string to search, and let the regex engine retrieve the information you want.
2. Several common uses of findall
The basic structure is roughly: nojoke = re.findall(r'matching rule', 'string to search'). Here nojoke is the result that the regex call returns, re.findall finds all matches, and the r prefix marks the pattern as a raw string, so backslashes in the rule are not treated as string escapes (which makes the code much easier to read). Let's look at a few examples to get a better understanding.
The simplest case is a literal pattern: searching for bi finds every bi in the string and returns them all as a list, which is often used to count how many times a given piece of text occurs. Let's keep going.
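A minimal sketch of the literal search just described. The sample string is made up for illustration, since the string from the original post is not shown:

```python
import re

# Hypothetical sample text (an assumption, not the original post's string).
text = "bi, abi, cbi and gbi"

# findall returns every non-overlapping occurrence of the literal "bi" as a list.
nojoke = re.findall(r"bi", text)
print(nojoke)       # ['bi', 'bi', 'bi', 'bi']
print(len(nojoke))  # 4, the number of occurrences
```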
Adding the ^ symbol to the front of the pattern anchors the match at the beginning: ^abi returns a match only when the string starts with abi, so it can be used to check whether a string starts with abi.
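The ^ anchor can be tried out roughly like this, with made-up sample strings:

```python
import re

# The string starts with "abi", so the anchored pattern matches once.
starts_with_abi = re.findall(r"^abi", "abi cbi gbi")
# "abi" appears here too, but not at the start, so nothing is returned.
starts_elsewhere = re.findall(r"^abi", "cbi abi gbi")

print(starts_with_abi)   # ['abi']
print(starts_elsewhere)  # []
```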
Likewise, putting the $ symbol at the end of the pattern, as in gbi$, returns a match only when the string ends with gbi, so it can be used to check how a string ends.
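A small sketch of the $ anchor, again with made-up sample strings:

```python
import re

# The string ends with "gbi", so the anchored pattern matches once.
ends_with_gbi = re.findall(r"gbi$", "abi cbi gbi")
# "gbi" is present but not at the end, so the result is an empty list.
ends_elsewhere = re.findall(r"gbi$", "gbi cbi abi")

print(ends_with_gbi)   # ['gbi']
print(ends_elsewhere)  # []
```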
A character class [...] matches any one of the characters inside the square brackets. For example, [abc]f matches a followed by f, b followed by f, or c followed by f, and the matches are returned as a list.
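As an illustration of the character class, assuming the pattern [abc]f and a made-up input string:

```python
import re

# [abc] matches exactly one of a, b or c; the f must follow immediately.
pairs = re.findall(r"[abc]f", "af bf cf df ab")
print(pairs)  # ['af', 'bf', 'cf']  ("df" and "ab" do not fit the rule)
```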
"\d" is a regular grammar rule used to match the number returned from 0 to 9, it should be noted that 11 will return the string as String ' 1 ' and ' 1 ' instead of returning ' 11 ', remember that this is a big pit.
Of course, the solution is that you have to take a few numbers to write a few \d, shown here to take a string of 3 digits, here shows the regular flexible one hand.
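The \d pitfall and its fix can be sketched as follows. The sample strings are invented, and the \d+ line is an extra that the text above does not cover, shown only as one more way to grab whole runs of digits:

```python
import re

# A single \d matches one digit at a time, so "11" comes back as two items.
one_by_one = re.findall(r"\d", "room 11")
# Three \d in a row match exactly three consecutive digits.
three_digits = re.findall(r"\d\d\d", "abc123def4567")
# Not from the text above: the + quantifier matches a whole run of digits.
digit_runs = re.findall(r"\d+", "room 11")

print(one_by_one)    # ['1', '1']
print(three_digits)  # ['123', '456']
print(digit_runs)    # ['11']
```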
Lowercase \d takes the digits 0-9, while uppercase \D means non-digit: it returns everything other than digits.
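A quick side-by-side of \d and \D on a made-up string:

```python
import re

sample = "a1b2-c3"  # hypothetical input

digits = re.findall(r"\d", sample)      # only the digits
non_digits = re.findall(r"\D", sample)  # everything that is not a digit

print(digits)      # ['1', '2', '3']
print(non_digits)  # ['a', 'b', '-', 'c']
```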
"\w" in the regular inside represents a match from A to Z, uppercase A to Z, the number 0 to 9 contains the preceding three kinds as printed above.
"\w" in the regular inside represents a match except for letters and numbers of special symbols, but here \ Slash usage to note in the string \ is the escape symbol specific Baidu to learn.
Parentheses () mean that only the content matched inside the parentheses is returned. The .* inside them is greedy matching: it is as greedy as can be and matches the largest range that still satisfies the pattern.
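To see the greedy behavior, here is a sketch that assumes a short HTML-like string with two p elements (the original sample is not shown):

```python
import re

html = "<p>first</p><p>second</p>"  # hypothetical input

# Greedy .* runs to the LAST </p>, so both paragraphs end up in one match.
greedy = re.findall(r"<p>(.*)</p>", html)
print(greedy)  # ['first</p><p>second']
```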
Adding a question mark, as in .*?, stops it from matching the maximum range. This is known as non-greedy matching. As a result, the contents of the two p elements are matched and returned separately.
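The non-greedy form can be sketched like this, assuming a short HTML-like string with two p elements as input:

```python
import re

html = "<p>first</p><p>second</p>"  # hypothetical input

# Non-greedy .*? stops at the NEAREST </p>, so each paragraph is its own match.
lazy = re.findall(r"<p>(.*?)</p>", html)
print(lazy)  # ['first', 'second']
```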
Adding re.I (uppercase I) as a flag makes the match ignore case, so uppercase and lowercase are both matched. Without it, a pattern whose case differs from the text finds nothing and returns an empty list.
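A minimal sketch of the re.I flag, with a made-up string in which no occurrence is all lowercase:

```python
import re

text = "Python and PYTHON"  # hypothetical input

# Case-sensitive by default: the lowercase pattern finds nothing.
case_sensitive = re.findall(r"python", text)
# With re.I the same pattern matches regardless of case.
ignore_case = re.findall(r"python", text, re.I)

print(case_sensitive)  # []
print(ignore_case)     # ['Python', 'PYTHON']
```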
One more thing: \n, commonly known as the newline character. As soon as the text contains a line break, a pattern such as .* stops matching across it, so we add re.S (uppercase S), which makes . match every character, including the newline. Once you have learned the syntax and usage above, you can handle roughly 70% or more of everyday matching tasks. Of course, there are many other methods that I have not listed, and you can learn them on your own (I rarely use the rest).
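The newline problem and the re.S fix might look like this. The input is a made-up two-line string:

```python
import re

text = "<p>line one\nline two</p>"  # hypothetical input with a line break

# By default "." does not match "\n", so the pattern cannot span both lines.
without_flag = re.findall(r"<p>(.*?)</p>", text)
# With re.S, "." matches the newline as well.
with_flag = re.findall(r"<p>(.*?)</p>", text, re.S)

print(without_flag)  # []
print(with_flag)     # ['line one\nline two']
```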
2. Usage of match and search, and their differences
re.match tries to match a pattern from the starting position of the string; if the match does not succeed there, match() returns None. re.search scans the entire string and returns the first successful match. Looking at code makes this easy to understand.
Appending .span() to the match result gives the position of the matched text as a tuple (start position, end position). It is left off the call that finds nothing, because that call returns None and calling .span() on None raises an error.
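A sketch of match versus search with .span(), using an invented sample string:

```python
import re

text = "www.example.com"  # hypothetical input

# "www" is at the very start, so match succeeds and span() gives its position.
start_span = re.match(r"www", text).span()
# "com" is not at the start, so match returns None (no .span() call here,
# because None has no span method and the call would raise AttributeError).
match_com = re.match(r"com", text)
# search scans the whole string and finds "com" at the end.
search_span = re.search(r"com", text).span()

print(start_span)   # (0, 3)
print(match_com)    # None
print(search_span)  # (12, 15)
```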
Clear at a glance, isn't it? match only matches at the beginning and returns None if nothing is found there. I did not add .group() in that case because the return value is None, and calling .group() on it would raise an error. search is not picky: it scans the entire string. You can of course also use the regex syntax above inside the pattern. I will not go into more detail here, so try it hands-on yourself.
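One way to avoid the error mentioned above is to check for None before calling .group(). This is a sketch with an invented sentence, not the code from the original post:

```python
import re

line = "Cats are smarter than dogs"  # hypothetical input

matched = re.match(r"dogs", line)    # None: "dogs" is not at the start
searched = re.search(r"dogs", line)  # a match object: found later in the string

# Only call .group() when a match object was actually returned.
match_text = matched.group() if matched else None
search_text = searched.group() if searched else None

print(match_text)   # None
print(search_text)  # dogs
```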
3. Usage of sub for substitution
sub is used to replace matches in a string. The syntax is generally re.sub(r'regex matching rule', 'replacement string', string_to_search).
The result is very intuitive: the # sign and the text after it are replaced with the string you want.
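A minimal sketch of such a replacement. The phone string and the replacement text are assumptions made for illustration:

```python
import re

phone = "2004-959-559 # this is a phone number"  # hypothetical input

# "#.*$" matches the # sign and everything after it up to the end of the string.
replaced = re.sub(r"#.*$", "(phone number)", phone)
print(replaced)  # 2004-959-559 (phone number)
```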
4. Final bonus
Before the final bonus, I hope everyone will practice the usage and rules above. Only by making plenty of mistakes and summarizing them do you accumulate experience. As the final bonus, here are a few commonly used email matching rules:
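As one illustration, here is a simplified email pattern chosen for this sketch. It is an assumption, not necessarily one of the rules the author had in mind, and it does not try to cover every valid address:

```python
import re

# Simplified, hypothetical pattern: a local part of word characters, dots,
# plus or minus signs, then "@", then a domain with one or more dotted parts.
EMAIL_PATTERN = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"

text = "Contact alice@example.com or bob.smith@mail.example.org today."
emails = re.findall(EMAIL_PATTERN, text)
print(emails)  # ['alice@example.com', 'bob.smith@mail.example.org']
```

The group is written as (?:...) so that findall returns the whole address rather than only the last dotted part.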