Python: Common Regular Expressions
Common Python regular expression methods that cover more than 70% of everyday use
Recently, many of my friends wrote text filtering without regular expressions. It was not that they did not want to use them. (I do not use regular expressions much either. When I wrote crawlers, I used BeautifulSoup to find content directly by web page tag, because that is easy to understand and convenient.) The reason is that regular expressions are hard to use. (Anyone who has read about them knows how many methods there are, each symbol has its own rules, and everything is very flexible.) For people who are not familiar with programming, they can waste a lot of time. So today I will give a brief introduction to regular expressions. Unless your case is unusual, what follows should cover it.
1. Introduction to Regular Expressions
First, import the regular expression module: import re. Regular expressions are a powerful tool for processing strings. They have their own matching engine, which may be less efficient than the built-in str methods, but they are far more flexible and powerful. The workflow is: write a matching rule (the content you want plus regular expression syntax), pass in the string to search, and let the regular expression engine retrieve the information you want.
2. Several common ways to use findall
Basic structure: nojoke = re.findall(r'matching rule', 'string to search'). Here nojoke holds the result that the call returns, re is the regular expression module, and findall means "find all matches". The r prefix marks a raw string, so backslashes in the pattern are not treated as string escapes, and the string that follows it is the regular expression (the prefix also makes patterns easy to spot in long code). Let's look at several examples.
A plain pattern such as bi finds every occurrence of bi in the searched string and returns them as a list. This is often used to count how many times the same substring occurs. On to the next one.
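The original example is not available, so here is a minimal sketch of the idea. The sample text is hypothetical and chosen only for illustration.

```python
import re

# Hypothetical sample text, not taken from the original article.
text = "bit by bit, a big bird"

# findall returns every non-overlapping match as a list of strings.
nojoke = re.findall(r"bi", text)
count = len(nojoke)

print(nojoke)  # ['bi', 'bi', 'bi', 'bi']
print(count)   # 4
```

Taking len() of the returned list is one simple way to count occurrences.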
Adding ^ in front of the pattern, as in ^abi, matches abi only at the start of the string. You can also use this to check whether a string starts with abi.
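A small sketch of the ^ anchor, using made-up sample strings:

```python
import re

# Hypothetical sample strings for illustration.
starts_with = re.findall(r"^abi", "abi likes abi")  # abi is at the start
not_at_start = re.findall(r"^abi", "see abi")       # abi is not at the start

print(starts_with)   # ['abi']
print(not_at_start)  # []

# An empty list is falsy, so the result doubles as a "starts with" check.
print(bool(starts_with), bool(not_at_start))  # True False
```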
The $ symbol after the pattern, as in gbi$, matches gbi only at the end of the string, so it can be used to check whether a string ends with gbi.
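The same idea for the $ anchor, again with hypothetical sample strings:

```python
import re

# Hypothetical sample strings for illustration.
ends_with = re.findall(r"gbi$", "abi cbi gbi")  # gbi is at the end
not_at_end = re.findall(r"gbi$", "gbi abi")     # gbi is not at the end

print(ends_with)   # ['gbi']
print(not_at_end)  # []
```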
Square brackets [...] match any one of the characters listed inside them. For example, [abc]f matches af, bf, or cf, and the matches are returned as a list.
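A minimal sketch of a character set. The pattern [abc]f is an assumption based on the description above, and the sample string is hypothetical.

```python
import re

# Hypothetical sample string for illustration.
sample = "af bf cf df ff"

# [abc] matches exactly one of a, b, or c; the f must follow it.
pairs = re.findall(r"[abc]f", sample)

print(pairs)  # ['af', 'bf', 'cf']
```

Note that df and ff are skipped because d and f are not inside the brackets.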
"\ D" is a regular expression used to match the number between 0 and 9, it should be noted that 11 will be treated as strings '1' and '1', instead of returning the string '11.
The fix is to write one \d for each digit you want. For example, \d\d\d takes three consecutive digits from the string. This is a first taste of how flexible regular expressions are.
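A sketch of matching three digits in a row, with a hypothetical sample string. The {3} form is standard regular expression shorthand that the article does not mention; it is shown only for comparison.

```python
import re

# Hypothetical sample string for illustration.
sample = "a1b22c333"

# Three \d in a row only match where three digits are adjacent.
three_digits = re.findall(r"\d\d\d", sample)

# \d{3} is shorthand for the same pattern.
three_digits_short = re.findall(r"\d{3}", sample)

print(three_digits)        # ['333']
print(three_digits_short)  # ['333']
```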
Lowercase \d matches a digit from 0 to 9, while uppercase \D matches anything that is not a digit, so \D returns everything except the digits.
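A quick side-by-side sketch of \d and \D on the same hypothetical string:

```python
import re

# Hypothetical sample string for illustration.
sample = "a1b2 c3"

only_digits = re.findall(r"\d", sample)  # digits
non_digits = re.findall(r"\D", sample)   # everything else, including the space

print(only_digits)  # ['1', '2', '3']
print(non_digits)   # ['a', 'b', ' ', 'c']
```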
In the regular expression, "\ w" indicates that the match is from a to z, uppercase A to Z, and numbers 0 to 9 contain the preceding three types, as shown in the preceding figure.
"\ W" indicates matching special characters except letters and numbers in the regular expression. However, here, the use of the \ slash should be noted that the character string \ is an escape symbol specific to Baidu.
Parentheses () mark a group: findall returns only the content matched inside the parentheses. The * here is greedy, meaning it matches as much text as it possibly can.
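A sketch of a group with a greedy .* inside it. The HTML snippet is hypothetical, chosen to match the div example the article goes on to describe.

```python
import re

# Hypothetical HTML snippet for illustration.
html = "<div>first</div><div>second</div>"

# Only the text inside ( ) is returned. Greedy .* runs to the LAST </div>.
greedy = re.findall(r"<div>(.*)</div>", html)

print(greedy)  # ['first</div><div>second']
```

The single result swallows the tags in the middle, which is usually not what you want when scraping.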
Adding a question mark gives .*?, which is called non-greedy matching: it matches as little text as possible. The result is that the contents of the two divs are returned as separate items.
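A sketch of the non-greedy version on a hypothetical two-div snippet:

```python
import re

# Hypothetical HTML snippet for illustration.
html = "<div>first</div><div>second</div>"

# Non-greedy .*? stops at the FIRST </div> it can reach.
non_greedy = re.findall(r"<div>(.*?)</div>", html)

print(non_greedy)  # ['first', 'second']
```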
Here re.I (uppercase I) makes the match ignore the case of the letters. Without it, an empty list is returned when the case does not match.
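A short sketch of the re.I flag, using hypothetical sample strings:

```python
import re

# Hypothetical sample string for illustration.
sample = "Python PYTHON python"

case_sensitive = re.findall(r"python", sample)          # exact case only
case_insensitive = re.findall(r"python", sample, re.I)  # any case
no_match = re.findall(r"python", "PYTHON")              # case differs, no flag

print(case_sensitive)    # ['python']
print(case_insensitive)  # ['Python', 'PYTHON', 'python']
print(no_match)          # []
```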
Here \n is the newline character. By default the dot does not match a newline, so a pattern that has to cross a line break finds nothing. Adding re.S (uppercase S) makes the dot match every character, including newlines.

Once you have learned the syntax and usage above, you have more than 70% of the matching methods you are likely to need. There are many more that I will not list here; you can learn them on your own (I rarely use the rest).
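A sketch of the re.S flag on a hypothetical snippet whose content spans two lines:

```python
import re

# Hypothetical snippet with a line break inside the div.
html = "<div>line one\nline two</div>"

# Without re.S the dot stops at the newline, so nothing matches.
without_flag = re.findall(r"<div>(.*?)</div>", html)

# With re.S the dot also matches the newline.
with_flag = re.findall(r"<div>(.*?)</div>", html, re.S)

print(without_flag)  # []
print(with_flag)     # ['line one\nline two']
```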
2. Usage and differences between match and search:
re.match tries to match the pattern at the starting position of the string. If the string does not match there, match() returns None. re.search scans the entire string and returns the first successful match. The difference is easy to see when you run both on the same string.
Adding .span() to the result inside print returns the position of the matched text as a tuple (start position, end position). One of the calls is written without .span() because it returns None, and calling .span() on None raises an AttributeError.
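The original example is not available, so the following is a minimal sketch of the comparison under assumed inputs. The sample string and variable names are hypothetical.

```python
import re

# Hypothetical sample string for illustration.
text = "hello regex world"

at_start = re.match(r"hello", text)      # pattern is at the start
not_at_start = re.match(r"regex", text)  # pattern is not at the start
anywhere = re.search(r"regex", text)     # search scans the whole string

print(at_start.span())   # (0, 5)
print(not_at_start)      # None
print(anywhere.span())   # (6, 11)
print(anywhere.group())  # regex

# Check for None before calling .span() or .group() on a result.
if not_at_start is None:
    print("no match at the start of the string")
```

Testing the result against None first avoids the error described above.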
Is it clear at a glance? match only matches at the beginning of the string, and returns None if nothing matches there. I did not add .group() in that case because the return value is None, and calling .group() on None raises an error. search, by contrast, scans the entire string. Both functions accept the full regular expression syntax in their patterns, but I will not go into that further here and leave it for hands-on practice.
3. Replacing text with sub
sub replaces the matches in a string. The general syntax is re.sub(r'matching rule', 'replacement string', 'string to search').
The result shows the effect directly: the # and everything after it are replaced with the desired string.
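A minimal sketch of such a replacement. The input line and the replacement text are hypothetical, not the article's original data.

```python
import re

# Hypothetical input line for illustration.
line = "x = 1 # set x"

# "#.*" matches the # and every character after it on the line.
cleaned = re.sub(r"#.*", "(comment removed)", line)

print(cleaned)  # x = 1 (comment removed)
```

re.sub returns a new string; the original string is left unchanged.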
4. Final bonus
Before the final bonus, I hope you will practice the usage and rules above. You only build up experience by making plenty of mistakes and summarizing what you learned from them. The final bonus is as follows:
The finishing move is to combine several patterns with | (or), for example to match several different mailbox providers at once. Once you are familiar with the methods above, they will cover more than 70% of the situations you meet. This is only my own understanding, shared as a beginner; if it is not to your taste, feel free to ignore it. Thank you. And the usual closing line: thanks for watching, see you next time!
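The alternation described above can be sketched as follows. The addresses, the provider list, and the exact pattern are assumptions made for illustration, not the article's original code.

```python
import re

# Hypothetical addresses for illustration.
text = "alice@qq.com, bob@163.com, carol@gmail.com, dave@example.org"

# Each side of | is a complete alternative; \. matches a literal dot.
pattern = r"\w+@qq\.com|\w+@163\.com|\w+@gmail\.com"
mailboxes = re.findall(pattern, text)

print(mailboxes)  # ['alice@qq.com', 'bob@163.com', 'carol@gmail.com']
```

The address at example.org is skipped because none of the three alternatives matches it.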