Common regular expression symbols and methods for Python crawlers
Regular expressions are not specific to Python. They are a powerful tool for processing strings, with their own syntax and an independent matching engine. They may be less efficient than the built-in str methods, but they are far more expressive. Because they are a separate notation, the core syntax is the same in every language that provides regular expressions. Languages differ only in which extensions they support, and the unsupported parts are usually the rarely used ones. In Python, regular expressions are provided by the re module.
1. Common symbols
. : matches any single character except the newline \n
* : matches the preceding character zero or more times
? : matches the preceding character zero or one time
.* : greedy match, consumes as many characters as possible
.*? : non-greedy match, consumes as few characters as possible
() : only the data inside the parentheses is returned as the result
2. Common methods
findall: finds every match of the pattern and returns the results as a list
search: finds the first match and returns a match object (or None if nothing matches)
sub: replaces the matched content and returns the string after replacement
3. Examples
(1) Example of . , which matches any character except the newline \n
import re  # import the re module
a = 'xy123'
b = re.findall('x..', a)
print(b)
The printed result is: ['xy1']. Each . is a placeholder for exactly one character.
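(2) Example of * , which matches the preceding character zero or more times
a = 'xy123'
b = re.findall('x*', a)
print(b)
The output is: ['x', '', '', '', '', '']. Because * also accepts zero occurrences, an empty string is matched at every position where there is no x, including the end of the string.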
(3) Example of ? , which matches the preceding character zero or one time
a = 'xy123'
b = re.findall('x?', a)
print(b)
The output is: ['x', '', '', '', '', '']. As with * , every position without an x produces an empty match.
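The empty strings in these results are zero-length matches. A minimal sketch that makes them visible by listing where each match starts and ends (the variable names star_spans and plus_matches are only illustrative), and that shows + as one way to require at least one occurrence:

```python
import re

a = 'xy123'

# Each tuple is (start, end, matched text); an empty match has start == end.
star_spans = [(m.start(), m.end(), m.group()) for m in re.finditer('x*', a)]
print(star_spans)

# '+' means "one or more", so positions without an x produce no match at all.
plus_matches = re.findall('x+', a)
print(plus_matches)
```

If the empty strings are not wanted, switching from * to + is usually simpler than filtering the list afterwards.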
(4) Example of .*
secret_code = 'hadkfalifexxixxfasdjifja134xxlovexx23345sdfxxyouxx8dfse'
b = re.findall('xx.*xx', secret_code)
print(b)
The printed result is: ['xxixxfasdjifja134xxlovexx23345sdfxxyouxx']. The greedy .* runs from the first xx to the last xx in the string.
(5) Example of .*?
secret_code = 'hadkfalifexxixxfasdjifja134xxlovexx23345sdfxxyouxx8dfse'
c = re.findall('xx.*?xx', secret_code)
print(c)
The printed result is: ['xxixx', 'xxlovexx', 'xxyouxx']. The non-greedy .*? stops at the nearest closing xx each time.
(6) Example of ()
secret_code = 'hadkfalifexxixxfasdjifja134xxlovexx23345sdfxxyouxx8dfse'
d = re.findall('xx(.*?)xx', secret_code)
print(d)
The printed result is: ['i', 'love', 'you']. Only the data inside the parentheses is returned.
(7) Example of re.S
s = '''sdfxxhello
xxfsdfxxworldxxasdf'''
d = re.findall('xx(.*?)xx', s, re.S)
print(d)
The printed result is: ['hello\n', 'world']. The re.S flag makes . match the newline \n as well, so a match can span several lines.
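To see what the flag changes, the same pattern can be run with and without re.S. This is a small sketch on the same two-line string, written here with an explicit \n; the names without_flag and with_flag are only illustrative:

```python
import re

# Same text as in example (7), with the line break written as \n.
s = 'sdfxxhello\nxxfsdfxxworldxxasdf'

# Without re.S, '.' cannot cross the newline, so 'hello' is never closed by an xx
# on its own line and the first successful match starts on the second line.
without_flag = re.findall('xx(.*?)xx', s)
print(without_flag)

# With re.S, '.' also matches '\n', so the first group can span both lines.
with_flag = re.findall('xx(.*?)xx', s, re.S)
print(with_flag)
```

re.S is a short alias for re.DOTALL; either spelling can be passed as the flags argument.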
(8) Example of findall
s2 = 'asdfxxixx123xxlovexxdfd'
f2 = re.findall('xx(.*?)xx123xx(.*?)xx', s2)
print(f2[0][1])
The printed result is: love.
Here f2 is a list containing one tuple. The tuple has two elements, which are the contents matched by the two pairs of parentheses. If s2 contained several occurrences of 'xx(.*?)xx123xx(.*?)xx', f2 would contain several tuples.
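The multiple-tuple case can be sketched with a made-up input that contains the pattern twice. The string s3 below is invented purely for illustration:

```python
import re

# Hypothetical input: two occurrences of the pattern, separated by '--'.
s3 = 'xxaxx123xxbxx--xxcxx123xxdxx'

pairs = re.findall('xx(.*?)xx123xx(.*?)xx', s3)
print(pairs)

# Each tuple holds the two groups of one match, so it can be unpacked directly.
for first, second in pairs:
    print(first, second)
```

With a single pair of parentheses findall returns a flat list of strings, and with two or more it returns a list of tuples, one tuple per match.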
(9) Example of search
s2 = 'asdfxxixx123xxlovexxdfd'
f = re.search('xx(.*?)xx123xx(.*?)xx', s2).group(2)
print(f)
The printed result is: love.
.group(2) returns the content matched by the second pair of parentheses. With .group(1), the printed content would be: i
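Calling .group() directly on the result of re.search raises AttributeError when nothing matches, because search then returns None. A minimal defensive sketch follows; the second input string and the names m, miss, first and second are only illustrative:

```python
import re

pattern = 'xx(.*?)xx123xx(.*?)xx'

# A string that matches: check the match object before asking for groups.
m = re.search(pattern, 'asdfxxixx123xxlovexxdfd')
if m:
    first, second = m.group(1), m.group(2)
    print(first, second)

# A string that does not match: search returns None instead of a match object.
miss = re.search(pattern, 'no delimiters here')
print(miss)
```

In a crawler, where page content varies, checking for None before using the groups avoids a crash on pages that lack the expected markup.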
(10) Example of sub
s = '123rrr123'
output = re.sub('123(.*?)123', '123%d123' % 789, s)
print(output)
The output is: 123789123.
%d works like %d in C: it is filled in with the number 789 before sub is called. Writing output = re.sub('123(.*?)123', '123789123', s) gives the same result: 123789123
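In the example above the surrounding 123 markers are typed again in the replacement string. As an alternative sketch, not taken from the original example, the markers can be captured and reused through the \g<1> and \g<2> references that re.sub supports:

```python
import re

s = '123rrr123'

# Capture the two markers and put them back around the new middle part.
# \g<1> and \g<2> refer to the first and second pair of parentheses.
output = re.sub('(123).*?(123)', r'\g<1>789\g<2>', s)
print(output)
```

The \g<1> form is used instead of \1 because \1789 would be read as a reference to group number 1789.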
(11) Example of \d , which matches digits
a = 'asdfasf1234567fasd555fas'
b = re.findall(r'(\d+)', a)
print(b)
The printed result is: ['1234567', '555']. \d+ matches a run of one or more digits.
These are some of the most commonly used regular expression symbols and methods for Python crawlers. I hope they are helpful to Python beginners.