Common Regular Expression Symbols and Methods for Python Crawlers

Source: Internet
Author: User


Regular expressions are not specific to Python. They are a powerful tool for processing strings, with their own syntax and an independent matching engine. They may be slower than the built-in str methods, but they are far more expressive. Because of this independence, the core syntax is the same in every language that provides regular expressions. What differs between languages is how much of the syntax each one supports, and the unsupported parts are usually the rarely used ones.

1. Common Symbols

.: Matches any character except the line break \n

*: Matches the preceding character 0 or more times
?: Matches the preceding character 0 or 1 time

.*: Greedy matching, which matches as many characters as possible

.*?: Non-greedy matching, which matches as few characters as possible

(): The data matched inside the parentheses is returned as the result
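As a quick illustration of the symbols above, here is a minimal sketch that applies each one to a small made-up string. The sample strings and variable names are assumptions chosen only for this illustration.

```python
import re

# Hypothetical sample strings, chosen only for illustration.
text = 'ab12ab'
words = 'a ab abb'

dot = re.findall('a.', text)        # '.' matches any one character except \n
star = re.findall('ab*', words)     # '*' repeats the preceding 'b' 0 or more times
opt = re.findall('ab?', words)      # '?' makes the preceding 'b' optional
greedy = re.findall('a.*b', text)   # '.*' takes as many characters as possible
lazy = re.findall('a.*?b', text)    # '.*?' takes as few characters as possible
group = re.findall('a(.)', text)    # '()' returns only the part inside the parentheses

print(dot)     # ['ab', 'ab']
print(star)    # ['a', 'ab', 'abb']
print(opt)     # ['a', 'ab', 'ab']
print(greedy)  # ['ab12ab']
print(lazy)    # ['ab', 'ab']
print(group)   # ['b', 'b']
```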

2. Common Methods

findall: Finds every match of the pattern and returns them in a list

search: Finds the first match of the pattern and returns a match object (or None if there is no match)

sub: Replaces every match of the pattern and returns the resulting string
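To compare the three methods side by side, the sketch below runs the same pattern through each of them. The input string `line` and the replacement text are made up for illustration.

```python
import re

# Hypothetical input; 'xx' acts as a marker on both sides of each word.
line = 'xxonexx and xxtwoxx'
pattern = 'xx(.*?)xx'

every = re.findall(pattern, line)        # list with the group from every match
first = re.search(pattern, line)         # match object for the first match only
replaced = re.sub(pattern, '[hidden]', line)  # new string, every match replaced

print(every)           # ['one', 'two']
print(first.group(1))  # one
print(replaced)        # [hidden] and [hidden]
```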

3. Examples

(1) Example of ., matching any character except the line break \n

import re  # import the re module
a = 'xy123'
b = re.findall('x..', a)
print(b)

The printed result is: ['xy1']. Each . stands for exactly one character.

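(2) Example of *, matching the preceding character 0 or more times

a = 'xy123'
b = re.findall('x*', a)
print(b)

The output is: ['x', '', '', '', '', '']. Because x* can also match zero characters, every position without an x (including the end of the string) produces an empty string.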

(3) Example of ?, matching the preceding character 0 or 1 time

a = 'xy123'
b = re.findall('x?', a)
print(b)

The output is: ['x', '', '', '', '', '']. As with *, each position where x occurs 0 times produces an empty string.
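Both x* and x? are allowed to match zero characters, which is why findall reports empty strings. As a minimal sketch of the difference, the code below compares them with x+, which requires at least one x. The + quantifier is not in the symbol list above; it appears in this article only in the \d+ example.

```python
import re

a = 'xy123'

# 'x*' and 'x?' may match zero characters, so each position without an 'x'
# (including the end of the string) contributes an empty match.
star = re.findall('x*', a)
optional = re.findall('x?', a)

# 'x+' needs at least one 'x', so it never produces an empty match.
plus = re.findall('x+', a)

print(star)      # ['x', '', '', '', '', '']
print(optional)  # ['x', '', '', '', '', '']
print(plus)      # ['x']
```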

(4) Example of .*

secret_code = 'hadkfalifexxixxfasdjifja134xxlovexx23345sdfxxyouxx8dfse'
b = re.findall('xx.*xx', secret_code)
print(b)

The printed result is: ['xxixxfasdjifja134xxlovexx23345sdfxxyouxx']. The greedy match runs from the first xx to the last xx.

(5) Example of .*?

secret_code = 'hadkfalifexxixxfasdjifja134xxlovexx23345sdfxxyouxx8dfse'
c = re.findall('xx.*?xx', secret_code)
print(c)

The printed result is: ['xxixx', 'xxlovexx', 'xxyouxx']. The non-greedy match stops at the nearest closing xx each time.

(6) Example of ()

secret_code = 'hadkfalifexxixxfasdjifja134xxlovexx23345sdfxxyouxx8dfse'
d = re.findall('xx(.*?)xx', secret_code)
print(d)

The printed result is: ['i', 'love', 'you']. Only the data inside the parentheses is returned.

(7) Example of re.S

s = '''sdfxxhello
xxfsdfxxworldxxasdf'''
d = re.findall('xx(.*?)xx', s, re.S)
print(d)

The printed result is: ['hello\n', 'world']. The role of re.S is to let . match the line break \n as well.
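To make the effect of re.S visible, the sketch below runs the same pattern on the same two-line string with and without the flag. The string is written with an explicit \n here, which is an equivalent way of spelling the triple-quoted value.

```python
import re

s = 'sdfxxhello\nxxfsdfxxworldxxasdf'

# Without re.S, '.' cannot cross the line break, so the match that would
# start at the first 'xx' fails and a different span is found instead.
without_flag = re.findall('xx(.*?)xx', s)

# With re.S, '.' also matches '\n', so the first match spans both lines.
with_flag = re.findall('xx(.*?)xx', s, re.S)

print(without_flag)  # ['fsdf']
print(with_flag)     # ['hello\n', 'world']
```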

(8) Example of findall

s2 = 'asdfxxixx123xxlovexxdfd'
f2 = re.findall('xx(.*?)xx123xx(.*?)xx', s2)
print(f2[0][1])

The printed result is: love

Here f2 is a list containing one tuple, [('i', 'love')]. The two elements of the tuple are the contents matched by the two pairs of parentheses. If s2 contained several matches of 'xx(.*?)xx123xx(.*?)xx', f2 would contain several tuples.
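The case with several tuples can be sketched as follows. The string s3 is a made-up input that contains the two-group pattern twice; it is not from the original example.

```python
import re

# Hypothetical input in which the pattern occurs twice.
s3 = 'xxixx123xxlovexx--xxyouxx123xxtooxx'

pairs = re.findall('xx(.*?)xx123xx(.*?)xx', s3)
print(pairs)  # [('i', 'love'), ('you', 'too')]

# Each tuple holds the contents of the two groups for one match.
for first, second in pairs:
    print(first, second)
```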

(9) Example of search

s2 = 'asdfxxixx123xxlovexxdfd'
f = re.search('xx(.*?)xx123xx(.*?)xx', s2).group(2)
print(f)

The printed result is: love

.group(2) returns the content matched by the second pair of parentheses. With .group(1), the printed content would be: i
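One point worth keeping in mind is that re.search returns None when nothing matches, so calling .group() directly on the result would raise an AttributeError in that case. A minimal sketch of a guarded version follows; the helper name second_group is hypothetical.

```python
import re

def second_group(text):
    """Return the second captured group, or None if the pattern is absent."""
    m = re.search('xx(.*?)xx123xx(.*?)xx', text)
    if m is None:
        return None
    return m.group(2)

print(second_group('asdfxxixx123xxlovexxdfd'))  # love
print(second_group('no markers here'))          # None
```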

(10) Example of sub

s = '123rrr123'
output = re.sub('123(.*?)123', '123%d123' % 789, s)
print(output)

The output is: 123789123

The %d here works like %d in C. Writing output = re.sub('123(.*?)123', '123789123', s) gives the same result: 123789123
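By default, re.sub replaces every match in the string, and its count argument limits how many are replaced. The sketch below shows both on a made-up input that contains two matches.

```python
import re

# Hypothetical input with two spans that match the pattern.
s = '123rrr123 and 123sss123'
replacement = '123%d123' % 789  # builds the string '123789123'

all_replaced = re.sub('123(.*?)123', replacement, s)
first_only = re.sub('123(.*?)123', replacement, s, count=1)

print(all_replaced)  # 123789123 and 123789123
print(first_only)    # 123789123 and 123sss123
```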

(11) Example of \d, which matches digits

a = 'asdfasf1234567fasd555fas'
b = re.findall(r'(\d+)', a)
print(b)

The printed result is: ['1234567', '555']. \d+ matches a run of one or more digits.
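findall always returns strings, so a crawler that needs the numbers themselves usually converts them afterwards. A minimal sketch, assuming the matched digits should become Python integers:

```python
import re

a = 'asdfasf1234567fasd555fas'

# Convert each run of digits found by \d+ into an int.
numbers = [int(n) for n in re.findall(r'\d+', a)]

print(numbers)  # [1234567, 555]
```

The pattern is written as a raw string (r'...') so that the backslash in \d reaches the regular expression engine unchanged.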

These are some of the commonly used regular expression symbols and methods for Python crawlers. I hope they are helpful for Python beginners.
