Using regular expressions to search with the Python crawler package BeautifulSoup

Source: Internet
Author: User
This article describes how to search with regular expressions in the Python crawler package BeautifulSoup. It covers two cases: using regular expressions to match several possible keywords, and finding tags whose attribute values are unknown.

When using Beautiful Soup, you can specify a name and attrs to search for the HTML you want.

However, the name or attribute value in the content being processed can sometimes take many forms. When those values follow a pattern rather than being fixed, they cannot be written as a single fixed value.

Therefore, you can use regular expressions to solve this problem.
For example,

<h1 class="h1user">crifan</h1>

The corresponding BeautifulSoup code is as follows:

h1userSoup = soup.find(name="h1", attrs={"class": "h1user"})
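
As a runnable sketch of this fixed-value lookup, the following assumes the Beautiful Soup 4 package (`bs4`) and Python's built-in `html.parser`. The one-tag document string and the variable names are illustrative stand-ins, not part of the original page.

```python
from bs4 import BeautifulSoup

# Hypothetical one-tag document standing in for the page being scraped.
html = '<h1 class="h1user">crifan</h1>'
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag, or None when nothing matches.
h1_user = soup.find(name="h1", attrs={"class": "h1user"})
h1_text = h1_user.get_text() if h1_user is not None else None

# A class name that does not occur in the document yields None.
missing = soup.find(name="h1", attrs={"class": "nosuchclass"})

print(h1_text)
print(missing)
```

Checking for None before using the result avoids an AttributeError when the page does not contain the expected tag.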

Now suppose the HTML looks like this:

<h1 class="h1user">crifan</h1>
<h1 class="h1user test1">crifan 123</h1>
<h1 class="h1user test2">crifan 456</h1>

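If you want to find every h1 that fits in a single call, the code above is not enough. With an exact string comparison on the class value (the behaviour of the older BeautifulSoup 3 API, where the findAll spelling comes from), it only finds the tag with class="h1user". The remaining two,

class="h1user test1"

and

class="h1user test2"

are not found.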
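
Whether a plain-string search really misses the other two tags depends on the library version. The sketch below assumes Beautiful Soup 4 (`bs4`), which treats `class` as a multi-valued attribute, so a single class name is compared against each class a tag carries. The sample markup and variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Illustrative markup with the three class variants discussed in the text.
html = """
<h1 class="h1user">crifan</h1>
<h1 class="h1user test1">crifan 123</h1>
<h1 class="h1user test2">crifan 456</h1>
"""
soup = BeautifulSoup(html, "html.parser")

# In bs4, a single class name is checked against each class of a tag.
by_one_class = soup.find_all("h1", attrs={"class": "h1user"})

# A string containing a space is compared with the whole attribute value.
by_exact_value = soup.find_all("h1", attrs={"class": "h1user test1"})

one_class_texts = [tag.get_text() for tag in by_one_class]
exact_texts = [tag.get_text() for tag in by_exact_value]
print(one_class_texts)
print(exact_texts)
```

Under that assumption the first search returns all three headings and the second only the `test1` one. A regular expression is still the more general tool when the class names themselves vary.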

In this case, you can use a very powerful BeautifulSoup feature: the values in attrs can be regular expressions.

You can write it as follows:

import re
h1userSoupList = soup.findAll(name="h1", attrs={"class": re.compile(r"h1user(\s\w+)?")})

This finds the tags with each of the following:

class="h1user"
class="h1user test1"
class="h1user test2"
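
Here is a minimal runnable sketch of the regular-expression search. It assumes Beautiful Soup 4 (`bs4`), where the method is spelled `find_all` (`findAll` is the older name). The sample markup, including the extra `other` heading, is made up for illustration.

```python
import re

from bs4 import BeautifulSoup

# Illustrative markup: three headings that should match, one that should not.
html = """
<h1 class="h1user">crifan</h1>
<h1 class="h1user test1">crifan 123</h1>
<h1 class="h1user test2">crifan 456</h1>
<h1 class="other">not matched</h1>
"""
soup = BeautifulSoup(html, "html.parser")

# Same pattern as in the text: "h1user", optionally followed by one more word.
pattern = re.compile(r"h1user(\s\w+)?")
matches = soup.find_all(name="h1", attrs={"class": pattern})

texts = [tag.get_text() for tag in matches]
print(texts)
```

The compiled pattern is applied as a search rather than a full match, so an unanchored pattern like this one also accepts any class value that merely contains h1user. Anchor it with ^ and $ if that is too loose for your pages.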


Another case is a tag such as <p aria-lable="xxx">, where the value xxx is unknown (it varies).

At first it is not obvious how to find such a p tag.
If you write:

soup.findAll("p", attrs={"aria-lable": "xxx"})

then xxx must be spelled out. But if you leave the attribute value out, you cannot use attrs at all, so a tag whose attribute value is unknown cannot be found this way.
So, for a tag such as:

<p aria-lable="xxx">747 scores</p>

You can use:

soup.findAll("p", attrs={"aria-lable": True})

This finds every p tag that has an aria-lable attribute, whatever its value.
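
The following sketch shows the attribute-presence search end to end, assuming Beautiful Soup 4 (`bs4`) and its `find_all` spelling. The attribute name is spelled exactly as in the article; the markup and the attribute values are invented for the example.

```python
from bs4 import BeautifulSoup

# Made-up markup: two p tags carry the attribute, one does not.
html = """
<p aria-lable="first value">747 scores</p>
<p aria-lable="second value">12 scores</p>
<p class="plain">no such attribute</p>
"""
soup = BeautifulSoup(html, "html.parser")

# True matches any tag that has the attribute, regardless of its value.
labelled = soup.find_all("p", attrs={"aria-lable": True})

# Once a tag is found, its previously unknown value can be read by key.
values = [p["aria-lable"] for p in labelled]
print(values)
```

Reading the value by key afterwards is how the unknown part is recovered: the search only needs the attribute name.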

To summarize: when you need BeautifulSoup to find tags whose attribute name is known but whose value is unknown, pass True as the value:

soup.findAll("p", attrs={"aria-lable": True})

This finds the p tags that carry the aria-lable attribute.
