This article describes how to search with regular expressions in the Python crawler package BeautifulSoup. It covers two cases: using a regular expression to match several possible attribute values, and finding tags whose attribute value is unknown.
When using BeautifulSoup, you can specify the corresponding name and attrs to search for the desired html code.
However, the name or attribute value in the content being processed can sometimes take many possible forms. When those forms follow a certain rule, the search value cannot be written as a single fixed string.
In that case, you can use regular expressions to solve the problem.
For example, for this html:
<h1 class="h1user">crifan</h1>
The corresponding BeautifulSoup code is as follows:
h1userSoup = soup.find(name="h1", attrs={"class": "h1user"})
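As a minimal runnable sketch of this fixed-value search, the snippet below assumes the bs4 package (Beautiful Soup 4) is installed and uses the standard-library "html.parser" backend. The html string is a small stand-in for the page being crawled, not the original page.

```python
from bs4 import BeautifulSoup

# Stand-in html for illustration; the real page is not shown in the article.
html = '<h1 class="h1user">crifan</h1>'
soup = BeautifulSoup(html, "html.parser")

# Search by tag name and a fixed class value.
h1userSoup = soup.find(name="h1", attrs={"class": "h1user"})

# find() returns None when nothing matches, so check before using the result.
if h1userSoup is not None:
    print(h1userSoup.get_text())
```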
If the html is like this:
<h1 class="h1user">crifan</h1>
<h1 class="h1user test1">crifan 123</h1>
<h1 class="h1user test2">crifan 456</h1>
If you want to find all the h1 tags that match the condition in one call, the fixed-value search above finds only the single tag with class="h1user". The remaining two,
class="h1user test1"
and
class="h1user test2"
are not found.
In this case, you can use a very powerful feature of BeautifulSoup: regular expressions are supported in attrs.
You can write it as follows:
import re
h1userSoupList = soup.findAll(name="h1", attrs={"class": re.compile(r"h1user(\s\w+)?")})
This finds all three:
class="h1user"
class="h1user test1"
class="h1user test2"
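The regular-expression search can be tried end to end with the sketch below. It assumes Beautiful Soup 4 (the bs4 package), where findAll is spelled find_all, and the "html.parser" backend. The fourth h1 tag with class "other" is a hypothetical addition to show that non-matching tags are skipped.

```python
import re
from bs4 import BeautifulSoup

# Sample html; the class="other" tag is added only for illustration.
html = """
<h1 class="h1user">crifan</h1>
<h1 class="h1user test1">crifan 123</h1>
<h1 class="h1user test2">crifan 456</h1>
<h1 class="other">not matched</h1>
"""
soup = BeautifulSoup(html, "html.parser")

# Match class values that start with "h1user", optionally followed by one more word.
pattern = re.compile(r"h1user(\s\w+)?")
h1userSoupList = soup.find_all(name="h1", attrs={"class": pattern})

# Collect the text of every matched tag.
texts = [tag.get_text() for tag in h1userSoupList]
print(texts)
```

Note that Beautiful Soup 4 treats class as a multi-valued attribute, so there a plain string such as "h1user" may already match all three tags. The regular expression remains useful when the class names follow a pattern rather than sharing one fixed token.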
A second case is a tag such as <p aria-lable="xxx">, where the value xxx is unknown (variable).
If you want to find the corresponding p tag, it is not obvious how to do it.
If it is written as:
soup.findAll("p", attrs={"aria-lable": "xxx"})
then xxx must be written out. If the attribute value is left out, the attribute cannot be placed in attrs, so a tag with an unknown attribute value cannot be found this way.
So, for html such as:
<p aria-lable="xxx">747 scores</p>
You can use:
soup.findAll("p", attrs={"aria-lable": True})
This finds every p tag that has the aria-lable attribute, whatever its value.
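Here is a small sketch of searching by attribute presence, again assuming Beautiful Soup 4 with the "html.parser" backend. The attribute values and the second and third p tags are made up for illustration, since the real values are unknown by design.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: two p tags carry the attribute, one does not.
html = """
<p aria-lable="747 scores">747 scores</p>
<p aria-lable="12 scores">12 scores</p>
<p>no attribute here</p>
"""
soup = BeautifulSoup(html, "html.parser")

# True means "the attribute exists", regardless of its value.
tagged = soup.find_all("p", attrs={"aria-lable": True})

# Read the previously unknown values back from the matched tags.
values = [tag["aria-lable"] for tag in tagged]
print(values)
```

Once the tags are found, the unknown value can be read from each tag with tag["aria-lable"], as the last lines show.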
This answers the question raised above: how to use BeautifulSoup to find tags whose attribute name is known but whose attribute value is unknown.
In this example, you can again use:
soup.findAll("p", attrs={"aria-lable": True})
to find the p tags that contain the aria-lable attribute.