You can refer directly to the BS4 documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/index.zh.html#find-all
Two points deserve attention:
1. Some tag attributes cannot be used as keyword arguments in a search, such as the data-* attributes in HTML5:
data_soup = BeautifulSoup('<div data-foo="value">foo!</div>')
data_soup.find_all(data-foo="value")
# SyntaxError: keyword can't be an expression
However, you can pass a dictionary to the attrs parameter of find_all() to search for tags with such special attributes:
data_soup.find_all(attrs={"data-foo": "value"})
The attribute value to match can be a string, a Boolean (True), or a regular expression.
2. To search by the class attribute, use the class_ keyword, as in class_="...".
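A minimal sketch of both points, assuming a small made-up snippet of markup and the standard-library "html.parser" backend (neither appears in the original notes):

```python
from bs4 import BeautifulSoup

# Illustrative markup, not from the original article.
markup = '<div data-foo="value" class="box note">foo!</div><div class="box">bar</div>'
soup = BeautifulSoup(markup, "html.parser")

# data-foo is not a valid Python keyword argument name, so pass it through attrs.
by_data = soup.find_all(attrs={"data-foo": "value"})

# class is a reserved word in Python, so use class_ instead.
by_class = soup.find_all("div", class_="box")

print(len(by_data), len(by_class))
```

The first search matches only the div that carries data-foo, while the class_ search matches both divs because each has "box" among its classes.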
find_all(name, attrs, recursive, text, **kwargs)
The find_all() method searches all tag descendants of the current tag and checks whether each one matches the filter conditions. Here are a few examples:
soup.find_all("title")
# [<title>The Dormouse's story</title>]

soup.find_all("p", "title")
# [<p class="title"><b>The Dormouse's story</b></p>]

soup.find_all("a")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.find_all(id="link2")
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

import re
soup.find(text=re.compile("sisters"))
# u'Once upon a time there were three little sisters; and their names were\n'
Some of these calls look familiar and some are new. What do the text and id arguments mean? Why does find_all("p", "title") return the <p> tag with the CSS class "title"? Let's take a closer look at the parameters of find_all().
name parameter
The name parameter finds all tags whose name matches name; string objects in the tree are automatically ignored.
The simple usage is as follows:
soup.find_all("title")
# [<title>The Dormouse's story</title>]
To reiterate: the value of the name parameter can be any type of filter: a string, a regular expression, a list, a function, or True.
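The filter types listed above can be sketched as follows. The tiny document and the variable names are assumptions made for illustration, and "html.parser" is just one possible parser choice:

```python
import re
from bs4 import BeautifulSoup

# Illustrative document, not from the original article.
html_doc = ("<html><head><title>T</title></head>"
            "<body><p><b>bold</b> <a href='#'>link</a></p></body></html>")
soup = BeautifulSoup(html_doc, "html.parser")

by_string = soup.find_all("b")                             # exact tag name
by_regex = soup.find_all(re.compile("^b"))                 # names starting with "b"
by_list = soup.find_all(["a", "b"])                        # any name in the list
by_func = soup.find_all(lambda tag: tag.has_attr("href"))  # function returning True/False
all_tags = soup.find_all(True)                             # every tag, but no strings

print([t.name for t in by_regex])  # ['body', 'b']
print([t.name for t in by_list])   # ['b', 'a']
```

Results always come back in document order, which is why the list filter returns <b> before <a> even though "a" is listed first.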
Keyword parameters
If a keyword argument's name is not one of the built-in parameter names of find_all(), it is treated as a filter on the tag attribute of that name. For example, if you pass an argument named id, Beautiful Soup filters on each tag's "id" attribute:
soup.find_all(id='link2')
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
If you pass an href argument, Beautiful Soup filters on each tag's "href" attribute:
soup.find_all(href=re.compile("elsie"))
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]
The value used to filter an attribute can be a string, a regular expression, a list, or True.
The following example finds all tags in the document tree that have an id attribute, regardless of its value:
soup.find_all(id=True)
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
You can filter on several attributes of a tag at once by passing more than one keyword argument:
soup.find_all(href=re.compile("elsie"), id='link1')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]
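The keyword-argument filters above can be tried together in one runnable sketch. The markup is a shortened, slightly altered version of the "three sisters" document (the last link deliberately has no id), so it is an assumption rather than the article's exact data:

```python
import re
from bs4 import BeautifulSoup

# Shortened sample; the third link has no id on purpose.
html_doc = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister">Tillie</a>
</p>
"""
soup = BeautifulSoup(html_doc, "html.parser")

by_id = soup.find_all(id="link2")                   # string filter on id
by_href = soup.find_all(href=re.compile("elsie"))   # regular expression filter on href
with_id = soup.find_all(id=True)                    # any tag that has an id at all
both = soup.find_all(href=re.compile("example"), id="link1")  # both filters must match

print([a.get_text() for a in with_id])  # ['Elsie', 'Lacie']
```

When several keyword arguments are given, a tag has to satisfy all of them, so the last search returns only the Elsie link.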
Some tag attributes cannot be used as keyword arguments in a search, such as the data-* attributes in HTML5:
data_soup = BeautifulSoup('<div data-foo="value">foo!</div>')
data_soup.find_all(data-foo="value")
# SyntaxError: keyword can't be an expression
However, you can pass a dictionary to the attrs parameter of find_all() to search for tags with such special attributes:
data_soup.find_all(attrs={"data-foo": "value"})
# [<div data-foo="value">foo!</div>]
Searching by CSS class
Searching for tags by CSS class name is very useful, but class is a reserved word in Python, so using class as a keyword argument causes a syntax error. Starting with Beautiful Soup 4.1.1, you can search by CSS class name with the class_ parameter:
soup.find_all("a", class_="sister")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
The class_ parameter also accepts different types of filters: a string, a regular expression, a function, or True:
soup.find_all(class_=re.compile("itl"))
# [<p class="title"><b>The Dormouse's story</b></p>]

def has_six_characters(css_class):
    return css_class is not None and len(css_class) == 6

soup.find_all(class_=has_six_characters)
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
A tag's class attribute is a multi-valued attribute. When you search by CSS class name, you can match against any one of the tag's class names:
css_soup = BeautifulSoup('<p class="body strikeout"></p>')
css_soup.find_all("p", class_="strikeout")
# [<p class="body strikeout"></p>]

css_soup.find_all("p", class_="body")
# [<p class="body strikeout"></p>]
You can also search for the exact string value of the class attribute:
css_soup.find_all("p", class_="body strikeout")
# [<p class="body strikeout"></p>]
If the order of the CSS class names in the search string differs from the actual value of class, the exact-match search finds nothing. The class attribute can also be searched through the attrs dictionary:
soup.find_all("a", attrs={"class": "sister"})
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
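The ordering rule is easiest to see side by side. The sketch below reuses the one-tag css_soup example from above; the variable names and the explicit "html.parser" argument are assumptions added for illustration:

```python
from bs4 import BeautifulSoup

css_soup = BeautifulSoup('<p class="body strikeout"></p>', "html.parser")

single = css_soup.find_all("p", class_="strikeout")         # one of the class names
exact = css_soup.find_all("p", class_="body strikeout")     # exact attribute string
swapped = css_soup.find_all("p", class_="strikeout body")   # same names, different order
via_attrs = css_soup.find_all("p", attrs={"class": "body"}) # attrs dictionary form

print(len(single), len(exact), len(swapped), len(via_attrs))  # 1 1 0 1
```

Only the swapped search comes back empty: a multi-word class_ value is compared against the attribute string as written, not as an unordered set of class names.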
text parameter
The text parameter lets you search the strings in a document instead of the tags. Like the name parameter, text accepts a string, a regular expression, a list, a function, or True. For example:
soup.find_all(text="Elsie")
# [u'Elsie']

soup.find_all(text=["Tillie", "Elsie", "Lacie"])
# [u'Elsie', u'Lacie', u'Tillie']

soup.find_all(text=re.compile("Dormouse"))
# [u"The Dormouse's story", u"The Dormouse's story"]

def is_the_only_string_within_a_tag(s):
    """Return True if this string is the only child of its parent tag."""
    return (s == s.parent.string)

soup.find_all(text=is_the_only_string_within_a_tag)
# [u"The Dormouse's story", u"The Dormouse's story", u'Elsie', u'Lacie', u'Tillie', u'...']
Although the text parameter is meant for searching strings, it can be combined with other parameters to filter tags. In that case Beautiful Soup returns the tags whose .string matches the value of text. The following code finds the <a> tags whose content is "Elsie":
soup.find_all("a", text="Elsie")
# [<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>]
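The difference between searching strings and searching tags by their string can be sketched as below. Note one assumption: the example uses the string argument, which is the name Beautiful Soup 4.4.0 and later give to the text parameter described here; on older versions, text would be used instead. The markup is a shortened sample.

```python
import re
from bs4 import BeautifulSoup

# Shortened sample markup, for illustration only.
html_doc = """
<p class="story">
<a href="http://example.com/elsie" id="link1">Elsie</a>,
<a href="http://example.com/lacie" id="link2">Lacie</a>
</p>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# Without a tag name, the result is a list of strings.
strings = soup.find_all(string="Elsie")

# With a tag name, the result is a list of tags whose .string matches.
tags = soup.find_all("a", string="Elsie")
regex_tags = soup.find_all("a", string=re.compile("^L"))

print(strings)                       # ['Elsie']
print([t["id"] for t in tags])       # ['link1']
print([t["id"] for t in regex_tags]) # ['link2']
```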
limit parameter
The find_all() method returns all search results, so the search can be slow when the document tree is large. If you don't need all the results, use the limit parameter to cap the number returned. It works like the LIMIT keyword in SQL: the search stops once the number of results reaches limit.
There are 3 tags in the document tree that match the search criteria, but the results return only 2 because we limit the number of returns:
soup.find_all("a", limit=2)
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
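A small self-contained sketch of limit, using a made-up three-link document. It also shows find(), which this article used earlier and which returns the first match as a single tag rather than a list:

```python
from bs4 import BeautifulSoup

# Illustrative markup with three matching tags.
html_doc = '<a id="link1">Elsie</a><a id="link2">Lacie</a><a id="link3">Tillie</a>'
soup = BeautifulSoup(html_doc, "html.parser")

first_two = soup.find_all("a", limit=2)  # stops after two matches
first = soup.find("a")                   # first match only, as a tag (or None)

print([a["id"] for a in first_two])  # ['link1', 'link2']
print(first["id"])                   # link1
```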
recursive parameter
When you call find_all() on a tag, Beautiful Soup examines all descendants of that tag. If you only want to search the tag's direct children, pass recursive=False.
Consider a simple document in which the <title> tag is nested inside <head>, which is in turn nested inside <html>.
Here are the search results with and without the recursive parameter:
soup.html.find_all("title")
# [<title>The Dormouse's story</title>]

soup.html.find_all("title", recursive=False)
# []
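To make the comparison reproducible, here is a sketch with an explicit document. The markup is an assumption chosen so that <title> is a descendant of <html> but not a direct child:

```python
from bs4 import BeautifulSoup

# Assumed document: <title> sits two levels below <html>.
html_doc = "<html><head><title>The Dormouse's story</title></head><body></body></html>"
soup = BeautifulSoup(html_doc, "html.parser")

deep = soup.html.find_all("title")                     # searches all descendants
direct = soup.html.find_all("title", recursive=False)  # searches direct children only
children = soup.html.find_all(True, recursive=False)   # the direct child tags of <html>

print(len(deep), len(direct))       # 1 0
print([t.name for t in children])   # ['head', 'body']
```

The direct-children search is empty because the only direct children of <html> are <head> and <body>.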
Calling a tag is like calling find_all()
find_all() is almost the most commonly used search method in Beautiful Soup, so a shorthand is defined for it. A BeautifulSoup object or a tag object can be called like a function, and the result is the same as calling find_all() on that object. The following two lines of code are equivalent:
soup.find_all("a")
soup("a")
These two lines of code are also equivalent:
soup.title.find_all(text=True)
soup.title(text=True)
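The equivalence can be checked directly. This sketch uses an assumed two-link document, and it passes string=True, the name used for the text parameter in Beautiful Soup 4.4.0 and later:

```python
from bs4 import BeautifulSoup

# Illustrative document, not from the original article.
html_doc = ("<html><head><title>The Dormouse's story</title></head>"
            "<body><a>Elsie</a><a>Lacie</a></body></html>")
soup = BeautifulSoup(html_doc, "html.parser")

# Calling the object is shorthand for calling its find_all() method.
links_long = soup.find_all("a")
links_short = soup("a")

title_long = soup.title.find_all(string=True)
title_short = soup.title(string=True)

print(links_long == links_short)  # True
print(title_short)                # ["The Dormouse's story"]
```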
Using bs4 (BeautifulSoup4): find_all()