BS4 provides a find_all(name, attrs, recursive, string, **kwargs) method that returns a list of every matching result.
name: a string matched against tag names
attrs: a string matched against tag attribute values. A bare string is compared with the class attribute, while keyword arguments such as id='link1' filter on other attributes.
recursive: whether to search all descendants (default True); with False, only the direct children are searched
string: a string matched against the text in the string area of <>...</>
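The parameters can be combined. A call that uses several of them keeps only the elements that satisfy all of them, and an empty list (not None) comes back when nothing matches. Below is a minimal sketch, assuming BeautifulSoup 4 with Python's built-in html.parser. The HTML snippet is a hypothetical stand-in for the demo page.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for the demo page.
html = '<p class="course"><a id="link1">Basic Python</a> <a id="link2">Advanced Python</a></p>'
soup = BeautifulSoup(html, "html.parser")

# Every filter must hold at once: tag name, attribute value, and text.
hits = soup.find_all("a", attrs={"id": "link1"}, recursive=True, string="Basic Python")
# Conflicting filters (id of one link, text of the other) match nothing.
conflict = soup.find_all("a", attrs={"id": "link1"}, string="Advanced Python")
# No match still yields a list, just an empty one.
misses = soup.find_all("table")

print([a["id"] for a in hits])     # ['link1']
print(len(conflict), len(misses))  # 0 0
print(isinstance(misses, list))    # True
```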
To illustrate:
name
soup.find_all('a')               # returns all <a> tags
soup.find_all(['a', 'b'])        # returns all <a> and <b> tags
for tag in soup.find_all(True):  # True matches every tag in the document
    print(tag.name)
# prints, one per line: html head title body p b p a a

# Regular expressions: to get only the tags whose names start with "b",
# use the re module; BS4 applies the pattern with .search(), so anchor it with ^
import re
for tag in soup.find_all(re.compile('^b')):
    print(tag.name)
# prints: body b
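BS4 tests a compiled pattern with .search(), so re.compile('b') matches any tag name that contains "b", and not only names that start with it. The sketch below uses a hypothetical table fragment to make the difference visible. It assumes BeautifulSoup 4 and html.parser.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical document: <table> and <tbody> both contain the letter "b".
html = "<body><table><tbody><tr><td><b>bold</b></td></tr></tbody></table></body>"
soup = BeautifulSoup(html, "html.parser")

# Unanchored pattern: matches any tag name containing "b".
contains_b = [tag.name for tag in soup.find_all(re.compile("b"))]
# Anchored with ^: matches only tag names that start with "b".
starts_with_b = [tag.name for tag in soup.find_all(re.compile("^b"))]

print(contains_b)     # ['body', 'table', 'tbody', 'b']
print(starts_with_b)  # ['body', 'b']
```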
attrs
soup.find_all('p', 'course')   # <p> tags whose class attribute is "course"
soup.find_all(id='link1')
# [<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a>]
soup.find_all(id='link')       # [] : attribute values must match exactly
import re
soup.find_all(id=re.compile('link'))   # regular expression: every tag whose id contains "link"
# [<a class="py1" href="http://www.icourse163.org/course/BIT-268001" id="link1">Basic Python</a>,
#  <a class="py2" href="http://www.icourse163.org/course/BIT-1001870001" id="link2">Advanced Python</a>]
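A bare string in the attrs position is matched against each of a tag's CSS classes, so class="course intro" still matches 'course'. Attribute names with hyphens, such as data-* attributes, cannot be written as Python keyword arguments, so they go in an explicit attrs dictionary instead. This is a minimal sketch. The class names and the data-level attribute are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet; class and data-* values are illustrative only.
html = """
<p class="course intro">Intro</p>
<p class="title">Title</p>
<a id="link1" data-level="basic">Basic</a>
<a id="link2" data-level="advanced">Advanced</a>
"""
soup = BeautifulSoup(html, "html.parser")

# String in the attrs position: compared with every class the tag carries.
by_class = soup.find_all("p", "course")
# Keyword argument: exact match on another attribute.
by_id = soup.find_all(id="link1")
# Hyphenated attribute names need an attrs dictionary.
by_data = soup.find_all(attrs={"data-level": "advanced"})

print([p.get_text() for p in by_class])  # ['Intro']
print([a["id"] for a in by_id])          # ['link1']
print([a.get_text() for a in by_data])   # ['Advanced']
```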
recursive
soup.find_all('a', recursive=False)   # [] : none of the soup object's direct children is an <a> tag
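The empty result arises because the search stops at the direct children of the object it is called on. Called from a tag that directly contains the links, the same non-recursive search succeeds. The sketch below assumes BeautifulSoup 4 with html.parser and a hypothetical one-link page.

```python
from bs4 import BeautifulSoup

html = '<html><body><p class="course"><a id="link1">Basic Python</a></p></body></html>'
soup = BeautifulSoup(html, "html.parser")

# The soup's only direct child is <html>, so no <a> is found at that level.
top_level = soup.find_all("a", recursive=False)
# Default recursive=True walks every descendant.
everywhere = soup.find_all("a")
# <p> directly contains the link, so a non-recursive search from it succeeds.
from_p = soup.p.find_all("a", recursive=False)

print(len(top_level), len(everywhere), len(from_p))  # 0 1 1
```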
string
soup.find_all(string='Basic Python')
# ['Basic Python']
import re
soup.find_all(string=re.compile('python'))   # every string in the document that contains "python"
# ['This is a python demo page', 'The demo python introduces several python courses.']
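On its own, string returns the matching text nodes and not their tags. The regular-expression match is case-sensitive, which is why "Basic Python" is absent from the result above. Combined with a tag name, the same filter returns the tags whose text matches. Below is a minimal sketch, assuming BeautifulSoup 4 and a hypothetical reduced page.

```python
import re
from bs4 import BeautifulSoup

html = ('<title>This is a python demo page</title>'
        '<a id="link1">Basic Python</a> <a id="link2">Advanced Python</a>')
soup = BeautifulSoup(html, "html.parser")

# string= alone yields text nodes; convert them to plain str for display.
texts = [str(s) for s in soup.find_all(string=re.compile("python"))]
# re.IGNORECASE also catches "Python".
any_case = soup.find_all(string=re.compile("python", re.IGNORECASE))
# With a tag name, the tags whose .string matches are returned instead.
links = soup.find_all("a", string="Basic Python")

print(texts)                    # ['This is a python demo page']
print(len(any_case))            # 3
print([a["id"] for a in links]) # ['link1']
```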
In addition, BS4 offers a shorthand:
<tag>(..) is equivalent to <tag>.find_all(..)
soup(..) is equivalent to soup.find_all(..)
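This is a quick check of the shorthand: calling the soup object or any tag passes the arguments straight to find_all(). It assumes BeautifulSoup 4 and a hypothetical two-link snippet.

```python
from bs4 import BeautifulSoup

html = '<p class="course"><a id="link1">Basic</a> <a id="link2">Advanced</a></p>'
soup = BeautifulSoup(html, "html.parser")

# Calling the soup or a tag is the same as calling .find_all() on it.
same_on_soup = soup("a") == soup.find_all("a")
same_on_tag = soup.p("a", id="link2") == soup.p.find_all("a", id="link2")

print(same_on_soup, same_on_tag)      # True True
print([a["id"] for a in soup("a")])   # ['link1', 'link2']
```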
Extended search methods (variants of find_all())
| Method | Description |
|---|---|
| <>.find() | Searches and returns only the first match (a Tag, or None if nothing matches); same parameters as find_all() |
| <>.find_parents() | Searches the ancestor nodes; returns a list; same filter parameters as find_all() |
| <>.find_parent() | Returns one result from the ancestor nodes; same filter parameters |
| <>.find_next_siblings() | Searches the following sibling nodes; returns a list; same filter parameters |
| <>.find_next_sibling() | Returns one result from the following sibling nodes; same filter parameters |
| <>.find_previous_siblings() | Searches the preceding sibling nodes; returns a list; same filter parameters |
| <>.find_previous_sibling() | Returns one result from the preceding sibling nodes; same filter parameters |
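The sketch below exercises several methods from the table on one small document. The plural forms return lists. The singular forms return a single Tag or None. It assumes BeautifulSoup 4 with html.parser, and the markup is a hypothetical reduction of the demo page.

```python
from bs4 import BeautifulSoup

html = ('<body><p class="course">'
        '<a id="link1">Basic Python</a> and <a id="link2">Advanced Python</a>'
        '</p></body>')
soup = BeautifulSoup(html, "html.parser")

# .find(): first matching Tag, or None when nothing matches.
first = soup.find("a")
missing = soup.find("table")

# Ancestors: plural returns a list (nearest first), singular the nearest match.
ancestor_names = [t.name for t in first.find_parents(["p", "body"])]
parent_class = first.find_parent("p")["class"]

# Siblings at the same level, searched forward and backward.
second = first.find_next_sibling("a")
back = second.find_previous_sibling("a")
later_links = first.find_next_siblings("a")

print(first["id"], missing)             # link1 None
print(ancestor_names, parent_class)     # ['p', 'body'] ['course']
print(second["id"], back["id"])         # link2 link1
print(len(later_links))                 # 1
```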
Crawler-An HTML content lookup method based on BS4 library