Tag. The returned result is also the first matching element.
```python
tag_li = soup.find('li')
# Equivalent: tag_li = soup.find(name="li")
print(type(tag_li))
print(tag_li.p.string)
```
Search text
To search by text content alone, pass only the string argument (text is an older name for it that Beautiful Soup still accepts):
```python
search_for_text = soup.find(string='plants')
print(type(search_for_text))
```
The returned result is a NavigableString object rather than a Tag.
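A self-contained sketch contrasting a tag-name search with a text search, assuming the bs4 package is installed. The markup is hypothetical, modeled on the food-chain example, and uses Python's built-in html.parser so lxml is not required. It also shows that a plain string must equal the whole text exactly.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical markup modeled on the document's food-chain example
html = """
<ul id="producers">
  <li class="producerlist"><p>plants</p></li>
  <li class="producerlist"><p>algae</p></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")  # built-in parser, no lxml needed

first_li = soup.find("li")            # first <li> tag in document order
plants = soup.find(string="plants")   # text that equals "plants" exactly
missing = soup.find(string="plant")   # no text equals "plant", so None

print(type(first_li).__name__, first_li.p.string)
print(type(plants).__name__, plants)
print(missing)
```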
Search by regular expression
Consider the following HTML text, which contains several email addresses:
The below HTML has the information that has email ids.
abc@example.com xyz@example.com
foo@example.com
The address abc@example.com is not enclosed in any tag, so it cannot be found by searching for a tag. In this case, we can match it with a regular expression instead.
```python
import re
from bs4 import BeautifulSoup

email_id_example = """ The below HTML has the information that has email ids.
abc@example.com xyz@example.com
foo@example.com """
email_soup = BeautifulSoup(email_id_example, 'lxml')
print(email_soup)
# Raw string, so the backslashes reach the regex engine unchanged
emailid_regexp = re.compile(r"\w+@\w+\.\w+")
first_email_id = email_soup.find(string=emailid_regexp)
print(first_email_id)
```
When matching with a regular expression, find() returns only the first string that contains a match, even if several strings match. The result is the whole NavigableString, not just the matched substring.
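A minimal sketch of the same idea on hypothetical markup where one address is bare text and the others sit inside tags, assuming bs4 is installed and using html.parser. It contrasts find(), which stops at the first match, with find_all(), which collects every matching string.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: abc@example.com is outside any tag
html = ("<div>Contacts:</div>abc@example.com"
        "<div>xyz@example.com</div><span>foo@example.com</span>")
soup = BeautifulSoup(html, "html.parser")

email_re = re.compile(r"\w+@\w+\.\w+")
first = soup.find(string=email_re)       # first matching string only
every = soup.find_all(string=email_re)   # every matching string

print(first)
print([str(s) for s in every])
```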
Search by tag attribute value
You can search by tag attribute values:
```python
search_for_attribute = soup.find(id='primaryconsumers')
print(search_for_attribute.li.p.string)
```
Searching by attribute value works for most attributes, such as id, style, and title.
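A short sketch of attribute searches on hypothetical markup, assuming bs4 is installed and using html.parser. Besides matching a specific value, passing True matches any tag that has the attribute at all.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two lists that carry both id and title attributes
html = """
<ul id="producers" title="Producers"><li><p>plants</p></li></ul>
<ul id="primaryconsumers" title="Primary consumers"><li><p>deer</p></li></ul>
"""
soup = BeautifulSoup(html, "html.parser")

by_id = soup.find(id="primaryconsumers")   # keyword argument = attribute filter
by_title = soup.find(title="Producers")    # works the same way for title
has_id = soup.find_all(id=True)            # True matches any tag that has an id

print(by_id.li.p.string)
print(by_title["id"])
print(len(has_id))
```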
However, two cases work differently:
Custom attributes
Class attributes
For these, we cannot pass the attribute name directly as a keyword argument to find(); instead, we pass it through the attrs parameter.
Search by custom attributes
HTML5 lets you add custom attributes to tags, such as data-custom. If we try to search for such an attribute the way we searched by id, Python raises an error, because a keyword argument name cannot contain the - character:
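```python
from bs4 import BeautifulSoup

customattr = """<p data-custom="custom">custom attribute example</p>"""
customsoup = BeautifulSoup(customattr, 'lxml')
# The next line is not valid Python, so it is commented out:
# customsoup.find(data-custom="custom")
# SyntaxError (Python 2 reports: keyword can't be an expression)
```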
Instead, pass a dictionary to the attrs parameter:
```python
using_attrs = customsoup.find(attrs={'data-custom': 'custom'})
print(using_attrs)
```
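A self-contained sketch of the attrs dictionary, assuming bs4 is installed and using html.parser on hypothetical markup. As with keyword filters, a value of True inside attrs matches any tag that carries the attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with HTML5 data-* attributes
html = """
<p data-custom="custom">custom attribute example</p>
<p data-custom="other">another paragraph</p>
<p>no custom attribute</p>
"""
soup = BeautifulSoup(html, "html.parser")

# data-custom is not a valid Python identifier, so it goes through attrs
match = soup.find(attrs={"data-custom": "custom"})
# True matches any tag that has the attribute, whatever its value
with_attr = soup.find_all(attrs={"data-custom": True})
no_match = soup.find(attrs={"data-custom": "missing"})

print(match.string)
print(len(with_attr))
print(no_match)
```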
Search based on classes in CSS
Because class is a reserved keyword in Python, it cannot be passed as a keyword argument. Instead, we can search for it as we would a custom attribute, by passing a dictionary to the attrs parameter.
Alternatively, use the class_ keyword argument: the trailing underscore keeps it distinct from the class keyword, so no error occurs.
```python
css_class = soup.find(attrs={'class': 'producerlist'})
css_class2 = soup.find(class_="producerlist")
print(css_class)
print(css_class2)
```
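Because class can hold several space-separated values, a class search matches a tag if any one of its classes matches. A small sketch, assuming bs4 is installed and using html.parser on hypothetical markup; it also shows that the full attribute string can be matched exactly.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second item carries two classes
html = """
<li class="producerlist">plants</li>
<li class="producerlist highlight">algae</li>
<li class="primaryconsumerlist">deer</li>
"""
soup = BeautifulSoup(html, "html.parser")

via_attrs = soup.find_all(attrs={"class": "producerlist"})
via_kwarg = soup.find_all(class_="producerlist")          # same result
exact = soup.find_all(class_="producerlist highlight")    # whole attribute string

print([li.string for li in via_kwarg])
print([li.string for li in exact])
```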
Search with a custom function
You can also pass a function to find(). The function receives each tag as its argument and must return True or False; find() returns the first tag for which it returns True.
```python
def is_producers(tag):
    # True only for tags whose id attribute is "producers"
    return tag.has_attr('id') and tag.get('id') == 'producers'

tag_producers = soup.find(is_producers)
print(tag_producers.li.p.string)
```
The is_producers function checks whether a tag has an id attribute and whether that attribute's value equals producers. If both conditions hold, it returns True; otherwise, it returns False.
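A predicate can also look at a tag's position in the tree. This sketch uses a hypothetical is_producer_item function on hypothetical markup, assuming bs4 is installed and using html.parser; the same function works with both find() and find_all().

```python
from bs4 import BeautifulSoup

html = """
<ul id="producers"><li><p>plants</p></li><li><p>algae</p></li></ul>
<ul id="primaryconsumers"><li><p>deer</p></li></ul>
"""
soup = BeautifulSoup(html, "html.parser")

def is_producer_item(tag):
    # Hypothetical predicate: an <li> whose direct parent has id="producers"
    return tag.name == "li" and tag.parent.get("id") == "producers"

first_item = soup.find(is_producer_item)      # first tag that passes
all_items = soup.find_all(is_producer_item)   # every tag that passes

print(first_item.p.string)
print([li.p.string for li in all_items])
```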
Combining search methods
Beautiful Soup offers various search criteria, and we can combine them in a single call to make a search more precise.
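```python
from bs4 import BeautifulSoup

combine_html = """<p class="identical">Example of p tag with class identical</p>
<p class="identical">Example of p tag with class identical</p>
"""
combine_soup = BeautifulSoup(combine_html, 'lxml')
# Both the tag name and the class must match
identical_p = combine_soup.find("p", class_="identical")
print(identical_p)
```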
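To see how each added criterion narrows the results, here is a sketch on hypothetical markup where the class identical appears on different tag types, assuming bs4 is installed and using html.parser.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class "identical" appears on different tag types
html = """
<p class="identical">Example of p tag with class identical</p>
<div class="identical">Example of div tag with class identical</div>
<p class="identical" id="second">Another p tag with class identical</p>
"""
soup = BeautifulSoup(html, "html.parser")

by_class = soup.find_all(class_="identical")                # p, div, p
p_and_class = soup.find_all("p", class_="identical")        # only the p tags
narrowest = soup.find("p", class_="identical", id="second") # name + class + id

print(len(by_class), len(p_and_class))
print(narrowest.string)
```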
Search with find_all()
The find() method returns only the first match, while the find_all() method returns all matching items.

The filters used with find() also work with find_all(), and in fact with any search method, such as find_parents() and find_next_siblings().
```python
# Search for tags whose class attribute is tertiaryconsumerlist
all_tertiaryconsumers = soup.find_all(class_='tertiaryconsumerlist')
print(type(all_tertiaryconsumers))
for tertiaryconsumers in all_tertiaryconsumers:
    print(tertiaryconsumers.p.string)
```
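The signature of the find_all() method is:

```python
find_all(name, attrs, recursive, string, limit, **kwargs)
```

Its parameters are similar to those of find() (older versions call the string parameter text), with one addition: limit, which caps the number of results returned. find() works like find_all() with limit set to 1, except that it returns the single result (or None) instead of a list.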
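A sketch of the limit and recursive parameters and of how find() relates to find_all(), assuming bs4 is installed and using html.parser on a hypothetical five-item list.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one list with five items
html = "<ul>" + "".join(f"<li>item {i}</li>" for i in range(5)) + "</ul>"
soup = BeautifulSoup(html, "html.parser")

first_two = soup.find_all("li", limit=2)           # stop after two matches
top_level = soup.find_all("li", recursive=False)   # direct children of soup only
direct = soup.ul.find_all("li", recursive=False)   # direct children of <ul>

print([li.string for li in first_two])
print(len(top_level), len(direct))

# find() returns the tag itself (or None); find_all() always returns a list
print(soup.find("li") is soup.find_all("li", limit=1)[0])
print(soup.find("table"), soup.find_all("table"))
```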
We can also pass a list of strings to match any of several tag names, attribute values, custom attribute values, or CSS classes.
```python
# Search for all p and li tags
p_li_tags = soup.find_all(["p", "li"])
print(p_li_tags)
print()
# Search for all tags whose class is producerlist or primaryconsumerlist
all_css_class = soup.find_all(class_=["producerlist", "primaryconsumerlist"])
print(all_css_class)
print()
```
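A compact sketch showing that list filters return matches in document order, assuming bs4 is installed and using html.parser on hypothetical markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the food-chain lists
html = (
    '<li class="producerlist"><p>plants</p></li>'
    '<li class="primaryconsumerlist"><p>deer</p></li>'
    '<li class="tertiaryconsumerlist"><p>lion</p></li>'
)
soup = BeautifulSoup(html, "html.parser")

names = [tag.name for tag in soup.find_all(["p", "li"])]   # document order
picked = soup.find_all(class_=["producerlist", "primaryconsumerlist"])

print(names)
print([li.p.string for li in picked])
```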
Search related tags
Beyond find() and find_all(), we can also search for tags that are related to a tag we have already found.
Search for parent tags
Use the find_parent() or find_parents() method to search for the ancestors of a tag. find_parent() returns the first (nearest) match, while find_parents() returns all matches, just as find() relates to find_all().
```python
# Search for parent tags
primaryconsumers = soup.find_all(class_='primaryconsumerlist')
print(len(primaryconsumers))
# Take the first primary consumer
primaryconsumer = primaryconsumers[0]
# Search for all ul ancestors
parent_ul = primaryconsumer.find_parents('ul')
print(len(parent_ul))
# The result contains the full content of each parent tag
print(parent_ul)
print()
# Take only the nearest parent; if it is a ul, both calls give the same tag
immediateprimary_consumer_parent = primaryconsumer.find_parent()
# immediateprimary_consumer_parent = primaryconsumer.find_parent('ul')
print(immediateprimary_consumer_parent)
```
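A self-contained sketch of parent searches, assuming bs4 is installed and using html.parser on hypothetical markup. It starts from a NavigableString, since text nodes support these methods too, and shows that find_parents() lists the nearest ancestor first.

```python
from bs4 import BeautifulSoup

html = """
<div class="ecopyramid">
  <ul id="primaryconsumers">
    <li class="primaryconsumerlist"><p>deer</p></li>
  </ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

deer = soup.find(string="deer")        # a NavigableString, not a Tag
nearest_ul = deer.find_parent("ul")    # nearest <ul> ancestor
chain = [t.name for t in deer.find_parents(["li", "ul", "div"])]
li = soup.find("li")

print(nearest_ul["id"])
print(chain)                    # nearest ancestor first
print(li.find_parent().name)    # no argument: the immediate parent
```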
Search for sibling tags
Beautiful Soup can also search for sibling tags, that is, tags at the same level of the tree.
The find_next_siblings() method returns all siblings that follow a tag, while find_next_sibling() returns only the first of them.
```python
producers = soup.find(id='producers')
next_siblings = producers.find_next_siblings()
print(next_siblings)
```
Similarly, find_previous_siblings() and find_previous_sibling() search the siblings that come before a tag.
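A sketch covering all four sibling methods, assuming bs4 is installed and using html.parser. The markup is hypothetical (the secondaryconsumers list is invented for illustration); note that the previous-sibling methods return the nearest sibling first.

```python
from bs4 import BeautifulSoup

html = """
<div class="ecopyramid">
  <ul id="producers"><li>plants</li></ul>
  <ul id="primaryconsumers"><li>deer</li></ul>
  <ul id="secondaryconsumers"><li>fox</li></ul>
  <ul id="tertiaryconsumers"><li>lion</li></ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
primary = soup.find(id="primaryconsumers")
last = soup.find(id="tertiaryconsumers")

after = [ul["id"] for ul in primary.find_next_siblings("ul")]
nxt = primary.find_next_sibling("ul")["id"]
before = [ul["id"] for ul in last.find_previous_siblings("ul")]  # nearest first
prev = last.find_previous_sibling("ul")["id"]

print(after, nxt)
print(before, prev)
```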
Search for the next tag
The find_next() method returns the first matching element that appears after the current tag in the document, while find_all_next() returns all such elements.
```python
# Search for all li tags that come after the first p tag
first_p = soup.p
all_li_tags = first_p.find_all_next("li")
print(all_li_tags)
```
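Unlike the sibling methods, find_next() and find_all_next() follow document order, so they include the tag's own descendants and can cross into other branches. A sketch on hypothetical markup, assuming bs4 is installed and using html.parser.

```python
from bs4 import BeautifulSoup

html = """
<ul id="producers"><li><p>plants</p></li></ul>
<ul id="primaryconsumers"><li><p>deer</p></li></ul>
"""
soup = BeautifulSoup(html, "html.parser")

first_li = soup.li
own_child = first_li.find_next("p")    # descendants come next in document order
next_li = first_li.find_next("li")     # crosses into the following <ul>
later_p = [p.string for p in first_li.find_all_next("p")]

print(own_child.string, next_li.p.string)
print(later_p)
```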
Search for the previous tag
Similarly, find_previous() and find_all_previous() search for elements that appear before the current tag in the document.
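A sketch of the backward searches on hypothetical markup, assuming bs4 is installed and using html.parser. Because an ancestor's start tag comes earlier in the document, find_all_previous() also returns the tag's own ancestors, nearest first.

```python
from bs4 import BeautifulSoup

html = """
<ul id="producers"><li><p>plants</p></li></ul>
<ul id="primaryconsumers"><li><p>deer</p></li></ul>
"""
soup = BeautifulSoup(html, "html.parser")

deer_p = soup.find(string="deer").parent     # the <p> that holds "deer"
prev_p = deer_p.find_previous("p")           # nearest earlier <p>
uls = [ul["id"] for ul in deer_p.find_all_previous("ul")]  # includes its own ancestor

print(prev_p.string)
print(uls)
```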
The above describes how to use the Beautiful Soup module in Python to search for content.