Detailed description: How Python uses the BeautifulSoup module to search for content

Last Update:2017-05-14 Source: Internet

Author: User

This article introduces the search methods of the Beautiful Soup module in Python. Passing different types of filter parameters to these methods narrows the results to what you want. Let's take a look.

Preface

We will use the search function of the Beautiful Soup module to search by tag name, tag attribute, document text, and regular expression.

Search methods

Beautiful Soup's built-in search methods are as follows:

find()
find_all()
find_parent()
find_parents()
find_next_sibling()
find_next_siblings()
find_previous_sibling()
find_previous_siblings()
find_previous()
find_all_previous()
find_next()
find_all_next()

Search using the find() method

First, you need to create an HTML file for testing.

The test file, search.html, holds four lists of name and number pairs:

plants 100000, algae 100000
deer 1000, rabbit 2000
fox 100, bear 100
lion 80, tiger 50
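The markup of the test file is not shown, so the following is only a possible reconstruction. The ids `producers` and `primaryconsumers` and the classes `producerlist`, `primaryconsumerlist`, and `tertiaryconsumerlist` come from the searches later in this article; the `ecopyramid`, `name`, `number`, and `secondary*` names are guesses. The sketch uses Python's built-in `html.parser` rather than `lxml` so that it runs without extra dependencies.

```python
from pathlib import Path

from bs4 import BeautifulSoup

# Hypothetical reconstruction: only the ids/classes used by the article's
# searches are grounded; every other name here is an assumption.
SEARCH_HTML = """<html>
<body>
<div class="ecopyramid">
<ul id="producers">
<li class="producerlist"><p class="name">plants</p><p class="number">100000</p></li>
<li class="producerlist"><p class="name">algae</p><p class="number">100000</p></li>
</ul>
<ul id="primaryconsumers">
<li class="primaryconsumerlist"><p class="name">deer</p><p class="number">1000</p></li>
<li class="primaryconsumerlist"><p class="name">rabbit</p><p class="number">2000</p></li>
</ul>
<ul id="secondaryconsumers">
<li class="secondaryconsumerlist"><p class="name">fox</p><p class="number">100</p></li>
<li class="secondaryconsumerlist"><p class="name">bear</p><p class="number">100</p></li>
</ul>
<ul id="tertiaryconsumers">
<li class="tertiaryconsumerlist"><p class="name">lion</p><p class="number">80</p></li>
<li class="tertiaryconsumerlist"><p class="name">tiger</p><p class="number">50</p></li>
</ul>
</div>
</body>
</html>
"""

# Write the file that the examples open as 'search.html'.
Path("search.html").write_text(SEARCH_HTML, encoding="utf-8")

# Parse it to confirm the structure the searches rely on.
soup = BeautifulSoup(SEARCH_HTML, "html.parser")
print(len(soup.find_all("ul")))  # number of lists
print(len(soup.find_all("li")))  # number of entries
```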

We can use the find() method to obtain a <ul> tag. By default, it returns the first <ul> tag that appears in the document.
We then print the text of the first <p> inside its first <li> tag to verify that the first <ul> tag was obtained.
```
from bs4 import BeautifulSoup

with open('search.html', 'r') as filename:
    soup = BeautifulSoup(filename, 'lxml')

# find() returns the first <ul> tag in the document
first_ul_entries = soup.find('ul')
print(first_ul_entries.li.p.string)
```
The signature of the find() method is as follows:
```
find(name,attrs,recursive,text,**kwargs)
```
As shown above, the find() method accepts five parameters: name, attrs, recursive, text, and **kwargs. The name, attrs, and text parameters all act as filters in find(), making the match more precise.
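As a rough illustration of how each parameter narrows a search, the sketch below runs one call per filter on a small excerpt. The excerpt markup is assumed, `html.parser` stands in for `lxml`, and `string` is used as the newer name for the `text` argument (available in Beautiful Soup 4.4.0 and later).

```python
from bs4 import BeautifulSoup

# Hypothetical two-list excerpt of the test document.
html = """
<ul id="producers">
<li class="producerlist"><p class="name">plants</p></li>
</ul>
<ul id="primaryconsumers">
<li class="primaryconsumerlist"><p class="name">deer</p></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

by_name = soup.find("li")                                      # name filter
by_attrs = soup.find(attrs={"class": "primaryconsumerlist"})   # attrs filter
by_string = soup.find(string="deer")                           # text filter
by_kwarg = soup.find(id="primaryconsumers")                    # **kwargs filter

# recursive=False inspects only direct children of the tag being searched.
# The <p> tags are grandchildren of the <ul>, so nothing is found.
direct_p = by_kwarg.find("p", recursive=False)

print(by_name.p.string)
print(by_attrs.p.string)
print(repr(by_string))
print(direct_p)
```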
Search tags
Besides the <ul> tag searched for above, we can search for other tags, such as the <li> tag. The returned result is again the first match.
```
tag_li = soup.find('li')
# tag_li = soup.find(name="li")
print(type(tag_li))
print(tag_li.p.string)
```
Search text
If we only want to search by text content, we can pass only the text parameter:
```
search_for_text = soup.find(text='plants')
print(type(search_for_text))
# <class 'bs4.element.NavigableString'>
```
The returned result is a NavigableString object, not a tag.
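Because a text search yields a string rather than a tag, a common next step is to walk from the string back to the tag that contains it. The following is a minimal sketch on an assumed single-entry excerpt (the `name` and `number` classes are guesses), using `string`, the newer name for `text`.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Hypothetical excerpt: one list entry from the test document.
html = '<li class="producerlist"><p class="name">plants</p><p class="number">100000</p></li>'
soup = BeautifulSoup(html, "html.parser")

found = soup.find(string="plants")
print(isinstance(found, NavigableString))

# .parent is the tag that directly contains the matched string.
enclosing = found.parent
print(enclosing.name, enclosing["class"])

# From the enclosing tag we can reach related data, such as the number.
number = enclosing.find_next_sibling("p").string
print(number)
```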
Search by regular expression
Consider the following HTML text:
```
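<!-- Tag names are assumed; only the text of this sample survived. -->
<div>The below HTML has the information that has email ids.</div>
abc@example.com
<div>xyz@example.com</div>
<span>foo@example.com</span>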
```
The abc@example.com email address is not enclosed in any tag, so it cannot be found by searching for a tag. In this case, we can match it with a regular expression.
```
import re

# Tag names are assumed; only the text of this sample survived.
email_id_example = """<div>The below HTML has the information that has email ids.</div>
abc@example.com <div>xyz@example.com</div>
<span>foo@example.com</span>"""
email_soup = BeautifulSoup(email_id_example, 'lxml')
print(email_soup)
# pattern = r"\w+@\w+\.\w+"
emailid_regexp = re.compile(r"\w+@\w+\.\w+")
first_email_id = email_soup.find(text=emailid_regexp)
print(first_email_id)
```
If several strings match the regular expression, find() returns only the first one.
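To collect every address rather than only the first, one option is to pass the same pattern to find_all(). The sketch below assumes `<div>` and `<span>` wrappers around two of the addresses, since the sample's tags are not shown. Note that a regular-expression search returns whole text nodes, so the pattern is applied again to extract just the address.

```python
import re

from bs4 import BeautifulSoup

# Tag names are assumed; only the text of the sample is known.
html = """<div>The below HTML has the information that has email ids.</div>
abc@example.com
<div>xyz@example.com</div>
<span>foo@example.com</span>"""
soup = BeautifulSoup(html, "html.parser")
email_re = re.compile(r"\w+@\w+\.\w+")

# Every text node that contains a match, in document order.
matching_nodes = soup.find_all(string=email_re)

# A node may carry surrounding whitespace, so pull out the match itself.
emails = [email_re.search(node).group() for node in matching_nodes]
print(emails)
```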
  Search by tag attribute value
  You can search by tag attribute values:
```
search_for_attribute = soup.find(id='primaryconsumers')
print(search_for_attribute.li.p.string)
```
Searching by tag attribute value works for most attributes, such as id, style, and title.
However, two cases are different:
- Custom attributes
- Class attributes
In these cases we can no longer pass the attribute name directly as a keyword argument. Instead, we pass it to find() through the attrs parameter.
Search by custom attributes
HTML5 lets you add custom attributes to tags, for example a data-custom attribute.
As the following code shows, searching the same way as for id raises an error, because a Python keyword argument name cannot contain the - character.
```
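# The <p> tag name is assumed; the attribute comes from the search below.
customattr = """<p data-custom="custom">custom attribute example</p>"""
customsoup = BeautifulSoup(customattr, 'lxml')
customsoup.find(data-custom="custom")
# SyntaxError: keyword can't be an expression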
```
In this case, pass a dictionary to the attrs parameter to search:
```
using_attrs = customsoup.find(attrs={'data-custom': 'custom'})
print(using_attrs)
```
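The attrs dictionary can also test for the mere presence of an attribute by using True as the value. A small sketch follows; the `<p>` tags and the second attribute value are illustrative assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two tags carry data-custom, one does not.
html = (
    '<p data-custom="custom">custom attribute example</p>'
    '<p data-custom="other">second</p>'
    '<p>plain</p>'
)
soup = BeautifulSoup(html, "html.parser")

# Exact value match.
exact = soup.find(attrs={"data-custom": "custom"})
print(exact.string)

# True matches any tag that has the attribute, whatever its value.
any_value = soup.find_all(attrs={"data-custom": True})
print([tag["data-custom"] for tag in any_value])
```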
Search by CSS class
Because class is a reserved keyword in Python, it cannot be passed as a keyword argument. Instead, it can be searched like a custom attribute, by passing a dictionary through the attrs parameter.
Besides attrs, you can use the class_ keyword argument. The trailing underscore distinguishes it from the class keyword, so it causes no error.
```
css_class = soup.find(attrs={'class': 'producerlist'})
css_class2 = soup.find(class_="producerlist")
print(css_class)
print(css_class2)
```
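One detail worth knowing is how class_ behaves when a tag carries several classes. The sketch below uses assumed markup in which one entry has an extra, hypothetical `featured` class. Each individual class matches, and so does the exact attribute string, but the same classes in a different order do not.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the "featured" class is invented for illustration.
html = (
    '<li class="producerlist featured">plants</li>'
    '<li class="producerlist">algae</li>'
)
soup = BeautifulSoup(html, "html.parser")

both = soup.find_all(class_="producerlist")            # matches either entry
featured = soup.find_all(class_="featured")            # matches one class of several
exact = soup.find_all(class_="producerlist featured")  # exact attribute string
swapped = soup.find_all(class_="featured producerlist")  # order differs

print(len(both), len(featured), len(exact), len(swapped))
```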
Search with a custom function
You can pass a function to the find() method, so that the search follows the conditions the function defines.
The function takes a tag as its argument and should return True or False.
```
def is_producers(tag):
    # Match only tags whose id attribute equals 'producers'
    return tag.has_attr('id') and tag.get('id') == 'producers'

tag_producers = soup.find(is_producers)
print(tag_producers.li.p.string)
```
The code defines an is_producers function, which checks whether the tag has an id attribute and whether its value equals producers. If the tag meets the condition, the function returns True; otherwise, it returns False.
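A function filter is most useful for conditions that attribute matching cannot express. As a sketch, the hypothetical `has_large_number` helper below selects list entries whose number exceeds 1000; the markup, including the `name` and `number` classes, is an assumed excerpt of the test document.

```python
from bs4 import BeautifulSoup

# Hypothetical excerpt; the "name" and "number" classes are assumptions.
html = """
<ul id="producers">
<li class="producerlist"><p class="name">plants</p><p class="number">100000</p></li>
</ul>
<ul id="primaryconsumers">
<li class="primaryconsumerlist"><p class="name">deer</p><p class="number">1000</p></li>
<li class="primaryconsumerlist"><p class="name">rabbit</p><p class="number">2000</p></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")


def has_large_number(tag):
    """Return True for <li> tags whose number paragraph exceeds 1000."""
    if tag.name != "li":
        return False
    number = tag.find("p", class_="number")
    return number is not None and int(number.string) > 1000


large = [li.p.string for li in soup.find_all(has_large_number)]
print(large)
```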
Combining search filters
Beautiful Soup provides various search filters, and we can combine them in one call to make the search more precise.
```
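# Markup is assumed: a <p> and a <div> that share the class "identical".
combine_html = """
<p class="identical">Example of p tag with class identical</p>
<div class="identical">Example of div tag with class identical</div>
"""
combine_soup = BeautifulSoup(combine_html, 'lxml')
identical_p = combine_soup.find("p", class_="identical")
print(identical_p)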
```
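Filters can be stacked further. The following sketch combines a tag name, a class, and an exact text value in one call, so a tag is returned only if all three conditions hold. The markup is an assumed excerpt, and `string` is the newer name for `text`.

```python
from bs4 import BeautifulSoup

# Hypothetical excerpt; class and id names other than
# "tertiaryconsumerlist" are assumptions.
html = """
<ul id="secondaryconsumers">
<li class="secondaryconsumerlist"><p class="name">fox</p><p class="number">100</p></li>
<li class="secondaryconsumerlist"><p class="name">bear</p><p class="number">100</p></li>
</ul>
<ul id="tertiaryconsumers">
<li class="tertiaryconsumerlist"><p class="name">lion</p><p class="number">80</p></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# Tag name, class, and exact text must all match.
hundreds = soup.find_all("p", class_="number", string="100")

# Step back to the name paragraph that precedes each matching number.
names = [p.find_previous_sibling("p").string for p in hundreds]
print(names)
```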
Search using the find_all() method
The find() method returns only the first match from the search results, whereas the find_all() method returns all matching items.
The filters used with find() can also be used with find_all(). In fact, they can be used with any search method, for example find_parents() and find_next_siblings().
```
# Search for all tags whose class attribute is tertiaryconsumerlist
all_tertiaryconsumers = soup.find_all(class_='tertiaryconsumerlist')
print(type(all_tertiaryconsumers))
for tertiaryconsumers in all_tertiaryconsumers:
    print(tertiaryconsumers.p.string)
```
The signature of the find_all() method is:
```
find_all(name,attrs,recursive,text,limit,**kwargs)
```
Its parameters are the same as those of find(), with one addition: limit. The limit parameter caps the number of results returned; find() behaves like find_all() with limit set to 1.
We can also pass a list of strings as a filter when searching for tags, tag attribute values, custom attribute values, and CSS classes. A tag matches if it matches any item in the list.
```
# Search for all p and li tags
p_li_tags = soup.find_all(["p", "li"])
print(p_li_tags)
print()
# Search for all tags whose class is producerlist or primaryconsumerlist
all_css_class = soup.find_all(class_=["producerlist", "primaryconsumerlist"])
print(all_css_class)
print()
```
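The effect of limit, and the difference in what find() and find_all() return when nothing matches, can be seen in a short sketch on an assumed three-item list:

```python
from bs4 import BeautifulSoup

# Hypothetical minimal list for illustration.
html = """<ul>
<li>plants</li>
<li>algae</li>
<li>deer</li>
</ul>"""
soup = BeautifulSoup(html, "html.parser")

# limit stops the search once that many matches have been collected.
first_two = soup.find_all("li", limit=2)
print([li.string for li in first_two])

# find() returns the same tag as the single result of limit=1.
same_tag = soup.find_all("li", limit=1)[0] == soup.find("li")
print(same_tag)

# With no match, find() gives None while find_all() gives an empty list.
missing_one = soup.find("table")
missing_all = soup.find_all("table")
print(missing_one, list(missing_all))
```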
Search related tags
Generally, we use find() and find_all() to search for the tags we want. We can also search for tags related to those tags.
Search for parent tags
Use the find_parent() or find_parents() method to search for the parent tags of a tag.
find_parent() returns the first match, while find_parents() returns all matches, mirroring the relationship between find() and find_all().
```
# Search for parent tags
primaryconsumers = soup.find_all(class_='primaryconsumerlist')
print(len(primaryconsumers))
# Take the first primary consumer
primaryconsumer = primaryconsumers[0]
# Search for all ul parent tags
parent_ul = primaryconsumer.find_parents('ul')
print(len(parent_ul))
# The result contains the full content of each parent tag
print(parent_ul)
print()
# Search for the first parent tag; either of the two calls works
immediateprimary_consumer_parent = primaryconsumer.find_parent()
# immediateprimary_consumer_parent = primaryconsumer.find_parent('ul')
print(immediateprimary_consumer_parent)
```
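The order in which ancestors are returned is easier to see when starting from a deeply nested tag. The sketch below starts at a `<p>` tag in an assumed excerpt (the `div` wrapper and its class are guesses) and walks upward; the nearest ancestor comes first.

```python
from bs4 import BeautifulSoup

# Hypothetical excerpt; the wrapping <div class="ecopyramid"> is an assumption.
html = """
<div class="ecopyramid">
<ul id="primaryconsumers">
<li class="primaryconsumerlist"><p class="name">deer</p></li>
</ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
start = soup.find("p", string="deer")

# With no filter, find_parent() returns the immediate parent.
nearest = start.find_parent()
print(nearest.name)

# With a filter, it returns the closest ancestor that matches.
nearest_ul = start.find_parent("ul")
print(nearest_ul["id"])

# find_parents() returns every matching ancestor, nearest first.
ancestors = start.find_parents(["ul", "div"])
print([tag.name for tag in ancestors])
```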
Search for sibling tags
Beautiful Soup also provides the ability to search for sibling tags, that is, tags at the same level.
find_next_siblings() returns all the following tags at the same level, while find_next_sibling() returns only the next one.
```
producers = soup.find(id='producers')
next_siblings = producers.find_next_siblings()
print(next_siblings)
```
You can also use the find_previous_siblings() and find_previous_sibling() methods to search for the preceding tags at the same level.
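A sketch of both directions, starting from the second of four lists. The ids `secondaryconsumers` and `tertiaryconsumers` and the wrapping `div` are assumptions. Previous siblings are returned nearest first, so they appear in reverse document order.

```python
from bs4 import BeautifulSoup

# Hypothetical skeleton of the test document: four sibling lists.
html = """<div>
<ul id="producers"></ul>
<ul id="primaryconsumers"></ul>
<ul id="secondaryconsumers"></ul>
<ul id="tertiaryconsumers"></ul>
</div>"""
soup = BeautifulSoup(html, "html.parser")
start = soup.find(id="primaryconsumers")

next_one = start.find_next_sibling("ul")
next_all = start.find_next_siblings("ul")
prev_one = start.find_previous_sibling("ul")

last = soup.find(id="tertiaryconsumers")
prev_all = last.find_previous_siblings("ul")

print(next_one["id"])
print([ul["id"] for ul in next_all])
print(prev_one["id"])
print([ul["id"] for ul in prev_all])
```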
Search for the next tag
find_next() returns the first matching tag that appears after the current tag in the document, and find_all_next() returns all matching tags that appear after it.
```
# Search for the li tags that follow the first p tag
first_p = soup.p
all_li_tags = first_p.find_all_next("li")
print(all_li_tags)
```
Search for the previous tag
Similarly, use the find_previous() and find_all_previous() methods to search for matching tags that appear before the current tag in the document.
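These methods follow document order rather than nesting level, which has one consequence that is easy to miss: a tag's own ancestors open before it, so a backward search can return them. A minimal sketch on an assumed excerpt (the `name` and `number` classes are guesses):

```python
from bs4 import BeautifulSoup

# Hypothetical excerpt of the test document.
html = """<ul id="primaryconsumers">
<li class="primaryconsumerlist"><p class="name">deer</p><p class="number">1000</p></li>
<li class="primaryconsumerlist"><p class="name">rabbit</p><p class="number">2000</p></li>
</ul>"""
soup = BeautifulSoup(html, "html.parser")
start = soup.find("p", string="rabbit")

# Forward: the next <p> in the document is rabbit's number.
next_p = start.find_next("p")
print(next_p.string)

# Backward: results come nearest first, so in reverse document order.
previous_p = [p.string for p in start.find_all_previous("p")]
print(previous_p)

# The enclosing <li> opens before the start tag, so it is found as well.
previous_li = [li.p.string for li in start.find_all_previous("li")]
print(previous_li)
```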
The above is a detailed description of how Python uses the Beautiful Soup module to search for content.

This article is an English version of an article originally published in Chinese on aliyun.com.