BeautifulSoup CSS selectors: advanced usage

Source: Internet
Author: User

BeautifulSoup supports most commonly used CSS selectors. Pass a selector string to the .select() method of a Tag object or of the BeautifulSoup object itself.
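As a minimal sketch of the difference between the two call sites, the snippet below runs the same selector on the whole document and on a single tag. The two-list markup and the variable names are hypothetical and used only for illustration; the built-in "html.parser" backend is assumed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for this illustration.
html = """
<ul id="fruits"><li>apple</li><li>pear</li></ul>
<ul id="colors"><li>red</li></ul>
"""
soup = BeautifulSoup(html, "html.parser")

# Called on the BeautifulSoup object, select() searches the whole document.
all_items = soup.select("li")

# Called on a Tag, select() searches only inside that tag.
fruits = soup.find(id="fruits")
fruit_items = fruits.select("li")

print([li.get_text() for li in all_items])    # ['apple', 'pear', 'red']
print([li.get_text() for li in fruit_items])  # ['apple', 'pear']
```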

The HTML used in this article is:

from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

# Parse the document; every example that follows uses this soup object.
soup = BeautifulSoup(html_doc, "html.parser")

For example, you can search for tags like this:

soup.select("title")  # use the select() method
# [<title>The Dormouse's story</title>]

soup.select("p:nth-of-type(3)")
# [<p class="story">...</p>]
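The :nth-of-type() pseudo-class counts only siblings with the same tag name, which is easy to confuse with :nth-child(). The following sketch contrasts the two on a small hypothetical fragment (the markup and variable names are not from the original article).

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: a <span> sits between the first and second <p>.
html = "<div><p>first</p><span>x</span><p>second</p><p>third</p></div>"
soup = BeautifulSoup(html, "html.parser")

# Second <p> among its siblings; the <span> is not counted.
second_p = soup.select("p:nth-of-type(2)")

# A <p> that is the second child of its parent; here the second child is the <span>.
second_child_p = soup.select("p:nth-child(2)")

# A <p> that is the third child of its parent.
third_child_p = soup.select("p:nth-child(3)")

print([p.get_text() for p in second_p])        # ['second']
print(second_child_p)                          # []
print([p.get_text() for p in third_child_p])   # ['second']
```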

You can also find tags nested inside other tags at any depth (the descendant relationship):

soup.select("body a")  # find <a> tags anywhere inside the <body> tag
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.select("html head title")  # find <title> tags inside <head>, inside <html>
# [<title>The Dormouse's story</title>]

You can find tags that are direct children of other tags:

soup.select("head > title")
# [<title>The Dormouse's story</title>]

soup.select("p > a")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.select("p > a:nth-of-type(2)")
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

soup.select("p > #link1")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

soup.select("body > a")  # no <a> is a direct child of <body>
# []
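The difference between the descendant combinator (a space) and the child combinator (>) is easiest to see when one tag is nested more deeply than another. The sketch below uses a small hypothetical fragment with made-up ids.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one link is a direct child of <div>, the other is nested in <p>.
html = '<div><a id="direct">one</a><p><a id="nested">two</a></p></div>'
soup = BeautifulSoup(html, "html.parser")

descendants = soup.select("div a")   # any depth below <div>
children = soup.select("div > a")    # direct children of <div> only

print([a["id"] for a in descendants])  # ['direct', 'nested']
print([a["id"] for a in children])     # ['direct']
```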

Find the siblings of a tag:

soup.select("#link1 ~ .sister")  # every later sibling of the tag with id link1 that has class sister
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.select("#link1 + .sister")  # only the immediately following sibling, if it has class sister
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
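Two details of the sibling combinators are worth checking: both look only forward in the document, and + requires the very next element to match. The following sketch assumes a hypothetical fragment in which an unrelated <b> tag sits right after the first link.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a <b> tag separates the first link from the others.
html = (
    '<p><a id="a1" class="x">1</a><b>skip</b>'
    '<a id="a2" class="x">2</a><a id="a3" class="x">3</a></p>'
)
soup = BeautifulSoup(html, "html.parser")

later_siblings = soup.select("#a1 ~ .x")   # all following siblings with class x
adjacent_to_a1 = soup.select("#a1 + .x")   # next element is <b>, so nothing matches
adjacent_to_a2 = soup.select("#a2 + .x")   # next element is a3, which has class x
after_last = soup.select("#a3 ~ .x")       # earlier siblings are never matched

print([a["id"] for a in later_siblings])   # ['a2', 'a3']
print(adjacent_to_a1)                      # []
print([a["id"] for a in adjacent_to_a2])   # ['a3']
print(after_last)                          # []
```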

Find tags by CSS class:

soup.select(".sister")  # get all tags whose class is sister
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.select("[class~=sister]")  # same result as the previous call
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
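Both class selectors match one whole class name within a space-separated class attribute, which only becomes visible when a tag carries several classes. The sketch below compares them with an exact attribute match on a hypothetical fragment; the class names "big" and "sisterhood" are invented for the illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with a single class, two classes, and a look-alike class.
html = (
    '<p><a class="sister" id="one">1</a>'
    '<a class="big sister" id="two">2</a>'
    '<a class="sisterhood" id="three">3</a></p>'
)
soup = BeautifulSoup(html, "html.parser")

by_class = soup.select(".sister")            # "sister" is one of the classes
by_word = soup.select("[class~=sister]")     # same test, written as an attribute selector
by_exact = soup.select('[class="sister"]')   # the whole attribute value must be "sister"

print([a["id"] for a in by_class])  # ['one', 'two']
print([a["id"] for a in by_word])   # ['one', 'two']
print([a["id"] for a in by_exact])  # ['one']
```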

Find tags by ID:

soup.select("#link1")  # pass an ID selector to get the tag with that id
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

soup.select("a#link2")  # unlike the bare ID above, this also names the tag, which makes the search more specific
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

Find tags that match any selector in a comma-separated list of selectors. A tag is returned as long as it matches at least one of them:

soup.select("#link1, #link2")  # match tags whose id is link1 or link2
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
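With a selector list, the order of the selectors does not determine the order of the results: matches come back in document order, and a tag that satisfies more than one selector appears only once. A small sketch on a hypothetical three-link fragment:

```python
from bs4 import BeautifulSoup

# Hypothetical markup reusing the link ids from the article.
html = '<p><a id="link1">Elsie</a><a id="link2">Lacie</a><a id="link3">Tillie</a></p>'
soup = BeautifulSoup(html, "html.parser")

# link2 is listed first in the selector, but link1 comes first in the document.
reordered = soup.select("#link2, #link1")

# link1 matches both selectors, yet it is returned a single time.
overlapping = soup.select("a, #link1")

print([a["id"] for a in reordered])    # ['link1', 'link2']
print([a["id"] for a in overlapping])  # ['link1', 'link2', 'link3']
```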

Find tags by whether an attribute exists:

soup.select('a[href]')  # get <a> tags that have an href attribute
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
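An existence test looks only at whether the attribute is present, not at its value, so an empty href still counts. The sketch below illustrates this on a hypothetical fragment and uses :not() to select the complement.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a normal link, an anchor without href, and an empty href.
html = (
    '<p><a href="http://example.com/elsie">Elsie</a>'
    '<a name="top">anchor</a>'
    '<a href="">empty</a></p>'
)
soup = BeautifulSoup(html, "html.parser")

with_href = soup.select("a[href]")            # attribute present, any value
without_href = soup.select("a:not([href])")   # attribute absent

print([a.get_text() for a in with_href])     # ['Elsie', 'empty']
print([a.get_text() for a in without_href])  # ['anchor']
```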

Find tags by attribute value:

soup.select('a[href="http://example.com/elsie"]')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

soup.select('a[href^="http://example.com/"]')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.select('a[href$="tillie"]')
# [<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.select('a[href*=".com/el"]')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>]

Some notes on these selectors:
soup.select('a[href^="http://example.com/"]') finds <a> tags whose href attribute value starts with "http://example.com/".
soup.select('a[href$="tillie"]') finds <a> tags whose href attribute value ends with "tillie".
soup.select('a[href*=".com/el"]') finds <a> tags whose href attribute value contains the string ".com/el", so only the tag with href="http://example.com/elsie" matches.
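One way to remember the three operators is to map them onto Python string methods: ^= behaves like str.startswith, $= like str.endswith, and *= like the in operator. The sketch below checks that correspondence on hypothetical links; the URLs and the helper function texts() are made up for the illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical links with deliberately different href shapes.
html = (
    '<p><a href="http://example.com/elsie">Elsie</a>'
    '<a href="https://example.org/lacie.html">Lacie</a>'
    '<a href="/tillie">Tillie</a></p>'
)
soup = BeautifulSoup(html, "html.parser")
links = soup.find_all("a")


def texts(tags):
    """Return the text of each tag, to make results easy to compare."""
    return [tag.get_text() for tag in tags]


starts = texts(soup.select('a[href^="http"]'))
ends = texts(soup.select('a[href$=".html"]'))
contains = texts(soup.select('a[href*="example"]'))

# The same filters written with plain string methods.
starts_py = texts(a for a in links if a["href"].startswith("http"))
ends_py = texts(a for a in links if a["href"].endswith(".html"))
contains_py = texts(a for a in links if "example" in a["href"])

print(starts)    # ['Elsie', 'Lacie']
print(ends)      # ['Lacie']
print(contains)  # ['Elsie', 'Lacie']
```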

To return only the first tag that matches a selector:

soup.select_one(".sister")  # return only the first matching tag
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
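The two methods also differ when nothing matches: select() returns an empty list, while select_one() returns None, so its result should be checked before use. A minimal sketch, assuming a hypothetical fragment and a class name ("brother") that does not occur in it:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two tags of class sister and none of class brother.
html = '<p><a class="sister" id="link1">Elsie</a><a class="sister" id="link2">Lacie</a></p>'
soup = BeautifulSoup(html, "html.parser")

first = soup.select_one(".sister")        # a single Tag, not a list
missing_one = soup.select_one(".brother") # no match: None
missing_all = soup.select(".brother")     # no match: empty list

print(first["id"])   # link1
print(missing_one)   # None
print(missing_all)   # []

# Guard against None before reading attributes.
href = missing_one["href"] if missing_one is not None else None
```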

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
