"Getting Started with web crawler 04" Thoroughly mastering BeautifulSoup CSS Selectors

Guangdong Vocational and Technical College Aohaoyuan 2017-10-21

1. Introduction

Apart from the official documentation, few technical books or blog posts cover BeautifulSoup in detail, and even fewer cover its CSS selectors. Yet in web-crawler page parsing, the CSS selector is a highly efficient tool. The official documentation is very detailed, but it assumes some background knowledge and offers no small, focused demonstration examples. This article provides exactly that: a compact, hands-on example for each kind of selector.

2. CSS Selector overview

BeautifulSoup supports most CSS selectors.
The syntax: pass the selector string to the .select() method of a Tag object or of the BeautifulSoup object. The matches are returned as a list, i.e. the return type is list.
tag.select("string")
BeautifulSoup.select("string")
Note: when selecting by a CSS attribute, a tag name takes no prefix, a class name is prefixed with a dot (.), and an ID is prefixed with #.
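A minimal sketch of both call forms, assuming bs4 4.7 or later (where select() is backed by the soupsieve package) and Python's built-in "html.parser"; the HTML snippet is made up for illustration. It also shows the prefix rule from the note.

```python
from bs4 import BeautifulSoup

html = '<div id="box"><p class="note">one</p><p class="note">two</p></div><p>three</p>'
soup = BeautifulSoup(html, "html.parser")

by_tag = soup.select("p")        # tag name: no prefix
by_class = soup.select(".note")  # class name: "." prefix
by_id = soup.select("#box")      # id: "#" prefix

# select() on a Tag searches only that tag's descendants
box = by_id[0]
inside = box.select("p")

print(isinstance(by_tag, list), len(by_tag))          # True 3
print(len(by_class), [p.get_text() for p in inside])  # 2 ['one', 'two']
```

Recent bs4 versions return a ResultSet, which is a subclass of list, so the result can be indexed and iterated like any list.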

3. CSS Test examples
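The article's own test page is not reproduced in this copy. As a hypothetical stand-in, the sketch below parses a page modeled on the "three sisters" document from the BeautifulSoup documentation; it is an assumption, not the author's page. It uses the built-in "html.parser", so nothing beyond bs4 needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical test page modeled on the BeautifulSoup documentation example
html = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters:
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")
print(soup.title.string)          # The Dormouse's story
print(len(soup.find_all("a")))    # 3
```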

4. Search by tag name

Example 1: Select all the title tags.
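A minimal sketch, assuming bs4 4.7+ with the built-in "html.parser"; the one-line page is hypothetical. A bare tag name in select() acts as a type selector.

```python
from bs4 import BeautifulSoup

html = "<html><head><title>The Dormouse's story</title></head><body></body></html>"
soup = BeautifulSoup(html, "html.parser")

# A bare tag name selects every element with that name
titles = soup.select("title")
print(titles)  # [<title>The Dormouse's story</title>]
```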

Example 2: Select the third p tag.
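In CSS, :nth-of-type(3) counts among siblings, so it matches the third p inside each parent; when all paragraphs share one parent, that is simply the third p on the page. A sketch assuming bs4 4.7+ and a hypothetical snippet:

```python
from bs4 import BeautifulSoup

html = """<body>
<p>first</p>
<p>second</p>
<p>third</p>
</body>"""
soup = BeautifulSoup(html, "html.parser")

# :nth-of-type(3) matches a <p> that is the third <p> among its siblings
third = soup.select("p:nth-of-type(3)")
print(third)  # [<p>third</p>]
```

If the paragraphs are spread across several parents, soup.select("p")[2] is a simpler way to get the third p in document order.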

Example 3: Select all a tags under the body tag.
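A space between two selectors is the descendant combinator, so nesting depth does not matter. A sketch assuming bs4 4.7+; the snippet is hypothetical:

```python
from bs4 import BeautifulSoup

html = """<body>
<a href="/top">top</a>
<p><a href="/nested">nested</a></p>
</body>"""
soup = BeautifulSoup(html, "html.parser")

# "body a": every <a> at any depth below <body>
links = soup.select("body a")
print([a["href"] for a in links])  # ['/top', '/nested']
```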

Example 4: Select the a tags that are direct children of the body tag.
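The ">" child combinator keeps only elements whose immediate parent matches. A sketch assuming bs4 4.7+ and a hypothetical snippet in which one link is nested inside a p:

```python
from bs4 import BeautifulSoup

html = """<body>
<a href="/top">top</a>
<p><a href="/nested">nested</a></p>
</body>"""
soup = BeautifulSoup(html, "html.parser")

# "body > a": only <a> tags whose parent is <body>; the nested link is skipped
children = soup.select("body > a")
print([a["href"] for a in children])  # ['/top']
```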

Example 5: Select all sibling tags that follow the tag with id="link1". (Remember: prefix a class name with . and an ID with #.)
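The "~" general sibling combinator matches every later sibling (same parent) that fits the right-hand selector. A sketch assuming bs4 4.7+; the three links are a hypothetical snippet:

```python
from bs4 import BeautifulSoup

html = """<p>
<a class="sister" id="link1">Elsie</a>
<a class="sister" id="link2">Lacie</a>
<a class="sister" id="link3">Tillie</a>
</p>"""
soup = BeautifulSoup(html, "html.parser")

# "#link1 ~ .sister": all .sister siblings that come after #link1
later = soup.select("#link1 ~ .sister")
print([a["id"] for a in later])  # ['link2', 'link3']
```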

Example 6: Select the sibling tag immediately after the tag with id="link1".
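The "+" adjacent sibling combinator matches only the very next element sibling; whitespace text between tags is ignored. A sketch assuming bs4 4.7+ and a hypothetical snippet:

```python
from bs4 import BeautifulSoup

html = """<p>
<a class="sister" id="link1">Elsie</a>
<a class="sister" id="link2">Lacie</a>
<a class="sister" id="link3">Tillie</a>
</p>"""
soup = BeautifulSoup(html, "html.parser")

# "#link1 + .sister": only the element directly after #link1
next_one = soup.select("#link1 + .sister")
print([a["id"] for a in next_one])  # ['link2']
```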

5. Search by CSS class name

Example 7: Find the tags whose class name is sister.
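Because class is a multi-valued attribute, ".sister" also matches an element with class="sister red". A sketch assuming bs4 4.7+ and a hypothetical snippet:

```python
from bs4 import BeautifulSoup

html = """<p>
<a class="sister" id="link1">Elsie</a>
<a class="brother" id="link2">Bob</a>
<a class="sister red" id="link3">Tillie</a>
</p>"""
soup = BeautifulSoup(html, "html.parser")

# ".sister" matches any element whose class list contains "sister"
sisters = soup.select(".sister")
print([a.get_text() for a in sisters])  # ['Elsie', 'Tillie']
```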

Example 8: Find the p tags whose class name is title.
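Writing the tag name and the class together with no space ("p.title") requires both conditions on the same element. A sketch assuming bs4 4.7+; the snippet is hypothetical and includes a div with the same class to show it is excluded:

```python
from bs4 import BeautifulSoup

html = """<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time...</p>
<div class="title">not a paragraph</div>
</body>"""
soup = BeautifulSoup(html, "html.parser")

# "p.title": tag name p AND class title on the same element
titles = soup.select("p.title")
print([t.get_text() for t in titles])  # ["The Dormouse's story"]
```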

6. Search by tag id attribute

Example 9: Select all tags whose id attribute is link2.
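A "#" prefix selects by id; even though an id should be unique on a page, select() still returns a list. A sketch assuming bs4 4.7+ and a hypothetical snippet:

```python
from bs4 import BeautifulSoup

html = """<p>
<a href="http://example.com/elsie" id="link1">Elsie</a>
<a href="http://example.com/lacie" id="link2">Lacie</a>
</p>"""
soup = BeautifulSoup(html, "html.parser")

# "#link2" matches any element whose id is link2
found = soup.select("#link2")
print(found)  # [<a href="http://example.com/lacie" id="link2">Lacie</a>]
```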

Example: Select the a tag whose id attribute is link2.
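Writing the tag name directly before #id (no space) narrows the match to that element type. A sketch under the same assumptions (bs4 4.7+, hypothetical snippet):

```python
from bs4 import BeautifulSoup

html = """<p id="intro">
<a href="http://example.com/lacie" id="link2">Lacie</a>
</p>"""
soup = BeautifulSoup(html, "html.parser")

# The element must be an <a> AND have id link2
print(soup.select("a#link2"))  # [<a href="http://example.com/lacie" id="link2">Lacie</a>]
# A <p> with id link2 does not exist, so this is empty
print(soup.select("p#link2"))  # []
```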

7. Querying elements with multiple CSS selectors at the same time

Example: Select the tag whose id is link2 together with the tag whose id is link3.
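A comma joins selectors into a group: an element is returned if it matches any of them. In a sketch assuming bs4 4.7+ (soupsieve) and a hypothetical snippet, the matches come back in document order:

```python
from bs4 import BeautifulSoup

html = """<p>
<a id="link1">Elsie</a>
<a id="link2">Lacie</a>
<a id="link3">Tillie</a>
</p>"""
soup = BeautifulSoup(html, "html.parser")

# "#link2, #link3": union of both selectors
both = soup.select("#link2, #link3")
print([a.get_text() for a in both])  # ['Lacie', 'Tillie']
```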

Example: Select all tags with class red, id link2, or id link3.
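A selector group can mix class and id selectors freely. A sketch assuming bs4 4.7+; the snippet is hypothetical:

```python
from bs4 import BeautifulSoup

html = """<body>
<p class="red">a red paragraph</p>
<a id="link1">Elsie</a>
<a id="link2">Lacie</a>
<a id="link3">Tillie</a>
</body>"""
soup = BeautifulSoup(html, "html.parser")

# Matches anything with class red, or id link2, or id link3
found = soup.select(".red, #link2, #link3")
print([t.get_text() for t in found])  # ['a red paragraph', 'Lacie', 'Tillie']
```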

8. Search by whether an attribute exists

Example: Find the a tags that have an href attribute.
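Square brackets with only an attribute name test for presence, regardless of value. A sketch assuming bs4 4.7+ and a hypothetical snippet:

```python
from bs4 import BeautifulSoup

html = """<p>
<a href="http://example.com/elsie">Elsie</a>
<a name="anchor">no link</a>
</p>"""
soup = BeautifulSoup(html, "html.parser")

# "a[href]": <a> tags that carry an href attribute at all
with_href = soup.select("a[href]")
print([a.get_text() for a in with_href])  # ['Elsie']
```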

9. Search by attribute value

Example: Select the a tags whose href attribute equals http://example.com/lacie.
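[attr="value"] is an exact match on the whole attribute value. A sketch assuming bs4 4.7+; the links are hypothetical:

```python
from bs4 import BeautifulSoup

html = """<p>
<a href="http://example.com/elsie">Elsie</a>
<a href="http://example.com/lacie">Lacie</a>
<a href="/tillie">Tillie</a>
</p>"""
soup = BeautifulSoup(html, "html.parser")

# Quote the value because it contains ":" and "/"
exact = soup.select('a[href="http://example.com/lacie"]')
print([a.get_text() for a in exact])  # ['Lacie']
```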

Example: Select the a tags whose href attribute begins with http.
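The ^= operator tests the start of the attribute value, so a relative link is excluded. A sketch assuming bs4 4.7+ and a hypothetical snippet:

```python
from bs4 import BeautifulSoup

html = """<p>
<a href="http://example.com/elsie">Elsie</a>
<a href="http://example.com/lacie">Lacie</a>
<a href="/tillie">Tillie</a>
</p>"""
soup = BeautifulSoup(html, "html.parser")

# ^= : href starts with "http"
starts = soup.select('a[href^="http"]')
print([a.get_text() for a in starts])  # ['Elsie', 'Lacie']
```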

Example: Select the a tags whose href attribute ends with lie.
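The $= operator tests the end of the attribute value. In this hypothetical snippet (bs4 4.7+ assumed), only "/tillie" ends with "lie"; "elsie" and "lacie" do not:

```python
from bs4 import BeautifulSoup

html = """<p>
<a href="http://example.com/elsie">Elsie</a>
<a href="http://example.com/lacie">Lacie</a>
<a href="/tillie">Tillie</a>
</p>"""
soup = BeautifulSoup(html, "html.parser")

# $= : href ends with "lie"
ends = soup.select('a[href$="lie"]')
print([a.get_text() for a in ends])  # ['Tillie']
```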

Example: Select the a tags whose href attribute contains .com.
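The *= operator matches a substring anywhere in the attribute value. A sketch assuming bs4 4.7+ and a hypothetical snippet:

```python
from bs4 import BeautifulSoup

html = """<p>
<a href="http://example.com/elsie">Elsie</a>
<a href="http://example.com/lacie">Lacie</a>
<a href="/tillie">Tillie</a>
</p>"""
soup = BeautifulSoup(html, "html.parser")

# *= : ".com" appears somewhere in href
contains = soup.select('a[href*=".com"]')
print([a.get_text() for a in contains])  # ['Elsie', 'Lacie']
```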

10. Search layer by layer

Example: First select the third p tag, then search inside it for tags whose name attribute value is Ohy.
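Calling select() again on a Tag restricts the second search to that tag's descendants. A sketch assuming bs4 4.7+; the snippet is hypothetical and places a second name="Ohy" link outside the third p to show it is not found:

```python
from bs4 import BeautifulSoup

html = """<body>
<p>first</p>
<p>second</p>
<p>third:
<a name="Ohy" href="/a">inside</a>
<span name="other">skip</span>
</p>
<a name="Ohy" href="/b">outside</a>
</body>"""
soup = BeautifulSoup(html, "html.parser")

# Step 1: narrow the search to the third <p>
third_p = soup.select("p:nth-of-type(3)")[0]
# Step 2: search only inside that Tag
matches = third_p.select('[name="Ohy"]')
print([m.get_text() for m in matches])  # ['inside']
```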

Example: First select the third p tag, then find the a tags inside it, and extract the text of the first one in the list.
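Because select() returns a list, ordinary indexing picks the first match, and get_text() extracts its text. A sketch assuming bs4 4.7+ and a hypothetical snippet:

```python
from bs4 import BeautifulSoup

html = """<body>
<p>one</p>
<p>two</p>
<p>three:
<a href="/lacie">Lacie</a>
<a href="/tillie">Tillie</a>
</p>
</body>"""
soup = BeautifulSoup(html, "html.parser")

third_p = soup.select("p:nth-of-type(3)")[0]
links = third_p.select("a")       # list of <a> tags inside the third <p>
print(links[0].get_text())        # Lacie
```

Indexing an empty list raises IndexError, so a crawler may want to check the list length first.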

11. Return only the first matching tag

Example: Select only the first tag whose class name is sister.
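In bs4, select_one() returns the first match as a single Tag (or None when nothing matches) instead of a list. A sketch assuming bs4 4.7+ and a hypothetical snippet:

```python
from bs4 import BeautifulSoup

html = """<p>
<a class="sister" id="link1">Elsie</a>
<a class="sister" id="link2">Lacie</a>
</p>"""
soup = BeautifulSoup(html, "html.parser")

first = soup.select_one(".sister")
print(first["id"])               # link1

# Equivalent result, but select() collects every match first
same = soup.select(".sister")[0]
print(first is same)             # True

print(soup.select_one(".brother"))  # None
```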

12. Summary

If you want to build a more powerful web crawler quickly, BeautifulSoup's CSS selectors will be one of your essential tools. BeautifulSoup combines CSS selector syntax with its own easy-to-use API, so for developers who already know CSS selector syntax, selectors are a very convenient way to locate elements when writing a crawler.