Getting Started with Web Crawlers 04: Thoroughly Mastering BeautifulSoup CSS Selectors
Guangdong Vocational and Technical College Aohaoyuan 2017-10-21
1. Introduction
At present, apart from the official documentation, few technical books or blog posts cover BeautifulSoup in detail, and material on its CSS selectors is scarcer still. Yet for parsing pages in a web crawler, CSS selectors are a highly efficient tool. The official documentation is very detailed, but it assumes some background knowledge and lacks small, focused demonstration examples. However, in this article, you can see ... Absolutely work!
2. CSS Selector overview
BeautifulSoup supports most CSS selectors.
The syntax is: pass a selector string to the .select() method of a Tag object or of the BeautifulSoup object. The matching elements are returned as a list.
tag.select("string")
BeautifulSoup.select("string")
Note: in a selector string, a tag name is written without any prefix, a class name is prefixed with a dot (.), and an id is prefixed with #.
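As a minimal sketch of the syntax above, the snippet below calls select() on both a BeautifulSoup object and a Tag object. The HTML string is a stand-in modeled on the sample page in the official documentation, not this article's own test page, and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Stand-in document (an assumption; not the article's own test page).
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# select() on the BeautifulSoup object searches the whole document.
all_links = soup.select("a")

# select() on a Tag object searches only inside that tag.
story = soup.find("p", class_="story")
story_links = story.select("a")

# The three basic forms: bare tag name, .class, and #id.
by_tag = soup.select("title")
by_class = soup.select(".sister")
by_id = soup.select("#link1")

# The result behaves like a list: it has a length and can be indexed.
print(isinstance(all_links, list), len(all_links))
print(by_id[0].get_text())
```

Because the result is a list even when only one element matches, a single match still has to be taken out by index, as in by_id[0].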
3. CSS Test examples
4. Search by tag
Example 1: Select all title tags.
Example 2: Among all p tags, select the third one.
Example 3: Select all a tags under the body tag.
Example 4: Select the a tags that are direct children of the body tag.
Example 5: Select all sibling tags that follow the tag with id=link1. Prefix a class name with a dot (.) and an id with #.
Example 6: Select the next sibling tag after the tag with id=link1.
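Examples 1 to 6 could be written roughly as follows. The HTML is a stand-in modeled on the sample page in the official documentation (an assumption, since the article's test page is not shown), so the results only illustrate how each selector behaves.

```python
from bs4 import BeautifulSoup

# Stand-in document (an assumption; not the article's own test page).
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters:
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>.</p>
<p class="story">...</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Example 1: all title tags.
titles = soup.select("title")

# Example 2: the p tag that is the third p among its siblings.
third_p = soup.select("p:nth-of-type(3)")

# Example 3: every a tag anywhere below body (descendant combinator: a space).
body_links = soup.select("body a")

# Example 4: a tags that are direct children of body (child combinator: >).
# In this stand-in page the links sit inside a p tag, so nothing matches.
direct_links = soup.select("body > a")

# Example 5: all later siblings of #link1 that have class sister (~).
later_siblings = soup.select("#link1 ~ .sister")

# Example 6: only the sibling immediately after #link1 (+).
next_sibling = soup.select("#link1 + .sister")

print([a["id"] for a in later_siblings])
print([a["id"] for a in next_sibling])
```

The empty result for "body > a" is a useful reminder that > matches only one level down, whereas a space matches at any depth.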
5. Search by CSS class name
Example 7: Find the tags with the class name sister.
Example 8: Find the tag with the class name title under the p tag.
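A sketch of the class-name searches, using a stand-in page (an assumption, not the article's test page). Since the wording of Example 8 can be read two ways, both readings are shown: "p.title" matches a p tag that itself has the class, while "p .title" (with a space) matches descendants of a p tag.

```python
from bs4 import BeautifulSoup

# Stand-in document (an assumption; not the article's own test page).
html_doc = """
<html><body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Example 7: any tag whose class list contains "sister".
sisters = soup.select(".sister")

# The same search restricted to a tags.
sister_links = soup.select("a.sister")

# Example 8, first reading: a p tag that itself has class "title".
title_p = soup.select("p.title")

# Example 8, second reading: a tag with class "title" somewhere inside a p tag.
# The stand-in page has no such descendant, so the list is empty.
title_inside_p = soup.select("p .title")

print([a["id"] for a in sisters])
print(title_p[0].get_text())
```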
6. Search by tag id attribute
Example 9: Select all tags whose id attribute is link2.
Example: Select the a tag whose id attribute is link2.
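The two id searches differ only in whether the tag name is part of the selector. A minimal sketch, again on a stand-in page that is assumed rather than taken from the article:

```python
from bs4 import BeautifulSoup

# Stand-in document (an assumption; not the article's own test page).
html_doc = """
<html><body>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Any tag whose id is link2, whatever its tag name.
any_link2 = soup.select("#link2")

# Only an a tag whose id is link2.
a_link2 = soup.select("a#link2")

# Adding a tag name narrows the match: no p tag has this id.
p_link2 = soup.select("p#link2")

print(any_link2[0].get_text(), len(p_link2))
```

Note that id values are matched case-sensitively here, so "#LINK2" would not find the tag with id="link2".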
7. Query elements with multiple CSS selectors at once
Example: Select all tags whose id attribute is link2 or link3.
Example: Select all tags whose class is red, or whose id attribute is link2 or link3.
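Several selectors can be combined in one string by separating them with commas; a tag is returned if it matches any of them. The sketch below assumes a stand-in page in which one p tag carries the class red; that class placement is invented for illustration.

```python
from bs4 import BeautifulSoup

# Stand-in document (an assumption; the "red" class is placed on the
# second p tag purely for illustration).
html_doc = """
<html><body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story red">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Tags whose id is link2 OR link3.
two_ids = soup.select("#link2, #link3")

# Tags whose class is red, OR whose id is link2 or link3.
red_or_ids = soup.select(".red, #link2, #link3")

# Show each match by its id, falling back to the tag name if it has no id.
print([tag.get("id", tag.name) for tag in two_ids])
print([tag.get("id", tag.name) for tag in red_or_ids])
```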
8. Search by whether an attribute exists
Example: Find the a tags that have an href attribute.
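An attribute name in square brackets tests only for the presence of that attribute. In the sketch below, the stand-in page (an assumption) includes one a tag without href so that the filter has something to exclude.

```python
from bs4 import BeautifulSoup

# Stand-in document (an assumption); the first a tag deliberately has no href.
html_doc = """
<html><body>
<p class="story">
<a id="top">Top of page</a>
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# All a tags, with or without href.
all_a = soup.select("a")

# Only the a tags that carry an href attribute, whatever its value.
with_href = soup.select("a[href]")

print(len(all_a), [a["id"] for a in with_href])
```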
9. Search by attribute value
Example: Select the a tags whose href attribute equals http://example.com/lacie.
Example: Select the a tags whose href attribute begins with http.
Example: Select the a tags whose href attribute ends with lie.
Example: Select the a tags whose href attribute contains .com.
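The four attribute-value searches map onto the CSS operators =, ^=, $= and *=. A sketch on a stand-in page follows; the fourth link, with its ftp address and id link4, is hypothetical and was added only so that some selectors exclude something.

```python
from bs4 import BeautifulSoup

# Stand-in document (an assumption); link4 is a hypothetical extra link.
html_doc = """
<html><body>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
<a href="ftp://example.org/files" id="link4">Files</a>
</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# = : the attribute value must match exactly.
exact = soup.select('a[href="http://example.com/lacie"]')

# ^= : the attribute value must start with the given text.
starts = soup.select('a[href^="http"]')

# $= : the attribute value must end with the given text.
ends = soup.select('a[href$="lie"]')

# *= : the attribute value must contain the given text somewhere.
contains = soup.select('a[href*=".com"]')

for name, result in [("exact", exact), ("starts", starts),
                     ("ends", ends), ("contains", contains)]:
    print(name, [a["id"] for a in result])
```

Quoting the value inside the brackets, as done here, avoids parsing problems when the value contains characters such as : or /.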
10. Search layer by layer
Example: First select the third p tag, then search inside it for the tag whose name attribute has the value Ohy.
Example: First select the third p tag, then find the a tags inside it, and extract the text of the first tag in the resulting list.
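Because select() returns a list of Tag objects, and each Tag has its own select() method, searches can be chained one layer at a time. The sketch below assumes a stand-in page whose third p tag contains two links and a b tag with name="Ohy"; the real test page may be laid out differently.

```python
from bs4 import BeautifulSoup

# Stand-in document (an assumption; the placement of name="Ohy" is invented).
html_doc = """
<html><body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters.</p>
<p class="story">Their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<b name="Ohy">Ohy</b>.</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Layer 1: select the third p tag. The result is a list, so take element 0.
third_p = soup.select("p:nth-of-type(3)")[0]

# Layer 2a: inside that p tag, find the tag whose name attribute is "Ohy".
named = third_p.select('[name="Ohy"]')

# Layer 2b: inside that p tag, find the a tags and read the first one's text.
links = third_p.select("a")
first_text = links[0].get_text()

print(named[0].name, first_text)
```

Indexing with [0] raises IndexError when nothing matches, so real crawler code would usually check that the list is non-empty first.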
11. Return only the first matching element
Example: Select the first of all the tags with the class name sister.
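For this case BeautifulSoup provides select_one(), which returns the first matching Tag directly instead of a list, or None when nothing matches. A minimal sketch on a stand-in page (an assumption, not the article's test page):

```python
from bs4 import BeautifulSoup

# Stand-in document (an assumption; not the article's own test page).
html_doc = """
<html><body>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# select_one() returns a single Tag: the first match in document order.
first_sister = soup.select_one(".sister")

# When nothing matches, select_one() returns None rather than an empty list.
# The class name "brother" is hypothetical and absent from the page.
missing = soup.select_one(".brother")

print(first_sister["id"], missing)
```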
12. Summary
If you want to build a capable web crawler quickly, BeautifulSoup's CSS selectors will be one of your essential tools. BeautifulSoup combines CSS selector syntax with its own easy-to-use API. For developers who already know CSS selector syntax, selectors are a very convenient way to parse pages in a crawler.