Python Crawler Toolkit

Source: Internet
Author: User

Requests package: a practical Python HTTP client library that is often used when writing crawlers to fetch data from the Web. Its interface is simple: fetching a page takes a single call such as requests.get(url).
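
As a minimal sketch of that call, the function below downloads a page and returns its HTML text. The User-Agent string and the function name fetch_html are hypothetical choices for illustration; params, headers and timeout are standard keyword arguments of requests.get, and raise_for_status() turns HTTP error codes into exceptions.

```python
import requests

# Hypothetical User-Agent header; some sites reject requests' default one.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; example-crawler/0.1)"}


def fetch_html(url, params=None, timeout=10.0):
    """Download a page with requests.get and return its HTML as text."""
    # params is encoded into the query string, e.g. {"page": 2} -> ?page=2
    response = requests.get(url, params=params, headers=HEADERS, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of returning an error page.
    response.raise_for_status()
    # .text decodes the body using the encoding requests infers from the response.
    return response.text
```

Usage would look like html = fetch_html("https://example.com/", params={"page": 2}), where the URL is only a placeholder. Setting a timeout keeps a crawler from hanging forever on an unresponsive server.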


lxml package: mainly used to parse the HTML content fetched with requests and extract the data we need. lxml uses XPath syntax to locate and filter the content of the HTML text.
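
The hand-off between the two packages can be sketched as follows, assuming a page whose links we want. requests downloads the text, lxml.html.fromstring parses it, and an XPath query pulls out the data. The function name scrape_links and the choice of extracting every href are hypothetical examples, not part of either library.

```python
import requests
from lxml import html as lxml_html


def scrape_links(url, timeout=10.0):
    """Fetch a page with requests, parse it with lxml, and extract data with XPath."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    # fromstring builds an element tree from the HTML text and returns its root.
    tree = lxml_html.fromstring(response.text)
    # XPath: the href attribute of every <a> element in the document.
    return tree.xpath("//a/@href")
```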


Using the lxml package:
lxml lets you extract the data we need from HTML code.
A Web page is an HTML file.
lxml first organizes the contents of an HTML file into a tree structure.
An HTML document is a tree, similar to the directory structure of a Linux file system.
Once lxml has built the tree, its content is located and filtered with XPath syntax.
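
To make the directory analogy concrete, the sketch below parses a small hypothetical page with lxml.etree.HTML and walks the resulting tree, printing each tag indented by its depth much like a directory listing. The outline helper is a hypothetical name for illustration; getparent() is lxml's way of moving one level up, comparable to ".." in a path.

```python
from lxml import etree

# A small hypothetical page, used only for illustration.
PAGE = """
<html>
  <body>
    <div class="post">
      <h1>First post</h1>
      <a href="/posts/1">read more</a>
    </div>
  </body>
</html>
"""

# etree.HTML parses the text into a tree and returns its root <html> element.
root = etree.HTML(PAGE)


def outline(element, depth=0):
    """Return (depth, tag) pairs for element and its descendants, parents first."""
    items = [(depth, element.tag)]
    for child in element:  # iterating an element yields its direct children
        items.extend(outline(child, depth + 1))
    return items


for depth, tag in outline(root):
    print("  " * depth + tag)

# getparent() moves one level up the tree, like ".." in a directory path.
heading = root.find(".//h1")
print(heading.getparent().tag)
```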


Using XPath syntax:
Path expressions (XPath syntax describes the path to a tag in an XML or HTML document)
//div locates all div tags in the document and returns them as a list
//div[@class="J-r-list-c-desc"]/h1/text() extracts the text inside the h1 tag of the matching div
/@href extracts the value of a tag's href attribute
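
The path expressions above can be tried against a small hypothetical snippet that reuses the class name from the text. The markup, heading text and link are invented for illustration; element.xpath() is lxml's method for evaluating an XPath expression and it returns a list.

```python
from lxml import etree

# Hypothetical markup built around the class name used above.
PAGE = """
<html><body>
  <div class="J-r-list-c-desc">
    <h1>Joke of the day</h1>
    <a href="/detail/1">details</a>
  </div>
  <div class="footer">contact</div>
</body></html>
"""

root = etree.HTML(PAGE)

# //div: every <div> element anywhere in the document, returned as a list.
divs = root.xpath("//div")

# .../h1/text(): the text nodes directly inside the matching <h1>.
titles = root.xpath('//div[@class="J-r-list-c-desc"]/h1/text()')

# .../a/@href: the value of the href attribute on each matching <a>.
links = root.xpath('//div[@class="J-r-list-c-desc"]/a/@href')

print(len(divs), titles, links)
```

Because text() and @href yield strings rather than elements, the results can be stored or printed directly without further processing.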

Filter conditions
//div[@class="link"] locates div tags whose class attribute value is "link"
//div[li] selects div tags that have an li child tag
//div[@class] selects div tags that have a class attribute
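
Each filter condition is a predicate in square brackets. The sketch below applies all three to a hypothetical page and reads back the class attribute of every match with element.get(). The markup is invented for illustration.

```python
from lxml import etree

# Hypothetical markup with three differently shaped <div> elements.
PAGE = """
<html><body>
  <div class="link"><a href="/a">A</a></div>
  <div class="menu"><li>Home</li><li>About</li></div>
  <div>plain text only</div>
</body></html>
"""

root = etree.HTML(PAGE)

# [@class="link"]: keep <div> elements whose class attribute equals "link".
link_divs = root.xpath('//div[@class="link"]')

# [li]: keep <div> elements that have at least one direct <li> child.
divs_with_li = root.xpath("//div[li]")

# [@class]: keep <div> elements that have a class attribute at all.
divs_with_class = root.xpath("//div[@class]")

for name, found in [("link", link_divs), ("li", divs_with_li), ("class", divs_with_class)]:
    print(name, [div.get("class") for div in found])
```

Note that @class="link" is an exact string comparison, so a tag with class="link active" would not match; XPath's contains(@class, "link") is the usual looser alternative.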
