Python is now used across a wide range of applications, and its fast development cycle and high productivity have pushed it toward the top of the programming language rankings. This series of articles aims to introduce Python development and related topics comprehensively and systematically. I hope it helps you get started with Python quickly.
This article builds on the previous part of the series, Python in Practice: Python Crawler Learning Tutorial, Getting the Movie Leaderboard, and takes the web crawler exercise one step further.
1. Project Overview
Use XPath and the requests module to crawl and parse a web page, and download the images it contains.
Page to crawl for images: http://www.2cto.com/meinv/
Development environment: Python 2.7, PyCharm 5 Community
Required knowledge: XPath (a very handy tool), the requests module, basic Python syntax.
2. Introduction and installation of the required modules
XPath
Description: XPath is a language for finding and extracting information in XML documents by element and attribute. It also works on HTML.
For this task it is simpler than regular expressions, and more powerful.
Installation: XPath support comes from the lxml library. Download the lxml build that matches your Python version from http://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml.
Open the directory containing the downloaded file and run the install command.
If you install manually instead, change the file extension from .whl to .zip after the download completes.
Then unzip the file and put the lxml folder into the Lib folder of the Python installation directory.
Installing the requests module
For detailed installation steps, see the requests installation section of the previous article, Python in Practice: Python Crawler Learning Tutorial, Getting the Movie Leaderboard.
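Once both libraries are installed, a quick check like the following can confirm that Python is able to import them. This is only a minimal sketch and is not part of the original project; it is written so that it should run on Python 3 as well as Python 2.7.

```python
from lxml import etree
import requests

# LXML_VERSION is a tuple of integers such as (major, minor, patch, build).
lxml_version = '.'.join(str(part) for part in etree.LXML_VERSION)

print('lxml version: %s' % lxml_version)
print('requests version: %s' % requests.__version__)
```

If either import fails with an ImportError, the corresponding library is not visible to the interpreter you are running.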
3. Extracting content with XPath in detail
Like any language, XPath has its own syntax:
// Locate the starting node (matches anywhere in the document)
/ Search one level down
/text() Extract text content
/@xxx Extract the content of attribute xxx
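The four pieces of syntax above can be tried out on a small HTML string before pointing them at a real page. The sketch below is an illustration only: the HTML fragment, its id and class values, and the variable names are all made up for this example and do not come from the target site.

```python
from lxml import etree

# Made-up HTML fragment, used only to illustrate the syntax.
SAMPLE_HTML = """
<html>
  <body>
    <div id="gallery">
      <a href="/pic/1.html"><img src="/img/1.jpg" alt="first"></a>
      <a href="/pic/2.html"><img src="/img/2.jpg" alt="second"></a>
    </div>
    <p class="note">Two pictures</p>
  </body>
</html>
"""

selector = etree.HTML(SAMPLE_HTML)

# // matches <div> elements anywhere in the document;
# [@id="gallery"] keeps only the one with that id attribute.
gallery = selector.xpath('//div[@id="gallery"]')

# / steps down one level at a time (div -> a -> img);
# /@src then reads the src attribute of each matched <img>.
image_sources = selector.xpath('//div[@id="gallery"]/a/img/@src')

# /text() returns the text inside the matched element.
note_text = selector.xpath('//p[@class="note"]/text()')

print(len(gallery))
print(image_sources)
print(note_text)
```

Note that xpath() returns a list even when only one node matches, so a single value has to be taken out with an index such as note_text[0].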
4. Core project code
from lxml import etree
selector = etree.HTML(page_source)  # page_source: the web page source code
selector.xpath(xpath_expression)  # xpath_expression: an XPath query string
import requests
requests.get(url)
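Put together, the calls above might be combined roughly as follows. This is a hedged sketch rather than the author's actual project code: the function names extract_image_urls and download_images, the images output folder, the 10-second timeout, and the generic //img/@src expression are all assumptions chosen for illustration, and the expression would need to be adapted to the real page structure. The sketch targets Python 3; on Python 2.7, urljoin and urlparse live in the urlparse module instead.

```python
import os
from urllib.parse import urljoin, urlparse

import requests
from lxml import etree


def extract_image_urls(page_source, base_url, xpath_expr='//img/@src'):
    """Return absolute image URLs found in page_source.

    xpath_expr is a generic placeholder, not the expression used
    by the original project.
    """
    selector = etree.HTML(page_source)
    # Relative src values are resolved against the page URL.
    return [urljoin(base_url, src) for src in selector.xpath(xpath_expr)]


def download_images(page_url, target_dir='images'):
    """Fetch page_url and save each image it references into target_dir."""
    response = requests.get(page_url, timeout=10)
    response.raise_for_status()
    if not os.path.isdir(target_dir):
        os.makedirs(target_dir)

    saved_paths = []
    # response.content (bytes) lets lxml detect the page encoding itself.
    image_urls = extract_image_urls(response.content, page_url)
    for index, image_url in enumerate(image_urls):
        image = requests.get(image_url, timeout=10)
        if image.status_code != 200:
            continue  # skip images that cannot be fetched
        # Keep the original file extension, falling back to .jpg.
        extension = os.path.splitext(urlparse(image_url).path)[1] or '.jpg'
        path = os.path.join(target_dir, '%d%s' % (index, extension))
        with open(path, 'wb') as handle:
            handle.write(image.content)
        saved_paths.append(path)
    return saved_paths


# Example call (performs real network requests, so it is left commented out):
# download_images('http://www.2cto.com/meinv/')
```

Keeping the parsing step in its own function makes it possible to test the XPath expression against saved HTML without touching the network.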
5. Code demo:
Result:
Tip: a simple way to get an XPath is to open the browser's developer tools, locate the tag you want to extract, right-click it, and copy its XPath.
The copied path usually still needs some editing, though.
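As an illustration of why the copied path needs editing: a path copied for one element typically pins down that single element with an index, whereas a crawler wants every image in the list. The sketch below uses a made-up page structure and made-up paths, so treat it as an example of the idea rather than the paths for the real site.

```python
from lxml import etree

# Made-up page structure, used only for illustration.
SAMPLE_HTML = """
<html><body>
  <div class="list">
    <ul>
      <li><a href="/p/1.html"><img src="/img/1.jpg"></a></li>
      <li><a href="/p/2.html"><img src="/img/2.jpg"></a></li>
      <li><a href="/p/3.html"><img src="/img/3.jpg"></a></li>
    </ul>
  </div>
</body></html>
"""

selector = etree.HTML(SAMPLE_HTML)

# The kind of absolute, indexed path a copy action can produce:
# li[1] restricts the match to the first list item only.
copied_xpath = '/html/body/div/ul/li[1]/a/img'
single_match = selector.xpath(copied_xpath + '/@src')

# Edited version: drop the [1] index so every <li> matches, and anchor
# on the class attribute instead of the full path from the root.
general_xpath = '//div[@class="list"]/ul/li/a/img/@src'
all_matches = selector.xpath(general_xpath)

print(single_match)
print(all_matches)
```

Anchoring on an attribute also tends to survive small layout changes better than a full path from the root.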
Follow us on Toutiao (Today's Headlines): Become a Full-Stack Engineer. Python in practice: the beautiful picture downloader, with plenty of pictures for you to download.
QQ technical group: 538742639
For the project source code, follow the public account fullstackcourse (Become a Full-Stack Engineer) and reply "Beautiful picture downloader" to get it.
Next: Python Learning Primer Tutorial, String function expansion