Python in Practice: A Picture Downloader for Bulk Image Downloads

Source: Internet
Author: User
Tags: python, web crawler

Python is now used across a wide range of applications, and its fast development cycle and high productivity have taken it to the top of the programming language rankings. This series of articles aims to give a comprehensive, systematic introduction to Python development and to summarize related knowledge, so that you can get started with the language quickly.

This article builds on the previous installment, "Python in Practice: Python Crawler Tutorial, Getting the Movie Leaderboard", and takes the web crawler exercise one step further.

1. Project Overview.

This project uses XPath and the requests module to fetch and parse web pages, and then downloads the images found on them.

Page to crawl: http://www.2cto.com/meinv/

Development environment: Python 2.7, PyCharm 5 Community

Required knowledge: XPath, the requests module, basic Python syntax.

2. Introduction and installation of the required modules

    1. XPath

      1. Description: XPath is a language for finding and extracting information in XML documents by way of element paths and attributes. It also works on HTML.

      2. It is simpler than regular expressions for this kind of extraction, and more powerful.

      3. Installation: XPath support comes from the lxml library. Download the lxml build that matches your Python version from http://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml.

      4. Install the downloaded file into the Python library directory:

        1. After the download is complete, change the file extension from .whl to .zip.

        2. Unzip the file and put the lxml folder in the Lib folder of the Python installation directory.

    2. requests module installation

      1. For detailed installation steps, see the requests installation section of "Python in Practice: Python Crawler Tutorial, Getting the Movie Leaderboard".
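Once both modules are in place, a quick import check can confirm that Python finds them. This is only a minimal sketch and is not part of the original project code; it prints whatever versions happen to be installed.

```python
# Quick check that both third-party modules can be imported.
from lxml import etree
import requests

# LXML_VERSION is a tuple of integers, for example (major, minor, patch, build).
lxml_version = ".".join(str(part) for part in etree.LXML_VERSION)

print("lxml version: %s" % lxml_version)
print("requests version: %s" % requests.__version__)
```

If either import raises ImportError, the corresponding module is not on the Python path yet.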

3. XPath syntax for finding and extracting content:

Like any language, XPath has its own syntax.

//  Locate nodes, starting from the root node

/  Search one level down

/text()  Extract text content

/@xxx  Extract the value of attribute xxx
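The syntax elements above can be tried out on a small page. The following is a minimal sketch: the HTML snippet, the id "gallery" and the class "note" are made up for illustration and do not come from the target site.

```python
from lxml import etree

# A small hand-written page, used only for illustration.
SAMPLE_HTML = """
<html>
  <body>
    <div id="gallery">
      <a href="/pic/1.html"><img src="/img/1.jpg" alt="first"></a>
      <a href="/pic/2.html"><img src="/img/2.jpg" alt="second"></a>
    </div>
    <p class="note">Two pictures</p>
  </body>
</html>
"""

selector = etree.HTML(SAMPLE_HTML)

# "//" finds matching nodes anywhere below the root; "/" steps down one level.
links = selector.xpath('//div[@id="gallery"]/a')

# "/text()" extracts the text content of the matched element.
note_text = selector.xpath('//p[@class="note"]/text()')

# "/@xxx" extracts the value of attribute xxx, here src.
image_sources = selector.xpath('//div[@id="gallery"]/a/img/@src')

print(len(links))      # 2
print(note_text)       # ['Two pictures']
print(image_sources)   # ['/img/1.jpg', '/img/2.jpg']
```

Note that xpath() always returns a list, even when only one node matches.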

4. Main code for the project

  1. from lxml import etree

  2. selector = etree.HTML(web page source code)

  3. selector.xpath(XPath expression)

  4. import requests

  5. requests.get(URL)
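The five calls above can be combined into a small downloader. The following is a sketch under stated assumptions, not the author's project code: the function names extract_image_urls and download_images, the default XPath "//img/@src", the "pictures" output folder and the 10-second timeout are all hypothetical choices, and the XPath would need adjusting to the real structure of the target page.

```python
import os

import requests
from lxml import etree

try:
    from urllib.parse import urljoin, urlparse  # Python 3
except ImportError:
    from urlparse import urljoin, urlparse      # Python 2.7


def extract_image_urls(page_source, base_url, xpath_expr="//img/@src"):
    """Return absolute image URLs found in page_source.

    xpath_expr is a placeholder; adjust it to the real page structure.
    """
    selector = etree.HTML(page_source)
    return [urljoin(base_url, src) for src in selector.xpath(xpath_expr)]


def download_images(page_url, target_dir="pictures"):
    """Fetch page_url and save every image it references into target_dir."""
    response = requests.get(page_url, timeout=10)
    response.raise_for_status()
    if not os.path.isdir(target_dir):
        os.makedirs(target_dir)

    saved = []
    # Pass the raw bytes so lxml can pick up the page's declared encoding.
    image_urls = extract_image_urls(response.content, page_url)
    for index, image_url in enumerate(image_urls):
        image = requests.get(image_url, timeout=10)
        if image.status_code != 200:
            continue  # skip images that cannot be fetched
        # Keep the original file extension, falling back to .jpg.
        extension = os.path.splitext(urlparse(image_url).path)[1] or ".jpg"
        path = os.path.join(target_dir, "%03d%s" % (index, extension))
        with open(path, "wb") as handle:
            handle.write(image.content)
        saved.append(path)
    return saved
```

Calling download_images("http://www.2cto.com/meinv/") would try to save each image on that page into the pictures folder and return the saved file paths. The page may have changed since the article was written, so the result is not guaranteed.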

5. Code Demo:

Result:

Tip: A quick way to get an XPath is to open the browser's developer tools, locate the element you want to extract, right-click it, and copy its XPath.

The copied path usually still needs some adjustment, though.
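The article does not say which adjustments are needed. One common case, shown here as an assumption, is that a copied path points at a single element, while a downloader wants an attribute from every matching element. The HTML snippet and the id "list" below are made up for illustration.

```python
from lxml import etree

# A small hand-written list of pictures, used only for illustration.
SAMPLE_HTML = """
<ul id="list">
  <li><a href="/pic/1.html"><img src="/img/1.jpg"></a></li>
  <li><a href="/pic/2.html"><img src="/img/2.jpg"></a></li>
  <li><a href="/pic/3.html"><img src="/img/3.jpg"></a></li>
</ul>
"""

selector = etree.HTML(SAMPLE_HTML)

# A path of the kind a browser might copy: pinned to the first <li>,
# and selecting the <img> element itself rather than its address.
copied = '//*[@id="list"]/li[1]/a/img'

# Adjusted path: the index is dropped so every <li> matches,
# and /@src is appended to extract the image address.
generalized = '//*[@id="list"]/li/a/img/@src'

print(len(selector.xpath(copied)))   # 1
print(selector.xpath(generalized))   # ['/img/1.jpg', '/img/2.jpg', '/img/3.jpg']
```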

QQ Technology Group: 538742639

For the project source code, follow the public account fullstackcourse and reply "Beautiful picture downloader" to get it.
