The new XPath crawler is very comfortable to use.

Source: Internet
Author: User
I am using the PyCharm editor with Python 3.5. First, let's take a look at the source code and the results.
# @Time: 2018/10/25
import requests
from lxml import etree

# Browser-like request header, passed to requests.get below
headers = {"User-Agent": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1 Trident/5.0 ;"}
# Download the page source as text
html = requests.get("http://tu.duowan.com/tu", headers=headers).text
# Parse the source so that it can be queried with XPath
xpath_url = etree.HTML(html)
picture_url = xpath_url.xpath("//ul[@id='pic-list']/li/a/img/@src")
for i in picture_url:
    # Use part of the image URL as the file name
    picture_code = i[40:-4]
    with open("./picture/%s.jpg" % picture_code, "wb") as file:
        file.write(requests.get(i).content)

import requests
from lxml import etree
The first two lines import two libraries, so that requests and XPath can be used in the rest of the script.

headers = {"User-Agent": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1 Trident/5.0 ;"}
This line defines a request header for the crawler so that the server treats the request as coming from a browser. It is passed to requests.get later.
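
To see what the header line changes, the request can be built without being sent. This is only a minimal sketch, assuming the requests library is installed; nothing goes over the network, and the name prepared is just for illustration.

```python
import requests

headers = {"User-Agent": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1 Trident/5.0 ;"}

# Build the request but do not send it, so the headers can be inspected offline.
prepared = requests.Request("GET", "http://tu.duowan.com/tu", headers=headers).prepare()

# This is the User-Agent the server would receive.
print(prepared.headers["User-Agent"])

# Without a headers argument, requests identifies itself with this default value.
print(requests.utils.default_user_agent())
```

The second print shows why the header matters: the default value announces a script, not a browser.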

html = requests.get("http://tu.duowan.com/tu", headers=headers).text
This follows the general form requests.get(url).text, which requests the page and returns its source code as text.

xpath_url = etree.HTML(html)
picture_url = xpath_url.xpath("//ul[@id='pic-list']/li/a/img/@src")
The first line converts the page source into a form that XPath can query. This step is very important: beginners easily forget it, which causes errors later on. The second line is ordinary XPath syntax.
If you want to learn it, see http://www.w3school.com.cn/xpath/index.asp
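
The two steps can be tried offline on a small piece of HTML. The sketch below assumes lxml is installed; the markup and the example.com URLs are made up for illustration, and the real page may be structured differently.

```python
from lxml import etree

# Hypothetical page fragment, not the real site's markup.
html = """
<html><body>
  <ul id="pic-list">
    <li><a href="/a"><img src="http://example.com/img/one.jpg"></a></li>
    <li><a href="/b"><img src="http://example.com/img/two.jpg"></a></li>
  </ul>
  <ul id="other">
    <li><a href="/c"><img src="http://example.com/img/skip.jpg"></a></li>
  </ul>
</body></html>
"""

# Step 1: parse the text into an element tree.
tree = etree.HTML(html)

# Step 2: select the src attribute of every img under the pic-list ul.
picture_url = tree.xpath("//ul[@id='pic-list']/li/a/img/@src")
print(picture_url)

# Skipping step 1 fails, because a plain string has no xpath method.
try:
    html.xpath("//img/@src")
except AttributeError as exc:
    print("forgot etree.HTML:", exc)
```

Only the two images inside the list with id pic-list are returned; the image in the other list is ignored.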

for i in picture_url:
    picture_code = i[40:-4]
    with open("./picture/%s.jpg" % picture_code, "wb") as file:
        file.write(requests.get(i).content)
The first line is a simple for loop; any book or tutorial explains it.
The second line slices the string to obtain the part of the URL that differs for each image. You can name the files any way you like, depending on your preferences.
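
The slice i[40:-4] only works while every image URL has the same 40-character prefix and a 4-character extension. As a sketch, here is the fixed-position slice next to a position-independent alternative from the standard library. The URL and the function name image_code are hypothetical.

```python
import os
from urllib.parse import urlparse

# Hypothetical image URL; its prefix is shorter than the 40 characters the article assumes.
prefix = "http://img.example.com/uploads/"
url = prefix + "5bd1a2f3c4.jpg"

# Fixed-position slice, in the spirit of i[40:-4]: drop the prefix and ".jpg".
by_slice = url[len(prefix):-4]

def image_code(image_url):
    """Return the last path component of the URL without its extension."""
    file_name = os.path.basename(urlparse(image_url).path)
    return os.path.splitext(file_name)[0]

by_path = image_code(url)
print(by_slice, by_path)
```

Both give the same name here, but image_code keeps working if the prefix length or the extension changes.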
The third line uses the with open() as f: structure, which opens a file and assigns it to the variable f. open("./picture/%s.jpg" % picture_code, "wb") opens a file in the picture directory under the current directory; that directory has to be created in advance. You can also create it in code with os.makedirs('./picture'), but then you have to import the os module, which makes the script slightly more complex. Either way works; it depends on your preferences. Creating directories in code is highly recommended when you crawl a lot at once, especially when many different directories are needed.
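
A minimal sketch of creating the directory in code, using only the standard library. It works inside a temporary directory so it does not touch real files; in the crawler the path would simply be './picture'.

```python
import os
import tempfile

# Temporary base directory, used here only to keep the sketch self-contained.
base = tempfile.mkdtemp()
picture_dir = os.path.join(base, "picture")

# exist_ok=True (Python 3.2+) makes the call safe to repeat on later runs.
os.makedirs(picture_dir, exist_ok=True)
os.makedirs(picture_dir, exist_ok=True)  # second call does nothing, no error

print(os.path.isdir(picture_dir))  # prints True
```

Without exist_ok=True, the second call would raise FileExistsError, so the script would fail the second time it is run.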
This creates a file named after the image code and opens it in "wb" mode, that is, for writing in binary format, because images are binary data.
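
The difference between "wb" and plain "w" can be checked with a few bytes standing in for image data. This sketch uses only the standard library; the file names are arbitrary.

```python
import os
import tempfile

data = bytes([0xFF, 0xD8, 0xFF, 0xE0])  # a few raw bytes standing in for image data
folder = tempfile.mkdtemp()

# Text mode refuses bytes: write() expects str there.
text_mode_error = None
try:
    with open(os.path.join(folder, "text_mode.jpg"), "w") as file:
        file.write(data)
except TypeError as exc:
    text_mode_error = exc

# Binary mode writes the bytes unchanged.
path = os.path.join(folder, "binary_mode.jpg")
with open(path, "wb") as file:
    file.write(data)

with open(path, "rb") as file:
    read_back = file.read()

print(read_back == data, type(text_mode_error).__name__)
```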
The fourth line writes requests.get(i).content into the file. If you find it easier to read, you can first assign picture = requests.get(i).content and then call file.write(picture). This is a matter of personal preference, but do not forget .content after get(i). Without it, picture is the response object rather than the image bytes; printed, it shows only the status code, for example <Response [200]> when the request succeeded, or another code when it did not.
To learn more about how to use open, see http://www.runoob.com/python/python-func-open.html
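
Since the response carries both a status code and the image bytes, the write step can check the first before saving the second. The following is only a sketch: save_image is a hypothetical helper, and the SimpleNamespace objects are stand-ins for what requests.get(i) would return, so that no network access is needed.

```python
import os
import tempfile
from types import SimpleNamespace

def save_image(response, path):
    """Write response.content to path only when the status code is 200."""
    if response.status_code != 200:
        return False
    with open(path, "wb") as file:
        file.write(response.content)  # the bytes, not the response object
    return True

# Stand-ins with the two attributes the helper uses.
ok = SimpleNamespace(status_code=200, content=b"fake image bytes")
missing = SimpleNamespace(status_code=404, content=b"")

target = os.path.join(tempfile.mkdtemp(), "demo.jpg")
saved = save_image(ok, target)
skipped = save_image(missing, target)
print(saved, skipped)  # prints True False
```

In the crawler, the call would look like save_image(requests.get(i), "./picture/%s.jpg" % picture_code).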
