I am using the PyCharm editor and Python 3.5. First, let's take a look at the source code and the results.
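# @Time: 2018/10/25
import requests
from lxml import etree

# Send a browser User-Agent so the server treats the crawler like a normal browser
headers = {"User-Agent": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"}
# Download the page and take its HTML source as a string
html = requests.get("http://tu.duowan.com/tu", headers=headers).text
# Parse the HTML into an element tree that supports XPath queries
xpath_url = etree.HTML(html)
# Collect the src attribute of every image in the picture list
picture_url = xpath_url.xpath("//ul[@id='pic-list']/li/a/img/@src")
for i in picture_url:
    # Use part of the URL as the file name (skip the first 40 characters and the extension)
    picture_code = i[40:-4]
    # The ./picture directory must already exist
    with open("./picture/%s.jpg" % picture_code, "wb") as file:
        file.write(requests.get(i).content)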
import requests
from lxml import etree
The first two lines import the two libraries we need: requests for downloading pages, and etree from lxml for parsing HTML with XPath. Both are explained further below.
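headers = {"User-Agent": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"}
This line defines a request header for the crawler. Sending a browser's User-Agent (here Internet Explorer 9 on Windows 7) makes the server treat the crawler as an ordinary browser visit. The headers dictionary is passed to requests.get() in the next line.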
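To see what the header actually changes, the sketch below (an illustration, not part of the original crawler) asks requests to build the request without sending it and compares the User-Agent with and without the headers dictionary. It assumes only that requests is installed; nothing goes over the network.

```python
import requests

URL = "http://tu.duowan.com/tu"
headers = {"User-Agent": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"}

session = requests.Session()
# prepare_request() builds the final request, including default headers, without sending it
plain = session.prepare_request(requests.Request("GET", URL))
disguised = session.prepare_request(requests.Request("GET", URL, headers=headers))

print(plain.headers["User-Agent"])      # something like python-requests/2.x, easy for a server to spot
print(disguised.headers["User-Agent"])  # the browser string from the headers dictionary
```

Without the dictionary, requests identifies itself as python-requests, which is exactly what some sites filter out.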
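html = requests.get("http://tu.duowan.com/tu", headers=headers).text
The general pattern is requests.get(url).text: requests.get() downloads the page and returns a Response object, and .text gives its body (the HTML source) as a string.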
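A minimal sketch of what requests.get(url).text gives back. So that it runs without internet access, it assumes a small local test server (the PageHandler class and the PAGE string are made up for this example) standing in for tu.duowan.com.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

PAGE = "<html><body><ul id='pic-list'></ul></body></html>"  # made-up page body


class PageHandler(BaseHTTPRequestHandler):
    """Serves PAGE for every GET so the example runs offline."""

    def do_GET(self):
        body = PAGE.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


server = HTTPServer(("127.0.0.1", 0), PageHandler)  # port 0 = pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/tu" % server.server_address[1]

response = requests.get(url)  # a Response object, not the page itself
html = response.text          # the body decoded to str
print(response)               # <Response [200]>
print(type(html).__name__, type(response.content).__name__)  # str bytes
server.shutdown()
```

.text is what you hand to the HTML parser; .content (raw bytes) is what you use for images later on.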
xpath_url = etree.HTML(html)
picture_url = xpath_url.xpath("//ul[@id='pic-list']/li/a/img/@src")
The first line converts the page source into an element tree that XPath can query. This step is very important: beginners easily forget it, and then the next line fails because a plain string has no xpath() method. The second line is ordinary XPath syntax: it selects the src attribute of each img inside an a inside each li of the ul whose id is pic-list.
If you want to learn XPath, see this tutorial: http://www.w3school.com.cn/xpath/index.asp
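The sketch below runs the same etree.HTML() conversion and XPath expression on a small HTML snippet. SAMPLE_HTML is made up and only imitates the structure the expression expects, not the real page, so you can see what the query returns without crawling anything.

```python
from lxml import etree

# Made-up page: one matching list and one list the expression should ignore
SAMPLE_HTML = """
<html><body>
  <ul id="pic-list">
    <li><a href="/a"><img src="http://example.com/img/0001.jpg"></a></li>
    <li><a href="/b"><img src="http://example.com/img/0002.jpg"></a></li>
  </ul>
  <ul id="other"><li><a><img src="http://example.com/ignored.jpg"></a></li></ul>
</body></html>
"""

tree = etree.HTML(SAMPLE_HTML)  # string -> element tree; a plain str has no .xpath()
srcs = [str(s) for s in tree.xpath("//ul[@id='pic-list']/li/a/img/@src")]
print(srcs)  # ['http://example.com/img/0001.jpg', 'http://example.com/img/0002.jpg']
```

Note that the tag and attribute names in the expression are lowercase: the HTML parser normalizes them that way, and XPath matching is case-sensitive.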
for i in picture_url:
    picture_code = i[40:-4]
    with open("./picture/%s.jpg" % picture_code, "wb") as file:
        file.write(requests.get(i).content)
The first line is a simple for loop over the image URLs; any Python book or tutorial covers it.
The second line slices the URL string to extract a distinct identifier for each image, which is used as its file name. You can name the files however you like.
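Slicing with i[40:-4] skips the first 40 characters and drops the last four (".jpg"), so it only works while every image URL has a prefix of exactly 40 characters. The sketch below uses made-up URLs to show this, plus a hypothetical picture_name() helper, an alternative rather than the original method, that takes the file name from the URL path instead.

```python
import posixpath
from urllib.parse import urlparse

# Made-up URL whose prefix "http://example.com/img/2018/10/25/photo/" is exactly 40 characters
url = "http://example.com/img/2018/10/25/photo/abc123.jpg"
print(url[40:-4])  # abc123  (skip 40 characters, drop ".jpg")

short_url = "http://example.com/p/xyz.png"
print(repr(short_url[40:-4]))  # '' : the fixed offset no longer matches


def picture_name(picture_url):
    """Hypothetical helper: file name from the URL path, without its extension."""
    path = urlparse(picture_url).path  # drops scheme, host and any ?query
    stem, _extension = posixpath.splitext(posixpath.basename(path))
    return stem


print(picture_name(url))        # abc123
print(picture_name(short_url))  # xyz
```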
The third line uses the with open(...) as file: structure, which opens a file and binds it to the variable file; the file is closed automatically when the block ends. open("./picture/%s.jpg" % picture_code, "wb") opens a file inside the picture directory under the current directory, and that directory must be created beforehand, otherwise open() fails. You can also create it in code with os.makedirs('./picture'), but then you have to import the os module, which makes the script a little more complex. Either way is fine, depending on your preference. Creating directories in code is highly recommended when you crawl a lot at once, especially when you need many different directories.
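A minimal sketch of the os.makedirs() approach. It works in a temporary directory (an assumption so the example does not touch your project); in the crawler you would call os.makedirs("./picture", exist_ok=True) once before the loop. exist_ok=True (available since Python 3.2) keeps the call from failing when the directory already exists.

```python
import os
import tempfile

# Work inside a throwaway directory so the sketch does not touch your project
base = tempfile.mkdtemp()
picture_dir = os.path.join(base, "picture")

os.makedirs(picture_dir, exist_ok=True)  # creates the picture directory
os.makedirs(picture_dir, exist_ok=True)  # already exists: no FileExistsError thanks to exist_ok

# makedirs also creates missing parent directories in one call
nested_dir = os.path.join(picture_dir, "2018", "10")
os.makedirs(nested_dir, exist_ok=True)

with open(os.path.join(picture_dir, "demo.jpg"), "wb") as file:
    file.write(b"placeholder bytes")  # not a real image

print(sorted(os.listdir(picture_dir)))  # ['2018', 'demo.jpg']
```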
The open() call creates a file named after the image's identifier and opens it in "wb" mode (write, binary), because image data is binary rather than text.
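A short sketch of why the mode matters: text mode ("w") only accepts str, while binary mode ("wb") writes bytes unchanged. The file path and the sample bytes are made up for the illustration.

```python
import os
import tempfile

data = b"\xff\xd8\xff\xe0"  # sample binary data, e.g. the first bytes of a JPEG
path = os.path.join(tempfile.mkdtemp(), "sample.jpg")

# Text mode ("w") expects str, so writing bytes raises TypeError
try:
    with open(path, "w") as file:
        file.write(data)
    text_mode_ok = True
except TypeError:
    text_mode_ok = False

# Binary mode ("wb") writes the bytes as they are
with open(path, "wb") as file:
    file.write(data)

with open(path, "rb") as file:
    round_trip = file.read()

print(text_mode_ok, round_trip == data)  # False True
```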
The fourth line writes requests.get(i).content into the file. If you find it easier to read, you can first assign picture = requests.get(i).content and then call file.write(picture). Both are a matter of personal preference, but do not forget .content after get(i). Without it, picture is the Response object itself rather than the image data: printing it only shows the status code, probably <Response [200]> when the request succeeded or another code when it failed, and file.write() will reject it.
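The sketch below shows the difference between the Response object and its .content, and adds a status check before saving. save_picture() is a hypothetical helper, not part of the original code, and a small local test server with made-up paths (/ok.jpg returns placeholder bytes, anything else returns 404) stands in for the real image host.

```python
import os
import tempfile
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

FAKE_IMAGE = b"\xff\xd8\xff\xe0 not a real jpeg"  # placeholder bytes


class ImageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/ok.jpg":
            self.send_response(200)
            self.send_header("Content-Type", "image/jpeg")
            self.send_header("Content-Length", str(len(FAKE_IMAGE)))
            self.end_headers()
            self.wfile.write(FAKE_IMAGE)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def save_picture(url, path):
    """Hypothetical helper: write the image bytes to path only if the server answered 200."""
    response = requests.get(url)
    print(response)  # the object itself only shows the status, e.g. <Response [200]>
    if response.status_code != 200:
        return False
    with open(path, "wb") as file:
        file.write(response.content)  # .content holds the raw bytes of the body
    return True


server = HTTPServer(("127.0.0.1", 0), ImageHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]
out_dir = tempfile.mkdtemp()

ok = save_picture(base + "/ok.jpg", os.path.join(out_dir, "ok.jpg"))
missing = save_picture(base + "/missing.jpg", os.path.join(out_dir, "missing.jpg"))
server.shutdown()
```

Checking the status first avoids saving an error page under a .jpg name.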
To learn more about open(), see: http://www.runoob.com/python/python-func-open.html
For a beginner, writing a crawler with XPath like this is very comfortable.