Python3 web crawler, part 1: What is a web crawler?
1. What is a crawler?
First, a brief look at crawlers. Crawling is the process of requesting a website and extracting the required data from it. How to crawl is covered later. Instead of a person opening each page in a browser, our program sends the requests to the server and can then download large amounts of data in batches.
2. Basic crawler process
3. What is a request?
When we send a request to the server through a browser, what information does the request contain? Chrome's Developer Tools can show us (if you do not know how to use them, see the note at the end of this article).
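The parts of a request can also be inspected from Python. The following is a minimal sketch using the requests library; it builds a request without sending it, so it runs offline. The URL, the query parameter, and the User-Agent value are placeholders chosen for illustration, not values from this article.

```python
import requests

# Build a request without sending it, so its parts can be inspected offline.
# The URL, query parameter, and User-Agent below are illustrative placeholders.
request = requests.Request(
    method="GET",
    url="https://www.example.com/search",
    params={"q": "python"},
    headers={"User-Agent": "Mozilla/5.0 (example)"},
)
prepared = request.prepare()

print(prepared.method)         # request method
print(prepared.url)            # full URL, including the encoded query string
print(dict(prepared.headers))  # request headers
print(prepared.body)           # request body (None for a plain GET)
```

These four items (method, URL, headers, body) are the same ones the Network panel in Chrome shows for each request.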
4. What does a response contain?
5. Simple request demonstration
Use the Python requests library to request a webpage:
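A minimal sketch of such a request is shown here. The function name `fetch_html` and the 10-second timeout are choices made for this example, not part of the original article.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the body of the response for `url` as text."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text


# Example call (needs network access):
# print(fetch_html("https://www.baidu.com"))
```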
The output is the page source before any rendering, that is, the content of the response body. You can also view the response headers:
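One way to look at the response headers is sketched below. The helper name `response_headers` is hypothetical; `response.headers` is the attribute the requests library provides.

```python
import requests


def response_headers(url, timeout=10):
    """Request `url` and return its response headers as a plain dict."""
    response = requests.get(url, timeout=timeout)
    # response.headers is a case-insensitive mapping; dict() makes a plain copy.
    return dict(response.headers)


# Example call (needs network access):
# for name, value in response_headers("https://www.baidu.com").items():
#     print(f"{name}: {value}")
```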
View the status code:
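A short sketch of checking the status code follows. The function name `is_success` is made up for this example; `response.status_code` and `requests.codes.ok` come from the requests library.

```python
import requests


def is_success(url, timeout=10):
    """Return True when the server answers `url` with status 200 (OK)."""
    response = requests.get(url, timeout=timeout)
    print(response.status_code)  # e.g. 200 for OK, 404 for Not Found
    return response.status_code == requests.codes.ok


# Example call (needs network access):
# is_success("https://www.baidu.com")
```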
You can also add request headers to the request:
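Custom headers are passed through the `headers` argument of `requests.get`. In this sketch the User-Agent string is only an example of a browser-like value, and `fetch_with_headers` is a hypothetical helper name.

```python
import requests

# A browser-like User-Agent; the exact string is only an example.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    )
}


def fetch_with_headers(url, headers=None, timeout=10):
    """Request `url` with custom request headers and return the page text."""
    response = requests.get(url, headers=headers or HEADERS, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text


# Example call (needs network access):
# print(fetch_with_headers("https://www.baidu.com"))
```

After a real request, `response.request.headers` shows the headers that were actually sent.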
Download an image (the Baidu logo):
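Images are binary data, so the bytes of the response are written to a file rather than printed. This is a minimal sketch; `download_image` is a hypothetical helper, and the image address is left as a parameter because the exact logo URL is not given here.

```python
import requests


def download_image(url, path, timeout=10):
    """Save the binary body of the response for `url` to the file `path`."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    # response.content holds the raw bytes; response.text would try to decode them.
    with open(path, "wb") as image_file:
        image_file.write(response.content)
    return len(response.content)  # number of bytes written


# Example call (needs network access; logo_url is the address of the image):
# download_image(logo_url, "logo.png")
```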
6. How to solve JavaScript rendering problems
Use Selenium WebDriver:
Run print(driver.page_source) and you will see that this is the rendered code.
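The Selenium approach can be sketched as a small function. It assumes Chrome and a matching driver are available on the machine; the function name `rendered_source` is made up for this example. Pages that load data asynchronously may need an explicit wait before `page_source` is read.

```python
def rendered_source(url):
    """Load `url` in Chrome and return the HTML after JavaScript has run."""
    # Imported here so the file still loads on machines without Selenium.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumes Chrome and its driver are installed
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors


# Example call (opens a browser window and needs network access):
# print(rendered_source("https://www.baidu.com"))
```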
[Note] Using Chrome Developer Tools
The Elements tab shows the page's HTML in a readable, structured form.
The Network tab lists the requests made by the browser. Click a request to view its details, such as the request headers and response headers.
Learning video on YouTube (Elnino Chen): https://www.youtube.com/channel/UC0gXu_5GOwzAaxkFymbwRhg