Download an entire website with Python
A tool, written in Python, that downloads an entire website.
The core process is simple:
1. Enter the website address
2. Request the URL and get the response content.
3. Check the content type in the HTTP response header. If the type is HTML, continue with step 4; if it is any other type, go to step 6.
4. Extract the values of the href and src attributes from the HTML.
5. Add each extracted URL to the download queue. If the URL is already in the queue, discard it.
6. Take the next URL from the queue.
7. Repeat the loop until every URL in the queue has been processed.
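The seven steps above can be sketched as a breadth-first loop. This is a minimal illustration, not the author's downloadable source. Assumptions: `fetch` is a hypothetical callable returning `(content_type, body_bytes)` so the loop can run without a network, pages are kept in a dict instead of being written to disk, and only URLs that start with the start address are followed so the crawl stays on one site. Links are extracted with the standard library's `html.parser`.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect the value of every href and src attribute (step 4)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def crawl(start_url, fetch, max_pages=1000):
    """Breadth-first crawl following steps 1-7.

    `fetch(url)` is a hypothetical callable returning (content_type, body_bytes).
    Returns a dict mapping each visited URL to its body, in visit order.
    Assumption: only URLs starting with `start_url` are followed.
    """
    queue = deque([start_url])           # step 1: the start address
    seen = {start_url}                   # step 5: every URL ever queued
    saved = {}
    while queue and len(saved) < max_pages:
        url = queue.popleft()            # step 6: next URL in the queue
        content_type, body = fetch(url)  # step 2: get the response
        saved[url] = body
        if "text/html" not in content_type:
            continue                     # step 3: non-HTML is only saved
        parser = LinkExtractor()
        parser.feed(body.decode("utf-8", errors="replace"))
        parser.close()
        for link in parser.links:        # step 4: href/src values
            # Resolve relative links and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, link))
            if absolute.startswith(start_url) and absolute not in seen:
                seen.add(absolute)       # step 5: skip duplicates
                queue.append(absolute)
    return saved
```

A set (`seen`) makes the duplicate check in step 5 constant-time, and `urldefrag` makes `a.html` and `a.html#top` count as the same file.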
The process looks simple, but handling the many details takes a lot of time.
For example, how should files be named for the different kinds of URLs, and in particular for URLs that contain a question mark?
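One possible naming scheme, sketched under assumptions: the function name `local_path` and its rules are hypothetical, not what the original program does. A path ending in `/` is saved as `index.html`, and a query string is replaced by a short SHA-1 digest, so `page.php?id=1` and `page.php?id=2` get distinct file names that contain no `?` (a character Windows does not allow in file names).

```python
import hashlib
import posixpath
from urllib.parse import urlsplit


def local_path(url):
    """Map a URL to a relative file path (hypothetical naming scheme).

    - a path ending in "/" becomes ".../index.html"
    - a query string is replaced by an 8-character SHA-1 digest inserted
      before the extension, so different queries give different files
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"
    if parts.query:
        stem, ext = posixpath.splitext(path)
        digest = hashlib.sha1(parts.query.encode("utf-8")).hexdigest()[:8]
        path = f"{stem}_{digest}{ext}"
    # The host name becomes the top-level directory.
    return parts.netloc + path
```

With this scheme, links inside the saved HTML pages would also need rewriting to point at the new names; that step is not shown.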
Currently, this program has the following problems:
1. Opening a URL with urllib.request can block, so the program may hang at that one call and stop making progress.
2. When the URL queue is long, downloading is slow; multithreaded downloading would be faster.
3. I do not know how many other errors there are. The comments are written in English, because writing them in Chinese means switching input methods back and forth.
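For problem 1, `urllib.request.urlopen` accepts a `timeout` argument in seconds; without it, the call falls back to the global socket default, which is no timeout unless it has been changed. A minimal sketch (the function name, the 10-second default and the User-Agent string are assumptions) that gives up on a URL instead of hanging:

```python
import socket
import urllib.error
import urllib.request


def fetch(url, timeout=10.0):
    """Fetch a URL without waiting forever (problem 1).

    Returns (content_type, body_bytes), or None if the URL is invalid,
    the request failed, or it did not finish within `timeout` seconds.
    """
    try:
        # Building the Request is inside the try: a malformed URL
        # raises ValueError already here.
        request = urllib.request.Request(
            url, headers={"User-Agent": "site-downloader/0.1"}
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            content_type = response.headers.get("Content-Type", "")
            return content_type, response.read()
    except (urllib.error.URLError, socket.timeout, ValueError) as exc:
        # HTTPError is a subclass of URLError, so 404/500 land here too.
        print(f"skipped {url}: {exc}")
        return None
```

Returning `None` lets the download loop log the failure and move on to the next URL in the queue.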
Currently, the program does not support multithreading; this will be improved in the future.
Anyone interested in improving it is very welcome to contribute.
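Since downloading is I/O-bound, a thread pool from `concurrent.futures` is one straightforward way to address problem 2. The sketch below uses a hypothetical `fetch_all` helper; `fetch` is any callable that takes a URL and returns its content, and the default of 8 workers is an arbitrary assumption. It fetches a batch of URLs taken from the queue in parallel:

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, workers=8):
    """Call fetch(url) for every URL using a pool of threads.

    Returns a dict mapping each URL to its result.
    """
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as `urls`.
        results = pool.map(fetch, urls)
        return dict(zip(urls, results))
```

If `fetch` raises, the exception resurfaces when its result is read, so `fetch` should handle its own errors. Threads help here because CPython releases the GIL while waiting on network I/O.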
Source code download: http://download.csdn.net/detail/jiangxiaoma111/8002631
Personal email: 369806726@qq.com
How can Python download multiple files from a website?
Generally, you can use a download tool, for example the DownThemAll extension for Firefox.
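The same idea takes only a few lines of Python: collect the `<a href>` links on a page, keep those whose path ends in chosen file extensions (similar to the filters in DownThemAll), and download each one. This is a sketch; the helper names, the default extension list and the `downloads` directory are assumptions.

```python
import os
import posixpath
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def file_links(page_url, html, extensions=(".zip", ".pdf", ".tgz")):
    """Return absolute, de-duplicated URLs whose path ends with an extension."""
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    found = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)
        # Case-insensitive match; str.endswith accepts a tuple.
        if urlsplit(absolute).path.lower().endswith(extensions) and absolute not in found:
            found.append(absolute)
    return found


def download_files(urls, dest_dir="downloads"):
    """Save each URL under dest_dir, named after the last path segment."""
    os.makedirs(dest_dir, exist_ok=True)
    for url in urls:
        name = posixpath.basename(urlsplit(url).path)
        urllib.request.urlretrieve(url, os.path.join(dest_dir, name))
```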
Python download
www.python.org/ftp/python/