Mirroring a website with wget in one line

I recently bookmarked some old sites that have not been updated for a long time but still hold many useful resources. Bookmarks are not reliable: who knows which day those sites will suddenly disappear. So I decided to mirror them.

For example, to mirror www.oschina.net, one line is enough:

    wget -c -m -k -np -p -w 10 --random-wait --waitretry=5 http://www.oschina.net/

What the options mean:

-c, --continue: resume a partially downloaded file instead of starting over.

-m, --mirror: turn on the options suited to mirroring a site (recursive download with time-stamping).

-k, --convert-links: after the download finishes, convert the links in the downloaded documents so they work for local browsing, using relative links wherever possible.

-np, --no-parent: never ascend to the parent directory.

-p, --page-requisites: download everything needed to display an HTML page, including images, sounds, and style sheets.

-w, --wait: wait the given number of seconds between two consecutive requests. Adding --wait is recommended when mirroring, so the server is not put under too much load.

--random-wait: used together with --wait. The pause between requests varies randomly between 0.5 and 1.5 times the --wait value, which also makes the download harder to spot for sites that analyze their logs.

--waitretry: the maximum number of seconds to wait between retries of a failed download. wget backs off linearly, waiting 1 second after the first failure on a file, 2 seconds after the second, and so on up to this limit. The number of retries itself is set by -t (--tries).

If a site opens fine in a browser but wget reports "HTTP request sent, awaiting response... 403 Forbidden", use -U (--user-agent) to change the User-Agent header that wget sends.
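In a script, the same one-liner can be easier to read with the long option names. The following is a minimal sketch, not a definitive wrapper: `mirror_command` is a hypothetical helper name, and it only prints the command instead of running it, so nothing is downloaded until you decide to run the printed line.

```shell
#!/bin/sh
# Sketch only: mirror_command is a hypothetical helper, not part of wget.
# It prints the mirror command with long option names and does not run it.
mirror_command() {
    url=$1
    wait_seconds=${2:-10}   # pause between requests (-w), default from the article
    retry_cap=${3:-5}       # value passed to --waitretry, default from the article
    printf '%s\n' "wget --continue --mirror --convert-links --no-parent --page-requisites --wait=$wait_seconds --random-wait --waitretry=$retry_cap $url"
}

# Print the command for the site used in the article.
mirror_command http://www.oschina.net/
```

Passing a second and third argument changes the wait and retry values, for example `mirror_command http://www.oschina.net/ 3 2`.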
You can use nc to see the headers your browser sends. Run:

    nc -l 8000

Open a browser (Chrome is used here), visit http://localhost:8000, and look at the nc output:

    GET / HTTP/1.1
    Host: localhost:8000
    Connection: keep-alive
    User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.20.1.101 Safari/537.11
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    Accept-Encoding: gzip,deflate,sdch
    Accept-Language: en-US,en;q=0.8
    Accept-Charset: ISO-8859-1,UTF-8;q=0.7,*;q=0.3

The User-Agent field is what we want. So the final command is:

    wget -U "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.20.1.101 Safari/537.11" -c -m -k -np -p -w 10 --random-wait --waitretry=5 http://www.oschina.net/

It works. What follows is a long wait...
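Copying the User-Agent by hand from the nc output is error-prone, so the step can be scripted. The sketch below is one possible way to do it, under stated assumptions: `extract_user_agent` is a hypothetical helper name, and the sample request with its `ExampleBrowser/1.0` string is a made-up placeholder, not real browser output. The final command is only printed, not executed.

```shell
#!/bin/sh
# Sketch only: extract_user_agent is a hypothetical helper.
# It reads raw HTTP request headers on stdin and prints the User-Agent value.
extract_user_agent() {
    # HTTP header lines end in CRLF, so drop the CR first.
    # Header names are case-insensitive, so compare in lower case.
    tr -d '\r' | awk '
        tolower($0) ~ /^user-agent:/ && !found {
            sub(/^[^:]*:[ \t]*/, "")   # remove the header name and leading blanks
            print
            found = 1                  # keep only the first match
        }'
}

# Placeholder request for illustration; in practice this would be the saved nc output.
sample_request='GET / HTTP/1.1
Host: localhost:8000
User-Agent: ExampleBrowser/1.0 (placeholder)
Accept: text/html'

ua=$(printf '%s\n' "$sample_request" | extract_user_agent)

# Print the wget command with the captured User-Agent (not executed here).
printf 'wget -U "%s" -c -m -k -np -p -w 10 --random-wait --waitretry=5 http://www.oschina.net/\n' "$ua"
```

To use it with a real capture, save the nc output to a file and feed that file to `extract_user_agent` on standard input. Note that some netcat variants expect `nc -l -p 8000` rather than `nc -l 8000`.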