These days I prefer to read e-books in HTML. Jumping back and forth in a PDF is inconvenient, and PDF readers tend to be bloated. For e-books that don't need special layout features like footers, HTML should be the first choice: all you need is a browser to read them, and it's fast.
wget on Linux is a powerful tool for mirroring websites. Put an alias in ~/.bashrc, alias getsite='wget -r -k -p -np', and then all it takes is:

getsite http://URL/to/html/book

and that's it.
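For reference, here is what each flag in that alias does (the alias name getsite is just my own choice):

# Mirror a site for offline reading
#   -r    recurse into linked pages
#   -k    convert links so the local copy works offline
#   -p    also fetch page requisites (images, CSS) for each page
#   -np   never ascend to the parent directory
alias getsite='wget -r -k -p -np'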
But today I ran into a website that opens fine in a browser, yet wget immediately got a 403. At first I assumed robots.txt was restricting wget, but the error persisted even after adding -e robots=off to make wget ignore robots. After some searching it became clear that some sites block the wget user agent, presumably to prevent whole-site downloads that cause excessive traffic and piracy. (Well, it was rather evil of me to try to download this whole site......)
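For the record, the robots override I tried looks like this; since bash appends extra arguments after the expanded alias, it can be passed straight to getsite (it made no difference here, because the block was on the user agent, not robots.txt):

# Ignore robots.txt; did not help against a user-agent block
getsite -e robots=off http://URL/to/html/book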
Once the cause is found, the problem is easy to solve: pass wget the extra parameter -U nosuchbrowser/1.0. The user agent the server sees is then no longer wget, and the download goes through smoothly......
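Put together, the full command looks like this (nosuchbrowser/1.0 is an arbitrary made-up string; any value that doesn't identify as wget would do):

# Spoof the user agent so the server no longer sees "wget"
wget -r -k -p -np -U 'nosuchbrowser/1.0' http://URL/to/html/book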
The problem is solved, but one last reminder: if a website blocks wget, there is presumably a reason. It's best not to use wget to download from it, and certainly not to pirate......
For more information, see http://www.cnblogs.com/stephenjy/archive/2010/02/17/1668937.html