How to get around anti-crawler measures
Cookie pool: switching cookies amounts to switching users.
Proxy pool: switching proxies amounts to switching IP addresses.
Disguise the request as a browser: add User-Agent and Referer headers.
Set a delay between requests, e.g. time.sleep(1).
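The four measures above can be combined in a single request helper. What follows is only a minimal sketch using the standard library. The pools COOKIE_POOL, PROXY_POOL and USER_AGENTS and the function names are made up for illustration; real values would have to come from accounts and proxies you actually control.

```python
import random
import time
import urllib.request

# Hypothetical pools for illustration only; replace with real values.
COOKIE_POOL = ["sessionid=aaa", "sessionid=bbb"]
PROXY_POOL = ["http://127.0.0.1:8001", "http://127.0.0.1:8002"]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0",
]


def build_request(url, referer):
    """Build a request that looks like a browser and carries a random cookie."""
    headers = {
        "User-Agent": random.choice(USER_AGENTS),
        "Referer": referer,
        # A different cookie looks like a different user to the site.
        "Cookie": random.choice(COOKIE_POOL),
    }
    return urllib.request.Request(url, headers=headers)


def make_opener(proxy):
    """Return an opener that routes traffic through the given proxy (a different IP)."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)


def fetch(url, referer, delay=1.0):
    """Fetch one page through a random proxy, then sleep to space requests out."""
    opener = make_opener(random.choice(PROXY_POOL))
    with opener.open(build_request(url, referer), timeout=10) as resp:
        body = resp.read()
    time.sleep(delay)
    return body
```

Calling fetch("http://example.com/page", "http://example.com/") would pick a fresh User-Agent, cookie and proxy on every call. The same idea carries over to requests, which accepts headers and proxies as keyword arguments.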
A few basic requirements:

1. Fetching. Python's urllib is not necessarily what you will end up using, but learn it if you have not used it before. Third-party libraries such as requests are friendlier and more mature alternatives; a Python programmer who does not know the libraries available has learned the language for nothing. The most basic part of crawling is pulling the page back. As you go deeper you will run into all kinds of requirements: authentication, different file formats, encoding handling, normalizing oddly formed URLs, avoiding repeated crawls, carrying cookies across requests, multithreaded, multiprocess and multi-node crawling, crawl scheduling, resource compression, and so on. So the first step is simply to pull the page back; the problems to optimize will show up gradually.

2. Storage. Fetched pages are usually saved according to some strategy rather than analyzed immediately. I think the better architecture separates analysis from crawling: with looser coupling, a problem in one stage is isolated from problems in the other, which makes troubleshooting and releasing updates easier. The focus of this stage is how to store the pages: the file system, an SQL or NoSQL database, or an in-memory database. You can start by saving to the file system and naming the files according to some rule.

3. Analysis. Analyze the text of the page, extracting links or body text depending on your needs; extracting links is the one thing you must do. Use whatever method you find quickest and most effective, such as regular expressions. The results of the analysis are then fed to the other stages.

4. Presentation. If you do a pile of work and have no output to show, it is hard to demonstrate its value. So finding a good display component to show off what you have built is also key. Whether you are writing a crawler to build a site or analyzing data about something, do not forget this stage; it is how you present your results to others.
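Fetching, storage and analysis can be kept as separate functions, so that a failure in one is easy to isolate from the others. The following is a rough sketch under several assumptions: it uses only the standard library, the `pages` directory and the hash-based file naming rule are arbitrary choices, and an HTML parser stands in for the regular expressions mentioned in the text.

```python
import hashlib
import urllib.request
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin


def fetch_page(url):
    """Fetching: pull the raw page back as bytes."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def save(url, body, root="pages"):
    """Storage: write the raw bytes to a file named after a hash of the URL."""
    Path(root).mkdir(parents=True, exist_ok=True)
    name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".html"
    path = Path(root) / name
    path.write_bytes(body)
    return path


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(path, base_url):
    """Analysis: read a stored page and return its links as absolute URLs."""
    parser = LinkParser()
    parser.feed(Path(path).read_bytes().decode("utf-8", errors="replace"))
    return [urljoin(base_url, href) for href in parser.links]
```

Because extract_links reads from disk rather than from the network, the analysis step can be rerun or fixed without crawling again, which is the point of separating the stages.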