Keeping a Crawler from Getting Blocked

Source: Internet
Author: User
Tags: server, website

I recently crawled Amazon and found that the crawler I had written earlier is rather "thin". It is fine for static sites or AJAX sites, but against a smarter service such as Amazon it often fails after more than 100 consecutive requests.

I looked into the failures and found that once Amazon sees too many requests from one IP, it redirects to a page that checks whether a program is driving the traffic, that is, a page asking you to enter a verification code (CAPTCHA). Enter the correct code and you can carry on browsing. Beating the verification code is very troublesome work, and insisting on fighting it head-on is really just making things hard for yourself ...

So I Googled around and collected some better solutions.

1. Redial ADSL

As we all know, redialing an ADSL connection gets you a new IP address. So you can write a script that redials ADSL on a timer, or let the crawler run and, as soon as it notices it is being redirected to the verification code page, call the ADSL redial script.
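
The second variant (redial only when the verification page shows up) can be sketched roughly like this. This is only a minimal sketch under assumptions: `CAPTCHA_MARKERS`, `looks_like_captcha` and `fetch_with_redial` are names made up for illustration, the marker text is a guess that has to be adjusted against the real verification page, and `fetch` and `redial` are callables you supply yourself (the actual redial command depends on your system and dial-up setup).

```python
import time

# Hypothetical markers: inspect the real verification page and adjust these.
CAPTCHA_MARKERS = ("captcha",)


def looks_like_captcha(html):
    """Return True if the page text contains any of the assumed markers."""
    lowered = html.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)


def fetch_with_redial(url, fetch, redial, max_redials=3, wait_seconds=0.0):
    """Fetch url; if a verification page comes back, redial and try again.

    fetch(url) must return the page text, redial() must trigger the ADSL
    redial script. Both are supplied by the caller.
    """
    for attempt in range(max_redials + 1):
        html = fetch(url)
        if not looks_like_captcha(html):
            return html
        if attempt < max_redials:
            redial()                  # ask for a new IP address
            time.sleep(wait_seconds)  # give the connection time to come back
    raise RuntimeError("still getting the verification page after redialing")
```

In a real crawler `wait_seconds` would probably need to be a few seconds, since the line takes a moment to come back up after redialing.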

2. Scrape proxy server addresses

Proxy servers are another good way around the blocked-IP problem. I expect everyone already has a favorite proxy server site ~

Proxy server sites change their domain names often, so I won't list any here. Google for them yourself; Baidu may turn up a surprise too ~

Write a regular expression to scrape the proxy servers, and remember: once the scraping is done, you must check whether each proxy is actually usable!
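
As a rough illustration of the regular expression step, here is a small sketch. It assumes the proxy list page prints entries as plain `ip:port` text; many sites put the IP and the port in separate table cells, in which case the pattern has to be adapted. `PROXY_RE` and `extract_proxies` are names invented for this example.

```python
import re

# Matches IPv4:port pairs such as 10.0.0.1:8080; the number ranges are
# validated separately below because the pattern alone would accept 999.1.1.1.
PROXY_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{2,5})\b")


def extract_proxies(text):
    """Return unique 'ip:port' strings found in text, in first-seen order."""
    found = []
    for ip, port in PROXY_RE.findall(text):
        octets_ok = all(0 <= int(part) <= 255 for part in ip.split("."))
        port_ok = 0 < int(port) <= 65535
        candidate = f"{ip}:{port}"
        if octets_ok and port_ok and candidate not in found:
            found.append(candidate)
    return found
```

The result is only a list of candidates; nothing here says whether a proxy still works.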

For the code, you can refer to the checkproxy() function in the following blog post:

https://blog.linuxeye.com/410.html
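
If you just want the idea without reading that post, an availability check might look something like the sketch below. To be clear, this is not the checkproxy() function from the blog above, only a minimal stand-in written with the Python standard library. `check_proxy`, `filter_working` and the default `test_url` are assumptions made for illustration; a stricter check would request the site you actually intend to crawl.

```python
import http.client
import urllib.request


def check_proxy(proxy, test_url="http://example.com/", timeout=5.0):
    """Return True if a request sent through proxy ('ip:port') returns HTTP 200."""
    handler = urllib.request.ProxyHandler(
        {"http": f"http://{proxy}", "https": f"http://{proxy}"}
    )
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(test_url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, timeouts and refused connections are all
        # OSError subclasses; malformed replies raise HTTPException.
        return False


def filter_working(proxies, checker=check_proxy):
    """Keep only the proxies for which checker(proxy) returns True."""
    return [proxy for proxy in proxies if checker(proxy)]
```

Passing the checker in as a parameter makes it easy to swap in a stricter test later, or to try the filtering logic without touching the network.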

Want to see my Amazon crawler?

The code is on GitHub, under Amazon-spider-comments in the spider-comments project:

https://github.com/fankcoder/spider-comments
