A small number of spiders impersonate Baiduspider on the Chinese Internet in order to crawl web pages. At the madcon conference on Saturday, it turned out that many people did not know how to tell a genuine Baiduspider from a fake one, so I would like to explain it again:
On the Chinese Internet, the hostname of a Baiduspider IP address follows the *.baidu.com format. If the hostname is not in the *.baidu.com format, the spider is an impersonator. We recommend that you use a reverse DNS lookup to determine whether the source IP address belongs to Baidu.
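The *.baidu.com rule above can be sketched in Python as a simple suffix check. This is only an illustration: the function name is_baidu_hostname is hypothetical, and the check assumes you have already obtained the hostname from a reverse DNS lookup.

```python
def is_baidu_hostname(hostname: str) -> bool:
    """Return True if hostname has the form <something>.baidu.com.

    Hypothetical helper for illustration; it only inspects the string
    and does not perform any DNS lookup itself.
    """
    # Reverse lookups often return a trailing dot and mixed case,
    # so normalize before comparing.
    name = hostname.strip().rstrip(".").lower()
    # Require a leading dot so that "fakebaidu.com" does not match.
    return name.endswith(".baidu.com")


print(is_baidu_hostname("baiduspider-123-125-66-120.crawl.baidu.com."))  # True
print(is_baidu_hostname("crawl.baidu.com.example.net"))                  # False
```

Matching on ".baidu.com" with the leading dot, rather than on "baidu.com" anywhere in the string, is what keeps look-alike names from passing.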
For example, on Linux, you can use the host command to do a reverse lookup on the IP address and determine whether the request really came from Baiduspider.
$ host 123.125.66.120
120.66.125.123.in-addr.arpa domain name pointer baiduspider-123-125-66-120.crawl.baidu.com.
On Windows, you can use the nslookup command to do a reverse lookup on the IP address and determine whether the request really came from Baiduspider.
Click "Start" - "Run" - type "cmd" - type "nslookup" followed by the IP address - press Enter.
C:\Documents and Settings\wangtao> nslookup 123.125.66.120
Name: baiduspider-123-125-66-120.crawl.baidu.com
Address: 123.125.66.120
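If you would rather run the same check from a script than by hand, the lookups above can be sketched in Python with the standard socket module. This is a minimal sketch, not an official Baidu tool: the function name looks_like_baiduspider and the resolver parameter are hypothetical, and the resolver parameter exists only so the lookup can be swapped out, for example in tests.

```python
import socket


def looks_like_baiduspider(ip: str, resolver=socket.gethostbyaddr) -> bool:
    """Reverse-resolve ip and check that the hostname ends in .baidu.com.

    Hypothetical helper for illustration. `resolver` defaults to
    socket.gethostbyaddr, which returns (hostname, aliases, addresses).
    """
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except OSError:
        # socket.herror and socket.gaierror are subclasses of OSError;
        # no usable reverse record means we cannot confirm the spider.
        return False
    # Normalize a possible trailing dot and mixed case before comparing.
    name = hostname.rstrip(".").lower()
    return name.endswith(".baidu.com")


if __name__ == "__main__":
    # Requires network access; the result depends on current DNS records.
    print(looks_like_baiduspider("123.125.66.120"))
```

An address with no reverse record is treated as not verified, which matches the rule that anything without a *.baidu.com hostname should be considered an impersonator.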
For details, see:
http://www.baidu.com/search/spider_chinese.html