What counts as a robot?
define('ISROBOT', getrobot());
if(defined('NOROBOT') && ISROBOT) {
    // Send the 403 status, then stop the script.
    header("HTTP/1.1 403 Forbidden");
    exit;
}
The code above is said to block robot access, but I don't understand where it actually identifies a bot.
This is how the getrobot() function is written. Is there some hidden trick in it?
function getrobot() {
    if(!defined('IS_ROBOT')) {
        $kw_spiders = 'Bot|Crawl|Spider|slurp|sohu-search|lycos|robozilla';
        $kw_browsers = 'MSIE|Netscape|Opera|Konqueror|Mozilla';
        // Browser keywords are checked first, so they take priority.
        if(preg_match("/($kw_browsers)/", $_SERVER['HTTP_USER_AGENT'])) {
            define('IS_ROBOT', FALSE);
        } elseif(preg_match("/($kw_spiders)/", $_SERVER['HTTP_USER_AGENT'])) {
            define('IS_ROBOT', TRUE);
        } else {
            define('IS_ROBOT', FALSE);
        }
    }
    // The result is cached in the IS_ROBOT constant for the rest of the request.
    return IS_ROBOT;
}
Reply to discussion (solution)
Bot: Microsoft's Bing
Spider: Baidu
Slurp: Yahoo
I don't know the others, but this only stops well-behaved crawlers. If someone spoofs the User-Agent, it cannot stop them.
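To make the spoofing point concrete, here is a minimal sketch. It assumes a hypothetical helper, looks_like_robot(), that repeats the keyword checks from getrobot() but takes the User-Agent as a parameter; the sample User-Agent strings are made up for illustration.

```php
<?php
// Hypothetical helper: same keyword checks as getrobot(), but the User-Agent
// is passed in so sample strings can be tried outside a web request.
function looks_like_robot(string $ua): bool {
    $kw_spiders  = 'Bot|Crawl|Spider|slurp|sohu-search|lycos|robozilla';
    $kw_browsers = 'MSIE|Netscape|Opera|Konqueror|Mozilla';
    if (preg_match("/($kw_browsers)/", $ua)) {
        return false; // a browser keyword wins because it is checked first
    }
    return preg_match("/($kw_spiders)/", $ua) === 1;
}

// A crawler that announces itself is detected.
var_dump(looks_like_robot('ExampleBot/1.0'));

// A crawler that sends a browser-style User-Agent is not.
var_dump(looks_like_robot('Mozilla/5.0 (Windows NT 10.0)'));

// Even an honest crawler is missed if its User-Agent also contains "Mozilla".
var_dump(looks_like_robot('Mozilla/5.0 (compatible; ExampleBot/1.0)'));
```

Only the first call reports a robot, so the check depends entirely on what the client chooses to send.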
Isn't spoofing it easy?
Rogue crawlers can't be stopped this way, but mainstream search engines follow the robots protocol.
$kw_spiders = 'Bot|Crawl|Spider|slurp|sohu-search|lycos|robozilla';
Bot|Crawl|Spider|slurp|sohu-search|lycos|robozilla
is the regular expression pattern to be matched.
Bot, Spider, and so on are all crawler identifiers.
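As a rough illustration of how that pattern picks out an identifier, the sketch below uses a hypothetical helper, matched_spider_keyword(), which is not part of the original code. It returns whichever keyword preg_match() finds. Note that the pattern has no /i flag, so matching is case-sensitive.

```php
<?php
$kw_spiders = 'Bot|Crawl|Spider|slurp|sohu-search|lycos|robozilla';

// Hypothetical helper: returns the spider keyword found in $ua, or null.
function matched_spider_keyword(string $ua, string $keywords): ?string {
    // The parentheses form capture group 1, which holds the keyword that matched.
    if (preg_match("/($keywords)/", $ua, $m)) {
        return $m[1];
    }
    return null;
}

// Made-up User-Agent strings for illustration.
var_dump(matched_spider_keyword('ExampleSpider/2.0', $kw_spiders)); // finds "Spider"
var_dump(matched_spider_keyword('examplespider/2.0', $kw_spiders)); // null: case differs
```

Adding the /i modifier would make the match case-insensitive, but that would change the behavior of the original check.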