I have a habit of cutting the server's nginx log daily, so for each day's crawls by the major search engines there are always some 404 page hits on record. Normally I only analyze the logs occasionally, but if you have a large volume of log data, filtering it by hand is no easy task. So I gradually worked out a small script that extracts the 404 hits of the Google, Baidu, Soso, 360 Search, Easou, Sogou, Bing and other search engine spiders into a separate txt file per engine. The script is test.php; just drop it in and test.
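For context, the daily log cut itself can be a small cron job. The script below is only a sketch: every path in it (the log directory layout, the nginx pid file location) is an assumption and must be adapted to your setup. It moves the finished log into a yy/mm/d directory matching what test.php reads, then asks nginx to reopen its log file.

```shell
#!/bin/sh
# Sketch of a daily log-cut script, run from cron just after midnight, e.g.:
#   5 0 * * * /home/nginx/cut_log.sh
# All paths below are assumptions; adjust them to your own layout.
logroot=/home/nginx/logs
# Directory for the day the log actually covers, e.g. 24/01/9
# (%-d drops the leading zero on the day; GNU date)
daydir="$logroot/$(date -d yesterday +%y/%m/%-d)"
mkdir -p "$daydir"
mv "$logroot/access_www.txt" "$daydir/access_www.txt"
# Tell nginx to reopen its log files
kill -USR1 "$(cat /usr/local/nginx/logs/nginx.pid)"
```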
The code for test.php is as follows:
<?php
// Usage: test.php?s=google
$domain = 'http://www.jb51.net';
$spiders = array(
    'baidu'  => 'Baiduspider',
    '360'    => '360Spider',
    'google' => 'Googlebot',
    'soso'   => 'Sosospider',
    'sogou'  => 'Sogou web spider',
    'easou'  => 'EasouSpider',
    'bing'   => 'bingbot'
);
// Yesterday's cut log; date('d') - 1 would break on the 1st of the month,
// so compute yesterday's timestamp instead
$yesterday = strtotime('-1 day');
$path = '/home/nginx/logs/' . date('y/m/j', $yesterday) . '/access_www.txt';
$s = $_GET['s'];
if (!array_key_exists($s, $spiders)) die();
$spider = $spiders[$s];
$file = $s . '_' . date('ymj', $yesterday) . '.txt';
if (!file_exists($file)) {
    $in = file_get_contents($path);
    // Match lines like: "GET /path HTTP/1.1" 404 ... <spider UA>
    $pattern = '/GET (.*) HTTP\/1\.1" 404.*' . $spider . '/';
    preg_match_all($pattern, $in, $matches);
    $out = '';
    foreach ($matches[1] as $k => $v) {
        $out .= $domain . $v . "\r\n";
    }
    file_put_contents($file, $out);
}
$url = $domain . '/silian/' . $file;
echo $url;
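As a quick sanity check of the matching logic, the same filtering can be reproduced from the shell. The sample log line below is hypothetical and assumes nginx's default "combined" log format; real entries will vary with your log_format setting.

```shell
# Hypothetical nginx access-log entry for a Baiduspider 404 hit
line='1.2.3.4 - - [01/Jan/2024:00:00:00 +0800] "GET /missing-page.html HTTP/1.1" 404 162 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"'

# Keep only 404 hits from Baiduspider, then pull out the request path
echo "$line" | grep 'Baiduspider' | grep ' 404 ' \
  | sed -E 's/.*"GET ([^ ]+) HTTP[^"]*".*/\1/'
# prints: /missing-page.html
```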
Okay, that's it. There is no advanced technique here, just the process of writing it out by hand.