This article shows how to use PHP to collect the paths of 404 pages that search engine crawlers requested, as recorded in the nginx access log. Refer to it if you need something similar.
I rotate the nginx logs on my server every day, so each day's log records the 404 pages hit by the major search engines. Until now I only analyzed the logs occasionally, and filtering a large log by hand is not easy. The script below picks out the 404 requests made by the crawlers of Google, Baidu, Soso, 360 Search, Easou, Sogou, and Bing and writes them to a txt file. The code is saved as test.php.
The code is as follows:
<?php
// Usage: test.php?s=google
$domain = 'http://www.jb51.net';
// Short name passed in ?s= mapped to the crawler's User-Agent token
$spiders = array('baidu' => 'Baiduspider', '360' => '360Spider',
    'google' => 'Googlebot', 'soso' => 'Sosospider', 'sogou' =>
    'Sogou web spider', 'easou' => 'EasouSpider', 'bing' => 'bingbot');
// Yesterday's rotated log; strtotime() avoids a day of 0 on the 1st of the month
$yesterday = strtotime('-1 day');
$path = '/home/nginx/logs/'.date('Y/m/j', $yesterday).'/access_www.txt';
$s = isset($_GET['s']) ? $_GET['s'] : '';
if (!array_key_exists($s, $spiders)) die();
$spider = $spiders[$s];
$file = $s.'_'.date('ymj', $yesterday).'.txt';
// Build the report only once per day and per crawler
if (!file_exists($file)) {
    $in = file_get_contents($path);
    // Capture the request path of 404 responses on lines that mention the crawler
    $pattern = '/GET (.*) HTTP\/1.1" 404.*'.$spider.'/';
    preg_match_all($pattern, $in, $matches);
    $out = '';
    foreach ($matches[1] as $v) {
        $out .= $domain.$v."\r\n";
    }
    file_put_contents($file, $out);
}
// Public URL of the report (the script is presumably served from /silian/)
$url = $domain.'/silian/'.$file;
echo $url;
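
To see what the pattern captures, here is a minimal sketch run against made-up log lines. It assumes nginx's default "combined" log format; the function name extract404Paths, the sample IP addresses, and the sample paths are hypothetical. It is also a slightly stricter variation on the pattern in the script rather than the script itself: the crawler token is escaped with preg_quote(), the path is matched with \S+, and both HTTP/1.0 and HTTP/1.1 are accepted. The type declarations need PHP 7 or later.

```php
<?php
// Hypothetical helper for illustration; not part of the original script.
function extract404Paths(string $log, string $spider): array
{
    // preg_quote() escapes any regex metacharacters in the User-Agent token.
    // "." does not match newlines, so each match stays within one log line.
    $pattern = '/"GET (\S+) HTTP\/1\.[01]" 404 .*' . preg_quote($spider, '/') . '/';
    preg_match_all($pattern, $log, $matches);
    return $matches[1];
}

// Made-up sample lines in nginx's default "combined" log format.
$sample = <<<'LOG'
66.249.66.1 - - [10/Oct/2013:13:55:36 +0800] "GET /missing/page.html HTTP/1.1" 404 162 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.1 - - [10/Oct/2013:13:55:40 +0800] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
220.181.108.1 - - [10/Oct/2013:13:56:02 +0800] "GET /old/article.html HTTP/1.1" 404 162 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
LOG;

print_r(extract404Paths($sample, 'Googlebot'));
```

With these sample lines, only /missing/page.html is returned for Googlebot: the second line has status 200, and the third 404 belongs to Baiduspider.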
That's it. There is nothing advanced here, just a small hands-on exercise.