This article shows how to use PHP to collect, from the nginx access log, the paths of pages that returned 404 when crawled by search engines. The results can be gathered separately for each search engine. Readers who need this can use it as a reference.
I rotate the nginx logs on my server every day, so each day's log records the 404 pages requested by the various search engine crawlers. I used to analyze these logs only occasionally, and for anyone with a large volume of log data, filtering them by hand is not easy. So I wrote a small script that extracts the 404 requests made by Google, Baidu, Soso, 360 Search, Easou, Sogou, and Bing and writes them to a txt file. Here is the code, test.php.
The code is as follows:
<?php
// Usage: request test.php?s=google
$domain = 'http://www.bitsCN.com';
// Short key => substring of the crawler's User-Agent header
$spiders = array('baidu' => 'Baiduspider', '360' => '360Spider',
    'google' => 'Googlebot', 'soso' => 'Sosospider', 'sogou' =>
    'Sogou web spider', 'easou' => 'EasouSpider', 'bing' => 'bingbot');
// Yesterday's log. 'j' is the day without a leading zero, matching what
// date('d') - 1 produced, and strtotime() also handles the 1st of the month.
$yesterday = strtotime('-1 day');
$path = '/home/nginx/logs/' . date('Y/m/j', $yesterday) . '/access_www.txt';
$s = isset($_GET['s']) ? $_GET['s'] : '';
if (!array_key_exists($s, $spiders)) die();
$spider = $spiders[$s];
$file = $s . '_' . date('ymj', $yesterday) . '.txt';
if (!file_exists($file)) {
    $in = file_get_contents($path);
    // Capture the request path of every 404 line that mentions this crawler
    $pattern = '/GET (.*) HTTP\/1.1" 404.*' . preg_quote($spider, '/') . '/';
    preg_match_all($pattern, $in, $matches);
    $out = '';
    foreach ($matches[1] as $v) {
        $out .= $domain . $v . "\r\n";
    }
    file_put_contents($file, $out);
}
$url = $domain . '/silian/' . $file;
echo $url;
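To see what the pattern actually captures, here is a minimal sketch that runs it against a few made-up log lines. The sample lines, IP addresses, and paths are hypothetical and assume nginx's default "combined" log format; a custom log_format may need a different pattern.

```php
<?php
// Hypothetical sample lines in nginx's "combined" log format.
$log = <<<'LOG'
66.249.66.1 - - [10/Mar/2014:06:25:13 +0800] "GET /old/page.html HTTP/1.1" 404 162 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.1 - - [10/Mar/2014:06:25:14 +0800] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
123.125.71.12 - - [10/Mar/2014:06:26:02 +0800] "GET /missing.php?id=7 HTTP/1.1" 404 162 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
LOG;

$spider = 'Googlebot';
// Same shape as the pattern in test.php. Without the /s modifier, "." does
// not match a newline, so each match stays within a single log line.
$pattern = '/GET (.*) HTTP\/1.1" 404.*' . preg_quote($spider, '/') . '/';

preg_match_all($pattern, $log, $matches);
print_r($matches[1]);
```

Only the first line is both a 404 and a Googlebot request, so $matches[1] should contain just /old/page.html.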
That's it. There is nothing advanced here, just a bit of hands-on scripting.
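The script above reads the whole log into memory and handles one search engine per request. For larger logs, one possible variation is to read the file line by line and sort the 404 paths into per-crawler buckets in a single pass. The sketch below is an assumption-laden illustration, not the author's method: the function name collect404Paths is hypothetical, and its pattern is slightly stricter than the original (it requires a path without spaces and also accepts HTTP/1.0).

```php
<?php
/**
 * Collect 404 request paths per crawler from an iterable of log lines.
 * $spiders maps a short key to a User-Agent substring, as in test.php.
 * (Hypothetical helper, not part of the original script.)
 */
function collect404Paths($lines, array $spiders)
{
    $result = array_fill_keys(array_keys($spiders), array());
    foreach ($lines as $line) {
        // Skip anything that is not a GET request answered with status 404.
        if (!preg_match('/"GET (\S+) HTTP\/1\.[01]" 404 /', $line, $m)) {
            continue;
        }
        // Assign the path to the first crawler whose name appears on the line.
        foreach ($spiders as $key => $agent) {
            if (strpos($line, $agent) !== false) {
                $result[$key][] = $m[1];
                break;
            }
        }
    }
    return $result;
}

// Usage sketch: SplFileObject yields a file one line at a time.
// $lines = new SplFileObject('/path/to/access.log'); // hypothetical path
// $paths = collect404Paths($lines, array('google' => 'Googlebot', 'baidu' => 'Baiduspider'));
```

Because the function accepts any iterable, it can be fed a plain array of lines for testing or an SplFileObject for a real log file.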