Search Engine Crawler Recorder

Source: Internet
Author: User

Today, on ChinaUnix, I came across a piece of code that records crawler visits:

function saveRobot($dir) {
    $addtime = date('Y-m-d H:i:s', time());
    $GetLocationURL = "http://" . $_SERVER["HTTP_HOST"] . $_SERVER['REQUEST_URI'];
    $agent1 = $_SERVER["HTTP_USER_AGENT"];
    $agent = strtolower($agent1);
    $Bot = '';
    if (strpos($agent, "googlebot") !== false) { $Bot = "Google"; }
    if (strpos($agent, "mediapartners-google") !== false) { $Bot = "Google"; }
    if (strpos($agent, "baiduspider") !== false) { $Bot = "Baidu"; }
    if (strpos($agent, "sogou spider") !== false) { $Bot = "Sogou"; }
    if (strpos($agent, "sosospider") !== false) { $Bot = "Soso"; }
    if ($Bot != "") {
        $mDateTime = date("Y-m-d");
        // One log file per day; file_put_contents() with FILE_APPEND creates it if it does not exist.
        file_put_contents($dir . "/{$mDateTime}.html", "$Bot - $GetLocationURL - $addtime<br>", FILE_APPEND);
        // echo $agent . '-' . $Bot . '-' . $GetLocationURL;
    }
}
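The original post does not show how saveRobot() is called, so here is a minimal usage sketch; the include file name and the log directory name are my own hypothetical choices:

<?php
// Minimal usage sketch for the saveRobot() function above.
// "saverobot.php" and "botlog" are hypothetical names, not from the original post.
require __DIR__ . '/saverobot.php';   // file assumed to contain saveRobot()

$logDir = __DIR__ . '/botlog';        // hypothetical, writable log directory
if (!is_dir($logDir)) {
    mkdir($logDir, 0755, true);       // create it on first run
}

saveRobot($logDir);                   // appends one line per visit from a recognized crawler
?>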

 

Inspired by this, we can see that when a crawler visits your website, it identifies itself through $_SERVER["HTTP_USER_AGENT"], and each crawler carries a different name in that string.
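For example, here is a small sketch of how that detection works on a few sample user-agent strings. The sample strings are typical of what Googlebot and Baiduspider send, but the exact wording can vary, so treat them as illustrative only:

<?php
// Sketch: identifying a crawler from its User-Agent string.
$samples = array(
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"   // an ordinary browser
);

foreach ($samples as $ua) {
    $agent = strtolower($ua);
    if (strpos($agent, "googlebot") !== false) {
        echo "Google crawler\n";
    } elseif (strpos($agent, "baiduspider") !== false) {
        echo "Baidu crawler\n";
    } else {
        echo "Not a known crawler\n";
    }
}
?>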

 

I then searched online for a more complete crawler recording program and am posting it here for your reference:

<?php
/**
 * Name: cls_spider.php
 * -------- Description -----------------
 * This class monitors what search engine crawlers do on your website.
 * It is written in PHP and is only applicable to PHP websites.
 * No database is used; records are written directly to text files. Create a "spider" folder in the site root.
 * The records are for reference only and are not necessarily complete, because pages that do not run this code are not recorded.
 * This code is free; you may copy and modify it as needed, but please keep the copyright information.
 * -------- Usage -------------
 * Add the following code to the pages you want to track, usually in a globally included file:
 *   require(ROOT_PATH . 'directory-of-this-file/cls_spider.php');
 *   $spider = new spider();
 * If you cannot get it working, contact me:
 * QQ: 235534
 * EMAIL: dreamisok@qq.com
 * Blog: http://blog.toptao123.com
 * Please support my sites http://www.ataobao.net and http://www.toptao123.com; link exchanges welcome.
 */
class spider {
    var $searchbot = "";
    var $tlc_thispage = "";
    var $filename = "";
    var $timestr = "";
    var $spider_array = array(
        "Googlebot"      => "googlebot",
        "Google Adsense" => "mediapartners-google",
        "YODAO"          => "yodaobot",
        "MSNbot"         => "msnbot",
        "Yahoobot"       => "slurp",
        "Baiduspider"    => "baiduspider",
        "Sohubot"        => "sohu-search",
        "IASK"           => "iaskspider",
        "SOGOU"          => "sogou",
        "Robozilla"      => "robozilla",
        "Lycos"          => "lycos"
    );

    function __construct() {
        $this->tlc_thispage = addslashes($_SERVER["REQUEST_URI"]);
        $this->filename = 'spider/' . date("ymd") . '.txt';
        $this->timestr = $this->nowtime();
        $this->searchbot = $this->get_naps_bot();
        $this->spider();
    }

    // Append one line to today's log file when a known crawler is detected.
    function spider() {
        if (!empty($this->searchbot)) {
            $writestring = "Time: " . $this->timestr . " Robot: " . $this->searchbot . " URL: " . $this->tlc_thispage . "\n";
            $data = fopen($this->filename, "a");
            fwrite($data, $writestring);
            fclose($data);
        }
    }

    // Return the crawler name whose signature appears in the User-Agent, or false.
    function get_naps_bot() {
        if (isset($_SERVER['HTTP_USER_AGENT'])) {
            $useragent = strtolower($_SERVER['HTTP_USER_AGENT']);
            foreach ($this->spider_array as $key => $value) {
                if (strpos($useragent, $value) !== false) {
                    return $key;
                }
            }
        }
        return false;
    }

    function nowtime() {
        $date = date("Y-m-d.G:i:s");
        return $date;
    }
}
?>
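Once the class has been logging visits into the spider/ directory for a while, you can summarize a day's activity with a small script. This is my own sketch, not part of the original class; it assumes the "Time: ... Robot: ... URL: ..." line format written by spider() above:

<?php
// Sketch: count crawler visits per bot in today's log file produced by cls_spider.php.
$logfile = 'spider/' . date("ymd") . '.txt';    // same naming scheme as the class

$counts = array();
if (is_readable($logfile)) {
    foreach (file($logfile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        // Pull the bot name out of "... Robot: <name> URL: ..."
        if (preg_match('/Robot: (\S+)/', $line, $m)) {
            $bot = $m[1];
            $counts[$bot] = isset($counts[$bot]) ? $counts[$bot] + 1 : 1;
        }
    }
}

arsort($counts);                                // most active crawler first
foreach ($counts as $bot => $n) {
    echo $bot . ": " . $n . " visits\n";
}
?>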
