PHP代碼實現爬蟲記錄——超管用_php執行個體

來源:互聯網
上載者:User
實現爬蟲記錄本文從建立crawler 資料庫,robot.php記錄來訪的爬蟲從而將資訊插入資料庫crawler,然後從資料庫中就可以獲得所有的爬蟲資訊。實現代碼具體如下:

資料庫設計

create table crawler  (   crawler_ID bigint() unsigned not null auto_increment primary key, crawler_category varchar() not null, crawler_date datetime not null default '-- ::', crawler_url varchar() not null, crawler_IP varchar() not null)default charset=utf;

以下檔案 robot.php 記錄來訪的爬蟲,並將資訊寫入資料庫:

<?php $ServerName = $_SERVER["SERVER_NAME"] ;  $ServerPort = $_SERVER["SERVER_PORT"] ;  $ScriptName = $_SERVER["SCRIPT_NAME"] ;  $QueryString = $_SERVER["QUERY_STRING"];  $serverip = $_SERVER["REMOTE_ADDR"] ;  $Url="http://".$ServerName; if ($ServerPort != "") {  $Url = $Url.":".$ServerPort ; }  $Url=$Url.$ScriptName; if ($QueryString !="") {  $Url=$Url."?".$QueryString; }   $GetLocationURL=$Url ; $agent = $_SERVER["HTTP_USER_AGENT"];  $agent=strtolower($agent); $Bot =""; if (strpos($agent,"bot")>-) {  $Bot = "Other Crawler"; } if (strpos($agent,"googlebot")>-) {  $Bot = "Google"; }    if (strpos($agent,"mediapartners-google")>-) {  $Bot = "Google Adsense"; } if (strpos($agent,"baiduspider")>-) {  $Bot = "Baidu"; } if (strpos($agent,"sogou spider")>-) {  $Bot = "Sogou"; } if (strpos($agent,"yahoo")>-) {  $Bot = "Yahoo!"; } if (strpos($agent,"msn")>-) {  $Bot = "MSN"; } if (strpos($agent,"ia_archiver")>-) {  $Bot = "Alexa"; } if (strpos($agent,"iaarchiver")>-) {  $Bot = "Alexa"; } if (strpos($agent,"sohu")>-) {  $Bot = "Sohu"; } if (strpos($agent,"sqworm")>-) {  $Bot = "AOL"; } if (strpos($agent,"yodaoBot")>-) {  $Bot = "Yodao"; } if (strpos($agent,"iaskspider")>-) {  $Bot = "Iask"; } require("./dbinfo.php"); date_default_timezone_set('PRC');  $shijian=date("Y-m-d h:i:s", time()); // 串連到 MySQL 伺服器 $connection = mysql_connect ($host, $username, $password); if (!$connection) {  die('Not connected : ' . mysql_error()); } // 設定活動的 MySQL 資料庫 $db_selected = mysql_select_db($database, $connection); if (!$db_selected) {  die ('Can\'t use db : ' . mysql_error()); } // 向資料庫插入資料 $query = "insert into crawler (crawler_category, crawler_date, crawler_url, crawler_IP) values ('$Bot','$shijian','$GetLocationURL','$serverip')"; $result = mysql_query($query); if (!$result) {  die('Invalid query: ' . mysql_error()); }?>

成功了,現在訪問資料庫即可得知什麼時候哪裡的蜘蛛爬過你的什麼頁面。

view sourceprint?<?phpinclude './robot.php';include '../library/page.Class.php';$page = $_GET['page'];include '../library/conn_new.php';$count = $mysql -> num_rows($mysql -> query("select * from crawler"));$pages = new PageClass($count,,$_GET['page'],$_SERVER['PHP_SELF'].'?page={page}');$sql = "select * from crawler order by ";$sql .= "crawler_date desc limit ".$pages -> page_limit.",".$pages -> myde_size;$result = $mysql -> query($sql);?>
 <?phpwhile($myrow = $mysql -> fetch_array($result)){?>
  <?php }?> 
  
爬蟲訪問時間 爬蟲分類 爬蟲IP 爬蟲訪問的URL
<? echo $myrow["crawler_date"] ?> <? echo $myrow["crawler_category"] ?> <? echo $myrow["crawler_IP"] ?> <? echo $myrow["crawler_url"] ?>
<?php echo $pages -> myde_write();?>

以上代碼就是PHP代碼實現爬蟲記錄——超管用的全部內容,希望對大家有所協助。

  • 聯繫我們

    該頁面正文內容均來源於網絡整理,並不代表阿里雲官方的觀點,該頁面所提到的產品和服務也與阿里云無關,如果該頁面內容對您造成了困擾,歡迎寫郵件給我們,收到郵件我們將在5個工作日內處理。

    如果您發現本社區中有涉嫌抄襲的內容,歡迎發送郵件至: info-contact@alibabacloud.com 進行舉報並提供相關證據,工作人員會在 5 個工作天內聯絡您,一經查實,本站將立刻刪除涉嫌侵權內容。

    A Free Trial That Lets You Build Big!

    Start building with 50+ products and up to 12 months usage for Elastic Compute Service

    • Sales Support

      1 on 1 presale consultation

    • After-Sales Support

      24/7 Technical Support 6 Free Tickets per Quarter Faster Response

    • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.