How to view search engine spider crawl behavior on a web site

Source: Internet
Author: User
Tags: time and date nginx server

Brief introduction


This article explains how to view search engine spider crawl behavior on Linux with Nginx. Knowing exactly how spiders crawl your site is a great help for SEO. Readers who need this can learn it from this article.


Summary

The first step in SEO for a site is to get spider crawlers to visit it often. The Linux commands below show clearly how spiders are crawling your site.


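The analysis below is for an Nginx server whose log file is /usr/local/nginx/logs/access.log. The commands that follow use other paths such as /var/log/nginx/access.log; substitute the path of your own log file.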


The access.log file should hold roughly the last day of log entries. Check its size first: if it is large (more than 50 MB), running these commands on it directly is not recommended, because they are CPU-intensive. Instead, copy the log to a separate analysis machine and run the commands there, so that the speed of the site is not affected.
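The size check described above can be sketched as a small shell function. This is only an illustration: the function name log_is_small is hypothetical, and the 50 MB default simply mirrors the threshold mentioned in the text.

```shell
# Hypothetical helper (not from the original article): succeeds when the
# log file is no larger than the given limit in megabytes (default 50).
log_is_small() {
    log_file=$1
    limit_mb=${2:-50}
    [ -f "$log_file" ] || return 2        # missing file: distinct exit status
    size_bytes=$(wc -c < "$log_file")     # file size in bytes
    [ "$((size_bytes))" -le "$((limit_mb * 1024 * 1024))" ]
}

# Example use (adjust the path to your own log file):
# if log_is_small /usr/local/nginx/logs/access.log; then
#     echo "small enough to analyze in place"
# else
#     echo "copy it to an analysis machine first"
# fi
```

The function only reports whether the file is under the limit; what to do with a large log is left to the caller.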


Linux shell commands


1. Number of times Baidu Spider has crawled the site

cat /var/log/nginx/access.log | grep Baiduspider | wc

wc prints the line, word, and byte counts; the left-most value (the line count) is the number of crawls.




2. Detailed records of Baidu Spider visits (press Ctrl+C to stop the output)

cat /var/www/log/nginx/access.log | grep Baiduspider


You can also use the following commands:

cat /var/log/nginx/access.log | grep Baiduspider | tail -n 10
cat /var/log/nginx/access.log | grep Baiduspider | head -n 10

These show only the last 10 or the first 10 matching records. The first records also tell you the date and time at which the log file starts.


3. Detailed records of Baidu Spider crawling the home page

cat /var/log/nginx/access.log | grep Baiduspider | grep "GET / HTTP"

Baidu Spider seems to be very fond of the home page and visits it every hour, while the Google and Yahoo spiders prefer inner pages.
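Matching the text "GET / HTTP" anywhere in a line could in principle also match a referrer or another field. A stricter variant can compare the request fields directly. The sketch below assumes Nginx's default combined log format, in which the sixth whitespace-separated field is the quoted method and the seventh is the path; the function name homepage_hits is hypothetical.

```shell
# Hypothetical helper: print the records in which the given bot requested
# exactly "/" with GET. Assumes the default "combined" log format.
homepage_hits() {
    bot=$1
    log_file=${2:-/var/log/nginx/access.log}
    # index($0, bot) is non-zero when the line contains the bot name;
    # $6 is the opening quote plus the method, $7 is the requested path.
    awk -v bot="$bot" 'index($0, bot) && $6 == "\"GET" && $7 == "/"' "$log_file"
}

# Example use:
# homepage_hits Baiduspider /var/log/nginx/access.log
```

If your log_format differs from the default, the field numbers need to be adjusted.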


4. Time distribution of Baidu Spider visits

cat /var/log/nginx/access.log | grep "Baiduspider" | awk '{print $4}'
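The command in step 4 prints one timestamp per crawl. To see the distribution at a glance, the timestamps can be grouped by hour. The following is a rough sketch that assumes the default combined log format, where the fourth field looks like [22/Nov/2017:13:37:27; the function name bot_hits_per_hour is hypothetical.

```shell
# Hypothetical helper: print "HOUR COUNT" for every hour in which the bot
# appears in the log. Assumes $4 looks like [22/Nov/2017:13:37:27.
bot_hits_per_hour() {
    bot=$1
    log_file=${2:-/var/log/nginx/access.log}
    awk -v bot="$bot" '
        index($0, bot) {          # line mentions the bot
            split($4, t, ":")     # t[2] is the two-digit hour
            n[t[2]]++
        }
        END { for (h in n) print h, n[h] }
    ' "$log_file" | sort          # hours are zero-padded, so text sort works
}

# Example use:
# bot_hits_per_hour Baiduspider /var/log/nginx/access.log
```

Hours from different days are counted together, which suits a log that covers roughly one day.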


5. Pages crawled by Baidu Spider, in descending order by number of crawls

cat /var/log/nginx/access.log | grep "Baiduspider" | awk '{print $7}' | sort | uniq -c | sort -rn
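On a busy site the list from step 5 can be very long. One possible variation keeps only the most-crawled pages and sorts the counts numerically. The function name top_crawled_pages and its default of 10 entries are assumptions made for illustration.

```shell
# Hypothetical helper: list the N pages a bot requested most often,
# most-crawled first. Assumes the path is the 7th field (combined format).
top_crawled_pages() {
    bot=$1
    log_file=${2:-/var/log/nginx/access.log}
    top_n=${3:-10}
    grep -- "$bot" "$log_file" |
        awk '{print $7}' |        # requested path
        sort | uniq -c |          # count identical paths
        sort -rn |                # numeric sort, largest count first
        head -n "$top_n"
}

# Example use:
# top_crawled_pages Baiduspider /var/log/nginx/access.log 20
```

The numeric flag matters because a plain reverse sort compares the counts as text, which can misorder them depending on the locale.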


Replace Baiduspider with Googlebot in the commands above to view Google's data. Given the particular situation in mainland China, the Baidu entries in the log deserve the most attention.
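Since only the spider name changes from one search engine to another, the count can be repeated for several spiders in a loop. The sketch below is one way to do that; the function name summarize_bots is hypothetical, and the names passed to it are matched as plain substrings of each log line.

```shell
# Hypothetical helper: print "BOT COUNT" for each spider name given.
# Usage: summarize_bots LOG_FILE BOT [BOT ...]
summarize_bots() {
    log_file=$1
    shift
    for bot in "$@"; do
        # grep -c prints the number of matching lines (0 if none)
        printf '%s %s\n' "$bot" "$(grep -c -- "$bot" "$log_file")"
    done
}

# Example use:
# summarize_bots /var/log/nginx/access.log Baiduspider Googlebot
```

Using grep -c avoids the extra word and byte columns that wc prints.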


Appendix: detailed crawl records of the Google AdSense spider (Mediapartners-Google)

cat access.log | grep Mediapartners

What is Mediapartners-Google? Google AdSense ads can be matched to page content because, shortly after a page containing AdSense ads is visited, the Mediapartners-Google spider crawls that page. Refresh the page a few minutes later and it shows relevant ads. Quite impressive!


How to enable web logs for Nginx on Linux and view spider crawlers


The default log path is the one you specified when you installed Nginx.

If you used an installation package such as LNMP, you can locate Nginx from the shell:

whereis nginx

After finding the path, look at the configuration file in the conf folder under the Nginx directory. If logging is enabled, the configuration file contains the path of the log file.
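Looking through the configuration by hand can be replaced by a search for Nginx's access_log directive. The sketch below is only an illustration: the function name find_access_logs is hypothetical, and the configuration directory passed to it depends on where Nginx was installed.

```shell
# Hypothetical helper: print the log paths named by active (uncommented)
# access_log directives found under a configuration directory.
find_access_logs() {
    conf_dir=$1
    grep -rh -- 'access_log' "$conf_dir" |
        # keep lines whose first word is the directive itself, so that
        # commented-out lines starting with "#" are skipped
        awk '$1 == "access_log" { sub(/;$/, "", $2); print $2 }' |
        sort -u
}

# Example use (the directory is an assumption; use the one whereis reports):
# find_access_logs /usr/local/nginx/conf
```

A result of "off" means that logging is disabled in that context. If no directive is found at all, Nginx falls back to its compiled-in default log path.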


To sum up: learn to adapt these commands and apply them to your own situation.


This article is from the "Li Shilong" blog; reprinting is declined.
