View search engine spider crawls in the Nginx logs on a CentOS server

Source: Internet
Author: User

Summary

The first step in optimizing a website for SEO is to get search engine spiders to crawl it frequently. The Linux commands below show clearly how spiders are crawling your site. The analysis assumes an nginx server whose log file is /usr/local/nginx/logs/access.log; the access.log file holds the logs of the last day. Check the log size first. If it is large (more than 50 MB), we recommend not running these commands on the web server, because they consume a lot of CPU. Instead, copy the log to a separate analysis machine and run them there, so that the website's speed is not affected.
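The size check described above can be scripted. The following is a minimal sketch, not part of the original article: the function name log_is_small is hypothetical, and the 50 MB default is taken from the recommendation above.

```shell
# Hypothetical helper: succeeds (exit status 0) when the log file is at or
# below the size limit, so it is reasonable to analyze it on the web server.
# Usage: log_is_small FILE [LIMIT_MB]   (LIMIT_MB defaults to 50)
log_is_small() {
    log_file=$1
    limit_mb=${2:-50}
    # Return 2 when the file is missing or unreadable.
    [ -r "$log_file" ] || return 2
    # wc -c prints the size in bytes; the arithmetic expansion also
    # discards any padding some wc implementations add.
    size_bytes=$(( $(wc -c < "$log_file") ))
    [ "$size_bytes" -le $(( limit_mb * 1024 * 1024 )) ]
}
```

For example, running log_is_small /usr/local/nginx/logs/access.log && echo "small enough" prints the message only when the log is at most 50 MB.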

Linux shell commands

1. Number of times Baidu spider crawls
cat access.log | grep Baiduspider | wc
The leftmost value in the output (the line count) is the number of crawls.
2. Detailed records of Baidu spider (press Ctrl+C to stop the output)
cat access.log | grep Baiduspider
You can also use the following commands:
cat access.log | grep Baiduspider | tail -n 10
cat access.log | grep Baiduspider | head -n 10
These show only the last 10 or the first 10 records, which tells you the time range that the log file covers.
3. Detailed records of Baidu spider crawling the home page
cat access.log | grep Baiduspider | grep "GET / HTTP"
Baidu spider seems to crawl the home page every hour, while the Google and Yahoo spiders prefer inner pages.
4. Time distribution of Baidu spider's crawl records
cat access.log | grep "Baiduspider" | awk '{print $4}'
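The raw timestamps can be long to read, so it may help to group them by hour. The following is a sketch under assumptions: the function name hits_per_hour is hypothetical, and it assumes nginx's default "combined" log format, in which field 4 looks like [10/Oct/2023:13:55:36.

```shell
# Hypothetical helper: count a spider's requests per hour of the day.
# Usage: hits_per_hour SPIDER_NAME LOG_FILE
hits_per_hour() {
    bot=$1
    log_file=$2
    grep -- "$bot" "$log_file" |
        # Split "[day/month/year:HH:MM:SS" on ":" and keep the hour (HH).
        awk '{ split($4, t, ":"); print t[2] }' |
        # Group identical hours and prefix each with its count.
        sort | uniq -c
}
```

Each output line is a count followed by a two-digit hour, so a line ending in 13 gives the number of crawls between 13:00 and 13:59.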
5. Pages crawled by Baidu spider, in descending order of crawl count
cat access.log | grep "Baiduspider" | awk '{print $7}' | sort | uniq -c | sort -rn
In the commands in this article, Baiduspider can be changed to Googlebot to view Google's data. For sites aimed at the Chinese mainland, Baidu's log deserves more attention.
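Because only the spider name changes between runs, several spiders can be compared in one pass. This is a minimal sketch with a hypothetical function name, spider_counts; the names passed to it are plain substrings matched against each log line, exactly as in the grep commands above.

```shell
# Hypothetical helper: print one "count name" line per spider.
# Usage: spider_counts LOG_FILE NAME [NAME...]
spider_counts() {
    log_file=$1
    shift
    for bot in "$@"; do
        # grep -c counts matching lines directly, so cat and wc are not needed.
        printf '%s %s\n' "$(grep -c -- "$bot" "$log_file")" "$bot"
    done
}
```

For example, spider_counts access.log Baiduspider Googlebot prints two lines, one per spider, each starting with the number of matching log lines.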
Appendix: detailed crawl records of the Google AdSense spider (Mediapartners-Google)
cat access.log | grep Mediapartners
What is Mediapartners-Google? Google AdSense ads can match the page content because, shortly after a page containing AdSense ads is accessed, a Google spider called Mediapartners-Google visits that page. Refresh the page a few minutes later and the relevant ads are displayed. That's amazing!

