Summary
The first step in optimizing a website's SEO is getting search engine spiders to crawl it frequently. The Linux commands below give you a clear picture of how spiders are crawling your site. The examples analyze an nginx server whose log file is /usr/local/nginx/logs/access.log (run the commands from that directory); this access.log holds the most recent day's logs. First, check the log size. If the file is large (more than 50 MB), we recommend not running these commands on the web server, because they can consume a lot of CPU. Instead, download the log and run them on a separate analysis machine so the website's speed is not affected.
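The size check can also be scripted before running anything heavier. The sketch below is a hypothetical bash helper, `log_size_ok`, not part of any tool. It assumes the 50 MB rule of thumb above as its default threshold and simply prints `ok` or `large`.

```shell
# Hypothetical helper (not part of any tool): prints "large" if the log file
# exceeds limit_mb megabytes (default 50, the threshold suggested above),
# otherwise prints "ok".
log_size_ok() {
    local log=$1 limit_mb=${2:-50}
    local bytes
    # wc -c gives the byte count; tr strips the padding that BSD wc adds
    bytes=$(wc -c < "$log" | tr -d ' ')
    if [ "$bytes" -gt $((limit_mb * 1024 * 1024)) ]; then
        echo "large"
    else
        echo "ok"
    fi
}
```

For example, `log_size_ok /usr/local/nginx/logs/access.log` prints `large` for a file over 50 MB; an optional second argument sets a different threshold in MB. For a quick manual look, `ls -lh access.log` shows the size in human-readable form.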
Linux shell commands
1. Number of times Baidu spider crawls
grep Baiduspider access.log | wc -l
The output is the number of log lines containing Baiduspider, that is, the number of requests the Baidu spider made.
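To get the same count for several spiders at once, a small loop works. This is a sketch with a hypothetical function name, `count_spiders`. The spider names are whatever substrings you pass in, and like the command above it counts matching lines (requests), not distinct crawlers.

```shell
# Hypothetical helper: for each spider name given after the log path,
# print "name count", where count is the number of log lines containing
# that name as a plain substring.
count_spiders() {
    local log=$1
    shift
    local bot
    for bot in "$@"; do
        # grep -c prints the number of matching lines (0 if none)
        printf '%s %s\n' "$bot" "$(grep -c -F -- "$bot" "$log")"
    done
}
```

Usage: `count_spiders access.log Baiduspider Googlebot` prints one `name count` line per spider.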
2. Detailed records of Baidu spider visits (press Ctrl+C to stop the output)
grep Baiduspider access.log
You can also use the following commands:
grep Baiduspider access.log | tail -n 10
grep Baiduspider access.log | head -n 10
These show only the last 10 or the first 10 records, which tells you the times of the first and last spider visits in the log and therefore roughly what period the log file covers.
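If you only need the first and last timestamps rather than full records, a single awk pass can print both. The sketch below uses a hypothetical name, `spider_time_span`, and assumes nginx's default `combined` log format, in which the fourth whitespace-separated field is the timestamp `[dd/Mon/yyyy:HH:MM:SS`.

```shell
# Hypothetical helper: print the timestamps of the first and last requests
# from a spider, assuming the nginx "combined" format ($4 = "[dd/Mon/yyyy:HH:MM:SS").
spider_time_span() {
    local bot=$1 log=$2
    grep -F -- "$bot" "$log" | awk '
        NR == 1 { first = $4 }   # first matching request
        { last = $4 }            # overwritten on each line, so it ends as the last one
        END { if (NR) print substr(first, 2) " -> " substr(last, 2) }'
}
```

Usage: `spider_time_span Baiduspider access.log` prints a line of the form `dd/Mon/yyyy:HH:MM:SS -> dd/Mon/yyyy:HH:MM:SS`, with the leading `[` stripped.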
3. Detailed records of Baidu spider crawling the home page
grep Baiduspider access.log | grep "GET / HTTP"
The Baidu spider seems to visit the home page every hour, while the Google and Yahoo spiders seem to prefer inner pages.
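To check this hourly pattern in your own logs, you can group home-page requests by hour. This sketch uses a hypothetical name, `homepage_hits_per_hour`, and assumes the `combined` format and a chronologically ordered log, which is what lets `uniq -c` count consecutive entries without a prior `sort`.

```shell
# Hypothetical helper: count a spider's home-page requests per date and hour,
# assuming the nginx "combined" format and a log written in time order.
homepage_hits_per_hour() {
    local bot=$1 log=$2
    grep -F -- "$bot" "$log" |
        grep -F '"GET / HTTP' |                # home-page requests only
        awk '{ print substr($4, 2, 14) }' |    # "[dd/Mon/yyyy:HH:MM:SS" -> "dd/Mon/yyyy:HH"
        uniq -c                                # count consecutive identical hours
}
```

Each output line is a count followed by `dd/Mon/yyyy:HH`; hours missing from the output are hours in which the spider did not fetch the home page. Replacing Baiduspider with Googlebot gives the comparison for Google.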
4. Time distribution of Baidu spider visits
grep Baiduspider access.log | awk '{print $4}'
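The command above lists raw timestamps. To turn them into a distribution, the following sketch (hypothetical name `spider_hour_histogram`) counts requests per hour of the day, again assuming the `combined` format's `[dd/Mon/yyyy:HH:MM:SS` timestamp in field 4.

```shell
# Hypothetical helper: number of requests from a spider per hour of day,
# assuming the nginx "combined" format ($4 = "[dd/Mon/yyyy:HH:MM:SS").
spider_hour_histogram() {
    local bot=$1 log=$2
    grep -F -- "$bot" "$log" |
        awk '{ split($4, t, ":"); count[t[2]]++ }   # t[2] is the hour, e.g. "13"
             END { for (h in count) print h, count[h] }' |
        sort -n                                     # order by hour
}
```

The output is one `HH count` line for each hour that had at least one request, sorted by hour, so busy and quiet periods stand out at a glance.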
5. Pages crawled by the Baidu spider, sorted by crawl count in descending order
grep Baiduspider access.log | awk '{print $7}' | sort | uniq -c | sort -rn
In any of the commands in this article, you can replace Baiduspider with Googlebot to view Google's data. Given the particular situation of the search market in mainland China, Baidu's log entries deserve more attention.
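Since only the spider name changes, the page ranking from step 5 can be wrapped in a function that takes the name as an argument. This sketch uses a hypothetical name, `top_crawled_pages`, assumes the request path is field 7 as in the `combined` format, and adds a `head` limit so a large log produces a short report.

```shell
# Hypothetical helper: the n most-crawled paths for a spider (default 10),
# most-crawled first, assuming the nginx "combined" format ($7 = path).
top_crawled_pages() {
    local bot=$1 log=$2 n=${3:-10}
    grep -F -- "$bot" "$log" |
        awk '{ print $7 }' |      # request path
        sort | uniq -c |          # count requests per path
        sort -rn |                # numeric, descending by count
        head -n "$n"
}
```

Usage: `top_crawled_pages Googlebot access.log 20` lists Google's 20 most-crawled paths; omit the last argument for the top 10. Using `sort -rn` rather than `sort -r` makes the ordering numeric, so a count of 100 ranks above 99 regardless of padding.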
Appendix: detailed crawling records of the Google AdSense spider (Mediapartners-Google)
grep Mediapartners access.log
What is Mediapartners-Google? Google AdSense can show ads related to a page's content because, shortly after any page carrying AdSense ads is visited, Google's Mediapartners-Google spider comes to crawl that page. Refresh the page a few minutes later and relevant ads are displayed. That's amazing!
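To observe this behavior on your own site, you can list every request for one page with its time and user agent, then look for a Mediapartners-Google entry shortly after an ordinary visit. The sketch below (hypothetical name `page_visit_timeline`) assumes the `combined` format and splits each line on double quotes, so the request line is field 2 and the user agent is field 6.

```shell
# Hypothetical helper: print "timestamp user-agent" for every request of one
# path, in log order, assuming the nginx "combined" format.
page_visit_timeline() {
    local path=$1 log=$2
    # Split on double quotes: $1 holds the IP and "[timestamp zone]",
    # $2 the request line, $6 the user agent.
    awk -F'"' -v path="$path" '
        { split($2, req, " ") }                 # req[2] is the request path
        req[2] == path {
            i = index($1, "[")                  # position of the opening bracket
            print substr($1, i + 1, 20), $6     # "dd/Mon/yyyy:HH:MM:SS" and user agent
        }' "$log"
}
```

For example, `page_visit_timeline /some-page.html access.log` (the path is a placeholder) prints one line per request for that path; a Mediapartners-Google line appearing a few minutes after a browser visit matches the behavior described above.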