Brief introduction
This article explains how to view search engine spider (crawler) behavior on Linux/Nginx. Having a clear picture of how spiders crawl your site is a big help for SEO optimization. Friends who need this can learn from the article below.
Summary
The first step in SEO optimization is getting spider crawlers to visit your site frequently. The following Linux commands let you see the spiders' crawling activity clearly.
Below we analyze an Nginx server whose log file is located at /usr/local/nginx/logs/access.log.
access.log normally records the most recent day's entries. Check the log size first: if it is large (more than 50MB), it is better not to run these commands on the live server, since they are CPU-intensive; copy the log to a separate analysis machine instead, so the site's speed is not affected.
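A quick way to check the size first, assuming the log path given above (adjust it to your own install):
# check the log size before running the analysis commands
ls -lh /usr/local/nginx/logs/access.log
# or check the whole log directory if logs are rotated
du -sh /usr/local/nginx/logs/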
Linux shell commands
1. Number of times Baiduspider has crawled the site
cat /var/log/nginx/access.log | grep Baiduspider | wc
The left-most value of the output is the number of crawl records.
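If you only want the count itself, a shorter variant on the same log file is:
# count matching lines directly instead of reading the three-column wc output
grep -c Baiduspider /var/log/nginx/access.log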
2. Detailed Baiduspider records (press Ctrl+C to stop)
cat /var/log/nginx/access.log | grep Baiduspider
You can also use the following commands:
cat /var/log/nginx/access.log | grep Baiduspider | tail -n 10
cat /var/log/nginx/access.log | grep Baiduspider | head -n 10
These show only the last 10 or the first 10 entries; the head variant also tells you the date and time at which the log file's records begin.
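To watch spider visits as they happen (which is where the Ctrl+C note above applies), a simple live-follow sketch:
# follow new log entries and show only Baiduspider requests; stop with Ctrl+C
tail -f /var/log/nginx/access.log | grep Baiduspider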
3. Detailed records of Baiduspider crawling the home page
cat /var/log/nginx/access.log | grep Baiduspider | grep "GET / HTTP"
Baiduspider seems to love the home page, visiting it almost every hour, while Google's and Yahoo's spiders prefer inner pages.
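To check that against your own log, here is a sketch that counts home-page crawls per day (the field positions assume the default Nginx combined log format):
# $4 is the timestamp field ([22/Nov/2017:13:37:27); cutting at the first ':' keeps the date,
# so uniq -c gives the number of home-page crawls per day
grep Baiduspider /var/log/nginx/access.log | grep "GET / HTTP" | awk '{print $4}' | cut -d: -f1 | sort | uniq -c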
4. Time distribution of Baiduspider visits
cat /var/log/nginx/access.log | grep "Baiduspider" | awk '{print $4}'
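The command above only lists raw timestamps; to turn them into an actual distribution, a sketch that groups visits by hour of day (again assuming the default log format, where field 4 is the timestamp):
# the second ':'-separated field of the timestamp is the hour, so this counts visits per hour
grep "Baiduspider" /var/log/nginx/access.log | awk '{print $4}' | cut -d: -f2 | sort | uniq -c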
5. Pages crawled by Baiduspider, in descending order of crawl count
cat /var/log/nginx/access.log | grep "Baiduspider" | awk '{print $7}' | sort | uniq -c | sort -r
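For long logs it is usually enough to look at the most-crawled pages; a variant that sorts numerically and keeps the top 10:
# sort numerically on the count from uniq -c and keep only the ten most-crawled URLs
grep "Baiduspider" /var/log/nginx/access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head -n 10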
Change Baiduspider in the commands above to Googlebot to see Google's data; given the particular situation in mainland China, we should pay more attention to Baidu in the logs.
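For example, the crawl count from step 1, but for Google's spider:
# same idea as step 1, applied to Googlebot
grep -c Googlebot /var/log/nginx/access.log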
Appendix: detailed crawl records of the Google AdSense spider (Mediapartners-Google)
cat /var/log/nginx/access.log | grep Mediapartners
What is Mediapartners-Google? It is the spider that lets Google AdSense match ads to your content: every time a page containing AdSense code is visited, the Mediapartners-Google spider soon crawls that page, so refreshing a few minutes later can already show relevant ads. Quite impressive!
How to enable web logging for Nginx on Linux and view spider crawlers
The default log path is the one you specified when you installed Nginx.
If you used an installation package such as LNMP, you can locate the installation from the shell:
whereis nginx
After finding the appropriate path, look at the configuration file in the conf folder under the Nginx directory; if logging is enabled, the log file path is set in that configuration file.
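A quick way to pull the log path out of the configuration, assuming the common /usr/local/nginx install path (adjust to whatever whereis reports):
# the access_log directive in nginx.conf (or an included vhost file) names the log file
grep -r "access_log" /usr/local/nginx/conf/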
This article is from the "Li Shilong" blog; reprinting is declined.