Self-study SEO tutorial: analyzing website log files for search spider crawl records

Source: Internet
Author: User
Keywords: website optimization, SEO


First, where is the website log file?

"A brief description of the folders in a virtual host's FTP space"

After a virtual host has been provisioned, four folders are created automatically in your FTP space: "Databases", "Logofiles", "others" and "wwwroot". Their functions are as follows:

1. wwwroot folder: files in this folder can be accessed through the Web. Upload the site files you want to publish to this directory; a visitor who enters your domain name is served the files under this folder.

2. Databases folder: like the Logofiles and others folders, this folder cannot be reached through the Web, that is, visitors cannot access the files under these folders by entering a URL. You can upload files that you do not want others to access into these folders. For example, storing an Access database in the Databases folder gives it the best protection.

3. Logofiles folder: this folder holds your website's log files, which let you look up access records for the site. (The folder name varies slightly between hosting providers, for example Logofiles or weblog, but it always contains the letters "log".)

4. others folder: this folder stores your custom IIS error pages. IIS shows default pages for errors such as 404, 403 and 500; if you find them too impersonal, upload your own error pages to this directory.
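Since the log folder's name differs between hosting providers but always contains "log", it can be picked out of a folder listing by that substring. The following is a minimal sketch of that idea only; the function name find_log_folders is hypothetical, and the folder names are the four listed above.

```python
def find_log_folders(folder_names):
    """Return the folder names that contain "log", ignoring case."""
    return [name for name in folder_names if "log" in name.lower()]


# The four folders described above, plus the alternative name "weblog".
print(find_log_folders(["Databases", "Logofiles", "others", "wwwroot"]))
print(find_log_folders(["databases", "weblog", "wwwroot"]))
```

On a real host the listing would come from your FTP client or from the provider's control panel.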

Second, how to analyze a website log file, with examples

The following example comes from a recent August log file of the IBM Notebook Forum, http://www.ebenben.com. The hosting log entry is as follows:

2009-08-23 16:06:03 w3svc176 58.61.160.170 GET /nb/html/30/t-12730.html - 80 - 220.181.7.24 baiduspider+(+http://www.baidu.com/search/spider.htm) 200 0 0 20006
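An entry like this is a space-separated record, so it can be split into named fields. The sketch below assumes the log is in IIS's W3C extended format with the default field order, which such logs normally declare in a "#Fields:" header line. The header text, the helper names field_names, parse_entry and describe_status, and the sample line with its field spacing restored are assumptions for illustration; check the "#Fields:" line of your own log before relying on the positions.

```python
# Assumed header: the default IIS W3C field order. Real logs declare their own.
FIELDS_HEADER = (
    "#Fields: date time s-sitename s-ip cs-method cs-uri-stem cs-uri-query "
    "s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus "
    "sc-win32-status"
)

# Standard meanings of the HTTP status codes discussed in this tutorial.
STATUS_MEANING = {
    "200": "success",
    "401": "authentication required",
    "403": "forbidden",
    "404": "file not found",
    "500": "server error",
}


def field_names(header):
    """Return the field names listed after the '#Fields:' marker."""
    return header.split()[1:]


def parse_entry(line, names):
    """Map each space-separated value of one log line to its field name."""
    values = line.split()
    entry = dict(zip(names, values))
    # Values beyond the declared fields are kept instead of being dropped.
    entry["extra"] = values[len(names):]
    return entry


def describe_status(code):
    """Translate a status code into a short description."""
    return STATUS_MEANING.get(code, "other")


# The sample entry, with the spaces between fields restored.
SAMPLE = (
    "2009-08-23 16:06:03 w3svc176 58.61.160.170 GET "
    "/nb/html/30/t-12730.html - 80 - 220.181.7.24 "
    "baiduspider+(+http://www.baidu.com/search/spider.htm) 200 0 0 20006"
)

entry = parse_entry(SAMPLE, field_names(FIELDS_HEADER))
print(entry["date"], entry["time"])        # when the request arrived
print(entry["c-ip"])                       # client (visitor) address
print(entry["cs-uri-stem"])                # path that was requested
print(describe_status(entry["sc-status"]))
```

Under this assumed order the visitor's address is the c-ip field, and the trailing value 20006 is not covered by the header, so it ends up under "extra".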

1. Take Baidu first. The entry breaks down as follows:

Visit time: 2009-08-23 16:06:03

Baidu spider's IP: 220.181.7.24

URL the spider crawled: /nb/html/30/t-12730.html

The meaning is plain: at 16:06:03 on 2009-08-23, the Baidu spider at 220.181.7.24 fetched the page /nb/html/30/t-12730.html on this site, either to index it or to update its copy.

"Supplementary Notes"

2009-08-23 16:06:03 is the date and time of the Baidu spider's visit.

58.61.160.170, right after the site name w3svc176, is the server's own IP in IIS's default field order; 220.181.7.24, just before the user agent, is the Baidu spider's IP.

GET is the request method (the alternative is POST): the client asks the server for content. /nb/html/30/t-12730.html is the page requested over HTTP/1.1. 200 is the returned status code: 200 means success, 404 means the file was not found, 401 means authentication is required, 403 means access is forbidden, and 500 means a server error. This example returns 200, a successful fetch.

"baiduspider+(+http://www.baidu.com/search/spider.htm)" is the user-agent string; it names the visitor as Baidu's spider and gives the address of the page that describes it.

Some hosting logs also contain a string like the following, which means:

"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; Alexa Toolbar)" indicates that the visitor's browser is IE 6.0, which identifies itself as compatible with Netscape's Mozilla, running on Windows NT 5.1 (Windows XP), with the Alexa Toolbar installed.

2. Next, Googlebot. I had been hoping for a visit from Googlebot for a long time, and these past few days it finally came. I then searched site:seo.9ilp.com on www.google.com and, sure enough, the updated pages had been indexed.

2009-08-23 00:07:20 66.249.65.116 - 218.85.132.68 GET /html/down/20070129/550.html - 200 Mozilla/5.0+(compatible;+googlebot/2.1;++http://www.google.com/bot.html)

Looking up the visitor's IP, 66.249.65.116, returns "American GoogleBot search engine robot" (218.85.132.68 recurs in every entry of this log, so it is the server's own address). Hehe, Googlebot really is famous, and very well-behaved.
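An IP lookup site is one way to check a visitor; Google itself documents another: reverse-resolve the IP, confirm that the host name ends in googlebot.com or google.com, then resolve that host name forward and confirm it maps back to the same IP. The sketch below follows those steps with Python's standard socket module. The function names hostname_is_google and verify_googlebot are hypothetical, and the network lookup needs DNS access, so only the host-name check is exercised here.

```python
import socket

# Domains under which Google's crawler host names are registered.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def hostname_is_google(hostname):
    """Check that a host name sits under one of Google's crawler domains."""
    host = hostname.rstrip(".").lower()
    return host.endswith(GOOGLE_SUFFIXES)


def verify_googlebot(ip):
    """Reverse-resolve ip, check the domain, then forward-resolve to confirm."""
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
        if not hostname_is_google(hostname):
            return False
        _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        # Covers failed reverse or forward lookups.
        return False
    return ip in addresses


# Illustrative host names only; the first follows the pattern Google documents.
print(hostname_is_google("crawl-66-249-65-116.googlebot.com"))
print(hostname_is_google("googlebot.com.example.net"))
```

With network access, verify_googlebot("66.249.65.116") would run the full check; the forward lookup matters because anyone can set a reverse DNS record that merely claims to be Google.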

3. Yahoo also deserves a mention:

2009-08-23 00:04:45 202.160.178.195 - 218.85.132.68 GET /html/ad/20070131/658.html - 200 Mozilla/5.0+(compatible;+yahoo!+slurp+china;+http://misc.yahoo.com.cn/help.html)

Looking up the IP shows that it belongs to Yahoo China. Yahoo!+slurp+china is the name of Yahoo's spider, formerly Inktomi Slurp. Checking which of your pages Yahoo has indexed works differently from Google and Baidu: just enter the URL on www.yahoo.com.cn, without the site: prefix. One more point: Yahoo China's technical improvements in site indexing and keyword search over these past months are noticeable.

Having covered the three major search spiders, let's look at the crawlers of second-tier search engines and portal search:

1) MSN: In my personal view, the search technology of MSN (MSN Live Search Beta) hardly makes the grade; it seems worse than the portals' search. The beta is still in its test phase, yet China Telecom now uses MSN's search technology. I don't know what Telecom was thinking, hehe.

2009-08-23 08:22:15 65.55.213.7 - 218.85.132.68 GET /html/down/20070129/550.html - 200 msnbot-media/1.0+(+http://search.msn.com/msnbot.htm)

2) Alexa: The spider of Alexa, the famous world ranking service, has a name that is hard to remember: ia_archiver. Strictly speaking, I am not sure it counts as a crawler; unlike a pure search engine, it exists to measure traffic, not to index pages.

2009-08-23 01:24:44 209.237.238.226 - 218.85.132.68 GET /html/internet/20070130/631.html - 200 ia_archiver

3) iAsk ("Love to Ask"):

2009-08-23 11:56:47 60.28.164.44 - 218.85.132.68 GET /html/webpromote/20070203/935.html - 200 Mozilla/5.0+(compatible;+iaskspider/1.0;+msie+6.0)

4) Sogou:

Sogou makes me laugh. As you may remember, I said that my site had been redesigned; brand holds files from the old site, which I have deleted. To get the dead links dropped from the search engines' indexes, I wrote Disallow: /brand in the robots.txt file, which of course forbids access to the files under brand. What I want to say is: first, Sogou does not comply with the protocol; second, these files were deleted nearly a month ago, so where does it find them to crawl? I really don't understand.

2009-08-23 01:34:42 220.181.19.170 - 218.85.132.68 GET /404.htm 404;/underwear/brand/brand2.htm sogou+spider
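When a spider seems to ignore a Disallow rule, it is worth confirming that the rule covers the logged URL at all, because robots.txt rules match by path prefix from the site root. The sketch below uses Python's standard urllib.robotparser for that check. The helper name is_allowed and the "User-agent: *" line are assumptions (the article quotes only the Disallow line), and whether the result applies to the author's site cannot be told from this excerpt.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, path):
    """Report whether robots_txt lets user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# Assumed file: only the Disallow line is quoted in the article.
RULES = """\
User-agent: *
Disallow: /brand
"""

# A path that starts with /brand is blocked by the rule.
print(is_allowed(RULES, "sogou spider", "/brand/brand2.htm"))

# The path from the log entry starts with /underwear, so the rule does not match it.
print(is_allowed(RULES, "sogou spider", "/underwear/brand/brand2.htm"))
```

If the folder really sits at /underwear/brand/ relative to the site root, the rule would need to read Disallow: /underwear/brand to cover it.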

5) Yodaoice:

This one is no surprise: it is the new search engine from 163 (NetEase), currently in beta. Its interface looks very much like Google's, but the technology is immature, and like Sogou it grabs at anything.

2009-08-23 06:19:29 60.191.80.151 - 218.85.132.68 GET /404.htm 404;/underwear/4864.gif yodaoice
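In both the Sogou and the Yodaoice entries the logged request is /404.htm, followed by a value of the form 404;path. This appears to be how IIS records a request that was handed to a custom 404 page, with the originally requested path after the semicolon. Assuming that reading is right, the dead links spiders are still requesting can be collected as below; the function name missing_paths is hypothetical, and the sample lines are the two entries with their field spacing restored.

```python
def missing_paths(log_lines):
    """Collect the original paths recorded in '404;<path>' values."""
    marker = "404;"
    found = []
    for line in log_lines:
        for token in line.split():
            # Keep only tokens that carry a path after the marker.
            if token.startswith(marker) and len(token) > len(marker):
                found.append(token[len(marker):])
    return found


LINES = [
    "2009-08-23 01:34:42 220.181.19.170 - 218.85.132.68 GET /404.htm "
    "404;/underwear/brand/brand2.htm sogou+spider",
    "2009-08-23 06:19:29 60.191.80.151 - 218.85.132.68 GET /404.htm "
    "404;/underwear/4864.gif yodaoice",
]

for path in missing_paths(LINES):
    print(path)
```

A list like this shows which deleted pages are still being requested, which is useful when deciding what to redirect or to block in robots.txt.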

With so many crawlers watching my site, I am happy on the one hand, because the Love SEO forum still depends on them to grow; on the other hand I am frustrated, because crawlers that do not follow the rules can do harm. Just look at the log files, which have grown much larger than in the previous two weeks.
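To see which spiders account for that growth, the entries can be tallied by spider name. The sketch below looks for the user-agent tokens that appear in the examples in this tutorial. The table SPIDER_TOKENS and the function names are hypothetical, and matching against the whole line rather than only the user-agent field is a simplification that could miscount a URL that happens to contain one of the tokens.

```python
from collections import Counter

# Tokens taken from the user-agent strings quoted in this tutorial.
SPIDER_TOKENS = {
    "baiduspider": "Baidu",
    "googlebot": "Google",
    "slurp": "Yahoo",
    "msnbot": "MSN",
    "ia_archiver": "Alexa",
    "iaskspider": "iAsk",
    "sogou": "Sogou",
    "yodaoice": "Yodao",
}


def identify_spider(line):
    """Return the spider name whose token occurs in the line, or None."""
    lowered = line.lower()
    for token, name in SPIDER_TOKENS.items():
        if token in lowered:
            return name
    return None


def count_spiders(lines):
    """Count log lines per spider, skipping lines from unknown visitors."""
    names = (identify_spider(line) for line in lines)
    return Counter(name for name in names if name is not None)


LINES = [
    "2009-08-23 00:07:20 66.249.65.116 - 218.85.132.68 GET "
    "/html/down/20070129/550.html - 200 "
    "Mozilla/5.0+(compatible;+googlebot/2.1;++http://www.google.com/bot.html)",
    "2009-08-23 01:24:44 209.237.238.226 - 218.85.132.68 GET "
    "/html/internet/20070130/631.html - 200 ia_archiver",
    "2009-08-23 01:34:42 220.181.19.170 - 218.85.132.68 GET /404.htm "
    "404;/underwear/brand/brand2.htm sogou+spider",
]

for name, hits in count_spiders(LINES).items():
    print(name, hits)
```

Run over a whole day's log instead of three sample lines, the same tally shows at a glance which crawler is responsible for most of the entries.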

Copyright: Love SEO Forum

If you reprint this article, please include a link to the original post on the Love SEO learning forum: http://seo.9ilp.com/thread-965-1-1.html
