Flying Pig: Uncovering why search engines do not index your site

Source: Internet
Author: User

After Google introduced its sandbox in 2004, Baidu finally raised its own standards for new sites last year. The bar for SEO is now higher than before, and getting a new site indexed is often a problem. Why doesn't the search engine index my site? Why is my site indexed more slowly than other sites? Sorted by how common they are, the causes of indexing problems come down to the following; you can check your own site against them to see which applies:

1. Illegal content

This is generally rare. Search engines usually do not index pages that violate national laws and regulations. Baidu's "Webmaster FAQ" (http://www.baidu.com/search/guide.html#1) clearly states that it does not index pages "not in accordance with national laws and regulations." Google is still working on this, but as Google localizes for China, its crackdown on illegal content is bound to become stricter. If you are curious, try searching for a few erotic forums ... I will not cite specific examples here.

2. Errors in the robots protocol

The robots protocol has its subtleties, and a small number of webmasters make mistakes when writing robots.txt. Several errors are common.

(1) Reversed order

Wrong:

    User-agent: *
    Disallow: googlebot

Correct:

    User-agent: googlebot
    Disallow: /

(2) Several disallowed paths on one line

Wrong:

    Disallow: /css/ /cgi/ /images/

Correct:

    Disallow: /css/
    Disallow: /cgi/
    Disallow: /images/

(3) Leading whitespace before a line

For example:

            Disallow: /cgi/

Although the standard does not mention this, it can easily cause problems.

(4) 404 redirects to another page

When a robot requests robots.txt from a site that has none, many sites redirect it through their 404 handler to another HTML page. The robot then often processes that HTML page as if it were the robots.txt file. This usually causes no problem, but it is best to place an empty robots.txt file in the site root directory.

(5) Using uppercase

For example:

    USER-AGENT: EXCITE
    DISALLOW:

Although the standard is not case-sensitive for field names, directory and file names should be lowercase:

    User-agent: googlebot
    Disallow:

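(6) The syntax has only Disallow, no Allow

The original robots standard defines only Disallow. Some crawlers accept Allow as an extension, but a robot that follows only the original standard ignores it. Wrong:

    User-agent: baiduspider
    Disallow: /john/
    Allow: /jane/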

(7) Forgetting the slash /

Wrong:

    User-agent: baiduspider
    Disallow: css

Correct:

    User-agent: baiduspider
    Disallow: /css/
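The effect of a missing slash can be checked offline with Python's standard `urllib.robotparser`. This is a minimal sketch: the `example.com` URL is a placeholder, the helper name `is_allowed` is made up for this example, and Python's parser is only one implementation, so a real spider may treat a malformed rule differently.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text in memory and ask whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Placeholder URL, used only for illustration.
URL = "http://example.com/css/style.css"

wrong = "User-agent: baiduspider\nDisallow: css\n"
right = "User-agent: baiduspider\nDisallow: /css/\n"

# Without the leading slash the rule never matches a URL path,
# so this parser blocks nothing.
print("rule 'css'   allows fetch:", is_allowed(wrong, "baiduspider", URL))
# With the slashes the directory is blocked as intended.
print("rule '/css/' allows fetch:", is_allowed(right, "baiduspider", URL))
```

A check like this only shows how one parser reads the file; it is still worth confirming the result with the search engine's own robots.txt tools.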

......

For the robots protocol, Flying Pig advises you to read a robots tutorial carefully before writing your own file. For example, the Baidu help document "Methods for preventing search engine indexing" (http://www.baidu.com/search/robots.html) is very detailed, and Google Webmaster Tools offers two tools, "Analyze robots.txt" and "Generate robots.txt", that are worth using.
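The mistakes listed in this section are mechanical enough to check with a short script. The following is a rough sketch, not a complete validator: the function name `lint_robots` and the warning texts are invented for this example, and it treats every field other than User-agent, Disallow, and Allow as unknown, which is a simplification (extensions such as Sitemap exist).

```python
def lint_robots(text: str) -> list[str]:
    """Return human-readable warnings for common robots.txt mistakes."""
    head = text.lstrip().lower()
    if head.startswith("<!doctype") or head.startswith("<html"):
        # Typical when a missing robots.txt is redirected to a 404 page.
        return ["content looks like an HTML page, not robots.txt"]

    warnings = []
    seen_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].rstrip()  # drop comments
        if not line.strip():
            continue
        if line[0] in " \t":
            warnings.append(f"line {number}: leading whitespace")
        field, sep, value = line.strip().partition(":")
        if not sep:
            warnings.append(f"line {number}: missing ':' separator")
            continue
        name, value = field.strip(), value.strip()
        key = name.lower()
        if name.isupper():
            warnings.append(f"line {number}: field name '{name}' is all uppercase")
        if key == "user-agent":
            seen_agent = True
        elif key in ("disallow", "allow"):
            if not seen_agent:
                warnings.append(f"line {number}: '{name}' appears before any User-agent line")
            if key == "allow":
                warnings.append(f"line {number}: Allow is not part of the original standard")
            if len(value.split()) > 1:
                warnings.append(f"line {number}: several paths on one line")
            elif value and not value.startswith(("/", "*")):
                warnings.append(f"line {number}: path '{value}' does not start with '/'")
        else:
            warnings.append(f"line {number}: unknown field '{name}'")
    return warnings


# The "reversed order" mistake: the robot name ends up in the Disallow line.
for warning in lint_robots("User-agent: *\nDisallow: googlebot\n"):
    print(warning)
```

An empty list means only that none of these particular patterns were found, not that the file does what you intend.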

3. Website design problems

This was more common on company sites a few years ago. The most typical cases are sites built entirely in Flash or entirely in JavaScript, which spiders cannot crawl. For such a site, a redesign is the best choice. This has been discussed at length elsewhere, so I will not repeat it here.
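To get a rough idea of what a spider that runs neither scripts nor Flash can find on a page, you can extract the plain links and text from the raw HTML. The sketch below uses Python's standard `html.parser`; the class name `SpiderView`, the helper `spider_view`, and both sample pages are made up for illustration, and real spiders are of course more sophisticated than this.

```python
from html.parser import HTMLParser


class SpiderView(HTMLParser):
    """Collect the links and text visible without running scripts or Flash."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []
        self._skip = 0  # depth inside <script>/<style>, whose content is not page text

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text.append(data.strip())


def spider_view(html: str):
    """Return (links, text) found in the raw HTML."""
    parser = SpiderView()
    parser.feed(html)
    parser.close()
    return parser.links, " ".join(parser.text)


# Hypothetical pages: one plain HTML page, one whose navigation lives in Flash and script.
plain_page = '<html><body><h1>News</h1><a href="/news">Latest news</a></body></html>'
flash_page = ('<html><body><object data="site.swf"></object>'
              '<script>window.location = "/news";</script></body></html>')

print(spider_view(plain_page))
print(spider_view(flash_page))
```

If the second kind of result comes back empty for your home page, a spider has nothing to follow and nothing to index.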

4. Unstable website

Indexing problems caused by an unstable site are relatively common. Instability affects spider crawling in two ways. First, if your site happens to be unreachable when the spider comes to crawl it, the spider concludes that the site has no content and does not return for a long time, which delays indexing. Second, if the spider runs into too many errors while crawling, with pages that can sometimes be fetched and sometimes cannot, the search engine concludes that you cannot provide useful content to visitors. After all, a search engine can hardly accept users clicking a search result and landing on a 404 page.

Instability itself has two causes. One is an unstable server. Many webmasters go for the cheapest hosting, where a single machine often carries hundreds of websites. Webmasters are better off buying hosting from a well-known IDC, for example Xinwang Hulian (新网互联), Shidai Hulian (时代互联), or Xibu Shuma (西部数码).

The other cause is unstable site code. For example, http://www.law158.com/ went unindexed for a long time. The IIS access logs showed that the spider kept hitting pages it could not access. When the site first went live, the programmers had paid no attention to execution efficiency, so some dynamic pages consumed too many resources, and a "Service Unavailable" message appeared as soon as the number of visitors rose. The fix is to track down the pages that use excessive resources. Generally speaking, the longer a program runs, the more resources it occupies and the more likely a "Service Unavailable" error becomes. The execution time of an ordinary dynamic page should not exceed 325 ms. I therefore suggest that webmasters add a snippet to their dynamic pages that reports execution time, such as the following code:

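    ' Top of the page: record the start time (Timer returns seconds since midnight)
    startTime = Timer

    ' End of the page: print the elapsed time in milliseconds
    Response.Write ((Timer - startTime) * 1000) & " ms"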

After a few problem pages with execution times above 350 ms were tracked down and fixed, access to the site became stable, and the site was indexed after one update cycle.
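The same measurement can be sketched in Python for sites that are not written in ASP. This is only an illustration of the idea: `timed` and `render_page` are hypothetical names, the 325 ms threshold is the figure suggested in the text, and the clock and the reporting function are parameters so the wrapper can be tried without a web server.

```python
import time
from functools import wraps

SLOW_MS = 325  # threshold suggested in the text


def timed(handler, clock=time.perf_counter, report=print):
    """Wrap a page handler and report its execution time in milliseconds."""
    @wraps(handler)
    def wrapper(*args, **kwargs):
        start = clock()
        try:
            return handler(*args, **kwargs)
        finally:
            elapsed_ms = (clock() - start) * 1000
            flag = " SLOW" if elapsed_ms > SLOW_MS else ""
            report(f"{handler.__name__}: {elapsed_ms:.0f} ms{flag}")
    return wrapper


@timed
def render_page():
    """Hypothetical page handler; the sleep stands in for real page work."""
    time.sleep(0.01)
    return "<html>...</html>"


render_page()
```

In practice the report would go to a log file rather than to the page, so that slow pages can be found from the log afterwards.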

5. Penalties by association

As the saying goes, "when the city gate catches fire, the fish in the moat suffer." If a site has been banned by a search engine and the engine decides that your site is closely related to it, then unfortunately your site will not be indexed either. Baidu in particular is extremely strict about banning site networks, spam sites, and over-optimized sites. Penalties by association come in three kinds: domain association, server association, and link association.

(1) Domain association. Google mentioned in an earlier patent document that search engines can retrieve WHOIS information and use it in ranking results. A search engine can therefore judge a site by its WHOIS data. For example, suppose your spam site http://www.a.com/ is banned by a search engine and you then register http://www.b.com/ with the same WHOIS details for another site. http://www.b.com/ may not be indexed, because the search engine concludes from the WHOIS data that both sites have the same owner. Of course, this is only an example; in general, it is relatively rare for a penalty on one site to trigger a domain association. A friend from my group has a site, 17washu dot com, whose WHOIS data matches that of several sites he ran earlier that were banned, so the site has never been indexed. (The domain above is written out to prevent an accidental link when this article is reposted.) For this reason, I recommend that webmasters not use identical WHOIS information for every domain they register.

(2) Server/IP association. If your site is on the same server as a site penalized by a search engine, or shares its IP, your content may not be indexed. In China, however, most sites use virtual hosting, so it is common for hundreds of sites to share one IP or one server, and in that situation there is no need to worry. But if you share a server with friends, the server holds only a few sites, and one of them is penalized, you should be careful and consider changing servers. For example, a friend recently asked me to look at http://www.177liuxue.com/, which had not been indexed for a long time. After ruling out other factors, I used the reverse IP lookup at http://www.114best.com/ip/ and found that the site shared a server with a banned QQ-space site and another site that had gone unindexed for several months. Only then did the reason become clear. After the server was changed, the site was indexed in the next update cycle.
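If you already know which domains you want to compare, a few lines of Python can show which of them resolve to the same IP. This is a minimal sketch under stated assumptions: `group_by_ip` and `shared_hosts` are hypothetical helper names, the `.example` domains and 192.0.2.x addresses are placeholders, and the lookup table stands in for DNS so the example runs offline. It is not a reverse IP lookup: finding unknown sites on an IP, as the service mentioned above does, requires a third-party database.

```python
import socket
from collections import defaultdict


def group_by_ip(domains, resolve=socket.gethostbyname):
    """Map each IP address to the domains that resolve to it."""
    groups = defaultdict(list)
    for domain in domains:
        try:
            groups[resolve(domain)].append(domain)
        except OSError:  # DNS failures (socket.gaierror) are OSError subclasses
            groups["unresolved"].append(domain)
    return dict(groups)


def shared_hosts(domains, resolve=socket.gethostbyname):
    """Return only the IPs that serve more than one of the given domains."""
    return {ip: names
            for ip, names in group_by_ip(domains, resolve).items()
            if ip != "unresolved" and len(names) > 1}


# Hypothetical lookup table standing in for DNS.
FAKE_DNS = {
    "a.example": "192.0.2.10",
    "b.example": "192.0.2.10",
    "c.example": "192.0.2.20",
}

print(shared_hosts(FAKE_DNS, resolve=FAKE_DNS.__getitem__))
```

Leaving out the `resolve` argument makes the helpers use real DNS lookups through `socket.gethostbyname`.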

(3) Link association. Google's "Webmaster Guidelines" (http://www.google.com/support/webmasters/bin/answer.py?answer=35769#design) say: "Please do not participate in link schemes designed to improve your site's ranking or PageRank. In particular, avoid links to prohibited sites or 'bad neighbors'." This makes it clear that linking to problematic sites may cause ranking or indexing problems. Webmasters therefore need to keep their eyes open when choosing link partners. Besides looking at PR, check how the site fares in Baidu and other search engines and the quality of its content, and avoid keeping company with "bad neighbors".

6. Low content quality

Baidu's "Webmaster FAQ" points out that Baidu does not index "highly repetitive content copied from the Internet." Google's Webmaster Guidelines also devote a section to "little or no original content" (http://www.google.com/support/webmasters/bin/answer.py?answer=66361). If your content has been reposted across the Internet many times, or if there is no content at all, the spider may leave and never return. For example, http://www.zhaoche51.com/ was set up in early July this year. Baiduspider visited once on July 14, fetched more than 300 pages, and has not been back since. I then looked carefully at the site's IIS logs and found the following record of Baiduspider's visit:

    [18822] 2008-07-14 08:48:32 W3SVC746795306 222.74.81.18 www.zhaoche51.com GET /station.asp c=Bijie 80 - 61.135.168.160 Baiduspider+(+http://www.baidu.com/search/spider.htm) 200 0 0
    [18823] 2008-07-14 08:48:32 W3SVC746795306 222.74.81.18 www.zhaoche51.com GET /station.asp c=Yanan 80 - 61.135.168.160 Baiduspider+(+http://www.baidu.com/search/spider.htm) 200 0 0
    [18837] 2008-07-14 08:48:36 W3SVC746795306 222.74.81.18 www.zhaoche51.com GET /station.asp c=Ezhou 80 - 61.135.168.160 Baiduspider+(+http://www.baidu.com/search/spider.htm) 200 0 0
    [18839] 2008-07-14 08:48:38 W3SVC746795306 222.74.81.18 www.zhaoche51.com GET /bus.asp id=136 80 - 61.135.168.160 Baiduspider+(+http://www.baidu.com/search/spider.htm) 200 0 0
    [18840] 2008-07-14 08:48:38 W3SVC746795306 222.74.81.18 www.zhaoche51.com GET /station.asp c=Yangjiang 80 - 61.135.168.160 Baiduspider+(+http://www.baidu.com/search/spider.htm) 200 0 0

Unfortunately, the pages Baiduspider crawled were only template pages generated automatically by the program (such as /station.asp?c=Yangjiang), with no real content. As a result, Baidu judged the site to be worthless for now and has not visited it for 22 days. My advice: if the site is not finished and the framework has not been filled with content, avoid submitting it to Baidu or exchanging links with friends. Doing so leads search engines to conclude that your site has no value for indexing yet, and the site misses being indexed during the period when it otherwise could have been.
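Reading a log like this by eye gets tedious beyond a few lines, so a small script can count which pages the spider actually fetched. The sketch below assumes the whitespace-separated field order that the excerpt above appears to use (a leading index in brackets, then date, time, site name, server IP, host, method, URI stem, URI query, port, user name, client IP, user agent, and three status fields). The names `parse_line` and `spider_hits` are hypothetical, and the sample lines are rebuilt from the excerpt. A real IIS log declares its own field order in a `#Fields:` header, which should be used instead.

```python
from collections import Counter

# Assumed field order, inferred from the log excerpt in the article.
FIELDS = ["index", "date", "time", "sitename", "server_ip", "host", "method",
          "uri_stem", "uri_query", "port", "username", "client_ip",
          "user_agent", "status", "substatus", "win32_status"]


def parse_line(line):
    """Split one log line into named fields; return None if it does not fit."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def spider_hits(lines, spider="baiduspider"):
    """Count hits per URI stem for requests whose user agent mentions `spider`."""
    hits = Counter()
    for line in lines:
        record = parse_line(line)
        if record and spider in record["user_agent"].lower():
            hits[record["uri_stem"]] += 1
    return hits


# Sample lines rebuilt from the excerpt, for illustration only.
TEMPLATE = ("[{n}] 2008-07-14 {t} W3SVC746795306 222.74.81.18 www.zhaoche51.com GET "
            "{stem} {query} 80 - 61.135.168.160 "
            "Baiduspider+(+http://www.baidu.com/search/spider.htm) 200 0 0")
SAMPLE = [TEMPLATE.format(n=n, t=t, stem=stem, query=query) for n, t, stem, query in [
    (18822, "08:48:32", "/station.asp", "c=Bijie"),
    (18823, "08:48:32", "/station.asp", "c=Yanan"),
    (18837, "08:48:36", "/station.asp", "c=Ezhou"),
    (18839, "08:48:38", "/bus.asp", "id=136"),
    (18840, "08:48:38", "/station.asp", "c=Yangjiang"),
]]

for stem, count in spider_hits(SAMPLE).most_common():
    print(stem, count)
```

If nearly all of the spider's requests land on one or two generated templates, as in this excerpt, that is a hint the spider is seeing a framework rather than content.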

7. Too few entry points for spiders

Some sites are indexed quickly, while others with comparable content wait a month or two. Drawing spiders to the site is a very important factor here. When a new site is built, the search engine has to learn its URL somehow. The old way was to submit the URL to the search engine, for example through Baidu's submission page http://www.baidu.com/search/url_submit.html or Google's submission page http://www.google.com/addurl/?hl=zh-cn&continue=/addurl. Rumor has it, however, that manually submitted URLs are more likely to be manually reviewed and to run into unnecessary trouble. So the more common approach now is not to submit but to place a link on some other site and let the spider follow it naturally to the new site.

A common mistake here is to think that any link will do. The result is that the spider reaches your link only one or two months later, and indexing is further off still. The page carrying the link to the new site should be one that spiders visit frequently, and ideally its content should be related to your site. If spiders visit often, they discover your link sooner; if the content is related to the new site, spiders consider the link more worth following.

These seven points summarize the indexing problems the author has run into at work. They were written in haste, so omissions are inevitable, and corrections from experts are welcome. You are welcome to leave your comments on the author's blog, http://www.001pp.com, or to join the SEO and website operations QQ group 54338195. The above is an integral part of this article and must be retained when the article is reposted.
