Tips for avoiding spider crawling and indexing errors: avoiding conflicts


As you know, you can't always rely on search engine spiders to crawl or index your site effectively. Left entirely to their own devices, spiders pick up a lot of duplicate content, treat some important pages as junk, index link entry points that should never be shown to users, and cause other problems. There are tools that give us fuller control over spiders' activity within a site, such as the meta robots tag, robots.txt, and the canonical tag.

Today, I want to talk about the limitations of robot control techniques. To keep spiders off a page, webmasters sometimes apply several robot control techniques at once. Unfortunately, these techniques can contradict each other, and such restrictions can also hide certain dead links.

So what happens when a page is blocked by the robots.txt file while also carrying a noindex tag or a canonical tag?

Quick Review

Before we get into the subject, let's review the mainstream robot restriction techniques:

Meta robots tag

The meta robots tag establishes page-level instructions for search engine robots. The meta robots tag should be placed in the head section of the HTML file.
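
For example, a meta robots tag telling spiders not to index a page (while still following its links) looks like this — a minimal illustrative snippet:

<meta name="robots" content="noindex, follow" />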

Canonical tag

The canonical tag is a page-level meta tag placed in the HTML header of a page. It tells search engines which URL is the canonical version to display. Its goal is to keep search engines from indexing duplicate content while concentrating the weight of the duplicated pages onto the canonical page.

The code looks like this:

<link rel="canonical" href="http://example.com/quality-wrenches.htm" />

X-Robots-Tag

Since 2007, Google and other search engines have supported the X-Robots-Tag as a way to tell spiders how to handle crawling and indexing. The X-Robots-Tag is sent in the HTTP response header and instructs spiders how to crawl and index the file. This tag is especially useful for controlling indexing of non-HTML files, such as PDF files.
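
For example, to keep PDFs out of the index, the server can send the tag in its response headers. A minimal sketch, assuming an Apache server with mod_headers enabled (the file pattern is a placeholder):

<FilesMatch "\.pdf$">
Header set X-Robots-Tag "noindex"
</FilesMatch>

Matching responses then carry the header line X-Robots-Tag: noindex, which spiders honor just like a meta robots tag.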

Robots.txt

Robots.txt controls where search engines may go inside a site, but it doesn't guarantee that a particular page won't be crawled or indexed. It is worth using only when it is genuinely necessary to shield part of a site from robots; for SEO purposes, I always recommend using the meta robots "noindex" tag instead.
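
A typical robots.txt rule looks like this (the directory is a placeholder):

User-agent: *
Disallow: /private/

This asks all spiders to stay out of /private/, but it does not stop the blocked URLs from appearing in the index if other pages link to them.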

Avoid conflicts

It is unwise to use more than one of the following methods at once to restrict robot access:

· Meta robots "noindex"

· Canonical tag (when pointing to a different URL)

· Robots.txt Disallow

· X-Robots-Tag "noindex"

However much you want to keep a page out of the search results, one method is always better than two. Let's take a look at what happens when multiple path-control techniques are applied to a single URL.

Meta robots "noindex" and the canonical tag

If your goal is to pass the weight of one URL to another and you have no better way to do it, use only the canonical tag. Don't make trouble for yourself by adding the meta robots "noindex" tag as well. If you use both robot methods, search engines may never see your canonical tag at all. The weight transfer will be lost, because the robots "noindex" tag prevents the canonical tag from being seen!
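
To make the conflict concrete, here is a hypothetical page head combining both tags; the URL is illustrative only:

<head>
<!-- "noindex" tells spiders to drop this page from the index... -->
<meta name="robots" content="noindex" />
<!-- ...so this weight-passing hint is likely never acted on -->
<link rel="canonical" href="http://example.com/quality-wrenches.htm" />
</head>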

Meta robots "noindex" & X-Robots-Tag "noindex"

These tags are redundant. Placing both on the same page can only hurt your SEO, as far as I can see. If you can set "noindex" in the meta robots tag in the HTML head, you shouldn't use the X-Robots-Tag header as well.
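
A sketch of the redundancy, with placeholder content; the HTTP header and the HTML tag below say exactly the same thing:

HTTP/1.1 200 OK
X-Robots-Tag: noindex

<!-- in the same page's head: redundant, the header above already says it -->
<meta name="robots" content="noindex" />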

Robots.txt Disallow & meta "noindex"

This is the most common conflict I've ever seen:

The reason I favor meta "noindex" is that it effectively keeps a page out of the index while still letting the page pass weight to the deeper pages it links to. That is a win-win approach. A robots.txt Disallow, by contrast, completely prevents search engines from seeing the information on the page (and its valuable internal links), and the URL can still end up indexed anyway. What's the benefit of that? I have written a separate article on this subject.

If the two are combined, the robots.txt Disallow guarantees that the meta robots "noindex" is never seen by spiders. You suffer the effects of the Disallow in robots.txt and miss all the benefits of the meta robots "noindex".
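
Here is the losing combination, with a placeholder URL; the Disallow stops spiders from ever fetching the page, so the "noindex" inside it is never read:

# robots.txt -- blocks crawling of the page below
User-agent: *
Disallow: /old-page.html

<!-- in /old-page.html: never fetched, so this directive is dead -->
<meta name="robots" content="noindex" />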

Article source: www.leadseo.cn, Shanghai website optimization experts. Please keep the source when reprinting. Thank you very much!
