Abstract: Duplicate content is hard to avoid entirely when building a website, but a large amount of it will hurt the site's performance in search engines. This article looks at the main causes of duplicate content and how to resolve them.
The main causes of duplicate content
1. URL normalization problems
URL normalization issues cover two areas: normalizing the main domain name and normalizing page URLs. The main domain needs to be standardized to a single preferred form (for example, choosing either the www or the non-www version). Page URLs also need to be unified: to help search engines crawl a site, dynamic URLs are often rewritten into pseudo-static ones, but after the rewrite the original dynamic URL usually still exists and remains accessible, so several URLs end up serving the same page (for example, a hypothetical dynamic /article.php?id=12 and its pseudo-static /article/12/ returning identical content).
2. Alternative versions of a page
Besides the normal version, many websites offer other viewing versions of the same page, such as a print-friendly or simplified version, without blocking search engines from crawling them, and these versions become duplicate content pages.
3. Site structure
Many sites do not take SEO into account when their structure is first designed, and the result is multiple versions of the same page, such as product listings sorted by price, by comments, or by date. This kind of duplication is especially serious on e-commerce sites.
4. URLs that return a 200 status with arbitrary characters appended
On some sites, because of how the site's program and technology handle requests, a URL still loads normally when arbitrary characters are appended after its parameters, and the resulting page is a complete duplicate of the page without the extra characters.
A relatively simple way to check whether duplicate versions of a page exist is to pick a sentence from the page at random, search for it wrapped in double quotes, and see from the results how many duplicate pages there are. The chance of a randomly chosen sentence appearing word for word on unrelated pages is normally very small.
The dangers of duplicate content
A common misconception in SEO is that a site containing duplicate content will be penalized by search engines. In reality it is not that severe: the search engine picks the version it considers best from all the duplicate pages to take part in ranking, and the other similar pages simply do not rank.
The problem is, first, how the search engine decides which page is the genuine original, and whether that matches the page the webmaster wants to promote. If the search engine judges wrongly, treating the original page as the copy and the copy as the original, while all of your promotion work targets the real original, that effort is wasted. In addition, multiple copies of a page on the same site spread out the page weight: since the page exists on the site, there will necessarily be links pointing to it, and if those links were unified on one URL, all of that weight could be concentrated. Indexed duplicate pages also consume part of the crawling effort of search engine spiders, reducing the crawl and indexing rate of the pages that actually need to be captured.
Ways to eliminate duplicate content
For URL normalization, the best approach is for each page to correspond to exactly one URL, with no alternative versions, and for every link on the site to point to that URL. Of course, sometimes for technical or other reasons everything cannot be unified into a single URL, and in that case the following three methods can be used to concentrate the weight.
1. 301 redirect
A 301 redirect passes on the weight of a page, and the mainstream search engines currently all support it, so you can permanently redirect all duplicate content pages to the original page with a 301.
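As a minimal sketch, assuming an Apache server with mod_rewrite enabled and reusing the hypothetical URLs from earlier (the domain and paths are placeholders), the duplicate versions can be 301-redirected to the single preferred URL like this:

RewriteEngine On
# Unify the main domain: send the non-www host to the www version
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]
# Send the leftover dynamic URL to its pseudo-static counterpart
RewriteCond %{QUERY_STRING} ^id=12$
RewriteRule ^article\.php$ https://www.example.com/article/12/? [R=301,L]

The trailing ? on the last rule drops the old query string so the redirect target stays clean.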
2. Blocking with the robots file
Preventing the duplicate content from being crawled by search engines through the robots.txt file is also an effective way to solve a site's duplication problem.
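As an illustration, assuming the duplicate print versions of pages live under a hypothetical /print/ directory (adjust the path to match the actual site), two lines in robots.txt are enough to block them for all crawlers:

User-agent: *
Disallow: /print/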
3. The canonical tag
The canonical tag is a tag jointly introduced by Google and Yahoo in 2009, and Baidu also supports it, so it can likewise be used to deal with duplicate content.
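The tag itself is a single line placed in the <head> of each duplicate version, pointing at the preferred URL (the address here is a placeholder matching the earlier example):

<link rel="canonical" href="https://www.example.com/article/12/" />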
Once this tag has been added, it is equivalent to telling the search engine which URL is the standard, original version; all the other copies point to that one unique URL. The effect is a little like using a 301 to pass on the page's weight, except that a 301 actually redirects the page, whereas with this tag the page stays at its original address.