After running websites for this long, my deepest impression is that original articles are becoming more and more important in the eyes of search engines. I handle the day-to-day SEO optimization of several enterprise sites. One of them averages two to three thousand IP visits a day; for a period of time the quality of its content did not pass muster, the site was demoted, long-tail keyword traffic was suddenly cut in half, and overall traffic dropped by nearly half. Through my work on original content, the site's performance has now gradually recovered and stabilized. In this "content is king" era, if you want a site to perform well in search engines, you have to put real effort into its content.
But many SEO practitioners know from experience that sustaining original content over the long term is not easy, so pseudo-original rewriting, plagiarism, and other tricks are widely used by webmasters. Are these methods really effective, or are they self-deception? Today I will share what I know about how search engines judge duplicate content.
First, why do search engines actively handle duplicate content?
1. To save the space and time spent crawling, indexing, and analyzing content
To put it simply, a search engine's resources are limited while users' demands are unlimited. Large amounts of duplicate content consume the search engine's valuable resources, so duplicate content must be handled from a cost standpoint.
2. To help avoid collecting duplicate content repeatedly
Aggregating the information that best fits the user's query intent from content that has already been identified and collected improves efficiency and avoids collecting the same duplicate content again and again.
3. Repetition frequency can serve as a criterion for excellent content
Since search engines can identify duplicate content, they can of course identify more effectively which content is original and of high quality: the lower an article's repetition frequency, the higher its originality and quality.
4. To improve the user experience
In fact, this is the point search engines care about most: only by handling duplicate content well and presenting more useful information to users will users accept the results.
Second, what forms does duplicate content take in the eyes of search engines?
1. Format and content are both similar. This situation is fairly common on e-commerce websites, where image theft is everywhere.
2. Only the format is similar.
3. Only the content is similar.
4. Format and content are each partially similar. This is generally quite common, especially on enterprise websites.
Third, how do search engines judge duplicate content?
1. The most basic approach is to compare the digital fingerprint of each page. Although this method can find some duplicate content, its drawbacks are that it consumes a lot of resources, runs slowly, and is inefficient.
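To make the idea concrete, here is a minimal sketch of whole-page fingerprinting in Python (the URLs and the choice of MD5 are illustrative assumptions, not any search engine's actual implementation): hash each page's text and treat pages whose fingerprints collide as exact duplicates. Full-page fingerprints like this only catch identical copies, which is part of why the approach is considered crude and costly.

```python
import hashlib

def fingerprint(page_text: str) -> str:
    """Return a digital fingerprint (MD5 hex digest) of the page text."""
    normalized = " ".join(page_text.split()).lower()  # collapse whitespace
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

def find_duplicates(pages: dict) -> dict:
    """Group page URLs by fingerprint; groups with more than one URL are duplicates."""
    groups = {}
    for url, text in pages.items():
        groups.setdefault(fingerprint(text), []).append(url)
    return {fp: urls for fp, urls in groups.items() if len(urls) > 1}

# Hypothetical pages for illustration only.
pages = {
    "http://example.com/a": "Original product description for a blue widget.",
    "http://example.com/b": "Original  product description for a blue widget.",  # copied
    "http://example.com/c": "A genuinely different article about something else.",
}
print(find_duplicates(pages))  # pages a and b share one fingerprint
```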
2. I-Match, based on global features
The principle of this algorithm is that all the words appearing in the text are sorted and then filtered, with the goal of removing the irrelevant words and keeping the important keywords; a fingerprint is then computed from what remains. Deduplication done this way is efficient and the effect is obvious. For example, when producing pseudo-original articles we may swap words around or reorder paragraphs, but this cannot deceive the I-Match algorithm, which will still judge the text to be a duplicate.
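Below is a rough sketch of the I-Match idea, under simplifying assumptions (a toy list of common words stands in for the real IDF dictionary used to filter out unimportant terms). It shows why swapping words or reordering paragraphs does not change the signature.

```python
import hashlib
import re

def imatch_signature(text: str, common_words: set) -> str:
    """Sort the unique 'important' terms of a text and hash them into a signature."""
    tokens = re.findall(r"[a-z]+", text.lower())
    important = sorted(set(tokens) - common_words)  # order of appearance is discarded
    return hashlib.sha1(" ".join(important).encode("utf-8")).hexdigest()

# Toy stand-in for an IDF-based filter of unimportant terms.
COMMON = {"the", "a", "an", "of", "and", "to", "in", "is", "it"}

doc = "The quick brown fox jumps over the lazy dog. It is a sunny day."
shuffled = "It is a sunny day. The lazy dog, the quick brown fox jumps over."

# Reordering words and sentences yields the same signature, so it is judged a duplicate.
print(imatch_signature(doc, COMMON) == imatch_signature(shuffled, COMMON))  # True
```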
3. SpotSig, based on stop words
Documents contain large numbers of stop words, such as modal particles, adverbs, prepositions, and conjunctions, which interfere with the effective information. When checking for duplicates, search engines first remove these stop words and then match the documents against each other. Therefore, in our own optimization we can reduce the frequency of stop words and raise the page's keyword density, which is more conducive to search engine crawling.
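As a rough illustration of the stop-word-removal-and-matching idea described above (the stop-word list and the similarity threshold are assumptions chosen for the demo, not SpotSig's published parameters):

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "that", "it",
              "for", "on", "with", "as", "this"}

def content_tokens(text: str) -> set:
    """Tokenize and drop stop words, keeping only content-bearing terms."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS}

def jaccard(a: set, b: set) -> float:
    """Overlap of two token sets, used here as a simple similarity measure."""
    return len(a & b) / len(a | b) if a | b else 0.0

doc1 = "The SEO of the site is improved with the original content."
doc2 = "SEO for a site is improved with original content."

# The stop words differ, but the remaining content terms overlap heavily,
# so the two documents are judged near-duplicates.
print(jaccard(content_tokens(doc1), content_tokens(doc2)) > 0.8)  # True
```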
4. SimHash, based on multiple hashes
This algorithm involves geometric principles and is laborious to explain. Simply put, similar texts have similar hash values: the closer two texts' SimHash fingerprints are, that is, the smaller their Hamming distance, the more similar the texts. The task of deduplicating massive amounts of text is thus transformed into quickly determining whether a fingerprint with a small Hamming distance exists in a massive collection of SimHash values. We only need to know that with this algorithm a search engine can perform approximate deduplication over large numbers of web pages in a very short time. At present this algorithm works well both for telling pages apart and for detecting duplicates.
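The following sketch shows the core SimHash computation and the Hamming-distance comparison, assuming 64-bit fingerprints and unweighted features; a real system would weight features (for example by term frequency) and index the fingerprints so that near neighbors can be found quickly.

```python
import hashlib
import re

def simhash(text: str, bits: int = 64) -> int:
    """Build a SimHash fingerprint: accumulate per-bit votes from each token's hash."""
    vector = [0] * bits
    for token in re.findall(r"[a-z]+", text.lower()):
        h = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) & ((1 << bits) - 1)
        for i in range(bits):
            vector[i] += 1 if (h >> i) & 1 else -1
    # Each fingerprint bit is the sign of the accumulated component.
    return sum(1 << i for i in range(bits) if vector[i] > 0)

def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")

doc1 = "Search engines can quickly find near duplicate web pages at scale."
doc2 = "Search engines quickly find near duplicate pages at a large scale."
doc3 = "A completely unrelated article about cooking pasta at home."

# Similar texts give close fingerprints (small Hamming distance),
# unrelated texts give a noticeably larger distance.
print(hamming_distance(simhash(doc1), simhash(doc2)))
print(hamming_distance(simhash(doc1), simhash(doc3)))
```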
This article was originally written by Telecommunications 400 Telephone (http://www.400kls.com); reprints are welcome.