After running websites for this long, my deepest impression is that original articles are becoming more and more important in the eyes of search engines. I handle the day-to-day SEO optimization of several enterprise sites. One of them averages two to three thousand IP visits a day; for a period of time the quality of its content did not pass muster, the site was demoted, long-tail keyword traffic was suddenly cut in half, and overall traffic dropped by nearly half. Through my work on original content, the site's performance has now gradually recovered and stabilized. In this "content is king" era, if you want a site to perform well in search engines, you have to put real effort into its content.
But many SEO practitioners know from experience that sustaining original content over the long term is not easy, so pseudo-original rewriting, plagiarism, and other tricks are widely used by webmasters. Are these methods really effective, or are they self-deception? Today I will share what I know about how search engines judge duplicate content.
First, why do search engines actively handle duplicate content?
1. To save the space and time spent crawling, indexing, and analyzing content
To put it simply, a search engine's resources are limited while users' demands are unlimited. Large amounts of duplicate content consume the search engine's valuable resources, so duplicate content must be handled from a cost standpoint.
2. To help avoid collecting duplicate content repeatedly
Aggregating the information that best fits the user's query intent from content that has already been identified and collected improves efficiency and avoids collecting the same duplicate content again and again.
3. Repetition frequency can serve as a criterion for excellent content
Since search engines can identify duplicate content, they can of course identify more effectively which content is original and of high quality: the lower an article's repetition frequency, the higher its originality and quality.
4. To improve the user experience
In fact, this is the point search engines care about most: only by handling duplicate content well and presenting more useful information to users will users accept the results.
Second, what forms does duplicate content take in the eyes of search engines?
1. Format and content are both similar. This situation is fairly common on e-commerce websites, where image theft is everywhere.
2. Only the format is similar.
3. Only the content is similar.
4. Format and content are each partially similar. This is generally quite common, especially on enterprise websites.
Third, how do search engines judge duplicate content?
1. The most basic approach is to compare the digital fingerprint of each page. Although this method can find some duplicate content, its drawbacks are that it consumes a lot of resources, runs slowly, and is inefficient.
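To make the idea concrete, here is a minimal sketch of whole-page fingerprinting in Python (the URLs and the choice of MD5 are illustrative assumptions, not any search engine's actual implementation): hash each page's text and treat pages whose fingerprints collide as exact duplicates. Full-page fingerprints like this only catch identical copies, which is part of why the approach is considered crude and costly.

```python
import hashlib

def fingerprint(page_text: str) -> str:
    """Return a digital fingerprint (MD5 hex digest) of the page text."""
    normalized = " ".join(page_text.split()).lower()  # collapse whitespace
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

def find_duplicates(pages: dict) -> dict:
    """Group page URLs by fingerprint; groups with more than one URL are duplicates."""
    groups = {}
    for url, text in pages.items():
        groups.setdefault(fingerprint(text), []).append(url)
    return {fp: urls for fp, urls in groups.items() if len(urls) > 1}

# Hypothetical pages for illustration only.
pages = {
    "http://example.com/a": "Original product description for a blue widget.",
    "http://example.com/b": "Original  product description for a blue widget.",  # copied
    "http://example.com/c": "A genuinely different article about something else.",
}
print(find_duplicates(pages))  # pages a and b share one fingerprint
```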
2. I-Match, based on global features
The principle of this algorithm is that all the words appearing in the text are sorted and then filtered, with the goal of removing the irrelevant words and keeping the important keywords; a fingerprint is then computed from what remains. Deduplication done this way is efficient and the effect is obvious. For example, when producing pseudo-original articles we may swap words around or reorder paragraphs, but this cannot deceive the I-Match algorithm, which will still judge the text to be a duplicate.
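Below is a rough sketch of the I-Match idea, under simplifying assumptions (a toy list of common words stands in for the real IDF dictionary used to filter out unimportant terms). It shows why swapping words or reordering paragraphs does not change the signature.

```python
import hashlib
import re

def imatch_signature(text: str, common_words: set) -> str:
    """Sort the unique 'important' terms of a text and hash them into a signature."""
    tokens = re.findall(r"[a-z]+", text.lower())
    important = sorted(set(tokens) - common_words)  # order of appearance is discarded
    return hashlib.sha1(" ".join(important).encode("utf-8")).hexdigest()

# Toy stand-in for an IDF-based filter of unimportant terms.
COMMON = {"the", "a", "an", "of", "and", "to", "in", "is", "it"}

doc = "The quick brown fox jumps over the lazy dog. It is a sunny day."
shuffled = "It is a sunny day. The lazy dog, the quick brown fox jumps over."

# Reordering words and sentences yields the same signature, so it is judged a duplicate.
print(imatch_signature(doc, COMMON) == imatch_signature(shuffled, COMMON))  # True
```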
3. SpotSig, based on stop words
Documents contain large numbers of stop words, such as modal particles, adverbs, prepositions, and conjunctions, which interfere with the effective information. When checking for duplicates, search engines first remove these stop words and then match the documents against each other. Therefore, in our own optimization we can reduce the frequency of stop words and raise the page's keyword density, which is more conducive to search engine crawling.
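As a rough illustration of the stop-word-removal-and-matching idea described above (the stop-word list and the similarity threshold are assumptions chosen for the demo, not SpotSig's published parameters):

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "that", "it",
              "for", "on", "with", "as", "this"}

def content_tokens(text: str) -> set:
    """Tokenize and drop stop words, keeping only content-bearing terms."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS}

def jaccard(a: set, b: set) -> float:
    """Overlap of two token sets, used here as a simple similarity measure."""
    return len(a & b) / len(a | b) if a | b else 0.0

doc1 = "The SEO of the site is improved with the original content."
doc2 = "SEO for a site is improved with original content."

# The stop words differ, but the remaining content terms overlap heavily,
# so the two documents are judged near-duplicates.
print(jaccard(content_tokens(doc1), content_tokens(doc2)) > 0.8)  # True
```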
4. SimHash, based on multiple hashes
This algorithm involves geometric principles and is laborious to explain. Simply put, similar texts have similar hash values: the closer two texts' SimHash fingerprints are, that is, the smaller their Hamming distance, the more similar the texts. The task of deduplicating massive amounts of text is thus transformed into quickly determining whether a fingerprint with a small Hamming distance exists in a massive collection of SimHash values. We only need to know that with this algorithm a search engine can perform approximate deduplication over large numbers of web pages in a very short time. At present this algorithm works well both for telling pages apart and for detecting duplicates.
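The following sketch shows the core SimHash computation and the Hamming-distance comparison, assuming 64-bit fingerprints and unweighted features; a real system would weight features (for example by term frequency) and index the fingerprints so that near neighbors can be found quickly.

```python
import hashlib
import re

def simhash(text: str, bits: int = 64) -> int:
    """Build a SimHash fingerprint: accumulate per-bit votes from each token's hash."""
    vector = [0] * bits
    for token in re.findall(r"[a-z]+", text.lower()):
        h = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) & ((1 << bits) - 1)
        for i in range(bits):
            vector[i] += 1 if (h >> i) & 1 else -1
    # Each fingerprint bit is the sign of the accumulated component.
    return sum(1 << i for i in range(bits) if vector[i] > 0)

def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")

doc1 = "Search engines can quickly find near duplicate web pages at scale."
doc2 = "Search engines quickly find near duplicate pages at a large scale."
doc3 = "A completely unrelated article about cooking pasta at home."

# Similar texts give close fingerprints (small Hamming distance),
# unrelated texts give a noticeably larger distance.
print(hamming_distance(simhash(doc1), simhash(doc2)))
print(hamming_distance(simhash(doc1), simhash(doc3)))
```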
This article was originally written by Telecommunications 400 Telephone (http://www.400kls.com); reprints are welcome.