How Search Engines Judge Whether a Website's Articles Are Original

Source: Internet
Author: User


Today most website visitors arrive through search engines. To attract more visitors, the most important thing is to get more of your pages indexed by the major search engines and ranked as high as possible. So what can you do to get your content indexed quickly and placed near the top of the results? I have already covered search engine optimization and promotion for a newly built company website; today I'd like to discuss original content.

First, we need to clarify a concept: what a search engine treats as "original" can be understood as content appearing on the web for the first time, that is, content that has not appeared elsewhere on the web and is not yet in the search engine's index database.
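To make the "first appearance" idea concrete, here is a minimal Python sketch. It assumes the engine reduces each page to a fingerprint (here a SHA-256 hash of lowercased, whitespace-collapsed text) and keeps a hypothetical `FirstSeenIndex` recording which URL published each fingerprint first. Real engines use much fuzzier matching; this only illustrates the principle.

```python
import hashlib
import re
from typing import Dict, Optional


def fingerprint(text: str) -> str:
    """Assumed normalization: lowercase and collapse whitespace, then hash."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class FirstSeenIndex:
    """Hypothetical index that remembers which URL first published each piece of content."""

    def __init__(self) -> None:
        self._first_url: Dict[str, str] = {}

    def submit(self, url: str, text: str) -> bool:
        """Index a page; return True if its content had never been seen ("original")."""
        key = fingerprint(text)
        if key in self._first_url:
            return False  # already in the index database, so not a first appearance
        self._first_url[key] = url
        return True

    def first_source(self, text: str) -> Optional[str]:
        """URL that first published this content, or None if it is unknown."""
        return self._first_url.get(fingerprint(text))


index = FirstSeenIndex()
print(index.submit("https://a.example/post", "Hello   World"))  # True
print(index.submit("https://b.example/copy", "hello world"))    # False
print(index.first_source("HELLO WORLD"))                        # https://a.example/post
```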

So how does a search engine judge originality? Its spider (the crawling and indexing program) reaches a site by following hyperlinks from other pages, and then follows the site's own links to its article pages.
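The spider's path from an external link to the site and then on to its article pages is essentially a graph traversal. The sketch below assumes an in-memory link graph (a dict from each page to the pages it links to) instead of real HTTP fetching, and visits pages in breadth-first order; the article does not say what crawl order any real search engine uses.

```python
from collections import deque
from typing import Dict, List


def crawl(link_graph: Dict[str, List[str]], seeds: List[str], max_pages: int = 100) -> List[str]:
    """Breadth-first walk over hyperlinks; returns pages in the order the spider visits them."""
    seen = set(seeds)
    queue = deque(seeds)
    visited: List[str] = []
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)
        for target in link_graph.get(page, []):
            if target not in seen:  # skip pages already queued or visited
                seen.add(target)
                queue.append(target)
    return visited


# Hypothetical link graph: an external page links to the site's home page,
# which links to two article pages.
graph = {
    "https://other.example/": ["https://site.example/"],
    "https://site.example/": ["https://site.example/a1", "https://site.example/a2"],
    "https://site.example/a1": ["https://site.example/"],
}
print(crawl(graph, ["https://other.example/"]))
# ['https://other.example/', 'https://site.example/', 'https://site.example/a1', 'https://site.example/a2']
```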

How the search engine analyzes a page:

First, the title. The search engine generally takes the first 60 characters of the title as the text to analyze and checks whether that title is unique. As we all know, the engine organizes its index by search term, so it matches the truncated title against the database of titles already indexed under the related search terms. If the title is already in the database, it is treated as a duplicate title. Matching proceeds one word group at a time: once a group of words has been matched, the engine moves on to the next group, and so on, until the first 60 characters have all been matched. Whatever follows is handled uniformly; my personal guess is that the engine applies some string processing to the remaining phrases.
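The title check described above might look roughly like this. Everything here is an assumption for illustration: the hypothetical `title_db` maps each search term to the truncated titles already indexed under it, titles are split on whitespace (Chinese text would need a word segmenter instead), and word groups are matched greedily, longest first, up to three words.

```python
from typing import Dict, List, Set

TITLE_LIMIT = 60  # the article says roughly the first 60 characters are analyzed


def truncate_title(title: str, limit: int = TITLE_LIMIT) -> str:
    """Keep only the part of the title the engine is assumed to analyze."""
    return title.strip().lower()[:limit]


def match_terms(text: str, vocabulary: Set[str], max_words: int = 3) -> List[str]:
    """Split text into known search terms one word group at a time (greedy, longest first)."""
    words = text.split()
    terms: List[str] = []
    i = 0
    while i < len(words):
        for n in range(min(max_words, len(words) - i), 0, -1):
            group = " ".join(words[i:i + n])
            if group in vocabulary:
                terms.append(group)
                i += n  # this group is done; move on to the following words
                break
        else:
            i += 1  # no known term starts at this word; skip it
    return terms


def is_duplicate_title(title: str, title_db: Dict[str, Set[str]]) -> bool:
    """True if the truncated title is already indexed under one of its search terms."""
    short = truncate_title(title)
    return any(short in title_db[term] for term in match_terms(short, set(title_db)))


db = {
    "seo": {truncate_title("SEO tips for small business sites")},
    "original content": set(),
}
print(is_duplicate_title("SEO Tips for Small Business Sites", db))    # True
print(is_duplicate_title("Writing original content that ranks", db))  # False
```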

The title comparison ends with one of two results: 1. the title database does not yet contain this title; 2. the title already exists in the title database. The engine records which case applies in its index server, and this flag serves as one of the ranking parameters that make up the site's weight.
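One way to picture the two results being stored as a ranking parameter is shown below. The `TitleStatus` flags, the `IndexServer` class, and the choice to score a domain by the share of its pages with new titles are all hypothetical; the article does not say how the flag is combined into the site's weight.

```python
from enum import Enum
from typing import Dict
from urllib.parse import urlparse


class TitleStatus(Enum):
    NEW = 1       # result 1: the title database did not contain this title yet
    EXISTING = 2  # result 2: the title already existed in the title database


class IndexServer:
    """Hypothetical index server that stores one title flag per page URL."""

    def __init__(self) -> None:
        self.flags: Dict[str, TitleStatus] = {}

    def record(self, url: str, status: TitleStatus) -> None:
        self.flags[url] = status

    def title_score(self, domain: str) -> float:
        """Assumed ranking parameter: share of the domain's pages whose titles were new."""
        statuses = [s for url, s in self.flags.items() if urlparse(url).netloc == domain]
        if not statuses:
            return 0.0
        return sum(s is TitleStatus.NEW for s in statuses) / len(statuses)


server = IndexServer()
server.record("https://site.example/a", TitleStatus.NEW)
server.record("https://site.example/b", TitleStatus.EXISTING)
print(server.title_score("site.example"))  # 0.5
```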

Second, the content. The basic idea should be similar to the title analysis, but there are differences. The body of an article carries far more information than its title, so it requires more complex algorithms. Because articles are usually long, the engine cannot compare them keyword by keyword; it can only compare them a sentence or a paragraph at a time. The search scope, however, should still be the database of articles whose titles contain the relevant search terms. The method is to extract a segment of random length and then analyze the text before and after it. If the engine's content database contains the same segment and the text preceding it is also similar, the article is suspected of not being original.
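A single round of this content check could be sketched as follows. It assumes segments of 20 to 40 characters, a 30-character window of preceding text, and `difflib.SequenceMatcher` with a 0.8 similarity threshold as the stand-in for "similar". These numbers and the function names are illustrative only, not taken from any search engine.

```python
import random
from difflib import SequenceMatcher
from typing import List, Tuple


def sample_segment(text: str, rng: random.Random, min_len: int = 20, max_len: int = 40) -> Tuple[int, int]:
    """Pick a segment of random length; return its (start, end) offsets in text."""
    if len(text) <= min_len:
        return 0, len(text)
    length = rng.randint(min_len, min(max_len, len(text)))
    start = rng.randint(0, len(text) - length)
    return start, start + length


def segment_suspicious(page: str, corpus: List[str], rng: random.Random,
                       context: int = 30, threshold: float = 0.8) -> bool:
    """One round: does a random segment of page appear in the corpus with similar preceding text?"""
    start, end = sample_segment(page, rng)
    segment = page[start:end]
    before = page[max(0, start - context):start]
    for doc in corpus:
        pos = doc.find(segment)
        if pos == -1:
            continue  # this stored article does not contain the segment
        doc_before = doc[max(0, pos - context):pos]
        # Two empty windows (segment at the very start of both texts) count as similar.
        if SequenceMatcher(None, before, doc_before).ratio() >= threshold:
            return True
    return False


stored = ["Search engines compare new pages against the articles they have already indexed."]
print(segment_suspicious(stored[0], stored, random.Random(42)))  # True: an exact copy
```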

This analysis is usually repeated several times. For example, if in 7 out of 10 rounds the extracted segment matches content already in the database, and the title is also similar, the article will be identified as not original.
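The repeated check with the "7 out of 10" rule can be sketched like this. To keep the sketch short, a round counts as a hit whenever the random segment appears verbatim anywhere in the stored articles (the preceding-text comparison is left out), and title similarity uses `difflib.SequenceMatcher` with an assumed 0.8 threshold. All function names are hypothetical.

```python
import random
from difflib import SequenceMatcher
from typing import List


def segment_found(page: str, corpus: List[str], rng: random.Random,
                  min_len: int = 20, max_len: int = 40) -> bool:
    """One simplified round: does a random-length segment of page appear verbatim in the corpus?"""
    if len(page) <= min_len:
        segment = page
    else:
        length = rng.randint(min_len, min(max_len, len(page)))
        start = rng.randint(0, len(page) - length)
        segment = page[start:start + length]
    return any(segment in doc for doc in corpus)


def title_similar(title: str, existing_titles: List[str], threshold: float = 0.8) -> bool:
    """Assumed title test: close string similarity to any indexed title."""
    return any(SequenceMatcher(None, title.lower(), t.lower()).ratio() >= threshold
               for t in existing_titles)


def judged_not_original(title: str, page: str, existing_titles: List[str], corpus: List[str],
                        rounds: int = 10, min_hits: int = 7, seed: int = 0) -> bool:
    """Not original if at least min_hits of the rounds find a match AND the title is similar."""
    rng = random.Random(seed)
    hits = sum(segment_found(page, corpus, rng) for _ in range(rounds))
    return hits >= min_hits and title_similar(title, existing_titles)


indexed_titles = ["How search engines judge original content"]
indexed_pages = ["Search engines compare new pages against the articles they have already indexed."]
print(judged_not_original("How Search Engines Judge Original Content",
                          indexed_pages[0], indexed_titles, indexed_pages))  # True
```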

If the article is judged to be original, the engine increases the weight of its domain name in the site-weight index database. Clearly, the more original articles a site has, the higher its weight and the better its ranking.
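The weighting step can be pictured as a simple tally per domain. This sketch assumes each article judged original adds a fixed, hypothetical `bonus` to its domain's weight and that domains are ranked by total weight; the real formula, if there is a single one, is not known.

```python
from collections import Counter
from typing import Iterable, List, Tuple
from urllib.parse import urlparse


def domain_weights(judgements: Iterable[Tuple[str, bool]], bonus: float = 1.0) -> Counter:
    """Add a fixed (hypothetical) bonus to a domain's weight for each article judged original."""
    weights: Counter = Counter()
    for url, original in judgements:
        if original:
            weights[urlparse(url).netloc] += bonus
    return weights


def rank_domains(weights: Counter) -> List[str]:
    """Domains ordered from highest to lowest weight."""
    return [domain for domain, _ in weights.most_common()]


results = [
    ("https://a.example/1", True),
    ("https://a.example/2", True),
    ("https://b.example/1", True),
    ("https://b.example/2", False),
]
w = domain_weights(results)
print(dict(w))          # {'a.example': 2.0, 'b.example': 1.0}
print(rank_domains(w))  # ['a.example', 'b.example']
```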

By comparing the title, the content keywords, and the content already stored on its servers, and by running enough comparisons over a wide enough range of related database entries, the engine can tell whether an article is original. As server performance keeps improving and the algorithms grow more sophisticated, judging whether an article is original should become easy. So plagiarism and copying are bound to fail. If you repost this article, please keep the link to the original. These are my personal opinions, for reference only; corrections are welcome.


