How does a search engine determine whether an article is original or not?

Source: Internet
Author: User

I have recently been running a "non-mainstream" themed site whose content is entirely scraped from other sites. Scraping worked well at first, but the site was later penalized ("K", that is, de-indexed): out of tens of thousands of pages, Baidu now indexes only a few dozen. I know that scraping is not a sound approach, but with limited manpower it is not realistic to add that much content by hand. So I searched for how a search engine judges whether content is original, and found surprisingly little written on the subject. I then tried to think about the problem from a search engineer's point of view and broke into a cold sweat, because judging originality turns out to be quite simple. Below I walk through my reasoning in order, for reference.

Let me take the following article as an example. Title: Nanhao Beijing Technology Co., Ltd. is a professional manufacturer of cursor reading machine. Content: Nanhao technology research and development of the cursor reading machine fast reading card, excellent quality, good service. Our company address is in XXXX, Beijing. A spider reaches our website through a hyperlink and then follows the site's internal links to this article page. The search engine's analysis begins.

1. Analysis of the title:

Many pages today show obvious signs of optimization, with titles stuffed with long-tail keywords. The long-tail words toward the end of a title should only serve to tell the engine what the page is about; when there are too many repetitions, the engine will presumably treat them as inaccurate. So there should be a truncation step, for example analyzing only the first 40 characters of the title. Suppose the engine ends up with: Nanhao Beijing Technology Co., Ltd. is a professional cursor reader.

The first thing to determine is whether the title is unique. How? We all know that the engine classifies pages by terms, and where do the terms come from? Simply put: from related search terms. The engine takes the truncated title and compares it against its database, one related search term at a time. For example, it takes the term "cursor reader" from the title and looks up titles under that term and its related search terms. If the database already contains this title, the title is judged not to be unique, and the engine moves on to the article content. Once "cursor reader" has been checked, it takes "Nanhao Beijing", and so on, until the engine considers that every keyword in the title has been analyzed.

The title comparison ends in one of two results. First: the title database does not contain this title, and the content remains to be checked. Second: the title database already contains this title, and the content remains to be checked.
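The title check described above can be sketched in a few lines of Python. This is only an illustration of the author's guess, not how any real engine works: the 40-character cut-off comes from the text, while `title_index` (a mapping from search term to titles filed under it), the function names, and the sample titles are all hypothetical.

```python
MAX_TITLE_CHARS = 40  # the cut-off guessed in the text, not a known engine value


def truncate_title(title, limit=MAX_TITLE_CHARS):
    """Keep only the front of the title, dropping trailing long-tail words."""
    return title[:limit].strip().lower()


def title_already_indexed(title, terms, title_index):
    """Return True if an indexed title matches under any term found in the title.

    title_index is a hypothetical mapping: search term -> titles filed under it.
    """
    head = truncate_title(title)
    for term in terms:
        term = term.lower()
        if term not in head:
            continue  # this term is not part of the truncated title
        for known in title_index.get(term, []):
            if truncate_title(known) == head:
                return True
    return False


# Hypothetical index: one existing title filed under "cursor reader".
title_index = {
    "cursor reader": ["Nanhao cursor reader manufacturer, Beijing office"],
    "nanhao": [],
}
new_title = "Nanhao cursor reader manufacturer, Beijing"
duplicate = title_already_indexed(new_title, ["cursor reader", "nanhao"], title_index)
```

Because both titles are cut to the same first 40 characters before comparison, the new title counts as a duplicate even though the indexed one carries extra words at the end. Either way, as the text says, the content still has to be checked.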

2. Content Analysis:

The basic idea should be much the same as for the title, but with differences: the content carries more information and more variety than the title, so the algorithm has to be more complex. As stated above, our content is: Nanhao technology research and development of the cursor reading machine fast reading card, excellent quality, good service. Our company address is in XXXX, Beijing.

Because article content is generally long, it cannot be analyzed keyword by keyword; the engine has to compare it sentence by sentence or paragraph by paragraph. The scope of the comparison is still the set of articles in the database filed under the title's related search terms.

The method would be as follows: take a fragment of random length from a random position, then look at the text immediately before and after that fragment. If an article in the engine's content database contains the same fragment with the same preceding and following text, the current page is suspected of being plagiarized rather than original. This check is usually repeated several times. If, say, 9 out of 10 sampled fragments turn up in the database with the same surrounding text, and the title is also the same, the article will be identified as not original.
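A minimal sketch of this fragment-plus-context check, again in Python and again only an illustration of the author's conjecture. The fragment lengths, the 15-character context width, and both function names are assumptions made for the example; the page text is assumed to be at least `min_len` characters long.

```python
import random


def pick_fragment(text, rng, min_len=8, max_len=20):
    """Pick a random (start, end) span of random length inside text.

    Assumes len(text) >= min_len.
    """
    length = rng.randint(min_len, min(max_len, len(text)))
    start = rng.randint(0, len(text) - length)
    return start, start + length


def fragment_with_context_found(page, candidate, start, end, context=15):
    """True if the fragment page[start:end], together with the text just
    before and after it, appears verbatim in the candidate article."""
    window = page[max(0, start - context):end + context]
    return window in candidate


page = ("Nanhao technology research and development of the cursor reading "
        "machine fast reading card, excellent quality, good service.")
copied = "Reposted: " + page
rewritten = "Our cursor reading machine is sold worldwide and ships quickly."

# Fixed span for a reproducible demonstration; pick_fragment would randomize it.
start = page.index("cursor reading machine")
end = start + len("cursor reading machine")
hit_copied = fragment_with_context_found(page, copied, start, end)
hit_rewritten = fragment_with_context_found(page, rewritten, start, end)
```

The rewritten article contains the same fragment, "cursor reading machine", but not the same surrounding text, so only the verbatim copy is flagged. That is the point of comparing the context rather than the fragment alone.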

Now let's simulate the process.

On the first pass the engine takes the fragment "Cursor reader fast" and looks it up in the article database via the related search terms. In the database, the text before the fragment is "Technology research and development" and the text after it is "quality excellent". The engine takes these two pieces of surrounding text and compares them with our current page. If the content is the same, it records 0; if not, it records 1. That completes one comparison. It then takes "company address" and repeats the operation, again producing 0 or 1, and so on until the number of rounds set by the engine has been completed. If the same content is found in 7, 8, or all 10 of 10 comparisons, the engine will conclude that the article is not original.

Going one step further: if the article is judged to be original, the engine adds 1 to the domain's entry in its domain weight database. Clearly, the more original articles a domain has, the higher its weight and the better its ranking.

My view is that by comparing keywords in the title and content this way, with a sufficient number of comparisons and a boldly widened range of the database searched, the engine can tell whether an article is original. Processors keep getting faster and cheaper, and search engineers are highly educated, keep improving their algorithms, and keep accumulating experience. For a search engine, judging originality is as simple as chopping cabbage. If you never think about it you do not notice, but once you do, it comes as quite a shock. The conclusion is that scraper sites will die. Original content is still the way to go; at the very least, change the title. When I have time, I will share how to produce pseudo-original articles that the engine's analysis cannot detect. This article is published by 168 net earning forum www.wz168.org.
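The simulated loop above (one 0 or 1 per comparison, a threshold on the number of matches, and a +1 on the domain weight for original pages) might look like the following Python sketch. Everything here is a stand-in for the author's speculation: the 10 trials and the threshold of 7 are the figures from the example, and `run_trials`, `is_original`, `credit_domain`, the fragment and context lengths, and the fixed random seed are hypothetical choices made so the example is reproducible.

```python
import random

TRIALS = 10    # number of comparisons, as in the example above
THRESHOLD = 7  # matches needed to call a page non-original (the lowest figure given)


def run_trials(page, database, trials=TRIALS, frag_len=12, context=10, seed=0):
    """Return one mark per trial: 0 if the sampled window exists in the
    database, 1 if it does not (the convention used in the text)."""
    rng = random.Random(seed)
    marks = []
    for _ in range(trials):
        start = rng.randint(0, max(0, len(page) - frag_len))
        window = page[max(0, start - context):start + frag_len + context]
        found = any(window in doc for doc in database)
        marks.append(0 if found else 1)
    return marks


def is_original(marks, threshold=THRESHOLD):
    """A page is treated as original if fewer than `threshold` trials matched."""
    return marks.count(0) < threshold


def credit_domain(weights, domain, original):
    """Add 1 to the domain's weight when a page is judged original."""
    if original:
        weights[domain] = weights.get(domain, 0) + 1
    return weights


page = ("Nanhao technology research and development of the cursor reading "
        "machine fast reading card, excellent quality, good service.")
scraped_marks = run_trials(page, ["Reposted: " + page])
fresh_marks = run_trials(page, ["A completely different article about gardening."])
weights = credit_domain({}, "example.com", is_original(fresh_marks))
```

With the page present verbatim in the database every trial records 0 and the page fails the originality test; with an unrelated database every trial records 1 and the hypothetical domain `example.com` gains one point of weight.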
