How search engines determine whether a page's article content is original

Source: Internet
Author: User
Keywords: search engine, article content


I have recently been running a non-mainstream site whose content is all scraped from other sites. The scraping went fine at first, but later the site was penalized ("K'd") by Baidu: out of tens of thousands of pages, Baidu now indexes only a few dozen. Of course I know that scraping is not the way to go, but with limited manpower, adding content by hand is neither possible nor realistic. So I searched for how search engines decide whether content is original, and unfortunately found very little on the subject. I then tried thinking about it from a search engineer's point of view and broke out in a cold sweat, because deciding whether content is original turns out to be all too easy. Below I go through the analysis in the order I thought of it, for reference.

Let me use the following article as an example. Title: Nanhao Beijing Technology Co., Ltd. is a professional manufacturer of cursor readers. Content: Nanhao technology research and development of the cursor reading machine fast reading card, excellent quality, good service. Our company address is in XXXX, Beijing. A spider reaches our website through a text hyperlink and then reaches this article page through the site's own links. The search engine's analysis then begins.

1. Title analysis. Many pages now show obvious traces of optimization, with a string of long-tail keywords packed into the title. The long-tail keywords toward the end of the title should only tell the engine what the page is about; if the engine analyzed all of them it would see too many repetitions, which is clearly not the right approach. The engine should therefore have a truncation step, for example analyzing only the first 40 characters. Suppose that after truncation the engine is left with: Nanhao Beijing Technology Co., Ltd. is a professional cursor reader.
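The truncation step described above can be sketched in a few lines of Python. This is only an illustration: the 40-character limit comes from the example in the text, and the function name truncate_title and the sample title with its long-tail keywords are assumptions, not anything a real engine is known to use.

```python
TITLE_LIMIT = 40  # cutoff taken from the example in the text; the real value is unknown


def truncate_title(title: str, limit: int = TITLE_LIMIT) -> str:
    """Keep only the first `limit` characters of a title for analysis."""
    return title[:limit].rstrip()


# Hypothetical title stuffed with long-tail keywords at the end.
stuffed_title = (
    "Nanhao Beijing Technology Co., Ltd. is a professional manufacturer "
    "of cursor readers, cursor reader price, cursor reader supplier"
)
print(truncate_title(stuffed_title))
```

Whatever the cutoff is, the point is the same: keywords repeated past the cutoff never reach the analysis.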

The first step is to judge whether the title is unique. How? Rest assured, there are ways. We all know that the engine classifies pages by term entries. Where do those entries come from? Simple: from related search terms, as in the following figure:

The engine takes the truncated title and matches it against its database by related search term. For example, it takes the term "cursor reader" from the title and compares the title with those stored under that related search term; if the database already holds this title, the title is considered not unique, and the engine moves on to the article content. Since "cursor reader" sits at the right-hand end of the title, the engine next takes "Nanhao Beijing", and so on, until it considers that every keyword in the title has been analyzed.

The title analysis ends in one of two results. First: the title database has no such title, and the content is examined next. Second: the title database already contains this title, and the content is examined next.
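A minimal sketch of this title check, assuming the "title database" is simply a dictionary that maps each related search term to the set of titles already stored under it. The names title_index, normalize and title_is_unique are made up for illustration, and a real engine would certainly use something more elaborate than exact string comparison.

```python
# Hypothetical title database: related search term -> titles already stored under it.
title_index = {
    "cursor reader": {
        "nanhao beijing technology co., ltd. is a professional cursor reader",
    },
    "nanhao beijing": set(),
}


def normalize(text):
    """Lower-case the text and collapse whitespace so comparisons are stable."""
    return " ".join(text.lower().split())


def title_is_unique(title, index):
    """Return False if any term that occurs in the title already stores this title."""
    needle = normalize(title)
    for term, stored_titles in index.items():
        if term in needle and needle in stored_titles:
            return False
    return True


print(title_is_unique(
    "Nanhao Beijing Technology Co., Ltd. is a professional cursor reader",
    title_index,
))
```

Either way the content is examined afterwards; the title result is just one more signal that is combined with the content result later.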

2. Content analysis. The basic idea should be similar to the title analysis, but there are differences: the content carries more varied and more complex information than the title, so the algorithm has to be more complex as well.

As stated earlier, our content is: Nanhao technology research and development of the cursor reading machine fast reading card, excellent quality, good service. Our company address is in XXXX, Beijing. Because article content is generally long, it is impractical to analyze it keyword by keyword, so the engine has to match it sentence by sentence or paragraph by paragraph. The articles it matches against should be those stored in the article database under the related search terms found in the title.

First, the method. The engine randomly picks a fragment of random length and then looks at the text immediately before and after it. If the engine's content database holds the same fragment, and the text before and after it is also the same as on the current page, the article is suspected of being plagiarized rather than original. This process is usually repeated several times. If, say, in 9 out of 10 rounds the fragment and its surrounding text are found in the database, and the title matches as well, then your article will be judged not original.

Now let's simulate.

The engine first picks the fragment "Cursor reader read card fast" and looks it up in the article database via the related search term. In the database, the field before it is "Technology research and development" and the field after it is "quality excellent". The engine takes these two fields and matches them against the current page. If the content is the same, it records 0; if not, it records 1. One round of matching is then complete.

The engine then picks "company address", performs the same operation, and again gets a result of 0 or 1, and so on, until the number of rounds set by the engine is completed. If, out of 10 rounds of matching, identical content is found 7, 8, or all 10 times, the engine will consider the article not original.
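The simulation above can be written out as a small Python sketch. Everything here is an assumption made for illustration: fragments are obtained by splitting on punctuation, the "article database" is just a list of stored texts, and the names split_fragments, sample_score and looks_copied, the 10 rounds and the threshold of 7 are all hypothetical choices based on the numbers in the text.

```python
import random
import re


def split_fragments(text):
    """Split text into short lower-cased fragments at commas and sentence punctuation."""
    return [f.strip().lower() for f in re.split(r"[,.;!?]", text) if f.strip()]


def sample_score(page_frags, stored_frags, i):
    """Return 0 if fragment i and both neighbours occur together in the stored text, else 1."""
    window = page_frags[i - 1:i + 2]
    for j in range(1, len(stored_frags) - 1):
        if stored_frags[j - 1:j + 2] == window:
            return 0
    return 1


def looks_copied(page, stored_docs, rounds=10, threshold=7, seed=None):
    """Sample `rounds` fragments; report True if at least `threshold` of them match."""
    rng = random.Random(seed)
    page_frags = split_fragments(page)
    if len(page_frags) < 3:
        return False  # too short to pick a fragment that has both neighbours
    stored = [split_fragments(doc) for doc in stored_docs]
    matches = 0
    for _ in range(rounds):
        i = rng.randrange(1, len(page_frags) - 1)
        if any(sample_score(page_frags, s, i) == 0 for s in stored):
            matches += 1
    return matches >= threshold


page = (
    "Nanhao technology research and development, cursor reader read card fast, "
    "quality excellent, good service. Our company address is in XXXX, Beijing."
)
print(looks_copied(page, [page], seed=0))
```

Sampling only a handful of fragments keeps the cost per page low, which is the point of the argument: the engine does not need to compare whole articles to reach a verdict.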

Going a step further: if the article is judged original, the engine adds 1 to that domain's entry in its domain weight database. Clearly, the more original articles a domain has, the higher its weight and the better its rankings, as with sites such as A5 and chinaz.
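The "+1" bookkeeping is simple enough to show directly. This sketch assumes the domain weight database is just a counter keyed by host name; the function name record_original and the example URLs are hypothetical, and real ranking signals are certainly not this simple.

```python
from collections import Counter
from urllib.parse import urlparse


def record_original(weights, url):
    """Add 1 to the weight of the URL's domain and return the new weight."""
    domain = urlparse(url).netloc.lower()
    weights[domain] += 1
    return weights[domain]


# Hypothetical usage: two original articles on the same domain.
weights = Counter()
record_original(weights, "http://www.example.com/article-1.html")
record_original(weights, "http://www.example.com/article-2.html")
print(weights.most_common())
```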

My view is that with this kind of keyword matching on title and content, given enough rounds of matching and a boldly widened range of related database entries, the engine can tell whether an article is original. Processors keep getting faster and cheaper, and search engine engineers are highly educated, keep improving their algorithms, and keep accumulating experience. For a search engine, judging whether content is original is as simple as chopping cabbage.

I had never thought this through before, and doing so gave me a real fright. The conclusion is that scraper sites are doomed! Write original content, or at the very least change the title. Later, if there is time, I will share how to write pseudo-original articles that the engine's analysis cannot detect.

The above is only my own simple analysis; the actual algorithms are far more complicated, so treat it as reference only! One more ad: http://www.nanhaokeji.com, a site I operate, is sincerely looking for link exchanges, preferably with company sites. Its PR was just updated to 1. QQ: 419844484; when adding me as a friend, please mention link exchange.
