The technical rules behind search engines' original-content recognition algorithms

Source: Internet
Author: User
Tags: final

Some time ago I attended a small SEO meetup in Wuhan. The conversation got lively, and together with a few Baidu engineers we dug into the details of Baidu's original-content recognition algorithm. Some of the technical details struck me as quite interesting, so I am writing them up here to share and discuss; consider this a brick tossed out in hopes of attracting jade.

Why do search engines pay so much attention to originality?

Early search engine algorithms made no judgment about originality at all. But as scraping and wholesale reprinting later spread on a massive scale, users could hardly find the content they actually wanted: search results were flooded with repetitive copies of the same material, to a dizzying degree.

First, with the rise of scraping technology, a huge amount of reprinted content flooded the web. Reprinting inevitably damages the original in some way: images get stripped, important paragraphs get deleted, or the text gets padded with annotations that are not the original author's. Whatever the cause, it lowers content quality, and when the first ten-plus results for a keyword are all the same article, searching becomes useless. This pushed search engines to surface original content.
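How do engines spot that "ten-plus identical results" situation in the first place? Baidu's internal method is not public, but a classic textbook approach to near-duplicate detection is word-level shingling compared with Jaccard similarity. The sketch below is purely illustrative of that general technique, not Baidu's actual algorithm:

```python
def shingles(text, k=3):
    """Break text into the set of overlapping k-word shingles (n-grams)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# A reprint that changed only the last two words still shares most shingles.
original = "search engines struggle when reprinted copies flood the results page"
reprint = "search engines struggle when reprinted copies flood the index instead"
score = jaccard(shingles(original), shingles(reprint))  # 6 shared shingles of 10 total
```

Two unrelated pages share almost no shingles and score near zero, so a high Jaccard score is a cheap, robust signal that two pages are copies of each other.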

Second, as scraping tools grew more capable, they began automatically swapping in synonyms and making small edits, which degraded reprinted articles even further and littered the web with barely readable text. This, too, pushed search engines to screen for high-quality original content.

In fact, one of the things that frustrates Short Sesame the most is writing an article, seeing it reprinted with the byline swapped out, and watching it pass for someone else's authoritative work. Over time this crushes an author's enthusiasm for writing. That is the other reason search engines work hard to find the original source of an article and give it priority in the rankings: respecting the author's copyright. Otherwise, once an article is reprinted, a large share of its traffic drains away to other sites, which directly hurts the author's earnings.

How does a search engine determine whether an article is original?

1.1 Conscientious reprints: keeping the source, the author's name, and a reprint notice

Personally, I consider this purely a matter of conscience, because deleting that information during a reprint is trivially easy. That said, many scrapers do leave the original author's copyright notice in place, and this gives search engines a reliable identification signal. Most portal sites are polite about it: after reprinting an article they leave a [reprint] tag after the title, and the more honest ones also link to the source at the top or bottom of the article.

A reprint source is not necessarily the original, but it certainly helps a search engine trace the article back to its ultimate origin. The practices known so far: a [reprint] tag in the title, a link to the original author at the bottom of the article, and the original author or site shown in the article's metadata. These are the most common identification signals.

1.2 Technical identification

Of course, such polite reprints are only one part of the scraping army. A considerable share of reprints cut off the head and tail, change the title, never mention the original author at all, and strip or replace that information automatically during scraping. Perhaps this is just a habit of China's shanzhai copycat culture: QQ versus ICQ, Baidu versus Google, Alipay versus PayPal... We will not pass judgment on that behavior here.

For this kind of article, search engines rely more on technical signals. The highest-priority one is which copy the search engine spider crawled first. Meanwhile, the timestamp printed inside the article itself can mislead Baidu's spider: if you reprint a post dated May 18 and change its date to May 16, there is some probability that Baidu's spider will be fooled.

Baidu also has a rather fuzzy algorithm for dealing with changed titles. The approach is to compare the title against the content. Exactly how that judgment is made is unclear, but past experience suggests that if the title is completely unrelated to the content, Baidu tends to ignore the article, which means the engine has some recognition ability here. Likewise, given two slightly different versions of an article, Baidu can judge which one is higher quality based on how fluent the text reads.
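In its very simplest form, a title-versus-content comparison could be a term-overlap check. The function below is only a guess at the rough shape of such a heuristic; a real engine would use far richer semantic and language-model signals:

```python
def title_content_overlap(title, content):
    """Fraction of distinct title terms that also appear in the body text.
    A score near 0 suggests the title has nothing to do with the content."""
    title_terms = set(title.lower().split())
    body_terms = set(content.lower().split())
    if not title_terms:
        return 0.0
    return len(title_terms & body_terms) / len(title_terms)

related = title_content_overlap("seo tips", "practical seo tips for beginners")
unrelated = title_content_overlap("quantum physics", "a recipe for tomato soup")
```

With this toy metric, a retitled reprint whose new clickbait title shares no vocabulary with the body would score near zero and could be down-weighted.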

Scraped content is hard to identify; search engine algorithms still need improvement

At this point, many readers may think the search engines' recognition algorithms are already very powerful. In fact, scraped articles remain difficult to identify.

2.1 Scraping tools have become strong pseudo-original learners

Today's scraping tools have powerful machine self-learning for synonyms and phrasing. After an article passes through such a tool, even a human reader only senses that the wording is slightly stiff; it is hard to tell the text came from software. This is now a headache for search engines, because the quality of such an article is, after all, far below the original.
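One reason synonym-swapped copies are still catchable at all is that similarity fingerprints such as SimHash degrade gracefully under small edits: swapping a few words flips only a few bits. The sketch below is a minimal word-level SimHash, offered only as an illustration of the general technique, not as what Baidu actually runs:

```python
import hashlib

def simhash(text, bits=64):
    """64-bit SimHash fingerprint: near-identical texts get nearby hashes."""
    v = [0] * bits
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        for i in range(bits):
            v[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i in range(bits):
        if v[i] > 0:
            fingerprint |= 1 << i
    return fingerprint

def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

A scraped copy with one synonym substituted keeps most of its fingerprint bits, so its Hamming distance to the original stays far below that of an unrelated article; engines can then index fingerprints and look up near neighbors cheaply.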

2.2 Page structure is too complex; the HTML is hard to parse

Although many sites follow SEO-friendly conventions, the separation in the HTML between the content area and the sidebar columns, hot-topic recommendations, advertisements, and so on is often not obvious. That makes it harder for a search engine to isolate the actual content after fetching the page. You can see this in optimized mobile reading views: Baidu still frequently fails to tell which part is the article title, which is the author, the body, the publication date, and so on. That in turn skews the final comparison used to judge originality.
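Separating body text from navigation, ads, and sidebars is exactly the problem described above. Here is a deliberately crude sketch that assumes the boilerplate lives inside semantic tags like nav and aside; real pages are far messier, which is why production systems use text-density and layout heuristics instead:

```python
from html.parser import HTMLParser

# Tags whose contents we treat as boilerplate rather than article body.
BOILERPLATE_TAGS = {"script", "style", "nav", "aside", "footer", "header"}

class MainTextExtractor(HTMLParser):
    """Crude main-content extractor: keeps text outside boilerplate tags."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # how many boilerplate tags we are nested inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in BOILERPLATE_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def extract_text(html):
    parser = MainTextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)
```

On a page where ads and menus sit in untagged div soup, this naive filter keeps them, which mirrors the article's point: when the HTML gives no structural hint, the engine cannot reliably tell body from furniture.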

2.3 Article-generation tools produce "original" articles directly

Many original-article generators are popular on the web right now. Some, for example, run English articles through Google Translate; others stitch matched sentences into something that looks like a perfect article to a search engine spider but, apart from heavy keyword stuffing, is meaningless to users.
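Keyword stuffing, at least in its crudest form, is easy to flag: when a single token dominates the text, something is off. The toy scoring function below illustrates the idea only; production spam detection weighs many more signals, such as phrase-level repetition and language-model perplexity:

```python
from collections import Counter

def stuffing_score(text):
    """Share of all tokens taken by the single most frequent token.
    A high score is one crude signal of keyword stuffing."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    _top_word, top_count = Counter(tokens).most_common(1)[0]
    return top_count / len(tokens)

spammy = stuffing_score("buy shoes buy cheap shoes buy buy shoes buy")
natural = stuffing_score("a varied natural sentence reads quite differently")
```

A generated page that repeats its target keyword in every stitched sentence scores far higher than natural prose, so even this one-line ratio separates the extremes; the hard cases are the subtler ones in between.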

Putting all of this together, Short Sesame feels that search engines still need to invest more in identifying original content, because today's scraping and pseudo-original techniques remain well ahead of them. ~ Short Sesame blog http://www.cl889.com


