Baidu Webmaster Platform: A Few Things About the Original Content Project


Webmaster Network (www.admin5.com), May 17: Yesterday A5 reported that Baidu's search results pages had begun labeling original content from large news portal sites, marking the launch of Baidu's Spark Program. Baidu Webmaster Platform SEO expert Lee then published an article on the original content project. The article covers why originality matters, how cunning scraping has become, why identifying original content is so difficult, and the path Baidu is taking to identify it. The full text follows.

First, why search engines should pay attention to original content

1.1 Scraping is rampant

According to a Baidu survey, more than 80% of news and information content is manually reposted or machine-scraped. From traditional newspaper media to entertainment gossip sites, from game guides to product reviews, even notices issued by university libraries are being machine-scraped by some sites. High-quality original content is like a grain of millet in a sea of scraped copies, and sifting it out is both difficult and challenging for a search engine.

1.2 Improving the search user experience

Digitalization has lowered the cost of distribution, tools have lowered the cost of scraping, and machine scraping obscures content sources and degrades content quality. Whether unintentionally or deliberately, the scraping process produces pages with incomplete content, broken formatting, or added junk, all of which seriously harm search result quality and the user experience. The fundamental reason search engines value original content is to improve the user experience, and "original" here means high-quality original content.

1.3 Encouraging original authors and articles

Reposting and scraping divert traffic away from high-quality original sites and drop the original author's name, which directly affects the income of the webmasters and authors who produce high-quality original content. In the long run this dampens creators' motivation and hinders both innovation and the production of new high-quality content. Encouraging high-quality originality and innovation, and giving original sites and authors a reasonable share of traffic so that Internet content can flourish, should be an important task for search engines.

Second, scraping is very cunning, and identifying original content is difficult

2.1 Scraped content posing as original, with key information tampered

At present, a large number of sites scrape original content in bulk and, manually or by machine, tamper with key information such as the author, publication time, and source in order to pass the content off as original. Search engines need to identify this kind of fake original content and adjust for it accordingly.

2.2 Content generators that create fake original content

Using an automatic article generator or similar tool to "create" an article and then giving it an eye-catching title is now very cheap, and the result is certainly unique. However, original content must have value that society recognizes; a randomly produced piece of junk does not count as valuable, high-quality original content. Content that is unique but lacks such recognized value is pseudo-original, and search engines need to identify it and crack down on it.

2.3 Pages vary widely, making structured information extraction difficult

Sites differ considerably in structure, and the meaning and distribution of HTML tags differ as well, so the difficulty of extracting key information such as the title, author, and time varies widely. Achieving completeness, accuracy, and timeliness all at once is not easy at the current scale of the Chinese Internet. This part works more smoothly when search engines and webmasters cooperate: if webmasters use a clearer structure to tell the search engine how the page is laid out, the search engine can extract original-related information more efficiently.
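
As an illustration of what "a clearer structure" could look like, the sketch below reads the title, author, and publication time from a page that exposes them in its head section. The field names used here (a meta tag named "author" and an Open Graph style "article:published_time" property) are assumptions chosen for the example; the article does not say which markup Baidu actually reads.

```python
from html.parser import HTMLParser


class ArticleMetaParser(HTMLParser):
    """Collect title, author, and publication time from <title> and <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {"title": None, "author": None, "published": None}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Assumed conventions: name="author" and
            # property="article:published_time" (illustrative only).
            key = attrs.get("name") or attrs.get("property")
            content = attrs.get("content")
            if key == "author":
                self.meta["author"] = content
            elif key == "article:published_time":
                self.meta["published"] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.meta["title"] = data.strip()


def extract_meta(html: str) -> dict:
    """Return the structured fields found in the page; missing fields stay None."""
    parser = ArticleMetaParser()
    parser.feed(html)
    return parser.meta
```

A page without such markup yields None for the author and time, which is exactly the case where an extractor has to fall back on guessing from the page body.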

Third, what path is Baidu taking to identify original content?

3.1 Setting up an original content project team for a protracted fight

Faced with these challenges, and in order to improve the search user experience, let the creators behind high-quality original sites receive the benefits they deserve, and promote the development of the Chinese Internet, we have drawn a large number of people from technology, product, operations, legal, and other teams to form the original content project team. This is not a temporary organization; we are prepared for a protracted battle.

3.2 The "Origin" original-recognition algorithm

The Internet contains tens or even hundreds of billions of pages, and digging original content out of them is like looking for a needle in a haystack. Our original identification system runs on Baidu's big data cloud computing platform and can quickly perform duplicate aggregation and link-relationship analysis across all Chinese-language web pages. First, scraped and original pages are aggregated by content similarity, and each group of similar pages becomes a candidate set for original identification. Second, within each candidate set, the original page is identified using hundreds of factors such as the author, publication time, link direction, user comments, the originality history of the author and site, and the reposting trail. Finally, a value analysis system judges the value of the original content and guides the final ranking accordingly.
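
The first two stages described above can be sketched in miniature. This is only a toy illustration under stated assumptions, not Baidu's implementation: word shingles with Jaccard similarity stand in for the similarity aggregation, and just two hypothetical factors (publication time and inbound link count) stand in for the hundreds of factors the article mentions. The `Page` fields, the threshold, and the scoring rule are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str
    published: int      # hypothetical publication timestamp
    inbound_links: int  # hypothetical count of links pointing at this page


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles of a text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_sets(pages, threshold=0.5):
    """Stage 1: greedily group pages that are similar to a group's first page."""
    groups = []  # list of (shingles of first page, member pages)
    for page in pages:
        sh = shingles(page.text)
        for rep, members in groups:
            if jaccard(rep, sh) >= threshold:
                members.append(page)
                break
        else:
            groups.append((sh, [page]))
    return [members for _, members in groups]


def pick_original(candidates):
    """Stage 2 (toy scoring): earliest publication wins, more links break ties."""
    return min(candidates, key=lambda p: (p.published, -p.inbound_links))
```

Calling `candidate_sets` on a list of pages and then `pick_original` on each group gives one presumed original per group. Because publication time can be tampered with (see 2.1), a realistic system could not rely on it alone, which is why the article stresses combining many factors.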

At present, judging from our experiments and real online data, the "Origin" algorithm has made some progress and solves most of the problems in fields such as news and information. Of course, other fields have more originality problems waiting for "Origin" to solve, and we will keep moving forward resolutely.

3.3 The Original Spark Program

We have long been committed to original content recognition and to adjusting the ranking algorithm. However, in the current Internet environment, quickly identifying original content and solving the originality problem faces real challenges: the huge scale of data to be computed, endless varieties of scraping methods, enormous variation in how sites are built and templated, complex content extraction, and so on. These factors affect the algorithm's recognition of original content and can even lead to misjudgments. This is where Baidu and webmasters need to work together to maintain the Internet ecosystem: webmasters recommend original content, and the search engine, after some verification, gives that content preferential treatment, so that together we improve the ecosystem and encourage originality. This is the "Original Spark Program", intended to quickly address the serious problems we currently face. In addition, webmasters' recommendations of original content will be fed into the "Origin" algorithm, helping Baidu find the algorithm's shortcomings and keep improving it, so that a more intelligent recognition algorithm can identify original content automatically.

At present, the Original Spark Program has achieved preliminary results: the original content of some original news sites now receives an original mark, author display, and so on in Baidu search results, and has also gained a reasonable improvement in ranking and traffic.

Finally, originality is an ecosystem problem that needs long-term improvement; we will continue to invest and work with webmasters to advance the Internet ecosystem. Originality is also an environmental problem that everyone must help maintain: webmasters should create more original content and recommend more original content, and Baidu will keep working to improve its ranking algorithm, encourage original content, and provide reasonable ranking and traffic for original authors and original sites.



