In order to clean up the information content of the Internet, the Baidu search engine has launched its "Original Spark Program" on a large scale. To make sure the plan is carried out forcefully, Baidu set up a dedicated topic page and invited high-quality sites to join the program. We now live in an era flooded with "repetitive content" and "junk content", so how does the search engine behind the Spark Program identify duplicate content?
A search engine wants to serve users high-quality content. When users search, a filtering mechanism screens out duplicate content instead of displaying a long list of identical results. If a site carries a large amount of duplicate content, this filtering process may hurt the site.
Before a spider crawls a site, it has an expected budget of pages to fetch. If the site holds many duplicate pages, the spider still has to fetch them one by one; even though the duplicates are filtered out of the returned results, the fetches waste the crawl budget allotted to the site and leave fewer crawls for other high-quality pages. Duplicate pages also disperse the site's overall weight, so the spider comes away with fewer meaningful pages.
A webmaster cannot predict which version of a duplicate page the spider will keep, and search engines never spell this out. Different searches may return different versions, and the spider may weigh each copy differently. Is the version users are shown the one you would have chosen? Does it deliver the best traffic conversion? There is no way to know. To keep duplicate pages from dispersing weight, block them in the robots.txt file, or add a canonical tag on the duplicates to pass their weight to the preferred version, as in the snippets below.
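For illustration (the example.com domain and the /print/ path are hypothetical), a robots.txt rule can block the duplicate addresses outright, while a canonical tag placed on a duplicate page passes its weight to the preferred version:

```
# robots.txt -- keep spiders away from the duplicate URL pattern
User-agent: *
Disallow: /print/
```

```html
<!-- placed in the <head> of the duplicate page -->
<link rel="canonical" href="http://example.com/page1" />
```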
As the analysis above shows, duplicate content hurts crawling. At the same time, for the fair and healthy development of the Internet ecosystem and for the collective interests of original-content sites, search engines devalue scraper sites and penalize sites built on duplication, reposting, and spam. Since the search engine makes such a statement, how does it actually identify duplicate content?
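No search engine has published its exact method (the closing paragraph returns to this point), but one classic technique from the duplicate-detection literature is w-shingling compared with Jaccard similarity. The Python sketch below is purely illustrative and is not Baidu's actual algorithm:

```python
def shingles(text: str, w: int = 4) -> set:
    """Split text into overlapping w-word fragments ("shingles")."""
    words = text.lower().split()
    return {" ".join(words[i:i + w]) for i in range(len(words) - w + 1)}

def jaccard(a: set, b: set) -> float:
    """Overlap of two shingle sets: 1.0 means the texts are identical."""
    return len(a & b) / len(a | b) if (a | b) else 1.0

doc1 = "search engines filter duplicate content before showing results to users"
doc2 = "search engines filter duplicate content before returning results to users"
print(f"similarity: {jaccard(shingles(doc1), shingles(doc2)):.2f}")
# Pages scoring above some threshold would be treated as duplicates.
```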
Duplicate content can occur across different sites or within a single site. A typical on-site case: a poorly optimized CMS lets the same page be reached through several URL addresses, which is duplicate content. When a spider runs into this, it must decide which page is the important one. It first checks the robots.txt file; if the duplicate addresses are disallowed there, it stops crawling them. If not, it keeps fetching, and if it then meets a meta robots tag marked noindex, it knows those pages are not meant for the index and stops crawling the duplicates as well. This lowers the spider's crawl difficulty: even when content is duplicated, the spider ends up crawling only the version that has value.
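That decision order can be sketched in a few lines of Python. The robots.txt content, URLs, and the simplified meta-tag check are all assumptions for illustration:

```python
import re
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt blocking a duplicate URL pattern.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""

def should_crawl(url: str, html: str) -> bool:
    # Step 1: a URL disallowed in robots.txt is never fetched.
    rp = RobotFileParser()
    rp.parse(ROBOTS_TXT.splitlines())
    if not rp.can_fetch("*", url):
        return False
    # Step 2: a meta robots noindex tag marks the page as not meant
    # for the index, so the duplicate is dropped here (simplified check).
    if re.search(r'<meta[^>]+name=["\']robots["\'][^>]*noindex', html, re.I):
        return False
    return True  # Step 3: crawl the remaining, valuable version.

print(should_crawl("http://example.com/print/page1", ""))  # False: robots.txt
print(should_crawl("http://example.com/page1",
                   '<meta name="robots" content="noindex,follow">'))  # False: noindex
print(should_crawl("http://example.com/page1", "<html></html>"))  # True
```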
Baidu's Spark Program invites webmasters of original sites to take part. If your site is genuinely original (no plagiarism or imitation), offers content and form with a character of their own, provides resources of recognized social value, complies with the relevant national regulations, and contains no secondary reposting or fake "pseudo-original" rewrites, you can submit it to Baidu's Original Spark Program page. Joining guards against the case where, because your site's weight is low, a high-weight site reposts your original article and the spider treats that site as the source of the original content while marking your site as the reposting, scraping one.
Exactly how a search engine decides whether content is original, and which page among duplicates is the original one, is an algorithm no search engine has published. But Loudi Talent Network (http://www.0738rc.com) learned from Baidu's official data that for sites carrying the "original" label, traffic to the original URLs grew markedly when periods before and after the Spark Program's launch are compared, and after the program had been online for some time the growth continued and gradually stabilized. Clearly, if a site can join Baidu's original-content plan, it helps a great deal in lifting the site's traffic.