This article explains in detail how to avoid duplicate web pages, keep page content high quality, and raise a page's weight, and it describes concrete methods for each.
The author is a Google employee, so the advice is authoritative.
Block search engine access appropriately:
Rather than letting our algorithms decide which version of a document is the "best" one, you may want to tell Google which version you prefer. For example, if you don't want us to index the printer-friendly versions of your site's articles, you can disallow the directories that hold them in your robots.txt file, or use a pattern (Google's robots.txt handling supports the * and $ wildcards), so that Google does not crawl those print versions.
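To check a rule before deploying it, Python's standard-library urllib.robotparser can evaluate robots.txt text offline. This is a minimal sketch that assumes the printer-friendly copies live under a hypothetical /print/ directory. Note that urllib.robotparser only does plain prefix matching and does not implement Google's * and $ wildcards, so wildcard rules need to be verified some other way.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: assumes print versions are served under /print/.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /print/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The normal article stays crawlable; its print version is blocked.
article_ok = parser.can_fetch("Googlebot", "https://example.com/articles/duplicate-content")
print_ok = parser.can_fetch("Googlebot", "https://example.com/print/duplicate-content")
print(article_ok, print_ok)  # prints: True False
```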
Use 301 redirects:
If you have restructured your site, use 301 redirects (permanent redirects) in your old site's .htaccess file to send users, Googlebot, and other search engine spiders to the new URLs.
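On Apache, the .htaccess rule can be as short as `Redirect 301 /old-page.html /new-page/` (mod_alias). For a site served by a Python application instead, the same idea looks roughly like the minimal WSGI sketch below. The REDIRECTS table and its paths are hypothetical; a real site would build the mapping from its old and new URL structures.

```python
from typing import Callable, Iterable

# Hypothetical mapping from pre-restructuring paths to their new homes.
REDIRECTS = {
    "/old-blog/duplicate-content.html": "/blog/duplicate-content/",
    "/about-us.php": "/about/",
}


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Minimal WSGI app: moved URLs get a 301, everything else is served normally."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 tells browsers and crawlers that the move is permanent.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [f"Page at {path}".encode("utf-8")]
```

To try it locally, pass `app` to `wsgiref.simple_server.make_server("", 8000, app)` and call `serve_forever()` on the returned server.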
Keep your links consistent:
Try to keep your internal linking consistent. For example, don't link to /page/ in some places and to /page or /page/index.htm in others.
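One way to keep links consistent is to pass every internal link through a single normalization function when pages are generated. The sketch below assumes a site whose canonical form is a directory-style URL with a trailing slash and no index file name; `canonical_link` and `INDEX_FILES` are hypothetical names.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: directory-style URLs ending in "/" are the canonical form.
INDEX_FILES = {"index.htm", "index.html"}


def canonical_link(url: str) -> str:
    """Map /page, /page/ and /page/index.htm onto the single form /page/."""
    parts = urlsplit(url)
    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    if last_segment in INDEX_FILES:
        # Drop the index file name but keep the trailing slash.
        path = path[: -len(last_segment)]
    elif last_segment and "." not in last_segment:
        # Directory-style page without a trailing slash: add one.
        path += "/"
    # Files such as /logo.png fall through unchanged.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


for link in ("/page", "/page/", "/page/index.htm"):
    print(link, "->", canonical_link(link))
```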
Use top-level domains:
To help us serve the most appropriate version of a document, use country-specific top-level domains wherever possible. For example, Google can tell much more reliably that example.de contains Germany-focused content than it can for URLs such as example.com/de or de.example.com.
Syndicate carefully:
If you syndicate your content on other sites, make sure each site that republishes it includes a link back to your original article. Note that even then, Google will always show the version we consider most appropriate for a given query (among the versions that aren't blocked), which may or may not be the version you would prefer.
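If you syndicate regularly, it can be worth checking automatically that each partner's copy really links back to you. This sketch uses Python's standard html.parser to collect every `<a href>` in a syndicated page and look for the original article's URL. The URLs are hypothetical, and the exact-match comparison assumes both sides use the same form of the URL (same scheme, host, and trailing slash).

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_back(syndicated_html: str, original_url: str) -> bool:
    """True if the syndicated page contains a link to the original article."""
    collector = LinkCollector()
    collector.feed(syndicated_html)
    collector.close()
    return original_url in collector.hrefs


partner_copy = (
    "<p>Reprinted with permission.</p>"
    '<a href="https://example.com/blog/original/">Read the original article</a>'
)
print(links_back(partner_copy, "https://example.com/blog/original/"))
```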
Use the preferred domain feature in Google Webmaster Tools:
If other sites link to your URLs using both the www and the non-www versions (for example, http://example.com and http://www.example.com), you can use Google Webmaster Tools to tell us which version you want indexed.
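The Webmaster Tools setting is made in Google's web interface, so there is no code to show for it. A complementary server-side measure, which the tip above does not itself require, is to 301-redirect the non-preferred host to the preferred one so that every visitor and crawler ends up on a single version. Below is a minimal WSGI middleware sketch, assuming www.example.com is the (hypothetical) preferred host; it drops any port number from the redirect target.

```python
from urllib.parse import quote

# Assumption: the www host is the preferred version (hypothetical choice).
PREFERRED_HOST = "www.example.com"


def preferred_host_redirect(app):
    """WSGI middleware: 301-redirect requests for any other host to PREFERRED_HOST."""

    def wrapper(environ, start_response):
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host and host != PREFERRED_HOST:
            # PEP 3333 passes paths as latin-1 strings; re-encode and percent-quote them.
            raw_path = environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "") or "/"
            location = f"{environ.get('wsgi.url_scheme', 'http')}://{PREFERRED_HOST}"
            location += quote(raw_path.encode("latin-1"))
            if environ.get("QUERY_STRING"):
                location += "?" + environ["QUERY_STRING"]
            start_response("301 Moved Permanently", [("Location", location)])
            return [b""]
        return app(environ, start_response)

    return wrapper


def site(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


application = preferred_host_redirect(site)
```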
Reduce boilerplate repetition on template pages:
Take a copyright notice as an example. You can either put the full, lengthy notice at the bottom of every page, or create a dedicated copyright page and place only a very brief summary at the bottom of each page, with a link to that page. The second option keeps the same long block of text from being repeated across your whole site.
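In a template system, the second option means the shared footer renders one short line and a link instead of the full text. A minimal sketch; the /copyright/ URL and the function name are hypothetical.

```python
from html import escape

# Hypothetical URL of the single page that carries the full copyright notice.
COPYRIGHT_PAGE = "/copyright/"


def copyright_footer(owner: str, year: int) -> str:
    """Render a one-line footer that links to the full notice instead of repeating it."""
    return (
        f"<footer><p>&copy; {year} {escape(owner)}. "
        f'<a href="{COPYRIGHT_PAGE}">Full copyright notice</a></p></footer>'
    )


print(copyright_footer("Example Corp", 2009))
```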
Avoid publishing pages without content:
Users don't like seeing "empty" pages, so avoid placeholder pages wherever you can. On a real estate listing site, for example, don't publish (or at least block from indexing) pages that have no listings. That way, your users (and the Google crawler) won't run into countless pages that announce "Below are the can't-miss rentals in [city name]..." and then list nothing at all.
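In code, "don't publish, or at least block" comes down to a decision made when the page is requested. A minimal sketch with hypothetical names: it returns 404 for a city with no listings or, if you choose to keep the placeholder reachable for users, serves it with the `X-Robots-Tag: noindex` response header that Google honors (a `<meta name="robots" content="noindex">` tag in the page has the same effect).

```python
from typing import NamedTuple


class Decision(NamedTuple):
    """HTTP status line and extra headers for a listing page (hypothetical type)."""
    status: str
    headers: list[tuple[str, str]]


def listing_page_decision(listings: list, keep_placeholder: bool = False) -> Decision:
    """Decide how to serve a city's rental page based on whether it has listings."""
    if listings:
        return Decision("200 OK", [])
    if keep_placeholder:
        # Reachable for users, but search engines are asked not to index it.
        return Decision("200 OK", [("X-Robots-Tag", "noindex")])
    # Nothing to show: treat the page as not published at all.
    return Decision("404 Not Found", [])
```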
Understand your content management system:
Make sure you know how your site displays content, especially if it includes a blog, a forum, or a similar system. Such systems often show the same content in several forms; a blog post, for example, may also appear on the home page, on archive pages, and on tag or category pages.
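A quick way to see how often your CMS repeats itself is to fetch your own pages and group URLs whose text is identical. The sketch below works on an in-memory dict of URL to HTML (hypothetical sample data), strips tags with a crude regular expression, and hashes the remaining text with SHA-256. It only catches exact duplicates; pages that share a post but differ in sidebars or comments would need a near-duplicate technique such as shingling.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(html: str) -> str:
    """SHA-256 of the page text with tags removed and whitespace/case normalized."""
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping, fine for a sketch
    text = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Return groups of URLs (each sorted) whose normalized text is identical."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, html in pages.items():
        groups[fingerprint(html)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical pages: the same post reachable at its own URL and at a print view.
SAMPLE_PAGES = {
    "/blog/post-1/": "<h1>Duplicate content</h1><p>Keep one URL.</p>",
    "/blog/post-1/?print=1": "<h1>Duplicate  content</h1>\n<p>Keep one URL.</p>",
    "/about/": "<h1>About us</h1>",
}

print(find_duplicates(SAMPLE_PAGES))
```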
Don't worry, be happy:
Don't be overly concerned about sites that scrape (misappropriate and republish) your content. Annoying as it is, it is very unlikely to hurt your site's presence in Google. If it still bothers you, you are welcome to file a Digital Millennium Copyright Act (DMCA) request claiming ownership of the content, and we will deal with the offending site.