The URL plays a very important role in SEO, and it is also a fundamental problem every SEOer faces. In many previous articles we have repeatedly stated one point of view:
SEO traffic comes from the pages of yours that rank well, and the precondition for a page to rank at all is that the search engine has indexed it.
We know that the Internet keeps growing and that content production is almost endless. Search engines have limited resources, so all they can do is index new content as quickly as possible, yet the rate at which new content is produced on the web still exceeds the rate at which content is crawled and indexed. The relationship between a URL and a search engine is like the relationship between your address and a courier: an accurate, easy-to-find address greatly improves the courier's delivery.
So what problems does a search engine run into while crawling pages?
One: Duplicate URLs.
Please do not overlook this point; it may be different from what you think. Suppose we have the following two URLs:
http://www.xxx.com/seo/888
http://www.xxx.com/seo.asp?id=888
Both pages produce the same content. The former may be pseudo-static or a genuinely static page, and it looks as if the former is better than the latter. But that is not necessarily the case. First of all, both URL formats are easy to crawl and index. The reason we usually avoid the dynamic ? form is to prevent the large amount of duplicate content it can generate. But the former can also produce a large amount of duplicate content: for example, a search engine may mistake the 888 for a session ID. When the search engine cannot judge accurately, the advantage of the former format is not obvious.
Some readers may not quite follow the passage above. First, keep the URL and the content separate in your mind. To understand it, let us briefly describe how a search engine judges duplicate content. A search engine has a powerful database that stores the content it has already crawled. To determine whether a piece of content has near-duplicates, the ideal practice would seem to be to compare what it is about to crawl with what is already in the database. But from reading Google's website quality guidelines, we found that this is a misunderstanding: comparing content at crawl time is not very feasible technically, because the volume of content is too large. So the search engine's analysis of the URLs it crawls is very important. If we want the search engine to believe that the content behind our URLs is not duplicated within the site, the best approach is still static URLs, which tell the search engine that the site really does have a lot of different content. For this situation, the best URL would be written as:
http://www.xxx.com/seo/seo-url
In the final analysis, each URL must be unique and must not be open to confusion with some other case, such as a session ID.
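As a rough illustration of checking your own site for URLs that differ only in spelling, the following Python sketch normalizes a URL before comparing it with others. The NOISE_PARAMS set and the decision to drop a trailing slash are assumptions made for this example, not rules published by any search engine, and the sketch cannot tell that /seo/888 and /seo.asp?id=888 serve the same content.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names treated as session noise (an assumption).
NOISE_PARAMS = {"sessionid", "sid", "phpsessid"}

def canonicalize(url: str) -> str:
    """Return a normalized form of url so equivalent spellings compare equal."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.netloc.lower()
    # Assumption: the site serves the same page with or without a trailing slash.
    path = parts.path.rstrip("/") or "/"
    # Drop session-like parameters and sort the rest so order does not matter.
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NOISE_PARAMS
    ]
    query = urlencode(sorted(pairs))
    # The fragment is discarded because it is never sent to the server.
    return urlunsplit((scheme, host, path, query, ""))

if __name__ == "__main__":
    for raw in (
        "Http://WWW.xxx.com/seo/seo-url/",
        "http://www.xxx.com/seo.asp?sessionid=abc&id=888",
    ):
        print(raw, "->", canonicalize(raw))
```

Grouping a crawl log by canonicalize(url) is one simple way to spot several URL spellings that point at a single page.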
Two: "Infinite space" (infinite loops).
Most blogs now have a calendar control: whatever date you click, a page is returned. Even when no content exists for that date, the generated URL is still unique. This creates the so-called infinite space: time is endless, so the generated pages are endless too, which is very unfriendly to search engines.
You can avoid this situation by using the nofollow attribute to guide crawlers effectively. Related article: http://www.admin5.com/article/20120312/414377.shtml
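A minimal sketch of the nofollow idea for a calendar control, written in Python: dates that have posts get a normal link, and empty dates get rel="nofollow". The /archive/ URL layout and the calendar_link helper are hypothetical names invented for this example.

```python
from datetime import date
from html import escape

def calendar_link(day: date, days_with_posts: set[date]) -> str:
    """Render one calendar cell as an HTML link."""
    # Hypothetical URL layout: /archive/YYYY/MM/DD/
    href = f"/archive/{day:%Y/%m/%d}/"
    label = escape(str(day.day))
    if day in days_with_posts:
        return f'<a href="{href}">{label}</a>'
    # No content for this date, so ask crawlers not to follow the link.
    return f'<a href="{href}" rel="nofollow">{label}</a>'

if __name__ == "__main__":
    posts = {date(2012, 3, 12)}
    print(calendar_link(date(2012, 3, 12), posts))
    print(calendar_link(date(2012, 3, 13), posts))
```

Another option in the same spirit is to render empty dates as plain text with no link at all, so that the endless URLs are never exposed.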
Three: The hierarchy should be logical.
Let us analyze the following pages:
1, http://www.xxx.com/seo/
2, http://www.xxx.com/seo/url
3, http://www.xxx.com/seo/url/weiyi
If the search engine could crawl only one of them today, then by level priority it would crawl page 1 first. This leads to a common misunderstanding: if I put every page in the root directory, no page has a level priority over another. When the levels do not differ, the search engine has to choose among the pages in the same directory by comparing them with each other. This is also why the home page is crawled first during indexing. So the best approach is to build subdirectories according to the business logic and the dependency relationships between pieces of content, and to plan URLs with this level priority in mind.
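The level of a URL can be approximated by counting its path segments. The Python sketch below orders the three example pages by that depth. It only mirrors the priority rule described here; real crawl scheduling is not public, and the names path_depth and by_depth are made up for this example.

```python
from urllib.parse import urlsplit

def path_depth(url: str) -> int:
    """Count the non-empty path segments of a URL."""
    return len([segment for segment in urlsplit(url).path.split("/") if segment])

def by_depth(urls: list[str]) -> list[str]:
    """Order URLs from shallowest to deepest, keeping ties in input order."""
    return sorted(urls, key=path_depth)

if __name__ == "__main__":
    pages = [
        "http://www.xxx.com/seo/url/weiyi",
        "http://www.xxx.com/seo/",
        "http://www.xxx.com/seo/url",
    ]
    for page in by_depth(pages):
        print(path_depth(page), page)
```

If every page sat in the root directory, path_depth would return the same value for all of them, which is exactly the situation where the hierarchy stops carrying any information.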
Four: Handling duplicate content.
The picture above shows the filter conditions I got when searching for notebooks on a well-known online shopping platform. Let us do some data analysis. On this page, brand has 16 options, price has 5, processor has 8, screen size has 8, one hard disk condition has 6, memory has 6, another hard disk condition has 6, and video card has 6. The number of possible search combinations is therefore:
16*5*8*8*6*6*6*6=6635520
Yet the picture shows only 2,471 products, so there is obviously a great deal of duplicate content. The figures in this example are not even very large; on some sites the filters can combine into hundreds of millions or even tens of billions of pages. Interested readers can look at my earlier article on what to watch out for with site search when doing SEO on sites built with ASP and other dynamic languages.
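The arithmetic can be checked with a few lines of Python. The option counts and the product total come from the example above; the dictionary keys, including hard_disk_a and hard_disk_b for the two hard disk conditions, are labels chosen only for this sketch.

```python
from math import prod

# Number of options per filter, as listed in the example.
filter_options = {
    "brand": 16,
    "price": 5,
    "processor": 8,
    "screen_size": 8,
    "hard_disk_a": 6,
    "memory": 6,
    "hard_disk_b": 6,
    "video_card": 6,
}

product_count = 2471  # products shown on the page

# Every choice of one option per filter yields its own URL.
combinations = prod(filter_options.values())
urls_per_product = combinations / product_count

if __name__ == "__main__":
    print(f"filter combinations: {combinations}")
    print(f"filter URLs per product: {urls_per_product:.0f}")
```

Since the filter URLs outnumber the products by a factor in the thousands, most combinations can only show empty or repeated listings.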
"Respect for original, share ideas." From the Open Sesame Network Technology original article, reproduced please indicate the origin of the article-http://www.51zmkm.com/news/25.html "