The first thing to do after a website goes live is to submit it to the search engines. Once a search engine receives the submitted URL, it will dispatch a spider to crawl the site. What is unsatisfactory is that the spider often cannot index the site in full, and the actual inclusion rate stays very low. What prevents a site from being fully indexed: is it the site structure, the site's weight, or the way it is optimized? Guangzhou Part-time Bar uncovers the real reasons a site cannot be fully indexed.
First: blocked by the robots.txt file. When analyzing the site logs, you find that spiders crawl the site every day, yet the number of indexed pages is still far from complete. At this point it is essential to examine the site's robots.txt file. Webmasters know that when a spider crawls a site, it first checks whether a robots.txt file exists and whether that file blocks part of the site's content. Many webmasters write the file incorrectly, which prevents the site from being fully indexed. Novice webmasters who do not know how to write the file can use the robots tool in Baidu Webmaster Tools, which checks whether your robots.txt is written correctly and can even generate the file for you; you only need to fill in the paths to block. As in the following example:
User-agent: *
Disallow: /news/
(This rule forbids all search engines from crawling content under the /news/ path.)
If the file above is placed on a website, the spider will never crawl content under the site's /news/ path, so articles updated in the news directory will never be indexed. No matter how many articles the site publishes, the logs will show the spider visiting every day, yet that content is never included. And news content is often very important to a site, so a mistaken robots.txt file is the hidden culprit behind incomplete inclusion.
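If you are unsure whether a live robots.txt rule is what is blocking a section, you can verify it programmatically. Below is a minimal sketch using Python's standard-library urllib.robotparser; the domain and page paths are hypothetical placeholders.

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")  # hypothetical site, for illustration only
rp.read()  # fetch and parse the live robots.txt

# Ask whether a generic crawler ("*") may fetch these pages
print(rp.can_fetch("*", "https://www.example.com/news/article-1.html"))  # False if /news/ is disallowed
print(rp.can_fetch("*", "https://www.example.com/about.html"))  # True if no rule blocks it

If can_fetch returns False for pages you want indexed, the robots.txt file, not the spider, is the problem.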
Second: blocked by the robots meta tag. When diagnosing a website, you may find that the actual inclusion rate is low: some section pages are fully indexed, while other sections never get their newly updated original articles included. Checking the page code reveals that those pages use a noindex tag, which tells spiders they are not allowed to index the page; obviously a page carrying this code will not be indexed, and no matter how high the quality of the updated content, it will not be included. Meanwhile, the nofollow tag tells spiders that the links on the page pass no weight; if every link on a page carries nofollow, that plainly tells the search engine the page has no value. If your site cannot be fully indexed, check the meta tags for erroneous directives.
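To catch a stray noindex or nofollow before it costs you a whole section, you can scan the page source for the robots meta tag. A minimal sketch with Python's standard-library html.parser; the URL is a hypothetical placeholder.

from html.parser import HTMLParser
from urllib.request import urlopen

class RobotsMetaChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Record the content of any <meta name="robots" content="..."> tag
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.directives.append(attrs.get("content", ""))

html = urlopen("https://www.example.com/column/page.html").read().decode("utf-8", "replace")
checker = RobotsMetaChecker()
checker.feed(html)
for directive in checker.directives:
    if "noindex" in directive.lower() or "nofollow" in directive.lower():
        print("Blocking directive found:", directive)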
Third: the pages are never visited by spiders. Especially on large content-oriented sites with a great many content pages, if the internal link structure is not well built, many pages may never be reached at all. Most of these pages sit too many clicks away from the home page, so spiders never crawl down to them and they cannot be indexed. It may also be that every link pointing to a page carries nofollow and passes no weight. To avoid this cause of incomplete inclusion, it is best during site construction not to use nofollow tags on internal links, JS redirects that spiders cannot recognize, and the like. For deep pages, webmasters can improve the site's internal link building so that no page becomes an isolated island: set up good navigation and internal links, or build external links to those pages to raise their weight in the search engine. One way to find the deep pages is shown in the sketch below.
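The idea is to compute each page's click depth from the home page over the internal link graph. A minimal sketch, assuming you have already extracted the site's internal links into a dictionary; the graph below is made up for illustration.

from collections import deque

links = {
    "/": ["/news/", "/products/"],
    "/news/": ["/news/a.html"],
    "/products/": ["/products/x.html"],
    "/products/x.html": ["/products/x/specs.html"],
}

def click_depth(graph, start="/"):
    # Breadth-first search: clicks needed to reach each page from the home page
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in graph.get(page, []):
            if nxt not in depth:
                depth[nxt] = depth[page] + 1
                queue.append(nxt)
    return depth

for page, d in sorted(click_depth(links).items(), key=lambda kv: kv[1]):
    print(d, page)  # pages with a large depth are candidates for better internal linking

Pages that never appear in the result are orphans no spider can reach by following links.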
Fourth: treated by spiders as cheating content. When a site makes heavy use of black-hat SEO cheating methods to optimize its pages, spiders will not index those pages. If a site stuffs keywords into hidden text on its pages over the long term, once the spider detects the hidden text and links, your site is likely to be removed from the index entirely and never appear in the search results again. Webmasters may ask what counts as cheating content. For example, an early cheating method placed text in the same color as the page background and piled large numbers of keywords into it; this method is easy to detect now. Another uses the noscript tag, which tells the browser what to display when JavaScript is disabled on the page; in practice, large numbers of keywords are piled inside the noscript tag. This is risky and can easily get the content deleted.
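If you want to audit your own pages for the noscript trick described above, a crude check is to measure how much of the page's text lives inside noscript tags. A rough, regex-based sketch; the 30% threshold is an arbitrary assumption for illustration, not a search engine rule.

import re

def noscript_ratio(html):
    # Fraction of the page's words that sit inside <noscript> blocks
    noscript = "".join(re.findall(r"<noscript>(.*?)</noscript>", html, re.S | re.I))
    visible = re.sub(r"<[^>]+>", " ", html)  # drop all tags to approximate the full text
    total = len(visible.split())
    return len(re.sub(r"<[^>]+>", " ", noscript).split()) / total if total else 0.0

page = "<html><body>Short intro.<noscript>keyword keyword keyword keyword</noscript></body></html>"
if noscript_ratio(page) > 0.3:  # arbitrary threshold for illustration
    print("Suspiciously keyword-heavy noscript content")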
Fifth: low-quality spam content. Beyond the four optimization-related causes above, there is another important one that cannot be ignored: the quality of the pages themselves. Search engines' recognition ability keeps strengthening, and spiders can to some degree identify low-quality, non-original content, whether it duplicates other pages on your own site or is copied from external sites. Spiders will not add duplicate-content pages to their database, and on some low-weight sites they may even delete the existing index. In the Internet's garbage-content bubble era, to be truly fully indexed and carry high site weight, you must insist on producing high-quality content; that is the only way to survive long-term in the industry.
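Search engines judge duplication with far more sophisticated methods, but you can get a first approximation of how repetitive your content looks with word shingles and Jaccard similarity. A minimal sketch for illustration; the sample sentences are made up.

def shingles(text, k=3):
    # Set of k-word shingles from the text
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    # 1.0 means identical shingle sets, 0.0 means no overlap
    union = a | b
    return len(a & b) / len(union) if union else 0.0

original = "how to choose keywords for a small business website step by step"
copied = "how to choose keywords for a small business site step by step"
print(round(jaccard(shingles(original), shingles(copied)), 2))  # close to 1.0 flags near-duplicate content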
Guangzhou Part-time Bar (http://gz.jianzhi8.com) believes that even if a site's weight is not high, as long as it avoids the five mistakes above, it can still achieve full inclusion. The Internet garbage bubble era will never end, but as long as webmasters refuse to contribute to it, do their optimization work seriously, do not violate the search engines' rules, and keep up with algorithm updates, the site can survive stably over the long term.