Abstract: A site's inclusion rate is one of the indicators that optimization practitioners value most, because how well a site is indexed largely determines how much traffic it can receive: only indexed pages can rank, and only ranking pages bring traffic. Yet poor inclusion is exactly the problem that troubles many webmasters.
A site's inclusion rate is one of the indicators that many optimization practitioners care about most. How well a site is indexed largely determines how much traffic it gets: only indexed pages can rank, and only ranking pages bring traffic. Yet inclusion is precisely what troubles many webmasters; plenty of them work hard on their sites only to find that the spiders do not favor them and index just a handful of pages.
When webmasters agonize over why their site is not being included, they should ask who actually decides what gets included. The answer is obvious: the search engine spider. Since the spider makes that decision, we should start from how the spider works, study it in depth, and then use those working principles to formulate solutions that maximize the site's inclusion. Without further ado, the author will discuss this with you below.
Principle one: Spiders crawl by following links on the site's pages
A search engine robot is called a spider because its behavior closely resembles that of a real spider: it crawls from page to page by following the links on a site. If a page has no link pointing to it, the spider simply cannot reach it. Therefore, the first step toward maximizing inclusion is to give the spider more, and more tightly connected, entrances. The simplest way is to build more internal links. On the author's own site, for example, one or two "recommended reading" links are added after every article is edited, giving the spider an extra crawling entrance, as shown in the figure below:
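To make the point concrete, here is a minimal, hypothetical sketch of how a link-following crawler behaves; the starting URL, depth limit, and same-domain rule are illustrative assumptions, not part of the original article. A page that no crawled page links to never enters the queue, which is exactly why orphan pages go unindexed.

```python
# Minimal sketch of a link-following crawler (illustrative only).
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, max_pages=50):
    seen, queue = set(), deque([start_url])
    domain = urlparse(start_url).netloc
    while queue and len(seen) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "ignore")
        except Exception:
            continue  # unreachable page: skip it
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            # stay on the same site, like an internal-link crawl
            if urlparse(absolute).netloc == domain and absolute not in seen:
                queue.append(absolute)
    return seen


if __name__ == "__main__":
    # example.com is a placeholder start page
    print(crawl("https://example.com/"))
```

The breadth-first queue here is only one possible crawl order; the point is simply that every page the crawler reaches was reached through a link, which is why adding "recommended reading" links widens the spider's coverage.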
Principle two: Spiders crawl inner pages according to the site's structure
Once the spider has found an entrance, it takes the next step: crawling the page content. Note, however, that it cannot grab all of a site's content at once; it crawls according to the site's structure, so an unreasonable structure becomes a stumbling block on the way to the inner pages. Webmasters should therefore address the site's internal structure from two angles:
(1) Streamline Flash and JS code. Baidu has stated that sites with too many Flash elements are harder for its spider to crawl, so webmasters should avoid Flash where possible and keep any Flash they do use small. The same goes for JS: overly elaborate JS effects are unnecessary and only add to the spider's crawling burden, so removing or merging redundant JS is the wise choice.
(2) Clean up dead links thoroughly. Dead links are sometimes unavoidable, but if they are not cleaned up promptly they too become stumbling blocks for the crawl. Webmasters should not find this too troublesome; it is best to make checking a daily habit. As soon as a dead link is found, delete the page via FTP or submit the dead link to the Baidu Webmaster Platform, telling the spider that this is a dead link it should not crawl. This raises the spider's goodwill toward your site. A simple checking script is sketched after this list.
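As a rough illustration of the daily check mentioned above, the following hypothetical Python sketch sends HEAD requests to a list of URLs and reports the ones that fail; the URL list and status handling are assumptions, and submitting the results to the Baidu Webmaster Platform still has to be done through its own interface.

```python
# Rough sketch of a daily dead-link check (illustrative only).
# The URLs are placeholders; real use would read them from a sitemap or crawl.
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

URLS_TO_CHECK = [
    "https://example.com/",
    "https://example.com/old-article.html",
]


def find_dead_links(urls):
    dead = []
    for url in urls:
        try:
            # HEAD avoids downloading the body; some servers reject HEAD,
            # in which case a GET fallback would be needed.
            urlopen(Request(url, method="HEAD"), timeout=10)
        except HTTPError as err:
            if err.code == 404:
                dead.append(url)   # candidate for deletion or dead-link submission
        except URLError:
            dead.append(url)       # unreachable counts as dead here
    return dead


if __name__ == "__main__":
    for url in find_dead_links(URLS_TO_CHECK):
        print("dead link:", url)
```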
Principle three: Spiders try to index pages based on content quality
If the site's structure has no major problems, the spider can usually crawl the pages smoothly and move on to the next step: indexing the page content. This step matters most, because a page that is successfully indexed is a page that is effectively included, and the decisive factor in whether the spider indexes a page is the quality of its content. Pages with thin content or a high degree of duplication are easily rejected. To get our pages indexed, webmasters should therefore focus on content construction: update regularly, and if the content cannot be original, at least rework it deeply rather than copy it, so the spider is given content that is as fresh as possible. We can also use webmaster tools or the spider's access logs to observe how the spider is indexing our site:
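As one way to do the log-based observation just mentioned, here is a hypothetical sketch that counts spider visits in an Nginx/Apache-style access log; the log path, log format, and user-agent substrings are assumptions and would need adjusting for a real server.

```python
# Rough sketch: count search-engine spider visits in an access log
# (log path, format, and user-agent substrings are assumptions).
import re
from collections import Counter

LOG_FILE = "access.log"  # placeholder path
SPIDERS = ("Baiduspider", "Googlebot", "360Spider", "Sogou web spider")


def count_spider_hits(path):
    hits = Counter()
    # In the common combined log format the user agent is the last quoted field.
    pattern = re.compile(r'"[^"]*"$')
    with open(path, encoding="utf-8", errors="ignore") as log:
        for line in log:
            match = pattern.search(line.strip())
            if not match:
                continue
            agent = match.group(0)
            for spider in SPIDERS:
                if spider in agent:
                    hits[spider] += 1
    return hits


if __name__ == "__main__":
    for spider, count in count_spider_hits(LOG_FILE).items():
        print(f"{spider}: {count} visits")
```

A rising count of spider visits to inner pages over time is a simple, if coarse, signal that the content is being crawled and considered for indexing.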
Principle four: Inner pages are released only after a period of review
Once the spider has completed the three steps above and successfully indexed a page, our page content really has been included, but do not get excited too early, because inclusion does not mean the page has been released into the search results. The spider works this way: it does not release indexed content immediately, but releases it selectively after a period of review. During this period there is no need to be tense; as long as we keep updating the content, stay patient, and make no major mistakes, our pages will be released before long.
A spider is just a robot driven by program code, and its rules are always written by people. So when our site's inclusion is not ideal, we should spend more time studying how the spider works, summarize its patterns for ourselves, and formulate targeted solutions, so that the site achieves the greatest possible inclusion.