On my website, some addresses inevitably contain a #. Such a URL is usually reached by clicking a link that jumps to a specific position on the page, so readers can find the content they want quickly; this site uses it for the table of contents in its articles. But do search engines index URLs that contain #?

1. What is the hash value?

An article on Ruan Yifeng's blog explains the hash value well: the part after # is not part of the HTTP request but an instruction to the browser. With #, you can jump straight to a specific location in a web page. For example, a link ending in #comment-121 takes the reader directly to the element with id="comment-121".
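A quick way to see that the fragment never reaches the server is to split a URL the way a browser does before sending a request. The sketch below uses Python's standard urllib.parse; the example URL is made up for illustration.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path and query a browser would put in the HTTP request line.

    The fragment (everything after '#') stays in the browser and is not sent.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


url = "https://example.com/post/42.html#comment-121"  # hypothetical URL
print(request_target(url))     # /post/42.html
print(urlsplit(url).fragment)  # comment-121
```

The server only ever receives a request for /post/42.html; jumping to comment-121 is handled entirely by the browser.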

2. Will search engines crawl URLs that contain # (a hash value)?

Generally, no. A search engine crawling a page follows the HTTP protocol, and the # part is not included in an HTTP request. In practice, we have never seen a search result that takes you straight to a specific position inside a page. So it is unrealistic to hope that adding # anchor links, on your own site or on others, will get a search engine to send visitors directly to a particular spot. Of course, to simulate real users, search engine spiders may use techniques that mimic mouse clicks after entering a site, and at that point in-page anchor links do work; but no link shown in the search results will contain a #.
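Because the fragment is not part of the request, a crawler that deduplicates URLs would naturally treat every #anchor variant of a page as a single address. The sketch below is hypothetical (it is not how any particular search engine is documented to work) and simply uses urllib.parse.urldefrag to drop the fragment before deciding what to fetch.

```python
from urllib.parse import urldefrag


def normalize_for_crawl(url: str) -> str:
    # Hypothetical crawler step: drop the fragment, since it is never sent over HTTP.
    return urldefrag(url).url


links = [
    "https://example.com/article.html#section-1",
    "https://example.com/article.html#section-2",
    "https://example.com/article.html",
]
# Three links, but only one distinct page to request.
to_fetch = sorted({normalize_for_crawl(u) for u in links})
print(to_fetch)  # ['https://example.com/article.html']
```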

3. What does #! in a URL do?

This is an exception to point 2: Google does crawl URLs containing #!. Google specified that if you want AJAX-generated content to be read by its search engine, the URL can use "#!" (on an ordinary page such a URL usually does not jump to any position). Google then automatically converts whatever follows #! into the value of the query-string parameter _escaped_fragment_. For example, /#!/username is treated as /?_escaped_fragment_=/username, and URLs of that form are crawled, so #! URLs do get indexed by Google.
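The #! to _escaped_fragment_ mapping described above can be sketched as follows. This is a simplified, hypothetical helper: the original scheme also percent-escaped certain characters in the fragment (such as & and #), which is only approximated here with urllib.parse.quote. Google announced in 2015 that it was deprecating this AJAX crawling scheme, so treat the mapping as historical.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_escaped_fragment(url: str) -> str:
    """Map a '#!' URL to its '_escaped_fragment_' form (simplified sketch)."""
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        return url  # not a hash-bang URL; leave it unchanged
    state = parts.fragment[1:]  # everything after '#!'
    # Percent-escape characters that would break the query string ('&' -> '%26'),
    # keeping '/' and '=' readable. This approximates the original escaping rules.
    escaped = "_escaped_fragment_=" + quote(state, safe="/=")
    query = parts.query + "&" + escaped if parts.query else escaped
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(to_escaped_fragment("https://example.com/#!/username"))
# https://example.com/?_escaped_fragment_=/username
```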

4. What search engines' handling of # URLs teaches us

First, do not try to use robots.txt to block # URLs. I once made exactly this mistake: I added the rule Disallow: /*#* to the robots.txt of Swish Network, hoping to keep URLs with # from being crawled. That approach is wrong for two reasons. One, # marks a comment in robots.txt, so everything after it is ignored and the rule actually becomes Disallow: /*, which blocks every page on the site from being indexed. Fortunately I noticed this morning and fixed it immediately. Two, search engines do not crawl URLs with # in the first place, so such a rule is unnecessary.
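To see why Disallow: /*#* backfires, here is a minimal sketch of how a robots.txt reader treats that line: everything from # onward is a comment, and * is a wildcard. The helper functions are hypothetical and follow the comment and wildcard conventions described in RFC 9309, not any particular crawler's code.

```python
import re


def effective_lines(robots_txt: str) -> list[str]:
    """Strip comments: in robots.txt, '#' starts a comment that runs to end of line."""
    lines = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            lines.append(line)
    return lines


def path_matches(pattern: str, path: str) -> bool:
    """Wildcard match: '*' matches any characters, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.match(regex + ("$" if anchored else ""), path) is not None


robots = "User-agent: *\nDisallow: /*#*\n"
print(effective_lines(robots))  # ['User-agent: *', 'Disallow: /*']
print(path_matches("/*", "/2015/05/any-article.html"))  # True: every page is blocked
```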

Second, you can combine # with AJAX to hide content you don't want crawled. Some of our pages may contain things we would rather not show search engines directly, or private information we don't want crawled, and we can use # to control whether that information is displayed. For example, add a button: when the URL ends in #show-info-123, the personal information of user 123 is displayed; otherwise it is not. Search engines ignore the # part of a URL, so user 123's personal information is never crawled.
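The decision in that example, showing user 123's details only when the fragment is #show-info-123, can be isolated in a small function. In a real page this check would run in browser JavaScript (reading location.hash, typically on the hashchange event) before requesting the details via AJAX; the Python sketch below only models the check, and the #show-info-<id> format is taken from the example above.

```python
import re
from typing import Optional

# Hypothetical fragment format from the example: "#show-info-<user id>".
SHOW_INFO = re.compile(r"#show-info-(\d+)")


def user_to_reveal(location_hash: str) -> Optional[int]:
    """Return the user id whose info should be shown, or None to keep it hidden."""
    m = SHOW_INFO.fullmatch(location_hash)
    return int(m.group(1)) if m else None


print(user_to_reveal("#show-info-123"))  # 123
print(user_to_reveal(""))                # None
```

This keeps the details out of the HTML served at the plain URL; the endpoint the AJAX call hits is still reachable by anyone who requests it directly.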

