How to make your AJAX app content crawlable by search engines

Source: Internet
Author: User
Tags: Creative Commons Attribution

This document outlines the steps necessary to make your AJAX application crawlable. Once you have fully understood each of these steps, it should not take you very long to actually make your application crawlable! However, you do need to understand all of the steps involved, so we recommend reading this guide in its entirety.

Overview of Solution

Briefly, the solution works as follows: the crawler finds a pretty AJAX URL (that is, a URL containing a #! hash fragment). It then requests the content for this URL from your server in a slightly modified form. Your web server returns the content in the form of an HTML snapshot, which is then processed by the crawler. The search results will show the original URL.

Step-by-Step Guide

1. Indicate to the crawler that your site supports the AJAX crawling scheme

The first step to getting your AJAX site indexed is to indicate to the crawler that your site supports the AJAX crawling scheme. The way to do this is to use a special token in your hash fragments (that is, everything after the # sign in a URL): hash fragments must begin with an exclamation mark. For example, if your AJAX app contains a URL like this:

www.example.com/ajax.html#key=value
It should now become this:

www.example.com/ajax.html#!key=value
When your site adopts the scheme, it will be considered "AJAX crawlable." This means the crawler will see the content of your app if your site supplies HTML snapshots.
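The fragment convention above can be sketched as a small helper. This is only an illustration; the class and method names (AjaxUrls, makeCrawlable) are our own and not part of the scheme:

```java
// Sketch: upgrading AJAX fragment URLs to the crawlable "#!" form.
public class AjaxUrls {
    // Turn "#key=value" into the crawlable "#!key=value" form.
    static String makeCrawlable(String url) {
        int hash = url.indexOf('#');
        if (hash < 0 || url.startsWith("#!", hash)) {
            return url; // no fragment, or already crawlable
        }
        return url.substring(0, hash + 1) + "!" + url.substring(hash + 1);
    }

    public static void main(String[] args) {
        System.out.println(makeCrawlable("www.example.com/ajax.html#key=value"));
        // -> www.example.com/ajax.html#!key=value
    }
}
```

In a real application you would of course generate the #! URLs directly in your routing code rather than rewriting them after the fact.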

2. Set up your server to handle requests for URLs that contain _escaped_fragment_

Suppose you would like to get www.example.com/index.html#!key=value indexed. Your part of the agreement is to provide the crawler with an HTML snapshot of this URL, so that the crawler sees the content. How will your server know when to return an HTML snapshot instead of a regular page? The answer lies in the URL that is requested by the crawler: the crawler will modify each AJAX URL such as

www.example.com/ajax.html#!key=value

to temporarily become

www.example.com/ajax.html?_escaped_fragment_=key=value
You may wonder why this is necessary. There are two very important reasons:

Hash fragments are never (by specification) sent to the server as part of the HTTP request. In other words, the crawler needs some way to let your server know that it wants the content for the URL www.example.com/ajax.html#!key=value (as opposed to simply www.example.com/ajax.html).
Your server, on the other hand, needs to know that it has to return an HTML snapshot, rather than the normal page sent to the browser. Remember: an HTML snapshot is all the content that appears on the page after the JavaScript has been executed. Your server's end of the agreement is to return the HTML snapshot for www.example.com/index.html#!key=value (that is, the original URL!) to the crawler.
Note: the crawler escapes certain characters in the fragment during the transformation. To retrieve the original fragment, make sure to unescape all %XX characters in it. More specifically, %26 should become &, %20 should become a space, %23 should become #, %25 should become %, and so on.
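The crawler-side transformation and escaping described above can be sketched as follows. This is a minimal illustration under the escaping rules just listed; the helper names are ours, and a real crawler may escape additional characters:

```java
// Sketch of the crawler-side mapping: "#!" URL -> "_escaped_fragment_" URL.
public class CrawlerMapping {
    // Escape the characters the scheme calls out: %, #, & and space.
    static String escapeFragment(String fragment) {
        return fragment
                .replace("%", "%25")  // escape % first to avoid double-escaping
                .replace("#", "%23")
                .replace("&", "%26")
                .replace(" ", "%20");
    }

    // Map a pretty "#!" URL to the ugly form the crawler actually requests.
    static String toUglyUrl(String prettyUrl) {
        int bang = prettyUrl.indexOf("#!");
        if (bang < 0) {
            return prettyUrl; // not an AJAX-crawlable URL
        }
        String base = prettyUrl.substring(0, bang);
        String fragment = prettyUrl.substring(bang + 2);
        String sep = base.contains("?") ? "&" : "?";
        return base + sep + "_escaped_fragment_=" + escapeFragment(fragment);
    }

    public static void main(String[] args) {
        System.out.println(toUglyUrl("www.example.com/ajax.html#!key=value"));
        // -> www.example.com/ajax.html?_escaped_fragment_=key=value
    }
}
```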

Now that you have your original URL back and know what content the crawler is requesting, you need to produce an HTML snapshot. How do you do that? There are various ways; here are some of them:

If a lot of your content is produced with JavaScript, you may want to use a headless browser such as HtmlUnit to obtain the HTML snapshot. Alternatively, you can use a different tool such as crawljax or watij.com.
If much of your content is produced with a server-side technology such as PHP or ASP, you can use your existing code and only replace the JavaScript portions of your web page with static or server-side-created HTML.
You can create a static version of your pages offline, as is current practice in some applications. For example, many applications draw content from a database that is then rendered by the browser. Instead, you could create a separate HTML page for each AJAX URL.
It is highly recommended that you try out your HTML snapshot mechanism. It's important to make sure that the headless browser indeed renders the content of your application's state correctly. Surely you'll want to know what the crawler will see, right? To do this, you can write a small test application and inspect the output, or you can use a tool such as Fetch as Googlebot.

To summarize, make sure the following happens on your server:

A request URL of the form www.example.com/ajax.html?_escaped_fragment_=key=value is mapped back to its original form: www.example.com/ajax.html#!key=value.
The token is URL-unescaped. The easiest way to do this is to use standard URL decoding. For example, in Java you would do this:
String myDecodedFragment = URLDecoder.decode(myEncodedFragment, "UTF-8"); // java.net.URLDecoder
An HTML snapshot is returned, ideally along with a prominent link at the top of the page letting end users know that they have reached the _escaped_fragment_ URL in error. (Remember that _escaped_fragment_ URLs are meant to be used only by crawlers.) For all requests that do not have an _escaped_fragment_, the server should return content as before.
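Putting the checklist together, the server side can be sketched as below. This is a simplified illustration: the class name is ours, the query string is parsed with raw string handling, and a real server would instead read the _escaped_fragment_ parameter through its web framework's request API:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Sketch of the server-side checklist: recover the original "#!" URL and
// decide whether to serve an HTML snapshot.
public class SnapshotServer {
    // Map an ugly request URL back to the original pretty "#!" URL.
    static String toPrettyUrl(String requestUrl) {
        String marker = "_escaped_fragment_=";
        int i = requestUrl.indexOf(marker);
        if (i < 1) {
            return requestUrl; // ordinary request: serve content as before
        }
        String base = requestUrl.substring(0, i - 1); // drop the '?' or '&'
        String raw = requestUrl.substring(i + marker.length());
        try {
            return base + "#!" + URLDecoder.decode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Only _escaped_fragment_ requests get the HTML snapshot.
    static boolean wantsHtmlSnapshot(String requestUrl) {
        return requestUrl.contains("_escaped_fragment_=");
    }

    public static void main(String[] args) {
        System.out.println(toPrettyUrl("www.example.com/ajax.html?_escaped_fragment_=key=value"));
        // -> www.example.com/ajax.html#!key=value
    }
}
```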

3. Handle pages without hash fragments

Some of your pages may not have hash fragments. For example, you might want your home page to be www.example.com, rather than www.example.com#!home. For this reason, there is a special provision for pages without hash fragments.

Note: make sure you use this option only for pages that contain dynamic, AJAX-created content. For pages that have only static content, it will not give extra information to the crawler, but it will put extra load on your and Google's servers.

In order to make pages without hash fragments crawlable, you include a special meta tag in the head of the HTML of your page. The meta tag takes the following form:
<meta name="fragment" content="!">
This indicates to the crawler that it should crawl the ugly version of this URL. As per the above agreement, the crawler will temporarily map the pretty URL to the corresponding ugly URL. In other words, if you place <meta name="fragment" content="!"> into the page www.example.com, the crawler will temporarily map this URL to www.example.com?_escaped_fragment_= and will request this from your server. Your server should then return the HTML snapshot corresponding to www.example.com. Please note that one important restriction applies to this meta tag: the only valid content is "!". In other words, the meta tag will always take the exact form <meta name="fragment" content="!">, indicating an empty hash fragment, but a page with AJAX content.
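The meta-tag mapping above can be sketched as follows. This is illustrative only: a real crawler parses the HTML (so attribute order, spacing, and quoting can vary), whereas this sketch simply matches the exact tag text:

```java
// Sketch of the fragment meta-tag provision for pages without hash fragments.
public class FragmentMeta {
    static final String META = "<meta name=\"fragment\" content=\"!\">";

    // Decide which URL the crawler should fetch for a fragment-less page.
    static String crawlUrl(String pageUrl, String pageHtml) {
        if (pageHtml.contains(META)) {
            // The page opted in: request the _escaped_fragment_ form instead.
            return pageUrl + "?_escaped_fragment_=";
        }
        return pageUrl; // no opt-in: crawl the page as-is
    }

    public static void main(String[] args) {
        String html = "<html><head>" + META + "</head><body>app shell</body></html>";
        System.out.println(crawlUrl("www.example.com", html));
        // -> www.example.com?_escaped_fragment_=
    }
}
```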

4. Consider updating your Sitemap to list the new AJAX URLs

Crawlers use Sitemaps to complement their discovery crawl. Your Sitemap should include the version of your URLs that you would prefer to have displayed in search results, so in most cases it will be http://example.com/ajax.html#!key=value. Do not include links such as http://example.com/ajax.html?_escaped_fragment_=key=value in the Sitemap. Googlebot does not follow links that contain _escaped_fragment_! If you have an entry page to your site, such as your homepage, that you would like displayed in search results without the #!, then add this URL to the Sitemap as is. For instance, if you want this version displayed in search results:

http://example.com/
Then include

http://example.com/
in your Sitemap, and make sure that <meta name="fragment" content="!"> is included in the head of the HTML document. For more information, check out our additional articles on Sitemaps.
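A minimal Sitemap covering both kinds of entry discussed above might look like this (a sketch following the sitemaps.org protocol; the URLs are the placeholders from the text):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- AJAX URL listed in its pretty "#!" form, never the _escaped_fragment_ form -->
  <url>
    <loc>http://example.com/ajax.html#!key=value</loc>
  </url>
  <!-- Fragment-less entry page; its HTML must carry the fragment meta tag -->
  <url>
    <loc>http://example.com/</loc>
  </url>
</urlset>
```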

5. Optionally, but importantly, test the crawlability of your app: see what the crawler sees with "Fetch as Googlebot"

Google provides a tool that allows you to get an idea of what the crawler sees: Fetch as Googlebot. You should use this tool to see whether your implementation is correct and whether the bot can now see all the content you want a user to see. It is also important to use this tool to ensure that your site is not cloaking.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 3.0 License, and code samples are licensed under the Apache 2.0 License. For details, see our Site Policies.

Last updated June 18, 2014. References: https://developers.google.com/webmasters/ajax-crawling/docs/getting-started and https://developers.google.com/webmasters/ajax-crawling/docs/learn-more
