How to optimize your site with HTTP caching (Chinese and English)

Source: Internet
Author: User
English version: how to optimize your site with HTTP caching

I've been on a web-tweaking kick lately: how to speed up your JavaScript, gzip files with your server, and now how to set up caching. The reason is simple: site performance is a feature.

For web sites, speed may be feature #1. Users hate waiting. We get frustrated by buffering videos and pages that pop together as images slowly load. It's a jarring (aka bad) user experience. Time invested in site optimization is well worth it, so let's dive in.

What is caching?

Caching is a great example of the ubiquitous time-space tradeoff in programming. You can save time by using space to store results.

In the case of websites, the browser can save a copy of images, stylesheets, JavaScript or the entire page. The next time the user needs that resource (such as a script or logo that appears on every page), the browser doesn't have to download it again. Fewer downloads means a faster, happier site.

Here's a quick refresher on how a web browser gets a page from the server:

1. Browser: Yo! You got index.html?
2. Server: (Looking it up...)
3. Server: Totally, dude! It's right here!
4. Browser: That's rad, I'm downloading it now and showing the user.

(The actual HTTP protocol may have minor differences; see Live HTTP Headers for more details.)

Caching's ugly secret: It gets stale

Caching seems fun and easy. The browser saves a copy of a file (like a logo image) and uses this cached (saved) copy on each page that needs the logo. This avoids having to download the image ever again and is perfect, right?

Wrongo. What happens when the company logo changes? Amazon.com becomes Nile.com? Google becomes Quadrillion?

We've got a problem. The shiny new logo needs to go with the shiny new site, caches be damned.

So even though the browser has the logo, it doesn't know whether the image can be used. After all, the file may have changed on the server and there could be an updated version.

So why bother caching if we can't be sure the file is good? Luckily, there are a few ways to fix this problem.

Caching Method 1: Last-Modified

One fix is for the server to tell the browser what version of the file it is sending. A server can return a Last-Modified date along with the file (let's call it logo.png), like this:

Last-Modified: Fri, 16 Mar 2007 04:00:25 GMT
File Contents (could be an image, HTML, CSS, JavaScript...)

Now the browser knows that the file it got (logo.png) was last modified on Mar 16, 2007. The next time the browser needs logo.png, it can do a special check with the server:

1. Browser: Hey, give me logo.png, but only if it's been modified since Mar 16, 2007.
2. Server: (Checking the modification date)
3. Server: Hey, you're in luck! It was not modified since that date. You have the latest version.
4. Browser: Great! I'll show the user the cached version.

Sending the short "Not Modified" message is a lot faster than downloading the file again, especially for giant JavaScript or image files. Caching saves the day (err... the bandwidth).
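
The exchange above can be sketched in a few lines of Python. This is only an illustration of the comparison a server makes, not how any particular web server implements it; the function name `respond_if_modified` and the hard-coded timestamp are made up for the example. In the real protocol the browser sends the date in an If-Modified-Since request header, and the short "Not Modified" reply is status code 304.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond_if_modified(last_modified, if_modified_since=None):
    """Return (status, headers) for a request for one file.

    last_modified: timezone-aware UTC datetime of the file's last change.
    if_modified_since: value of the browser's If-Modified-Since header, if any.
    """
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        # HTTP dates only have one-second resolution, so ignore microseconds.
        if last_modified.replace(microsecond=0) <= since:
            return 304, headers  # Not Modified: no file contents are sent
    return 200, headers  # OK: the file contents would follow


# Hypothetical modification time for logo.png, taken from the example header.
logo_changed = datetime(2007, 3, 16, 4, 0, 25, tzinfo=timezone.utc)

first_status, first_headers = respond_if_modified(logo_changed)
second_status, _ = respond_if_modified(logo_changed, first_headers["Last-Modified"])
print(first_status, second_status)
```

The first request gets the full file along with its date; the second request echoes that date back and gets the short answer instead.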

Caching Method 2: ETag

Comparing versions with the modification time generally works, but could lead to problems. What if the server's clock was originally wrong and then got fixed? What if daylight saving time comes early and the server isn't updated? The caches could be inaccurate.

ETags to the rescue. An ETag is a unique identifier given to every file. It's like a hash or fingerprint: every file gets a unique fingerprint, and if you change the file (even by one byte), the fingerprint changes as well.

Instead of sending back the modification time, the server can send back the ETag (fingerprint):

ETag: ead145f
File Contents (could be an image, HTML, CSS, JavaScript...)

The ETag can be any string which uniquely identifies the file. The next time the browser needs logo.png, it can have a conversation like this:

1. Browser: Can I get logo.png, if nothing matches tag "ead145f"?
2. Server: (Checking fingerprint on logo.png)
3. Server: You're in luck! The version here is "ead145f". It was not modified.
4. Browser: Score! I'll show the user my cached version.

Just like Last-Modified, ETags solve the problem of comparing file versions, except that "If-None-Match" is a bit harder to work into a sentence than "If-Modified-Since". But that's my problem, not yours. ETags work great.
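
Here is a rough Python sketch of the fingerprint check. It assumes the ETag is derived from a SHA-256 hash of the file contents, which is just one possible choice (servers are free to build ETags however they like), and the names `make_etag` and `respond_with_etag` are invented for this example. Weak ETags (the `W/` prefix) are ignored to keep it short.

```python
import hashlib


def make_etag(content: bytes) -> str:
    """Build a quoted fingerprint; any change to content changes the tag."""
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def respond_with_etag(content: bytes, if_none_match=None):
    """Return (status, headers, body) for a request for one file.

    if_none_match: value of the browser's If-None-Match header, if any.
    """
    etag = make_etag(content)
    headers = {"ETag": etag}
    if if_none_match is not None:
        # The header may list several tags separated by commas, or "*".
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            return 304, headers, b""  # Not Modified: browser keeps its copy
    return 200, headers, content


status, headers, body = respond_with_etag(b"pretend these are the logo bytes")
repeat_status, _, repeat_body = respond_with_etag(
    b"pretend these are the logo bytes", headers["ETag"]
)
print(status, repeat_status, len(repeat_body))
```

If even one byte of the file changes, the tag no longer matches and the server sends the full file again.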

Caching Method 3: Expires

Caching a file and checking with the server is nice, except for one thing: we are still checking with the server. It's like analyzing your milk every time you make cereal to see whether it's safe to drink. Sure, it's better than buying a new gallon each time, but it's not exactly wonderful.

And how do we handle this milk situation? With an expiration date!

If we know when the milk (logo.png) expires, we keep using it until that date (and maybe a few days longer, if you're a college student). As soon as it expires, we contact the server for a fresh copy, with a new expiration date. The header looks like this:

Expires: Tue, 20 Mar 2007 04:00:25 GMT
File Contents (could be an image, HTML, CSS, JavaScript...)

In the meantime, we avoid even talking to the server if we're in the expiration period. There isn't a conversation here; the browser has a monologue:

1. Browser: Self, is it before the expiration date of Mar 20, 2007? (Assume it is.)
2. Browser: Verily, I will show the user the cached version.

And that's that. The web server didn't have to do anything. The user sees the file instantly.
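
The browser's monologue boils down to one date comparison. The sketch below is a simplification (real browsers weigh other headers too), and `is_fresh` is a hypothetical helper name, but it shows why no network traffic is needed.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh(expires_header: str, now: datetime) -> bool:
    """True if the cached copy can be used without contacting the server."""
    try:
        return now < parsedate_to_datetime(expires_header)
    except (TypeError, ValueError):
        # An unparseable date is treated as already expired.
        return False


expires = "Tue, 20 Mar 2007 04:00:25 GMT"
before = datetime(2007, 3, 18, 12, 0, 0, tzinfo=timezone.utc)
after = datetime(2007, 3, 21, 12, 0, 0, tzinfo=timezone.utc)
print(is_fresh(expires, before), is_fresh(expires, after))
```

Two days before the expiration date the cached copy is used as-is; a day after, the browser goes back to the server.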

Caching Method 4: max-age

Oh, we're not done yet. Expires is great, but it has to be computed for every date. The max-age directive (sent in the Cache-Control header) lets us say "this file expires 1 week from today", which is simpler than setting an explicit date.

max-age is measured in seconds. Here are a few quick conversions:

  • 1 day in seconds = 86400
  • 1 week in seconds = 604800
  • 1 month in seconds = 2629000
  • 1 year in seconds = 31536000 (effectively infinite on Internet time)
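
If you'd rather not memorize these numbers, they are easy to compute. The helper below is a small sketch (`max_age` is a made-up name) that uses Python's `timedelta` to turn a duration into the value for the header. The month figure in the list is a rounded average, so it is left out here.

```python
from datetime import timedelta


def max_age(**duration) -> int:
    """Convert a duration such as days=1 or weeks=1 into whole seconds."""
    return int(timedelta(**duration).total_seconds())


one_day = max_age(days=1)
one_week = max_age(weeks=1)
one_year = max_age(days=365)

# Build the header value for a file that should be cached for one week.
header_value = f"max-age={one_week}"
print(one_day, one_week, one_year, header_value)
```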

Bonus header: public and private

The cache headers never cease. Sometimes a server needs to control when certain resources are cached.

  • Cache-Control: public means the cached version can be saved by proxies and other intermediate servers, where everyone can see it.
  • Cache-Control: private means the file is different for different users (such as their personal homepage). The user's private browser can cache it, but not public proxies.
  • Cache-Control: no-cache means a cached copy should not be reused without first checking with the server (no-store is the directive that forbids keeping a copy at all). This is useful for things like search results where the URL appears the same but the content may change.

However, be wary that some cache directives only work on newer HTTP 1.1 browsers. If you are doing special caching of authenticated pages, read more about caching.
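
To make the directives concrete, here is a small Python sketch that splits a Cache-Control value into its parts and answers one question: may a shared cache (a proxy) keep a copy? The rule used is deliberately simplified, and both function names are invented for this example.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, argument = part.partition("=")
        directives[name.strip().lower()] = argument.strip().strip('"') or None
    return directives


def shared_cache_may_store(value: str) -> bool:
    """Simplified rule: proxies must skip private and no-store responses."""
    directives = parse_cache_control(value)
    return "private" not in directives and "no-store" not in directives


print(parse_cache_control("max-age=604800, public"))
print(shared_cache_may_store("max-age=604800, public"))
print(shared_cache_may_store("max-age=0, private, no-store"))
```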

OK, I'm sold: enable caching

First, make sure Apache has mod_headers and mod_expires enabled:

# list your current modules
apachectl -t -D DUMP_MODULES

# enable headers and expires if not in the list above
a2enmod headers
a2enmod expires

The general format for setting headers is

  • File types to match
  • Header/expiration to set

A general tip: The less a resource changes (images, PDFs, etc.), the longer you should cache it. If it never changes (every version has a different URL), then cache it for as long as you can (i.e. a year)!

One technique: Have a loader file (index.html) which is not cached, but that knows the locations of the items which are cached permanently. The user will always get the loader file, but may have already cached the resources it points to.
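
One common way to give every version its own URL is to put a fingerprint of the file's contents into its name. The sketch below is one possible approach, not the only one: `fingerprinted_name` is a hypothetical helper, and the choice of SHA-256 and an 8-character digest is arbitrary.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(filename: str, content: bytes) -> str:
    """Insert a short content hash before the extension, e.g. logo.<hash>.png."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


old_url = fingerprinted_name("img/logo.png", b"old logo bytes")
new_url = fingerprinted_name("img/logo.png", b"new logo bytes")
print(old_url)
print(new_url)
```

The uncached loader file references whichever name is current. Because a changed file gets a new name, the old copy can stay in caches for a year without ever being shown by mistake.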

The following config settings are based on the ones at askapache.

Seconds Calculator

All the times are given in seconds (A0 = access time + 0 seconds).

Using Expires headers

ExpiresActive On
ExpiresDefault A0

# 1 YEAR - doesn't change often
<FilesMatch "\.(flv|ico|pdf|avi|mov|ppt|doc|mp3|wmv|wav)$">
  ExpiresDefault A31536000
</FilesMatch>

# 1 WEEK - possible to be changed, unlikely
<FilesMatch "\.(jpg|jpeg|png|gif|swf)$">
  ExpiresDefault A604800
</FilesMatch>

# 3 HOURS - core content, changes quickly
<FilesMatch "\.(txt|xml|js|css)$">
  ExpiresDefault A10800
</FilesMatch>

Again, if you know certain content (like JavaScript) won't be changing often, have .js files expire after a week.

Using max-age headers

# 1 YEAR
<FilesMatch "\.(flv|ico|pdf|avi|mov|ppt|doc|mp3|wmv|wav)$">
  Header set Cache-Control "max-age=31536000, public"
</FilesMatch>

# 1 WEEK
<FilesMatch "\.(jpg|jpeg|png|gif|swf)$">
  Header set Cache-Control "max-age=604800, public"
</FilesMatch>

# 3 HOURS
<FilesMatch "\.(txt|xml|js|css)$">
  Header set Cache-Control "max-age=10800"
</FilesMatch>

# NEVER CACHE - notice the extra directives
<FilesMatch "\.(html|htm|php|cgi|pl)$">
  Header set Cache-Control "max-age=0, private, no-store, no-cache, must-revalidate"
</FilesMatch>
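
Before touching the server, it can help to check which rule a given file would fall under. The Python sketch below reuses the same regular expressions as the FilesMatch blocks; it is only a stand-in for experimenting and does not reproduce everything Apache does. It assumes 31536000 seconds (365 days) for the one-year tier, and `cache_control_for` is a made-up name.

```python
import re

YEAR, WEEK, THREE_HOURS = 31536000, 604800, 10800

# (pattern, Cache-Control value) pairs, checked in order.
RULES = [
    (r"\.(flv|ico|pdf|avi|mov|ppt|doc|mp3|wmv|wav)$", f"max-age={YEAR}, public"),
    (r"\.(jpg|jpeg|png|gif|swf)$", f"max-age={WEEK}, public"),
    (r"\.(txt|xml|js|css)$", f"max-age={THREE_HOURS}"),
    (r"\.(html|htm|php|cgi|pl)$",
     "max-age=0, private, no-store, no-cache, must-revalidate"),
]


def cache_control_for(filename: str):
    """Return the Cache-Control value of the first matching rule, or None."""
    for pattern, value in RULES:
        if re.search(pattern, filename):
            return value
    return None


for name in ("logo.png", "app.js", "index.html", "manual.pdf", "README"):
    print(name, "->", cache_control_for(name))
```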

Final step: Check your caching

To see whether your files are cached, do the following:

  • Online: Examine your site in the Cacheability Query (green means cacheable).
  • In browser: Use Firebug or Live HTTP Headers to see the HTTP response (304 Not Modified, Cache-Control, etc.). In particular, I'll load a page and use Live HTTP Headers to make sure no packets are being sent to load images, logos, and other cached files. If you press Ctrl+Refresh, the browser will force a reload of all files.

Read more about caching, or the HTTP header fields. Caching doesn't help with the initial download (that's what gzip is for), but it makes the overall site experience much better.

Remember: Creating unique URLs is the simplest way to caching heaven. Have fun streamlining your site!

 

Chinese version: Using HTTP caching to optimize websites

For websites, speed is the top priority. Users hate waiting, and videos that buffer or pages that load slowly make for a terrible user experience. Learning how to use caching to optimize a website is therefore well worth the effort.

What is caching?

Caching is a classic example of trading space for time: by using extra space, we gain speed. When a user browses a website, the browser can save local copies of the site's images and other files. When the user visits the site again, the browser no longer needs to download every file, and the smaller download makes pages load faster.

(Figure: how the browser interacts with the server.)

 

Disadvantages of caching

Caching is very useful, but it also has a drawback. When the website is updated, for example when the logo changes, the browser still has the old logo saved locally. How does the browser decide whether to use the local file or fetch the new one from the server? The following sections describe several ways to make that decision.

 

Caching Method 1: Last-Modified

To tell the browser which version of a file it has, the server sends a header with the last modification time, for example:

Last-Modified: Fri, 16 Mar 2007 04:00:25 GMT
File Contents (could be an image, HTML, CSS, JavaScript...)

The browser now knows when the file was last modified. On later requests, it validates the file as follows:

1. Browser: Hey, about logo.png: if it has been modified since Fri, 16 Mar 2007 04:00:25 GMT, please send it to me.
2. Server: (Checks the file's modification time)
3. Server: Hey, this file has not been modified since then. You have the latest version.
4. Browser: Great, I'll show my copy to the user.

In this case the server returns only a 304 response header, which reduces the amount of data sent and speeds up the response.

 

Caching Method 2: ETag

Comparing modification times generally works, but it can fail in special cases: for example, if the server clock was wrong and later corrected, or if the server time is not updated promptly when daylight saving time begins. Either can make version comparison by modification time unreliable.

ETags solve this problem. An ETag is a unique identifier for a file. Like a hash or fingerprint, every file has its own identifier, and whenever the file changes, the identifier changes too.

The ETag header returned by the server looks like this:

ETag: ead145f
File Contents (could be an image, HTML, CSS, JavaScript...)

The next request then proceeds as follows:

1. Browser: Hey, I need logo.png; send it only if yours does not match the tag "ead145f".
2. Server: (Checks the ETag...)
3. Server: Hey, my version here is "ead145f". You have the latest version.
4. Browser: Good, I'll use my local cache.

Like Last-Modified, ETag solves the problem of comparing file versions. When both are present, however, the ETag takes precedence over Last-Modified.

 

Caching Method 3: Expires

Caching a file and verifying its version with the server is good, but it still has a drawback: we must contact the server and make a comparison before every use. This is safe, but not ideal. We can use an expiration date to cut down on such requests.

It's like making cereal with milk and testing whether the milk is safe before every bowl. If we know the milk's expiration date, we can simply use it until it expires instead of testing it each time. Once it has expired, we buy a new carton. The server returns the expiration time of the data like this:

Expires: Tue, 20 Mar 2007 04:00:25 GMT
File Contents (could be an image, HTML, CSS, JavaScript...)

This way, the browser avoids contacting the server at all until the file expires. It only needs to check whether the copy it holds has expired, which adds no load to the server.

 

Caching Method 4: max-age

Expires works well, but we have to compute an exact date every time. The max-age directive makes expiration easier to handle: we only need to say "this file is good for one week".

max-age is measured in seconds. Here are some common values:

1 day in seconds = 86400
1 week in seconds = 604800
1 month in seconds = 2629000
1 year in seconds = 31536000

 

Additional headers

There is no end to cache headers. Sometimes we need to control what gets cached.

Cache-Control: public means the cached version can be stored by proxy servers and other intermediate servers.

Cache-Control: private means the file is different for different users. Only the user's own browser may cache it; public proxy servers may not.

Cache-Control: no-cache means a cached copy should not be reused without checking with the server. This is useful for search results or paginated results, where the same URL returns changing content.

Note: Some directives only work in browsers that support HTTP/1.1. For more information, see RFC 2616 and the caching documentation.
