[Note] Experience Sharing: websites with high traffic can use static gzip to compress some pages and files

Source: Internet
Author: User
Tags gz file website performance
This article describes how to compress pages to save website bandwidth and increase user access speeds.
Website access speed is determined by several factors, such as the application's response time, network bandwidth, server performance, and the transmission speed of the network between the client and the server. The most important of these is the response speed of the application itself. So when website performance worries you, the first thing to do is make the application execute as fast as possible, for example by using caches or by optimizing the efficiency of the code.

However, this article does not describe how to make the application execute more efficiently; the point above is made only to keep you from grasping at straws. Once the performance of your application is good enough and the server can handle the load, you can try compressing web pages to further improve browsing speed. Importantly, this costs almost nothing: the CPU usage of your server increases by only one or two percentage points.

Web page compression is a mechanism negotiated between the web server and the browser, so both sides must support it. Fortunately, the popular browsers all support it, including IE, Firefox, and Opera, and so do servers such as Apache and IIS. The negotiation process is as follows:

1. The browser requests a URL and sets the Accept-Encoding request header to "gzip, deflate", indicating that it supports both the gzip and deflate compression methods (both formats are built on the same DEFLATE algorithm and differ only in how the compressed data is wrapped).
2. After receiving the request, the web server checks whether the browser supports compression. If it does, the server sends the compressed response content; otherwise it sends the content uncompressed.
3. After receiving the response, the browser checks whether the content is compressed. If it is, the browser decompresses it and then displays the page.
In practice, we found that the compression ratio is usually between 3:1 and 10:1; in other words, for a 50 KB page, only about 5 to 15 KB is actually transmitted after compression. This saves a great deal of server bandwidth. And if the application itself responds quickly enough, so that the bottleneck of the website shifts to network transmission speed, compressing the content greatly improves page browsing speed.
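
The negotiation and the size saving described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name GzipNegotiationSketch, its methods, and the generated sample page are hypothetical, and the header check is simplified (it ignores q-values such as "gzip;q=0").

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPOutputStream;

final class GzipNegotiationSketch {

    /** Returns true if the Accept-Encoding value lists gzip (q-values are ignored here). */
    static boolean acceptsGzip(String acceptEncoding) {
        if (acceptEncoding == null) {
            return false;
        }
        for (String token : acceptEncoding.toLowerCase(Locale.ROOT).split(",")) {
            // Drop optional parameters such as ";q=0.8" before comparing
            int semicolon = token.indexOf(';');
            String coding = (semicolon >= 0 ? token.substring(0, semicolon) : token).trim();
            if (coding.equals("gzip")) {
                return true;
            }
        }
        return false;
    }

    /** Compresses the given bytes into the gzip format. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Hypothetical page: repetitive markup, as is typical for HTML
        StringBuilder page = new StringBuilder("<html><body>");
        for (int i = 0; i < 500; i++) {
            page.append("<div class=\"item\"><a href=\"/item/").append(i)
                .append("\">Item ").append(i).append("</a></div>\n");
        }
        page.append("</body></html>");
        byte[] raw = page.toString().getBytes(StandardCharsets.UTF_8);

        // The server side of the negotiation: compress only if the client allows it
        String acceptEncoding = "gzip, deflate";
        byte[] body = acceptsGzip(acceptEncoding) ? gzip(raw) : raw;
        System.out.println("uncompressed: " + raw.length + " bytes");
        System.out.println("transmitted:  " + body.length + " bytes");
    }
}
```

Running main prints both sizes, so you can see the ratio for markup of your own by replacing the generated page.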

Next we will introduce how to enable the web page compression function in several common environments.

Standalone Tomcat server

If your web application runs on Tomcat and Tomcat's HTTP service is used directly, it is recommended that you enable compression right away, because it is so simple: just set the compression attribute of the Connector element (in conf/server.xml) to on. The setting takes effect after the Tomcat server is restarted. The configuration is as follows:

<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="150" connectionTimeout="20000"
           redirectPort="8443" compression="on"/>

Tomcat uses the gzip compression defined by HTTP/1.1. It checks whether the Accept-Encoding value in the browser's request contains gzip to determine whether the browser supports gzip. If it does, Tomcat compresses the response with gzip; otherwise the response is sent uncompressed. Tomcat has another attribute, compressableMimeType, which specifies the content types to compress. For example, if you set it to text/html,text/plain, only pages whose Content-Type is text/html or text/plain are compressed. It is recommended to include the content types of CSS and JavaScript files as well, because these two file types also compress very well.

Apache server

In Apache 1.3, mod_gzip is commonly used to compress the output content, and today's mainstream browsers all support gzip decompression. In Apache 2 the module is called mod_deflate, and the corresponding file is mod_deflate.so. This article does not cover mod_gzip; it only describes how to enable and configure mod_deflate in Apache 2. A default Apache installation, whether on Windows or on Linux/Unix, does not enable this module, and on Linux/Unix the module is not even included, so you need to compile it yourself.

The following describes how to enable and configure the mod_deflate module in Windows and Linux.

On Windows, an Apache server installed with the installer already includes the mod_deflate.so and mod_headers.so modules that deflate requires. Enable and configure them in the httpd.conf configuration file as follows:

LoadModule deflate_module modules/mod_deflate.so
LoadModule headers_module modules/mod_headers.so
<Location />
# Insert filter
SetOutputFilter DEFLATE
# Netscape 4.x has some problems...
BrowserMatch ^Mozilla/4 gzip-only-text/html
# Netscape 4.06-4.08 have some more problems
BrowserMatch ^Mozilla/4\.0[678] no-gzip
# MSIE masquerades as Netscape, but it is fine
# BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
# NOTE: Due to a bug in mod_setenvif up to Apache 2.0.48
# the above regex won't work. You can use the following
# workaround to get the desired effect:
BrowserMatch \bMSI[E] !no-gzip !gzip-only-text/html
# Don't compress images
SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-gzip dont-vary
# Make sure proxies don't deliver the wrong content
Header append Vary User-Agent env=!dont-vary
</Location>

On Linux/UNIX, things are a little more troublesome if the mod_deflate and mod_headers modules were not compiled in when Apache was built and installed. First, let's look at how to build the two modules while compiling and installing Apache. Add two parameters when running the configure program:

# ./configure --enable-deflate --enable-headers

After compiling Apache, you can enable and configure the deflate module in httpd.conf. The configuration method is the same as on Windows.

If your Apache is already running and you do not want to recompile it, you can compile only the source files the deflate setup needs, mod_deflate.c and mod_headers.c. mod_deflate.c is located in the {APACHE-Src}/modules/filters/ directory and mod_headers.c in the {APACHE-Src}/modules/metadata/ directory (where {APACHE-Src} is the directory of the Apache source files). Use the following commands to compile the two source files separately.

# {APACHE-bin}/apxs -i -a -c {APACHE-Src}/modules/filters/mod_deflate.c
# {APACHE-bin}/apxs -i -a -c {APACHE-Src}/modules/metadata/mod_headers.c

{APACHE-bin} is the bin directory under the Apache installation directory. Afterwards, configure the modules directly in httpd.conf.

When the deflate module is compiled separately, you may often run into an error. The message is:

Cannot load /opt/Apache/modules/mod_deflate.so into server: /opt/Apache/modules/mod_deflate.so: undefined symbol: deflate

The solution is as follows:

Edit the /usr/local/apache2/bin/apr-config file and change the LDFLAGS value to "-lz", then recompile the mod_deflate module with apxs -ica mod_deflate.c.

To save yourself unnecessary trouble, add the --enable-deflate --enable-headers parameters directly when compiling and installing Apache.

IIS server

Microsoft's IIS is also one of the most widely used web servers, and it is essential for running ASP pages. IIS 6 supports gzip compression natively. IIS 5 is more difficult: you need a third-party component such as httpzip (http://www.port80software.com/products/httpzip?Vid=3354166), which is a commercial product. The following describes how to enable compression in IIS 6.

Open Internet Information Services (IIS) Manager, right-click "Web Sites" -> "Properties", and select the "Service" tab. In the "HTTP compression" box, select "Compress application files" and "Compress static files", and set "Temporary directory" and "Maximum temporary directory size" as needed, as shown in Figure 1:

Figure 1 Set website attributes

Next, configure the gzip component. In Internet Information Services (IIS) Manager, click "Web Service Extensions" -> "Add a new Web service extension...". In the "New Web Service Extension" dialog box, enter the extension name "HTTP Compression", add C:\Windows\system32\inetsrv\gzip.dll under "Required files", and select "Set extension status to Allowed", as shown in Figures 2 and 3:

Figure 2 set web service extension

Figure 3 new Web Service Extension

Next, we need to modify a configuration file. Before editing it, stop the IIS service, then open C:\Windows\system32\inetsrv\metabase.xml. The file is large. Find the following section:

<IIsCompressionScheme Location="/LM/W3SVC/Filters/Compression/gzip"
    HcCompressionDll="%windir%\system32\inetsrv\gzip.dll"
    HcCreateFlags="1"
    HcDoDynamicCompression="TRUE"
    HcDoOnDemandCompression="TRUE"
    HcDoStaticCompression="TRUE"
    HcDynamicCompressionLevel="0"
    HcFileExtensions="htm
        html
        txt"
    HcOnDemandCompLevel="10"
    HcPriority="1"
    HcScriptFileExtensions="asp
        dll
        exe"
>
</IIsCompressionScheme>

Add the extensions of the files you want compressed. HcFileExtensions lists the extensions of static files; add js and css here. HcScriptFileExtensions lists the extensions of dynamic files; add aspx here. Save the file and start IIS for the changes to take effect.

Finally, let's look at how to test whether the work above has taken effect. You may find it strange that after the configuration is complete, when you open a page in the browser and view its source, the content has not changed and the size is the same as before. What is going on? The browser has already decompressed the content. There are two ways to determine whether compression is in effect. The first is to check the web server's access log. For both Apache and IIS, the access log format is similar to the following:

127.0.0.1 - - [14/May/2006:08:44:28 +0800] "GET /manual/style/css/manual.css HTTP/1.1" 200 19351

The last two numbers are the HTTP status code (200 means OK) and the size of the response content (19351). Compare this size with the size of the page source shown in your browser to check whether compression is in effect. The other way is to write a small HTTP client program that sets the Accept-Encoding header to gzip, deflate, requests a URL on the server, and prints the response content. If you see a pile of garbled characters, congratulations, the configuration works. The following test client is written in Java (it uses the commons-httpclient package):

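import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.GetMethod;

public class GzipTest {
    public static void main(String[] args) throws Exception {
        HttpClient http = new HttpClient();
        String url = "http://www.dlog.cn/javayou";
        GetMethod get = new GetMethod(url);
        try {
            System.out.println("Fetching URL: " + url);
            // Tell the server that this client accepts compressed responses
            get.addRequestHeader("Accept-Encoding", "gzip, deflate");
            int er = http.executeMethod(get);
            if (er == 200) {
                // Content-Length reported by the server (the compressed size)
                System.out.println(get.getResponseContentLength());
                String html = get.getResponseBodyAsString();
                // Prints garbled characters if the body is compressed
                System.out.println(html);
                System.out.println(html.getBytes().length);
            }
        } finally {
            get.releaseConnection();
        }
    }
}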
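
Instead of judging by garbled output, the raw response bytes can also be checked directly. The following is a minimal sketch using only the JDK; the class GzipBodyCheckSketch and its method names are hypothetical and not part of commons-httpclient. It tests for the two-byte gzip signature (0x1f 0x8b) and decompresses the body so that its size can be compared with the transmitted size.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipBodyCheckSketch {

    /** Returns true if the bytes start with the gzip signature 0x1f 0x8b. */
    static boolean looksGzipped(byte[] body) {
        return body != null && body.length >= 2
            && (body[0] & 0xff) == 0x1f && (body[1] & 0xff) == 0x8b;
    }

    /** Decompresses a gzip body back into the original bytes. */
    static byte[] gunzip(byte[] compressed) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Compresses bytes with gzip; used here only to produce sample data. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Stand-in for a response body received from the server
        byte[] page = "<html><body>hello hello hello</body></html>"
            .getBytes(StandardCharsets.UTF_8);
        byte[] received = gzip(page);

        System.out.println("compressed on the wire: " + looksGzipped(received));
        System.out.println("size after decompression: " + gunzip(received).length);
    }
}
```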

Conclusion

The above covers how to configure page compression for the two most popular web servers and for Tomcat. For other J2EE application servers that do not support this feature, you can handle compression with a servlet filter; for the code and configuration, see the documentation provided with the Resin server. One reminder, however: the approach described in this article is worth considering only once the server's response speed has been sufficiently optimized, that is, when bandwidth has become the bottleneck of the system.
--------------------------------------------------------------------------------------------------------------
Here is another example and some related knowledge:

GET /lookfor.htm HTTP/1.1
Accept: */*
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; InfoPath.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)
Host: www.veryhman.com
Connection: keep-alive

HTTP/1.0 200 OK
Date: Wed, 09 Jul 2008 13:03:01 GMT
Server: Apache
Last-Modified: Wed, 09 Jul 2008 11:51:24 GMT
Content-Length: 98279
Content-Location: search.gz
Content-Encoding: gzip
Content-Type: text/html
Age: 64
X-Cache: hit from www.veryhman.com
Connection: keep-alive

Many websites use dynamic gzip compression, which reduces traffic, but the website can still be slow because of the high system overhead. The solution described here serves a pre-compressed static .gz file instead and adds HTTP header information that points to it.

That should have been enough, but some readers did not seem satisfied, so here is one more thing: a real application example.

This solution focuses on static gzip compression, and it must be used together with the browser's caching mechanism. Bundle the frequently used resources, compress them, and let the browser cache them on first load; from then on, the locally cached copy is used most of the time.
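
The pre-compression step can be done once, offline, so the server spends no CPU time compressing at request time. A minimal sketch of that step follows; the class StaticGzipSketch and the "<name>.gz" naming are assumptions made for illustration, not the tooling the author used.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPOutputStream;

final class StaticGzipSketch {

    /** Writes "<name>.gz" next to the source file and returns the new path. */
    static Path precompress(Path source) {
        Path target = source.resolveSibling(source.getFileName() + ".gz");
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
            // Stream the file through the gzip encoder
            Files.copy(source, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    public static void main(String[] args) {
        // Usage: java StaticGzipSketch.java file1.js file2.css ...
        for (String name : args) {
            Path source = Paths.get(name);
            Path gz = precompress(source);
            System.out.println(source + " (" + source.toFile().length() + " bytes) -> "
                + gz + " (" + gz.toFile().length() + " bytes)");
        }
    }
}
```

The resulting .gz file is what the server would deliver, together with the response headers discussed in this section.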

A cartoon search engine: http://www.veryhman.com/lookfor.htm

One of the scripts is relatively large, so it is gzip-compressed according to the idea above. To make full use of the browser cache, add an Expires header to this file and set it to a time in the future (note: the date specified for Expires must not be earlier than January 1, 1980 or later than January 19, 2038, 03:14:07 GMT, otherwise some systems may run into errors). Then set the other headers following the response shown above:

Content-Length: {GZ file size (number of bytes)}
Content-Location: {GZ file name}
Content-Encoding: gzip
Content-Type: application/x-javascript (because it is a script; most browsers also accept text/html)
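
As an illustration, the header set above could be assembled like this. It is a sketch under assumptions: PrecompressedHeadersSketch and headersFor are hypothetical names, and the sample values (search.gz, 98279 bytes, an expiry at the end of 2037) are taken from the responses quoted in this article.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class PrecompressedHeadersSketch {

    // HTTP date layout with a two-digit day, always expressed in GMT
    private static final DateTimeFormatter HTTP_DATE =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US);

    /** Formats a point in time as an HTTP date. */
    static String httpDate(ZonedDateTime time) {
        return HTTP_DATE.format(time.withZoneSameInstant(ZoneOffset.UTC));
    }

    /** Builds the response headers for a pre-compressed .gz file. */
    static Map<String, String> headersFor(String gzFileName, long gzSize,
                                          String contentType, ZonedDateTime expires) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Content-Length", Long.toString(gzSize));
        headers.put("Content-Location", gzFileName);
        headers.put("Content-Encoding", "gzip");
        headers.put("Content-Type", contentType);
        // A far-future date lets the browser reuse its cached copy
        headers.put("Expires", httpDate(expires));
        return headers;
    }

    public static void main(String[] args) {
        ZonedDateTime expires =
            ZonedDateTime.of(2037, 12, 31, 15, 59, 58, 0, ZoneOffset.UTC);
        headersFor("search.gz", 98279L, "application/x-javascript", expires)
            .forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```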

Unless you press F5 to force a refresh or clear the browser cache, the browser uses the cached copy every time it accesses this file until the cache entry expires. (How to push an update, that is, how to keep the cache from blocking a new version of the file, is a well-understood problem that has already been solved. In this example I use the simplest and most direct method: the version number is part of the file name.)

In addition, under the HTTP/1.1 protocol, when a browser requests a file that is not in its cache, the server returns a Last-Modified header indicating when the file was last updated, and it also returns an ETag (entity tag) header. Both are explained below.

Basic knowledge
1) What is "Last-Modified"?

When the browser requests a URL for the first time, the server returns 200 with the requested resource as the content, together with a Last-Modified header that records when the file was last modified on the server. The format is similar to this:

Last-Modified: Fri, 12 May 2006 18:53:33 GMT

When the client requests this URL a second time, the browser, following the HTTP protocol, sends an If-Modified-Since header to the server to ask whether the file has been modified after this time (If-Unmodified-Since is the related header that expresses the opposite condition):

If-Modified-Since: Fri, 12 May 2006 18:53:33 GMT
If-Unmodified-Since: Thu, 27 Dec 2007 02:35:45 GMT

If the resource on the server has not changed, the server returns the HTTP 304 (Not Modified) status code with an empty body, which reduces the amount of data transferred. When the server code changes or the server is restarted, the resource is sent again, and the response is similar to that of the first request. This ensures that resources are not sent to the client repeatedly, and that the client still gets the latest resource when it changes on the server.
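
The server-side decision described above comes down to one date comparison. Here is a minimal sketch; ConditionalGetSketch and statusFor are hypothetical names, and a real server would take the header value from the request and the timestamp from the file system.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoUnit;

final class ConditionalGetSketch {

    /**
     * Returns 304 if the resource has not been modified since the date the
     * client sent in If-Modified-Since, otherwise 200.
     */
    static int statusFor(String ifModifiedSince, Instant lastModified) {
        if (ifModifiedSince == null) {
            return 200; // first request: no cached copy on the client
        }
        try {
            Instant clientCopy = ZonedDateTime
                .parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME)
                .toInstant();
            // HTTP dates only have one-second resolution
            Instant serverCopy = lastModified.truncatedTo(ChronoUnit.SECONDS);
            return serverCopy.isAfter(clientCopy) ? 200 : 304;
        } catch (DateTimeParseException e) {
            return 200; // unreadable date: fall back to a full response
        }
    }

    public static void main(String[] args) {
        String header = "Fri, 12 May 2006 18:53:33 GMT";
        Instant unchanged = Instant.parse("2006-05-12T18:53:33Z");
        Instant changed = Instant.parse("2006-05-13T08:00:00Z");

        System.out.println("unchanged file: " + statusFor(header, unchanged));
        System.out.println("changed file:   " + statusFor(header, changed));
        System.out.println("no header:      " + statusFor(null, unchanged));
    }
}
```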

2) What is "ETag"?

The HTTP specification defines the ETag as the entity tag value of the requested variant. In other words, an ETag is a token that can be associated with a web resource. A typical web resource is a web page, but it may also be a JSON or XML document. The server alone is responsible for deciding what the token is and what it means, and it sends the token to the client in an HTTP response header. The server returns it in the following format:

ETag: "50b1c1d4f775c61:df3"

The client's validation query has the following format:

If-None-Match: W/"50b1c1d4f775c61:df3"

If the ETag has not changed, status 304 is returned without a body, just as with Last-Modified. ETags are mainly useful for resumable downloads.

How do Last-Modified and ETag help improve performance?
Smart developers use Last-Modified together with the ETag header so that the cache of the client (for example, the browser) can be used. Because the server generates the Last-Modified/ETag token in the first place, it can use the token later to determine whether the page has been modified. Essentially, the client asks the server to validate its (the client's) cache by sending the token back to the server.
The process is as follows:
1. The client requests page A.
2. The server returns page A and adds a Last-Modified/ETag to it.
3. The client displays the page and caches it together with the Last-Modified/ETag.
4. The client requests page A again and passes the Last-Modified/ETag that the server returned last time back to the server.
5. The server checks the Last-Modified or ETag, determines that the page has not been modified since the client's last request, and returns a 304 response with an empty body.
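
Step 5 for the ETag case can be sketched as follows. The class EtagSketch and its methods are hypothetical, and the parsing is simplified: it splits the If-None-Match value on commas, which assumes that the tags themselves contain no commas. The comparison ignores the W/ weakness prefix, so a weak tag sent by the client matches the server's tag with the same value.

```java
final class EtagSketch {

    /** Strips surrounding whitespace and the optional weak prefix "W/". */
    static String opaqueTag(String etag) {
        String tag = etag.trim();
        return tag.startsWith("W/") ? tag.substring(2) : tag;
    }

    /** Returns true if any tag in If-None-Match equals the current ETag. */
    static boolean matches(String ifNoneMatch, String currentEtag) {
        if (ifNoneMatch == null) {
            return false;
        }
        if (ifNoneMatch.trim().equals("*")) {
            return true; // "*" matches any existing resource
        }
        String current = opaqueTag(currentEtag);
        for (String candidate : ifNoneMatch.split(",")) {
            if (opaqueTag(candidate).equals(current)) {
                return true;
            }
        }
        return false;
    }

    /** 304 with an empty body if the client's copy is still valid, else 200. */
    static int statusFor(String ifNoneMatch, String currentEtag) {
        return matches(ifNoneMatch, currentEtag) ? 304 : 200;
    }

    public static void main(String[] args) {
        String current = "\"50b1c1d4f775c61:df3\"";
        System.out.println(statusFor("W/\"50b1c1d4f775c61:df3\"", current));
        System.out.println(statusFor("\"some-older-tag\"", current));
        System.out.println(statusFor(null, current));
    }
}
```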

Some additional background on ETag:
http://bbs.chinaunix.net/archiver/tid-1186771.html

In our example, ETag is not used at all (this further reduces server overhead, because computing an ETag consumes some resources); only Last-Modified is used. In fact, because this example controls file version updates on the front end, we do not even need to check the If-Modified-Since time properly: if the request contains an If-Modified-Since header, the file is already in the browser cache, so 304 Not Modified is returned directly.
-------------------------------------------------------------------------------------------------------------


GET /lookfor.htm HTTP/1.1
Accept: */*
Accept-Language: zh-cn
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;
.NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022)
Host: www.veryhman.com
Connection: Keep-Alive

HTTP/1.1 200 OK
Date: Tue, 15 Jul 2008 01:03:29 GMT
Content-Length: 3348
Content-Type: text/html
Content-Location: http://www.veryhman.com/lookfor.htm
Last-Modified: Mon, 14 Jul 2008 10:45:40 GMT
Accept-Ranges: bytes
ETag: "ae2e8c29ee5c81:2daef"
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET

Did you change servers? Why does the response report IIS/6.0 instead of Apache? And no gzip compression is declared...

Gzip compression is declared only when the script is fetched:

http://www.veryhman.com/search20080713.asp

That response declares gzip compression, but the header you mentioned is not used:

Content-Location: search.gz

Instead of redirecting, the response directly declares that the current page uses gzip:

GET /search20080713.asp HTTP/1.1
Accept: */*
Referer: http://www.veryhman.com/lookfor.htm
Accept-Language: zh-cn
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;
.NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022)
Host: www.veryhman.com
Connection: Keep-Alive

HTTP/1.1 200 OK
Cache-Control: private
Connection: close
Date: Tue, 15 Jul 2008 01:03:29 GMT
Content-Type: application/x-javascript
Content-Encoding: gzip
Expires: Thu, 31 Dec 2037 15:59:58 GMT
Last-Modified: Sun, 27 Apr 2008 16:50:52 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Set-Cookie: ASPSESSIONIDQCQSDRRR=OPOFNGNBGFFNKGDLEFHNHGBE; path=/
