Step 1 Data System Technology (5. Using gzip to Optimize the File Cache)

The previous article covered server-side caching. My geographic name information system has 700,000 pages, and generating each page requires loading five SQL result sets from a remote database, plus news in RSS format. Generating a page is therefore expensive, so I used server-side caching to improve performance significantly. However, I then ran into new problems with disk space and traffic.

Of course, not all 700,000 place names are visited in a single day, but this section of my website receives about 30,000 visits a day. Allowing for repeat visits, assume that only 10,000 distinct place names are requested (a low estimate, because visits are spread widely across place names), so 10,000 cache files are generated. Assume also that each file is only 10 KB (again a low estimate, since the cached pages are actually around 15 KB). That is about 100 MB of cache on the first day, and up to 300 MB if each of the 30,000 visits hits a different place name (the actual first-day figure for my site is not given). After that the cache grew by about 50 MB a day, while my hosting plan allowed only 600 MB. I once left the site unattended for 10 days, and my host, HiChina, shut it down because disk usage had reached 2 GB.
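The estimate above can be checked with a few lines of arithmetic. The sketch below is in Python and uses only the figures quoted in the paragraph (10 KB per file, 10,000 or 30,000 new files, 50 MB of daily growth, a 600 MB allowance). The function names are illustrative, and decimal units (1 MB = 1000 KB) are an assumption.

```python
def cache_size_mb(files: int, kb_per_file: float) -> float:
    """Disk space in MB used by `files` cached pages of `kb_per_file` KB each."""
    return files * kb_per_file / 1000


def first_day_over_quota(quota_mb: float, first_day_mb: float, growth_mb_per_day: float) -> int:
    """Return the first day (day 1 = launch) on which the cache exceeds the quota."""
    day = 1
    used = first_day_mb
    while used <= quota_mb:
        used += growth_mb_per_day
        day += 1
    return day


if __name__ == "__main__":
    low = cache_size_mb(10_000, 10)    # 10,000 distinct place names
    high = cache_size_mb(30_000, 10)   # every visit hits a different place name
    print(low, high)
    print(first_day_over_quota(600, low, 50), first_day_over_quota(600, high, 50))
```

Under either starting figure the 600 MB allowance is exceeded within two weeks, which matches the experience described above.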

In addition, as visits increased, bandwidth usage also became very high. On the first day of one month, 20 GB of the 50 GB traffic quota imposed by HiChina had already been used, and traffic was still rising.

That is when I thought of gzip.

Gzip, in this context, means HTTP compression. Put simply, the server compresses the response body before sending it to the browser, and the browser decompresses it before parsing it. This reduces network traffic. Most of what a browser downloads is text, and compressed text is typically around 20% of its original size. Gzip is also supported by all mainstream browsers. Normally it saves traffic at the cost of extra work on the server.
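To get a feel for how well text compresses, the following Python sketch measures the size of a gzip-compressed page relative to the original. The sample page is made up and highly repetitive, so it compresses far better than a real page would; the result is an illustration only, not a measurement of the site described here.

```python
import gzip


def compression_ratio(text: str) -> float:
    """Return compressed size divided by original size for UTF-8 text."""
    raw = text.encode("utf-8")
    return len(gzip.compress(raw)) / len(raw)


if __name__ == "__main__":
    # A made-up HTML fragment standing in for a generated page.
    row = "<tr><td>Place name</td><td>Province</td></tr>\n"
    page = "<html><body><table>" + row * 300 + "</table></body></html>"
    print(f"compressed to {compression_ratio(page):.1%} of the original size")
```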

My approach, however, saves traffic without that cost. The cache files are stored in gzip format when they are created, so when a client requests a page I do not need to compress anything before sending it; I just transfer the file as it is. Compared with compressing on every request, this even improves server performance.

When a cache file is generated, the text is compressed straight into gzip format for storage. As a result, my cache files shrank from around 15 KB to 2-3 KB, which saves a great deal of disk space. When a user requests a page, I check whether the browser supports gzip. If it does, I return the bytes of the cache file to the browser as they are. If it does not (which is rare), I decompress the file and return the decompressed bytes to the client.

As shown above, I saved both disk space and traffic. Because more than 95% of current browsers support gzip (a conservative estimate; I have not heard of one that does not), server performance actually improved as well, so one change brought several benefits at once.

How can the server tell whether the browser supports gzip? An HTTP request from the browser carries a header field named Accept-Encoding, which lists the content encodings the browser can accept in the response. It usually has the following value (deflate is another compression format, not plain text):

Accept-Encoding: gzip, deflate
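The header can carry several codings, each with an optional quality value, and a value of q=0 means "not acceptable". The Python sketch below shows one way to check for gzip that respects this. The function name is hypothetical, and the wildcard coding `*` is deliberately not handled.

```python
from typing import Optional


def accepts_gzip(accept_encoding: Optional[str]) -> bool:
    """Return True if the Accept-Encoding header value allows a gzip response."""
    if not accept_encoding:
        return False
    for item in accept_encoding.split(","):
        parts = [part.strip() for part in item.split(";")]
        if parts[0].lower() not in ("gzip", "x-gzip"):
            continue
        quality = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value.strip())
                except ValueError:
                    quality = 0.0  # treat a malformed weight as "not acceptable"
        return quality > 0
    return False
```

A plain substring search for "gzip" is simpler and works for typical browsers, but it would also accept a header such as `gzip;q=0`, which explicitly refuses gzip.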

How does the browser know whether the data returned by the server is plain or gzip-compressed? It checks a header field in the server's response, for example:

Content-Encoding: gzip

If this header is absent (or its value is identity), the body is not compressed. The field does not carry the character encoding of the text, such as UTF-8; that is given by the charset parameter of the Content-Type header.
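Seen from the client side, the decision looks roughly like the Python sketch below. It handles only gzip and uncompressed bodies, and the function name and the dictionary used for headers are assumptions made for illustration.

```python
import gzip
from typing import Mapping


def decode_body(headers: Mapping[str, str], body: bytes) -> bytes:
    """Undo the Content-Encoding of a response body (gzip or none only)."""
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    encoding = lowered.get("content-encoding", "identity").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if encoding == "identity":
        return body
    raise ValueError(f"unsupported Content-Encoding: {encoding}")
```

Note that the Content-Type header, including any charset, plays no part in this decision.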

The following shows how I use gzip. First, writing the cache:

// Assumes: using System.IO; using System.Xml.Xsl; using ICSharpCode.SharpZipLib.GZip;
// (GZipOutputStream is assumed to be the SharpZipLib class; xslt, xml and path are set up elsewhere.)
FileStream fileStream = File.Create(path); // create the cache file
Stream writer = new GZipOutputStream(fileStream); // everything written to writer is gzip-compressed into the file
XsltArgumentList list = new XsltArgumentList(); // my website uses XML + XSLT to generate pages
xslt.Transform(xml, list, writer, null); // send the XSLT output through the gzip stream
writer.Close(); // finish writing the compressed data
fileStream.Close(); // close the file
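For readers without the .NET libraries at hand, the same write step can be sketched in Python with the standard gzip module. This is an illustration of the idea, not the code used on the site: the function name is hypothetical, and the page is assumed to be already rendered to a string.

```python
import gzip


def write_cache(path: str, html: str) -> None:
    """Compress the rendered page and store it at `path` in gzip format."""
    with gzip.open(path, "wb") as writer:
        writer.write(html.encode("utf-8"))
```

The file on disk is then a complete gzip stream, so it can later be sent to a gzip-capable browser without any further processing.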

Second, reading the cache:

// acceptEncoding is assumed to hold the request's Accept-Encoding header value.
if (acceptEncoding != null && acceptEncoding.IndexOf("gzip") >= 0) // does the request header say gzip is supported?
{ // gzip is supported, so return the gzip data as stored
    Response.AddHeader("Content-Encoding", "gzip"); // tell the browser the body is gzip-compressed
    Response.TransmitFile(path); // send the cache file directly
}
else
{ // Return uncompressed content: decompress the gzip file and send no Content-Encoding header, because the body is uncompressed by default.
    Stream reader = new GZipInputStream(File.OpenRead(path)); // open the cache file through a gzip decompressor
    byte[] buffer = new byte[1024]; // read buffer
    int p;
    while ((p = reader.Read(buffer, 0, buffer.Length)) > 0) // send each chunk as soon as it is read
    {
        Response.OutputStream.Write(buffer, 0, p);
        Response.Flush();
    }
    reader.Close(); // release the file
}
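The read path can be sketched in Python in the same spirit. The function below returns the headers and body to send rather than writing to a real response object, which is an assumption made to keep the example self-contained; its name is hypothetical. It uses the same simple substring check as the code above.

```python
import gzip
from typing import Dict, Optional, Tuple


def serve_cached(path: str, accept_encoding: Optional[str]) -> Tuple[Dict[str, str], bytes]:
    """Return (headers, body) for a page cached on disk in gzip format."""
    with open(path, "rb") as cache_file:
        compressed = cache_file.read()
    if accept_encoding is not None and "gzip" in accept_encoding.lower():
        # Client accepts gzip: send the stored bytes untouched and label them.
        return {"Content-Encoding": "gzip"}, compressed
    # Otherwise decompress and send the content with no Content-Encoding header.
    return {}, gzip.decompress(compressed)
```

Only the second branch does any decompression work, which is why serving from a gzip cache is cheap for the large majority of requests.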

That is the whole implementation. Because I build my pages with XML + XSLT, this technique fits in smoothly; pages generated with JSP or similar technologies may take a little more work.
