Step 1 Data System Technology (5. Use gzip technology to optimize File Cache)

Last Update:2018-12-07 Source: Internet

Author: User

Tags xslt


In the previous article, we talked about caching on the server. My geographic name information system has 700,000 pages. Generating each page requires loading five SQL result sets from a remote database, as well as news in RSS format, so generating a page is expensive. I therefore used server-side caching, which improved performance significantly. However, I then ran into new problems with disk space and traffic.

Of course, not all 700,000 place names are accessed in a single day, but this section of my website receives about 30,000 visits per day. Allowing for repeat visits, assume that only 10,000 distinct place names are accessed (a low estimate, because visits are scattered across many place names). That produces 10,000 cache files. Assuming each file is only 10 KB (the real files are larger), 300 MB of cache is generated on the first day (in fact, my website had […] MB on the first day). After that, the cache grew by about 50 MB every day, while my hosting space was only 600 MB. I once left the website unattended for 10 days, and HiChina shut it down because its disk usage had reached 2 GB.

In addition, as the number of visits increased, the website's bandwidth usage also became very high. On the first day of one month, 20 GB of the 50 GB traffic quota imposed by HiChina had already been used, and usage was still rising.

At this time, I thought of gzip.

Gzip, in this context, means applying compression to HTTP responses. Put simply, the server compresses the content before sending it to the browser, and the browser decompresses it before parsing it. This reduces network traffic. Most of what a browser transfers is text, and text can usually be compressed to about 20% of its original size. Gzip is supported by all mainstream browsers, so it saves traffic at the cost of some server processing time.
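The size reduction is easy to try out. The following is a minimal sketch in Python (chosen here only because it is compact and self-contained; the article's own code is C#). The sample page is hypothetical, and because it is far more repetitive than a real page, the printed ratio will be better than the roughly 20% quoted above.

```python
import gzip

# Hypothetical sample page: repetitive HTML-like text standing in for a real page.
row = "<tr><td class='name'>Place name</td><td>Some description text</td></tr>\n"
page = (row * 200).encode("utf-8")

# Compress the page the same way an HTTP server would for Content-Encoding: gzip.
compressed = gzip.compress(page)
ratio = len(compressed) / len(page)

print(f"original:   {len(page)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio:      {ratio:.1%}")
```

Running it against real cached pages instead of the sample row gives a more realistic figure.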

My approach, however, saves traffic without costing server performance, because the cache files are stored in gzip format in the first place. When a client requests a file, I do not need to compress it before sending it; I transfer it as it is. If anything, this improves server performance.

Concretely, when a cache file is generated, the text is compressed straight into gzip format for storage. The size of a cached file drops from around 15 KB to 2-3 KB, which saves a great deal of disk space. When a user requests the file, I check whether the browser supports gzip. If it does, I return the bytes of the cached file as they are. If it does not (which is quite unlikely), I decompress the file and return the decompressed bytes to the client.

As shown above, I saved both disk space and traffic. Because more than 95% of current browsers support gzip (a conservative estimate; I have not heard of one that does not), the server's performance actually improved as well, so a single change brought several benefits.

How can I determine whether the browser supports gzip? When sending an HTTP request, the browser includes a header field named Accept-Encoding, which lists the content encodings it can accept in the response. It usually has the following value (deflate is another compression format, not plain text):

Accept-Encoding: gzip, deflate
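A header check along these lines can be sketched as follows. This is an illustrative Python helper, not the article's code: the name `accepts_gzip` is hypothetical, and the handling of quality values (`gzip;q=0` means "do not send gzip") is slightly stricter than a plain substring search.

```python
def accepts_gzip(accept_encoding):
    """Return True if an Accept-Encoding header value allows gzip."""
    if not accept_encoding:
        return False  # header missing: assume only uncompressed content is accepted
    for item in accept_encoding.split(","):
        coding, _, params = item.partition(";")
        if coding.strip().lower() != "gzip":
            continue
        params = params.strip().lower().replace(" ", "")
        if params.startswith("q="):
            try:
                return float(params[2:]) > 0  # q=0 explicitly refuses gzip
            except ValueError:
                return False  # malformed quality value: be conservative
        return True
    return False


print(accepts_gzip("gzip, deflate"))
print(accepts_gzip("deflate"))
```

A wildcard entry (`*`) is not handled in this sketch; a client that sends only `*` would be served the uncompressed version.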

How does the browser determine whether the data returned by the server is plain text or gzip? It checks a header field in the server's response, for example:

Content-Encoding: gzip

If the field is absent, the content is uncompressed. This field does not carry the character encoding of the text (such as UTF-8); that is declared by the charset parameter of the Content-Type header.

The following describes how to use gzip. First, the write process of the cache:

FileStream fileStream = File.Create(path); // create the cache file
Stream writer = new GZipOutputStream(fileStream); // wrap the file stream so that everything written to it is gzip-compressed
XsltArgumentList list = new XsltArgumentList(); // my website uses XML + XSLT to generate pages
xslt.Transform(xml, list, writer, null); // send the XSLT output through the gzip stream
writer.Close(); // finish writing
fileStream.Close(); // close the file
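For readers outside .NET, the same write step can be sketched in Python with the standard `gzip` module. The function name `write_cache`, the sample HTML, and the temporary path are all hypothetical; the XSLT step is replaced by a ready-made string to keep the sketch self-contained.

```python
import gzip
import os
import tempfile


def write_cache(path, html):
    """Store the rendered page on disk already gzip-compressed."""
    with gzip.open(path, "wb") as writer:
        writer.write(html.encode("utf-8"))


# Hypothetical usage: in the real system the HTML comes from the page generator.
cache_path = os.path.join(tempfile.mkdtemp(), "page.html.gz")
write_cache(cache_path, "<html><body>Example place</body></html>")
print(os.path.getsize(cache_path), "bytes written to", cache_path)
```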

The second is the process of reading the cache:

if (acceptEncoding != null && acceptEncoding.IndexOf("gzip") >= 0) // check the request's Accept-Encoding header for gzip support
{ // gzip is supported: return the gzip data as it is
    Response.AddHeader("Content-Encoding", "gzip"); // tell the browser that the body is gzip-compressed
    Response.TransmitFile(path); // send the cached file directly
}
else
{ // gzip is not supported: decompress the cached file and send no Content-Encoding header, because the body is uncompressed
    Stream reader = new GZipInputStream(File.OpenRead(path)); // open the cached file through a decompressing stream
    byte[] buffer = new byte[1024]; // read buffer
    int p;
    while ((p = reader.Read(buffer, 0, buffer.Length)) > 0) // send each chunk as soon as it has been read
    {
        Response.OutputStream.Write(buffer, 0, p);
        Response.Flush();
    }
    reader.Close(); // release the file handle
}
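The read path can be sketched the same way in Python. The function `serve_cached` is hypothetical and returns the headers and body instead of writing to a real HTTP response; it uses the same simple substring check as the code above.

```python
import gzip
import os
import tempfile


def serve_cached(path, accept_encoding):
    """Return (headers, body) for a cache file that is stored gzip-compressed."""
    with open(path, "rb") as f:
        raw = f.read()
    if accept_encoding and "gzip" in accept_encoding.lower():
        # Client supports gzip: pass the stored bytes through untouched.
        return {"Content-Encoding": "gzip"}, raw
    # Client does not support gzip: decompress and send no Content-Encoding header.
    return {}, gzip.decompress(raw)


# Hypothetical demo file standing in for a real cached page.
demo_path = os.path.join(tempfile.mkdtemp(), "page.html.gz")
with gzip.open(demo_path, "wb") as f:
    f.write(b"<html>demo</html>")

headers, body = serve_cached(demo_path, "gzip, deflate")
plain_headers, plain_body = serve_cached(demo_path, None)
print(headers, len(body), plain_headers, len(plain_body))
```

Only the fallback branch spends processing time on decompression; the common branch is a plain file read.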

The above is the whole implementation. Because I use XML + XSLT to generate pages, applying this technique went smoothly; pages generated with JSP and similar technologies may take somewhat more work.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
