Introduction to using ETags to reduce Web application bandwidth and load
Recently there has been strong interest in the REST style of application architecture, a sign that the elegant design of the Web is starting to get the attention it deserves. We are gradually coming to understand the scalability and resilience inherent in the architecture of the World Wide Web, and are exploring further ways to apply its paradigm. In this article we look at a little-known tool available to Web developers, the humble ETag response header, and show how to integrate it into a dynamic Web application based on Spring and Hibernate to improve performance and scalability.
The Spring Framework application we will use is based on petclinic. The download includes instructions on adding the necessary configuration and source code, so you can try it yourself.
What is an ETag?
The HTTP specification defines an ETag as the "entity value for the requested variant" (see RFC 2616, section 14.19). In other words, an ETag is a token that can be associated with a Web resource. A typical Web resource is a Web page, but it could equally be a JSON or XML document. The server alone decides what the token is and what it means, and sends it to the client in an HTTP response header.
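The token itself is opaque, but its shape on the wire is fixed. The HTTP specification requires an entity tag to be a double-quoted string, optionally prefixed with W/ to mark it as "weak" (equivalent in meaning, but not necessarily byte-for-byte identical). The sketch below follows the stricter grammar of RFC 7232, which later replaced this part of RFC 2616. It shows how a server might turn whatever token it chooses into a valid header value. EntityTags is a hypothetical helper and is not part of the article's download.

```java
/**
 * Hypothetical helper (not from the article's download) that turns an
 * arbitrary server-chosen token into a syntactically valid ETag value.
 */
final class EntityTags {
    private EntityTags() {
    }

    /** Quotes the token, e.g. 1 becomes "1". */
    static String strong(String token) {
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            // etagc allows 0x21, 0x23-0x7E and non-ASCII: no quotes, spaces or control characters
            boolean allowed = c == 0x21 || (c >= 0x23 && c <= 0x7E) || c >= 0x80;
            if (!allowed) {
                throw new IllegalArgumentException("illegal character in ETag token: " + c);
            }
        }
        return '"' + token + '"';
    }

    /** Weak tag: same meaning, not guaranteed to be byte-for-byte identical. */
    static String weak(String token) {
        return "W/" + strong(token);
    }

    static boolean isWeak(String etag) {
        return etag.startsWith("W/");
    }
}
```

For example, the counter-based tokens used later in this article would be sent as "1" or "-1". An MD5-based token would be sent as 32 quoted hex digits.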
How do ETags help improve performance?
Savvy server developers use ETags together with the If-None-Match header of GET requests to take advantage of the client's (for example, the browser's) cache. Because the server generated the ETag in the first place, it can later use it to determine whether the page has changed. Essentially, the client sends the token back to the server and asks it to validate the client's cached copy.
The process is as follows:
The client requests page A.
The server returns page A, adding an ETag for A.
The client renders the page and caches it along with the ETag.
The client requests page A again, sending the ETag that the server returned with the previous response in an If-None-Match header.
The server checks the ETag, determines that the page has not been modified since the client's last request, and returns a 304 (Not Modified) response with an empty body.
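The server-side decision in the last step can be sketched in a few lines of plain Java. ConditionalGet is a hypothetical helper, not code from the article. It follows the HTTP rules for If-None-Match on a GET: the client may send a list of tags or *, and the comparison is "weak", so the W/ prefix is ignored on both sides.

```java
/** Hypothetical helper: chooses 304 or 200 for a conditional GET. */
final class ConditionalGet {
    private ConditionalGet() {
    }

    /**
     * @param ifNoneMatch raw If-None-Match request header, or null if absent
     * @param currentEtag ETag of the page as it would be sent now, or null if none
     * @return 304 if the client's cached copy is still valid, otherwise 200
     */
    static int status(String ifNoneMatch, String currentEtag) {
        if (ifNoneMatch == null) {
            return 200; // unconditional request: send the full page
        }
        String header = ifNoneMatch.trim();
        if (header.equals("*")) {
            return 304; // "*" matches any existing representation, and we are serving one
        }
        if (currentEtag == null) {
            return 200; // no validator for this resource, so nothing can match
        }
        String current = opaque(currentEtag);
        // Naive split: assumes the tags contain no commas (true for hex digests and counters)
        for (String candidate : header.split(",")) {
            if (opaque(candidate.trim()).equals(current)) {
                return 304;
            }
        }
        return 200;
    }

    /** Weak comparison: strip a leading W/ so "1" and W/"1" compare equal. */
    private static String opaque(String etag) {
        return etag.startsWith("W/") ? etag.substring(2) : etag;
    }
}
```

For instance, status("\"1\"", "\"1\"") yields 304, and status("\"-1\"", "\"1\"") yields 200. The implementations later in this article use a simpler exact string match, which covers the common single-tag case.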
The rest of this article shows two approaches to using ETags in a Spring-based Web application built with Spring MVC. The first uses a Servlet 2.3 Filter to generate the ETag from an MD5 checksum of the rendered view (a "shallow" ETag implementation). The second uses a more sophisticated approach: it tracks changes to the model used by the view to determine whether the ETag is still valid (a "deep" ETag implementation). Although we use Spring MVC, the technique applies to any MVC-style Web framework.
Before we continue, it is worth emphasizing that the techniques presented here are aimed at improving the performance of dynamically generated pages. Existing optimization techniques should also be considered as part of an overall optimization and performance-tuning analysis of your application (see below).
Top-down Web caching
This article is mainly about using HTTP caching for dynamically generated pages. When improving the performance of a Web application, however, we should take a holistic, top-down approach. To that end, it is important to understand the layers an HTTP request passes through. The appropriate techniques depend on the hot spots you are concerned with. For example:
Use Apache as the front end to your Servlet container to serve static files such as images and JavaScript. You can also use its FileETag directive to generate ETag response headers for them.
Optimize JavaScript files, for example by combining multiple files into one and stripping whitespace.
Use GZip compression and Cache-Control headers.
To find the pain points in your Spring Framework application, consider using JamonPerformanceMonitorInterceptor.
Make full use of your ORM tool's caching, so that objects do not have to be repeatedly reconstituted from the database. It is worth taking the time to work out how to make the query cache work for you.
Make sure you minimize the amount of data retrieved from the database, especially large lists. Large lists should be broken into pages, with each page requesting only a small subset of the list.
Minimize the amount of data you put into the HTTP session. This frees up memory and helps when the application is deployed in a cluster.
Use a database profiling tool to check which indexes your queries use. Also make sure that entire tables are not locked during updates.
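To get a feel for why GZip is on this list, the hypothetical GzipBody helper below compresses a response body in memory with java.util.zip and checks the round trip. Text-heavy HTML with repeated markup usually shrinks a lot. This is only an illustration. In a real deployment the compression is normally done by Apache (for example mod_deflate) or the container rather than by hand. It should be applied only when the client's Accept-Encoding includes gzip, with Content-Encoding: gzip set on the response.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: gzip a response body in memory, and undo it for checking. */
final class GzipBody {
    private GzipBody() {
    }

    static byte[] compress(byte[] body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // closing the GZIPOutputStream above wrote the gzip trailer, so the bytes are complete
        return out.toByteArray();
    }

    static byte[] decompress(byte[] compressed) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes(); // Java 9+
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```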
Of course, the best adage for application performance tuning is: measure twice, cut once. Oh wait, that's for woodworking! True, but it applies here too!
The content-body ETag filter
The first approach we will look at is a Servlet Filter that generates its ETag token from the content of the page (the "view" in MVC). At first glance, any performance gain from this approach seems counterintuitive: we still have to generate the page, and we add the cost of computing the token. However, the idea here is to reduce bandwidth usage. This pays off most where response times are long, for example when your server and clients are on opposite sides of the planet. I have seen response times of 350 ms for an application hosted on a server in New York and accessed from an office in Tokyo. As the number of concurrent users grows, this becomes a significant bottleneck.
Code
The technique we use to generate the token is to compute an MD5 hash of the page content. We do this by wrapping the response: the wrapper stores the generated content in a byte array. Once the filter chain has finished processing, we compute the token from the MD5 hash of that array.
The implementation of the doFilter method is as follows.
```java
public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
        throws IOException, ServletException {
    HttpServletRequest servletRequest = (HttpServletRequest) req;
    HttpServletResponse servletResponse = (HttpServletResponse) res;

    // capture the rendered view in a byte array instead of sending it straight out
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    ETagResponseWrapper wrappedResponse = new ETagResponseWrapper(servletResponse, baos);
    chain.doFilter(servletRequest, wrappedResponse);

    byte[] bytes = baos.toByteArray();
    String token = '"' + ETagComputeUtils.getMd5Digest(bytes) + '"';
    servletResponse.setHeader("ETag", token); // always store the ETag in the header

    String previousToken = servletRequest.getHeader("If-None-Match");
    if (previousToken != null && previousToken.equals(token)) { // compare previous token with current one
        logger.debug("ETag match: returning 304 Not Modified");
        // use the same date we sent when we created the ETag the first time through
        String ifModifiedSince = servletRequest.getHeader("If-Modified-Since");
        if (ifModifiedSince != null) {
            servletResponse.setHeader("Last-Modified", ifModifiedSince);
        }
        // setStatus rather than sendError: headers set after sendError are ignored,
        // and a 304 must not carry an error-page body
        servletResponse.setStatus(HttpServletResponse.SC_NOT_MODIFIED);
    } else {
        // first time through - set last modified time to now
        Calendar cal = Calendar.getInstance();
        cal.set(Calendar.MILLISECOND, 0);
        Date lastModified = cal.getTime();
        servletResponse.setDateHeader("Last-Modified", lastModified.getTime());

        logger.debug("Writing body content");
        servletResponse.setContentLength(bytes.length);
        ServletOutputStream sos = servletResponse.getOutputStream();
        sos.write(bytes);
        sos.flush();
        sos.close();
    }
}
```
Listing 1: ETagContentFilter.doFilter
Note that we also set the Last-Modified header. This is considered good form for server-generated content, because it caters to clients that are not aware of the ETag header.
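Zeroing the milliseconds before setting Last-Modified matters because HTTP dates only have one-second resolution. The client echoes the value back in If-Modified-Since without milliseconds, so an untruncated server-side timestamp would always look newer than the client's copy. The hypothetical HttpDates helper below uses java.time instead of Calendar to show both the formatting and the comparison.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.Locale;

/** Hypothetical helper for the Last-Modified / If-Modified-Since pair. */
final class HttpDates {
    // IMF-fixdate, e.g. "Wed, 20 Jun 2007 18:33:36 GMT" (two-digit day, English names, UTC)
    private static final DateTimeFormatter IMF_FIXDATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                    .withZone(ZoneOffset.UTC);

    private HttpDates() {
    }

    /** Value for Last-Modified; milliseconds are dropped because HTTP dates stop at seconds. */
    static String format(Instant lastModified) {
        return IMF_FIXDATE.format(lastModified.truncatedTo(ChronoUnit.SECONDS));
    }

    /**
     * True if the resource changed after the date the client sent back.
     * Throws DateTimeParseException for a malformed header; a real server would ignore it instead.
     */
    static boolean modifiedSince(Instant lastModified, String ifModifiedSince) {
        Instant since = ZonedDateTime.parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME)
                .toInstant();
        return lastModified.truncatedTo(ChronoUnit.SECONDS).isAfter(since);
    }
}
```

Without the truncatedTo call in modifiedSince, a Last-Modified time of 18:33:36.789 would compare as newer than the echoed 18:33:36. The page would then never be reported as unchanged.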
The following listing shows the utility class ETagComputeUtils, which generates the byte array for an object and handles the MD5 digest logic. I used java.security.MessageDigest to calculate the MD5 hash.
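```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ETagComputeUtils {

    public static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // closing the ObjectOutputStream flushes any buffered bytes into baos
        try (ObjectOutputStream out = new ObjectOutputStream(baos)) {
            out.writeObject(obj);
        }
        return baos.toByteArray();
    }

    public static String getMd5Digest(byte[] bytes) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e);
        }
        byte[] messageDigest = md.digest(bytes);
        BigInteger number = new BigInteger(1, messageDigest);
        // pad to 32 hex digits: BigInteger.toString(16) drops leading zeros
        return String.format("%032x", number);
    }
}
```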
Listing 2: ETagComputeUtils
The filter is configured directly in web.xml:
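```xml
<filter>
    <filter-name>ETag Content Filter</filter-name>
    <filter-class>org.springframework.samples.petclinic.web.ETagContentFilter</filter-class>
</filter>

<filter-mapping>
    <filter-name>ETag Content Filter</filter-name>
    <!-- extension mapping: every request ending in .htm ("/*.htm" would only match that literal path) -->
    <url-pattern>*.htm</url-pattern>
</filter-mapping>
```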
Listing 3: Configuring the filter in web.xml
Every .htm request passes through ETagContentFilter. If the page has not changed since the client's last request, the filter returns a 304 response with an empty body.
The approach presented here is useful for certain types of page. However, it has two disadvantages:
We compute the ETag after the page has been rendered on the server, just before it is returned to the client. When the ETag matches, the model data was loaded and the view rendered for nothing, because the rendered page is never sent back to the client.
For pages that, for example, display the date and time in the footer, every response is different (and so is its ETag), even when the actual content has not changed.
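The second drawback is easy to demonstrate. The sketch below hashes two renderings of the same page that differ only in a footer timestamp. The render template is made up for illustration, and ShallowEtagDemo is a hypothetical class. Because a content-hash token covers the whole body, the two tags differ, and the client would never get a 304 for this page. Identical renderings, on the other hand, always produce the same tag.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical demo of why volatile content defeats a content-hash ETag. */
final class ShallowEtagDemo {
    private ShallowEtagDemo() {
    }

    /** Quoted, lower-case hex MD5 of the body, in the same spirit as the filter's token. */
    static String md5Etag(String body) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(body.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder("\"");
            for (byte b : digest) {
                sb.append(String.format("%02x", b)); // two hex digits per byte
            }
            return sb.append('"').toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Made-up page template: same content, plus a "generated at" footer. */
    static String render(String content, String generatedAt) {
        return "<html><body>" + content + "<p>Generated " + generatedAt + "</p></body></html>";
    }
}
```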
ETag interceptor
Spring MVC's request-processing pipeline lets you plug in an interceptor that runs before a controller, giving it a chance to process the request first. This is an ideal place for our ETag comparison logic: if we find that the data used to build the page has not changed, we can skip any further processing.
How do we know whether the page data has changed? For the purposes of this article, I created a simple ModifiedObjectTracker that is notified of insert, update, and delete operations through Hibernate event listeners. The tracker keeps a counter for each view in the application, plus a mapping of which Hibernate entities affect each view. Whenever a POJO changes, the counter of every view that uses that entity is incremented by 1. We use this count as the ETag, so when the client sends the ETag back, we know whether one or more objects on the page have been modified.
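The data structure just described can be sketched on its own, without Spring or Hibernate. It consists of a map from entity name to the views that display it, and a counter per view. The class name ViewChangeCounter and its methods are hypothetical. The view names in the test below are made-up request URIs, and the constructor stands in for the configuration and event wiring used in the article.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical in-memory tracker: entity name -> views that display it,
 * plus a per-view change counter that doubles as the ETag token.
 */
final class ViewChangeCounter {
    private final Map<String, List<String>> entityViews;
    private final Map<String, Integer> counts = new ConcurrentHashMap<>();

    ViewChangeCounter(Map<String, List<String>> entityViews) {
        this.entityViews = entityViews;
        // start every known view at 0 so it has a token before the first change
        for (List<String> views : entityViews.values()) {
            for (String view : views) {
                counts.putIfAbsent(view, 0);
            }
        }
    }

    /** Called on every insert, update or delete of an entity. */
    void notifyModified(String entityName) {
        List<String> views = entityViews.getOrDefault(entityName, Collections.emptyList());
        for (String view : views) {
            counts.merge(view, 1, Integer::sum); // atomic per-view increment
        }
    }

    /** Current token for a view, or null if the view is not tracked. */
    String tokenFor(String view) {
        Integer count = counts.get(view);
        return count == null ? null : count.toString();
    }
}
```

ConcurrentHashMap.merge makes each increment atomic, standing in for a synchronized block. As with the article's tracker, the counts live in a single JVM's memory, so this does not carry over to a cluster as-is.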
Code
Let's start with ModifiedObjectTracker:
```java
public interface ModifiedObjectTracker {
    void notifyModified(String entity);
}
```
Simple enough, right? The implementation is a little more interesting. Whenever an object changes, we update the counter of every view it affects:
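```java
public void notifyModified(String entity) {
    // views that display this entity (assumes getEntityViewMap() returns Map<String, String[]>)
    String[] views = getEntityViewMap().get(entity);
    if (views == null) {
        return; // no views are configured for this entity
    }
    synchronized (counts) {
        for (String view : views) {
            Integer count = counts.get(view);
            // guard against a view with no counter yet, which would otherwise throw an NPE
            counts.put(view, count == null ? 0 : count + 1);
        }
    }
}
```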
A "change" is an insert, update, or delete. Here is the handler that listens for deletes (configured as an event listener on the Hibernate 3 LocalSessionFactoryBean):
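```java
import org.hibernate.HibernateException;
import org.hibernate.event.DeleteEvent;
import org.hibernate.event.def.DefaultDeleteEventListener;

public class DeleteHandler extends DefaultDeleteEventListener {

    private ModifiedObjectTracker tracker;

    public void onDelete(DeleteEvent event) throws HibernateException {
        super.onDelete(event); // let Hibernate perform the actual delete
        getModifiedObjectTracker().notifyModified(event.getEntityName());
    }

    public ModifiedObjectTracker getModifiedObjectTracker() {
        return tracker;
    }

    public void setModifiedObjectTracker(ModifiedObjectTracker tracker) {
        this.tracker = tracker;
    }
}
```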
ModifiedObjectTracker is injected into DeleteHandler through Spring configuration. There is also a SaveOrUpdateHandler that handles new or updated POJOs.
If the client sends back a valid ETag (meaning our content has not changed since the last request), we want to prevent further processing to improve performance. In Spring MVC, we can extend HandlerInterceptorAdapter and override the preHandle method:
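```java
public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler)
        throws ServletException, IOException {
    // ETags are only handled for GET requests
    String method = request.getMethod();
    if (!"GET".equals(method)) {
        return true;
    }

    String previousToken = request.getHeader("If-None-Match");
    String token = getTokenFactory().getToken(request);

    // compare the previous token with the current one
    if ((token != null) && (previousToken != null && previousToken.equals('"' + token + '"'))) {
        // reuse the date we sent the first time through
        String ifModifiedSince = request.getHeader("If-Modified-Since");
        if (ifModifiedSince != null) {
            response.setHeader("Last-Modified", ifModifiedSince);
        }
        response.setStatus(HttpServletResponse.SC_NOT_MODIFIED);
        return false; // short-circuit: skip the controller and the view
    }

    // first time through: set the ETag and last-modified time
    if (token != null) {
        response.setHeader("ETag", '"' + token + '"');
        Calendar cal = Calendar.getInstance();
        cal.set(Calendar.MILLISECOND, 0);
        Date lastModified = cal.getTime();
        response.setDateHeader("Last-Modified", lastModified.getTime());
    }
    return true;
}
```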
First, we make sure we are handling a GET request (an ETag used with PUT can detect conflicting updates, but that is beyond the scope of this article). If the token matches the one we sent last time, we return a 304 Not Modified response and "short-circuit" the rest of the request-processing chain. Otherwise, we set the ETag response header in preparation for the client's next request.
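Stripped of the servlet API, the decision preHandle makes reduces to three outcomes. The hypothetical EtagPreHandle class below models it with plain strings so that it can run on its own. Like the interceptor, it compares If-None-Match against the quoted token exactly, rather than performing the full list and weak-tag matching that HTTP allows.

```java
/** Hypothetical, servlet-free model of the interceptor's preHandle decision. */
final class EtagPreHandle {
    enum Outcome {
        PROCEED,           // return true without setting ETag headers
        PROCEED_WITH_ETAG, // set ETag and Last-Modified, then return true
        NOT_MODIFIED       // send 304 and return false, skipping controller and view
    }

    private EtagPreHandle() {
    }

    static Outcome decide(String method, String ifNoneMatch, String token) {
        if (!"GET".equals(method) || token == null) {
            return Outcome.PROCEED; // not a GET, or the view is not tracked
        }
        String quoted = '"' + token + '"';
        if (quoted.equals(ifNoneMatch)) {
            return Outcome.NOT_MODIFIED;
        }
        return Outcome.PROCEED_WITH_ETAG;
    }
}
```

For example, a GET carrying If-None-Match: "-1" when the current count is 1 proceeds and receives ETag "1". The same GET carrying "1" is answered with a 304.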
Note that we have extracted the token-generation logic into an interface so that different implementations can be plugged in. The interface has a single method:
```java
public interface ETagTokenFactory {
    String getToken(HttpServletRequest request);
}
```
To keep the code listings short, our simple implementation, SampleTokenFactory, does double duty and so has direct access to the change counts. In this example, the token is simply the change count for the request URI:
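```java
public String getToken(HttpServletRequest request) {
    String view = request.getRequestURI();
    Integer count = counts.get(view);
    if (count == null) {
        return null; // view is not tracked, so no ETag
    }
    return count.toString();
}
```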
Success!
A sample session
With this in place, if nothing has changed, our interceptor avoids all the overhead of gathering data and rendering the view. Now let's look at the HTTP headers (captured with LiveHTTPHeaders) to see what happens. The download includes instructions on configuring the interceptor so that owner.htm is "ETag-enabled".
The first request shows that the user has already viewed this page:
```
----------------------------------------------------------
GET /petclinic/owner.htm?ownerId=10 HTTP/1.1
Host: localhost:8080
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: JSESSIONID=13D2E0CB63897F4EDB56639E46D2BBD8
X-lori-time-1: 1182364348062
If-Modified-Since: Wed, 20 Jun 2007 18:29:03 GMT
If-None-Match: "-1"

HTTP/1.x 304 Not Modified
Server: Apache-Coyote/1.1
Date: Wed, 20 Jun 2007 18:32:30 GMT
```
Now let's make a change and see whether the ETag changes. We add a pet to this owner:
```
----------------------------------------------------------
GET /petclinic/addPet.htm?ownerId=10 HTTP/1.1
Host: localhost:8080
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://localhost:8080/petclinic/owner.htm?ownerId=10
Cookie: JSESSIONID=13D2E0CB63897F4EDB56639E46D2BBD8
X-lori-time-1: 1182364356265

HTTP/1.x 200 OK
Server: Apache-Coyote/1.1
Pragma: No-cache
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Cache-Control: no-cache, no-store
Content-Type: text/html;charset=ISO-8859-1
Content-Language: en-US
Content-Length: 2174
Date: Wed, 20 Jun 2007 18:32:57 GMT
----------------------------------------------------------
POST /petclinic/addPet.htm?ownerId=10 HTTP/1.1
Host: localhost:8080
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://localhost:8080/petclinic/addPet.htm?ownerId=10
Cookie: JSESSIONID=13D2E0CB63897F4EDB56639E46D2BBD8
X-lori-time-1: 1182364402968
Content-Type: application/x-www-form-urlencoded
Content-Length: 40

name=Noddy&birthDate=1000-11-11&typeId=5

HTTP/1.x 302 Moved Temporarily
Server: Apache-Coyote/1.1
Pragma: No-cache
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Cache-Control: no-cache, no-store
Location: http://localhost:8080/petclinic/owner.htm?ownerId=10
Content-Language: en-US
Content-Length: 0
Date: Wed, 20 Jun 2007 18:33:23 GMT
```
Note that addPet.htm is not configured for ETags, so no ETag headers are sent for it. Now we view owner 10 again. Note that the ETag is now "1":
```
----------------------------------------------------------
GET /petclinic/owner.htm?ownerId=10 HTTP/1.1
Host: localhost:8080
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://localhost:8080/petclinic/addPet.htm?ownerId=10
Cookie: JSESSIONID=13D2E0CB63897F4EDB56639E46D2BBD8
X-lori-time-1: 1182364403109
If-Modified-Since: Wed, 20 Jun 2007 18:29:03 GMT
If-None-Match: "-1"

HTTP/1.x 200 OK
Server: Apache-Coyote/1.1
Etag: "1"
Last-Modified: Wed, 20 Jun 2007 18:33:36 GMT
Content-Type: text/html;charset=ISO-8859-1
Content-Language: en-US
Content-Length: 4317
Date: Wed, 20 Jun 2007 18:33:45 GMT
```
Finally, we view owner 10 once more. This time the ETag matches, and we get a 304 Not Modified response:
```
----------------------------------------------------------
http://localhost:8080/petclinic/owner.htm?ownerId=10

GET /petclinic/owner.htm?ownerId=10 HTTP/1.1
Host: localhost:8080
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: JSESSIONID=13D2E0CB63897F4EDB56639E46D2BBD8
X-lori-time-1: 1182364493500
If-Modified-Since: Wed, 20 Jun 2007 18:33:36 GMT
If-None-Match: "1"

HTTP/1.x 304 Not Modified
Server: Apache-Coyote/1.1
Date: Wed, 20 Jun 2007 18:34:55 GMT
```
We have used the HTTP cache to save bandwidth and computing time!
The fine print: In practice, we could get a greater effect by tracking object changes at a finer grain, for example by object id. However, the idea of associating modified objects with views depends heavily on the overall data model design of the application. The implementation here (ModifiedObjectTracker) is illustrative and is meant to provide ideas for further exploration. It is not intended for production use (for example, it would not be safe to use in a cluster). A deeper option worth considering is to track changes with database triggers, and to let the interceptor read the table the triggers write to.
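As a rough illustration of the finer-grained tracking mentioned above, the hypothetical PerObjectChangeCounter below keeps one version number per (entity, id) pair instead of one per view. It leaves it to the caller to decide which objects a page depends on, which is exactly where the dependence on the data model shows up.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical finer-grained tracker: one version per (entity, id) instead of per view. */
final class PerObjectChangeCounter {
    private final Map<String, Long> versions = new ConcurrentHashMap<>();

    private static String key(String entityName, Object id) {
        return entityName + "#" + id;
    }

    /** Record a change to one specific object. */
    void notifyModified(String entityName, Object id) {
        versions.merge(key(entityName, id), 1L, Long::sum);
    }

    /** Token for a page showing this object; objects never modified report version 0. */
    String tokenFor(String entityName, Object id) {
        return Long.toString(versions.getOrDefault(key(entityName, id), 0L));
    }
}
```

For the owner page, for example, adding a pet would have to be recorded against the owning Owner's id (notifyModified("Owner", 10)) for that page's token to change, while owner 11's page would keep its token. That mapping is application-specific.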
Conclusion
As Isaac Newton said, "If I have seen further, it is by standing on the shoulders of giants." At its core, the REST style is about simple, good software design and not reinventing the wheel. I believe that as REST-style architecture for Web applications grows in usage and popularity, it will benefit mainstream application development, and I look forward to it playing a bigger role in my future projects.
About the Author
Gavin Terrill is Chief Technology Executive at BPS. Gavin has more than 20 years of software development experience, specializes in enterprise Java applications, and still refuses to throw out his TRS-80. In his spare time, Gavin enjoys sailing, fishing, playing guitar, and tasting wine (in no particular order).
Thanks
I would like to thank my colleagues Patrick Bourke and Erick Dorvale for their help and their feedback on this article.
The code and instructions can be downloaded from here.