How Caching Is Implemented in Java

Last Update:2015-12-25 Source: Internet

Author: User

Tags jboss


1. Why does the cache exist?
2. Where can the cache exist?
3. What are the properties of a cache?
4. What are the cache media?

Once these four questions are answered, we can decide which kind of cache to use for any given application scenario.

1. Why does the cache exist?
In general, a website or application works like this: the browser sends a request to the application server; the application server does some computation and then queries the database; the database does its own computation and returns the data to the application server; and the application server, after further computation, returns the data to the browser. This is the standard flow. But as the Internet has spread, more and more people are online and there is more and more information on the Internet, and with both growing, our applications have to support ever higher concurrency. The application server and database server then have more and more computation to do, yet application server resources are usually limited, and so is the number of requests per second the database can handle (blame our hard disks, whose speed is limited).

To get as much throughput as possible out of limited resources, one approach is to reduce the amount of computation and shorten the request path (reduce network I/O or disk I/O), and this is where a cache shines. The basic principle of caching is to break the standard flow described above: any link in that flow can be cut short, with the request fetching its data directly from the cache and returning. This saves time, improves response speed, and saves hardware resources, so that limited hardware can serve more users.
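The idea of cutting the request path short can be sketched as a small read-through wrapper. This is only an illustration: `ReadThroughCache`, its `loader` function (standing in for the application-server and database work), and `loadCount` are hypothetical names, not part of any framework mentioned in this article.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical read-through wrapper; all names are illustrative.
class ReadThroughCache<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();
    private final Function<K, V> loader;   // stands in for the slow path (app server + database)
    private final AtomicInteger loads = new AtomicInteger();

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Return the cached value if present; otherwise run the loader once and remember the result.
    V get(K key) {
        return store.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    // Number of times the slow path actually ran.
    int loadCount() {
        return loads.get();
    }
}
```

Repeated calls to `get` with the same key run the loader only once; every later request is answered from memory without touching the slow path.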

2. Where can the cache exist?

Browser ---> between browser and app ---> app ---> database

The diagram above shows the general flow of a request. Let's redraw it with the app split into layers, which makes the structure a little more complicated.
(Layered app)
Browser ---> between browser and app ---> app (split into layers) ---> database

In theory, every link a request passes through is a place where a cache can be used. The first link is the browser: if the data is already in the browser, this is the fastest case for the user, because no network request is needed at all. The second link is between the browser and the app: a cache placed here is transparent to the app, and what it caches is whole pages. The third link is the app itself: it has several layers, so a cache can be placed at different layers. This is the most complex part in terms of situations and scenarios, and choosing a cache here calls for care. The fourth link is the database, which can also have a cache, such as MySQL's query cache.

So we can cache at any point in the request flow. But can all data be put into a cache? Of course not. Data that belongs in a cache always has certain characteristics, and deciding whether data can be cached, and how, has to start from how the data changes.

What are those characteristics? The simplest division is into two kinds: data that changes and data that does not. As we all know, data that does not change does not need to be recomputed every time. The problem is that, in theory, all data changes; change is the eternal theme of the world. So simply dividing data into changing and unchanging is wrong, and we need to add one more condition: time. We can then summarize the characteristic as whether the data changes or stays the same within a given period of time. Based on this characteristic, we can cache the data in a suitable place and in a suitable type of cache.
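One way to express "unchanged within a period of time" in code is a time-to-live on each entry. The following is a minimal sketch under that assumption; `TtlCache` and its injectable `clock` are hypothetical names chosen for illustration (the clock is passed in so the behavior can be checked without waiting).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical cache whose entries are treated as valid for a fixed period only.
class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> store = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;   // e.g. System::currentTimeMillis

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    synchronized void put(K key, V value) {
        store.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
    }

    // Returns null when the key is absent or its period of validity has passed.
    synchronized V get(K key) {
        Entry<V> entry = store.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() >= entry.expiresAtMillis) {
            store.remove(key);   // the data may have changed by now, so drop it
            return null;
        }
        return entry.value;
    }
}
```

In real use the clock would typically be `System::currentTimeMillis`, and the TTL would be chosen from how often the underlying data is expected to change.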

3. What are the properties of a cache?
From an object-oriented point of view, a cache is an object, and an object must have properties. So let's look at what properties a cache has. The three most commonly used are listed below.
(1) Hit rate
The hit rate is the ratio of the number of times the cache returns a correct result to the number of requests made to the cache. The higher the ratio, the more effectively the cache is being used.

Hit rate is a very important issue in caching. We would all like our cache to have a 100% hit rate, but things rarely work out that way. The hit rate is an important metric for measuring how effective a cache is.
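As a small illustration of the definition, the sketch below counts requests and hits and reports hits divided by requests. `CountingCache` is a hypothetical class written for this example only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache that tracks its own hit rate.
class CountingCache<K, V> {
    private final Map<K, V> store = new HashMap<>();
    private long requests;
    private long hits;

    void put(K key, V value) {
        store.put(key, value);
    }

    V get(K key) {
        requests++;
        V value = store.get(key);
        if (value != null) {
            hits++;   // the cache returned a result
        }
        return value;
    }

    // hit rate = hits / requests; defined as 0 before any request is made
    double hitRate() {
        return requests == 0 ? 0.0 : (double) hits / requests;
    }
}
```

For example, three hits out of four requests give a hit rate of 0.75.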

(2) Maximum elements
The maximum number of elements the cache can hold. Once the number of elements in the cache exceeds this value, the cache's eviction policy kicks in. Setting the maximum sensibly for the scenario can often raise the hit rate to some extent, making the cache more effective.

(3) Eviction policy

1 FIFO (first in, first out): when cache space runs out (the maximum element limit is exceeded), the data that entered the cache earliest is evicted first.
2 LFU (least frequently used): the elements that have been used least are evicted. This requires each cached element to carry a hit count; when cache space runs out, the element with the smallest hit count is evicted.
3 LRU (least recently used): each cached element carries a timestamp of its last use; when the cache is full and room is needed for a new element, the existing element whose timestamp is furthest from the current time is evicted.
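The three policies can be sketched with the standard collections. This is an illustrative sketch, not how any particular framework implements them: `BoundedCache` and `LfuCache` are hypothetical names, FIFO and LRU rely on `java.util.LinkedHashMap` (insertion order versus access order), and the LFU version does a simple linear scan for the smallest hit count. It assumes a maximum of at least one element.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// FIFO or LRU, depending on the flag: LinkedHashMap evicts its eldest entry,
// and "eldest" means first inserted (FIFO) or least recently accessed (LRU).
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxElements;

    BoundedCache(int maxElements, boolean lru) {
        super(16, 0.75f, lru);   // third argument: true = access order, false = insertion order
        this.maxElements = maxElements;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxElements;
    }
}

// LFU: every element carries a hit count; the element with the fewest hits is evicted.
class LfuCache<K, V> {
    private final int maxElements;
    private final Map<K, V> values = new HashMap<>();
    private final Map<K, Integer> hits = new HashMap<>();

    LfuCache(int maxElements) {
        this.maxElements = maxElements;
    }

    V get(K key) {
        V value = values.get(key);
        if (value != null) {
            hits.merge(key, 1, Integer::sum);
        }
        return value;
    }

    void put(K key, V value) {
        if (!values.containsKey(key) && values.size() >= maxElements) {
            K victim = null;
            int fewest = Integer.MAX_VALUE;
            for (Map.Entry<K, Integer> e : hits.entrySet()) {
                if (victim == null || e.getValue() < fewest) {
                    victim = e.getKey();
                    fewest = e.getValue();
                }
            }
            values.remove(victim);
            hits.remove(victim);
        }
        values.put(key, value);
        hits.putIfAbsent(key, 0);
    }
}
```

With a limit of two elements, inserting `a` and `b`, reading `a`, and then inserting `c` evicts `a` under FIFO (it entered first) but `b` under LRU (it was used least recently).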

4. Cache media
In hardware terms there are only two media: memory and hard disk (registers and the like are not a concern for application-level programs). But we usually do not classify by hardware; the usual classification is by technology, which gives three kinds: memory, disk files, and databases.
(1) Memory. Putting the cache in memory is the fastest option; any program works much faster against memory than against the hard disk. But you have to consider what happens in a crash: data held in memory is what we call non-persistent data, and if there is no backup on disk, it is hard or impossible to recover after the machine goes down.

(2) Hard disk. Many cache frameworks combine memory and disk: for example, when the allocated memory is full, the user can choose to persist the data that has to leave memory to the hard disk. You can also choose to write data straight through to disk (one copy in memory, one on disk, so a crash is nothing to fear). Other caches put the data directly on the hard disk only.
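The memory-plus-disk combination can be sketched as a bounded in-memory map that writes evicted entries to files instead of discarding them. This is a simplified illustration under several assumptions: `OverflowToDiskCache` is a hypothetical class, keys and values are plain strings, each entry is stored in its own file named after the Base64-encoded key, and files are never cleaned up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical two-tier cache: a bounded in-memory LRU map that spills to disk.
class OverflowToDiskCache {
    private final Path dir;
    private final LinkedHashMap<String, String> memory;

    OverflowToDiskCache(int maxInMemory, Path dir) {
        this.dir = dir;
        this.memory = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                if (size() <= maxInMemory) {
                    return false;
                }
                writeToDisk(eldest.getKey(), eldest.getValue());   // persist before leaving memory
                return true;
            }
        };
    }

    // Convenience factory (an assumption of this sketch): spill into a fresh temp directory.
    static OverflowToDiskCache inTempDir(int maxInMemory) {
        try {
            return new OverflowToDiskCache(maxInMemory, Files.createTempDirectory("cache-sketch"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void put(String key, String value) {
        memory.put(key, value);
    }

    // Look in memory first, then on disk; a disk hit is promoted back into memory.
    String get(String key) {
        String value = memory.get(key);
        if (value != null) {
            return value;
        }
        Path file = fileFor(key);
        if (!Files.exists(file)) {
            return null;
        }
        try {
            value = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        memory.put(key, value);   // may in turn spill another entry to disk
        return value;
    }

    boolean inMemory(String key) {
        return memory.containsKey(key);
    }

    private Path fileFor(String key) {
        byte[] raw = key.getBytes(StandardCharsets.UTF_8);
        return dir.resolve(Base64.getUrlEncoder().withoutPadding().encodeToString(raw));
    }

    private void writeToDisk(String key, String value) {
        try {
            Files.write(fileFor(key), value.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A production framework would also need to handle concurrency, stale files, and disk limits; the sketch only shows the division of labor between the two media.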

(3) Database. At the mention of databases, some readers may wonder: we were just talking about reducing the number of database queries and the computational load on the database, so how can a database now serve as a cache medium? The reason is that there are many kinds of database. Berkeley DB, for example, does not support SQL statements and has no SQL engine; it is just a key/value storage structure, so it is very fast. On a typical contemporary PC, more than 100,000 queries per second is no problem (of course, this depends on the business characteristics; if the data you access is evenly distributed, Ahuaxuan cannot guarantee this speed).

Besides cache media, Ahuaxuan also divides caches into local caches and remote caches, according to how tightly the cache is coupled to the application.
A local cache is a cache component embedded in the application, while a remote cache is a cache component decoupled from the application. Typical local caches are Ehcache and OSCache; among remote caches there is the well-known memcached.

The biggest advantage of a local cache is that the application and the cache live in the same process, so cache requests are very fast and incur no network overhead. A local cache is therefore a good fit for a single application without a cluster, or for a cluster whose cache nodes do not need to notify one another. This is why Ehcache and OSCache are so popular in Java.
However, a local cache has drawbacks. Caching frameworks of this kind (such as Ehcache or OSCache in Java) are local caches, that is, they live with the application, so multiple applications cannot share the cache directly. The problem is more obvious in a clustered application. Some cache components do let cluster nodes notify each other of cache updates, but because this is done by broadcast or by looping over the nodes, the network I/O overhead is very high when the cache is updated frequently, and in severe cases it can affect the normal operation of the application. And if the amount of cached data is large, using a local cache means every application instance holds a cache of that size, which is a clear waste of memory.
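To make the notification cost concrete, here is a toy in-process simulation, not a real clustering mechanism: `LocalCacheCluster` is a hypothetical class in which each "node" is just a map, and every update makes the other nodes drop their copy of the key, counting one notification per peer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of local caches on several nodes that invalidate each other on update.
class LocalCacheCluster {
    private final List<Map<String, String>> nodes = new ArrayList<>();
    private long notifications;

    LocalCacheCluster(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    // Node `origin` updates its local copy, then loops over all peers to invalidate theirs.
    void update(int origin, String key, String value) {
        nodes.get(origin).put(key, value);
        for (int i = 0; i < nodes.size(); i++) {
            if (i != origin) {
                nodes.get(i).remove(key);
                notifications++;   // in a real cluster each of these is a network message
            }
        }
    }

    String read(int node, String key) {
        return nodes.get(node).get(key);
    }

    long notifications() {
        return notifications;
    }
}
```

In this model the number of notifications is updates × (nodes − 1), so ten updates on a four-node cluster already cost thirty messages, which is why frequent updates make this approach expensive.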

So in this situation we often choose a remote cache such as memcached. In a clustered or distributed scenario, every application can share the data in memcached: each application connects directly to memcached over a socket, using the memcached protocol layered on top of TCP/IP, and when one application updates a value in memcached, all applications see the latest value. Although this adds network overhead, the approach is often more widely used than having local caches broadcast or loop over nodes to update one another, and it performs better. Because the data only needs to be stored once, it also improves memory utilization.
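As a rough illustration of what travels over that socket, the sketch below builds and parses messages in memcached's text protocol (`set <key> <flags> <exptime> <bytes>` followed by the data block, and `get <key>` answered by a `VALUE` line, the data, and `END`). `MemcachedText` is a hypothetical helper written for this example; it does not open a connection, and it handles only a single-key `get` reply.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper that only formats and parses text-protocol messages.
final class MemcachedText {
    private MemcachedText() {
    }

    // "set <key> <flags> <exptime> <bytes>\r\n<data>\r\n"; <bytes> is the length of the data in bytes.
    static String setCommand(String key, int flags, int exptimeSeconds, String value) {
        int bytes = value.getBytes(StandardCharsets.UTF_8).length;
        return "set " + key + " " + flags + " " + exptimeSeconds + " " + bytes + "\r\n" + value + "\r\n";
    }

    static String getCommand(String key) {
        return "get " + key + "\r\n";
    }

    // Parses "VALUE <key> <flags> <bytes>\r\n<data>\r\nEND\r\n"; a miss is just "END\r\n".
    static String parseGetReply(byte[] reply) {
        // ISO-8859-1 maps each byte to exactly one char, so string indexes equal byte offsets.
        String text = new String(reply, StandardCharsets.ISO_8859_1);
        if (text.equals("END\r\n")) {
            return null;
        }
        int headerEnd = text.indexOf("\r\n");
        String[] header = headerEnd < 0 ? new String[0] : text.substring(0, headerEnd).split(" ");
        if (header.length != 4 || !header[0].equals("VALUE")) {
            throw new IllegalArgumentException("unexpected reply");
        }
        int length = Integer.parseInt(header[3]);
        int dataStart = headerEnd + 2;
        return new String(Arrays.copyOfRange(reply, dataStart, dataStart + length), StandardCharsets.UTF_8);
    }
}
```

An application would write these commands to a TCP socket connected to the memcached server (port 11211 by default) and read the replies back; in practice an existing client library would normally take care of this.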

As the analysis above shows, both local and remote caches have their place. So Ahuaxuan recommends that, when choosing or using a cache, you determine which cache to use based on the characteristics of the cache and of your business scenario. Only then can you get the most out of the cache.

Ahuaxuan believes that using caches well is a skill every architect must have. A good architect can determine exactly which type of cache to use, and how to use it, based on the type of data and the business scenario. There is no silver bullet in the world of caching: no cache solves every business scenario or data type, and if such a technology ever appeared, architects would be worth a lot less. Hehe.

OSCache
　　
OSCache is a widely used, high-performance caching framework that can serve as a general-purpose caching solution for any Java application.
　　
OSCache has the following features:
　　
Cache any object: you can cache portions of JSP pages or HTTP requests without restriction, and any Java object can be cached.
　　
A comprehensive API: the OSCache API gives you full programmatic control over all OSCache features.
　　
Persistent caching: the cache can optionally be written to the hard disk, so data that is expensive to create stays cached, even across application restarts.
　　
Cluster support: clustered caching of data can be enabled through a single configuration parameter, with no code changes.
　　
Expiration of cached records: you have maximum control over the expiration of cached objects, including pluggable refresh policies (if the default behavior is not what you need).
　　
Official website http://www.opensymphony.com/oscache/
　　
　　 Java Caching System
　　
JCS (Java Caching System) is a distributed caching system for server-side Java applications. It speeds up dynamic web applications by providing a way to manage cached data of various dynamic kinds.
　　
Like other caching systems, JCS is best suited to applications that are read often and written rarely.
　　
Dynamic content and reporting systems benefit the most.
　　
If a web site has a repetitive structure, uses a database that is updated intermittently (rather than continuously), and repeatedly searches for the same results, caching can improve its performance and scalability.
　　
Official website http://jakarta.apache.org/turbine/jcs/
　　
　　 EHCache
　　
EHCache is a pure-Java, in-process cache with the following features: it is fast and simple, it can act as a pluggable cache for Hibernate 2.1, it has minimal dependencies, and it comes with comprehensive documentation and tests.
　　
Official website http://ehcache.sourceforge.net/
　　
　　 JCache
　　
JCache is an open-source project that aims to become an open-source implementation of the JSR-107 specification. The JSR-107 specification has not changed for many years, and this version is still built on the original feature definition.
　　
Official website http://jcache.sourceforge.net/
　　
　　 ShiftOne
　　
ShiftOne Java Object Cache is a Java library that implements a set of strict object caching policies, along with a lightweight framework for configuring how the cache behaves.
　　
Official website http://jocache.sourceforge.net/
　　
　　 SwarmCache
　　
SwarmCache is a simple and efficient distributed cache. It uses IP multicast to communicate with other hosts on the same LAN and is designed specifically for clustered, database-driven web applications. SwarmCache gives the best performance gains to typical applications in which read operations greatly outnumber write operations.
　　
SwarmCache uses JavaGroups to manage the membership and communication of its distributed cache.
　　
Official website http://swarmcache.sourceforge.net
　　
　　 TreeCache/JBossCache
　　
JBossCache is a replicated, transactional cache that lets you cache enterprise application data to improve performance. Cached data is replicated automatically, which makes it easy to cluster JBoss servers. JBossCache can run as an MBean service in the JBoss application server or another Java EE container, and it can also run standalone.
　　
JBossCache consists of two modules: TreeCache and TreeCacheAop.
　　
TreeCache: a tree-structured, replicated, transactional cache.
　　
TreeCacheAop: an "object-oriented" cache that uses AOP to dynamically manage POJOs (Plain Old Java Objects).
　　
Note: AOP stands for Aspect-Oriented Programming and is an extension of OOP.
　　
Official website http://www.jboss.org/products/jbosscache
　　
　　 WhirlyCache
　　
WhirlyCache is a fast, configurable in-memory object cache. It can speed up a web site or application by caching objects that would otherwise have to be built by querying a database or through some other expensive process.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
