Because legacy products at work use Terracotta as a platform for distributed caching and linear scaling, I needed to understand its principles first. Many of its design ideas resemble those of Oracle and memcached, but it also has its own highlights: JVM-level lazy loading, fine-grained replication of changes, and linear scalability. Together these greatly reduce the cost of serializing objects, make better use of the CPU, and let memory be scaled out seamlessly and linearly.
When I started studying Terracotta, I did not rush to set up an environment and run a demo. First I tried to understand why the product had been chosen in the first place: what it is, what it can do, and what advantages it has over similar technologies. Only after that did I verify things with a demo and tune the related products. Let's now lift the veil (this article is compiled from official documentation, the web, and internal company repositories).
What is Terracotta?
Terracotta is a distributed Java clustering technology that hides the complexity of running many distributed JVMs. It lets Java objects be shared and synchronized transparently across a cluster of JVMs, and those objects can be persisted. In a sense it plays a role similar to ZooKeeper in the Hadoop ecosystem and can be used as an alternative to ZooKeeper.
Terracotta uses a hub-and-spoke architecture. At startup, each JVM running the distributed application connects to a central Terracotta server. The Terracotta server stores DSO object data and coordinates concurrent threads across the JVMs. The Terracotta client library lives inside each application JVM and has several jobs. It enhances class bytecode at class-loading time, handles lock and unlock requests inside synchronized blocks, handles wait() and notify() calls between application JVMs, and manages the runtime connection to the Terracotta server.
What Terracotta can do
Terracotta is a JVM-level open-source clustering framework. One of its most important features is DSO (Distributed Shared Objects). With DSO, frequently accessed and important data can be kept on the Terracotta (TC) server and shared among the JVMs in the cluster, which reduces the load on the database. Terracotta also provides the following:
- HTTP session replication.
- Spring Security integration.
- Hibernate integration.
- Distributed caching. Terracotta acquired the open-source Java cache Ehcache and the Java job scheduler Quartz and integrated both deeply.
- POJO clustering. For example, once Terracotta is introduced into a Spring application, Spring beans can be clustered. Because the beans are distributed, the web system can be scaled out seamlessly and gains a natural failover mechanism.
Terracotta coordinates distributed applications across the cluster's JVMs by code injection, so the application code does not need to be modified. The TC server itself can also be configured as a cluster. If one TC server goes down, its work is transferred automatically and seamlessly to the other TC servers in the cluster.
Why choose Terracotta
Terracotta provides high availability (HA) and high performance (HP) without changing existing application code. Compared with other clustering approaches, it has the following advantages:
1, Less session replication overhead. Most web application servers share session data through Java serialization and data broadcasting, and any node can modify any session data. This consumes a lot of memory, CPU, and network bandwidth, and the cost grows rapidly as application server nodes are added. Once a cluster has more than about four nodes, session replication overhead often causes overall throughput to start declining. Because ordinary session replication performs and scales poorly, many web developers save and share session data through the database instead. That increases database load and creates a new performance bottleneck. Terracotta shares session data across the cluster without broadcasting and without Java serialization. Only the data of the modified fields is sent to the server and to the nodes that use it, which greatly reduces CPU and memory consumption.
2, The server extends memory over the network, so a client node with limited memory can access data structures much larger than its own memory capacity without worrying about an out-of-memory error.
3, Data is stored on the server side, so a client JVM crash does not cause data loss.
4, Incremental data transfer and intelligent data push minimize network load and allow client JVMs to scale horizontally.
5, Server sharding scales out both server-side data storage and data throughput.
6, The server persists shared data, and a server cluster provides fault tolerance.
7, There is no new API to learn, which greatly reduces development cost.
8, Broad application server support and monitoring. Supported servers include WebLogic, WebSphere, Tomcat, JBoss, Jetty, and Geronimo. Terracotta also offers automatic session data migration and cluster-wide data visibility. Its management and monitoring interface makes it easy to monitor, debug, and tune the cluster's shared data, performance data, and hardware and software metrics. Server monitoring information is exposed through JMX.
9, The enterprise edition of the Terracotta server also provides data sharding, so cluster throughput increases linearly as servers are added.
With the above, it should be clear what Terracotta is and what it is used for. Next, let's walk through its design principles. This should sharpen our own design skills and show how to make the most of Terracotta's strengths.
Injection and bytecode
How does an ordinary application acquire this distributed clustering behavior? When the JVM loads application classes, the clustering behavior specified in the configuration file is transparently injected into them through bytecode enhancement. AOP frameworks such as AspectJ and AspectWerkz use the same injection technique. When a class is loaded, the Terracotta library parses and inspects its bytecode. It then modifies the bytecode according to the configuration before handing it to the JVM to be defined as a class.
The putfield and getfield bytecode instructions are overridden so that modifications to objects can be tracked. The putfield instruction is replaced so that changes to each field of a distributed object are recorded. The getfield instruction is overridden so that field data can be fetched from the server when needed. This happens only when the object referenced by the field has not yet been retrieved from the server and has not yet been instantiated in the local JVM heap. In other words, getfield works in lazy-initialization mode: if the field is still empty, its data is loaded; otherwise nothing is loaded.
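The following hand-written sketch shows what the putfield/getfield rewriting amounts to. It is not Terracotta's generated code. `FakeServer`, `Order`, `commit()` and the string field names are hypothetical, and a plain map stands in for the Terracotta server. The getter faults the field in lazily only while it is still empty, and the setter records which field changed so that only that field is shipped.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the Terracotta server: object ID -> (field name -> value).
class FakeServer {
    final Map<Long, Map<String, Object>> store = new HashMap<>();
    int fetches = 0; // number of lazy loads, kept only for illustration

    Object fetch(long objectId, String field) {
        fetches++;
        Map<String, Object> fields = store.get(objectId);
        return fields == null ? null : fields.get(field);
    }

    void applyChanges(long objectId, Map<String, ?> changes) {
        store.computeIfAbsent(objectId, id -> new HashMap<>()).putAll(changes);
    }
}

// Conceptual equivalent of a class after putfield/getfield enhancement (hand-written, not generated).
class Order {
    private final long objectId;
    private final FakeServer server;
    private String customer; // null means "not yet faulted in from the server"
    private final Map<String, Object> changedFields = new LinkedHashMap<>();

    Order(long objectId, FakeServer server) {
        this.objectId = objectId;
        this.server = server;
    }

    // Stands in for an instrumented getfield: load lazily, only if the local field is still empty.
    String getCustomer() {
        if (customer == null) {
            customer = (String) server.fetch(objectId, "customer");
        }
        return customer;
    }

    // Stands in for an instrumented putfield: update the local heap and remember which field changed.
    void setCustomer(String value) {
        customer = value;
        changedFields.put("customer", value);
    }

    // At transaction end (assumed here to be an explicit call), ship only the changed fields.
    void commit() {
        server.applyChanges(objectId, changedFields);
        changedFields.clear();
    }
}
```

Calling `getCustomer()` twice triggers only one fetch, because the second call finds the field already populated.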
To coordinate threads, Terracotta also overrides the monitorenter and monitorexit instructions. It overrides the invokevirtual instruction as well, because that is the instruction used to call Object.wait() and Object.notify(). A thread uses monitorenter to request an object's monitor. The thread blocks on this instruction until it acquires the object's lock, and then holds that lock exclusively until the matching monitorexit for the object executes. If the requested monitor belongs to a clustered (DSO) object, Terracotta does more than take the local lock in that JVM. It also ensures that the thread holds an exclusive lock on that DSO object across the entire JVM cluster. Until both locks are granted, the thread stays blocked. When the thread releases its local lock on the DSO object, it also releases the corresponding cluster-wide lock.
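The two-level locking can be modeled inside a single process as a minimal sketch. `ClusterLockManager` and `InstrumentedMonitor` are hypothetical names. A per-object-ID `ReentrantLock` stands in for the lock the Terracotta server would grant across JVMs, so the sketch only shows the ordering: local monitor, then cluster lock, with both released on exit.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the server-side lock manager: one exclusive lock per clustered object ID.
class ClusterLockManager {
    private final ConcurrentHashMap<Long, ReentrantLock> locks = new ConcurrentHashMap<>();

    void acquire(long objectId) {
        // Blocks until the "cluster" grants the lock for this object.
        locks.computeIfAbsent(objectId, id -> new ReentrantLock()).lock();
    }

    void release(long objectId) {
        locks.get(objectId).unlock();
    }

    boolean isHeld(long objectId) {
        ReentrantLock lock = locks.get(objectId);
        return lock != null && lock.isLocked();
    }
}

class InstrumentedMonitor {
    // Roughly what an enhanced `synchronized (obj) { body }` could expand to for a clustered object.
    static void runLocked(Object obj, long objectId, ClusterLockManager cluster, Runnable body) {
        synchronized (obj) {                 // monitorenter: local JVM lock
            cluster.acquire(objectId);       // plus the cluster-wide lock on the DSO
            try {
                body.run();
            } finally {
                cluster.release(objectId);   // cluster lock released just before the local one
            }
        }                                    // monitorexit
    }
}
```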
In Terracotta applications, synchronized methods and synchronized blocks are usually configured as "autolocking". The test example uses synchronized in its code, and its configuration file declares autolocking for it. This means the monitorenter and monitorexit instructions are handled through bytecode enhancement. Some developers may prefer not to use an explicit synchronized keyword. In that case, a method can be declared as a locked method in the Terracotta configuration file, which gives the application cluster-wide synchronization without synchronized appearing in the code.
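The point of autolocking is that the Java source stays ordinary. The class below is plain, hypothetical application code with nothing Terracotta-specific in it. Under the assumptions in the text, the Terracotta configuration would mark these synchronized methods as autolocked. The configuration entry is not shown because its exact element names are not given in this article. The counter would also need to be reachable from a configured root for the lock to become cluster-wide.

```java
// Ordinary Java: the source does not mention Terracotta at all.
class SharedCounter {
    private int count; // would be clustered if this object were reachable from a configured root

    // A synchronized method like this is what an autolock configuration would target.
    public synchronized void increment() {
        count++;
    }

    public synchronized int get() {
        return count;
    }

    // Demo helper: several threads increment one counter; returns the final value.
    static int demo(int threads, int incrementsPerThread) {
        SharedCounter counter = new SharedCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.increment();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }
}
```

In a single JVM, `SharedCounter.demo(4, 10000)` returns 40000 because the methods are synchronized. The idea described above is that the same unmodified code would get a cluster-wide lock instead of only a local one.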
The bytecode instructions that call an object's wait() and notify() methods are enhanced as well. When wait() is called on a shared object, the Terracotta server adds the calling thread to a queue. That queue records every thread in the JVM cluster that is waiting on the object's lock. When notify() is called on the object in any JVM, the server considers every thread in the cluster that is blocked on that object, picks one of them, and wakes it up. When notifyAll() is called, the server wakes every thread, in every JVM, that is waiting on the DSO.
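The server-side bookkeeping described above fits in a small sketch. `ClusterWaitQueues` and the `"jvmId/threadName"` waiter labels are hypothetical. Picking waiters in FIFO order is an assumption, since the text only says the server "chooses a thread". The sketch records waiters rather than actually parking threads.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical server-side record of which threads, in which JVMs, wait on each clustered object.
class ClusterWaitQueues {
    private final Map<Long, Deque<String>> waiters = new HashMap<>();

    // wait(): enqueue the calling thread for this object.
    synchronized void await(long objectId, String waiter) {
        waiters.computeIfAbsent(objectId, id -> new ArrayDeque<>()).addLast(waiter);
    }

    // notify(): choose one waiter from anywhere in the cluster (FIFO here, by assumption).
    synchronized String notifyOne(long objectId) {
        Deque<String> queue = waiters.get(objectId);
        return queue == null ? null : queue.pollFirst();
    }

    // notifyAll(): wake every waiter in every JVM for this object.
    synchronized List<String> notifyAllWaiters(long objectId) {
        Deque<String> queue = waiters.remove(objectId);
        return queue == null ? new ArrayList<>() : new ArrayList<>(queue);
    }
}
```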
Roots and the clustered object graph
Clustered objects start from the root of a shared object graph. Roots are declared as one or more fields in the Terracotta configuration file. When a root is first instantiated, the root object and every object reachable from it become clustered objects. The data in all of their fields is sent to the server and stored there.
Once a root object has been created in any JVM, the root field in other JVMs no longer gets a locally allocated heap object. It gets the server's clustered object instead. This typically happens when a second application instance creates the root. The second instance's code still calls the constructor for the root field, but because the first instance already created the root, that constructor call is ignored. Instead, the Terracotta client library fetches the root object from the server, instantiates it in the local heap, and assigns the reference to the root field. The Terracotta library does all of this transparently. This mechanism is the most important and valuable part of how Terracotta works for our applications.
Once an object has become a clustered object, it is assigned an object ID that is unique across the cluster, and it stays clustered for the rest of its lifetime. When a clustered object is no longer reachable from any root and no JVM in the cluster holds an instance of it, the Terracotta server's GC reclaims it.
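The "first creator wins" behavior of roots can be modeled with `ConcurrentHashMap.computeIfAbsent`. This is a minimal sketch, and `RootRegistry`, `lookupOrCreate` and the root name are hypothetical. The supplier plays the role of the constructor in the application code, and it runs only for the first caller. In this single-process model both "JVMs" receive the very same reference. In Terracotta, each JVM would instead hold its own heap instance carrying the same cluster object ID.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical model of the server's root table: root name -> clustered root object.
class RootRegistry {
    private static final class ClusteredRoot {
        final long objectId; // cluster-wide unique ID, assigned once
        final Object value;

        ClusteredRoot(long objectId, Object value) {
            this.objectId = objectId;
            this.value = value;
        }
    }

    private final ConcurrentHashMap<String, ClusteredRoot> roots = new ConcurrentHashMap<>();
    private final AtomicLong nextObjectId = new AtomicLong(1);

    // Called where application code would write `rootField = new Something()`.
    // Only the first caller's constructor runs; later callers receive the existing root.
    @SuppressWarnings("unchecked")
    <T> T lookupOrCreate(String rootName, Supplier<T> constructor) {
        ClusteredRoot root = roots.computeIfAbsent(rootName,
                name -> new ClusteredRoot(nextObjectId.getAndIncrement(), constructor.get()));
        return (T) root.value;
    }

    long objectIdOf(String rootName) {
        return roots.get(rootName).objectId;
    }
}
```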
Fine-grained change replication
A transaction that changes DSO objects contains only the data of the fields that changed. Transactions are sent to the Terracotta server and from there to other cluster JVMs to keep the cluster consistent. The server does not broadcast a transaction to all other JVMs. It sends it only to the JVMs whose heaps contain instances of the objects the transaction touches, and each of those JVMs receives only the part of the transaction it needs.
For example, if a thread changes field p of object A and field q of object B, only the data for A.p and B.q goes into the transaction sent to the server. The server determines which JVMs hold instances of A and B. If a JVM's local heap holds an instance of A but not of B, that JVM receives the A.p data but not the B.q data.
In Terracotta, changes to several related fields of a DSO must be atomic, so they must be made inside a synchronized block. By the definition above, those field changes then form one transaction that is sent to the server. A change to a single field of a single DSO is atomic by itself and needs no explicit synchronization; that single modification is itself a transaction.
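The routing rule in the A.p / B.q example can be sketched as follows. `FieldChange` and `ChangeRouter` are hypothetical names. The router assumes the server knows which object IDs are resident on each JVM's heap, and it skips the JVM that originated the transaction. Each JVM receives only the changes for objects it actually holds.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// One changed field of one clustered object (hypothetical representation).
final class FieldChange {
    final String objectId; // e.g. "A"
    final String field;    // e.g. "p"
    final Object newValue;

    FieldChange(String objectId, String field, Object newValue) {
        this.objectId = objectId;
        this.field = field;
        this.newValue = newValue;
    }

    @Override
    public String toString() {
        return objectId + "." + field + "=" + newValue;
    }
}

// Hypothetical server-side router: forwards each change only to JVMs holding that object.
class ChangeRouter {
    private final Map<String, Set<String>> residentObjects = new HashMap<>(); // jvmId -> object IDs on its heap

    void registerResident(String jvmId, String objectId) {
        residentObjects.computeIfAbsent(jvmId, j -> new HashSet<>()).add(objectId);
    }

    // Returns, per target JVM, the subset of the transaction it must receive.
    Map<String, List<FieldChange>> route(String originJvm, List<FieldChange> transaction) {
        Map<String, List<FieldChange>> outgoing = new HashMap<>();
        for (Map.Entry<String, Set<String>> jvm : residentObjects.entrySet()) {
            if (jvm.getKey().equals(originJvm)) {
                continue; // the originating JVM already has the change
            }
            for (FieldChange change : transaction) {
                if (jvm.getValue().contains(change.objectId)) {
                    outgoing.computeIfAbsent(jvm.getKey(), j -> new ArrayList<>()).add(change);
                }
            }
        }
        return outgoing;
    }
}
```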
Object identity and serialization
Terracotta tracks object changes at the level of individual fields, and a transaction contains only fragments of DSO objects rather than whole object graphs. For this reason Terracotta does not use Java serialization to propagate changes. For example, when we change the price field of a Product object, only three things need to be sent: the cluster ID of the changed object, the ID of the changed field, and the bytes of the new price value. The rest of the Product object is ignored. With Java serialization, every field of the Product object would have to be serialized, including fields that reference other objects. A change to a single double field would end up serializing the entire object graph. Terracotta's approach is therefore more efficient, because it sends only the changed fields.
Using object fields as the basic unit of change has a second benefit beyond efficiency: it preserves object identity. If Java serialization were used to move objects between cluster members, each client JVM would have to deserialize the changed object and replace its existing instance with the new copy. This is why many other clustering and distributed technologies require a put/get API. A clustered object must be obtained from the cluster with a get call, and after it changes it must be put back into the cluster with a put call. Terracotta has no such restriction. A clustered object lives in the JVM heap like an ordinary object, and when it is modified locally, the modifications are made directly on the JVM's heap.
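The size argument can be checked with plain JDK serialization. The sketch below serializes a small hypothetical `Product` graph with `ObjectOutputStream` and compares the result with a hand-rolled field delta. The delta holds an 8-byte object ID, a 4-byte field ID, and the new 8-byte double value. This 20-byte layout is an illustrative assumption, not Terracotta's actual wire format.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class Vendor implements Serializable {
    private static final long serialVersionUID = 1L;
    String name;
    String address;
}

class Product implements Serializable {
    private static final long serialVersionUID = 1L;
    String name;
    double price;
    Vendor vendor;                          // referenced object: serialized along with the product
    List<String> tags = new ArrayList<>();  // also part of the serialized graph
}

class ChangeEncoding {
    // Java serialization writes the whole graph reachable from the product.
    static int serializedSize(Product product) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(product);
            out.flush();
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical field-level delta: object ID + field ID + new double value.
    static byte[] priceDelta(long objectId, int fieldId, double newPrice) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(objectId);   // 8 bytes
            out.writeInt(fieldId);     // 4 bytes
            out.writeDouble(newPrice); // 8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

The delta is always 20 bytes in this layout, however large the product's object graph grows. The serialized form grows with every referenced object.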
If the object is modified through a reference held in another JVM, the local JVM receives the resulting transaction and applies it directly to the instance already on its local heap. This means that, for a given DSO, a JVM holds at most one instance of it in its heap at any moment.
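Applying a change in place, instead of deserializing a replacement copy, is what keeps identity intact. In this minimal sketch, `Account` and `LocalHeap` are hypothetical names. The sketch assumes the client keeps a table from cluster object ID to the single local instance and writes incoming field values into that instance. Any reference the application already holds therefore sees the update without a get/put round trip.

```java
import java.util.HashMap;
import java.util.Map;

class Account {
    double balance;
}

// Hypothetical client-side table: cluster object ID -> the one local instance of that object.
class LocalHeap {
    private final Map<Long, Account> byObjectId = new HashMap<>();

    void register(long objectId, Account instance) {
        byObjectId.put(objectId, instance);
    }

    // Apply an incoming field change to the existing instance; no copy is created or swapped in.
    void applyBalanceChange(long objectId, double newBalance) {
        Account existing = byObjectId.get(objectId);
        if (existing != null) { // JVMs without the object simply have nothing to update
            existing.balance = newBalance;
        }
    }
}
```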
With Terracotta, you do not have to think about each JVM holding a copy of an object, or about putting that copy back into the cluster after a local modification. There is no notion of object copies. A clustered object is an ordinary object in the clustered heap and behaves like any other object, and any modification to it is visible to everything that holds a reference to it. Because object identity is preserved, a clustered multi-JVM application behaves no differently from an ordinary single-JVM application. Preserving object identity across the cluster is simple and powerful, and it lets distribution concerns be stripped out of the application's design and implementation. Distributed behavior is pushed down into the Terracotta server, which is part of the infrastructure. Java's GC made memory-management code disappear from application code, and in the same way Terracotta makes distributed-computing code disappear from application code.
Virtual heap / network memory
Besides sharing and synchronizing objects between JVMs, Terracotta also lets very large object graphs work with a limited local heap. As a shared object graph grows, it may no longer fit in a single JVM's heap. Terracotta therefore maintains a configurable window onto the distributed object graph. When the distributed objects use more heap than a certain threshold, some of them are flushed out according to an eviction policy. When those flushed-out fragments are needed again, they are fetched from the Terracotta server and placed back in the JVM heap. You can think of the Terracotta server as a virtually unlimited heap or network memory. Because it behaves like a huge, extensible network memory, you can load an entire object graph into the distributed object graph without worrying about its size. The data needs to be loaded only once, which greatly reduces the startup time of application instances.
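The flush-out and fault-in cycle can be sketched with a bounded local cache in front of an authoritative store. In this minimal sketch, `VirtualHeap` is a hypothetical name and a `HashMap` stands in for the Terracotta server. Least-recently-used eviction is an assumed policy, implemented with `LinkedHashMap` access order and `removeEldestEntry`. Writes go through to the server, so evicting a local copy never loses data.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical "virtual heap": at most `capacity` objects stay local; the server keeps everything.
class VirtualHeap<K, V> {
    private final Map<K, V> server = new HashMap<>(); // stand-in for the Terracotta server
    private final LinkedHashMap<K, V> local;
    private int faults = 0;

    VirtualHeap(int capacity) {
        // accessOrder = true gives least-recently-used ordering (the assumed eviction policy).
        this.local = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity; // flush out the local copy; the server still has it
            }
        };
    }

    void put(K key, V value) {
        server.put(key, value); // authoritative copy on the server
        local.put(key, value);  // may evict the least-recently-used local copy
    }

    V get(K key) {
        V value = local.get(key);
        if (value == null && server.containsKey(key)) {
            faults++;
            value = server.get(key);
            local.put(key, value); // fault the object back into the local heap
        }
        return value;
    }

    int localSize() {
        return local.size();
    }

    int faults() {
        return faults;
    }
}
```

With a capacity of 2, putting "a", "b" and "c" flushes "a" locally. A later `get("a")` faults it back in from the server and evicts "b" instead.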
Source: http://yale.iteye.com/blog/1541612
Terracotta design principle analysis (part of the content is from the official description)