Tomcat cluster source code level analysis
With the rapid development of the Internet, externally facing systems keep multiplying and traffic keeps growing. In the past, a single web container could handle the entire request lifecycle: accept, process, and respond. Today, to build more powerful systems that serve more users, the logic once handled inside the web container has been pushed out to other middleware through continuous business and architecture decoupling, such as cache middleware, message-queue middleware, and data-storage middleware. The web container's responsibilities may be shrinking, but it remains an essential component: it receives user requests, calls the various services, and assembles the final response. Arguably the most popular web container today is Tomcat, written in Java. Because production deployments must consider load balancing and high availability, Tomcat generally runs in cluster mode. This article examines how Tomcat's cluster features are implemented and how to choose between them for production deployment.
If a web application does not use sessions, building a cluster is simple: the nodes are stateless and need not communicate with each other, so you only need to distribute requests evenly across them. In practice, however, almost all web applications rely on the session mechanism, so the real difficulty in clustering a web application is synchronizing session data. You can sidestep complicated synchronization with certain strategies, for example by storing sessions in a distributed cache or a database for centralized management, which avoids any communication between Tomcat nodes. But that approach has its own costs: it introduces an additional database or cache service whose high availability must also be guaranteed, increasing machine and maintenance overhead. This article assumes sessions are managed by the Tomcat cluster itself rather than by a centralized session store.
Cluster incremental session manager -- DeltaManager
In this mode the Tomcat cluster nodes synchronize session data among themselves, so that whichever node a request lands on, the corresponding session can be found. For example, when a client's first request creates a session, Tomcat synchronizes the session's incremental information to the other nodes, and all session operations performed during a request are synchronized once that request completes. The next request can then be routed to any node in the cluster and still find up-to-date session information.
This is Tomcat's default cluster session manager -- DeltaManager. Its job is to synchronize and maintain session state between cluster nodes: DeltaManager replicates one node's sessions to every other member of the cluster. It therefore belongs to the all-node (full) replication mode, in which any state change on one node must be propagated to all remaining nodes; by contrast, a non-full scheme synchronizes state to only one or a few of the nodes. A typical replication cycle for all nodes looks like this: the client sends a request, which some load-balancer distribution policy assigns to node node1. If no session object exists yet, the web container creates one, performs its business logic, and then, crucially, synchronizes the session object to the other cluster nodes before responding to the client. When the client's second request is distributed to, say, node3, the session created on node1 is already present there, so no session state is lost during the request's processing. Likewise, if a session object is deleted, the other nodes must be told to delete their copies, and if session attributes are modified, those modifications must also be propagated to the other nodes' sessions.
DeltaManager is essentially a session-synchronization communication scheme. Beyond the full-node replication described above, DeltaManager replicates only session increments: the unit of an increment is one full request cycle, and the cluster synchronizes all session modifications made during a request before the response is sent. Let's look at Tomcat's concrete implementation.
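The delta idea can be illustrated with a minimal sketch: record every attribute operation performed on a session during one request, then replay the recorded operations on a replica. This is the concept behind Tomcat's DeltaRequest, but the class and method names below (`SessionDelta`, `recordSet`, `applyTo`) are illustrative, not Tomcat's actual API.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Records attribute operations during one request and replays them on a
// replica -- the idea behind Tomcat's DeltaRequest, with made-up names.
public class SessionDelta implements Serializable {
    private static final int SET = 0, REMOVE = 1;

    private static class Op implements Serializable {
        final int action; final String name; final Object value;
        Op(int action, String name, Object value) {
            this.action = action; this.name = name; this.value = value;
        }
    }

    private final List<Op> ops = new ArrayList<>();

    public void recordSet(String name, Object value) { ops.add(new Op(SET, name, value)); }
    public void recordRemove(String name)            { ops.add(new Op(REMOVE, name, null)); }
    public int size()                                { return ops.size(); }

    // Replay the recorded operations on another node's copy of the session.
    public void applyTo(Map<String, Object> sessionAttributes) {
        for (Op op : ops) {
            if (op.action == SET) sessionAttributes.put(op.name, op.value);
            else sessionAttributes.remove(op.name);
        }
    }

    public static void main(String[] args) {
        SessionDelta delta = new SessionDelta();
        delta.recordSet("user", "alice");
        delta.recordRemove("tmp");

        Map<String, Object> replica = new HashMap<>();
        replica.put("tmp", "stale");
        delta.applyTo(replica);   // only the delta crosses the network

        System.out.println(replica.get("user") + " " + replica.containsKey("tmp"));
        // prints: alice false
    }
}
```

Note that only the list of operations needs to be serialized and shipped, which is why incremental replication is so much cheaper than resending the whole session after every request.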
To distinguish different actions, various events must first be defined: session creation, session access, session invalidation, get-all-sessions, session delta, session ID change, and so on; in total the Tomcat cluster defines nine events. The nodes communicate in terms of these events, and the receiver performs a different operation for each one. For example, after a session is created on node1, the EVT_SESSION_CREATED event is sent to the other three nodes, each of which creates a session locally on receipt. A session has two essential attributes, the session ID and the creation time, and node1 must send both along with EVT_SESSION_CREATED; once the local sessions are created, session creation has been synchronized, and the session can be found by its ID on any node in the cluster. Similarly, for session access, node1 sends the EVT_SESSION_ACCESSED event and the session ID to the other nodes, which locate the corresponding session and update its last-accessed time, preventing an active session from being cleaned up as expired. Analogous operations exist for session invalidation events (destroying a session across the cluster), session ID change events (changing a session's ID across the cluster), and so on.
Tomcat uses the SessionMessageImpl class to define the cluster communication events and their operations; all cluster communication is expressed in terms of these events. SessionMessageImpl defines the following nine: {EVT_SESSION_CREATED, EVT_SESSION_EXPIRED, EVT_SESSION_ACCESSED, EVT_GET_ALL_SESSIONS, EVT_SESSION_DELTA, EVT_ALL_SESSION_DATA, EVT_ALL_SESSION_TRANSFERCOMPLETE, EVT_CHANGE_SESSION_ID, EVT_ALL_SESSION_NOCONTEXTMANAGER}. In addition, it implements the serialization interface (for convenient serialization), the cluster message interface (for cluster operations), and the session message interface (event definitions and session operations).
The cluster incremental session manager DeltaManager manages DeltaSession objects through SessionMessageImpl messages; that is, it reacts differently according to the event carried by each SessionMessageImpl. DeltaManager has a messageDataReceived(ClusterMessage cmsg) method, called when the node receives a message from another node; the ClusterMessage argument is cast to SessionMessage, and processing then branches on the nine events SessionMessage defines. The event worth special attention is EVT_SESSION_DELTA, the incremental session synchronization event. All operations on one session's attributes during a complete request are abstracted into a DeltaRequest object, which is serialized and placed into the SessionMessage. The EVT_SESSION_DELTA handling logic is therefore to extract and deserialize the DeltaRequest from the SessionMessage and replay all the session operations it contains against the local session, completing incremental synchronization.
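The receiving side can be pictured as a dispatcher that switches on the event type, roughly as messageDataReceived does. In the sketch below the event names follow Tomcat's SessionMessage constants, but the numeric values, the class name, and the handler bodies are simplified assumptions rather than Tomcat's real code.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative event dispatcher in the spirit of
// DeltaManager.messageDataReceived: branch on the event type and update
// the local session store accordingly. Not Tomcat's actual code.
public class SessionMessageDispatcher {
    public static final int EVT_SESSION_CREATED  = 1;
    public static final int EVT_SESSION_EXPIRED  = 2;
    public static final int EVT_SESSION_ACCESSED = 3;
    public static final int EVT_SESSION_DELTA    = 13;

    // sessionId -> lastAccessedTime, standing in for the local session store
    final Map<String, Long> sessions = new HashMap<>();

    public void messageReceived(int eventType, String sessionId, long now) {
        switch (eventType) {
            case EVT_SESSION_CREATED:
                sessions.put(sessionId, now);                        // create the replica
                break;
            case EVT_SESSION_EXPIRED:
                sessions.remove(sessionId);                          // destroy the replica
                break;
            case EVT_SESSION_ACCESSED:
                sessions.computeIfPresent(sessionId, (k, v) -> now); // keep the replica alive
                break;
            case EVT_SESSION_DELTA:
                // the real handler deserializes a DeltaRequest and replays it
                // against the local session; omitted in this sketch
                break;
            default:
                break; // remaining events omitted
        }
    }

    public static void main(String[] args) {
        SessionMessageDispatcher node = new SessionMessageDispatcher();
        node.messageReceived(EVT_SESSION_CREATED, "s1", 100L);
        node.messageReceived(EVT_SESSION_ACCESSED, "s1", 200L);
        System.out.println(node.sessions.get("s1")); // prints: 200
    }
}
```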
In short, DeltaManager is the manager of DeltaSession objects. By providing incremental rather than full synchronization, it greatly improves synchronization efficiency.
Cluster backup session manager -- BackupManager
With full-node replication, network traffic grows quadratically as nodes are added, and it is precisely this that prevents large clusters from being built. To support more nodes, the growth of replication traffic must be tamed first. Tomcat therefore offers another session management mode in which each session has exactly one backup, so session-backup traffic grows only linearly with the number of nodes, greatly reducing network traffic and processing work and making large clusters feasible.
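The scaling difference is simple arithmetic, sketched below: if every one of n nodes performs one session write, full replication sends each write to the other n - 1 nodes, so cluster-wide traffic grows as n * (n - 1), i.e. O(n^2), while the single-backup scheme sends each write to exactly one node, i.e. O(n).

```java
// Back-of-envelope message counts when every node performs one session
// write: full replication vs. a single backup per session.
public class ReplicationTraffic {
    static long fullReplicationMessages(int n) { return (long) n * (n - 1); }
    static long singleBackupMessages(int n)    { return n; }

    public static void main(String[] args) {
        for (int n : new int[] {4, 8, 16}) {
            System.out.println("n=" + n
                    + " full=" + fullReplicationMessages(n)
                    + " backup=" + singleBackupMessages(n));
        }
        // n=4 full=12 backup=4
        // n=8 full=56 backup=8
        // n=16 full=240 backup=16
    }
}
```

At 16 nodes the full-replication scheme already sends 15 times as many messages, which is why the backup mode is the one that scales.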
Let's look at how this mode works. A cluster usually serves requests behind a load balancer, with all nodes hidden in the backend forming a single logical whole. The mode discussed earlier does not require the load balancer's cooperation, which is why the load balancer was omitted from that figure. The most common setup fronts all the nodes with Apache, which can parse a session ID such as "326257DA6DB76F8D2E38F2C4540D1DEA.tomcat1" and route the request to the cluster node named tomcat1 (this is called session stickiness, implemented by Apache's mod_jk module). Each session has one original and one backup, and the backup never lives on the same node as the original. For example, when a client request is distributed by the load balancer to node tomcat1, a session ID with the ".tomcat1" suffix is generated; tomcat1 then selects, by some policy, the node that will back up this session, and sends the {session id, backup ip} pair to tomcat2, tomcat3, and tomcat4, as shown by the dotted lines. Every node thus keeps a list of session IDs and backup addresses; in other words, each node knows the backup address of every session.
With that done, the session content itself is backed up to the chosen backup node. Suppose the backup address of tomcat1's sessions s1 and s2 is tomcat2: tomcat1 backs those session objects up to tomcat2; similarly, tomcat2 backs up s3 to tomcat4, and tomcat4 backs up s4 and s5 to tomcat3. Every session in the cluster now has one backup. In normal operation, session stickiness keeps the client going back to tomcat1, ensuring the session is always found. When tomcat1 fails, a failover mechanism kicks in: Apache notices that tomcat1 has dropped out of the backend cluster and distributes subsequent requests randomly to any other node, which leads to two scenarios:
① The request lands on the backup node tomcat2, which still holds the s1 session. tomcat2 then marks its copy of s1 as the original and selects a new backup address for it, so that s1 once again has a backup.
② The request lands on a non-backup node, say tomcat3, which cannot find s1 locally, so it asks all nodes in the cluster, "who has the backup address of session s1?". Only tomcat2 holds s1's backup address information, so it replies, telling tomcat3 that s1's backup lives on tomcat2. With this information tomcat3 retrieves the s1 session, creates a local copy marked as the original, and the copy on tomcat2 remains the backup. Either way the s1 session is found and the request is processed normally.
Next, we will analyze Tomcat's detailed implementation of this mechanism. To support efficient concurrent access, the whole session collection is stored in a ConcurrentHashMap<String, MapEntry>, where the String key is the session ID and MapEntry encapsulates the session object plus its source node, backup nodes, and so on (although the backup nodes form an array, in practice only one backup node is set). The node on which a session object is created is its source node, and the backup node is some other node in the cluster, so a MapEntry can be viewed as a session object annotated with source-node and backup-node information. The session manager encapsulates operations on this session collection, and from a design standpoint, changing the behavior of those operations only requires extending ConcurrentHashMap and overriding methods such as put, get, and remove. Accordingly, Tomcat's BackupManager packages all cross-node operations on the session collection into LazyReplicatedMap, a subclass of ConcurrentHashMap. Implementing cross-node operations involves a lot of machinery: maintaining the member list, selecting backup nodes, the communication protocol, serialization and deserialization, and complex I/O. Once LazyReplicatedMap's working principle is clear, so is BackupManager's.
Each node maintains a list of cluster members, used to route session backups. The list is maintained by broadcasting node information at startup and by heartbeats: when n1 starts, it broadcasts its own information to the other nodes, which add n1 to their lists, while n1 adds n2, n3, and n4 to its own; thereafter each node sends heartbeats to the others at a fixed interval, and if, say, n2 stops responding, n1 removes n2 from its list. BackupManager selects backup nodes with the classic round-robin algorithm, an even-distribution algorithm that picks nodes in turn: in a cluster of node1, node2, and node3, node1 backs up session1 to node2 and session2 to node3. The member list itself is stored in a HashMap<Member, Long>, where Member is the node abstraction holding node attributes and the Long records the node's most recent known alive time; during heartbeating, a node's liveness is judged from this timestamp and a timeout threshold.
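The two bookkeeping tasks just described, round-robin backup selection over the live member list and heartbeat-based eviction of silent members, can be sketched as follows. The class and method names (`MemberRegistry`, `selectBackup`, `evictDead`) are illustrative, not Tomcat's API, and members are represented as plain strings instead of Member objects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a node's member list: round-robin backup selection plus
// heartbeat timestamps used to evict members that have gone silent.
public class MemberRegistry {
    private final String self;                         // this node's name
    private final List<String> members = new ArrayList<>(); // the other nodes
    private final Map<String, Long> lastSeen = new HashMap<>();
    private int next = 0;                              // round-robin cursor

    public MemberRegistry(String self) { this.self = self; }

    public void memberJoined(String node, long now) {
        if (!node.equals(self) && !members.contains(node)) members.add(node);
        lastSeen.put(node, now);
    }

    public void heartbeat(String node, long now) { lastSeen.put(node, now); }

    // Drop members whose last heartbeat is older than timeoutMs.
    public void evictDead(long now, long timeoutMs) {
        members.removeIf(m -> now - lastSeen.getOrDefault(m, 0L) > timeoutMs);
    }

    // Pick the next backup node in turn, as BackupManager's round robin does.
    public String selectBackup() {
        if (members.isEmpty()) return null;
        return members.get(next++ % members.size());
    }

    public int memberCount() { return members.size(); }
}
```

A usage sequence: after `memberJoined("n2", …)`, `memberJoined("n3", …)`, `memberJoined("n4", …)`, successive `selectBackup()` calls return n2, n3, n4, n2, …; if n4 never heartbeats, `evictDead` removes it and the rotation continues over n2 and n3.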
The communication protocol and its information carrier are defined by the MapMessage class. The protocol is really just semantics agreed between the two parties, expressed as the constants {MSG_BACKUP, MSG_RETRIEVE_BACKUP, MSG_PROXY, MSG_REMOVE, MSG_STATE, MSG_START, MSG_STOP, MSG_INIT, MSG_COPY, MSG_STATE_COPY, MSG_ACCESS}, each carrying one meaning: MSG_BACKUP tells the receiver to back up the enclosed session object, MSG_REMOVE tells it to delete the session with the enclosed session ID, and so on. The MapMessage class also carries valuedata (byte[]), keydata (byte[]), nodes (Member[]), and primary (Member): respectively the serialized session object, the serialized session ID, the backup nodes, and the source node. With all these elements in place, a backup operation's MapMessage reads like a sentence: "my session id is keydata, my session value is valuedata, my source node is primary; now please perform a backup operation".
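A stripped-down stand-in for such a message, together with the JDK-stream serialization mentioned below, might look like this. The field and constant names mirror the description above, but `MiniMapMessage` is a simplification for illustration, not Tomcat's MapMessage (for instance, the source node is a plain string here instead of a Member).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Minimal stand-in for BackupManager's MapMessage: a message type plus
// serialized key/value payloads, itself serialized with JDK streams.
public class MiniMapMessage implements Serializable {
    public static final int MSG_BACKUP = 1;
    public static final int MSG_REMOVE = 4;

    final int msgType;
    final byte[] keyData;    // serialized session id
    final byte[] valueData;  // serialized session object
    final String primary;    // source node (a Member in real Tomcat)

    MiniMapMessage(int msgType, byte[] keyData, byte[] valueData, String primary) {
        this.msgType = msgType; this.keyData = keyData;
        this.valueData = valueData; this.primary = primary;
    }

    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) { oos.writeObject(obj); }
        return bos.toByteArray();
    }

    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        MiniMapMessage msg = new MiniMapMessage(MSG_BACKUP,
                serialize("S1"), serialize("session-payload"), "node1");
        // wire round trip: sender serializes the whole message, receiver restores it
        MiniMapMessage received = (MiniMapMessage) deserialize(serialize(msg));
        System.out.println(deserialize(received.keyData)); // prints: S1
    }
}
```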
Serialization and deserialization are handled by the JDK's ObjectInputStream and ObjectOutputStream, while the complex network I/O is delegated to the Tribes communication framework.
What do source node, backup node, and proxy node mean? For each session, the cluster has exactly one source node and one backup node; the remaining nodes are proxy nodes. If node1 is the source node, it created the session and holds the original session object; node3 as the backup node holds the session's backup copy; node2 and node4 as proxy nodes hold only the session's location information, such as the IP address of the backup node node3. This division exists to provide failover: ① if the source node goes down and a request falls on the backup node, the session object is still available; the backup node then becomes the source node, selects a new backup node from node2 and node4, and copies the session object to it. ② If the backup node goes down, requests can still get the session from the source node, which likewise selects a new backup node from node2 and node4 and copies the session to it. ③ If a proxy node goes down, nothing is affected and everything keeps working.
With these basics in place, let's see how LazyReplicatedMap stores a session object locally and backs it up across nodes.
First, saving a session via the put method. Step 1: instantiate a MapEntry to hold the session-related information; the key argument is the session ID, the value is the session object, and the current node is recorded as the source node. Step 2: check whether the session collection already contains this session; if so, delete it from the local node and from its backup node. Step 3: select a backup node with the round-robin algorithm and assign it to the MapEntry's backup-node attribute. Step 4: assemble a MapMessage carrying the MSG_BACKUP identifier and send it to the backup node, telling it to back up the enclosed session information. Step 5: assemble a MapMessage carrying the MSG_PROXY identifier and send it to every node except the backup node, telling them "you are proxies; record this session's ID, source node, backup node, and other location information". Step 6: put the MapEntry into the local cache.
public Object put(Object key, Object value) {
① Instantiate MapEntry, pass in the key and value, and set the source node as the current node.
② Check whether local memory contains the key; if so, remove it not only locally but also across nodes.
③ Use the round-robin algorithm to select a backup node from the cluster member list.
④ Instantiate a MapMessage object containing the MSG_BACKUP identifier and send it to the backup node.
⑤ Instantiate a MapMessage object containing the MSG_PROXY ID and send it to other (proxy) nodes except the backup node.
⑥ Put into the local cache.
}
Next, let's see how a session object is fetched through get:
public Object get(Object key) {
① Obtain the local MapEntry object; it may contain the session object itself, or only the session object's location information.
② Determine whether the current node is the source node. If so, return the session object held in the MapEntry directly.
③ Determine whether the current node is the backup node. If so, return the session object held in the MapEntry, promote the current node to source node, select a new backup node, and copy the MapEntry to the new backup node.
④ Determine whether the current node is a proxy node. If so, send a session-copy request to the other nodes ("whoever in the cluster has this session object, please send it to me"), store the received session object locally as the return value, and promote the current node to source node.
}
Finally, the remove operation, which deletes a session object:
public Object remove(Object key) {
① Delete the local MapEntry object.
② Broadcast other nodes to delete this MapEntry object.
}
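The three walkthroughs above can be condensed into a toy, in-process model, with the whole "cluster" living in one JVM as a list of node objects. The `ToyReplicatedMap` class, its `Entry` type, and the routing logic are deliberate simplifications of Tomcat's LazyReplicatedMap/MapEntry machinery (no messaging, serialization, or concurrency), meant only to make the primary/backup/proxy flow concrete.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Toy in-process model of LazyReplicatedMap's put/get/remove.
public class ToyReplicatedMap {
    static class Entry {
        Object value;        // non-null only on the primary and backup nodes
        String primary;      // source node
        String backup;       // backup node
        Entry(Object v, String p, String b) { value = v; primary = p; backup = b; }
    }

    final String name;                       // this node's name
    final List<ToyReplicatedMap> cluster;    // all nodes, including this one
    final Map<String, Entry> local = new HashMap<>();
    int rr = 0;                              // round-robin cursor

    ToyReplicatedMap(String name, List<ToyReplicatedMap> cluster) {
        this.name = name; this.cluster = cluster; cluster.add(this);
    }

    ToyReplicatedMap node(String n) {
        for (ToyReplicatedMap m : cluster) if (m.name.equals(n)) return m;
        throw new NoSuchElementException(n);
    }

    public void put(String key, Object value) {
        if (local.containsKey(key)) remove(key);                    // step 2: clean old copies
        List<ToyReplicatedMap> others = new ArrayList<>(cluster);
        others.remove(this);
        ToyReplicatedMap backup = others.get(rr++ % others.size()); // step 3: round robin
        backup.local.put(key, new Entry(value, name, backup.name)); // step 4: "MSG_BACKUP"
        for (ToyReplicatedMap m : others)                           // step 5: "MSG_PROXY"
            if (m != backup) m.local.put(key, new Entry(null, name, backup.name));
        local.put(key, new Entry(value, name, backup.name));        // step 6: local cache
    }

    public Object get(String key) {
        Entry e = local.get(key);
        if (e == null) return null;
        if (e.value != null) return e.value;          // primary or backup: have it locally
        // proxy: fetch a copy from the backup node and become the new primary
        Object v = node(e.backup).local.get(key).value;
        e.value = v; e.primary = name;
        return v;
    }

    public void remove(String key) {
        local.remove(key);                            // step 1: local delete
        for (ToyReplicatedMap m : cluster)            // step 2: broadcast "MSG_REMOVE"
            if (m != this) m.local.remove(key);
    }

    public static void main(String[] args) {
        List<ToyReplicatedMap> cluster = new ArrayList<>();
        ToyReplicatedMap n1 = new ToyReplicatedMap("n1", cluster);
        ToyReplicatedMap n2 = new ToyReplicatedMap("n2", cluster);
        ToyReplicatedMap n3 = new ToyReplicatedMap("n3", cluster);
        n1.put("s1", "alice");            // n2 becomes the backup, n3 a proxy
        System.out.println(n3.get("s1")); // proxy fetches from backup; prints: alice
    }
}
```

One real-world detail this toy skips: when a proxy promotes itself on get, Tomcat also re-announces the new primary/backup layout to the rest of the cluster, which the sketch leaves out for brevity.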
The preceding three methods show clearly how this extended map adds, deletes, modifies, and queries across nodes; the BackupManager session manager simply manages sessions through this new map.
The above is a basic source-level analysis of Tomcat's two cluster mechanisms, each with its own strengths and weaknesses. In full-node replication mode, once the cluster size and traffic grow, a large amount of session information must be replicated to every node, easily congesting the network, and the synchronization work can become the bottleneck of overall performance; experience suggests limiting this mode to about 3-6 nodes in production. It cannot scale further, and the heavy data redundancy makes poor use of memory. The session-backup mode greatly reduces network traffic and processing work, supports much larger clusters, and is used in production with more than ten nodes. Although it scales better, it has its own shortcoming: each session has only one backup, so if the machines holding the original and the backup go down at the same time, the session cannot be recovered; the probability of such simultaneous failure, however, is very small.