1. Overview
When I started working on this article about building our own ESB middleware, I came close to giving up several times. This was not because of the inertia that comes with long articles, but because of the breadth of technical knowledge an ESB involves and the design difficulties that must be overcome: even several lengthy posts cannot cover them all, and if an idea is even slightly off it will mislead the reader. An ESB middleware that can be used reliably condenses the efforts of many participants in a team, and no single person can build one alone. But after thinking it over again and again, I made up my mind to do it anyway, because this is the best way to summarize the knowledge points presented in this topic from the 19th article to the 39th article. We are designing our own ESB middleware not to make it commercially available, nor to make it comparable to the ESB middleware products on the market, nor even to solve all the technical difficulties in an ESB. Our aim is to test whether, after digesting the material, readers can apply the knowledge introduced throughout this series comprehensively, flexibly, and on demand.
2. Top-level design of the ESB
(Figure: top-level design of the ESB middleware)
The figure above shows the top-level design of the ESB middleware that we are going to implement. As you can see, the entire ESB middleware is divided into the following modules: the Client side, the process orchestration/registration tool, the Master service module, the service state coordination group (module), and the service run group (module). First, we describe the work of these modules in general terms:
The Clients are the various business service systems that need to access the ESB middleware, such as a logistics system, a CRM system, and so on. When these client systems access the ESB middleware, the ESB middleware provides them with an esb-client component in their respective development languages. If a system is written in C#, the esb-client component may be provided as a DLL file; if it is written in Java, the esb-client component may be provided as a JAR file; if it uses Node.js, it may be one (or more) JS files ...
Developers of these client systems use an independent process orchestration/registration tool provided by the ESB middleware, which in many ESB middleware products is named "... Studio". These process orchestration/registration tools are generally offered as IDE plugins, for example an eclipse-plugin made available to developers. Their primary role is to let the developers (the development team) of a client system register atomic services with the ESB Master service, query all other available atomic services (from other business systems), orchestrate new service flows in the process orchestration/registration tool, and release new versions of existing service flows. This is the step labeled "1" in the figure.
In addition, to ensure that an atomic service used in an orchestration is not affected by changes in the business system that provides it, the ESB middleware typically requires the business system to specify the version and invocation permissions of the atomic service when it is registered. Invocation permissions are generally divided into blacklist permissions and whitelist permissions. With whitelist permissions, only the business systems listed in the whitelist are allowed to invoke this atomic service; even if the atomic service participates in a process orchestration in the ESB, the call will fail if the business system requesting the orchestration is not in the whitelist.
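The following is only a minimal sketch of how such a whitelist/blacklist check could look on the Broker Server side. The class name AtomicServicePermission and its fields are illustrative assumptions, not part of any particular ESB product:

import java.util.Set;

/**
 * A minimal sketch of an atomic-service permission check, assuming a whitelist/blacklist model.
 * All names here are illustrative.
 */
public class AtomicServicePermission {
    // "WHITELIST" or "BLACKLIST"
    private final String mode;
    // System codes of the business systems listed in the white/black list
    private final Set<String> systemCodes;

    public AtomicServicePermission(String mode, Set<String> systemCodes) {
        this.mode = mode;
        this.systemCodes = systemCodes;
    }

    /** Decide whether the business system requesting an orchestration may invoke this atomic service. */
    public boolean canInvoke(String requestingSystemCode) {
        if ("WHITELIST".equals(this.mode)) {
            // Only systems explicitly listed may call the service
            return this.systemCodes.contains(requestingSystemCode);
        }
        // Blacklist mode: everyone may call except the listed systems
        return !this.systemCodes.contains(requestingSystemCode);
    }
}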
The Master service handles the requests sent by the process orchestration/registration tool: registering new atomic services, publishing new processes, and releasing new versions of existing processes. Data such as the latest atomic services and process orchestrations are stored by the Master service in a persistent container (such as a relational database), and the latest data changes are sent to the service state coordination module. Note that the Master service is not responsible for executing the orchestrated processes; it only records changes to the orchestration data and sends these changes to the service state coordination module. This is step 2 in the figure. The Master service also has two other functions: rights management and status monitoring of the service run modules.
Because there are many nodes in the service run module responsible for the eventual execution of process orchestrations (hereinafter referred to as ESB-Broker Server nodes), and because the number of these ESB-Broker Server nodes keeps changing while the service is running (nodes are added or removed), the Master service does not know which Broker Server nodes are currently running. In order to notify these running Broker Server nodes that a new service orchestration has been published (or of other event information), the running ESB-Broker Server nodes connect to the service state coordination module, and the latter delivers the event notifications for data changes. This is the main function of the service state coordination module, shown as step 3. In the ESB middleware we design ourselves, the "service state coordination module" consists of a set of ZooKeeper services (ZooKeeper is introduced in detail in my other blog posts, so this topic does not explain its basic operation). If you have other functional or technical requirements in your actual work, you can also design the "service state coordination module" differently.
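To make step 3 concrete, here is a minimal sketch of how an ESB-Broker Server node could subscribe to orchestration changes through ZooKeeper. The znode path /esb/orchestrations is an assumption made for this example; the real layout would be whatever the Master service writes:

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

/**
 * A minimal sketch of a Broker Server node subscribing to orchestration changes
 * through the service state coordination module (ZooKeeper).
 * The znode path "/esb/orchestrations" is an assumption made for this example.
 */
public class OrchestrationChangeWatcher implements Watcher {
    private final ZooKeeper zk;

    public OrchestrationChangeWatcher(String connectString) throws Exception {
        // 30s session timeout; this watcher also receives connection events
        this.zk = new ZooKeeper(connectString, 30000, this);
    }

    public void subscribe() throws Exception {
        // Register a watch on the children of the orchestration znode.
        // ZooKeeper watches are one-shot, so the watch is re-registered in process().
        this.zk.getChildren("/esb/orchestrations", this);
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeChildrenChanged) {
            // A new orchestration (or a new version) was published by the Master service:
            // reload the route definitions into the local Camel context here.
            try {
                this.subscribe();
            } catch (Exception e) {
                // In a real Broker Server this would be logged and retried
                e.printStackTrace();
            }
        }
    }
}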
In the process of business system integration, the role of the ESB middleware is to call atomic services of the various business systems, transform the data, call the next atomic service, transform again, and finally return the result to the requester who triggered the service orchestration. ESB middleware therefore often has high performance requirements. If only one node executed the ESB service orchestrations, it would often be impossible to meet the design requirements of the ESB middleware, and the ESB service could even become the performance bottleneck of the entire software architecture. So in the ESB middleware we design, there are multiple nodes that actually execute the ESB services: the esb-broker server nodes.
There are many benefits to using multiple Broker Servers to run the ESB services. First of all, it guarantees that when a flood of requests arrives, the request pressure can be distributed evenly across the Broker Server nodes, so the ESB service does not become the bottleneck of the entire top-level design; the allocation of request pressure is coordinated through the ZooKeeper cluster. In addition, multiple Broker Servers guarantee that the entire ESB middleware does not stop serving when one (or several) Broker Server nodes encounter an exception and exit the service: an off-the-shelf fault-tolerance scheme. Developers can use a backoff algorithm to decide when the esb-client should next attempt to access the Broker Server node on which the error occurred, or immediately reassign a healthy Broker Server node to the esb-client. Finally, this scheme allows dynamic scale-out of Broker Servers while the ESB service is running: when the ESB Master service module discovers that the performance of the entire Broker Server group has reached (or is nearing) its peak, operations personnel can immediately start a new Broker Server node; the ZooKeeper cluster is responsible for dynamically loading data such as customized orchestrations and custom processors into the new Broker Server node, which then immediately joins the service group and begins work.
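As a small illustration of the backoff idea mentioned above, here is a minimal sketch of an exponential backoff with jitter; the constants and names are illustrative assumptions, not the ESB's actual policy:

import java.util.concurrent.ThreadLocalRandom;

/**
 * A minimal sketch of the backoff idea: after a Broker Server node fails, the esb-client
 * waits an exponentially growing (and slightly randomized) interval before retrying
 * that same node. All names and constants here are illustrative.
 */
public class RetryBackoff {
    private static final long BASE_DELAY_MS = 500;
    private static final long MAX_DELAY_MS  = 60_000;

    /**
     * @param failureCount how many consecutive failures have been observed for this node
     * @return how long to wait before the next attempt against the same node
     */
    public static long nextDelayMillis(int failureCount) {
        // Exponential growth capped at MAX_DELAY_MS
        long delay = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * (1L << Math.min(failureCount, 20)));
        // Add up to 20% jitter so many clients do not retry at the same instant
        long jitter = ThreadLocalRandom.current().nextLong(delay / 5 + 1);
        return delay + jitter;
    }
}

Whether the client waits out this delay or immediately asks ZooKeeper for a different, healthy Broker Server node is a policy decision; the top-level design above allows either.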
When an esb-client (a business system) requests execution of a service orchestration, the ZooKeeper client already integrated into the esb-client queries the ESB's ZooKeeper cluster for information about the currently running Broker Server nodes, and an algorithm is used to decide which Broker Server node to use (there are many possible algorithms: round-robin, weighted, consistent hashing, and so on), as shown in steps 4 and 5 of the top-level design. To ensure that the newly added Broker Server nodes mentioned above are able to join the service group and serve esb-clients, steps 4 and 5 can be executed periodically and, as appropriate, reassign a Broker Server node to the esb-client.
Once the esb-client has determined the target Broker Server node, it formally initiates the request to that Broker Server to execute the service orchestration. When the same esb-client executes a service orchestration again, it may skip steps 4 and 5 for a certain period (within a validity time) and send the request directly to the same target Broker Server node, until that Broker Server can no longer respond to the requests (or there are other grounds for deciding that this Broker Server node can no longer provide service), at which point the esb-client performs steps 4 and 5 again to identify another working Broker Server node. In a following article I will also discuss in detail how the Broker Server node is chosen.
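The following minimal sketch ties steps 4 and 5 together on the esb-client side: read the registered Broker Server nodes from ZooKeeper, pick one with a simple round-robin (polling) algorithm, and cache the choice for a validity period. The znode path /esb/brokers and the cache TTL are assumptions made for this example; a weighted or consistent-hash algorithm could be substituted without changing the surrounding flow:

import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.zookeeper.ZooKeeper;

/**
 * A minimal sketch of steps 4 and 5 on the esb-client side.
 * The znode path "/esb/brokers" and the cache TTL are assumptions for this example.
 */
public class BrokerSelector {
    private static final String BROKERS_PATH = "/esb/brokers";
    private static final long CACHE_TTL_MS = 5 * 60 * 1000L;

    private final ZooKeeper zk;
    private final AtomicInteger counter = new AtomicInteger(0);

    private volatile String cachedBroker;
    private volatile long cachedAt;

    public BrokerSelector(ZooKeeper zk) {
        this.zk = zk;
    }

    public String selectBroker() throws Exception {
        // Within the validity period, keep talking to the same Broker Server node
        if (this.cachedBroker != null && System.currentTimeMillis() - this.cachedAt < CACHE_TTL_MS) {
            return this.cachedBroker;
        }
        // Step 4: ask the ZooKeeper cluster which Broker Server nodes are currently running
        List<String> brokers = this.zk.getChildren(BROKERS_PATH, false);
        if (brokers.isEmpty()) {
            throw new IllegalStateException("no ESB Broker Server node is currently registered");
        }
        // Step 5: pick one node; round-robin here, but a weighted or consistent-hash
        // algorithm could be dropped in without changing the rest of the flow
        int index = Math.abs(this.counter.getAndIncrement() % brokers.size());
        this.cachedBroker = brokers.get(index);
        this.cachedAt = System.currentTimeMillis();
        return this.cachedBroker;
    }

    /** Called when the cached Broker Server node can no longer serve requests. */
    public void invalidate() {
        this.cachedBroker = null;
    }
}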
3. Log collection for the Master service
As mentioned in the previous section, the ESB middleware we have designed includes two modules: the Master service module and the service run group (module). One of the functions of the Master service module is to monitor the performance status of the service run group nodes (Broker Servers) that are currently running. The purpose of performance status monitoring is to ensure that operations personnel are aware of the operational status of these Broker Servers in real time, and can start new Broker Servers to share the load when the entire service run group is nearing its performance bottleneck, or stop some Broker Servers when the entire service run group is carrying hardly any requests.
So how does the Master node learn the performance status of the multiple service nodes in the Broker Server group? Remember that the Broker Server nodes are dynamically extensible: as already mentioned, the Master node does not know which Broker Server nodes are currently running. Log collection based on Kafka message queuing is one solution; designers can also use a Flume + Storm solution for automated log collection and real-time analysis. Here we introduce both log collection schemes. Note that Kafka and Flume have been described in detail in earlier articles of this topic, so this section no longer introduces the Kafka and Flume portions of the design's implementation.
3-1. Collect performance data using Kafka
Kafka server is characterized by speed. Although in certain cases Kafka server may lose messages or send duplicates, this is not a big problem for a log data collection scenario. Using message queuing to collect the performance logs of each Broker Server node also fits the dependency characteristics of the modules in the ESB: in the ESB middleware we design, the Master service module knows neither how many Broker Server nodes are running nor the IP locations of those Broker Server nodes. This means the Master service module is unable to proactively collect performance data from these Broker Server nodes; the best approach is for the active Broker Server nodes to send the log data themselves.
3-1-1. Design ideas
The following design diagram illustrates how Kafka components are used to collect performance data on the Broker Server nodes and to process and store the results:
As shown in the diagram, each Broker Server node, in addition to starting a Camel context instance (detailed later), also needs a configured Kafka producer for sending data. The performance data collected through the kafka-producer side may include CPU usage, memory usage, local I/O speed, operating-system thread status, the status of the route instances in the Camel context, the endpoints in the cache, the client-to-orchestration routing inside the Broker Server, and so on; both non-business and business data can be monitored in this way.
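As an illustration of the kafka-producer side, here is a minimal sketch using the standard Kafka Java client. The topic name esb-performance-log, the broker addresses, and the JSON fields are assumptions made for this example:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

/**
 * A minimal sketch of the kafka-producer that runs beside the Camel context on each
 * ESB Broker Server node and periodically sends performance samples.
 * The topic name, broker addresses and JSON field names are assumptions for this example.
 */
public class PerformanceLogProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka1:9092,kafka2:9092,kafka3:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            while (true) {
                // In a real node these values would come from the OS and the Camel context
                String sample = "{\"node\":\"broker-01\",\"cpu\":37,\"memMB\":1210,\"camelRoutes\":12}";
                // Keying by node name keeps all samples of one node in the same partition
                producer.send(new ProducerRecord<>("esb-performance-log", "broker-01", sample));
                Thread.sleep(100); // 10 samples per second, matching the sampling rate discussed below
            }
        }
    }
}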
Three Kafka Broker Server nodes (a recommended value) are deployed on the Kafka side to receive the performance log data sent by the kafka-producer on each ESB Broker Server node. To ensure the performance of the entire Kafka cluster, each Kafka Broker Server carries at least two partitions (also a recommended value). To save service resources, you can place a Kafka Broker Server and the kafka-consumer on one service node, and you can even co-locate them with the Master service node.
The kafka-consumer is responsible for processing the performance log data. Some readers may ask: since the consumer can store the performance log data it receives directly, isn't it enough to find a suitable storage scheme (such as HBase)? What processing does the consumer still need to do? The reason is that the sampling frequency implemented by the development team on the producer side may not match the monitoring frequency required by the OPS team.
To ensure the accuracy of the performance monitoring data, the development team takes advantage of the throughput offered by the Kafka cluster and sets a relatively high sampling rate on the kafka-producer integrated into each ESB Broker Server node (taking the resource consumption of the node itself into account, of course). For example, 10 samples per second are taken for the fixed business and non-business indicators. However, when the OPS team monitors each ESB Broker Server node through the Master service, it often does not need such a high sampling rate (a setting can be provided here for the OPS team to adjust at any time); roughly 1 update per second is enough.
So how does the consumer handle the 9 extra samples per second? There are two approaches. The first treats this purely as a display-frequency issue on the Master service console: the consumer writes all the data to the storage system, and only the console limits how often the metrics are refreshed. The second is for the consumer to discard the excess data it receives and write data to the persistent storage system only at the sampling frequency set by the OPS team. One situation in the second approach requires special attention: if a record that would be discarded has reached a performance threshold (for example, the sampled memory usage exceeds 2 GB), that log record still needs to be retained. The first approach needs little introduction, and its pros and cons are clear: the advantage is that a complete performance history can be replayed later; the disadvantage is that it occupies a large amount of storage space, and although many large-scale storage solutions are available, mature, and stable, they require a more generous budget.
3-1-2. Implementation of the consumer
Here the author mainly discusses the second way for the consumer to handle the data: discarding the redundant records. We can use the ConcurrentLinkedHashMap described in a previous article as the cache that stores the performance log messages in the consumer, and set its fixed size to 200 (or some other, larger value). This cache structure does a lot of work for us. First, its reliable performance guarantees that a single consumer will not become the bottleneck of the entire performance log collection scheme, even though ConcurrentLinkedHashMap is not the fastest structure available. Second, this cache structure automates the cleanup of redundant performance logs: when the 201st log record is pushed into the cache, the first record at the end of the LRU queue is automatically evicted from the queue and later reclaimed by the garbage collector. Finally, when the consumer persists the performance log data in the cache according to the sampling period set by the OPS team, it only ever needs to handle the record currently being evicted from the cache, so there is no need to write extra code to determine which performance log data between two sampling cycles should be persisted.
Incidentally, if you need to use the ConcurrentLinkedHashMap data structure provided by Google in your project, you first need to add the corresponding dependency information to the POM file:
<dependency>
    <groupId>com.googlecode.concurrentlinkedhashmap</groupId>
    <artifactId>concurrentlinkedhashmap-lru</artifactId>
    <version>1.4.2</version>
</dependency>
The following code snippets show how the consumer handles LRU queue insertion, periodic reads, and LRU eviction events:
......
/**
 * This is the LRU queue for performance data
 */
private static final ConcurrentLinkedHashMap<Long, String> PERFORMANCE_CACHE =
        new ConcurrentLinkedHashMap.Builder<Long, String>()
            .initialCapacity(200)
            .maximumWeightedCapacity(200)
            .listener(new EvictionListenerImpl())
            .build();

/**
 * This listener checks each record as it is removed from the LRU queue, <br>
 * to see whether the record needs to be persisted according to the functional requirements.
 * @author yinwenjie
 */
public static class EvictionListenerImpl implements EvictionListener<Long, String> {
    // Time of the last data collection, initially -1
    private Long lastTime = -1L;
    // The data-collection period set by the OPS team; 1000 means 1000 milliseconds.
    // In a formal system this value would be read from an external configuration.
    private Long period = 1000L;

    @Override
    public void onEviction(Long key, String jsonValue) {
        /*
         * This record needs to be collected when any of the following conditions holds:
         * 1. lastTime is -1 (this is the first collection since the program started)
         * 2. current time - lastTime >= period (the collection cycle has elapsed)
         * 3. the monitored value is greater than the configured warning threshold. In this sample
         *    code the warning threshold is 80; in a formal system it should be read from an
         *    external configuration. The "threshold" variable below represents this value.
         */
        Long threshold = 80L;
        Long nowTime = new Date().getTime();
        // Get the CPU usage from the performance data.
        // Note: in a formal system it is best not to pass a JSON structure; plain-text data is better.
        JSONObject jsonData = JSONObject.fromObject(jsonValue);
        Long cpuRate = jsonData.getLong("cpu");
        boolean mustCollecting = false;
        if (this.lastTime == -1 || nowTime - this.lastTime >= this.period || cpuRate >= threshold) {
            mustCollecting = true;
            this.lastTime = nowTime;
        }
        // If this record does not need persistent storage, terminate this listener invocation
        if (!mustCollecting) {
            return;
        }
        // ********************
        // The persistent data store operation can be performed here.
        // ********************
        LRUConsumer.LOGGER.info(key + ":" + jsonValue + " complete data persistent storage operation =======");
    }
}
......
// The following code runs when kafka-consumer receives performance log data:
// store this data in PERFORMANCE_CACHE.
// The millisecond timestamp can serve as the key value (in a formal scenario, with multiple
// consumer nodes, the key would need a more rigorous generation rule).
Long key = new Date().getTime();
LRUConsumer.PERFORMANCE_CACHE.put(key, performanceData);
In the code above, we use the LRU data structure described earlier to hold the packets that the consumer side receives. If you are not clear about LRU, you can refer to the introduction in one of my other articles (Architecture Design: Inter-system Communication - Apache Camel QuickStart (2)). The ConcurrentLinkedHashMap structure provided by Google gives us an off-the-shelf LRU queue, so that when the LRU queue is full, the earliest performance log record received is removed from the end of the queue. The most critical processing is done in the implementation class of the EvictionListener interface. In practice, once a developer has determined that a performance log record needs to be persisted, the persistence itself can be handed to a separate thread, for example a dedicated thread pool (ThreadPoolExecutor). This keeps the LRU queue truly unaffected by the latency of the persistent storage operation.
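A minimal sketch of such a dedicated thread pool follows; the pool sizes, the bounded queue, the rejection policy, and the persistAsync() method name are illustrative assumptions. The eviction listener would simply call persistAsync() and return immediately:

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/**
 * A minimal sketch of handing persistence work to a dedicated thread pool, so the
 * LRU eviction listener returns immediately and is not slowed by storage latency.
 * Pool sizes and the storage call are illustrative assumptions.
 */
public class PersistencePool {
    private static final ThreadPoolExecutor EXECUTOR = new ThreadPoolExecutor(
            2, 4,                                    // core and maximum pool size
            60L, TimeUnit.SECONDS,                   // idle threads above the core size die after 60s
            new LinkedBlockingQueue<Runnable>(1000), // bounded queue of pending persistence tasks
            new ThreadPoolExecutor.DiscardOldestPolicy()); // under overload, drop the oldest task

    /** Called from EvictionListenerImpl.onEviction() once a record is judged worth keeping. */
    public static void persistAsync(final Long key, final String jsonValue) {
        EXECUTOR.execute(new Runnable() {
            @Override
            public void run() {
                // Write the record (key, jsonValue) to the chosen storage system (e.g. HBase) here
            }
        });
    }
}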
3-2. Collect performance data using Flume + Storm
In the scenario above, which uses Kafka to collect performance data from the Broker Server nodes, additional code has to be written on each Broker Server node to send the data to the Kafka Broker Server. For this kind of functional requirement, using Apache Flume to collect the data actually makes the technical solution easier to implement and maintain, so we will briefly introduce that implementation as well. Since the author has already described the basic configuration of Apache Flume in detail in previous articles, here we focus on two issues: the Apache Flume data sources, and how Storm Server processes the data sent by the Flume servers.
3-2-1. Design ideas
The figure below shows the design structure of this functional requirement. A Flume agent installed on each ESB Broker Server node is responsible for collecting the various functional and non-functional metrics on that node. These performance log records are then passed, in load-balanced mode, to several trunk Flume server nodes, whose purpose is to receive and aggregate the performance log data from multiple ESB Broker Server nodes and eventually write the data to Storm Server.
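As an illustration of the load-balanced hop from a node agent to the trunk Flume servers, here is a minimal configuration sketch using Flume's load_balance sink group; the host names, ports, and component names are assumptions made for this example:

# A minimal sketch (not a complete flume.conf): the agent on an ESB Broker Server node
# sends its events to several trunk Flume servers through a load_balance sink group.
agent.sinks = k1 k2
agent.sinkgroups = g1
agent.sinkgroups.g1.sinks = k1 k2
agent.sinkgroups.g1.processor.type = load_balance
agent.sinkgroups.g1.processor.selector = round_robin
agent.sinkgroups.g1.processor.backoff = true

agent.sinks.k1.type = avro
agent.sinks.k1.channel = c1
agent.sinks.k1.hostname = trunk-flume-1
agent.sinks.k1.port = 4141

agent.sinks.k2.type = avro
agent.sinks.k2.channel = c1
agent.sinks.k2.hostname = trunk-flume-2
agent.sinks.k2.port = 4141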
First, note the Flume agent installed on the ESB Broker Server node. In section 3-1, the node's functional and non-functional metrics were collected by code written by the developers and sent to the Kafka Broker. That is really a big detour, because the Linux operating system already provides many ways to collect a node's non-functional indicators (such as I/O information, memory usage, memory paging, CPU usage, network traffic, and so on); developers only need a few scripts to complete the collection. For example, there is no need to write a program inside the ESB Broker Server just to collect CPU information (collecting CPU information should not really be the job of the ESB Broker Server itself); a script like the following can be used instead:
top -b -d 0.1 | grep Cpu >> cpu.rel
# There are many other ways to write this; the CPU status can also be obtained from the /proc/stat file.
The script above captures CPU information every 100 milliseconds and appends it to the cpu.rel file as a new record. Apache Flume can then read the changes to the cpu.rel file as the source of the performance log data:
# Fragment of the flume configuration file
......
agent.sources.s1.type = exec
agent.sources.s1.channels = c1
agent.sources.s1.command = tail -F /root/cpu.rel
......
On the ESB Broker Server node, we can use the same approach to read different kinds of log information from different files, as shown in the sketch below:
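This is only a minimal sketch of what such a multi-source configuration could look like; the source names and file paths are assumptions, with each exec source reading one metric file:

......
agent.sources = s1 s2 s3

agent.sources.s1.type = exec
agent.sources.s1.channels = c1
agent.sources.s1.command = tail -F /root/cpu.rel

agent.sources.s2.type = exec
agent.sources.s2.channels = c1
agent.sources.s2.command = tail -F /root/mem.rel

agent.sources.s3.type = exec
agent.sources.s3.channels = c1
agent.sources.s3.command = tail -F /root/io.rel
......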
=================================
(To be continued)