Java Distributed System Switch Design (Service Downgrade and Upgrade)
First, where switches come from. Take JD.com's June 18 anniversary promotion as an example. On the order-placing path, a transaction may need to call three interfaces, A, B and C. A and B are essential, but C only provides an extra feature (for example, making a recommendation when the order is placed) and is optional. Normally, when the system is under no pressure and has spare capacity, calling C is not a problem. During a big promotion such as the anniversary sale, however, the system runs at full load, and at that point we would rather not call interface C at all. How do we achieve that? Change the code? No, no, no, that is far from agile. This is why switches were born: a developer simply runs a command or clicks a button on a page to turn off the calls to interface C, and once the promotion is over, turns the switch back on.
Question one: How to implement a switch in a single Java system
A switch maps very naturally onto a Java type: a boolean. Wherever the switch applies, the code reads this property, checks its state, and then follows the corresponding logic. The class that holds the property is a singleton, which guarantees that it is globally unique (the singleton is usually one of the first patterns people meet when learning design patterns).
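A minimal sketch of this idea, assuming a hypothetical switch named `callInterfaceC` that guards the optional interface C from the opening example. `SwitchHolder` and `OrderService` are illustrative names, and the calls to A, B and C are stubbed as strings so the control flow is easy to see.

```java
// Hypothetical singleton that holds the switch for the optional interface C.
final class SwitchHolder {
    // Eagerly initialised instance; the JVM makes class initialisation thread-safe.
    private static final SwitchHolder INSTANCE = new SwitchHolder();

    // volatile: a change made on one thread becomes visible to all request threads.
    private volatile boolean callInterfaceC = true;

    private SwitchHolder() {
    }

    static SwitchHolder getInstance() {
        return INSTANCE;
    }

    boolean isCallInterfaceC() {
        return callInterfaceC;
    }

    // How this gets invoked from outside is the subject of question two.
    void setCallInterfaceC(boolean value) {
        callInterfaceC = value;
    }
}

final class OrderService {
    // Stubbed order flow: returns the list of interfaces that were "called".
    String placeOrder() {
        StringBuilder called = new StringBuilder("A,B"); // A and B are always required
        if (SwitchHolder.getInstance().isCallInterfaceC()) {
            called.append(",C"); // optional recommendation call, skipped when the switch is off
        }
        return called.toString();
    }
}
```

The field is `volatile` because the switch is written by an operator thread and read by many request threads; without it, request threads are not guaranteed to see the new value promptly.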
Question two: How to change a switch's value in a single Java system
In a stand-alone system, changing the switch state is very simple: leave a hook so that the property's value can be changed from outside (for example, set to true or false). A page can then be used to maintain the switches: clicking a control on the page updates the globally unique property, which triggers the switch.
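One way to sketch the "hook", assuming switches are kept by name in a singleton registry and that the page sends a switch name plus an `on`/`off` action. `SwitchRegistry`, `SwitchPageHandler` and the switch name `callInterfaceC` are hypothetical; the web framework that would route the click to `onClick` is left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical singleton registry of named switches that can be changed from outside.
final class SwitchRegistry {
    private static final SwitchRegistry INSTANCE = new SwitchRegistry();
    private final Map<String, Boolean> switches = new ConcurrentHashMap<>();

    private SwitchRegistry() {
        switches.put("callInterfaceC", true); // illustrative default: C is called normally
    }

    static SwitchRegistry getInstance() {
        return INSTANCE;
    }

    boolean isOn(String name) {
        return switches.getOrDefault(name, false);
    }

    // The hook: only switches that already exist can be changed; returns false otherwise.
    boolean set(String name, boolean value) {
        return switches.replace(name, value) != null;
    }
}

// Hypothetical handler called when an operator clicks "on" or "off" on the page.
final class SwitchPageHandler {
    String onClick(String name, String action) {
        boolean value;
        if ("on".equals(action)) {
            value = true;
        } else if ("off".equals(action)) {
            value = false;
        } else {
            return "unknown action: " + action;
        }
        boolean updated = SwitchRegistry.getInstance().set(name, value);
        return updated ? name + "=" + value : "unknown switch: " + name;
    }
}
```

Using `replace` rather than `put` means a typo on the page cannot silently create a new, unused switch.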
Question three: How to keep switch state in sync across multiple homogeneous Java systems
Questions one and two show that a switch can be changed on a single machine. But how do we keep it consistent across several homogeneous systems (here "homogeneous" means nodes that deploy the same code with exactly the same logic, similar to a master/slave setup)? With the singleton pattern, the switch property is loaded into the local cache: Java keeps holding the object, and it is not reclaimed even during a full GC. Changing it on one machine therefore has no effect on the others. To keep the switch state consistent across every system, the data has to be loaded from an external, third-party system.
What can serve as this external system? One option is a database access service, which we will call Metaserver. The switch properties are stored in the DB; Metaserver provides a page for modifying the data as well as an interface for reading switch data. When an application starts, it reads the data through Metaserver and loads it into its local cache. The problem is: when I change a value through the Metaserver page, how does each application learn that the property has changed? Some mechanism is needed (there are many options: a message system, ZooKeeper, or a page trigger) to clear the cached switch properties and make the cache reload, so that the latest state is picked up.
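A sketch of the application side, assuming a hypothetical `MetaServerClient` interface that returns all switch values (the text does not specify Metaserver's protocol). `reload()` is what the start-up code and the chosen notification mechanism (message, ZooKeeper watch or page trigger) would call; wiring up that notification is not shown.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical client for the Metaserver; its real protocol is not specified in the text.
interface MetaServerClient {
    Map<String, Boolean> loadAllSwitches();
}

// Local cache of switch values, filled from the Metaserver.
final class MetaSwitchCache {
    private final MetaServerClient client;
    // The whole map is swapped on reload, so readers never see a half-updated state.
    private volatile Map<String, Boolean> cache = Collections.emptyMap();

    MetaSwitchCache(MetaServerClient client) {
        this.client = client;
    }

    // Call once at start-up, and again whenever a change notification arrives.
    void reload() {
        cache = Collections.unmodifiableMap(new HashMap<>(client.loadAllSwitches()));
    }

    // Unknown switches are treated as "off"; this default is an assumption.
    boolean isOn(String name) {
        return cache.getOrDefault(name, false);
    }
}
```

Replacing the map reference in one assignment, instead of clearing and refilling it, avoids a window in which request threads would see an empty cache.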
The general flow is: Metaserver maintains the switch data -> applications read the data from the DB into their local caches -> the DB data changes -> the switch property caches are triggered to reload.
Isn't this a bit complicated? Is there a simpler way? There is. Taobao previously open-sourced a system called diamond (a persistent configuration management system, http://code.taobao.org/p/diamond/wiki/index/), which can be understood as a "pseudo-push service for configuration information". For example, when I change a switch property, I no longer need to clear any cache myself; diamond does it for me. The principle is simple: if system A subscribes to a switch in diamond, A starts a thread that polls the diamond server at regular intervals to see whether the switch data has changed, and if it has, loads the latest data from the diamond server.
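The polling principle can be sketched as below. This is not diamond's client API: `remoteValue` is a hypothetical stand-in for "read this switch from the configuration server", and `onChange` is whatever updates the local switch. The poll interval is left to the caller.

```java
import java.util.Objects;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Polling-based "pseudo-push" in the spirit of diamond (NOT diamond's API).
final class PollingSwitchSubscriber {
    private final Supplier<String> remoteValue; // hypothetical read from the config server
    private final Consumer<String> onChange;    // applies the new value locally
    private final ScheduledExecutorService poller = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "switch-poller");
        t.setDaemon(true); // do not keep the JVM alive just for polling
        return t;
    });
    private String lastSeen; // guarded by "this"

    PollingSwitchSubscriber(Supplier<String> remoteValue, Consumer<String> onChange) {
        this.remoteValue = remoteValue;
        this.onChange = onChange;
    }

    // Start the background thread that asks the server every periodMillis.
    void start(long periodMillis) {
        poller.scheduleWithFixedDelay(() -> {
            try {
                pollOnce();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs; retry next round instead.
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    void stop() {
        poller.shutdownNow();
    }

    // One polling round: notify the application only when the value has changed.
    synchronized void pollOnce() {
        String current = remoteValue.get();
        if (!Objects.equals(current, lastSeen)) {
            lastSeen = current;
            onChange.accept(current);
        }
    }
}
```

The try/catch matters: with `scheduleWithFixedDelay`, a task that throws is not rescheduled, so one network error would otherwise stop the subscription silently.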
The general flow is: maintain the configuration in diamond -> the system subscribes to the switch property -> the system polls for configuration changes -> when a change is found, the new value is applied directly.
Question four: Several pitfalls in switch design
Sometimes, for convenience, we skip the Metaserver or diamond approaches from question three and simply expose an HTTP interface that modifies the switch (with multiple machines, a batch script can call it on each one). In that case we need to block access to this URL in Apache or Nginx, so that malicious users cannot piece together the link and change the switch; the switch can then only be triggered on the server itself, for example with Linux curl.
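As an additional safeguard inside the application, the switch endpoint can itself reject callers that are not on the local machine. A minimal sketch, assuming the handler receives the caller's address as an IP literal; `LocalOnlySwitchGuard` is a hypothetical name. Note that if Nginx proxies requests to the application over localhost, every request will appear to come from 127.0.0.1, so this check complements the Apache/Nginx rule rather than replacing it.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical guard for the switch endpoint: accept only callers on the same machine.
final class LocalOnlySwitchGuard {
    private LocalOnlySwitchGuard() {
    }

    // remoteIp should be an IP literal (e.g. "127.0.0.1" or "::1"); a host name
    // passed here would trigger a DNS lookup.
    static boolean isAllowed(String remoteIp) {
        if (remoteIp == null || remoteIp.isEmpty()) {
            return false; // InetAddress.getByName("") would return the loopback address
        }
        try {
            return InetAddress.getByName(remoteIp).isLoopbackAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }
}
```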
Another point: if switch properties are modified over HTTP, the switch operation must be idempotent. This makes it safe to operate and avoids inconsistent state across the cluster (for example, if a request merely flips the switch, executing it once turns the switch on and executing it a second time turns it off again).
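The difference between a flip-style operation and an idempotent one can be shown in a few lines. `HttpSwitchOps` is a hypothetical name; in practice `set` would back a request such as "set switch X to off", while `toggle` is the style to avoid.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical switch showing a non-idempotent and an idempotent way to change it.
final class HttpSwitchOps {
    private final AtomicBoolean on = new AtomicBoolean(true);

    // NOT idempotent: a retried or duplicated request flips the switch back.
    boolean toggle() {
        boolean prev;
        do {
            prev = on.get();
        } while (!on.compareAndSet(prev, !prev));
        return !prev; // the new state
    }

    // Idempotent: "set to off" gives the same result however many times it is sent.
    boolean set(boolean value) {
        on.set(value);
        return value;
    }

    boolean isOn() {
        return on.get();
    }
}
```

With `set`, a batch script that accidentally runs twice on some machines still leaves the whole cluster in the same state; with `toggle`, those machines would end up in the opposite state.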
Question five: How to handle combinations of switches
Operating a single switch, as in the cases above, is fairly straightforward. But suppose there are three switches, A, B and C: one business scenario may require turning off A and B, while another may require turning off A and C. Operators can then easily miss a switch or make a mistake. What can be done? Add a layer on top of the individual switches: for example, define a property called "AB" for A and B; changing the value of AB makes the system change the values of both A and B, so nobody has to memorize the combinations by hand.
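A sketch of the combination layer, assuming the groups "AB" and "AC" from the text are defined in code; in a real system they would more likely live in the same store as the switches. `CompositeSwitches` and `setGroup` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical switch set with named groups layered on top of individual switches.
final class CompositeSwitches {
    private final Map<String, Boolean> switches = new ConcurrentHashMap<>();
    private final Map<String, List<String>> groups = new ConcurrentHashMap<>();

    CompositeSwitches() {
        for (String name : Arrays.asList("A", "B", "C")) {
            switches.put(name, true); // all switches start "on"
        }
        groups.put("AB", Arrays.asList("A", "B")); // scenario 1: A and B together
        groups.put("AC", Arrays.asList("A", "C")); // scenario 2: A and C together
    }

    // Changing a group changes every member switch.
    void setGroup(String group, boolean value) {
        List<String> members = groups.get(group);
        if (members == null) {
            throw new IllegalArgumentException("unknown group: " + group);
        }
        for (String member : members) {
            switches.put(member, value);
        }
    }

    boolean isOn(String name) {
        return switches.getOrDefault(name, false);
    }
}
```

`setGroup` updates the members one after another, so another thread may briefly see A off while B is still on; if that matters, the group change needs a lock or a single atomic swap of the whole switch map.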
Question six: How to achieve automatic downgrade and upgrade
All the cases above assume predictable situations in which we operate the switches manually, so nothing is automated. Automatic downgrade and upgrade is the topic of this section.
For example, JD.com works with many external logistics companies and calls their systems to query the state of logistics nodes. If a logistics company's system is unstable, going down or responding slowly, the impact on our own system can be large. Ideally, when the logistics company's system has problems, this logic is degraded automatically, and once that system is healthy again, the logic is upgraded (restored) automatically. The whole process needs no human involvement and keeps the system stable on its own. Here is the general idea:
Step one: build a counter that records, for the interface (call it A), the number of successful calls, the number of failures, and the response times;
Step two: put this information into a queue, and set thresholds (for example, downgrade when RT exceeds 5 seconds, upgrade when it drops below 1 second) that trigger the switch change;
Step three: asynchronously start a thread that scans the queue and triggers the change when a condition is met. There is one problem: once the business call is degraded, there are no more calls, so there is no data that could satisfy the upgrade condition. How do we handle that? When degrading, do not stop 100% of the calls; keep a portion of the traffic calling A and put the information from those calls into the queue. Based on that information, the upgrade can be triggered.
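The three steps can be sketched as a single class. Several details here are assumptions rather than part of the original design: the decision uses the average RT of each scanned batch, a hypothetical `maxFailureRate` adds a failure-based trigger, and `probeRatio` is the share of traffic still sent to A while degraded. `AutoDegradeSwitch` and its methods are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

// Sketch of an automatic downgrade/upgrade switch for one external interface ("A").
final class AutoDegradeSwitch {
    // Step one: what the counter records for each call.
    private static final class CallSample {
        final boolean success;
        final long rtMillis;
        CallSample(boolean success, long rtMillis) { this.success = success; this.rtMillis = rtMillis; }
    }

    // Step two: samples go into a bounded queue; the thresholds decide the switch.
    private final BlockingQueue<CallSample> samples = new LinkedBlockingQueue<>(10_000);
    private final long downgradeRtMillis; // e.g. 5000: average RT above this -> downgrade
    private final long upgradeRtMillis;   // e.g. 1000: average RT below this -> upgrade
    private final double maxFailureRate;  // hypothetical extra trigger based on failures
    private final double probeRatio;      // share of traffic still sent to A while degraded
    private volatile boolean degraded;
    private final ScheduledExecutorService scanner = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "degrade-scanner");
        t.setDaemon(true);
        return t;
    });

    AutoDegradeSwitch(long downgradeRtMillis, long upgradeRtMillis, double maxFailureRate, double probeRatio) {
        this.downgradeRtMillis = downgradeRtMillis;
        this.upgradeRtMillis = upgradeRtMillis;
        this.maxFailureRate = maxFailureRate;
        this.probeRatio = probeRatio;
    }

    // Business code asks this before calling A; while degraded, only probe traffic passes.
    boolean shouldCall() {
        return !degraded || ThreadLocalRandom.current().nextDouble() < probeRatio;
    }

    // Business code reports every call it made; offer() drops the sample if the queue is full.
    void record(boolean success, long rtMillis) {
        samples.offer(new CallSample(success, rtMillis));
    }

    boolean isDegraded() {
        return degraded;
    }

    // Step three: one scan of the queue. An empty batch leaves the state unchanged.
    synchronized void evaluate() {
        List<CallSample> batch = new ArrayList<>();
        samples.drainTo(batch);
        if (batch.isEmpty()) {
            return;
        }
        long totalRt = 0;
        int failures = 0;
        for (CallSample s : batch) {
            totalRt += s.rtMillis;
            if (!s.success) {
                failures++;
            }
        }
        long avgRt = totalRt / batch.size();
        double failureRate = (double) failures / batch.size();
        if (!degraded && (avgRt > downgradeRtMillis || failureRate > maxFailureRate)) {
            degraded = true;
        } else if (degraded && avgRt < upgradeRtMillis && failureRate <= maxFailureRate) {
            degraded = false;
        }
    }

    // Runs evaluate() asynchronously every periodMillis.
    void start(long periodMillis) {
        scanner.scheduleWithFixedDelay(() -> {
            try {
                evaluate();
            } catch (RuntimeException e) {
                // keep scanning even if one round fails
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    void stop() {
        scanner.shutdownNow();
    }
}
```

Keeping the upgrade threshold (1 second) well below the downgrade threshold (5 seconds) means an interface hovering around 3 seconds stays in its current state instead of flapping between the two on every scan.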
Summary:
These are approaches I have tried or seen while maintaining systems over time: using switches to downgrade and upgrade services and thereby protect the system better. This article only outlines the general idea rather than a concrete implementation; I hope it serves as a useful starting point for discussion.