Recently, development of new message bus features was put on hold, and some time was spent optimizing and restructuring the original pub/sub mechanism. This post records the optimization process and how the design changed compared to the original.
The role of pubsub within the message bus
Within the message bus, pub/sub is primarily used for real-time control of all online clients. Every client that uses the message bus is "forced" to register with pub/sub and "forced" to subscribe to a number of channels, so that control commands issued from the message bus console take effect in a timely manner.
Previous Design Review
It is worth reviewing the previous design first. The pub/sub mechanism inside the message bus is implemented on top of third-party components (currently ZooKeeper and Redis are supported). A few concepts first: components define channels according to their business needs; a component that needs to be aware of changes on a channel subscribes to it; when business requirements cause a component to publish a change to that channel, every component subscribed to the channel receives the change. And because ZooKeeper and Redis also support data storage, the published content can either be pushed to all components subscribed to the channel, or pulled on demand by other components according to the channel.
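To make the publish/subscribe terminology concrete, here is a minimal sketch using Redis through the Jedis client; the channel name and message content are purely illustrative and are not part of the message bus code.

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

public class PubSubSketch {

    // Subscriber side: register interest in a channel and react to published changes.
    // subscribe(...) blocks, so in real code it would run on its own thread.
    static void listen() {
        try (Jedis subscriber = new Jedis("localhost", 6379)) {
            subscriber.subscribe(new JedisPubSub() {
                @Override
                public void onMessage(String channel, String message) {
                    System.out.println("change on " + channel + ": " + message);
                }
            }, "demo_channel");
        }
    }

    // Publisher side: notify every subscriber of "demo_channel" that something changed.
    public static void main(String[] args) {
        try (Jedis publisher = new Jedis("localhost", 6379)) {
            publisher.publish("demo_channel", "something changed");
        }
    }
}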
In fact, the previous approach focused on "automation" and "extensibility". For extensibility, Java annotation scanning was used to "automate" the definition of channels so they did not have to be hard coded; when the business later expands and a new channel is added, the existing channel definitions need no changes. In addition, to keep the client-side handling consistent between the client's first fetch (the currently supported pub/sub components, ZooKeeper and Redis, both offer key-value data storage) and subsequent change pushes, each channel corresponds to a database table, and each channel has its own method for automatically fetching its data.
From the server's point of view, pub/sub is the uplink of data (extract data from the database and push it to subscribed clients); from the client's perspective it is the downlink. So an IDataExchange interface is defined for exchanging data with the pub/sub component:
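The interface body is not reproduced in the text; the following is only a plausible sketch of what IDataExchange could look like, and the method names are assumptions rather than the actual declaration.

// Hypothetical sketch of IDataExchange; the real method names are not shown in the original text.
public interface IDataExchange {
    // uplink: push this exchanger's data to the pub/sub component
    void upload();

    // downlink: pull this exchanger's data back from the pub/sub component
    byte[] download();
}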
An @Exchanger annotation is then defined, which contains two properties:
- table: the corresponding database table;
- path: the channel, i.e. the corresponding channel name.
Each table involved in changes is then implemented as its own XxxExchanger.
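The annotation's declaration is not shown in the original text, but based on the two properties above it would look roughly like this; the retention and target settings are assumptions needed so that runtime annotation scanning can discover the exchangers.

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Kept at runtime so that annotation scanning can discover every exchanger
// and build the table -> channel mapping without hard coding.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
public @interface Exchanger {
    String table(); // the corresponding database table
    String path();  // the channel name
}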
So that every channel's data source is exposed through a consistent interface, there is also a unified interface for fetching the data, IDataFetcher:
public interface IDataFetcher {
    byte[] fetchData(IDataConverter converter);
}
The interface receives a data serializer, serializes the fetched data, and uses byte[] as the unified return type, because the data has to be stored in the pub/sub component (most of which expose byte-array based APIs).
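Putting the pieces together, an individual exchanger for a hypothetical queue_node table might look roughly like the sketch below. The class name, table name, repository collaborator, and the converter's serialize method are illustrative assumptions; only the @Exchanger properties and the IDataFetcher signature come from the text above.

// Illustrative exchanger: bound to its table and channel via @Exchanger,
// and exposing its data through the unified IDataFetcher interface.
@Exchanger(table = "queue_node", path = "queue_node_channel")
public class QueueNodeExchanger implements IDataFetcher {

    private final QueueNodeRepository repository; // hypothetical DAO for the queue_node table

    public QueueNodeExchanger(QueueNodeRepository repository) {
        this.repository = repository;
    }

    @Override
    public byte[] fetchData(IDataConverter converter) {
        // load the full table and serialize it for storage in the pub/sub component;
        // converter.serialize(...) is assumed, the real IDataConverter API is not shown
        return converter.serialize(repository.findAll());
    }
}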
The overall design is as follows:
Such a design was sufficient for the initial goals (automation, extensibility, and consistent code paths for the client's first fetch of data and for subsequent change data). But in terms of performance it is very inefficient. Because one table corresponds to one channel, every push is a full-table push; because it is a full-table push, the client cannot be authenticated; and because the client cannot be authenticated, data unrelated to a given client gets pushed to it anyway, leading to frequent pushes, useless parsing, and a vicious circle. In addition, compared to working with prepared data, the client still has to parse the raw full-table data, compute the views it needs, handle internal control and authorization, and so on, and this logic is almost identical across all clients. The views that had to be computed looked like this:
private Map<String, Node> proconNodeMap;
private Map<String, Node> reqRespNodeMap;
private Map<String, Node> rpcReqRespNodeMap;
private Map<String, Node> pubsubNodeMap;
private Map<String, Node> idNodeMap;
private Map<String, Node> secretNodeMap;
private Map<String, Config> clientConfigMap;
private ExchangerManager exchangeManager;
private Map<String, Sink> tokenSinkMap;
private Map<String, String> pubsubChannelMap;
private Node notificationExchangeNode;
The design after optimization
The redesigned pub/sub adopts a combined push-pull mode. The data itself is no longer pushed; only the change notification and the changed key (the secret) are pushed, and the client then pulls on demand.
The optimized design offers the following advantages:
Reduce client memory consumption
The previous pub/sub design was "full pull on first fetch, full push on change", and pulling full-table data is a heavy drain on client memory. After optimization, only the data view associated with the current secret is stored.
Prepare the data view on the server side to reduce client computation time
After optimization, the data structure is tailored specifically for the client's use: the view data the client needs is computed on the server side, stored as key-value pairs, and cached in the pub/sub component's memory. The data structure of this view is as follows:
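The actual structure is not reproduced here; the following is only a simplified sketch of what a per-secret NodeView might contain, with field names chosen for illustration.

import java.util.List;

// Illustrative per-secret view, precomputed on the server side and
// cached in the pub/sub component as a key-value pair keyed by secret.
public class NodeView {
    private Node queueNode;               // the queue node bound to this secret
    private Config clientConfig;          // client-side configuration for this node
    private List<String> authorizedPaths; // what this client is allowed to communicate with

    // getters and setters omitted
}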
This way, the client can verify communication permissions very quickly.
Reduce the number of remote interactions
The primary means of reducing the number of communications is a local cache. When the client needs data, it first looks locally; if the data is not there, it fetches it from the remote end and then caches it in local memory. Some of the code looks like this:
public synchronized NodeView getNodeView(String secret) {
    if (Strings.isNullOrEmpty(secret)) {
        throw new NullPointerException("the secret can not be null or empty");
    }
    if (this.secretNodeViewMap.containsKey(secret)) {
        // local cache
        return this.secretNodeViewMap.get(secret);
    } else {
        // fetch from remote, then cache locally
        NodeView nodeViewObj = this.pubsuberManager.get(secret, NodeView.class);
        this.secretNodeViewMap.put(secret, nodeViewObj);
        return nodeViewObj;
    }
}
Of course, the reduction in the number of communications also benefits from the custom-tailored "data view", which is split into key/value pairs by the secret of each queue. A data change triggered from the console becomes a change-notification event, and the local cache is then updated accordingly, instead of pushing the changed data itself as before, which caused too many pointless network interactions and data computations.
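On the console/server side this means a change no longer pushes data: the affected secret's view is recomputed, written into the pub/sub component's key-value storage, and only the secret is broadcast as a notification. A rough sketch, assuming the pubsuberManager offers put and publish operations symmetric to the get used above (those names, and buildNodeView, are assumptions):

// Hypothetical server-side handling of a console-triggered change:
// recompute the view, store it under the secret, then broadcast only the secret.
public void onQueueNodeChanged(String secret) {
    NodeView freshView = this.buildNodeView(secret);          // recompute the tailored view (hypothetical helper)
    this.pubsuberManager.put(secret, freshView);               // assumed KV write into the pub/sub component
    this.pubsuberManager.publish(Constants.PUBSUB_NODEVIEW_CHANNEL, secret); // notify; do not push the data
}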
Reduce the amount of communication data
The primary means of reducing the amount of data transferred is to fetch only the data that is actually needed. For example, every call to the message bus API requires passing in a secret that identifies the current queue node, so the client only needs to fetch from the remote end the "data view" related to that secret. There is an assumption here: in most scenarios, a client typically uses only one secret within a JVM process. This assumption is justified because the API is designed so that a user only needs to know the secret of their own queue. Of course, an application may still involve multiple queues, in which case it fetches the data views of at most a few secrets. The basic principle remains: do not fetch redundant data; fetch on demand. Also, what gets pushed changed from the data itself to a change notification. Although the notification is broadcast, it works on a "claim it yourself" basis:
public void onChannelDataChanged(String channel, Object obj) {
    logger.debug("=-=-=-=-=-=- received channel : " + channel + " =-=-=-=-=-=-");
    if (channel.equals(Constants.PUBSUB_NODEVIEW_CHANNEL)) {
        String secret = obj.toString();
        this.updateNodeView(secret);
    } else if (channel.equals(Constants.PUBSUB_SERVER_STATE_CHANNEL)) {
        String serverState = obj.toString();
        this.setServerState(serverState);
    } else if (channel.equals(Constants.PUBSUB_CONFIG_CHANNEL)) {
        this.updateConfig(obj.toString());
    } else if (channel.equals(Constants.PUBSUB_NOTIFICATION_EXCHANGE_CHANNEL)) {
        this.updateNotificationNode();
    }
}
Pull Update:
public synchronized void updateNodeView(String secret) {
    if (this.secretNodeViewMap.containsKey(secret)) {
        this.secretNodeViewMap.remove(secret);
        this.getNodeView(secret);
    }
}
As you can see, a remote pull to refresh the view happens only if the pushed secret is already cached locally; otherwise the change notification is simply discarded.
Trade-offs
Of course, this fully customized mechanism completely gives up the automation and extensibility that the previous design emphasized. This is a necessary trade-off, because our team positions the message bus to deliver better performance.