Preface
ENode is built around a message-driven architecture: a system developed with ENode processes messages at every stage and produces new messages as output. In this article I want to analyze in detail how the ENode framework implements the whole message-processing pipeline. To make the step-by-step description easier to follow, let me first show the ENode architecture diagram, so that you can keep it in mind while reading the analysis below.
ENode architecture (diagram)
Process analysis within the ENode framework
- The controller sends an ICommand to the message queue (EQueue);
- [Command processing starts from this step] The CommandConsumer inside ENode.EQueue receives the ICommand. It first creates an ICommandContext instance, then calls the ICommandExecutor inside ENode to execute the current ICommand, passing the ICommandContext along;
- The ICommandExecutor looks up the single ICommandHandler registered for the current ICommand type and calls its Handle method to process the command, passing in the current ICommandContext (a minimal sketch of such a handler is shown after this list);
- After the ICommandHandler finishes handling the command, the ICommandExecutor retrieves the aggregate roots that were added or modified through the current ICommandContext;
- It checks that there is at most one added or modified aggregate root in the current ICommandContext; if there is more than one, an error is returned. This check enforces, at the framework level, that one command modifies only one aggregate root at a time;
- If the number of added or modified aggregate roots is zero, the current ICommand is considered fully processed and the ICommandContext's OnCommandExecuted method is called. In that method, EQueue is notified to send a CommandResult message back to the controller; a CommandResultProcessor running in the controller's process then receives the CommandResult message, and the result of the ICommand can be displayed;
- The ICommandExecutor takes the single modified aggregate root from the ICommandContext and collects the IDomainEvents it produced. Because one aggregate root may produce several IDomainEvents at a time, the framework builds an EventStream object containing all the IDomainEvents produced by the current aggregate root. An EventStream carries several important pieces of information: the ID of the current ICommand, the aggregate root ID, the aggregate root version number, and all the IDomainEvents;
- The ICommandExecutor adds the command to the ICommandStore. Because the ICommandStore uses the CommandId as its primary key (i.e. the key), a duplicate CommandId is detected by the framework, which then runs its duplicate-handling logic; this is analyzed in detail later;
- If the command is successfully added to the ICommandStore, the Commit method of the IEventService is called to persist the current EventStream to the IEventStore;
- The IEventService does three things internally: 1) persist the EventStream to the IEventStore; 2) after persistence succeeds, call IMemoryCache to update the cache (the cache can be configured as a local in-memory cache or a distributed cache such as Redis; if commands are processed by a cluster, a shared cache, i.e. a distributed cache such as Redis, should be used); 3) after the cache is updated, call the Publish method of the IEventPublisher interface to publish the EventStream; the concrete IEventPublisher implementation sends the current EventStream to EQueue. These three steps describe the normal path. If a duplicate version number is detected while persisting to the IEventStore (the same aggregate root ID plus the same aggregate root version means a concurrency conflict), the framework needs different handling logic; this, too, is analyzed in detail later;
- [Domain event processing starts from this step] The EventStream is received by the EventConsumer inside ENode.EQueue, and the EventConsumer calls the IEventProcessor to process the current EventStream;
- The IEventProcessor first checks whether the current EventStream may be processed yet. Here we must guarantee that the order in which events are processed by subscribers is exactly the same as the order in which they were persisted, otherwise the data on the command side would become inconsistent with the read DB on the query side. How this ordering is guaranteed is analyzed in detail later. A simple example shows why the order matters: suppose an aggregate root has an attribute whose default value is 0, and three domain events touch this attribute, representing +1, *2 and -1 respectively. If the events are applied in that order, the final value of the attribute is 1. If, however, the consumer applies them in the order +1, -1, *2, the final result is 0, not 1. This example should make clear why the order in which an aggregate root's events are persisted must be exactly the same as the order in which they are consumed;
- If the current EventStream is allowed to be processed, the IEventProcessor does the following for each IDomainEvent in it: 1) based on the IDomainEvent type, obtain all the IEventHandlers registered on the current IEventProcessor node, then call their Handle methods to update the read DB on the query side. It is not quite that simple, though, because we also have to guarantee that the current IDomainEvent is processed by a given IEventHandler only once; otherwise the IEventHandler would process the IDomainEvent repeatedly and the resulting data would be wrong. This idempotence is also discussed in detail later;
- Some IEventHandlers produce new ICommands after handling an IDomainEvent (the Saga Process Manager pattern). In that case the framework must also send these generated ICommands to the message queue (EQueue) automatically; but the process is not that simple either: what if sending these ICommands fails? Then we have to resend them. How can the resend be designed so that no matter how many times a command is resent it is never executed twice? The key point is to ensure that the ID of the resent ICommand is always the same as the one sent the first time, otherwise the framework has no way of knowing that it is the same command. The concrete design is analyzed further below.
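To make the pipeline above a little more concrete, here is a minimal sketch of what a command and its handler might look like. This is only an illustration of the shape described in the steps above; the type and member names (ChangeNoteTitleCommand, Note, ICommandContext.Get) are assumptions, not verbatim ENode code.

// A minimal sketch, assuming hypothetical command and aggregate types.
public class ChangeNoteTitleCommand : ICommand
{
    public string AggregateRootId { get; set; }   // ID of the note to modify
    public string Title { get; set; }
}

public class ChangeNoteTitleCommandHandler : ICommandHandler<ChangeNoteTitleCommand>
{
    public void Handle(ICommandContext context, ChangeNoteTitleCommand command)
    {
        // Load the aggregate root through the command context so the framework can
        // track it as "modified" and later collect its uncommitted IDomainEvents
        // into an EventStream (command ID, aggregate root ID, version, events).
        var note = context.Get<Note>(command.AggregateRootId);
        note.ChangeTitle(command.Title);   // raises a NoteTitleChangedEvent inside the aggregate
    }
}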
Idempotence of commands
In the step of the process above where the command is added to the ICommandStore, what is actually added is a HandledCommand object, which contains the current command plus the ID of the aggregate root it modified; the reason for this will become clear below. We know the ICommandStore uses the CommandId as its primary key, which guarantees that the same command can never be added twice. If the command is added to the ICommandStore successfully, all is well and we simply continue with the subsequent steps. But what should we do when the CommandId turns out to be a duplicate?
If there is a duplicate, we first fetch the previously persisted HandledCommand by CommandId (the primary key). From the HandledCommand we get the ID of the modified aggregate root, and then comes the crucial step: using the aggregate root ID plus the CommandId as the condition, we query the IEventStore for an EventStream that may already exist. If one exists, the domain events produced by this command have already been persisted, so all we need to do is perform the event publishing step again, i.e. call IEventPublisher.Publish to publish the events to the query side. Why publish again? Because the fact that the events were persisted does not mean they were successfully published: in theory the domain events might have been persisted and then the power went down just as they were about to be published. After the server restarts and the command comes in again, we must therefore perform the publish step once more.
And what if no EventStream is found by CommandId plus aggregate root ID? That means that although the command itself was persisted, the EventStream it produced was never persisted to the event store. So we need to pass the current EventStream to the IEventService.Commit method to persist the events.
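Put together, this duplicate-CommandId branch might look roughly like the sketch below. All the names here (HandledCommand, CommandAddResult, _commandStore, _eventStore, _eventService, _eventPublisher and their methods) are assumptions about the API shape, used only to show the branching logic, not actual ENode signatures.

// Rough sketch of the duplicate-command handling path described above.
var addResult = _commandStore.Add(new HandledCommand(command, aggregateRootId));

if (addResult == CommandAddResult.DuplicateCommand)
{
    // The command was persisted before: find out how far its processing got.
    var handledCommand = _commandStore.Get(command.Id);
    var existingStream = _eventStore.Find(handledCommand.AggregateRootId, command.Id);

    if (existingStream != null)
    {
        // The events were already persisted; only the publish step may have been lost,
        // so simply publish them again (event handlers are required to be idempotent).
        _eventPublisher.Publish(existingStream);
    }
    else
    {
        // The command was persisted but its events were not: commit the events now.
        _eventService.Commit(eventStream);
    }
}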
There is actually one more question here: why not look up the EventStream by CommandId alone? The reason: technically we could find a unique EventStream by CommandId, but that design would require the event store to be able to locate an EventStream globally from a CommandId alone. Considering that the event store's data volume is very large, we may shard it horizontally by aggregate root ID in the future. In that case, knowing only the CommandId, we could not tell which shard holds the corresponding EventStream. If the aggregate root ID is supplied as well, we can first locate the shard the EventStream lives in, and then easily find the unique EventStream within that shard by CommandId.
Now a word about the horizontal sharding design of the command store. The command store's data volume is also very large, because it stores every command. Fortunately, we only ever look up the command store by CommandId, so we can shard horizontally by hashing the CommandId modulo the number of shards. That way, even after sharding, given a CommandId we know exactly which shard it lives in, and the command is easy to find.
So, through the analysis above, we see that the command store and the event store must consider not only how to store data, but also how to shard large volumes of data in the future, and how to find our data conveniently once it is sharded.
Finally, there is one case not covered above: the command turns out to be a duplicate when added to the command store, but when we then try to query the command from the command store by CommandId, nothing is found. Ouch! This situation really should not happen; if it does, it means something is wrong with the command store itself: why does the add report a duplicate while the query finds nothing? This case cannot be handled; we can only log an error and investigate afterwards.
Detection and processing of concurrent conflicts during domain event persistence
In the process steps above we mentioned: if the version number turns out to be a duplicate when the EventStream is persisted to the IEventStore (the same aggregate root ID plus the same aggregate root version means a concurrency conflict), the framework needs different handling logic. Specifically:
First, let's think about why the same aggregate root could produce domain events with the same version number at almost the same time and try to persist them to the event store. Let me start with why this situation rarely happens. In ENode, when the ICommandExecutor processes a command, it checks whether the aggregate root the command wants to modify already has another command being processed. If so, the current command is put into a waiting queue associated with that aggregate root; that is, it is not executed for the moment. Once the command in front of it for the same aggregate root has finished executing, the next waiting command is taken from the waiting queue and processed. Through this design we guarantee that the commands for one aggregate root are never executed in parallel, only sequentially. The ICommandExecutor creates this waiting queue for an aggregate root automatically, on demand, whenever two or more commands for the same aggregate root arrive at the same time.
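Conceptually, the per-aggregate waiting queue could be sketched like this. The class below is not ENode code; it is a simplified illustration, assuming an Execute method that invokes the command handler, of how commands for one aggregate root are forced to run one after another.

using System.Collections.Concurrent;
using System.Threading.Tasks;

// Sketch: one in-memory queue per aggregate root, so its commands never run in parallel.
public class AggregateCommandQueues
{
    private readonly ConcurrentDictionary<string, ConcurrentQueue<ICommand>> _queues =
        new ConcurrentDictionary<string, ConcurrentQueue<ICommand>>();
    private readonly ConcurrentDictionary<string, bool> _draining =
        new ConcurrentDictionary<string, bool>();

    public void Enqueue(string aggregateRootId, ICommand command)
    {
        var queue = _queues.GetOrAdd(aggregateRootId, _ => new ConcurrentQueue<ICommand>());
        queue.Enqueue(command);
        // Start draining only if no command for this aggregate root is currently executing;
        // otherwise the new command just waits its turn in the queue.
        if (_draining.TryAdd(aggregateRootId, true))
        {
            Task.Run(() => Drain(aggregateRootId, queue));
        }
    }

    private void Drain(string aggregateRootId, ConcurrentQueue<ICommand> queue)
    {
        ICommand command;
        while (queue.TryDequeue(out command))
        {
            Execute(command);   // commands for the same aggregate root execute strictly in order
        }
        bool removed;
        _draining.TryRemove(aggregateRootId, out removed);
        // Re-check: a command enqueued just after the loop exited must not be lost.
        if (!queue.IsEmpty && _draining.TryAdd(aggregateRootId, true))
        {
            Drain(aggregateRootId, queue);
        }
    }

    private void Execute(ICommand command) { /* call the registered ICommandHandler here */ }
}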
What about a cluster, though? On a single machine the approach above guarantees that all commands for a given aggregate root instance are processed sequentially, but in a cluster one aggregate root might be processed on several machines at the same time. To solve this, commands are routed by aggregate root ID. In general, a command that modifies an aggregate root always carries an aggregate root ID, so we can route the outgoing command by that ID: as long as the aggregate root ID is the same, the command is always routed to the same queue, and since one queue is always consumed by exactly one machine, commands for the same aggregate root are always processed on one machine. What about hot data? For example, some aggregate roots may suddenly be modified by a large number of commands while others receive only a few. That is fine: we also have a message queue monitoring platform, and when the commands for some aggregate root suddenly surge, we can use the queue feature of an EQueue topic to cope at any time. For instance, if a topic currently has only 4 queues, we can increase it to 8 and grow the consumer machines from 4 to 8 as well, which doubles the command processing capacity and easily handles the hot data problem. This is also why I wanted to implement the distributed message queue EQueue myself: in some scenarios, if you do not have full control over your architecture, you end up passive, which can leave the whole architecture with serious defects and eventually bring the system down while you can do nothing about it. Of course, you could use a high-performance distributed queue such as Kafka or RocketMQ instead, but such heavyweight queues are complex and not .NET-based; when problems occur, maintaining them is certainly harder than maintaining something self-developed, unless you are truly proficient with them and confident you can operate them.
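The routing rule itself is nothing more than a stable hash of the aggregate root ID mapped onto the number of queues; a tiny sketch (the function name is hypothetical, not an EQueue API):

// Sketch: all commands for one aggregate root map to the same queue index,
// so they are always consumed (and thus executed) by the same machine.
public static int SelectQueueIndex(string aggregateRootId, int totalQueueCount)
{
    // Use a hand-rolled hash rather than string.GetHashCode(), which is not
    // guaranteed to be stable across processes or runtime versions.
    var hash = 23;
    foreach (var c in aggregateRootId)
    {
        hash = hash * 31 + c;
    }
    return (hash & int.MaxValue) % totalQueueCount;
}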
The implementation above ensures that the commands for an aggregate root are always processed sequentially, in a single line, which matters a great deal for the event store: it essentially never encounters concurrency conflicts, which avoids unnecessary accesses to the event store, or rather greatly reduces the pressure on it.
So when can concurrency conflicts still occur? Two cases:
1) When a machine that processes commands crashes, the messages in the queues it was consuming are taken over by other machines, which may pull a batch of command messages from those queues for consumption. If we then restart the faulty server, after the restart it consumes those queues again. The key point is that every time a machine starts, it pulls the last consumed position of the queue (the offset) from the EQueue broker, and because offset updates are asynchronous (for example, synced to the broker every 5 seconds), the consumption position the restarted server pulls from the broker lags behind reality. As a result, command messages that the stand-in server had already consumed, or was still consuming, may be consumed again. Admittedly this rarely happens because the conditions are quite harsh, and even when it does happen it does not necessarily lead to commands being executed concurrently, but it is a possibility. In fact it is not only the crash-and-restart case that can cause concurrency conflicts: whenever any machine joins or leaves the cluster that processes commands, the consumer cluster for command messages rebalances, and during rebalancing some messages in the same queue of the same topic may be consumed on two servers, again because the queue's consumed position (offset) is updated on a schedule rather than in real time. For this reason we generally recommend not changing the machines in the consumer cluster while there is a lot of message traffic; instead, resize the cluster at a time with as little traffic as possible, for example in the small hours. That avoids any possibility of repeated message consumption or concurrency conflicts. This paragraph may be a bit hard to follow; I can only explain it to this level here, and to fully understand it you probably need a clear picture of EQueue's design.
2) Even on a single machine, concurrent modification of the same aggregate root can still occur, i.e. two commands for the same aggregate root can be executed simultaneously. The reason: when the EventStream corresponding to a command hits a duplicate during persistence, I put the command into a local in-memory queue for retry, and the retry runs on a separate dedicated retry thread, not on the thread that normally processes commands. So if there are still commands for that aggregate root waiting behind it, it is possible that at the same moment the aggregate root is being modified by two commands.
Now let's come back to what we should do when a conflict does occur. As mentioned above, we need to retry the command, but the logic is not quite that simple. We need to:
A. Check whether the version of the current EventStream is 1. If it is 1, two commands that create the same aggregate root were executed concurrently. In that case there is no point in retrying, because no matter how many times we retry, the version of the generated EventStream will always be 1: the first time an aggregate root is created, the domain events it produces always have version 1. So in this case we simply take the EventStream that already exists out of the event store and publish it again via IEventPublisher.Publish. Why publish again? The same reason as in the discussion of command idempotence above. One small note: what if we try to fetch this EventStream from the event store and do not find it? That really should not happen; as with the command idempotence analysis above (why is it reported as a duplicate on insert but not found on query?), it would mean the event store's design is broken and its reads and writes are not strongly consistent.
B. If the version of the current EventStream is greater than 1, we must first update the in-memory cache (Redis) and only then retry the command. Why update the cache first? Because otherwise the state of the aggregate root fetched during the retry is still the old one, and the retry will hit a version conflict again. Why would the state of the aggregate root in the cache still be old? Because the fact that an EventStream already exists in the event store does not mean the modifications it represents have been applied to the cache: we persist to the event store first and update the cache afterwards, so it is entirely possible that another command's events have been persisted but its cache update has not yet happened when our retry starts. The safest approach is therefore to refresh the aggregate root's state in the cache to the latest value before retrying. How do we refresh it? Easy: event sourcing. We just fetch all the event streams of the current aggregate root from the event store, replay those events to rebuild the aggregate root at its latest version, and write that into the cache.
Finally, if a retry is needed, how do we retry? Simple: drop the command into a local in-memory retry queue; I currently use a BlockingCollection for this.
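The two branches A and B above, plus the BlockingCollection-based retry, could be summarized in a sketch like the following. The member names (_eventStore, _eventPublisher, _memoryCache, ReplayEvents, ProcessingCommand) are assumptions standing in for whatever the surrounding class actually holds; only the control flow is the point.

// Sketch of handling a version conflict when persisting an EventStream.
private readonly BlockingCollection<ProcessingCommand> _retryQueue =
    new BlockingCollection<ProcessingCommand>();   // drained by a dedicated retry thread

private void HandleVersionConflict(ProcessingCommand processingCommand, EventStream eventStream)
{
    if (eventStream.Version == 1)
    {
        // Branch A: two "create aggregate" commands raced; the aggregate already exists.
        // Do not retry; fetch the stream that won and publish it again.
        var existingStream = _eventStore.Find(eventStream.AggregateRootId, 1);
        _eventPublisher.Publish(existingStream);
    }
    else
    {
        // Branch B: refresh the cached aggregate first, otherwise the retry would load
        // a stale state and run into the very same version conflict again.
        var history = _eventStore.QueryAggregateEvents(eventStream.AggregateRootId);
        var latestState = ReplayEvents(history);   // event sourcing: rebuild from the full history
        _memoryCache.Set(latestState);

        // Then push the command into the local in-memory retry queue.
        _retryQueue.Add(processingCommand);
    }
}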
How to ensure that events are consumed in the same order as they are generated
Why the order must be the same was explained in the process steps above; here we analyze how to actually achieve it. The basic idea is to use a table that stores, per aggregate root, the maximum version number already processed. If the maximum processed version is 10, only the EventStream whose aggregate root version is 11 may be processed next; even if version 12 or later arrives first, it can only wait. How does it wait? Similar to the command retry queue, this can be done in a local in-memory queue. For example, if the maximum processed version is currently 10 and the EventStreams with versions 12 and 13 arrive first, they wait in the queue; when the event with version 11 arrives it can be processed, after which the maximum processed version becomes 11, so the EventStream with version 12 waiting in the queue is now allowed to be processed, and so on. That is the whole control logic, and it is a single-machine algorithm. What about a cluster? There is actually no need to treat the cluster case specially, because every machine runs this ordering logic; in a cluster the worst (and very unlikely) case is that the EventStream with version 11 is processed concurrently on two machines, which is exactly what I want to analyze next.
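In code the ordering check boils down to a few lines. The sketch below assumes hypothetical helpers GetProcessedVersion / UpdateProcessedVersion (backed by the table mentioned above), Dispatch (which runs the event handlers) and Park / TryProcessWaiting (backed by the local in-memory waiting queue).

// Sketch: only the EventStream whose version is exactly lastProcessed + 1 may run now.
private void ProcessInOrder(EventStream stream)
{
    var processedVersion = GetProcessedVersion(stream.AggregateRootId);   // e.g. 10

    if (stream.Version == processedVersion + 1)
    {
        Dispatch(stream);                                                 // run the IEventHandlers
        UpdateProcessedVersion(stream.AggregateRootId, stream.Version);   // now 11
        TryProcessWaiting(stream.AggregateRootId, stream.Version + 1);    // version 12 may be parked
    }
    else if (stream.Version > processedVersion + 1)
    {
        Park(stream);   // arrived too early (e.g. 12 or 13 while 11 is missing): wait locally
    }
    // stream.Version <= processedVersion: already handled before, safe to ignore.
}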
There is actually one more detail I have not mentioned, related to the ConsumerGroup of the EQueue consumers: if a message has many consumers and those consumers are split into two ConsumerGroups, the consumption of the two groups is isolated from each other, i.e. each group independently consumes all of the messages. Without some additional design, this can cause subtle problems in practice. I cannot explain it clearly here without causing more confusion, and it is not the main point, so I will not expand on it; if you are interested, have a look at the purpose of the EventProcessorName field in ENode's EventPublishInfo table.
How to ensure that an IDomainEvent is processed only once by an IEventHandler
I think everyone understands why this matters. For example, if an event handler updates the read database, it may execute a SQL statement with side effects, such as update product set price = price + 1 where id = 1000. If that SQL statement is executed one extra time, the value of the price field ends up 1 too high, which is not the expected result. So the framework has a basic responsibility to prevent this. How? The idea is to use a table that records the ID of each processed domain event together with a type code of the event handler that processed it, with a unique joint index on those two fields. Every time an event handler is about to process a domain event, it first checks whether the event has already been processed; if not, it processes it and then inserts the domain event ID and the event handler type code into this table. But what if that insert fails because concurrency caused the same event handler to process the same domain event twice? The framework does not handle this case strictly, because it cannot: the framework has no way of knowing what the event handler actually does. It might be sending an e-mail, writing a log, or updating the read DB. So ultimately the event handler itself must be implemented idempotently by the developer. Of course, the framework provides developers with the information they need to implement rigorous idempotence control. For example, the framework passes the version of the current domain event to the event handler, so the version can be included in the where clause of the update SQL to implement optimistic concurrency control. For example, the following code:
public void Handle(IEventContext context, SectionNameChangedEvent evnt)
{
    TryUpdateRecord(connection =>
    {
        return connection.Update(
            new
            {
                Name = evnt.Name,
                UpdatedOn = evnt.Timestamp,
                Version = evnt.Version
            },
            new
            {
                Id = evnt.AggregateRootId,
                Version = evnt.Version - 1
            },
            Constants.SectionTable);
    });
}
In the code above, when we update a forum section, we include a condition like Version = evnt.Version - 1 in the where clause of the SQL. This requires that the event currently being processed has exactly the next version number after the event that was processed last, i.e. the update order on the query side is kept strictly consistent with the order in which the events were generated. With this, even if something slips past the framework's own idempotence checks, the event handler itself can still enforce strict ordering. Of course, if your event handler sends e-mails, I really do not know how to further guarantee this kind of strict ordering or handle concurrency conflicts; if you have ideas, feel free to contact me.
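For completeness, the "processed exactly once" bookkeeping described at the start of this section could be sketched as below. The helper names and the DuplicateKeyException type are hypothetical; the real guarantee comes from the unique joint index on (EventId, EventHandlerTypeCode).

// Sketch: record that this handler has processed this event; the unique joint index
// on (EventId, EventHandlerTypeCode) makes a second insert fail, which is treated as
// "already processed".
public void HandleOnce(IDomainEvent evnt, int eventHandlerTypeCode, Action handle)
{
    if (IsEventHandled(evnt.Id, eventHandlerTypeCode)) return;   // fast path: skip duplicates

    handle();                                                    // run the actual IEventHandler

    try
    {
        InsertHandledRecord(evnt.Id, eventHandlerTypeCode);      // INSERT guarded by the unique index
    }
    catch (DuplicateKeyException)                                // hypothetical exception type
    {
        // A concurrent consumer recorded it first; this is exactly why the handler itself
        // must also be idempotent (e.g. the optimistic version check shown above).
    }
}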
How can the ICommands generated in a Saga Process Manager be resent safely without being executed repeatedly?
So tired... but persistence is victory! If the current saga event handler generates commands, the framework must ensure that sending those commands cannot lead to them being executed repeatedly. How? If the ID of a command generated in the saga event handler were a brand-new unique value each time it is generated, the framework would have no way to know that it is the same command as before; it would treat them as two different commands. There are essentially two approaches:
1. The framework first saves the commands generated in the saga event handler, and then sends them to EQueue one by one, deleting each saved command once it has been sent successfully, until all of them have been sent. This approach works because the commands to be sent are always taken from the place where they were stored, so the ID of a given command is the same every time it is sent. However, the performance of this design is not great, because each command has to be saved first, then sent, then deleted after sending; that will never be very fast.
2. The second approach is not to store the commands generated in the saga event handler at all, but to send them directly. This design requires that the framework always generate the ID of a command to be sent according to a specific rule, and the rule must guarantee that the generated command ID is both unique and deterministic. Look at the code below:
private string BuildCommandId(ICommand command, IDomainEvent evnt, int eventHandlerTypeCode)
{
    var key = command.GetKey();
    var commandKey = key == null ? string.Empty : key.ToString();
    var commandTypeCode = _commandTypeCodeProvider.GetTypeCode(command.GetType());
    return string.Format("{0}{1}{2}{3}", evnt.Id, commandKey, eventHandlerTypeCode, commandTypeCode);
}
The code above is the function that builds the ID of a command to be sent. As you can see, the ID is composed of the ID of the domain event being handled, a key of the command, the type code of the current saga event handler, and the type code of the command to be sent. When the same domain event is handled by the same event handler and the generated command type is the same, these pieces of information are basically enough to build a unique command ID. Sometimes, however, that is not enough, because one event handler may build two commands of the same type that modify different aggregate roots. That is why the commandKey component is included: by default the key is the ID of the aggregate root the command modifies. With these four pieces of information combined, we can ensure that no matter how many times a domain event is handled by a given saga event handler, the ID of the command it finally generates is always deterministic and never changes. Of course, the commandKey above may sometimes need to consider more than just the aggregate root ID, although I have not run into such a case yet; that is why the framework design allows developers to override GetKey. Developers need to understand when to override this method; after reading the explanation here, you should know!
Okay, that's about it. Time to go to bed!