Thoughts on how to design an event-driven architecture

Last Update:2013-08-05 Source: Internet

Author: User



Recently I have been thinking about the following question: can the state of a domain model be made independent of the outside world, so that the model is only responsible for receiving external events and responding to them? There are two kinds of response:

Run business logic against the model's current in-memory state and then generate events. Note that this step does not change the model's in-memory state;
Change the model's own state based on an event.
In addition, and most importantly, the domain model does not need to care what happens to the events it generates. For example, it does not care whether they are persisted or whether they conflict with other events. It only performs the two kinds of response above, based on its current in-memory state.

If this assumption holds, the domain model becomes the true central processor of business logic, much like a CPU, and only then can it become really fast.

Simply put: Event -> Model -> Event.
The model only receives events, processes them, and then generates new events.

The domain model is a black box. It only processes business logic and does not care what happens to the results afterwards. Of course, a domain model must have its own state, but that state lives in memory and is part of the domain model itself.

Why do I have this idea? Because I wonder why the processing logic of the domain model should depend on whether its results are persisted correctly and smoothly. That seems absurd.
Since the domain model has its own in-memory state space, all of its logic should depend only on that state space and on nothing external.

Of course, the IRepository in my previous designs loaded data directly from the database. In that case, the state space of the domain model is the database. But this is not ideal. Why not use memory as the state space of the domain model?

Let's look at LMAX as a real-world example of this idea.

In this design, access to the model does not, in theory, have to be single-threaded, because the domain model does not depend on any external state; it depends only on the memory space it lives in. A major benefit of a single thread is that it prevents concurrency conflicts. Multithreaded or clustered access is fully possible, but the domain objects that are read may then be stale, because the in-memory state of the domain model has to be synchronized between machines. How likely a stale read is depends on the level of concurrency and on how fast data is synchronized between machines.
LMAX uses a single thread because the performance of a single-threaded domain model is already high enough to meet their requirements.

In this architecture, I think a complete state update of any object in the domain model involves responding to at least two events. For example:

First, the model responds to ChangeNoteCommand (a command is also an event; it can be understood as NoteChangeRequested), and the Note model generates a NoteChanged event. Note that the state of the model has not changed yet; at this point an event has merely been generated to record what happened;
Then the NoteChanged event is sent back to the domain model. This time the domain model changes the state of the Note and saves the latest state in its memory space, for example in a dict or in Redis.
Only after both events have been handled is the state of the Note finally modified. Previously we would load the Note from the database, change it, and save it back to the database. No wonder that was slow!

Handling these two events is very fast for the domain model, because no I/O is involved at all.
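As a rough illustration of the two-step update described above, here is a minimal Python sketch. Only the names ChangeNoteCommand and NoteChanged come from the text; the fields, the NoteModel class, and the empty-title check are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeNoteCommand:
    note_id: str
    new_title: str  # hypothetical field


@dataclass(frozen=True)
class NoteChanged:
    note_id: str
    new_title: str  # hypothetical field


class NoteModel:
    """Keeps Note state in a plain dict and performs no I/O."""

    def __init__(self):
        self._titles = {}

    def handle(self, command):
        # Response type 1: run business logic and emit an event; state is untouched.
        if not command.new_title:
            raise ValueError("title must not be empty")  # assumed business rule
        return NoteChanged(command.note_id, command.new_title)

    def apply(self, event):
        # Response type 2: change in-memory state based on the event.
        self._titles[event.note_id] = event.new_title

    def title_of(self, note_id):
        return self._titles.get(note_id)


model = NoteModel()
event = model.handle(ChangeNoteCommand("n1", "Shopping list"))
before_apply = model.title_of("n1")  # state has not changed yet
model.apply(event)
after_apply = model.title_of("n1")   # state changes only once the event is applied
```

In the full design, the event would be persisted between handle and apply; the sketch leaves that part out because it is not the model's concern.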

For the rest, we only need to consider the following six questions:

Message serialization and deserialization;
Message transmission speed;
Speed of event persistence;
How to retry after a concurrency conflict;
What to do if a message is lost;
How to synchronize memory between servers in a cluster deployment.
Keep in mind that none of these are concerns of the domain model. Do not let the domain model deal with any peripheral problems. We should solve these problems one by one, outside the model.

I have taken some measures to solve each problem:

Message serialization and deserialization: this is simple. You can use BinaryFormatter or a faster open-source serialization component. Event-sized objects can be serialized at a very high rate per second;
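To make the serialization step concrete, here is a small round-trip sketch in Python. It uses JSON from the standard library as a stand-in for BinaryFormatter, which is a .NET component; the envelope layout, the EVENT_TYPES registry, and the event fields are assumptions for illustration only.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class NoteChanged:
    note_id: str
    new_title: str  # hypothetical field
    version: int    # hypothetical field


# Hypothetical registry that maps a type name on the wire back to a class.
EVENT_TYPES = {"NoteChanged": NoteChanged}


def serialize(event) -> bytes:
    # Wrap the event in an envelope that records its type name.
    envelope = {"type": type(event).__name__, "data": asdict(event)}
    return json.dumps(envelope).encode("utf-8")


def deserialize(payload: bytes):
    # Look up the class by its recorded name and rebuild the event.
    envelope = json.loads(payload.decode("utf-8"))
    return EVENT_TYPES[envelope["type"]](**envelope["data"])


original = NoteChanged("n1", "Shopping list", 1)
restored = deserialize(serialize(original))
```

A binary format would be more compact and faster; the point here is only that events are small, flat objects that survive a round trip unchanged.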

Message transmission speed: use MSMQ, RabbitMQ, or another queue component with persistence. If that is too slow, use ZeroMQ, which has no message persistence but can reach 300,000 messages per second;

Speed of event persistence: every event belongs to a single aggregate root, so we only need to ensure that the events of one aggregate root do not conflict (that is, no two events share a version number). For faster persistence, we can partition events by aggregate root or by some other key, and store the events of different aggregate roots on different servers. Multiple events can then be persisted in parallel across the cluster, which raises the overall event persistence throughput. For example, if a single MongoDB server persists 5,000 entries per second, then 10 MongoDB servers can persist about 50,000 entries per second;
What to do after a concurrency conflict: usually we retry. To avoid an uncontrollable situation (for some reason the retries never stop and messages pile up), set a maximum number of retries. Once it is exceeded, stop retrying and write a log entry for later investigation. Retrying here means locating the command that produced the conflicting event and sending that command to the domain model again for processing;
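The partitioning idea from the persistence item can be sketched as follows. This is a toy in-memory version, assuming a simple hash-modulo routing rule that the text does not prescribe; plain Python lists stand in for the MongoDB servers, and the throughput figure assumes ideal linear scaling.

```python
import zlib


def partition_for(aggregate_root_id: str, server_count: int) -> int:
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    return zlib.crc32(aggregate_root_id.encode("utf-8")) % server_count


SERVER_COUNT = 10
servers = [[] for _ in range(SERVER_COUNT)]  # each list stands in for one server


def persist(aggregate_root_id: str, version: int) -> None:
    # All events of one aggregate root are routed to the same server.
    index = partition_for(aggregate_root_id, SERVER_COUNT)
    servers[index].append((aggregate_root_id, version))


for root_id, version in [("note-1", 1), ("note-2", 1), ("note-1", 2), ("note-3", 1)]:
    persist(root_id, version)

note1_home = servers[partition_for("note-1", SERVER_COUNT)]

# Idealized arithmetic from the example in the text: 10 servers x 5,000 per second.
per_server_rate = 5000
cluster_rate = per_server_rate * SERVER_COUNT
```

Because routing depends only on the aggregate root id, the version check for one aggregate root never has to look at more than one server.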

Message loss: if a message is lost, it is lost. If you cannot afford to lose messages, use a reliable queue with persistence, such as MSMQ. Even so, it is worth asking whether a lost message really matters. In general, when a message is lost we at least find out that the program is faulty, because the model state is wrong from then on. We can write a log entry whenever a message is sent and whenever it is received, so that we can later find out at which stage the message was lost;

If any other exception occurs, and the code is managed code, I think you can add try/catch where necessary and log the exception. Whether to retry depends on the situation;

In addition, with multithreaded or clustered access to the model, the domain objects read from memory will often be stale. What should we do? In fact this is not a problem, because the resulting version conflict is detected when the event is persisted, and the corresponding command is retried.
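Here is a minimal sketch of how a stale read can be caught at persistence time and retried a bounded number of times. The EventStore class, the ConcurrencyConflict exception, and the execute_with_retry helper are all hypothetical names, and the version check is reduced to a single integer per aggregate root.

```python
import logging

logger = logging.getLogger("retry-sketch")


class ConcurrencyConflict(Exception):
    pass


class EventStore:
    def __init__(self):
        self.versions = {}  # aggregate_root_id -> latest persisted version

    def append(self, aggregate_root_id, expected_version):
        current = self.versions.get(aggregate_root_id, 0)
        if current != expected_version:
            # The caller worked from stale in-memory state.
            raise ConcurrencyConflict(f"expected {expected_version}, store has {current}")
        self.versions[aggregate_root_id] = current + 1


def execute_with_retry(store, aggregate_root_id, read_version, max_retries=3):
    """Re-run the command until its event persists or the retries run out.

    Returns the number of attempts used, or None after giving up.
    """
    for attempt in range(1, max_retries + 2):  # first try plus max_retries retries
        try:
            store.append(aggregate_root_id, read_version())
            return attempt
        except ConcurrencyConflict as conflict:
            logger.warning("attempt %d failed: %s", attempt, conflict)
    logger.error("giving up on %s after %d retries", aggregate_root_id, max_retries)
    return None


store = EventStore()
store.versions["note-1"] = 2

stale_then_fresh = iter([1, 2])  # the first read is stale, the second is current
attempts = execute_with_retry(store, "note-1", lambda: next(stale_then_fresh))

# A reader that never catches up hits the retry limit and is only logged.
gave_up = execute_with_retry(store, "note-1", lambda: 0, max_retries=2)
```

The retry limit is what keeps a permanently failing command from blocking the messages queued behind it.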

Once an event has been persisted successfully, how do we let every application server know? I think this is also simple: after the event is persisted, we publish it to all application servers with ZeroMQ. Each application server has a background thread that keeps receiving successfully persisted events and updates the domain objects in its memory space accordingly. The framework can do this step automatically. The second event I mentioned above (NoteChanged) is handled this way by the framework, without any user code. And because this is publish-subscribe, the data on every application server is naturally kept in sync;
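The synchronization step can be sketched in-process as follows. This is not ZeroMQ: a queue.Queue per server stands in for the subscriber socket, and the AppServer and Publisher classes are hypothetical. It only shows the shape of the idea, namely that each server has a background thread applying persisted events to its own in-memory state.

```python
import queue
import threading


class AppServer:
    """Holds an in-memory copy of Note titles, updated by a background thread."""

    def __init__(self):
        self.titles = {}
        self.inbox = queue.Queue()  # stand-in for a subscriber socket
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            note_id, new_title = self.inbox.get()
            self.titles[note_id] = new_title  # apply the persisted event
            self.inbox.task_done()


class Publisher:
    def __init__(self):
        self.subscribers = []

    def publish(self, event):
        # Fan the event out to every subscribed application server.
        for server in self.subscribers:
            server.inbox.put(event)


servers = [AppServer(), AppServer()]
publisher = Publisher()
publisher.subscribers.extend(servers)

# In the real design this call happens only after the event was persisted.
publisher.publish(("n1", "Shopping list"))

for server in servers:
    server.inbox.join()  # wait until the background thread has applied the event
```

With a real message bus the servers would converge a little later rather than at a join point, which is why stale reads remain possible and the version check at persistence time is still needed.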

In addition, this architecture only transmits events, and events are very small, so message transmission performance is not a concern.

Some people have raised two concerns about these ideas:

Are events a silver bullet for today's complex software architectures?
Will a huge number of events in the system become a disaster of its own?
I forget who said it, but the essence of OO is message passing. Commands, events, and direct method calls are all, in essence, message passing between objects.

Method calls are too rigid (I remember you mentioned this before; of course, I think method calls are well suited to communication between objects inside an aggregate).
Commands and events, in essence, use messages as the medium for communication between objects. This reminds me of a metaphor a senior colleague once used. Here is an excerpt in his own words:

"Aren't today's SOA, ESB, and the like building the 'nervous system' of an enterprise, with OO objects as the 'neurons'? Communication between them relies on bioelectric pulses, which is to say it is message-driven."

So I wonder whether software should be built from many objects and many events as its core components. Objects are the neurons and messages are the bioelectric pulses. The whole running program is a network made of objects and messages.

As for complexity, I think a framework can handle the message passing for us. What we programmers need to do is define the object structure and let the objects send and receive messages. I don't think that is very complicated.

Recently I have been trying to implement this idea, because my brother said: "I don't believe in any architecture, just show me the code."

Having an idea and implementing it are two different things. Your overall ability, your design skill, your command of detail, and your inner qualities as a programmer all show in the code.

Someone replied that events themselves are not wrong, but that the real issue is how events are positioned. Events are how one domain interacts with another, but the boundaries are hierarchical. The human body makes this easy to see: there are events between cells, events between tissues, and events between organs. Building such an event system is very complicated and hard to achieve with current technology; it is not something a single EventBus can solve.

In response, I think this is mainly a matter of changing how we think about programming. Event-driven programming is inherently asynchronous. I want to develop such a framework mainly for these reasons:

The event-driven programming model places no burden on the model. The model deals only with memory, so high performance is within reach;
Event versioning makes optimistic concurrency easy to implement, giving strong consistency within a single aggregate root and eventual consistency between aggregate roots. Combined with the retry that the framework performs automatically after a concurrency conflict, this greatly reduces the failure rate of command execution;
Event data is not relational data, so there can be multiple event publishers and handlers. This means we can easily build a cluster and store events however we like, as long as the events of one aggregate root are kept together. Events of different aggregate roots can in theory be put on different servers, so events can also be persisted in parallel. We only need a unique index on the aggregate root id + commitSequence fields. This overcomes the bottleneck of slow event persistence (I/O operations).
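The unique index on aggregate root id + commitSequence can be demonstrated with a small sketch. It uses SQLite from the Python standard library purely as a stand-in for the event store, since the article itself talks about MongoDB; the table layout and the try_append helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (aggregate_root_id TEXT, commit_sequence INTEGER, payload TEXT)"
)
# The unique index is what turns a concurrent duplicate into a detectable error.
conn.execute(
    "CREATE UNIQUE INDEX ux_events ON events (aggregate_root_id, commit_sequence)"
)


def try_append(aggregate_root_id, commit_sequence, payload):
    """Return True if persisted, False if that sequence number was already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO events VALUES (?, ?, ?)",
                (aggregate_root_id, commit_sequence, payload),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = try_append("note-1", 1, "NoteChanged")
duplicate = try_append("note-1", 1, "NoteChanged")   # same root, same sequence
other_root = try_append("note-2", 1, "NoteChanged")  # different root, same sequence
```

The second insert is rejected while the third succeeds, which is the behavior optimistic concurrency needs: conflicts only exist within one aggregate root.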

With so many attractive features, what more is there to say? It doesn't matter if it is hard; we can take it step by step. That is better than having no ideas at all. What do you think?

This article is an English version of an article originally published in Chinese on aliyun.com.
