A year ago, when I first developed EQueue, I wrote an article introducing the background of the framework, its overall architecture, and the basic concepts in that architecture. That article gives a basic understanding of EQueue. After more than a year of improvement, EQueue has become considerably more complete and mature in functionality. So I would like to write another article about EQueue's overall architecture and key features.
EQueue Architecture
EQueue is a distributed, lightweight, high-performance, reliable message queue written in pure C# that supports the consumer cluster consumption model.
There are three main parts: producer, broker, and consumer. The producer is the message sender. The broker is the message queue server; it receives the messages sent by producers and persists them. The consumer is the message consumer; it pulls messages from the broker for consumption, specifically by long polling. The biggest benefit of the pull approach is that it keeps the broker very simple: the broker does not need to actively push messages to consumers and is only responsible for persisting messages, which eases the burden on the broker server. At the same time, because the consumer pulls messages on its own initiative, it controls its own consumption speed, so the broker can never push messages faster than the consumer can handle and bring it down. As for timeliness, because long polling is used, messages are consumed in near real time, basically equivalent to the push model.
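To make the long polling idea concrete, here is a minimal sketch in Python (EQueue itself is written in C#; Python is used here only for brevity). The class `LongPollingQueue`, its method names, and the `wait_seconds` parameter are hypothetical names chosen for this illustration, not EQueue's API. A pull request returns immediately if messages are available; otherwise it is held until a message arrives or the wait time runs out.

```python
import threading


class LongPollingQueue:
    """Hypothetical broker-side queue that answers pull requests by long polling."""

    def __init__(self):
        self._messages = []                  # index in this list == queue offset
        self._cond = threading.Condition()

    def append(self, message):
        """Called when a producer's message reaches the broker."""
        with self._cond:
            self._messages.append(message)
            self._cond.notify_all()          # wake up any held pull requests

    def pull(self, from_offset, max_count=32, wait_seconds=5.0):
        """Return up to max_count messages starting at from_offset.

        If nothing is available yet, hold the request for at most
        wait_seconds instead of returning an empty result right away.
        """
        with self._cond:
            self._cond.wait_for(
                lambda: len(self._messages) > from_offset,
                timeout=wait_seconds,
            )
            return self._messages[from_offset:from_offset + max_count]
```

Because a held request is answered as soon as `append` is called, the consumer sees new messages almost as quickly as with a push model, while still deciding for itself when to ask for the next batch.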
EQueue is a topic-oriented architecture, unlike traditional queue-oriented products such as MSMQ. With EQueue, users do not need to care about queues. When a producer sends a message, it specifies the topic of the message rather than the specific queue to send it to. Similarly, when a consumer subscribes, it subscribes to a topic and does not need to care which queues it will receive messages from. Inside the producer client framework, all available queues are fetched for the current topic, one queue is chosen by a queue selection strategy, and the message is sent to that queue. On the consumer side, the framework obtains all queues under the subscribed topic together with all consumers currently subscribed to that topic, and computes by even distribution which queues the current consumer should be assigned. This allocation process is consumer load balancing.
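The two client-side steps just described can be sketched as follows, in Python for brevity. This is an assumption-laden illustration: round-robin is only one possible queue selection strategy, striding over a sorted list is only one possible way to divide queues evenly, and the names `RoundRobinSelector` and `allocate_queues` are hypothetical rather than taken from EQueue.

```python
import itertools


class RoundRobinSelector:
    """Hypothetical producer-side strategy: rotate through the topic's queues."""

    def __init__(self):
        self._counter = itertools.count()

    def select(self, queue_ids):
        queues = sorted(queue_ids)
        return queues[next(self._counter) % len(queues)]


def allocate_queues(queue_ids, consumer_ids, current_consumer):
    """Return the queues that current_consumer should pull from.

    Every consumer sorts the same inputs, so each one computes the same
    overall assignment independently and the shares never overlap.
    """
    queues = sorted(queue_ids)
    consumers = sorted(consumer_ids)
    if current_consumer not in consumers:
        return []
    index = consumers.index(current_consumer)
    # Consumer i takes queues i, i + n, i + 2n, ... where n = number of consumers.
    return queues[index::len(consumers)]
```

With 4 queues and 2 consumers, each consumer ends up with 2 queues; with 8 queues and 8 consumers, each consumer gets exactly one. If there are more consumers than queues, the surplus consumers receive an empty list and stay idle.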
The main responsibilities of the broker are:
- When a message is sent: receive the producer's message, persist it, build the message index information (the global offset of the message and its offset in the queue), and then return the result to the producer;
- When messages are consumed: query a batch of messages according to the consumer's pull request (by default, one pull request fetches up to 32 messages) and return them to the consumer.
Incidentally, if some of the basic concepts in EQueue are unclear, you can read the first introduction I wrote last year, which covers them in detail. Below, I would like to introduce some of EQueue's key features.
EQueue Key Features
High performance and reliability design
Network communication model. EQueue uses the SocketAsyncEventArgs class that ships with .NET, which is built on the Windows IOCP network model. Sending a message supports three modes: async, sync, and oneway. Whichever mode is used, the message is sent asynchronously internally. When a message is sent synchronously, the framework sends it asynchronously, waits for the result on our behalf, and returns the result to the sender; if no result arrives within a certain time, it reports a timeout exception. The asynchronous sending itself borrows the excellent socket message sending design from the open source EventStore project. In the tests so far, performance has been efficient and stable; several deployments have been running for a long time without any communication layer problems.
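The relationship between the three send modes can be sketched like this, in Python for brevity. It only illustrates the "synchronous send is an asynchronous send plus a bounded wait" idea; the function names, the `transport` callback, and `SendTimeoutError` are all hypothetical and say nothing about EQueue's actual socket layer.

```python
from concurrent.futures import Future, TimeoutError as FutureTimeoutError


class SendTimeoutError(Exception):
    """Raised when no send result arrives within the allowed time."""


def send_async(message, transport):
    """Hand the message to the transport; the Future completes when a reply arrives.

    transport is a stand-in callable (message, future) for the socket layer.
    """
    future = Future()
    transport(message, future)
    return future


def send_sync(message, transport, timeout_seconds=3.0):
    """Synchronous send: an async send plus a bounded wait for its result."""
    future = send_async(message, transport)
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeoutError:
        raise SendTimeoutError("no send result within %ss" % timeout_seconds) from None


def send_oneway(message, transport):
    """Oneway send: the same async path, but the result is never awaited."""
    send_async(message, transport)
```

Keeping a single asynchronous path underneath means the three modes differ only in what the caller does with the pending result.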
Broker message persistence. EQueue uses WAL (write-ahead log) technology together with asynchronous batch persistence to SQL Server, which ensures that messages are persisted efficiently and are not lost. When a message arrives at the broker, it is first written to a local log file; this technique is common in relational and NoSQL databases to ensure that messages or requests are not lost. The messages are then asynchronously persisted to SQL Server in batches, using the SqlBulkCopy class that ships with .NET. In this way we get both low latency and high throughput for message persistence, because handling a single message only requires writing it to the local log file and then putting it into an in-memory dictionary.
When the broker goes down unexpectedly, some messages may not yet have been persisted to SQL Server. So when the broker restarts, it first recovers all unconsumed messages from SQL Server into memory and records the offset of the last message in SQL Server. It then scans the local log files for all messages starting from offset+1 and restores them to SQL Server as well as to memory.
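The write path and the restart recovery described above can be sketched as follows, in Python for brevity. This is a simplified illustration under several assumptions: a plain dict stands in for the SQL Server table, the log is a single JSON-lines file rather than rolling log4net files, every persisted message is reloaded into memory (the real broker only reloads unconsumed ones), and the class and method names are hypothetical.

```python
import json
import os


class Broker:
    """Hypothetical broker: write-ahead log first, batch persistence later."""

    def __init__(self, log_path, db):
        self.log_path = log_path
        self.db = db              # dict offset -> body, stand-in for SQL Server
        self.memory = {}          # in-memory dictionary of messages
        self.pending = []         # logged but not yet persisted to db
        self.next_offset = 0

    def receive(self, body):
        offset = self.next_offset
        # 1. Append to the local log before anything else (the WAL step).
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"offset": offset, "body": body}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Keep the message in memory and queue it for batch persistence.
        self.memory[offset] = body
        self.pending.append((offset, body))
        self.next_offset += 1
        return offset

    def flush_batch(self):
        """Runs asynchronously in a real broker; persists the whole batch at once."""
        for offset, body in self.pending:
            self.db[offset] = body
        self.pending.clear()

    def recover(self):
        """After a restart: reload from db, then replay the log from offset+1."""
        self.memory = dict(self.db)
        last_persisted = max(self.db, default=-1)
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as log:
                for line in log:
                    record = json.loads(line)
                    if record["offset"] > last_persisted:
                        self.db[record["offset"]] = record["body"]
                        self.memory[record["offset"]] = record["body"]
        self.next_offset = max(self.memory, default=-1) + 1
```

A message received after the last `flush_batch` exists only in the log and in memory; after a crash, `recover` finds it in the log because its offset is greater than the last offset in the database.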
One thing worth mentioning: when we write messages to the local log, we cannot write them all into a single file, so the log has to be split into files. Currently the message log is written with log4net, one log file per 100MB. Why 100MB? Because the main purpose of the message log is to recover, when the broker restarts, the messages that had not yet been persisted to SQL Server. Under normal circumstances there should not be many of these, so we want the scan of the local log files to be as fast as possible. A 100MB log file can already hold a large number of messages, and the number of messages not yet persisted to SQL Server will usually not exceed that, unless messages could not be persisted for a long time before the crash; that situation should be caught by monitoring and acted on promptly. The size of each log file is of course configurable. One more point: recovery from the log files still needs some logic for locating the messages, because the unpersisted messages are not necessarily all in the most recent log file. As just mentioned, when a large number of messages have not been persisted to SQL Server, they may be spread across several log files.
In short, on the premise of maintaining high performance, messages are fully guaranteed not to be lost (reliability).
Message consumption. Consumers consume by pulling messages in batches. By default a single pull request fetches up to 32 messages (provided that many unconsumed messages exist), and the consumer then consumes them in parallel; besides parallel consumption, single-threaded sequential consumption can also be configured. When the broker looks up messages, they are normally always in memory; there is only one case where they are not, which is analyzed in detail below. So message lookup should be very fast.
However, the reliability described above only ensures, as far as possible, that a single machine does not lose messages, because the messages are stored in the DB and in the local log. If the DB server's hard disk or the broker's hard disk fails, messages can still be lost. Solving this requires replication. EQueue will next support broker clustering and failover. For now, I have developed a daemon service that monitors whether the broker process has died and automatically restarts it if so, which improves the broker's availability to some extent.
I think things should be kept as simple as possible; don't make them too complicated from the start. Complex things are often hard to maintain and control: the theory may be very good, but all kinds of problems always turn up in practice, hehe. It is like decentralized architecture: the theory seems very good, but in actual use you find that a centralized architecture is still better and more practical.
Supports consumer load balancing
Consumer load balancing means that all the consumers of a topic consume the queues under that topic in even shares. For anyone using a message queue, I think this feature is very important. Imagine that one day our site runs a promotion and the producers generate a surge of messages. If we still have only the original number of consumer servers, they will probably be unable to keep up with so many messages, and a large backlog will build up on the broker. Because many messages cannot be processed in time, the response time of user requests is eventually affected.
In this situation, we want the distributed message queue to let us easily add consumer machines dynamically to increase consumption capacity. EQueue supports this kind of dynamic scaling. Suppose a topic has 4 queues by default, and each queue is consumed by one consumer machine. When we want to double the number of consumers, we just add 4 queues to this topic in the EQueue web console and then add 4 more consumer machines. Because the EQueue client supports automatic load balancing, a few seconds later the 8 consumers will each be consuming their own queue. After the promotion, when message volume falls back to its normal level, we can remove the extra queues and the redundant consumer machines again.
In addition, EQueue takes care to make taking a queue offline smooth. We can first freeze a queue, which ensures that no new messages are sent to it. Then we wait until the messages in this queue have all been consumed, after which we can take the consumer machine offline and delete the queue. This is something that, I should say, even Alibaba's RocketMQ does not do, hehe.
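The freeze, drain, then delete sequence can be sketched as follows, in Python for brevity. The `Topic` and `Queue` classes and their members are hypothetical stand-ins for whatever the broker and web console actually do; the point is only the ordering of the three steps.

```python
class Queue:
    """Hypothetical queue state kept by the broker."""

    def __init__(self, queue_id):
        self.queue_id = queue_id
        self.frozen = False
        self.message_count = 0     # messages ever written to this queue
        self.consumed_count = 0    # messages already consumed

    def has_unconsumed(self):
        return self.consumed_count < self.message_count


class Topic:
    def __init__(self, queue_count):
        self.queues = {i: Queue(i) for i in range(queue_count)}

    def writable_queues(self):
        """Producers only ever see queues that are not frozen."""
        return [q.queue_id for q in self.queues.values() if not q.frozen]

    def freeze(self, queue_id):
        self.queues[queue_id].frozen = True

    def delete(self, queue_id):
        queue = self.queues[queue_id]
        if not queue.frozen:
            raise ValueError("freeze the queue before deleting it")
        if queue.has_unconsumed():
            raise ValueError("queue still has unconsumed messages")
        del self.queues[queue_id]
```

Refusing to delete a queue that is still writable or not yet drained is what keeps the scale-down from dropping messages.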
Broker supports large message backlogs
I previously wrote a dedicated article describing the design of this feature in detail; here is a brief introduction. One of the most important roles of an MQ is peak shaving: when a large volume of messages arrives in an instant and consumers cannot keep up, the message queue acts as a buffer so that the message consumer servers do not collapse. If RPC were used instead, all of the requests would eventually hit the DB, and the DB would be unable to withstand so many requests and would go down.
Therefore, we want the MQ to support message backlog, and not merely to keep messages in server memory for speed. Server memory is limited: suppose our message server has 128GB of memory and each message is 1KB; then at most about 130 million messages can accumulate. 130 million is generally enough, hehe, but it presupposes a large amount of memory. Sometimes we do not have that much server memory yet still want to be able to accumulate that many messages, and the MQ has to be designed to support this. EQueue lets us configure, at broker startup, how many messages may be kept in memory on the broker server, and how many entries of the mapping between a message's global offset and its queue offset (what I call the message index information) may be kept. We can configure these according to the size of our server's memory. The broker then runs a scheduled scan thread that periodically checks whether there are excess messages and message index entries and, if so, removes the excess. With this design, the server memory is guaranteed not to run out. There is one precondition, however: a message may only be removed from memory once it has been persisted to SQL Server; otherwise it cannot be removed. This can usually be guaranteed, because in general there will not be 100 million messages that have not been persisted to the DB. If that does happen, there must be a serious problem with the DB, or the broker cannot establish a connection to the DB. We should notice such a situation early, since the EQueue web monitoring console shows at any time both the maximum global offset of messages and the largest global offset that has been persisted.
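The capacity arithmetic and the scan-and-remove rule can be sketched as follows, in Python for brevity. The names are hypothetical, and evicting the oldest offsets first is an assumption of this sketch; the text only requires that a message be persisted before it is removed.

```python
def max_messages(memory_bytes, message_bytes):
    """Upper bound on messages that fit in memory, ignoring all overhead."""
    return memory_bytes // message_bytes


class MessageStore:
    """Hypothetical in-memory store with a configured message limit."""

    def __init__(self, max_in_memory):
        self.max_in_memory = max_in_memory
        self.memory = {}              # global offset -> message body
        self.persisted_offset = -1    # largest global offset already in the DB

    def add(self, offset, body):
        self.memory[offset] = body

    def mark_persisted(self, offset):
        self.persisted_offset = max(self.persisted_offset, offset)

    def remove_excess(self):
        """One pass of the periodic scan; returns how many messages were removed."""
        excess = len(self.memory) - self.max_in_memory
        removed = 0
        for offset in sorted(self.memory):        # oldest first (assumption)
            if removed >= excess or offset > self.persisted_offset:
                break                              # never drop unpersisted messages
            del self.memory[offset]
            removed += 1
        return removed
```

With 128GB taken as 128 * 1024**3 bytes and 1KB as 1024 bytes, `max_messages` gives 134,217,728, which is the "about 130 million" figure above. Note that `remove_excess` may leave the store above its limit when persistence lags behind, which is exactly the situation the monitoring console is meant to reveal.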
One problem with the design above is: what if the message a consumer wants to pull is no longer in memory? One way is to fetch that message from the DB into memory, but fetching messages one at a time is certainly too slow. So we can optimize: when we find that the current message is not in memory, the following messages are very likely not in memory either, so we pull 10,000 messages (configurable) from SQL Server at once. The subsequent 10,000 messages are then certainly in memory, and we do not need to access the DB again for them. This design is really a tradeoff between memory usage and pull performance. The Linux page cache serves the same purpose.
Another point is that when we restart the broker, we cannot simply restore all messages to memory; we have to check whether the maximum number of messages that memory is allowed to hold has been reached. Once it has, no more messages can be put into memory. The same applies to recovering the message index information. Otherwise, when too many messages have accumulated, restarting the broker would exhaust its memory.
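The batch read-ahead on a memory miss can be sketched like this, in Python for brevity. A dict stands in for the SQL Server table, and `MessageCache`, `batch_size`, and `db_reads` are hypothetical names introduced for this illustration.

```python
class MessageCache:
    """Hypothetical broker cache that loads a whole batch on a miss."""

    def __init__(self, db, batch_size=10000):
        self.db = db                  # dict offset -> body, stand-in for SQL Server
        self.batch_size = batch_size
        self.memory = {}
        self.db_reads = 0             # counts round trips to the database

    def get(self, offset):
        if offset not in self.memory:
            # Miss: the following messages are probably missing too, so load
            # batch_size consecutive offsets in one database round trip.
            self.db_reads += 1
            for o in range(offset, offset + self.batch_size):
                if o in self.db:
                    self.memory[o] = self.db[o]
        return self.memory.get(offset)
```

A consumer reading offsets in order then pays for one database round trip per `batch_size` messages rather than one per message.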
Design of message consumption progress updates
The design of EQueue's consumption progress follows the same idea as Kafka and RocketMQ: periodically save the consumption progress of each queue (the consumed queue offset), which is a long value. The advantage of this design is that we do not have to send an ACK reply to the broker immediately after consuming each message; if we did, the pressure on the broker would be great. Sending the consumption progress only periodically puts minimal pressure on the broker. So how is this consumption progress obtained? By using a sliding window. After the consumer pulls a batch of messages, it first puts them into a local in-memory SortedDictionary and then continues to pull the next batch. Tasks are then started to consume the just-pulled messages in parallel. So this local SortedDictionary holds all messages that have been pulled locally but not yet consumed. When a task thread finishes consuming a message, it removes the message from the SortedDictionary. The sliding window means that, after each removal, we take the queue offset of the message with the smallest key currently in the SortedDictionary. As messages continue to be consumed, this queue offset keeps growing; seen from a distance, it is like a window sliding forward.
However, this design has a problem: if the message with offset=100 in the dictionary has not been consumed, then even if all the messages after it have been consumed, the sliding window will not move forward, because the smallest queue offset in the dictionary is still 100. This should be easy to understand. In this case, when the consumer restarts, consumption will start again from 100, and the messages after it will be consumed a second time. So our consumers need to process messages idempotently.
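The sliding window and the stuck-at-100 case can be sketched as follows, in Python for brevity. Python has no built-in SortedDictionary, so this sketch uses a plain dict and `min()`, which is simpler but slower than a sorted structure. The class and method names are hypothetical, and treating "one past the largest pulled offset" as the progress when nothing is pending is an assumption of this sketch.

```python
class ProcessQueue:
    """Hypothetical consumer-side buffer of pulled but not yet consumed messages."""

    def __init__(self, start_offset=0):
        self._pending = {}                     # queue offset -> message
        self._max_pulled = start_offset - 1

    def add_pulled(self, offset, message):
        self._pending[offset] = message
        self._max_pulled = max(self._max_pulled, offset)

    def mark_consumed(self, offset):
        """Remove a consumed message and return the new progress."""
        self._pending.pop(offset, None)
        return self.progress()

    def progress(self):
        """Offset to resume from after a restart: the smallest pending key."""
        if self._pending:
            return min(self._pending)
        return self._max_pulled + 1
```

For example, after pulling offsets 98 to 102 and consuming 98, 99, 101, and 102, `progress()` is still 100; only once 100 itself is consumed does it jump to 103. A restart before that point replays 101 and 102, which is why message handling has to be idempotent.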
Support for message backtracking
Messages on the broker are not deleted immediately after they are consumed; they are deleted periodically, for example every 2 days (configurable). So when we want to re-consume messages from 1 day ago, EQueue fully supports it: just modify the consumption progress to a specific earlier value before the consumer starts.
Web Management Console
EQueue has a fairly complete web management console that we can use to manage topics, manage queues, view messages, view message consumption progress, and view information such as message backlog. It does not yet support alarms; alarm functionality will be added gradually.
With this console, using EQueue is much more convenient, and we can see the health of the message queue server in real time. Here is a screenshot of the management console UI to give you an impression:
EQueue plans for the future
- Broker clustering in master-slave mode, to make the broker more available and scalable;
- Alarm support in the web management console;
- A performance test report; at present I have no real servers, so I have no way to actually test;
- Consider supporting persistence without a DB, such as local key/value storage, or fully local file-based message persistence (very difficult);
- Other small feature improvements and local code adjustments.
I believe there is nothing that cannot be done well, only a lack of patience.
EQueue: Introduction to a Distributed Message Queue Written in Pure C#, Part 2