Objective
The open source community offers many excellent message queue middleware projects, such as RabbitMQ and Kafka. Each queue has its own characteristics, and when selecting one for a project it is easy to feel dazzled and overwhelmed. Between RabbitMQ and Kafka, which one should you choose?
RabbitMQ Architecture
RabbitMQ is a distributed system built on a few abstract concepts.
- Broker: The service process running on each node. It handles adding, deleting, and other maintenance of the queues on that node, and it forwards queue operation requests.
- Master queue: Each queue consists of one master queue and several mirror queues.
- Mirror queue: A backup of the master queue. If the node hosting the master queue goes down, the system promotes a mirror queue to master, and it takes over handling client queue operations. Note that a mirror queue is only a mirror; it is not designed to take client read or write load.
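The promotion step described above can be sketched in a few lines. This is a toy model and not RabbitMQ's actual implementation: the `MirroredQueue` class, the node names, and the rule "promote the first mirror in the list" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MirroredQueue:
    """Toy model of one queue: a master copy plus mirror copies on other nodes."""
    name: str
    master_node: str
    mirror_nodes: List[str] = field(default_factory=list)

    def fail_node(self, node: str) -> None:
        """Simulate a node going down; promote a mirror if the master is lost."""
        if node in self.mirror_nodes:
            # Losing a mirror only reduces the number of backups.
            self.mirror_nodes.remove(node)
        elif node == self.master_node:
            if not self.mirror_nodes:
                raise RuntimeError(f"queue {self.name!r} has no mirror left to promote")
            # Assumption: the first mirror in the list becomes the new master.
            self.master_node = self.mirror_nodes.pop(0)


queue_a = MirroredQueue("A", master_node="node1", mirror_nodes=["node2"])
queue_a.fail_node("node1")
print(queue_a.master_node)  # node2
```

After the failure, client operations on queue A would be served by node2, and the queue has no backup left until another mirror is added.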
As shown in the figure, the cluster has two nodes with one broker on each. Each broker maintains the queues on its own node, and brokers can communicate with each other. The cluster has two queues, A and B, and each is divided into a master queue and a mirror queue (backup). So how do production and consumption on a queue actually work?
Queue consumption
Suppose two consumers consume queue A and are connected to different machines in the cluster. Every node in a RabbitMQ cluster holds the metadata for all queues in the cluster, so a client can connect to any node. The main difference is that some consumers end up connected to the node hosting the master queue and others to a node that does not host it.
Keeping the mirror queues consistent with the master queue requires a synchronization mechanism, and because of that consistency constraint all reads and writes must go through the master queue; the master node then synchronizes each operation to the nodes where the mirror queues reside. (Think about why reads must also go to the master queue: consuming a message changes the state of the queue, so this is not like read/write splitting in a database.) Even if a consumer is connected to a non-master node, its operations are routed to the node hosting the master queue.
Queue production
Production follows the same principle as consumption: if a producer is connected to a non-master node, the request is routed to the node hosting the master queue.
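The routing rule for both consumption and production can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions, not RabbitMQ code: the `Cluster` class, its `publish` and `consume` methods, and the `forwarded` counter are hypothetical names, and synchronization to mirrors is modeled as an immediate copy.

```python
class Cluster:
    """Toy model: every node knows which node hosts each queue's master."""

    def __init__(self, masters, mirrors):
        self.masters = masters  # queue name -> node hosting the master queue
        self.mirrors = mirrors  # queue name -> nodes hosting mirror queues
        # One copy of each queue's contents per node that hosts it.
        self.storage = {
            (queue, node): []
            for queue in masters
            for node in [masters[queue], *mirrors.get(queue, [])]
        }
        self.forwarded = 0  # operations that had to hop to another node

    def _route(self, connected_node, queue):
        """Return the master node, counting a hop if the client is elsewhere."""
        master = self.masters[queue]
        if connected_node != master:
            self.forwarded += 1
        return master

    def publish(self, connected_node, queue, message):
        master = self._route(connected_node, queue)
        self.storage[(queue, master)].append(message)
        # The master then synchronizes the write to every mirror.
        for node in self.mirrors.get(queue, []):
            self.storage[(queue, node)].append(message)

    def consume(self, connected_node, queue):
        master = self._route(connected_node, queue)
        if not self.storage[(queue, master)]:
            return None
        message = self.storage[(queue, master)].pop(0)
        # Consumption changes queue state, so it is synchronized as well.
        for node in self.mirrors.get(queue, []):
            self.storage[(queue, node)].pop(0)
        return message


cluster = Cluster(masters={"A": "node1", "B": "node2"},
                  mirrors={"A": ["node2"], "B": ["node1"]})
cluster.publish("node2", "A", "hello")  # connected to a non-master node for A
cluster.publish("node1", "A", "world")  # connected to the master node for A
first = cluster.consume("node2", "A")   # routed to node1
print(first, cluster.forwarded)         # hello 2
```

Whichever node a client connects to, every operation on queue A ends up on node1, which is why that single node determines the queue's throughput.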
By now readers can probably see RabbitMQ's weakness: because each queue has a single master node, that node becomes a performance bottleneck and limits the queue's throughput. Although RabbitMQ is implemented in Erlang to improve performance, that cannot remove this fundamental flaw in the architecture.
Kafka
To be honest, I think Kafka was designed as an improvement on exactly this flaw in RabbitMQ. The improvement is to turn the single master of a queue into multiple masters: if one machine cannot carry the QPS, use several machines and spread the traffic of a queue evenly across them. Note that the masters do not share any data: a given message is sent to exactly one master queue and never to another.
In Kafka each of these master queues is called a partition, that is, a shard. A queue has multiple primary shards, each primary shard has several secondary shards as backups, and the synchronization mechanism is similar to RabbitMQ's.
For simplicity, ignore other queues and assume the cluster holds only one queue (called a topic in Kafka). Each producer sends each message to a randomly chosen primary shard, and the primary shard then synchronizes it to its secondary shards.
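A minimal sketch of this write path, assuming random shard selection as described above. The `Topic` class and its `send` method are hypothetical and do not correspond to any Kafka client API; real Kafka clients usually pick a partition by hashing the message key or by spreading keyless messages across partitions, and replication is not an immediate in-process copy.

```python
import random


class Topic:
    """Toy model of a topic: several primary shards, each with secondary copies."""

    def __init__(self, name, num_partitions, replicas_per_partition=1):
        self.name = name
        self.primaries = [[] for _ in range(num_partitions)]
        self.secondaries = [
            [[] for _ in range(replicas_per_partition)]
            for _ in range(num_partitions)
        ]

    def send(self, message, rng):
        """Append the message to one randomly chosen primary shard only."""
        partition = rng.randrange(len(self.primaries))
        self.primaries[partition].append(message)
        # The primary shard then synchronizes the message to its secondaries.
        for replica in self.secondaries[partition]:
            replica.append(message)
        return partition


rng = random.Random(0)  # fixed seed so the run is repeatable
topic = Topic("orders", num_partitions=4)
for i in range(1000):
    topic.send(f"msg-{i}", rng)

sizes = [len(shard) for shard in topic.primaries]
print(sizes, sum(sizes))  # four roughly equal counts that add up to 1000
```

Each message lands in exactly one primary shard, so the shards hold disjoint data and the write load of the topic is split across them.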
Reading introduces the concept of a consumer group. Within a group, each message of a topic is routed to only one consumer, so consumers in the same group receive different messages. Different groups share the same topic, and to each group it looks as if it had its own copy of the queue.
To let several groups share a topic's data, Kafka does not delete a message as soon as it is consumed, as RabbitMQ does. Instead a retention period is configured, and only messages older than that period are deleted from disk. This keeps the topic's data visible to every group within the retention period, a feature that makes Kafka well suited to serve as a company's data bus.
Reads also go to the primary shards. To optimize performance, within a group each primary shard is read by exactly one consumer, so if a group has more consumers than the topic has shards, some consumers receive no messages.
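Both ideas, shard assignment within a group and time-based retention, can be sketched as follows. The functions `assign_partitions` and `prune` are hypothetical, and round-robin assignment is an assumption chosen for simplicity; this is not how any particular Kafka assignor is implemented.

```python
def assign_partitions(partitions, consumers):
    """Give each shard to exactly one consumer in the group (round robin)."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


def prune(log, now, retention_seconds):
    """Keep only messages inside the retention window; reading never deletes."""
    return [(ts, msg) for ts, msg in log if now - ts <= retention_seconds]


partitions = [0, 1, 2]
# Each group gets its own independent assignment over the same topic.
group_a = assign_partitions(partitions, ["a1", "a2", "a3", "a4"])
group_b = assign_partitions(partitions, ["b1"])
print(group_a)  # {'a1': [0], 'a2': [1], 'a3': [2], 'a4': []}
print(group_b)  # {'b1': [0, 1, 2]}

log = [(0, "old"), (50, "recent"), (90, "new")]  # (timestamp, message) pairs
print(prune(log, now=100, retention_seconds=60))  # [(50, 'recent'), (90, 'new')]
```

In group A the fourth consumer is idle because there are only three shards, while the single consumer in group B reads all of them.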
Kafka is therefore clearly designed for high throughput. For example, with the number of shards set to 100, up to 100 machines can carry the traffic of one topic, which of course outperforms RabbitMQ's design of one machine per queue.
Summary
This article compares only Kafka and RabbitMQ, but they are not the only open source queues; there are also ZeroMQ, RocketMQ, JMQ, and others. Time was limited and I have not studied them, so they are outside the scope of this comparison.
So do not be dazzled by the variety of queues. Identify the key architectural differences, combine them with your actual requirements (this article considers only one requirement, throughput), and the selection becomes easy. The final summary is as follows:
- Low throughput: Either Kafka or RabbitMQ will do.
- High throughput: Kafka.
This article draws on the official RabbitMQ and Kafka documentation. If you really want to understand how a piece of middleware works, the official documentation is the best place to look: it describes the design in detail, so you can compare the designs yourself and find the middleware that fits your situation.