[This article was originally written by Yves Trudeau.] http://java.dzone.com/articles/exploring-message-brokers
Message brokers are not regularly covered here but are, nonetheless, important web-related technologies. Some time ago, I was asked by one of our customers to review a selection of OSS message brokers and propose a couple of good candidates. The requirements were fairly simple: behave well when there's a large backlog of messages, be able to create a cluster and, in case of the failure of a node in a cluster, try to protect the data but never block the publishers, even though that might imply data loss. Nothing fancy regarding queue and topic management. I decided to write my findings here, before I forget...
I don't consider myself a message broker specialist and I spent only about a day or two on each, so I could have made some big mistakes configuration-wise. I'll take the blame if something is misconfigured or not used correctly.
RabbitMQ
RabbitMQ is a well-known and popular message broker and it has many powerful features. The documentation on the RabbitMQ web site is excellent and there are many books available. RabbitMQ is written in Erlang, not a widely used programming language but one well adapted to such tasks. The company Pivotal develops and maintains RabbitMQ. I reviewed version 3.2.2 on CentOS 6 servers.
The installation was easy: I installed Erlang version R14B from EPEL and the RabbitMQ rpm. The only small issue I had was that the server expects "127.0.0.1" to be resolved in /etc/hosts and the OpenStack VMs I used were missing that. Easy to fix. I also installed and enabled the management plugin.
The RabbitMQ configuration is set by the rabbitmq.config file and it has tons of adjustable parameters. I used the defaults. In terms of client API, RabbitMQ supports a long list of languages, and some standard protocols, like STOMP, are available with a plugin. Queues and topics can be created either through the web interface or through the client API directly. If you have more than one node, they can be clustered and then queues and topics can be replicated to other servers.
I created 4 queues, wrote a Ruby client and started inserting messages. I got a publishing rate of about 20k/s using multiple threads, but I got a few stalls caused by the vm_memory_high_watermark; from my understanding, during those stalls RabbitMQ is writing to disk. Not exactly awesome given my requirements. Also, some part was always kept in memory even if a queue was durable so, even though I had plenty of disk space, the memory usage grew and eventually hit the vm_memory_high_watermark setting. The CPU load was pretty high during the test, between 40% and 50% on an 8-core VM.
Even though my requirements were not met, I set up a replicated queue on 2 nodes and inserted a few million objects. I killed one of the nodes and inserts were even faster, and then... I made a mistake. I restarted the node and asked for a resync. Either I didn't set it up correctly or the resync is poorly implemented, but it took forever to resync and it was slowing down as it progressed. At 58% done, it had been running for 17h with one thread at 100%. My patience was exhausted.
So, lots of features and decent performance, but the behavior is not compatible with the requirements.
Kafka
Kafka was originally designed by LinkedIn, it is written in Java and it is now under the Apache project umbrella. Sometimes you look at a technology and you just say: wow, that's really done the way it should be. At least I could say this for the purpose I had. What's so special about Kafka is the architecture: it stores the messages in flat files and consumers ask for messages based on an offset. Think of it like a MySQL server (producer) saving messages (SQL updates) to its binlogs and slaves (consumers) asking for messages based on an offset. The server is pretty simple and just doesn't care much about the consumers. That simplicity makes it super fast and low on resources. Old messages can be retained on a time basis, like expire_logs_days, and/or on a storage usage basis.
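To make the offset idea concrete, here is a minimal sketch in Python of an append-only log where the broker only stores messages and readers say where they want to start. It is a toy model, not Kafka's implementation: the class name `MessageLog`, its methods and the `retention_seconds` parameter are all made up for illustration.

```python
import time


class MessageLog:
    """Toy append-only log: the broker stores messages, readers supply offsets."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.base_offset = 0   # offset of the oldest message still retained
        self.entries = []      # list of (timestamp, payload), oldest first

    def append(self, payload, now=None):
        """Store a message and return the offset assigned to it."""
        timestamp = time.time() if now is None else now
        self.entries.append((timestamp, payload))
        return self.base_offset + len(self.entries) - 1

    def read(self, offset, max_messages=10):
        """Return (next_offset, payloads) starting at the requested offset."""
        start = max(offset, self.base_offset) - self.base_offset
        batch = self.entries[start:start + max_messages]
        return self.base_offset + start + len(batch), [p for _, p in batch]

    def expire(self, now=None):
        """Drop messages older than the retention window, oldest first."""
        now = time.time() if now is None else now
        cutoff = now - self.retention_seconds
        while self.entries and self.entries[0][0] < cutoff:
            self.entries.pop(0)
            self.base_offset += 1


log = MessageLog(retention_seconds=60)
for i in range(5):
    log.append(f"update {i}", now=100 + i)

# A reader asks for two messages starting at offset 2.
next_offset, batch = log.read(offset=2, max_messages=2)
```

Notice that `read` changes nothing on the log. The reader keeps `next_offset` itself and passes it back on the next call, which is why the server side can stay so simple.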
So, if the server doesn't keep track of what has been consumed on each topic, how can you have multiple consumers? The missing element is Zookeeper. The Kafka server uses Zookeeper for cluster membership and routing, while the consumers can also use Zookeeper or something else for synchronization. The sample consumer provided with the server uses Zookeeper, so you can launch many instances and they'll synchronize automatically. For those who don't know Zookeeper, it is a highly-available synchronous distributed storage system. If you know Corosync, it provides somewhat the same functionality.
Feature-wise, Kafka isn't that great. There's no built-in web frontend, although a few are available in the ecosystem. Routing and rules are nonexistent and stats are only available through JMX. But the performance... I reached a publishing speed of 165k messages/s with a single thread; I didn't bother tuning for more. Consuming was essentially disk-bound on the server, 3M messages/s... amazing. That was without Zookeeper coordination. Memory and CPU usage were modest.
To test clustering, I created a replicated queue, inserted a few messages, stopped a replica, inserted a few million more messages and restarted the replica. It took only a few seconds to resync.
So, Kafka is a very good fit for the requirements: stellar performance and low resource usage.
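The division of labor can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `OffsetStore` is a hypothetical in-memory stand-in for Zookeeper (or whatever the consumers use), the topic is a plain list, and the group and topic names are invented.

```python
class OffsetStore:
    """Stand-in for Zookeeper: maps (group, topic) to the next offset to read."""

    def __init__(self):
        self._offsets = {}

    def fetch(self, group, topic):
        return self._offsets.get((group, topic), 0)

    def commit(self, group, topic, offset):
        self._offsets[(group, topic)] = offset


def consume(topic_log, store, group, topic, max_messages):
    """Read the next batch for a group and record how far it got."""
    start = store.fetch(group, topic)
    batch = topic_log[start:start + max_messages]
    store.commit(group, topic, start + len(batch))
    return batch


topic_log = ["m0", "m1", "m2", "m3", "m4"]   # broker side: just an ordered log
store = OffsetStore()

first = consume(topic_log, store, "billing", "events", 2)
second = consume(topic_log, store, "billing", "events", 2)
other = consume(topic_log, store, "audit", "events", 3)
```

Two calls from the same group continue where the previous one stopped, while a different group starts from the beginning, all without the log itself knowing anything about its readers.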
ActiveMQ
ActiveMQ is another big player in the field with an impressive feature set. ActiveMQ is more in the RabbitMQ league than Kafka's and, like Kafka, it is written in Java. HA can be provided by the storage backend; LevelDB supports replication but I had some issues with it. My requirements are not for full HA, just to make sure the publishers are never blocked, so I dropped the storage backend replication in favor of a mesh of brokers.
My understanding of the mesh of brokers is that you connect to one of the members and you publish or consume a message. You don't know on which node(s) the queue is located; the broker you connect to knows and routes your request. To further help, you can specify all the brokers in the connection string and the client library will just reconnect to another if the one it is connected to goes down. That looks pretty good for the requirements.
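The reconnect behavior boils down to walking a list of brokers until one answers. Here is a minimal Python sketch of that logic. It is not the ActiveMQ client: `connect_with_failover` and `fake_connect` are hypothetical helpers, and the broker host names are invented (61616 is ActiveMQ's usual OpenWire port).

```python
def connect_with_failover(broker_urls, connect):
    """Try each broker in order; return (url, connection) for the first that answers."""
    errors = []
    for url in broker_urls:
        try:
            return url, connect(url)
        except ConnectionError as exc:
            errors.append(f"{url}: {exc}")
    raise ConnectionError("no broker reachable: " + "; ".join(errors))


# Hypothetical stand-in for a real client library's connect call.
def fake_connect(url, down=("tcp://broker1:61616",)):
    if url in down:
        raise ConnectionError("connection refused")
    return {"connected_to": url}


brokers = [
    "tcp://broker1:61616",
    "tcp://broker2:61616",
    "tcp://broker3:61616",
]
url, connection = connect_with_failover(brokers, fake_connect)
```

With the real ActiveMQ clients you would not write this loop yourself: the broker list goes into a failover transport URI of the form `failover:(tcp://broker1:61616,tcp://broker2:61616)` and the library handles the reconnection.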
With the mesh of brokers setup, I got an insert rate of about [...] msg/s over [...] threads, and a single consumer was able to read [...] msg/s. I let it run for a while and got 150M messages. At this point though, I lost the web interface and the publishing rate was much slower.
So, a big beast: lots of features, decent performance, on the edge with the requirements.
Kestrel
Kestrel is another interesting broker, this time more like Kafka. Written in Scala, the Kestrel broker speaks the memcached protocol. Basically, the key becomes the queue name and the object is the message. Kestrel is very simple: queues are defined in a configuration file, but you can specify, per queue, storage limits, expiration and behavior when limits are reached. With a setting like "discardOldWhenFull = true", my requirement of never blocking the publishers was easily met.
In terms of clustering, Kestrel is a bit limited, but each server can publish its availability to Zookeeper so that publishers and consumers can be informed of a missing server and adjust. Of course, if you have many Kestrel servers with the same queue defined, the consumers will need to query all of the brokers to get the messages back, and strict ordering can be a bit hard.
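The effect of a discard-old-when-full policy is easy to picture with a bounded queue. Below is a small Python sketch of the idea only. It says nothing about how Kestrel implements it, and `BoundedQueue` and `max_items` are names made up for the example.

```python
from collections import deque


class BoundedQueue:
    """Queue that never blocks publishers: when full, the oldest item is dropped."""

    def __init__(self, max_items):
        self.items = deque(maxlen=max_items)
        self.discarded = 0

    def put(self, message):
        if len(self.items) == self.items.maxlen:
            self.discarded += 1   # the deque evicts the oldest entry on append
        self.items.append(message)

    def get(self):
        """Return the oldest retained message, or None if the queue is empty."""
        return self.items.popleft() if self.items else None


queue = BoundedQueue(max_items=3)
for n in range(5):
    queue.put(f"msg-{n}")
```

After five inserts into a queue of three, the two oldest messages are gone and the publisher never had to wait. That is exactly the trade-off in the requirements: protect the data when possible, but accept loss rather than block.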
In terms of performance, a few simple bash scripts using nc to publish messages easily reached 10k messages/s, which is very good. The rate is stable over time and likely limited by the reconnection for each message. The presence of consumers slightly reduces the publishing rate, but nothing drastic. The only issue I had was when a large number of messages expired: the server froze for some time, and that is because I forgot to set maxExpireSweep to something like [...], so all of the messages were removed in one pass.
So, a fairly good impression of Kestrel: simple but works well.
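Since the broker speaks the memcached text protocol, a publish is just a `set` whose key is the queue name. Here is a hedged Python sketch that builds the request and, unlike one nc call per message, reuses a single connection. The function names and the queue name `jobs` are hypothetical, and I have not benchmarked this against the bash scripts.

```python
import socket


def build_set_command(queue_name, message, expiry=0):
    """Build a memcached text-protocol 'set' request; the key is the queue name."""
    payload = message.encode("utf-8")
    header = f"set {queue_name} 0 {expiry} {len(payload)}\r\n".encode("ascii")
    return header + payload + b"\r\n"


def publish_all(host, port, queue_name, messages):
    """Send every message over one connection instead of reconnecting each time."""
    stored = 0
    with socket.create_connection((host, port), timeout=5) as conn:
        with conn.makefile("rb") as reader:
            for message in messages:
                conn.sendall(build_set_command(queue_name, message))
                # A memcached server answers a successful set with "STORED".
                if reader.readline().strip() == b"STORED":
                    stored += 1
    return stored


frame = build_set_command("jobs", "hello")
```

Something like `publish_all("localhost", 22133, "jobs", ["a", "b"])` would then push two messages, assuming a Kestrel server is listening there. The port is an assumption on my part (22133 is the memcached port I remember Kestrel using by default), so adjust it to your configuration.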
Conclusion
For the requirements given by the customer, Kafka was a natural fit. It offers a high guarantee that the service will be available and non-blocking under any circumstances. In addition, messages can easily be replicated for higher data availability. Kafka performance is just great and resource usage modest.
Published at DZone with permission of Peter Zaitsev, author and DZone MVB.
Exploring Message Brokers: RabbitMQ, Kafka, ActiveMQ, and Kestrel