I. Kafka Introduction
Kafka is a distributed publish-subscribe messaging system. Originally developed at LinkedIn, it was written in Scala and later became an Apache project. Kafka is a distributed, partitioned, multi-subscriber, replicated, persistent log service. It is mainly used for processing active streaming data (real-time computing).
Big data systems often run into the same problem: the overall system is made up of many subsystems, and data needs to flow continuously among them with high performance and low latency. Traditional enterprise messaging systems are not well suited to large-scale data processing. Kafka emerged to serve both online applications (messages) and offline applications (data files, logs) at the same time. Kafka plays two roles:
1. It reduces the complexity of the system's network.
2. It reduces programming complexity. Subsystems no longer negotiate interfaces with one another; each subsystem plugs into Kafka like a plug into a socket, and Kafka acts as a high-speed data bus.
II. Main features of Kafka
1. High throughput for both publishing and subscribing. Reportedly, Kafka can produce about 250,000 messages per second and process about 550,000 messages per second.
2. Persistence. Messages are persisted to disk, so Kafka can serve both batch consumption (such as ETL) and real-time applications. Persisting data to disk and replicating it prevents data loss.
3. A distributed system that is easy to scale out and can be combined with ZooKeeper. There can be multiple producers, brokers, and consumers, all distributed. Machines can be added without downtime.
4. The state of which messages have been processed is maintained on the consumer side, not by the server. Load is rebalanced automatically when a failure occurs.
5. Support for both online and offline scenarios.
III. Why use a messaging system
Systems can communicate through a message queue, which handles the coordination and invocation between them.
Note: the difference between using a message queue and an SOA architecture:
1. In SOA, one system calls the other directly (for example via RPC or HttpClient).
2. With a message queue, coordination and invocation between the two systems happen through the delivery of messages.
Benefits:
1. Decoupling
With a message queue, there is no direct call relationship between the two systems. They interact only by passing messages, so neither system intrudes on the other.
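The decoupling described above can be sketched with Python's standard-library `queue.Queue` standing in for a real message queue. This is only an in-memory illustration, not Kafka's API; the names `producer_system`, `consumer_system`, and the `order_paid` event are hypothetical.

```python
import queue
import threading

# Hypothetical in-memory stand-in for a message queue (not a Kafka client).
message_queue = queue.Queue()
processed = []

def producer_system():
    """System A: knows only the queue, not who consumes the messages."""
    for order_id in range(3):
        message_queue.put({"event": "order_paid", "order_id": order_id})
    message_queue.put(None)  # sentinel: no more messages

def consumer_system():
    """System B: knows only the queue, not who produced the messages."""
    while True:
        msg = message_queue.get()
        if msg is None:
            break
        processed.append(msg["order_id"])

consumer = threading.Thread(target=consumer_system)
consumer.start()
producer_system()
consumer.join()
print(processed)  # [0, 1, 2]
```

Neither function calls the other; either side could be replaced without touching the other, as long as the message format stays the same.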
2. Improve system response speed
Example: order processing
orderPaymentSucceeded() {
    1. modify order status
    2. calculate loyalty points
    3. notify logistics for delivery
}
Note:
1. Originally, all three steps had to finish within the same call before the system could return, which is time-consuming.
2. With a message queue, the system can first handle what the user cares about most and most urgently wants to see: confirmation that the order status has changed. "Modify order status" is processed first and the result is returned to the user immediately; "Calculate loyalty points" and "Notify logistics for delivery" are put on the message queue and processed by downstream systems.
Redundancy
Sometimes data processing fails. Unless the data has been persisted, it is lost. A message queue persists data until it has been fully processed, which avoids this risk. In the "insert-get-delete" paradigm used by many message queues, the processing system must explicitly indicate that a message has been processed before the message is removed from the queue, so your data stays safely stored until you have finished using it.
Scalability
Because a message queue decouples your processing steps, it is easy to increase the rate at which messages are enqueued and processed: simply add more processing capacity. No code changes and no parameter tuning are needed. Scaling out is as simple as adding more power.
Flexibility & peak handling capability
An application must keep working when traffic surges, but such bursts are uncommon, and keeping enough resources on standby to handle peak load all the time would be a huge waste. A message queue lets critical components withstand sudden access pressure instead of collapsing under a burst of overload requests.
Recoverability
When one part of the system fails, the whole system is not affected. A message queue reduces the coupling between processes, so even if a process that handles messages crashes, the messages already in the queue can still be processed once the system recovers.
Order guarantee
In most usage scenarios, the order in which data is processed matters. Most message queues are inherently ordered and can guarantee that data is processed in a specific order. Kafka guarantees the ordering of messages within a partition.
Buffering
In any non-trivial system, some components need different processing times than others. For example, loading a picture takes less time than applying a filter to it. A message queue adds a buffer layer that lets each task run as efficiently as possible: writes to the queue can proceed as fast as the writer is able, without waiting for the reader. This buffering helps control and optimize the speed at which data flows through the system.
Asynchronous communication
Often, users do not want or need to process a message immediately. A message queue provides an asynchronous processing mechanism: a user can put a message on the queue without processing it right away. Put as many messages on the queue as you like, then process them when you need to.
IV. Classification of message queues
Message queues fall into two categories: point-to-point and publish/subscribe.
1. Point-to-point
The message producer sends messages to a queue, and a message consumer takes messages out of the queue and consumes them.
Note (disadvantages):
1. Once a message has been consumed, it is no longer stored in the queue, so a consumer cannot consume a message that has already been consumed.
2. A queue can have multiple consumers, but any given message can be consumed by only one of them. (Once one system has consumed the message, no other system can consume it.)
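The point-to-point behaviour above can be illustrated with a small in-memory sketch, assuming Python's `queue.Queue` as a stand-in for the queue. The consumer names and message labels are hypothetical. Two consumers compete for the same queue, and each message ends up with exactly one of them.

```python
import queue
import threading

work_queue = queue.Queue()
consumed_by = {}          # message -> names of the consumers that received it
lock = threading.Lock()

def consumer(name):
    """Take messages until the queue is empty; a taken message is gone."""
    while True:
        try:
            msg = work_queue.get_nowait()
        except queue.Empty:
            return
        with lock:
            consumed_by.setdefault(msg, []).append(name)

# Pre-load ten messages, then let two consumers compete for them.
for i in range(10):
    work_queue.put(f"msg-{i}")

workers = [threading.Thread(target=consumer, args=(name,))
           for name in ("consumer-A", "consumer-B")]
for w in workers:
    w.start()
for w in workers:
    w.join()

# Every message was delivered to exactly one consumer.
print(all(len(names) == 1 for names in consumed_by.values()))
```

How the ten messages are split between the two consumers depends on thread scheduling; what does not vary is that no message is delivered twice.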
2. Publish/subscribe (most commonly used)
The message producer (publisher) publishes messages to a topic, and multiple message consumers (subscribers) consume them. Unlike point-to-point, a message published to a topic is consumed by all of the topic's subscribers.
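By contrast with point-to-point, publish/subscribe fans each message out to every subscriber. The following is a minimal in-memory sketch; the `Broker` class, its methods, and the `orders` topic are hypothetical illustrations, not Kafka's API.

```python
from collections import defaultdict

class Broker:
    """Toy broker: each subscriber gets its own inbox per topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of inboxes

    def subscribe(self, topic):
        inbox = []
        self._subscribers[topic].append(inbox)
        return inbox

    def publish(self, topic, message):
        # Deliver a copy of the message to every subscriber of the topic.
        for inbox in self._subscribers[topic]:
            inbox.append(message)

broker = Broker()
billing_inbox = broker.subscribe("orders")
shipping_inbox = broker.subscribe("orders")

broker.publish("orders", "order-1 paid")
broker.publish("orders", "order-2 paid")

print(billing_inbox)   # ['order-1 paid', 'order-2 paid']
print(shipping_inbox)  # ['order-1 paid', 'order-2 paid']
```

Both subscribers see both messages, whereas in the point-to-point model each message would have gone to only one of them.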
V. Comparison of common message queues
1. RabbitMQ: supports many protocols. It is a very heavyweight message queue with good support for routing, load balancing, and data persistence.
2. ZeroMQ: described as the fastest message queue system, especially suited to high-throughput scenarios. It excels at advanced/complex queues, but it is also technically complex to use and provides only non-persistent queues.
3. ActiveMQ (an implementation of JMS): a subproject of Apache. Similar to ZeroMQ, it can implement queues using broker and peer-to-peer technology.
4. Redis: a key-value NoSQL database that also supports MQ functionality. With small messages its performance is better than RabbitMQ's, but with data larger than 10K it becomes unbearably slow.
Note: a message queue must not be a single point of failure, so it needs to be clustered. This involves load balancing and message persistence.
VI. Kafka test results