Overview
A messaging system usually consists of producers, consumers, and brokers. A producer writes messages to the broker, and a consumer reads messages from the broker. Different MQ implementations implement the broker differently, but in essence the broker is responsible for persisting messages to the server's storage system. Both the producer and the consumer are clients of the broker: one produces messages and the other consumes them. In Figure 2-1, the producer and the consumer both send client requests to the server to store or fetch messages, and a connection object on each of the client and server sides is responsible for sending and receiving those requests, as follows:
The producer client application generates a message
The client connection object wraps the message into a request and sends it to the server
The server's entry point also has a connection object, which receives the request and stores the message as a file
The server returns the response to the producer client
The consumer client application asks to consume messages
The client connection object wraps the consumer's information into a request and sends it to the server
The server reads the messages from the file storage system
The server returns the response to the consumer client
The client converts the response back into messages and begins processing them
Figure 2-1 Client and server interaction
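The "wrap the message into a request" and "convert the response back into a message" steps above can be sketched with a size-prefixed frame. This is only a minimal illustration under stated assumptions, not Kafka's actual wire protocol: the class FramedMessage and its methods are hypothetical names, and the payload is assumed to be a plain UTF-8 string.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: wraps a message into a size-prefixed request and back.
class FramedMessage {
    // Sending side: 4-byte length header followed by the UTF-8 payload.
    static ByteBuffer wrap(String message) {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        ByteBuffer request = ByteBuffer.allocate(4 + payload.length);
        request.putInt(payload.length);
        request.put(payload);
        request.flip(); // switch the buffer from writing to reading
        return request;
    }

    // Receiving side: read the length header, then exactly that many bytes.
    static String unwrap(ByteBuffer request) {
        int size = request.getInt();
        byte[] payload = new byte[size];
        request.get(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer request = wrap("hello broker");
        System.out.println(request.remaining() + " bytes in the request");
        System.out.println(unwrap(request));
    }
}
```

The length header lets the receiver know where one request ends and the next begins when several requests share one connection.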
Kafka is a distributed message storage system, so the producer client has to deliver messages to the Kafka cluster, which stores them. This chapter uses the Kafka producer implementation as its entry point. While reading the source code analysis, keep the following questions in mind:
How does the producer ensure that messages are stored in a distributed manner to the Kafka cluster?
How does the producer client organize messages, send messages, and receive a response from the server?
How do the client and the server communicate, and how is the threading model used to make that communication more efficient?
This chapter focuses on the network communication between the client and the server and, for now, does not cover the Kafka server-side implementation. Any distributed system needs a network-level communication mechanism that moves data between nodes, and this bottom layer of the framework has to handle protocol encoding and decoding as well as the sending and receiving of client and server requests. Java network programming started with plain sockets and later evolved to the selector model; combined with a queue model and a buffer mechanism, these can be used to design a network-layer communication framework for your own system. Although the communication model has little to do with the server-side architecture, extra features such as timeout retries and serialization can be added to this bottom layer, so that the server can concentrate on its main business logic without paying too much attention to the various exceptions of the network layer.
In a distributed system, the protocol is defined by the server; as long as a client follows the protocol when sending requests, the server can receive and process them correctly. The client can therefore be implemented in different languages, and the official wiki lists most of the languages currently supported. Because each language has its own network programming API, such as Golang communicating through channels or Akka delivering messages through actors, each client can make full use of its language's features.
Kafka was initially written in Scala, so the early Scala versions of the producer, consumer, and server were placed under the core package, while the newer clients are implemented in Java and placed under the clients package. This chapter mainly analyzes the following parts:
The new producer client implementation (Java)
The old producer client implementation (Scala)
The server-side network connection implementation (SocketServer)
The InFlightRequests double-ended queue
Queue
Figure 2-32 compares the batches queue of the record collector with the InFlightRequests queue of NetworkClient. The elements in the record collector's double-ended queue only hold data and carry no state, so the only operations on that queue are appending to the tail and taking the first element from the head. The elements in the InFlightRequests queue are client request objects, which are stateful, for example whether the request has been completely sent. A request whose send has completed cannot necessarily be removed from the queue; it can be removed at that point only if the client does not need a response.
Figure 2-32 The InFlightRequests double-ended queue
In fact, adding client requests at the tail of the queue would also work. As Figure 2-33 shows, only the ends used by the corresponding peek and poll operations change:
Figure 2-33 Two ways to add the newest element to a double-ended queue
Figure 2-34 uses adding new requests at the head of the queue as an example to show how multiple requests are queued and how they are removed once completed. [R1,R2,R3] need responses and R4 does not, and all four requests [R1-R4] are assumed to target the same node. The client adds them to the queue in order, but a later request may enter the queue for sending only after the previous request has been sent to the server node. When a request completes, R4 is removed from the head of the queue, while the other requests are removed from the tail.
Figure 2-34 Double-ended queue operation
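The head and tail operations described for [R1-R4] can be tried out with java.util.ArrayDeque. This is a minimal sketch, assuming plain strings in place of client request objects; the class InFlightDequeDemo is a hypothetical name, and the rule that a request may join only after the previous one has been sent is taken for granted rather than enforced.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class InFlightDequeDemo {
    // Runs the R1..R4 scenario and returns the order in which requests leave the queue.
    static String run() {
        Deque<String> queue = new ArrayDeque<>();
        StringBuilder removed = new StringBuilder();

        // R1, R2, R3 expect responses; each new request joins at the head.
        queue.addFirst("R1");
        queue.addFirst("R2");
        queue.addFirst("R3");

        // R4 expects no response: once sent, it is removed from the head.
        queue.addFirst("R4");
        removed.append(queue.pollFirst()).append(' ');

        // Responses arrive in send order, so the oldest request leaves from the tail.
        while (!queue.isEmpty()) {
            removed.append(queue.pollLast()).append(' ');
        }
        return removed.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "R4 R1 R2 R3"
    }
}
```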
The life cycle of a client request
When the client has established a connection to a server node, it decides whether it can send more requests to that node (canSendMore) by checking whether the first request in the node's request queue has completed. So when does a client request count as completed? Note that although the queue stores ClientRequest objects, both add and peek work on the RequestSend object inside the ClientRequest. The inheritance chain from RequestSend to Send is RequestSend -> NetworkSend -> ByteBufferSend -> Send. For ByteBufferSend, the completion condition is that there is no data left to send, that is, all the data in the buffer has been written out. Request completion here therefore means that the send has been successfully delivered to the server; it does not wait for the response.
Even within the same queue for the same target node, the ClientRequest objects are ordered. The earlier analysis showed two places that prevent client requests from being added to the queue arbitrarily:
During the ready check: queue.peekFirst().request().completed() == true
After the connection is ready and a request is being sent, KafkaChannel.setSend also requires send == null; a KafkaChannel allows only one send in progress at a time
The second condition also directly affects the first. If the first request has not completed, it is still held by the KafkaChannel. If the second request were set on the KafkaChannel without this restriction, even though send != null, then when the first request's send completed the channel would report the second request instead, because send had been overwritten by the second request. That would be wrong.
However, completion of ClientRequest.requestSend does not mean that the ClientRequest is finished in NetworkClient: after the request has been sent to the server, the client still has to wait for the server's response. InFlightRequests therefore holds requests that are in progress and not yet finished. A ClientRequest is unfinished in any of the following states: it is waiting to be sent, it is being sent, or it has been sent (its RequestSend is complete) but its response has not yet been received. Figure 2-35 shows the life cycle of a ClientRequest in InFlightRequests.
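The completion condition "no data left in the buffer" can be illustrated with a simplified buffer-backed send. The sketch below is not the Kafka class itself: SimpleBufferSend, ThrottledChannel, and SendCompletionDemo are hypothetical names, and the throttled channel only imitates a socket that accepts a few bytes per write (its per-write limit is assumed to be positive).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

// Simplified stand-in for a buffer-backed send.
class SimpleBufferSend {
    private final ByteBuffer buffer;

    SimpleBufferSend(ByteBuffer buffer) { this.buffer = buffer; }

    // Completed once every byte in the buffer has been written out.
    boolean completed() { return !buffer.hasRemaining(); }

    // Writes as much as the channel accepts; may need several calls.
    long writeTo(WritableByteChannel channel) throws IOException {
        return channel.write(buffer);
    }
}

// Hypothetical channel that accepts at most `limit` bytes per write call.
class ThrottledChannel implements WritableByteChannel {
    private final int limit;

    ThrottledChannel(int limit) { this.limit = limit; }

    @Override public int write(ByteBuffer src) {
        int n = Math.min(limit, src.remaining());
        src.position(src.position() + n); // consume n bytes from the buffer
        return n;
    }
    @Override public boolean isOpen() { return true; }
    @Override public void close() { }
}

class SendCompletionDemo {
    // Returns how many write calls were needed before the send completed.
    static int writesNeeded(int payloadSize, int perWriteLimit) {
        SimpleBufferSend send = new SimpleBufferSend(ByteBuffer.allocate(payloadSize));
        ThrottledChannel channel = new ThrottledChannel(perWriteLimit);
        int calls = 0;
        try {
            while (!send.completed()) {
                send.writeTo(channel);
                calls++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(writesNeeded(10, 4)); // 4 + 4 + 2 bytes: prints 3
    }
}
```

A send that is only partly written stays incomplete across several writes, which is why a request can sit at the head of the queue in the "being sent" state.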
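The two guards can be put side by side in a small model. This is a sketch under stated assumptions rather than the Kafka source: GuardDemo, its nested Request and Channel classes, and completeSend are hypothetical names; the only behavior modeled is that the queue head must be completed before more can be sent, and that a channel refuses a new send while one is in progress.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class GuardDemo {
    // Hypothetical request: `completed` becomes true once all bytes are sent.
    static class Request {
        final String name;
        boolean completed = false;
        Request(String name) { this.name = name; }
    }

    // Hypothetical channel that holds at most one in-progress send.
    static class Channel {
        private Request send;

        void setSend(Request request) {
            if (send != null)
                throw new IllegalStateException("previous send still in progress: " + send.name);
            send = request;
        }

        // Called when the in-progress send has been fully written.
        Request completeSend() {
            Request done = send;
            done.completed = true;
            send = null; // frees the channel for the next request
            return done;
        }
    }

    final Deque<Request> queue = new ArrayDeque<>();
    final Channel channel = new Channel();

    // First guard: the newest request (queue head) must have been sent.
    boolean canSendMore() {
        return queue.isEmpty() || queue.peekFirst().completed;
    }

    // Same order as in the text: enqueue first, then hand over to the channel.
    void send(Request request) {
        queue.addFirst(request);
        channel.setSend(request);
    }

    public static void main(String[] args) {
        GuardDemo client = new GuardDemo();
        client.send(new Request("R1"));
        System.out.println(client.canSendMore()); // false while R1 is unsent
        client.channel.completeSend();
        System.out.println(client.canSendMore()); // true once R1 is sent
    }
}
```

Calling send for a second request while R1 is still in progress would throw in this model, and the second request would already be sitting in the queue, which is why the first guard is checked before a request is even created.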
Figure 2-35 The life cycle of a client request in the queue
Example of sending and receiving client requests
We take the sender thread as the starting point to walk through the sending and receiving of multiple requests and the corresponding queue operations. On each run, the sender thread first performs the ready check and selects readyNodes, then creates a connection and a ClientRequest for each node that is ready. NetworkClient.send first adds the request to the queue of its target node and then sets it on the KafkaChannel. Each KafkaChannel has at most one in-progress send; if a send already exists (for example, it is still in progress and has not been reset to null because the request is not fully sent), setSend may not be called again. When the selector polls, the send on each selected KafkaChannel is written to the server through the underlying SocketChannel. Figure 2-36 simulates what happens after the first request joins the queue.
Figure 2-36 NetworkClient.send adds the request to the queue and calls Selector.send
Suppose the first request has not been fully sent, for example at step 2 or 3, when the sender thread prepares a second batch of data (assume both requests target the same node). canSendMore returns false because the first request in the queue has not completed, so the node is removed from readyNodes during the ready check and no new ClientRequest is created for it; the second request is never generated.
Even without the canSendMore check, a second request would hit a second obstacle when NetworkClient.send is called: KafkaChannel.setSend requires the current send to be null, but send is still occupied by the first request and has not been reset, so the second request cannot be set. This would cause a problem: the request has already been added to the queue but cannot be set on the KafkaChannel, so it can only wait for the next call to NetworkClient.send, which would add the same request to the queue again. That is why the first check stops the second request early, before it is created.
A new request is therefore created only after the request at the head of the queue has completed. When the first request completes, send is reset to null, so the newly created request can be set on the KafkaChannel. In other words, when the first condition holds (canSendMore == true), the second usually holds as well (send == null). Figure 2-37 shows requests [R2,R3] joining the head of the queue, each only when allowed. Figure 2-38 shows R3, a request that needs no response, being removed from the head of the queue once sent. Figure 2-39 shows the requests that need responses, [R1,R2,R4], each being removed from the tail of the queue after its response is received.
Figure 2-37 A new request can join the queue only after the previous request has been sent
Figure 2-38 A request that needs no response is removed from the head of the queue once sent
Figure 2-39 Requests that need responses are removed from the tail of the queue after their responses are received
Example of a queue
The double-ended queue here resembles queueing in real life. Figure 2-40 uses doing business at a bank as an example: the ticket machine gives each person a number, which represents the order of ClientRequest requests, and the next person can be served only after the person with the previous number is done. To match the semantics of NetworkClient, we modify the queueing rules slightly. Assume that each piece of business has three steps: telling the teller what business to handle, the teller processing it, and the teller completing it. These steps can overlap across customers, and after each small step the customer goes back to their seat to wait. Assume also that there is only one service window (this is not a concern in practice: the teller is only the entry point, and the server behind the teller runs many processing threads).
The first person is added to InFlightRequests when they start, and tells the teller that they want to withdraw money. The teller receives the instruction and records it (think of the teller as someone who only takes orders and does not handle the business itself). The first person goes back to their seat but cannot leave the hall, because they have only communicated the instruction and have not yet received the money. Since the first person in the queue has finished sending their instruction, the second person can now be served: they are likewise added to the InFlightRequests queue and then ask to change their password, and again the teller only records the instruction without actually changing the password. If a third person grows impatient and tries to cut in before the second person has finished giving their instruction, they are told to wait.
So InFlightRequests holds requests that have been sent or are being sent; they cannot leave the hall yet because no response has been received. Because requests reach the teller in order, the ClientRequest objects added to InFlightRequests are ordered too. The queue is double-ended: the head holds the most recently added request and the tail holds the earliest one. If the request at the head of the queue has not been fully sent, the next request may not join; every newly added element therefore implies that all requests before it have already been sent.
Figure 2-40 Bank transactions and queues
In Figure 2-40 new requests are added to the head of the queue (with the tail facing the teller). This matches intuition in that the first request is processed before the second, but it makes it look as if the teller always faces the first request. To make the double-ended queue easier to understand, Figure 2-41 splits it into two queues: the waiting queue receives requests and the processing queue handles the requests that have been received. Requests enter the waiting queue in sending order, and once a request has been sent the teller moves it to the other queue. Both queues follow ordinary first-in, first-out queueing. Because a double-ended queue can be operated on at both ends, a single queue is actually enough.
Figure 2-41 Waiting queue and processing queue
Now extend the picture from one service window to several. As Figure 2-42 shows, this is like a client sending requests to multiple server nodes at the same time: each target node has its own double-ended queue, and each queue is handled the same way as the single window above. The only difference is that each request now carries its target and queues at the specified window.
Figure 2-42 Queues for multiple windows
Suppose the first person's business has been accepted and they have received their money. They can now happily leave the bank hall, and because their business is fully processed, they are removed from InFlightRequests. InFlightRequests holds requests that have been sent or are being sent but whose responses have not been received; once the response arrives, the request should not stay in the hall. After all, the capacity of InFlightRequests is limited: if the seats in the hall are full, the request volume is too high, so take the money and go home promptly.
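One queue per window and a limited number of seats can be modeled as a map from node to deque with a size cap. The sketch below is an illustration only: PerNodeInFlight and its methods are hypothetical names, the cap value is an assumption, and the check considers capacity alone, leaving out the send-completion condition discussed earlier.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class PerNodeInFlight {
    private final int maxInFlightPerNode; // assumed number of seats per window
    private final Map<String, Deque<String>> requests = new HashMap<>();

    PerNodeInFlight(int maxInFlightPerNode) {
        this.maxInFlightPerNode = maxInFlightPerNode;
    }

    // Simplified check: only capacity is considered here.
    boolean hasRoom(String node) {
        Deque<String> queue = requests.get(node);
        return queue == null || queue.size() < maxInFlightPerNode;
    }

    // Each node gets its own deque; new requests join at the head.
    void add(String node, String request) {
        if (!hasRoom(node))
            throw new IllegalStateException("too many in-flight requests for " + node);
        requests.computeIfAbsent(node, n -> new ArrayDeque<>()).addFirst(request);
    }

    // A response from a node completes that node's oldest request (the tail).
    String completeOldest(String node) {
        return requests.get(node).pollLast();
    }

    int inFlightCount(String node) {
        Deque<String> queue = requests.get(node);
        return queue == null ? 0 : queue.size();
    }

    public static void main(String[] args) {
        PerNodeInFlight inFlight = new PerNodeInFlight(2);
        inFlight.add("node-1", "R1");
        inFlight.add("node-1", "R2");
        inFlight.add("node-2", "R3");
        System.out.println(inFlight.hasRoom("node-1"));        // false: node-1 is full
        System.out.println(inFlight.completeOldest("node-1")); // R1 leaves first
        System.out.println(inFlight.hasRoom("node-1"));        // true: a seat is free again
    }
}
```

Keeping the queues separate per node means a slow node fills only its own hall and does not block requests to other nodes.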
For requests that need a response, responses come back in order even when the server is slow to process them: the server processes requests in the order the client sent them, so the result of the second request is returned only after the first, never before it. For a client request that needs no response, it would be unfair to remove it only in handleCompletedReceives instead of in handleCompletedSends: it could have finished immediately, but would have to wait until everyone ahead of it had received their results. Supermarkets, for example, usually have a no-purchase exit, so customers who bought nothing can leave quickly without queueing at the checkout.
If a client request needs no response, it is cleaned up as soon as it has been sent: the client does not want a response, so the sooner the request finishes, the better. This quick cleanup guarantees that when the next request arrives, every request still in the queue is one that expects a response, because a request that needs no response has already been removed from the head of the queue before the next request joins at the head. In supermarket terms, everyone queueing at the checkout knows that the people in line are all buying something; nobody queues there without a purchase. In bank terms, some customers only have a question that the teller can answer immediately, without involving the backend server (or the backend is involved, but the customer does not care about the result and may already have left when it comes back). Such a request is still queued: it is added at the head of the queue when it is about to be sent, and it can be removed from the head as soon as the send completes, without ever entering the processing queue.
Now that the Java version of the producer client has been analyzed, Table 2-4 summarizes the main components of the client's sending process and their purposes:
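The difference between the two removal points can be shown with a small model of the queue. This is a simplified sketch, not the NetworkClient code: CompletionDemo and its nested Request class are hypothetical, and handleCompletedSend and handleCompletedReceive are single-request simplifications of the handleCompletedSends and handleCompletedReceives steps named in the text.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class CompletionDemo {
    static class Request {
        final String name;
        final boolean expectResponse;
        Request(String name, boolean expectResponse) {
            this.name = name;
            this.expectResponse = expectResponse;
        }
    }

    final Deque<Request> inFlight = new ArrayDeque<>();
    final List<String> finished = new ArrayList<>();

    void send(Request request) { inFlight.addFirst(request); }

    // After a send completes: if no response is expected, the request at the
    // head (the one just sent) is finished right away.
    void handleCompletedSend() {
        Request newest = inFlight.peekFirst();
        if (newest != null && !newest.expectResponse) {
            finished.add(inFlight.pollFirst().name);
        }
    }

    // After a response arrives: it belongs to the oldest request, at the tail.
    void handleCompletedReceive() {
        finished.add(inFlight.pollLast().name);
    }

    static List<String> run() {
        CompletionDemo client = new CompletionDemo();
        client.send(new Request("R1", true));
        client.handleCompletedSend();    // R1 stays, waiting for its response
        client.send(new Request("R2", false));
        client.handleCompletedSend();    // R2 is removed from the head
        client.send(new Request("R3", true));
        client.handleCompletedSend();    // R3 stays
        client.handleCompletedReceive(); // response for R1
        client.handleCompletedReceive(); // response for R3
        return client.finished;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [R2, R1, R3]
    }
}
```

R2 finishes before R1 even though it was sent later, which is the fast lane described above.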
Table 2-4 Main components of the Java producer
Chapter summary
This chapter analyzed the two versions of the producer client and the network layer on the server side, focusing on the client's NetworkClient and the server's SocketServer. The Java client and the server's Processor both use the selector model and KafkaChannel, while the Scala client uses the more primitive BlockingChannel. In a client-server communication model, one client typically connects to multiple servers and one server accepts connections from multiple clients, so the selector model makes network communication more efficient. The server also uses the reactor pattern to separate the I/O part from the business-processing threads. In addition, both the client and the server use queues in many places to hold requests or responses; a queue ensures that data is processed in order and also acts as a buffer. Table 2-8 summarizes how queues are used in the Scala producer client and on the server side; it does not include the more advanced double-ended queues of the Java producer.
When the client wants to send a message to the server, it gets the cluster state (Cluster in the Java version) or the cluster metadata (TopicMetadata in the Scala version), selects a partition for the message, and uses the partition's leader as the target node. On the server side, SocketServer receives the requests sent by clients and hands them to the request handler and KafkaApis for processing; the message-related logic is carried out by KafkaApis and other components in KafkaServer.
Figure 2-57 shows the internal components of the Kafka server. The network layer consists of one Acceptor thread and multiple Processor threads, and the API threads of the API layer are the KafkaRequestHandler threads. Between the network layer and the API layer sits a RequestChannel, which acts as the exchange point for requests and responses. The API layer is connected to the log subsystem because it handles requests that read from or write to log files. The main management class of the replication subsystem is ReplicaManager, which KafkaApis uses directly. A Kafka broker also interacts with other brokers and depends on ZooKeeper; these are analyzed in later chapters.
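The selector model mentioned in this summary can be tried without any network setup by using a java.nio.channels.Pipe, whose source end is a selectable channel. This is only a minimal sketch of the select-then-read loop, assuming a single channel and a short message; SelectorDemo is a hypothetical name, and a real client would register socket channels for several nodes with the same selector.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

class SelectorDemo {
    // Writes a message into a pipe and reads it back through a selector.
    static String roundTrip(String message) {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try (Pipe.SinkChannel sink = pipe.sink();
                 Pipe.SourceChannel source = pipe.source()) {
                // A channel must be non-blocking before it is registered.
                source.configureBlocking(false);
                source.register(selector, SelectionKey.OP_READ);

                sink.write(ByteBuffer.wrap(payload));

                ByteBuffer received = ByteBuffer.allocate(payload.length);
                while (received.hasRemaining()) {
                    if (selector.select(1000) == 0) {
                        throw new IllegalStateException("nothing readable within 1s");
                    }
                    Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                    while (keys.hasNext()) {
                        SelectionKey key = keys.next();
                        keys.remove(); // selected keys must be cleared by hand
                        if (key.isReadable()) {
                            ((Pipe.SourceChannel) key.channel()).read(received);
                        }
                    }
                }
                return new String(received.array(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("ping"));
    }
}
```

One thread blocked in select can watch many channels at once, which is the efficiency gain the summary refers to.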
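Choosing a partition and then its leader can be sketched as two lookups. Everything here is an assumption made for illustration: LeaderLookup is a hypothetical class, the cluster view is a plain map with partition ids 0 to n-1, and the hash is a stand-in, not the function Kafka's own partitioner uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class LeaderLookup {
    // Hypothetical cluster view: partition id -> id of the leader node.
    private final Map<Integer, Integer> leaderByPartition = new HashMap<>();

    void setLeader(int partition, int leaderNode) {
        leaderByPartition.put(partition, leaderNode);
    }

    // Illustrative hash partitioning; assumes at least one partition is known.
    int partitionFor(String key) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(hash, leaderByPartition.size());
    }

    // The target node for a message is the leader of its partition.
    int targetNode(String key) {
        return leaderByPartition.get(partitionFor(key));
    }

    public static void main(String[] args) {
        LeaderLookup cluster = new LeaderLookup();
        cluster.setLeader(0, 101);
        cluster.setLeader(1, 102);
        cluster.setLeader(2, 103);
        String key = "user-42";
        System.out.println("partition " + cluster.partitionFor(key)
                + " -> node " + cluster.targetNode(key));
    }
}
```

Math.floorMod keeps the partition index non-negative even when the hash is negative.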
Figure 2-57 Internal components of the Kafka broker
Image source: https://cwiki.apache.org/confluence/display/KAFKA/Index
The producer analyzed in this chapter, like the consumer analyzed later, is not a built-in service of Kafka but a client (which is why both live in the clients package), and a client runs independently of the Kafka cluster. To develop a client application you only need to provide an address of the Kafka cluster. Figure 2-58 shows a typical interaction among producers, consumers, and the Kafka cluster, in which the Kafka cluster and ZooKeeper communicate with each other.
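The role of the RequestChannel as an exchange point between the network layer and the API layer can be approximated with two blocking queues. This is a hedged sketch, not the server code: RequestChannelDemo and its nested Channel class are hypothetical, requests are plain strings, and a single processor and a single handler thread are assumed.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class RequestChannelDemo {
    // Hypothetical stand-in for the request channel: one request queue and
    // one response queue (a single processor is assumed).
    static final class Channel {
        final BlockingQueue<String> requests = new ArrayBlockingQueue<>(16);
        final BlockingQueue<String> responses = new ArrayBlockingQueue<>(16);
    }

    static String roundTrip(String request) {
        Channel channel = new Channel();

        // API-layer thread: takes a request, "processes" it, queues the response.
        Thread handler = new Thread(() -> {
            try {
                String req = channel.requests.take();
                channel.responses.put("handled:" + req);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        handler.start();

        try {
            // Network-layer side: hand over the request, then wait for the response.
            channel.requests.put(request);
            String response = channel.responses.take();
            handler.join();
            return response;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("produce")); // prints "handled:produce"
    }
}
```

Because the two layers share only the queues, the I/O threads never run business logic and the handler threads never touch sockets.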
Figure 2-58 Producer, consumer, Kafka cluster interaction
The client both sends requests and receives responses, and the server likewise has receiving and sending logic, because I/O is bidirectional: when the client sends a request the server receives it, and when the server sends the response the client receives it. Next we will analyze how KafkaApis on the server side handles the requests sent by the client.
Source: Zqhxuyuan.github.io
Original: http://zqhxuyuan.github.io/2016/05/26/2016-05-13-Kafka-Book-Sample/#%E7%AC%AC%E4%BA%8C%E7%AB%A0_%E7%94%9F%E4%ba%a7%e8%80%85