org.apache.kafka.clients.KafkaClient


(Based on Kafka version 0.10.0.0.)

The only implementation of this interface is NetworkClient, which both the Kafka consumer and the producer use. The interface abstracts the way a Kafka client interacts with the network.

To understand its API clearly, first look at what the Kafka protocol requires of clients and brokers when processing network requests:

https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol

"The server guarantees that on a single TCP connection, requests will be processed in the order they are sent and responses will return in that order as well. The broker's request processing allows only a single in-flight request per connection in order to guarantee this ordering. Note that clients can (and ideally should) use non-blocking IO to implement request pipelining and achieve higher throughput. i.e., clients can send requests even while awaiting responses for preceding requests since the outstanding requests will be buffered in the underlying OS socket buffer. All requests are initiated by the client, and result in a corresponding response message from the server except where noted."

This paragraph carries a lot of information.

Sequential nature

First, the broker processes requests in the order they are sent and returns responses in the same order, because Kafka guarantees message ordering (a sketch follows the list below):

  • Messages sent by a producer to a particular topic partition will be appended in the order they are sent. That is, if a message M1 is sent by the same producer as a message M2, and M1 is sent first, then M1 will have a lower offset than M2 and appear earlier in the log.
  • A consumer instance sees messages in the order they are stored in the log.
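
To make the first bullet concrete, here is a hedged sketch (the topic name, partition, and configuration values are made up): two sends from the same producer to the same partition, where M1 is appended first and therefore receives the lower offset.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class OrderingSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Both records target partition 0 of the same (made-up) topic. Since they
                // come from the same producer, M1 is guaranteed the lower offset and
                // appears earlier in the log than M2.
                producer.send(new ProducerRecord<>("demo-topic", 0, "key", "M1"));
                producer.send(new ProducerRecord<>("demo-topic", 0, "key", "M2"));
            }
        }
    }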

The simplest and most reliable way to achieve this guarantee is the one quoted above: "the broker's request processing allows only a single in-flight request per connection in order to guarantee this ordering." That is, for each TCP connection, at most one request is in flight in the broker's request-processing chain at any time.

So, does the broker need to buffer pending requests?

First, buffering requests could consume a lot of memory. Second, buffering would make it harder for the client to control message ordering when a request fails, because buffering essentially makes the client's requests asynchronous. Without buffering, the broker's behavior is easier for the client to reason about.

Therefore, the broker does not buffer requests locally. Once it reads a request from a connection, it stops reading further requests from that connection. In other words, for each TCP connection the broker's cycle is: receive a request, process it, send the response, receive the next request, and so on.

The concrete implementation can be found in kafka.network.Processor (the sub-reactor in the reactor pattern). Its run method handles fully read requests and fully sent responses as follows:

    selector.completedReceives.asScala.foreach { receive =>
      try {
        val channel = selector.channel(receive.source)
        val session = RequestChannel.Session(
          new KafkaPrincipal(KafkaPrincipal.USER_TYPE, channel.principal.getName),
          channel.socketAddress)
        val req = RequestChannel.Request(processor = id, connectionId = receive.source,
          session = session, buffer = receive.payload, startTimeMs = time.milliseconds,
          securityProtocol = protocol)
        // hand the request to the RequestChannel; a request handler thread will
        // later take it out for processing
        requestChannel.sendRequest(req)
        // stop reading from the connection this request came from (not the whole host)
        selector.mute(receive.source)
      } catch {
        case e @ (_: InvalidRequestException | _: SchemaException) =>
          // note that even though we got an exception, we can assume that
          // receive.source is valid. Issues with constructing a valid receive
          // object were handled earlier
          error("Closing socket for " + receive.source + " because of error", e)
          close(selector, receive.source)
      }
    }

    selector.completedSends.asScala.foreach { send =>
      val resp = inflightResponses.remove(send.destination).getOrElse {
        throw new IllegalStateException(s"Send for ${send.destination} completed, but not in `inflightResponses`")
      }
      resp.request.updateRequestMetrics()
      // make the connection whose response has now been fully sent readable again
      selector.unmute(send.destination)
    }

As you can see, while a request is being processed, the broker reads no new requests from the connection it came from until the request has been processed and its response sent.

Pre-fetch

On the other hand, a client that waits for the response to its last request before generating and sending the next one sits idle during every round trip, which is inefficient. Hence the quote above: "clients can send requests even while awaiting responses for preceding requests since the outstanding requests will be buffered in the underlying OS socket buffer". The client can keep sending requests while waiting for responses, because even if the broker has not yet read them off the network, they are held in the OS socket buffer, and the broker can read the next one immediately after finishing the previous request. Doing so, however, makes the client's behavior more complicated, because it must now deal with ordering when errors occur.
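
As a hedged illustration of this pipelining (nothing here is from the Kafka code base; the broker address and request bytes are placeholders), a client on a plain blocking socket can write several size-prefixed requests before reading any response. The unread requests simply wait in the OS socket buffers:

    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.net.Socket;

    public class PipeliningSketch {
        public static void main(String[] args) throws Exception {
            try (Socket socket = new Socket("broker-host", 9092)) { // placeholder address
                DataOutputStream out = new DataOutputStream(socket.getOutputStream());
                DataInputStream in = new DataInputStream(socket.getInputStream());

                // Send three (placeholder) size-prefixed requests back to back without
                // waiting for responses; the OS buffers them until the broker reads
                // them one at a time.
                for (int i = 0; i < 3; i++) {
                    byte[] request = buildRequest(i); // hypothetical serializer
                    out.writeInt(request.length);
                    out.write(request);
                }
                out.flush();

                // Responses come back strictly in the order the requests were sent.
                for (int i = 0; i < 3; i++) {
                    int size = in.readInt();
                    byte[] response = new byte[size];
                    in.readFully(response);
                    // handle response i ...
                }
            }
        }

        // Placeholder: a real client would serialize a Kafka request here.
        private static byte[] buildRequest(int correlationId) {
            return new byte[] { (byte) correlationId };
        }
    }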

For a consumer, it is hard to know the starting offset of the next fetch before the previous fetch response arrives, so the cautious approach is to send the next fetch request only after receiving the previous fetch response. But when the number of messages a fetch response will contain can be predicted accurately, sending the fetch request early can improve consumer performance.

Moreover, "receiving the fetch response" and "the user finishing with the fetched messages" are two different points in time. After receiving a fetch response, the consumer can hand the fetched messages to the user for processing and issue the next fetch request at the same time, which improves fetch throughput. The new consumer, KafkaConsumer, does exactly this. Here is a piece of code from KafkaConsumer's poll method (the user obtains messages by calling poll):

    do {
        Map<TopicPartition, List<ConsumerRecord<K, V>>> records = pollOnce(remaining);
        if (!records.isEmpty()) {
            // before returning the fetched records, we can send off the next round of fetches
            // and avoid block waiting for their responses to enable pipelining while the user
            // is handling the fetched records.
            //
            // NOTE that we use quickPoll() in this case which disables wakeups and delayed
            // task execution since the consumed positions has already been updated and we
            // must return these records to the user to process before being interrupted or
            // auto-committing offsets
            fetcher.sendFetches(metadata.fetch());
            client.quickPoll();

            return this.interceptors == null
                ? new ConsumerRecords<>(records)
                : this.interceptors.onConsume(new ConsumerRecords<>(records));
        }

        long elapsed = time.milliseconds() - start;
        remaining = timeout - elapsed;
    } while (remaining > 0);

The comment in the middle of this code explains the idea, but the situation is a bit more involved than described earlier.

First, if pollOnce returns a non-empty set of records, those records must be handed to the user, so a new batch of fetch requests is sent beforehand (via Fetcher#sendFetches). If the set is empty, pollOnce itself sends new fetch requests on the next iteration of the do-while loop.

Second, Fetcher's sendFetches performs no network IO; it only creates and queues the fetch requests. ConsumerNetworkClient's quickPoll method must then be called to perform the IO that actually sends them. But because the user has not yet received the records this pollOnce returns, no auto-commit may happen at this point; otherwise records never returned to the user would be committed. Nor may this step be interrupted by another thread, or the user would never receive the records at all. That is why quickPoll is used here: quickPoll disables wakeups and does not execute DelayedTasks (AutoCommitTask runs through the DelayedTask mechanism).
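
To summarize that contract, here is a purely conceptual sketch of what quickPoll has to guarantee. This is not the actual ConsumerNetworkClient source, and the helper names (disableWakeups, pollNetworkIoOnly, enableWakeups) are hypothetical:

    // Conceptual sketch only; helper names are hypothetical.
    public void quickPoll() {
        disableWakeups();          // a concurrent wakeup() must not interrupt this step
        try {
            pollNetworkIoOnly();   // flush queued sends and read responses, but run no
                                   // DelayedTasks, so AutoCommitTask cannot fire before
                                   // the fetched records are returned to the user
        } finally {
            enableWakeups();
        }
    }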

API

KafkaClient, the interface through which producers and consumers talk to brokers, is designed around the protocol rules above. It contains methods concerned with connection state and request/response status. The implementation that the producer and consumer actually use is NetworkClient. The descriptions below combine the Javadoc of KafkaClient and NetworkClient, but follow NetworkClient's implementation.

public boolean isReady(Node node, long now) checks whether a node is ready to accept a new request. Since this is a client-side interface, a "node" here is a broker.

public boolean ready(Node node, long now) returns whether a connection to the given node has been established and requests can be sent to it. If no connection exists, it initiates one.

public long connectionDelay(Node node, long now) returns the time to wait, based on the connection state. A connection has three states: disconnected, connecting, and connected. When disconnected, this returns the reconnect backoff; when connecting or connected, it returns Long.MAX_VALUE, since the client is then waiting for another event to occur (such as the connection succeeding or a response arriving).

public boolean connectionFailed(Node node) checks whether the connection to this node has failed.

public void send(ClientRequest request, long now) puts the request into the send queue. If the request's destination node is not yet connected, an IllegalStateException (a runtime exception) is thrown.

public List&lt;ClientResponse&gt; poll(long timeout, long now) performs the actual reads and writes on the sockets.

public void close(String nodeId) closes the connection to the given node.

public Node leastLoadedNode(long now) chooses the node with the fewest outstanding requests, provided the node is at least eligible for connection. The method prefers nodes with an existing connection, but if all connected nodes are in use it falls back to a node that has no connection yet. It never picks a node that is disconnected or in its reconnect backoff period.

public int inFlightRequestCount() returns the total number of requests that have been sent but have not yet received a response.

public int inFlightRequestCount(String nodeId) returns the number of in-flight requests for a particular node.

public RequestHeader nextRequestHeader(ApiKeys key) constructs the request header for a given request type. Per the Kafka protocol, a request consists of the following parts:

    RequestMessage => ApiKey ApiVersion CorrelationId ClientId RequestMessage
      ApiKey => int16
      ApiVersion => int16
      CorrelationId => int32
      ClientId => string
      RequestMessage => MetadataRequest | ProduceRequest | FetchRequest | OffsetRequest | OffsetCommitRequest | OffsetFetchRequest

This method constructs the ApiKey, ApiVersion, CorrelationId and ClientId that make up the request header; the header has a corresponding class in the source code, org.apache.kafka.common.requests.RequestHeader.

ApiKey indicates the type of the request, such as a produce request, fetch request, metadata request, and so on.

public RequestHeader nextRequestHeader(ApiKeys key, short version) constructs a request header using a specific version number.
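
For illustration (the field values are made up, and this assumes the four-argument RequestHeader constructor present in the 0.10.x source), building such a header by hand for a metadata request would look like:

    import org.apache.kafka.common.protocol.ApiKeys;
    import org.apache.kafka.common.requests.RequestHeader;

    // Illustrative only: in practice NetworkClient fills in the client id and an
    // increasing correlation id itself via nextRequestHeader(ApiKeys.METADATA).
    RequestHeader header = new RequestHeader(
            ApiKeys.METADATA.id,  // ApiKey: which type of request this is
            (short) 1,            // ApiVersion (made-up value)
            "demo-client",        // ClientId (made-up)
            42);                  // CorrelationId: matches the response to the request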

public void wakeup() wakes the client up if it is blocked on IO.
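
Putting these methods together, a caller drives the client roughly as in the following sketch. The method sendAndPoll and its control flow are illustrative, not from the Kafka source; KafkaClient, ClientRequest, ClientResponse and Node are the real 0.10.x types:

    import java.util.List;
    import org.apache.kafka.clients.ClientRequest;
    import org.apache.kafka.clients.ClientResponse;
    import org.apache.kafka.clients.KafkaClient;
    import org.apache.kafka.common.Node;

    // Illustrative driver: queue a request when the node is ready, otherwise back
    // off, and let poll() do the actual network IO.
    public final class ClientLoopSketch {
        static void sendAndPoll(KafkaClient client, Node node, ClientRequest request, long pollTimeout) {
            long now = System.currentTimeMillis();
            if (client.ready(node, now)) {
                client.send(request, now);   // only queues the request; no IO yet
            } else {
                // how long to wait before it makes sense to try this node again
                long backoff = client.connectionDelay(node, now);
                pollTimeout = Math.min(pollTimeout, backoff);
            }
            // poll() performs the socket reads/writes and returns completed responses
            List<ClientResponse> responses = client.poll(pollTimeout, System.currentTimeMillis());
            for (ClientResponse response : responses) {
                // dispatch each response to its callback / handler ...
            }
        }
    }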

Summary

Some details of the Kafka protocol surface in the design of the Kafka client interface, along with a few small details that are interesting in their own right.

Next, we will look at NetworkClient, the implementation of the KafkaClient interface.

