Apache Kafka Source Analysis - Producer Analysis (Reproduced)

Source: Internet
Author: User

Original address: http://www.aboutyun.com/thread-9938-1-1.html

Questions Guide
1. Kafka provides the Producer class as the Java producer API; which sending modes does it offer?
2. What steps are involved in a call to the Producer.send method?
3. Which parts of the producer are difficult to understand?

Analysis of the producer's send method
Kafka provides the Producer class as the Java producer API, and it supports two delivery modes: sync and async.
Sync architecture diagram:
(figure from the original post, not reproduced)

Async architecture diagram:
(figure from the original post, not reproduced)

The invocation process is as follows:
(figure from the original post, not reproduced)
The code flow is as follows:
Producer: when you call new Producer(new ProducerConfig()), the underlying implementation actually creates instances of two core classes: Producer and DefaultEventHandler. At creation time it also builds a ProducerPool by default. That is, for every new Java Producer instance, a Producer, an EventHandler, and a ProducerPool are created. The ProducerPool is a pool of connections to the different Kafka brokers, and the initial number of connections is determined by the broker.list parameter.
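For orientation, here is a minimal usage sketch of the 0.8-era Scala producer API this article analyzes; the broker addresses, topic name, and object name are placeholders introduced here, not values taken from the source.

    import java.util.Properties
    import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

    object ProducerSketch extends App {
      val props = new Properties()
      props.put("metadata.broker.list", "broker1:9092,broker2:9092") // placeholder brokers
      props.put("serializer.class", "kafka.serializer.StringEncoder")
      props.put("producer.type", "sync")                             // or "async"

      // new Producer(new ProducerConfig(...)) creates the Producer,
      // the DefaultEventHandler, and the ProducerPool described above.
      val producer = new Producer[String, String](new ProducerConfig(props))
      producer.send(new KeyedMessage[String, String]("my-topic", "key", "message"))
      producer.close()
    }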
The flow of a Producer.send call:
When an application calls the Producer.send method, it internally invokes the EventHandler.handle(messages) method, and the EventHandler first serializes the messages:

EventHandler.serialize(events) --> dispatchSerializedData() --> partitionAndCollate() --> send() --> SyncProducer.send()

Invocation logic explained: when a client application calls the producer to send messages (either a single message or a list of messages), EventHandler.serialize is first called to serialize all of them; users can customize the serialization by implementing the Encoder interface (a sketch follows below). Next, partitionAndCollate groups the messages by topic into per-broker data (a map keyed by the different brokers), and then, for each broker, the corresponding SyncProducer.send is called to send the message data in bulk. SyncProducer encapsulates the NIO network operations.
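As a minimal sketch of the Encoder customization point just mentioned, assuming the 0.8-era kafka.serializer.Encoder trait (the class name Utf8StringEncoder is hypothetical):

    import kafka.serializer.Encoder
    import kafka.utils.VerifiableProperties

    // Hypothetical custom encoder: turns String messages into UTF-8 bytes.
    // The VerifiableProperties constructor parameter is needed because the
    // producer instantiates the encoder reflectively from serializer.class.
    class Utf8StringEncoder(props: VerifiableProperties = null) extends Encoder[String] {
      override def toBytes(s: String): Array[Byte] = s.getBytes("UTF-8")
    }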
The difference between the producer's sync and async message handling is clear at a glance from the architecture diagrams above.
How the partitionAndCollate method works in detail: it looks up the leader broker id of every partition (that is, which broker each partitionId lives on), creates a HashMap keyed by broker id whose values map each topic-partition to its list of messages, assembles the message data by brokerId, and then prepares a SyncProducer per broker to send the messages separately.
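A minimal sketch of this grouping idea, not the actual Kafka source; the types and the partitionFor/leaderFor helpers are simplifications introduced purely for illustration:

    import scala.collection.mutable

    case class TopicAndPartition(topic: String, partition: Int)
    case class Msg(topic: String, key: String, payload: String)

    // Group messages by the broker that leads their partition, then by partition.
    def groupPerBroker(messages: Seq[Msg],
                       partitionFor: Msg => Int,            // message -> partition index
                       leaderFor: TopicAndPartition => Int  // partition -> leader broker id
                      ): mutable.Map[Int, mutable.Map[TopicAndPartition, mutable.ArrayBuffer[Msg]]] = {
      val perBroker = mutable.Map.empty[Int, mutable.Map[TopicAndPartition, mutable.ArrayBuffer[Msg]]]
      for (m <- messages) {
        val tp = TopicAndPartition(m.topic, partitionFor(m))
        val byPartition = perBroker.getOrElseUpdate(leaderFor(tp), mutable.Map.empty)
        byPartition.getOrElseUpdate(tp, mutable.ArrayBuffer.empty) += m
      }
      perBroker // one SyncProducer.send per broker id key
    }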

Name explanation: partKey is the partition keyword. When the client application implements the Partitioner interface, the key passed in is the partition keyword, and the partition index is returned based on the key and numPartitions. Remember that partition indexes start at 0.
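A minimal custom partitioner sketch, assuming the 0.8-era kafka.producer.Partitioner interface (the class name HashKeyPartitioner is hypothetical); masking the hash code keeps the returned index non-negative and in the 0-based range just described:

    import kafka.producer.Partitioner
    import kafka.utils.VerifiableProperties

    // Hypothetical partitioner: maps a key to a 0-based partition index.
    // The VerifiableProperties parameter is needed because the producer
    // instantiates the partitioner reflectively from partitioner.class.
    class HashKeyPartitioner(props: VerifiableProperties = null) extends Partitioner {
      override def partition(key: Any, numPartitions: Int): Int =
        (key.hashCode & Int.MaxValue) % numPartitions
    }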

Producer smooth expansion mechanism
If you have developed producer client code, you will know the metadata.broker.list parameter: it lists the IPs and ports of Kafka brokers, and the producer connects to these brokers at initialization. You might then ask: does the producer support newly added broker nodes in the Kafka cluster, given that it neither listens to the ZK broker nodes nor fetches broker information from ZK? The answer is yes: the producer supports smooth broker expansion. It does so by communicating periodically with the brokers in metadata.broker.list, obtaining information about new brokers, and then putting new SyncProducers into the ProducerPool for subsequent application calls.

The DefaultEventHandler class instantiates a BrokerPartitionInfo object at initialization and then calls the brokerPartitionInfo.updateInfo method periodically. Part of the DefaultEventHandler code is as follows:

    def handle(events: Seq[KeyedMessage[K,V]]) {
      ...
      while (remainingRetries > 0 && outstandingProduceRequests.size > 0) {
        topicMetadataToRefresh ++= outstandingProduceRequests.map(_.topic)
        if (topicMetadataRefreshInterval >= 0 &&
            SystemTime.milliseconds - lastTopicMetadataRefreshTime > topicMetadataRefreshInterval) {
          Utils.swallowError(brokerPartitionInfo.updateInfo(topicMetadataToRefresh.toSet,
            correlationId.getAndIncrement))
          sendPartitionPerTopicCache.clear()
          topicMetadataToRefresh.clear
          lastTopicMetadataRefreshTime = SystemTime.milliseconds
        }
        outstandingProduceRequests = dispatchSerializedData(outstandingProduceRequests)
        if (outstandingProduceRequests.size > 0) {
          info("Back off for %d ms before retrying send. Remaining retries = %d"
            .format(config.retryBackoffMs, remainingRetries - 1))
          // sleep time: how long to wait before the next retry/refresh
          Thread.sleep(config.retryBackoffMs)
          // the producer periodically requests a refresh of the latest topic/broker metadata
          Utils.swallowError(brokerPartitionInfo.updateInfo(outstandingProduceRequests.map(_.topic).toSet,
            correlationId.getAndIncrement))
          ...
        }
      }
    }

The updateInfo method of BrokerPartitionInfo is as follows:

    def updateInfo(topics: Set[String], correlationId: Int) {
      var topicsMetadata: Seq[TopicMetadata] = Nil
      // Based on the topics list, metadata.broker.list, and other configuration parameters
      // (correlationId is a request counter), build a TopicMetadataRequest and randomly
      // pick one of the configured brokers to fetch the metadata from, until one succeeds.
      val topicMetadataResponse = ClientUtils.fetchTopicMetadata(topics, brokers, producerConfig, correlationId)
      topicsMetadata = topicMetadataResponse.topicsMetadata
      // throw partition-specific exceptions
      topicsMetadata.foreach(tmd => {
        trace("Metadata for topic %s is %s".format(tmd.topic, tmd))
        if (tmd.errorCode == ErrorMapping.NoError) {
          topicPartitionInfo.put(tmd.topic, tmd)
        } else
          warn("Error while fetching metadata [%s] for topic [%s]: %s"
            .format(tmd, tmd.topic, ErrorMapping.exceptionFor(tmd.errorCode).getClass))
        tmd.partitionsMetadata.foreach(pmd => {
          if (pmd.errorCode != ErrorMapping.NoError && pmd.errorCode == ErrorMapping.LeaderNotAvailableCode) {
            warn("Error while fetching metadata %s for topic partition [%s,%d]: [%s]"
              .format(pmd, tmd.topic, pmd.partitionId, ErrorMapping.exceptionFor(pmd.errorCode).getClass))
          } // any other error code (e.g. ReplicaNotAvailable) can be ignored, since the
            // producer does not need to access the replica and ISR metadata
        })
      })
      producerPool.updateProducer(topicsMetadata)
    }

The ClientUtils.fetchTopicMetadata method code is as follows:

    def fetchTopicMetadata(topics: Set[String], brokers: Seq[Broker],
                           producerConfig: ProducerConfig, correlationId: Int): TopicMetadataResponse = {
      var fetchMetaDataSucceeded: Boolean = false
      var i: Int = 0
      val topicMetadataRequest = new TopicMetadataRequest(TopicMetadataRequest.CurrentVersion,
        correlationId, producerConfig.clientId, topics.toSeq)
      var topicMetadataResponse: TopicMetadataResponse = null
      var t: Throwable = null
      val shuffledBrokers = Random.shuffle(brokers) // randomize the broker order
      while (i < shuffledBrokers.size && !fetchMetaDataSucceeded) {
        // build a temporary SyncProducer just for the metadata request
        val producer: SyncProducer = ProducerPool.createSyncProducer(producerConfig, shuffledBrokers(i))
        info("Fetching metadata from broker %s with correlation id %d for %d topic(s) %s"
          .format(shuffledBrokers(i), correlationId, topics.size, topics))
        try {
          topicMetadataResponse = producer.send(topicMetadataRequest)
          fetchMetaDataSucceeded = true
        } catch {
          case e: Throwable =>
            warn("Fetching topic metadata with correlation id %d for topics [%s] from broker [%s] failed"
              .format(correlationId, topics, shuffledBrokers(i).toString), e)
            t = e
        } finally {
          i = i + 1
          producer.close() // the temporary SyncProducer is closed, not returned to the pool
        }
      }
      if (!fetchMetaDataSucceeded)
        throw new KafkaException("fetching topic metadata for topics [%s] from broker [%s] failed"
          .format(topics, shuffledBrokers), t)
      else
        debug("Successfully fetched metadata for %d topic(s) %s".format(topics.size, topics))
      topicMetadataResponse
    }

ProducerPool's updateProducer method:

    def updateProducer(topicMetadata: Seq[TopicMetadata]) {
      val newBrokers = new collection.mutable.HashSet[Broker]
      topicMetadata.foreach(tmd => {
        tmd.partitionsMetadata.foreach(pmd => {
          if (pmd.leader.isDefined)
            newBrokers += pmd.leader.get
        })
      })
      lock synchronized {
        newBrokers.foreach(b => {
          if (syncProducers.contains(b.id)) {
            syncProducers(b.id).close()
            syncProducers.put(b.id, ProducerPool.createSyncProducer(config, b))
          } else
            syncProducers.put(b.id, ProducerPool.createSyncProducer(config, b))
        })
      }
    }

When the Kafka broker is running with a large number of producers and consumers, the following message is often reported in the logs:

    Closing socket connection to 192.168.11.166




It took the author a long time of reading the source to understand why the ProducerConfig does not require users to provide the complete broker information for the Kafka cluster: choosing one or a few brokers is enough, because the producer obtains the latest information about all brokers through the brokers and topics you supply.
It is worth knowing that the SyncProducer used to send the TopicMetadataRequest is built with the ProducerPool.createSyncProducer method, but it is not returned to the ProducerPool; it is closed directly after use.


Difficult to understand:
Refreshing the metadata does not happen only at first initialization. To cope with brokers going down for various reasons, partition changes, and similar events, the EventHandler refreshes the metadata periodically; the refresh interval is defined by the parameter topic.metadata.refresh.interval.ms and defaults to 10 minutes.
Here are three points to emphasize:

1. Metadata is refreshed only on send: the client's call to send is what triggers the periodic refresh. On each metadata fetch, Kafka creates a new SyncProducer to fetch the metadata and closes it once the logic completes. Then, based on the latest complete metadata obtained through that SyncProducer (one broker connection), it refreshes the connections to the brokers in the ProducerPool.
2. Each 10-minute refresh directly rebuilds the socket connection to every broker, which means the first request after a refresh incurs a delay of a few hundred milliseconds. If you do not want this delay, set topic.metadata.refresh.interval.ms to -1 so that metadata is refreshed only when a send fails (see the snippet after this list).
3. If a broker hosting a partition in the Kafka cluster goes down, it can rejoin the cluster after the error is fixed and it is restarted. After a manual rebalance, the producer's connections will break again until the rebalance completes; the newly added broker will then be included in the connections obtained on the next refresh.
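A brief configuration sketch of point 2, with placeholder broker addresses; it disables the periodic refresh so that metadata is fetched again only after a failed send:

    import java.util.Properties

    val props = new Properties()
    props.put("metadata.broker.list", "broker1:9092,broker2:9092") // placeholder brokers
    // -1 disables the periodic refresh; metadata is then refreshed only on send failure
    props.put("topic.metadata.refresh.interval.ms", "-1")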


Note: each instantiated SyncProducer object establishes one socket connection.


Special note:
After the ClientUtils.fetchTopicMetadata call completes, execution returns to BrokerPartitionInfo.updateInfo and continues; at its end, producerPool.updateProducer(topicsMetadata) rebuilds all the SyncProducers in the pool (that is, the socket channels) based on the latest metadata obtained above.

In the ProducerPool, the number of SyncProducers is determined by the brokers that lead the topic's partitions: each SyncProducer corresponds to one broker and internally maintains a socket connection to that broker. On each refresh, an existing SyncProducer is closed (that is, its socket connection is closed) and a new SyncProducer, i.e. a new socket connection, is created to replace the old one.
If one does not already exist, a new one is created directly.
