"Original" KAKFA API package source code Analysis


Since the package is named api, it presumably holds the common Kafka API definitions.

1. ApiUtils.scala

As the name implies, this file contains common API helper methods, including:

1) readShortString: reads a length-prefixed string from a ByteBuffer. The expected layout of the buffer is a 2-byte string length n followed by n bytes of string data.
2) writeShortString: the inverse of readShortString; it writes the 2-byte length n and then the n string bytes into the ByteBuffer.
3) shortStringLength: returns the number of bytes a string in the above format occupies in the buffer, that is, 2 + n.
4) readIntInRange: reads an integer from the buffer's current position and checks whether it falls within a given range, throwing an exception if it does not. In practice callers always pass Int.MaxValue as the bound, so the check normally passes. The integer read can represent a partition count, a partition ID, a replica count, an ISR size, or a topic count.
5) readShortInRange: similar to readIntInRange, except that it reads a 2-byte short, which is usually an error code.
6) readLongInRange: similar to the previous two, except that it reads a long; this method does not appear to be called anywhere.

(A sketch of these buffer conventions appears after section 5 below.)

2. RequestOrResponse.scala

Kafka has many kinds of client requests. This file defines a Request object collecting the properties that all requests share:

    • OrdinaryConsumerId: the replica ID representing a follower
    • DebuggingConsumerId: for debugging use only
    • isValidBrokerId: whether a broker ID is valid; it must be non-negative

It then defines the abstract class RequestOrResponse, which is particularly important because every concrete request and response class inherits from it. If the class represents a request, the subclass must pass in a requestId identifying the kind of request (the kinds are defined in RequestKeys); if it represents a response, the subclass simply calls the parameterless constructor. This class declares four abstract methods:

1) sizeInBytes: computes the size in bytes of the request or response
2) writeTo: writes the request or response into a ByteBuffer
3) handleError: used mainly to handle errors in requests
4) describe: for requests only; returns a description string for the request

3. RequestKeys.scala

Defines all the request types, including ProduceKey, FetchKey, OffsetsKey and so on, each with a numeric ID. It also defines a map associating each request type ID with the function that deserializes that kind of request or response, plus two methods that, given an ID, return the request type's name and its parsing function respectively.

4. GenericRequestAndHeader.scala

An abstract class that inherits from RequestOrResponse and therefore implements its four abstract methods:

1) writeTo: writes the version, correlationId, clientId and body
2) sizeInBytes: 2 bytes of version + 4 bytes of correlation ID + (2 + n) bytes of client ID + the body's size in bytes
3) toString/describe: together build a description string for the request

5. GenericResponseAndHeader.scala

The response class corresponding to GenericRequestAndHeader. The code is written as extends RequestOrResponse(requestId), yet every response is supposed to extend RequestOrResponse(), so I cautiously suspect this is a mistake; in any case requestId is never used in this class. Since it inherits RequestOrResponse, it naturally implements the four methods writeTo, sizeInBytes, toString and describe, which need no further discussion here.
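To make these ByteBuffer conventions concrete, here is a minimal, self-contained Scala sketch of the short-string helpers and a toy request class in the spirit of RequestOrResponse. Everything here (ShortStringDemo, EchoRequest, the simplified signatures) is my own illustration of the layout described above, not the actual Kafka code; the real helpers add more validation and Kafka-specific error handling.

```scala
import java.nio.ByteBuffer

object ShortStringDemo {
  // Write a 2-byte length n followed by n bytes of UTF-8 data,
  // mirroring the format readShortString/writeShortString use.
  def writeShortString(buffer: ByteBuffer, s: String): Unit = {
    val bytes = s.getBytes("UTF-8")
    require(bytes.length <= Short.MaxValue, "string too long for a 2-byte length prefix")
    buffer.putShort(bytes.length.toShort)
    buffer.put(bytes)
  }

  // Read the 2-byte length, then that many bytes, and decode them.
  def readShortString(buffer: ByteBuffer): String = {
    val n = buffer.getShort()
    val bytes = new Array[Byte](n)
    buffer.get(bytes)
    new String(bytes, "UTF-8")
  }

  // A toy "request" (hypothetical, for illustration only):
  // 2-byte requestId + a short-string clientId, with the
  // sizeInBytes/writeTo pair that real subclasses implement.
  class EchoRequest(requestId: Short, clientId: String) {
    def sizeInBytes: Int = 2 + 2 + clientId.getBytes("UTF-8").length
    def writeTo(buffer: ByteBuffer): Unit = {
      buffer.putShort(requestId)
      writeShortString(buffer, clientId)
    }
  }

  def main(args: Array[String]): Unit = {
    val req = new EchoRequest(42, "my-client")
    val buf = ByteBuffer.allocate(req.sizeInBytes)
    req.writeTo(buf)
    buf.flip()
    println(s"requestId=${buf.getShort()}, clientId=${readShortString(buf)}")
  }
}
```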
The remaining files each pair a request with its response.

6. TopicMetadataRequest.scala/TopicMetadataResponse.scala

Before getting to this request/response pair specifically, it is worth describing the general structure Kafka uses for requests and responses (most of the following comes from https://cwiki.apache.org/confluence/display/kafka/a+guide+to+the+kafka+protocol#aguidetothekafkaprotocol-producerequest):

RequestOrResponse = size + (RequestMessage or ResponseMessage), where size is a 32-bit integer giving the length of the request or response that follows.

Request format = ApiKey + ApiVersion + CorrelationId + ClientId + RequestMessage

    • ApiKey: an integer of type short identifying the type of request, such as a metadata request, producer request or fetch request, as defined in RequestKeys.scala
    • ApiVersion: an integer of type short, used mainly for iterative upgrades; for example, if fields are added to a request, the version becomes 1, and so on. At present it is uniformly 0.
    • CorrelationId: a 4-byte integer that the server and client use to associate a response with its request
    • ClientId: a user-defined name that can be used for logging and monitoring, for example tracking the number of requests generated by different applications

Response format = CorrelationId + ResponseMessage

As you can see, the response format is noticeably more concise than the request format; the meaning of its two components is clear from the above, so no further detail is needed. A sketch of this framing follows.
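As a concrete illustration of this framing, here is a minimal sketch of serializing the common request header. The object name, the frameRequest helper and its parameters are my own invention under the layout just described, not Kafka's API.

```scala
import java.nio.ByteBuffer

// A minimal sketch of the common request framing described above:
// size (4 bytes) + ApiKey (2) + ApiVersion (2) + CorrelationId (4)
// + ClientId (2-byte length + bytes) + RequestMessage.
object RequestFramingDemo {
  def frameRequest(apiKey: Short,
                   apiVersion: Short,
                   correlationId: Int,
                   clientId: String,
                   message: Array[Byte]): ByteBuffer = {
    val clientIdBytes = clientId.getBytes("UTF-8")
    val payloadSize = 2 + 2 + 4 + (2 + clientIdBytes.length) + message.length
    val buffer = ByteBuffer.allocate(4 + payloadSize)
    buffer.putInt(payloadSize)                    // size of everything that follows
    buffer.putShort(apiKey)                       // request type, see RequestKeys.scala
    buffer.putShort(apiVersion)                   // currently always 0
    buffer.putInt(correlationId)                  // echoed back in the response
    buffer.putShort(clientIdBytes.length.toShort) // short-string encoded client id
    buffer.put(clientIdBytes)
    buffer.put(message)                           // the request-specific body
    buffer.flip()
    buffer
  }
}
```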
Having covered the common formats, let us turn to the specific requests and responses, starting with the Metadata API. Kafka provides several categories of APIs, one of which queries metadata, such as:

    • Which topics exist in the cluster?
    • How many partitions does each topic have?
    • Which broker is currently the leader for each partition?
    • What are those brokers' hosts and ports?
The client then sends a metadata request to the Kafka cluster and obtains the corresponding response. It is important to note that this kind of request can be handled by any broker in the cluster; requests of the other APIs cannot be! Also, Kafka does not always return metadata for all topics after a request: the client may supply a list of the topics it is interested in.

Before looking at the metadata request code, we need to see how topic metadata itself is defined, in TopicMetadata.scala. The file's structure is very clear: two class/companion-object pairs, defining topic-level metadata and partition-level metadata respectively.

First, the partition-level metadata. Its constructor takes:

    • partitionId: the partition ID
    • leader: the partition's leader broker, which may be empty
    • replicas: the set of all replicas of the partition
    • isr: the partition's ISR set
    • errorCode: the error code, initially NoError

Its class methods:

    • sizeInBytes: the total byte size of the metadata, namely the 2-byte error code + 4-byte partition ID + 4-byte leader ID + (4 + 4*n1 for the replica set) + (4 + 4*n2 for the ISR set). The leading 4 inside each pair of parentheses is the set-size field, whose value is n1 or n2.
    • writeTo: writes into the ByteBuffer, in order: error code, partition ID, leader ID, replica set size, all replica IDs, ISR set size, all ISR IDs. Note in particular that if there is no leader, it writes -1 directly, indicating that no leader node exists.
    • toString: turns the above information into a printable string
    • formatBroker: builds a string of the form broker ID + (broker host:port)

The PartitionMetadata companion object provides only a readFrom method, which reads the fields from a ByteBuffer in writeTo's order and returns a PartitionMetadata instance. (A sketch of this pattern appears after this section.)

With partition metadata defined, topic metadata is simpler: it contains only the topic name, a collection of PartitionMetadata, and an errorCode. Its methods are:

    • sizeInBytes: total bytes = (2 + topic name length) + a 4-byte partition count + the bytes of all PartitionMetadata in the collection
    • writeTo: writes the error code, topic, partition count, and the PartitionMetadata elements into the ByteBuffer in that order
    • toString: builds a string for printing

The TopicMetadata companion object likewise defines only a readFrom method, which reads from a ByteBuffer following the layout that writeTo produces.

OK, now it is time to discuss TopicMetadataRequest and TopicMetadataResponse themselves. The code structure of TopicMetadataRequest is similar to the other requests, with writeTo, sizeInBytes, toString and describe methods, plus a readFrom defined on the companion object. The class lets the user provide a set of topics in order to obtain metadata for just those topics. If an error occurs, the handleError method creates a set of error responses, which are then returned to the client through RequestChannel's sendResponse method.

TopicMetadataResponse.scala is much simpler than the request: the class defines only the sizeInBytes and writeTo methods, and the companion object defines readFrom to read the response information back in.
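Here is a simplified sketch of the PartitionMetadata pattern just described, with sizeInBytes, writeTo and a companion readFrom that mirrors the write order. The class name PartitionMetadataSketch and the reduced field types (plain Ints instead of Broker objects) are illustrative assumptions, not the real Kafka definitions.

```scala
import java.nio.ByteBuffer

// Simplified stand-in for PartitionMetadata; -1 as leaderId
// marks "no leader", as described above.
case class PartitionMetadataSketch(errorCode: Short,
                                   partitionId: Int,
                                   leaderId: Int,
                                   replicas: Seq[Int],
                                   isr: Seq[Int]) {
  // 2 (error code) + 4 (partition id) + 4 (leader id)
  // + (4 + 4 * n1 replicas) + (4 + 4 * n2 ISR)
  def sizeInBytes: Int = 2 + 4 + 4 + (4 + 4 * replicas.size) + (4 + 4 * isr.size)

  def writeTo(buffer: ByteBuffer): Unit = {
    buffer.putShort(errorCode)
    buffer.putInt(partitionId)
    buffer.putInt(leaderId)
    buffer.putInt(replicas.size)
    replicas.foreach(r => buffer.putInt(r))
    buffer.putInt(isr.size)
    isr.foreach(i => buffer.putInt(i))
  }
}

object PartitionMetadataSketch {
  // readFrom mirrors writeTo's field order exactly.
  def readFrom(buffer: ByteBuffer): PartitionMetadataSketch = {
    val errorCode    = buffer.getShort()
    val partitionId  = buffer.getInt()
    val leaderId     = buffer.getInt()
    val replicaCount = buffer.getInt()
    val replicas     = Seq.fill(replicaCount)(buffer.getInt())
    val isrCount     = buffer.getInt()
    val isr          = Seq.fill(isrCount)(buffer.getInt())
    PartitionMetadataSketch(errorCode, partitionId, leaderId, replicas, isr)
  }
}
```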
7. UpdateMetadataRequest.scala/UpdateMetadataResponse.scala

Since metadata can be queried, there is naturally an API to update it. The request, in addition to the common fields, also carries the controller ID, the controller_epoch, a mapping from topic+partition to partition state, and the set of currently available brokers. This extra information is naturally reflected in writeTo and readFrom as well.

8. ConsumerMetadataRequest.scala/ConsumerMetadataResponse.scala

The consumer metadata request/response consists mostly of the common request fields, but one field to note is group, which identifies the consumer group. Also, if the broker returned in the response has an empty host and a port of -1, it is a placeholder broker, which in effect means there is no broker.

9. ControlledShutdownRequest.scala/ControlledShutdownResponse.scala

The request and response for shutting down a broker.

10. HeartbeatRequestAndHeader.scala/HeartbeatResponseAndHeader.scala

From the name this looks like a request and response for keeping a heartbeat, but it does not appear to be used anywhere in the code.

11. JoinGroupRequestAndHeader.scala/JoinGroupResponseAndHeader.scala

This likewise does not appear to be used in the code.

12. LeaderAndIsrRequest.scala/LeaderAndIsrResponse.scala

LeaderAndIsrRequest.scala defines three class/companion-object pairs: LeaderAndIsr, PartitionStateInfo, and LeaderAndIsrRequest. LeaderAndIsr holds a leader and a leader epoch, an ISR set, and the corresponding zookeeper version; it appears that any change to the leader or the ISR increments the leader_epoch. PartitionStateInfo contains the partition's AR set and a LeaderIsrAndControllerEpoch instance, which simply preserves the leader, ISR and controller_epoch information; the controller_epoch is incremented by 1 whenever the controller changes. As part of LeaderAndIsrRequest, PartitionStateInfo keeps the same request/response-style code structure, providing writeTo, readFrom and sizeInBytes, where the write/read order is: controller_epoch, leader, leader_epoch, ISR size, ISR set, zkVersion, AR size, AR set. Finally, the LeaderAndIsrRequest itself, in addition to the common request fields, includes the controllerId, the controller_epoch, a set of leaders and a set of PartitionStateInfo. LeaderAndIsrResponse returns a map whose key is topic+partition and whose value is the corresponding error code. A small sketch of these shapes follows.
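To summarize the shapes described in section 12, here is a hedged sketch using plain case classes; the names and reduced field types are my own, and the real Kafka classes carry more detail.

```scala
// A simplified sketch of the data shapes in LeaderAndIsrRequest.scala.
case class LeaderAndIsrSketch(leader: Int,
                              leaderEpoch: Int, // bumped when leader or ISR changes
                              isr: List[Int],
                              zkVersion: Int)

case class PartitionStateInfoSketch(controllerEpoch: Int, // bumped on controller change
                                    leaderAndIsr: LeaderAndIsrSketch,
                                    allReplicas: Set[Int]) // the AR set

// The response side: a map from (topic, partition) to an error code.
case class LeaderAndIsrResponseSketch(correlationId: Int,
                                      responseMap: Map[(String, Int), Short])
```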
13. StopReplicaRequest.scala/StopReplicaResponse.scala

The request and response for shutting down the replicas of a set of partitions. Beyond the common fields, the request carries the controllerId, the controller_epoch, the partitions whose replicas are affected, and a boolean indicating whether those partitions should be deleted when the request is processed.

14. ProducerRequest.scala/ProducerResponse.scala

Starting with this pair we reach the more important Kafka requests. The client uses the producer API to submit a send request that delivers a message set to the server; Kafka allows messages belonging to multiple topic partitions to be sent at once. The producer request format is as follows:

VersionId (2 bytes) + CorrelationId (4 bytes) + ClientId (2 bytes + size(clientId)) + RequiredAcks (2 bytes) + AckTimeoutMs (4 bytes) + Topics (partitions + MessageSetSize + MessageSet)

A few fields deserve attention:

    • requiredAcks: the number of acknowledgements the server must receive before answering the request. If it is 0, the server does not send a response at all; if it is 1, the server waits until the data is written to the local log and then sends the response; if it is -1, the server waits until all replicas in the ISR have committed the message before responding. This value is controlled by the property request.required.acks.
    • ackTimeoutMs: the maximum time to wait for the acknowledgements, specified by the property request.timeout.ms, 10 seconds by default. It is only an approximate bound, since several components of the latency are not included in it: it covers neither network latency nor the time the request spends waiting in queues. For a precise bound on those parts, a socket timeout is the better tool.

The response class corresponding to ProducerRequest is ProducerResponse. Its format is:

CorrelationId + topic count + [topic + partition count + [partitionId + errorCode + nextOffset]*]*

Each partition has its own errorCode, and nextOffset is the offset of the first message in the message set. (A short sketch of parsing this layout appears after section 15 below.)

15. FetchRequest.scala/FetchResponse.scala

The fetch API is used to obtain one or more messages for some topic partitions; the client code only needs to specify the topic, the partition, and the starting offset. Generally, the offsets of the returned messages are not less than the given starting offset. However, if the messages are compressed, some may have smaller offsets. There will not be many such messages, and the caller of the fetch API has to filter them out itself. The FetchRequest format is as follows:

VersionId + CorrelationId + ClientId + ReplicaId + MaxWait + MinBytes + topic count + [topic + partition count + [partitionId + offset + fetchSize]*]*

    • replicaId: the node ID of the replica initiating the request; ordinary clients should always set it to -1.
    • maxWait/minBytes: if maxWait is set to 100ms and minBytes to 64KB, the Kafka server will wait up to 100ms to collect 64KB of response data.

FetchResponse is somewhat special compared to the other responses: it defines two classes for its sub-sections, TopicData and PartitionData, so the format is:

CorrelationId + topic count + [topic + [partitionId + errorCode + highwatermarkOffset + messageSetSize + messageSet]*]*

It is worth mentioning that since this response may return a lot of data, Kafka uses the sendfile mechanism when building the response (implemented with Java NIO's FileChannel).
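As promised, here is a sketch of parsing the ProducerResponse layout given in section 14; the object name and the returned map shape are my own illustration of the wire format, not the actual ProducerResponse class.

```scala
import java.nio.ByteBuffer

// A sketch of reading: correlationId + topic count + [topic
// + partition count + [partitionId + errorCode + nextOffset]*]*
object ProducerResponseSketch {
  private def readShortString(buffer: ByteBuffer): String = {
    val bytes = new Array[Byte](buffer.getShort())
    buffer.get(bytes)
    new String(bytes, "UTF-8")
  }

  // Returns the correlation id and a map of
  // (topic, partition) -> (errorCode, nextOffset).
  def readFrom(buffer: ByteBuffer): (Int, Map[(String, Int), (Short, Long)]) = {
    val correlationId = buffer.getInt()
    val topicCount    = buffer.getInt()
    val statuses =
      (1 to topicCount).flatMap { _ =>
        val topic          = readShortString(buffer)
        val partitionCount = buffer.getInt()
        (1 to partitionCount).map { _ =>
          val partitionId = buffer.getInt()
          val errorCode   = buffer.getShort()
          val nextOffset  = buffer.getLong()
          (topic, partitionId) -> (errorCode, nextOffset)
        }
      }.toMap
    (correlationId, statuses)
  }
}
```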
16. OffsetRequest.scala/OffsetResponse.scala

This API is mainly used to obtain the valid offset range for a set of topic partitions. As with the produce and fetch APIs, an offset request must be sent to the leader broker of the partition; of course, the metadata API can be used to find that leader broker's ID. OffsetResponse returns, for each requested partition, the starting offset of every log segment as well as the log end offset, that is, the offset at which the next message will be appended to the partition.

Two important constants appear in OffsetRequest: LatestTime and EarliestTime, with the values -1 and -2 respectively. They are closely related to the property auto.offset.reset, corresponding to its values largest and smallest: largest means the offset is automatically reset to the maximum offset, and smallest means it is automatically reset to the minimum offset. The OffsetRequest format is as follows:

VersionId + CorrelationId + ClientId + ReplicaId + topic count + [topic + partition count + [partitionId + time + maxNumOffsets]*]*

Kafka provides a dedicated PartitionOffsetRequestInfo class to hold the time + maxNumOffsets pair. Specifying a time (in ms) requests all messages before that point; for example, passing OffsetRequest.LatestTime requests all current messages. OffsetResponse is comparatively simple, with the following format:

CorrelationId + topic count + [topic + partition count + [partitionId + errorCode + offset array length + each offset]*]*

The returned offset array holds the starting offset of each log segment under the partition. (A small sketch of the time constants appears after section 18 below.)

17. OffsetCommitRequest.scala/OffsetCommitResponse.scala

This request, together with the OffsetFetchRequest API below, supports centralized management of offsets. The OffsetCommitRequest format is as follows:

VersionId + CorrelationId + ClientId + consumer group id + consumer generation id (added in 0.8.2) + consumer id (added in 0.8.2) + topic count + [topic + partition count + [partitionId + offset + timestamp (added in 0.8.2) + metadata]*]*

Here offset, timestamp and metadata are the pieces of information supplied by the OffsetAndMetadata class. OffsetCommitResponse is much simpler; it has only one format version:

CorrelationId + topic count + [topic + partition count + [partitionId + errorCode]*]*

18. OffsetFetchRequest.scala/OffsetFetchResponse.scala

As the name implies, this request fetches offset information. Its format is as follows:

VersionId + CorrelationId + ClientId + consumer group id + topic count + [topic + partition count + [partitionId]*]*

Although there is only one format, it is important to note that before 0.8.2 the offsets were read from ZooKeeper, while from 0.8.2 onward they are read from Kafka itself. The OffsetFetchResponse format is:

CorrelationId + topic count + [topic + partition count + [partitionId + offset + metadata + errorCode]*]*

Here offset and metadata are held in an OffsetAndMetadata instance.
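Finally, a small sketch of the OffsetRequest time constants and the per-partition payload they accompany. PartitionOffsetRequestInfoSketch is an illustrative stand-in for Kafka's PartitionOffsetRequestInfo; only the -1/-2 values for LatestTime/EarliestTime come from the discussion above.

```scala
// The special time values described in section 16.
object OffsetRequestSketch {
  val LatestTime: Long   = -1L // ask for the log end offset
  val EarliestTime: Long = -2L // ask for the earliest available offset
}

// time: request offsets for messages before this point (in ms),
// or one of the two special values above.
// maxNumOffsets: the maximum number of segment offsets to return.
case class PartitionOffsetRequestInfoSketch(time: Long, maxNumOffsets: Int)

// Example: ask for up to 10 segment start offsets covering
// everything currently in the log.
object OffsetRequestDemo extends App {
  val info = PartitionOffsetRequestInfoSketch(OffsetRequestSketch.LatestTime, 10)
  println(info)
}
```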

"Original" KAKFA API package source code Analysis

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.