KafkaSpout Analysis: Configuring

Source: Internet
Author: User

    public KafkaSpout(SpoutConfig spoutConf) {
        _spoutConfig = spoutConf;
    }

SpoutConfig inherits from KafkaConfig. Since all of the instance fields of SpoutConfig and KafkaConfig are public, each field can be set directly after calling the constructor.

    public class SpoutConfig extends KafkaConfig implements Serializable {
        public List<String> zkServers = null;     // hosts of the ZooKeeper used to record the spout's read progress
        public Integer zkPort = null;             // port of the ZooKeeper used to record progress
        public String zkRoot = null;              // path in ZooKeeper under which progress is recorded
        public String id = null;                  // id of the progress record; a new spout that should resume from a previous record must use the same id
        public long stateUpdateIntervalMs = 2000; // used for metrics: how often the state is updated

        public SpoutConfig(BrokerHosts hosts, String topic, String zkRoot, String id) {
            super(hosts, topic);
            this.zkRoot = zkRoot;
            this.id = id;
        }
    }
    public class KafkaConfig implements Serializable {
        public final BrokerHosts hosts;  // where to obtain Kafka brokers and partitions
        public final String topic;       // topic to read messages from
        public final String clientId;    // client id used by the SimpleConsumer

        public int fetchSizeBytes = 1024 * 1024;  // for each FetchRequest sent to Kafka, the desired total size of messages in the response
        public int socketTimeoutMs = 10000;       // socket timeout for connections to a Kafka broker
        public int fetchMaxWait = 10000;          // how long the consumer waits when the server has no new messages
        public int bufferSizeBytes = 1024 * 1024; // read-buffer size of the SocketChannel used by the SimpleConsumer
        public MultiScheme scheme = new RawMultiScheme(); // how to deserialize the byte[] fetched from Kafka
        public boolean forceFromStart = false;    // whether to force reading from the offset given by startOffsetTime
        public long startOffsetTime = kafka.api.OffsetRequest.EarliestTime(); // offset time to start reading from; defaults to the oldest offset
        public long maxOffsetBehind = 100000;     // if the spout's read progress lags the target by more than this, the spout discards the messages in between
        public boolean useStartOffsetTimeIfOffsetOutOfRange = true; // if the requested offset does not exist in Kafka, whether to fall back to startOffsetTime
        public int metricsTimeBucketSizeInSecs = 60; // how often metrics are computed

        public KafkaConfig(BrokerHosts hosts, String topic) {
            this(hosts, topic, kafka.api.OffsetRequest.DefaultClientId());
        }

        public KafkaConfig(BrokerHosts hosts, String topic, String clientId) {
            this.hosts = hosts;
            this.topic = topic;
            this.clientId = clientId;
        }
    }
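As an illustration of setting the public fields after construction, the following sketch wires the two classes together. It assumes the storm-kafka artifact on the classpath; the ZooKeeper address, topic, zkRoot, and id values are placeholders.

```java
import storm.kafka.BrokerHosts;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.ZkHosts;

public class KafkaSpoutConfigExample {
    public static KafkaSpout buildSpout() {
        // ZooKeeper used by the Kafka cluster, for discovering brokers and partitions
        BrokerHosts hosts = new ZkHosts("kafkazk1:2181,kafkazk2:2181");

        // zkRoot and id determine where this spout's read progress is recorded
        SpoutConfig conf = new SpoutConfig(hosts, "my-topic", "/kafkaspout", "my-id");

        // all instance fields are public, so they can be set directly
        conf.fetchSizeBytes = 2 * 1024 * 1024;
        conf.stateUpdateIntervalMs = 5000;

        return new KafkaSpout(conf);
    }
}
```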
The use of ZooKeeper

There are two places where the KafkaSpout configuration can make use of ZooKeeper:

    1. Using ZooKeeper to record the KafkaSpout's processing progress, so that processing can resume after the topology is resubmitted or a task restarts. The zkServers, zkPort, and zkRoot fields of SpoutConfig relate to this. If zkServers and zkPort are not set, KafkaSpout records this information in the ZooKeeper used by the Storm cluster.
    2. Using ZooKeeper to obtain all partitions of a topic in Kafka, and the leader of each partition. This requires the ZkHosts subclass of BrokerHosts, which extracts the partition-leader mapping from the ZooKeeper used by the Kafka cluster. This ZooKeeper is optional: if you use the other BrokerHosts subclass, StaticHosts, to hard-code the mapping between partitions and leaders, ZooKeeper is not needed for this purpose. In addition:
      • If you use StaticHosts, KafkaSpout uses StaticCoordinator, which cannot respond to changes of a partition's leader.
      • If you use ZkHosts, KafkaSpout uses ZkCoordinator; each time its refresh() method is called, the coordinator checks for leader changes and generates a new PartitionManager for each affected partition. This allows messages to be read even after a leader change.
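The two BrokerHosts variants described above might be constructed as in the following sketch. The broker and ZooKeeper addresses are placeholders, and GlobalPartitionInformation lives in the storm.kafka.trident package of storm-kafka.

```java
import storm.kafka.Broker;
import storm.kafka.BrokerHosts;
import storm.kafka.StaticHosts;
import storm.kafka.ZkHosts;
import storm.kafka.trident.GlobalPartitionInformation;

public class BrokerHostsExample {
    // ZkHosts: the partition-leader mapping is read from the Kafka cluster's
    // ZooKeeper, and ZkCoordinator can follow leader changes.
    public static BrokerHosts fromZooKeeper() {
        return new ZkHosts("kafkazk1:2181,kafkazk2:2181");
    }

    // StaticHosts: the mapping is hard-coded, so no ZooKeeper is needed for
    // discovery, but StaticCoordinator cannot react to leader changes.
    public static BrokerHosts fromStaticList() {
        GlobalPartitionInformation info = new GlobalPartitionInformation();
        info.addPartition(0, new Broker("broker1", 9092));
        info.addPartition(1, new Broker("broker2", 9092));
        return new StaticHosts(info);
    }
}
```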
Configuration items that affect the initial read progress

After a topology goes online, from which offset does it start reading messages? Several configuration items affect this:

    1. The id field of SpoutConfig. If a topology should continue from the progress recorded by a previous topology, they must have the same id.
    2. The forceFromStart field of KafkaConfig. If this field is set to true, the spout ignores any progress previously recorded under the same id and, after the topology goes online, starts processing from the offset given by startOffsetTime (by default, the oldest message in Kafka).
    3. The startOffsetTime field of KafkaConfig. It defaults to kafka.api.OffsetRequest.EarliestTime(), which means reading begins from the oldest message in Kafka. It can also be set to kafka.api.OffsetRequest.LatestTime(), which begins reading from the newest message, or to a specific value of your own.
    4. The maxOffsetBehind field of KafkaConfig. This field affects several parts of KafkaSpout's processing. When a new topology is submitted with forceFromStart unset, if the spout's progress on a partition lags the offset corresponding to startOffsetTime by more than this value, KafkaSpout discards the intermediate messages, forcing progress to catch up with the target. For example, if startOffsetTime is set to LatestTime and the recorded progress is more than maxOffsetBehind behind it, KafkaSpout starts processing directly from the offset corresponding to LatestTime. If forceFromStart is set, a newly submitted topology always starts reading from startOffsetTime.
    5. The useStartOffsetTimeIfOffsetOutOfRange field of KafkaConfig. If set to true, then when a fetch fails and the FetchResponse reports an OffsetOutOfRange error, KafkaSpout attempts to start reading from the offset specified by startOffsetTime. For example, if a batch of messages has been deleted by Kafka because it exceeded the retention period, and the offset recorded in ZooKeeper falls within that deleted range, then resuming from the ZooKeeper record triggers an OffsetOutOfRange error and this configuration takes effect.
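For instance, the items above could be combined to make a newly submitted topology ignore earlier progress and start from the newest messages. This is a sketch assuming storm-kafka on the classpath; the ZooKeeper address, topic, zkRoot, and id are placeholders.

```java
import storm.kafka.BrokerHosts;
import storm.kafka.SpoutConfig;
import storm.kafka.ZkHosts;

public class InitialOffsetExample {
    public static SpoutConfig buildConfig() {
        BrokerHosts hosts = new ZkHosts("kafkazk1:2181");
        SpoutConfig conf = new SpoutConfig(hosts, "my-topic", "/kafkaspout", "my-id");

        // ignore any progress previously recorded under the same id
        conf.forceFromStart = true;

        // start from the newest message instead of the oldest
        conf.startOffsetTime = kafka.api.OffsetRequest.LatestTime();

        // fall back to startOffsetTime if the recorded offset no longer exists
        conf.useStartOffsetTimeIfOffsetOutOfRange = true;

        return conf;
    }
}
```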

In fact, maxOffsetBehind is sometimes a bit of a misnomer. When startOffsetTime corresponds to offset A and the progress recorded in ZooKeeper is offset B, with A - B > maxOffsetBehind, it might be better to start reading from A - maxOffsetBehind than to skip straight to A. The actual logic is in the PartitionManager implementation.
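The behavior described above can be sketched as a small self-contained decision function. This is an illustration of the catch-up rule, not the actual PartitionManager code; the class and method names are made up.

```java
public class OffsetCatchUp {
    /**
     * Decide where the spout resumes reading a partition.
     *
     * committedOffset: progress B recorded in ZooKeeper
     * startTimeOffset: offset A corresponding to startOffsetTime
     * maxOffsetBehind: maximum tolerated lag
     */
    public static long resumeOffset(long committedOffset, long startTimeOffset,
                                    long maxOffsetBehind) {
        if (startTimeOffset - committedOffset > maxOffsetBehind) {
            // too far behind: skip the intermediate messages and jump to A,
            // as KafkaSpout does (rather than resuming from A - maxOffsetBehind)
            return startTimeOffset;
        }
        // close enough: continue from the recorded progress
        return committedOffset;
    }
}
```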

Appendix: for the meaning of KafkaConfig's fetchMaxWait, see the article "The Purgatory of Kafka".
