Integrating Storm with Kafka


KafkaUtil:

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.ProducerConfig;
import org.springframework.beans.factory.annotation.Value;

public class KafkaUtil {

    @Value("#{sys['zk.connect']}")
    private static String zkConnect;

    @Value("#{sys['metadata.broker.list']}")
    private static String brokerList;

    @Value("#{sys['request.required.acks']}")
    private static String ack;

    private static Producer<String, String> producer = null;

    public static Producer<String, String> getProducer() {
        if (producer == null) {
            Properties p = PropertiesUtil.getProperties("kafka.properties");
            zkConnect = (String) p.get("zk.connect");
            brokerList = (String) p.get("metadata.broker.list");
            ack = (String) p.get("request.required.acks");

            Properties props = new Properties();
            props.put("zk.connect", zkConnect);
            props.put("metadata.broker.list", brokerList);
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            props.put("request.required.acks", ack);
            // sync: synchronous, async: asynchronous
            props.put("producer.type", "async");
            // partitioning class, so messages are spread across partitions for distributed storage
            props.put("partitioner.class", "com.kafka.SendPartitioner");
            props.put("request.timeout.ms", "50000");
            // async mode: buffered messages are flushed once per this interval (default 5000)
            props.put("queue.buffering.max.ms", "10000");
            // async mode: number of messages per batch (default 200); a flush also happens
            // when queue.buffering.max.ms elapses, even if the batch size was not reached
            props.put("batch.num.messages", "1000");

            ProducerConfig config = new ProducerConfig(props);
            producer = new Producer<String, String>(config);
        }
        return producer;
    }
}
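The listings in this post rely on a PropertiesUtil helper that the original never shows. A minimal sketch of what such a helper might look like, assuming the properties file sits on the classpath:

import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical helper assumed by the code above: loads a properties
// file from the classpath (e.g. src/main/resources).
public class PropertiesUtil {

    public static Properties getProperties(String fileName) {
        Properties props = new Properties();
        try (InputStream in = PropertiesUtil.class.getClassLoader()
                .getResourceAsStream(fileName)) {
            if (in != null) {
                props.load(in);
            }
        } catch (IOException e) {
            throw new RuntimeException("Failed to load " + fileName, e);
        }
        return props;
    }
}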
Properties of the Kafka message-sending (producer) class:

1: zk.connect: ZooKeeper server connection address

2: metadata.broker.list: list of Kafka broker addresses (host:port)

3: serializer.class: serialization format for Kafka messages

4: request.required.acks: the acknowledgement mechanism for sent messages. It has three options: 0, 1, and -1.
0 means the producer never waits for an ack from the broker (this was the behavior of version 0.7). This option gives the lowest latency but the weakest durability guarantee; some data is lost when a server goes down. In testing, roughly a few hundred messages were lost per 10K sent.

1 means the producer gets an ack once the leader replica has received the data. This option gives better durability, because the call does not return until the server confirms the request was processed successfully. But if the message has just been written to the leader and has not yet been replicated when the leader fails, it can still be lost.

-1 means the producer gets an ack only after all replicas in the ISR have received the data. This option gives the best durability: as long as one replica survives, no data is lost. In testing, not one message in 1,000,000 was lost.

5: request.timeout.ms: request timeout

6: producer.type: has two options, sync (synchronous) and async (asynchronous). In synchronous mode the call returns after each message is sent; in asynchronous mode messages are buffered, and the asynchronous parameters below apply.

7: queue.buffering.max.ms: default 5000. In asynchronous mode, buffered messages are flushed once per this interval.

8: batch.num.messages: default 200. In asynchronous mode, the number of messages committed per batch; if the elapsed time exceeds queue.buffering.max.ms, a commit happens regardless of whether the batch size has been reached.

9: partitioner.class: custom partitioning algorithm
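To see these properties in action, here is a minimal, illustrative sketch of sending one message through the utility class above (the topic name and key are made up for the example):

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;

public class SendExample {
    public static void main(String[] args) {
        Producer<String, String> producer = KafkaUtil.getProducer();
        // The key ("device-42", purely illustrative) is what the partitioner
        // hashes to pick a partition; the topic name is also made up.
        KeyedMessage<String, String> msg =
                new KeyedMessage<String, String>("imei-data", "device-42", "hello kafka");
        producer.send(msg);
    }
}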
In a Kafka cluster, each node is called a broker. If you go into ZK and run the `ls /` command, you will see a brokers directory under the root (the default Kafka configuration registers under the ZK root; I prefer to put it in a custom directory). This directory stores the names of the nodes currently running in the Kafka cluster.
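For example, the live broker ids can be read with the plain ZooKeeper client API (a sketch; the connection string is illustrative):

import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class ListBrokers {
    public static void main(String[] args) throws Exception {
        // Connection string is illustrative; use your own ZK quorum.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, null);
        // Kafka registers each live broker under /brokers/ids.
        List<String> brokerIds = zk.getChildren("/brokers/ids", false);
        System.out.println("Live brokers: " + brokerIds);
        zk.close();
    }
}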

The cluster performs best only when messages are spread as evenly as possible across all brokers. So how does Kafka ensure this?
Kafka's message class KeyedMessage takes as parameters the queue (topic) the message will be sent to, plus the message key and value. Taking the hash of the key modulo the number of partitions determines which partition, and therefore which broker, receives the message.
You can supply a custom partitioning implementation class and specify it in the properties:

import kafka.producer.Partitioner;
import kafka.utils.VerifiableProperties;

public class SendPartitioner implements Partitioner {

    // This constructor is required: Kafka instantiates the partitioner
    // reflectively and passes in the producer configuration.
    public SendPartitioner(VerifiableProperties verifiableProperties) {}

    @Override
    public int partition(Object key, int numPartitions) {
        try {
            return Math.abs(key.hashCode() % numPartitions);
        } catch (Exception e) {
            return Math.abs(key.hashCode() % numPartitions);
        }
    }
}
numPartitions here is the number of partitions of the topic, which Kafka obtains from ZK in real time; it does not need to be specified explicitly.


Most of the properties above can also be set in the Kafka installation's configuration files. But a Kafka cluster rarely serves just one queue or one project, and different projects have different business requirements, so it is best to set project-specific parameters inside each project.


Storm:

Storm and Kafka are integrated through a third-party library, storm-kafka.jar. In short, it really does only one thing: Storm's spout is already written for us, so we only need to write the bolts and submit the topology to Storm.

It handles the part of the Kafka consumer side that is hardest to get right: offset management. If you do not want to re-read every Kafka message on each start, and want to avoid consuming messages repeatedly, you need a sound offset mechanism, especially with multiple consumer groups and queues.

Code:

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

import com.util.PropertiesUtil;

import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.TopologyBuilder;

public class Topology {

    private static String topic;
    private static String brokerZkStr;
    private static String brokerZkPath;
    private static String offset;
    private static String app;

    static {
        Properties p = PropertiesUtil.getProperties("storm.properties");
        brokerZkStr = (String) p.get("brokerZkStr");
        brokerZkPath = (String) p.get("brokerZkPath");
        topic = (String) p.get("kafka.topic");
        offset = (String) p.get("kafka.offset");
        app = (String) p.get("kafka.app");
    }

    public static void main(String[] args) throws InterruptedException {
        ZkHosts zk = new ZkHosts(brokerZkStr, brokerZkPath);
        SpoutConfig spoutConf = new SpoutConfig(zk, topic,
                offset,   // root directory for offsets in ZK
                app);     // id corresponding to one application

        List<String> zkServers = new ArrayList<>();
        for (String str : zk.brokerZkStr.split(",")) {
            zkServers.add(str.split(":")[0]);
        }
        spoutConf.zkServers = zkServers;
        spoutConf.zkPort = 2181;
        // true: consume from the beginning; false: consume from the stored offset
        spoutConf.forceFromStart = false;
        spoutConf.socketTimeoutMs = 60 * 1000;
        spoutConf.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new KafkaSpout(spoutConf), 4);
        builder.setBolt("bolt1", new GetMsgBolt(), 4).shuffleGrouping("spout");

        Config config = new Config();
        config.setDebug(false);
        config.setNumWorkers(4);

        if (args.length > 0) {
            try {
                StormSubmitter.submitTopology(args[0], config, builder.createTopology());
            } catch (Exception e) {
                e.printStackTrace();
            }
        } else {
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("mytopology", config, builder.createTopology());
        }
    }
}

Properties:

1: brokerZkPath: the root directory of the Kafka cluster in ZK; the default is /brokers

2: kafka.offset: the location in ZK where the offsets of the Kafka message queues are recorded

3: kafka.app: effectively a subdirectory of kafka.offset. The parent directory holds the overall offset location for the Kafka cluster, and this subdirectory holds the message offset of each individual queue or application, which avoids offsets getting mixed up when there are multiple consumer groups and queues.
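The topology above references a GetMsgBolt that the original post does not include. A minimal sketch of such a bolt, assuming it only needs to pull the message string out of the tuple emitted by KafkaSpout:

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class GetMsgBolt extends BaseBasicBolt {

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        // StringScheme declares a single field holding the raw message string.
        String msg = input.getString(0);
        // Real processing would go here; this sketch just re-emits the message.
        collector.emit(new Values(msg));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("msg"));
    }
}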


