Let's dig into the KafkaSpout source code.
KafkaSpout starts by doing some initialization in its open method:
```java
// ...
_state = new ZkState(stateConf);
_connections = new DynamicPartitionConnections(_spoutConfig,
        KafkaUtils.makeBrokerReader(conf, _spoutConfig));

// using TransactionalState like this is a hack
int totalTasks = context.getComponentTasks(context.getThisComponentId()).size();
if (_spoutConfig.hosts instanceof StaticHosts) {
    _coordinator = new StaticCoordinator(_connections, conf, _spoutConfig, _state,
            context.getThisTaskIndex(), totalTasks, _uuid);
} else {
    _coordinator = new ZkCoordinator(_connections, conf, _spoutConfig, _state,
            context.getThisTaskIndex(), totalTasks, _uuid);
}
// ...
```
Some code before and after is omitted; the metric-related parts are not covered in this series. The main work here is initializing the ZooKeeper connection (ZkState) and the mapping between Kafka partitions and brokers (DynamicPartitionConnections). The DynamicPartitionConnections constructor takes a brokerReader; since we are using ZkHosts, a look at KafkaUtils shows that a ZkBrokerReader is used. Here is its code:
```java
public ZkBrokerReader(Map conf, String topic, ZkHosts hosts) {
    try {
        reader = new DynamicBrokersReader(conf, hosts.brokerZkStr, hosts.brokerZkPath, topic);
        cachedBrokers = reader.getBrokerInfo();
        lastRefreshTimeMs = System.currentTimeMillis();
        refreshMillis = hosts.refreshFreqSecs * 1000L;
    } catch (java.net.SocketTimeoutException e) {
        LOG.warn("Failed to update brokers", e);
    }
}
```
The refreshMillis field controls how often the partition information cached from ZooKeeper is refreshed:
```java
// ZkBrokerReader
@Override
public GlobalPartitionInformation getCurrentBrokers() {
    long currTime = System.currentTimeMillis();
    if (currTime > lastRefreshTimeMs + refreshMillis) {
        // refresh when more than refreshMillis have passed since the last update
        try {
            LOG.info("brokers need refreshing because " + refreshMillis + "ms have expired");
            cachedBrokers = reader.getBrokerInfo();
            lastRefreshTimeMs = currTime;
        } catch (java.net.SocketTimeoutException e) {
            LOG.warn("Failed to update brokers", e);
        }
    }
    return cachedBrokers;
}
```

Here is the DynamicBrokersReader code it calls:

```java
/**
 * Get all partitions with their current leaders
 */
public GlobalPartitionInformation getBrokerInfo() throws SocketTimeoutException {
    GlobalPartitionInformation globalPartitionInformation = new GlobalPartitionInformation();
    try {
        int numPartitionsForTopic = getNumPartitions();
        String brokerInfoPath = brokerPath();
        for (int partition = 0; partition < numPartitionsForTopic; partition++) {
            int leader = getLeaderFor(partition);
            String path = brokerInfoPath + "/" + leader;
            try {
                byte[] brokerData = _curator.getData().forPath(path);
                Broker hp = getBrokerHost(brokerData);
                globalPartitionInformation.addPartition(partition, hp);
            } catch (org.apache.zookeeper.KeeperException.NoNodeException e) {
                LOG.error("Node {} does not exist", path);
            }
        }
    } catch (SocketTimeoutException e) {
        throw e;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
    LOG.info("Read partition info from zookeeper: " + globalPartitionInformation);
    return globalPartitionInformation;
}
```
GlobalPartitionInformation is an iterable class that stores the mapping between partitions and brokers. The relationship between Kafka consumers and partitions is maintained in DynamicPartitionConnections, which records which partitions each consumer reads. This ConnectionInfo is initialized and updated in storm.kafka.ZkCoordinator. One thing worth noting is that each connection holds a SimpleConsumer:
```java
// storm.kafka.DynamicPartitionConnections
static class ConnectionInfo {
    SimpleConsumer consumer;
    Set<Integer> partitions = new HashSet();

    public ConnectionInfo(SimpleConsumer consumer) {
        this.consumer = consumer;
    }
}
```
Now look at the ZkCoordinator class, starting with its constructor:
```java
// storm.kafka.ZkCoordinator
public ZkCoordinator(DynamicPartitionConnections connections, Map stormConf,
                     SpoutConfig spoutConfig, ZkState state, int taskIndex, int totalTasks,
                     String topologyInstanceId, DynamicBrokersReader reader) {
    _spoutConfig = spoutConfig;
    _connections = connections;
    _taskIndex = taskIndex;
    _totalTasks = totalTasks;
    _topologyInstanceId = topologyInstanceId;
    _stormConf = stormConf;
    _state = state;
    ZkHosts brokerConf = (ZkHosts) spoutConfig.hosts;
    _refreshFreqMs = brokerConf.refreshFreqSecs * 1000;
    _reader = reader;
}
```
_refreshFreqMs controls how often the local view of the ZooKeeper partition information is refreshed. KafkaSpout's nextTuple method calls ZkCoordinator's getMyManagedPartitions on every invocation, and that method refreshes the partition information periodically according to _refreshFreqMs:
```java
// storm.kafka.ZkCoordinator
@Override
public List<PartitionManager> getMyManagedPartitions() {
    if (_lastRefreshTime == null || (System.currentTimeMillis() - _lastRefreshTime) > _refreshFreqMs) {
        refresh();
        _lastRefreshTime = System.currentTimeMillis();
    }
    return _cachedList;
}

@Override
public void refresh() {
    try {
        LOG.info(taskId(_taskIndex, _totalTasks) + "Refreshing partition manager connections");
        GlobalPartitionInformation brokerInfo = _reader.getBrokerInfo();
        List<Partition> mine = KafkaUtils.calculatePartitionsForTask(brokerInfo, _totalTasks, _taskIndex);

        Set<Partition> curr = _managers.keySet();
        Set<Partition> newPartitions = new HashSet<Partition>(mine);
        newPartitions.removeAll(curr);

        Set<Partition> deletedPartitions = new HashSet<Partition>(curr);
        deletedPartitions.removeAll(mine);

        LOG.info(taskId(_taskIndex, _totalTasks) + "Deleted partition managers: " + deletedPartitions.toString());

        for (Partition id : deletedPartitions) {
            PartitionManager man = _managers.remove(id);
            man.close();
        }
        LOG.info(taskId(_taskIndex, _totalTasks) + "New partition managers: " + newPartitions.toString());
        for (Partition id : newPartitions) {
            PartitionManager man = new PartitionManager(_connections, _topologyInstanceId,
                    _state, _stormConf, _spoutConfig, id);
            _managers.put(id, man);
        }
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
    _cachedList = new ArrayList<PartitionManager>(_managers.values());
    LOG.info(taskId(_taskIndex, _totalTasks) + "Finished refreshing");
}
```
The algorithm that assigns partitions to each consumer task is KafkaUtils.calculatePartitionsForTask(brokerInfo, _totalTasks, _taskIndex). Its main job is to take the number of parallel tasks, compare it with the current partitions, and work out which partitions this consumer task is responsible for reading; for the details of the assignment algorithm, refer to the Kafka documentation.
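To make the idea concrete, here is a minimal sketch of a round-robin assignment in the spirit of calculatePartitionsForTask: each task takes every totalTasks-th partition starting at its own index. The class and method names are hypothetical, and this is a simplified illustration, not the actual storm-kafka implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionAssignmentSketch {
    // Hypothetical helper: task `taskIndex` (out of `totalTasks`) claims
    // partitions taskIndex, taskIndex + totalTasks, taskIndex + 2*totalTasks, ...
    static List<Integer> partitionsForTask(int numPartitions, int totalTasks, int taskIndex) {
        List<Integer> mine = new ArrayList<Integer>();
        for (int p = taskIndex; p < numPartitions; p += totalTasks) {
            mine.add(p);
        }
        return mine;
    }

    public static void main(String[] args) {
        // 5 partitions shared by 2 spout tasks
        System.out.println(partitionsForTask(5, 2, 0)); // task 0 -> [0, 2, 4]
        System.out.println(partitionsForTask(5, 2, 1)); // task 1 -> [1, 3]
    }
}
```

This kind of deterministic assignment means every task can compute its own partition set independently, without coordinating with the other tasks.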
That covers the initialization done by KafkaSpout. Now let's move on to how data is fetched and emitted, starting with the nextTuple method:
```java
// storm.kafka.KafkaSpout
@Override
public void nextTuple() {
    List<PartitionManager> managers = _coordinator.getMyManagedPartitions();
    for (int i = 0; i < managers.size(); i++) {
        try {
            // in case the number of managers decreased
            _currPartitionIndex = _currPartitionIndex % managers.size();
            EmitState state = managers.get(_currPartitionIndex).next(_collector);
            if (state != EmitState.EMITTED_MORE_LEFT) {
                _currPartitionIndex = (_currPartitionIndex + 1) % managers.size();
            }
            if (state != EmitState.NO_EMITTED) {
                break;
            }
        } catch (FailedFetchException e) {
            LOG.warn("Fetch failed", e);
            _coordinator.refresh();
        }
    }

    long now = System.currentTimeMillis();
    if ((now - _lastUpdateMs) > _spoutConfig.stateUpdateIntervalMs) {
        commit();
    }
}
```
As the code above shows, all the real work happens in PartitionManager: the messages are read and emitted there, and the main logic lives in PartitionManager's next method:
```java
// returns false if it's reached the end of the current batch
public EmitState next(SpoutOutputCollector collector) {
    if (_waitingToEmit.isEmpty()) {
        fill();
    }
    while (true) {
        MessageAndRealOffset toEmit = _waitingToEmit.pollFirst();
        if (toEmit == null) {
            return EmitState.NO_EMITTED;
        }
        Iterable<List<Object>> tups = KafkaUtils.generateTuples(_spoutConfig, toEmit.msg);
        if (tups != null) {
            for (List<Object> tup : tups) {
                collector.emit(tup, new KafkaMessageId(_partition, toEmit.offset));
            }
            break;
        } else {
            ack(toEmit.offset);
        }
    }
    if (!_waitingToEmit.isEmpty()) {
        return EmitState.EMITTED_MORE_LEFT;
    } else {
        return EmitState.EMITTED_END;
    }
}
```
If the _waitingToEmit list is empty, fill() is called to read messages; then the messages are emitted one at a time. After each single emit the loop breaks and EMITTED_MORE_LEFT is returned to KafkaSpout's nextTuple, which then checks whether this partition's buffered messages have all been emitted; only once they have does it move on to read and emit the next partition's data.
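The interplay between nextTuple and next can be hard to follow from the code alone, so here is a small, self-contained simulation of it. FakePartition and its in-memory buffer are hypothetical stand-ins for PartitionManager and its fetched messages; only the EmitState handling mirrors the spout logic above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class EmitLoopSketch {
    enum EmitState { EMITTED_MORE_LEFT, EMITTED_END, NO_EMITTED }

    // Hypothetical stand-in for a PartitionManager holding buffered messages.
    static class FakePartition {
        final Deque<String> buffer;
        FakePartition(String... msgs) { buffer = new ArrayDeque<String>(Arrays.asList(msgs)); }

        // Emit one message per call, like PartitionManager.next.
        EmitState next(List<String> emitted) {
            String msg = buffer.pollFirst();
            if (msg == null) return EmitState.NO_EMITTED;
            emitted.add(msg);
            return buffer.isEmpty() ? EmitState.EMITTED_END : EmitState.EMITTED_MORE_LEFT;
        }
    }

    static int currIndex = 0;

    // One nextTuple-style pass: stay on the same partition while it still has
    // buffered messages, advance to the next partition once it is drained.
    static void nextTuple(List<FakePartition> managers, List<String> emitted) {
        for (int i = 0; i < managers.size(); i++) {
            currIndex = currIndex % managers.size();
            EmitState state = managers.get(currIndex).next(emitted);
            if (state != EmitState.EMITTED_MORE_LEFT) {
                currIndex = (currIndex + 1) % managers.size();
            }
            if (state != EmitState.NO_EMITTED) break;
        }
    }

    public static void main(String[] args) {
        List<FakePartition> managers = Arrays.asList(
                new FakePartition("a1", "a2"), new FakePartition("b1"));
        List<String> emitted = new ArrayList<String>();
        for (int call = 0; call < 3; call++) nextTuple(managers, emitted);
        System.out.println(emitted); // -> [a1, a2, b1]: partition 0 drained before moving on
    }
}
```

Each nextTuple call emits exactly one message, and the index only advances once the current partition's buffer is exhausted, which is why partition 0's two messages come out before partition 1's.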
Note that the spout does not emit all of a partition's messages and then commit the offset to ZooKeeper in one go; instead it emits one message at a time and, after each emit, checks whether the commit interval (the timed commit interval set at startup) has elapsed. I think the reason for this design is to keep failure handling under control.
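The interval-based commit check at the end of nextTuple can be sketched in isolation. The names and the 2000ms interval below are hypothetical; this is just the "commit only when the interval has elapsed" pattern, not the actual spout code.

```java
public class TimedCommitSketch {
    // Hypothetical stand-in for _spoutConfig.stateUpdateIntervalMs.
    static final long STATE_UPDATE_INTERVAL_MS = 2000L;
    static long lastUpdateMs = 0L;
    static int commits = 0;

    // Called after every emit, like the check at the end of nextTuple.
    static void maybeCommit(long nowMs) {
        if ((nowMs - lastUpdateMs) > STATE_UPDATE_INTERVAL_MS) {
            lastUpdateMs = nowMs;
            commits++; // the real spout writes each PartitionManager's offset to ZooKeeper here
        }
    }

    public static void main(String[] args) {
        // simulate nextTuple being invoked every 500ms for 5 seconds
        for (long t = 500; t <= 5000; t += 500) {
            maybeCommit(t);
        }
        System.out.println(commits); // commits fire at t=2500 and t=5000 -> 2
    }
}
```

Decoupling commits from individual emits this way bounds how much work is redone after a failure (at most one interval's worth of messages) without paying a ZooKeeper write per message.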
The ack, fail, and commit operations in KafkaSpout are all delegated to the PartitionManager. See the code:
```java
@Override
public void ack(Object msgId) {
    KafkaMessageId id = (KafkaMessageId) msgId;
    PartitionManager m = _coordinator.getManager(id.partition);
    if (m != null) {
        m.ack(id.offset);
    }
}

@Override
public void fail(Object msgId) {
    KafkaMessageId id = (KafkaMessageId) msgId;
    PartitionManager m = _coordinator.getManager(id.partition);
    if (m != null) {
        m.fail(id.offset);
    }
}

@Override
public void deactivate() {
    commit();
}

@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(_spoutConfig.scheme.getOutputFields());
}

private void commit() {
    _lastUpdateMs = System.currentTimeMillis();
    for (PartitionManager manager : _coordinator.getMyManagedPartitions()) {
        manager.commit();
    }
}
```
So PartitionManager is the core of KafkaSpout. It's very late now, past 3 a.m., so the analysis of PartitionManager will follow in a later post. Good night.
(V) Storm-Kafka Source Code: KafkaSpout