(v) Storm-Kafka Source Code: KafkaSpout


This post walks through the KafkaSpout source code.

KafkaSpout starts by doing some initialization in its open method:

// ...
_state = new ZkState(stateConf);
_connections = new DynamicPartitionConnections(_spoutConfig, KafkaUtils.makeBrokerReader(conf, _spoutConfig));

// using TransactionalState like this is a hack
int totalTasks = context.getComponentTasks(context.getThisComponentId()).size();
if (_spoutConfig.hosts instanceof StaticHosts) {
    _coordinator = new StaticCoordinator(_connections, conf, _spoutConfig, _state, context.getThisTaskIndex(), totalTasks, _uuid);
} else {
    _coordinator = new ZkCoordinator(_connections, conf, _spoutConfig, _state, context.getThisTaskIndex(), totalTasks, _uuid);
}
// ...

Some code before and after is omitted, and the metric-related parts are skipped for now. The main work here is initializing the ZooKeeper connection (ZkState) and the mapping between Kafka partitions and brokers (DynamicPartitionConnections). The DynamicPartitionConnections constructor takes an IBrokerReader; since we are using ZkHosts, a look at the KafkaUtils code shows that a ZkBrokerReader is used.
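The choice happens in KafkaUtils.makeBrokerReader. The sketch below is paraphrased from memory of the storm-kafka source rather than quoted verbatim, so treat the exact signatures as approximate:

// KafkaUtils.makeBrokerReader (paraphrased sketch, not verbatim source):
// StaticHosts gets a StaticBrokerReader, anything else (i.e. ZkHosts) gets a ZkBrokerReader
public static IBrokerReader makeBrokerReader(Map stormConf, KafkaConfig conf) {
    if (conf.hosts instanceof StaticHosts) {
        return new StaticBrokerReader(((StaticHosts) conf.hosts).getPartitionInformation());
    } else {
        return new ZkBrokerReader(stormConf, conf.topic, (ZkHosts) conf.hosts);
    }
}

Since our hosts are ZkHosts, it is the ZkBrokerReader constructor that runs: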

public ZkBrokerReader(Map conf, String topic, ZkHosts hosts) {
    try {
        reader = new DynamicBrokersReader(conf, hosts.brokerZkStr, hosts.brokerZkPath, topic);
        cachedBrokers = reader.getBrokerInfo();
        lastRefreshTimeMs = System.currentTimeMillis();
        refreshMillis = hosts.refreshFreqSecs * 1000L;
    } catch (java.net.SocketTimeoutException e) {
        LOG.warn("Failed to update brokers", e);
    }
}

The refreshMillis field set in this constructor controls how often the partition information cached from ZooKeeper is refreshed:

// ZkBrokerReader
@Override
public GlobalPartitionInformation getCurrentBrokers() {
    long currTime = System.currentTimeMillis();
    // refresh the cache once more than refreshMillis has passed since the last update
    if (currTime > lastRefreshTimeMs + refreshMillis) {
        try {
            LOG.info("brokers need refreshing because " + refreshMillis + "ms have expired");
            cachedBrokers = reader.getBrokerInfo();
            lastRefreshTimeMs = currTime;
        } catch (java.net.SocketTimeoutException e) {
            LOG.warn("Failed to update brokers", e);
        }
    }
    return cachedBrokers;
}

This calls into DynamicBrokersReader:

/**
 * Get all partitions with their current leaders
 */
public GlobalPartitionInformation getBrokerInfo() throws SocketTimeoutException {
    GlobalPartitionInformation globalPartitionInformation = new GlobalPartitionInformation();
    try {
        int numPartitionsForTopic = getNumPartitions();
        String brokerInfoPath = brokerPath();
        for (int partition = 0; partition < numPartitionsForTopic; partition++) {
            int leader = getLeaderFor(partition);
            String path = brokerInfoPath + "/" + leader;
            try {
                byte[] brokerData = _curator.getData().forPath(path);
                Broker hp = getBrokerHost(brokerData);
                globalPartitionInformation.addPartition(partition, hp);
            } catch (org.apache.zookeeper.KeeperException.NoNodeException e) {
                LOG.error("Node {} does not exist", path);
            }
        }
    } catch (SocketTimeoutException e) {
        throw e;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
    LOG.info("Read partition info from zookeeper: " + globalPartitionInformation);
    return globalPartitionInformation;
}

GlobalPartitionInformation is an iterable class that stores the mapping between partitions and brokers. The relationship between a Kafka consumer and its partitions is maintained in DynamicPartitionConnections, which records which partitions each consumer is responsible for reading. This ConnectionInfo is initialized and updated in storm.kafka.ZkCoordinator. One thing worth noting is that each ConnectionInfo holds a single SimpleConsumer:

// storm.kafka.DynamicPartitionConnections
static class ConnectionInfo {
    SimpleConsumer consumer;
    Set<Integer> partitions = new HashSet();

    public ConnectionInfo(SimpleConsumer consumer) {
        this.consumer = consumer;
    }
}
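For context, this is roughly how DynamicPartitionConnections fills in a ConnectionInfo when a partition is registered. This is a paraphrased sketch, not the verbatim source; consult the actual storm-kafka code for exact details:

// storm.kafka.DynamicPartitionConnections (paraphrased sketch): one SimpleConsumer
// per broker, shared by all the partitions whose leader is that broker
public SimpleConsumer register(Broker host, int partition) {
    if (!_connections.containsKey(host)) {
        _connections.put(host, new ConnectionInfo(
                new SimpleConsumer(host.host, host.port,
                        _config.socketTimeoutMs, _config.bufferSizeBytes, _config.clientId)));
    }
    ConnectionInfo info = _connections.get(host);
    info.partitions.add(partition);
    return info.consumer;
}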


Now look at the ZkCoordinator class, starting with its constructor:

// storm.kafka.ZkCoordinator
public ZkCoordinator(DynamicPartitionConnections connections, Map stormConf, SpoutConfig spoutConfig, ZkState state,
                     int taskIndex, int totalTasks, String topologyInstanceId, DynamicBrokersReader reader) {
    _spoutConfig = spoutConfig;
    _connections = connections;
    _taskIndex = taskIndex;
    _totalTasks = totalTasks;
    _topologyInstanceId = topologyInstanceId;
    _stormConf = stormConf;
    _state = state;
    ZkHosts brokerConf = (ZkHosts) spoutConfig.hosts;
    _refreshFreqMs = brokerConf.refreshFreqSecs * 1000;
    _reader = reader;
}
_refreshFreqMs controls how often the partition assignment in ZooKeeper is synced to the local task. KafkaSpout's nextTuple method calls ZkCoordinator's getMyManagedPartitions on every invocation, and that method refreshes the partition information periodically according to _refreshFreqMs.

// storm.kafka.ZkCoordinator
@Override
public List<PartitionManager> getMyManagedPartitions() {
    if (_lastRefreshTime == null || (System.currentTimeMillis() - _lastRefreshTime) > _refreshFreqMs) {
        refresh();
        _lastRefreshTime = System.currentTimeMillis();
    }
    return _cachedList;
}

@Override
public void refresh() {
    try {
        LOG.info(taskId(_taskIndex, _totalTasks) + "Refreshing partition manager connections");
        GlobalPartitionInformation brokerInfo = _reader.getBrokerInfo();
        List<Partition> mine = KafkaUtils.calculatePartitionsForTask(brokerInfo, _totalTasks, _taskIndex);

        Set<Partition> curr = _managers.keySet();
        Set<Partition> newPartitions = new HashSet<Partition>(mine);
        newPartitions.removeAll(curr);

        Set<Partition> deletedPartitions = new HashSet<Partition>(curr);
        deletedPartitions.removeAll(mine);

        LOG.info(taskId(_taskIndex, _totalTasks) + "Deleted partition managers: " + deletedPartitions.toString());

        for (Partition id : deletedPartitions) {
            PartitionManager man = _managers.remove(id);
            man.close();
        }
        LOG.info(taskId(_taskIndex, _totalTasks) + "New partition managers: " + newPartitions.toString());

        for (Partition id : newPartitions) {
            PartitionManager man = new PartitionManager(_connections, _topologyInstanceId, _state, _stormConf, _spoutConfig, id);
            _managers.put(id, man);
        }
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
    _cachedList = new ArrayList<PartitionManager>(_managers.values());
    LOG.info(taskId(_taskIndex, _totalTasks) + "Finished refreshing");
}

The algorithm that assigns partitions to each consumer task is KafkaUtils.calculatePartitionsForTask(brokerInfo, _totalTasks, _taskIndex).

Its main job is to take the number of parallel tasks, compare it against the current partition count, and work out which partitions this consumer task is responsible for reading; for the full details, refer to the Kafka documentation. The basic idea is sketched below.
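This is a minimal, self-contained sketch of the round-robin idea (not the verbatim storm-kafka source, which works on Partition objects rather than plain ints): task i takes partitions i, i + totalTasks, i + 2 * totalTasks, and so on.

import java.util.ArrayList;
import java.util.List;

public class PartitionAssignmentSketch {
    // Round-robin assignment: each task steps through the partition list
    // starting at its own index, advancing by the total task count.
    public static List<Integer> partitionsForTask(int numPartitions, int totalTasks, int taskIndex) {
        List<Integer> mine = new ArrayList<Integer>();
        for (int p = taskIndex; p < numPartitions; p += totalTasks) {
            mine.add(p);
        }
        return mine;
    }

    public static void main(String[] args) {
        // 8 partitions across 3 spout tasks:
        // task 0 -> [0, 3, 6], task 1 -> [1, 4, 7], task 2 -> [2, 5]
        for (int task = 0; task < 3; task++) {
            System.out.println("task " + task + " -> " + partitionsForTask(8, 3, task));
        }
    }
}

Note that if there are more tasks than partitions, the surplus tasks simply end up with an empty list and stay idle.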

That covers KafkaSpout's initialization. Next comes fetching and emitting data; see the nextTuple method:

// storm.kafka.KafkaSpout
@Override
public void nextTuple() {
    List<PartitionManager> managers = _coordinator.getMyManagedPartitions();
    for (int i = 0; i < managers.size(); i++) {
        try {
            // in case the number of managers decreased
            _currPartitionIndex = _currPartitionIndex % managers.size();
            EmitState state = managers.get(_currPartitionIndex).next(_collector);
            if (state != EmitState.EMITTED_MORE_LEFT) {
                _currPartitionIndex = (_currPartitionIndex + 1) % managers.size();
            }
            if (state != EmitState.NO_EMITTED) {
                break;
            }
        } catch (FailedFetchException e) {
            LOG.warn("Fetch failed", e);
            _coordinator.refresh();
        }
    }

    long now = System.currentTimeMillis();
    if ((now - _lastUpdateMs) > _spoutConfig.stateUpdateIntervalMs) {
        commit();
    }
}
As the code above shows, the real work is delegated to PartitionManager: messages are read and emitted there, and the main logic lives in PartitionManager's next method:

// returns false if it's reached the end of current batch
public EmitState next(SpoutOutputCollector collector) {
    if (_waitingToEmit.isEmpty()) {
        fill();
    }
    while (true) {
        MessageAndRealOffset toEmit = _waitingToEmit.pollFirst();
        if (toEmit == null) {
            return EmitState.NO_EMITTED;
        }
        Iterable<List<Object>> tups = KafkaUtils.generateTuples(_spoutConfig, toEmit.msg);
        if (tups != null) {
            for (List<Object> tup : tups) {
                collector.emit(tup, new KafkaMessageId(_partition, toEmit.offset));
            }
            break;
        } else {
            ack(toEmit.offset);
        }
    }
    if (!_waitingToEmit.isEmpty()) {
        return EmitState.EMITTED_MORE_LEFT;
    } else {
        return EmitState.EMITTED_END;
    }
}
If the _waitingToEmit list is empty, next() calls fill() to fetch messages; it then emits one message, breaks out of the loop, and returns EMITTED_MORE_LEFT to KafkaSpout's nextTuple. nextTuple keeps calling the same partition's manager while that partition's buffer still has messages to emit; once the buffer is drained, it moves on to read and emit the next partition's data.
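For reference, the three states next() can return are all visible in the code above; the comments below are my reading of how nextTuple reacts to each one:

// EmitState as used by PartitionManager.next() and KafkaSpout.nextTuple()
static enum EmitState {
    EMITTED_MORE_LEFT,  // emitted a tuple; this partition's buffer still has messages, stay on it
    EMITTED_END,        // emitted a tuple; the buffer is now drained, advance to the next partition
    NO_EMITTED          // nothing to emit for this partition, advance and try the next one
}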

Note that KafkaSpout does not wait until all of a partition's messages have been emitted before committing the offset to ZooKeeper; instead, after each emit it checks whether the commit interval (the periodic commit interval configured at startup) has elapsed. I think the reason for this is to bound how much has to be replayed on failure.
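As a usage note, that commit interval is the stateUpdateIntervalMs field on SpoutConfig (compared against _lastUpdateMs in nextTuple above). A minimal setup sketch follows; the ZooKeeper address, topic, zkRoot, and consumer id are placeholders, not values from the source:

// Minimal KafkaSpout setup sketch; "zk1:2181", "my-topic", "/kafka-spout",
// and "my-consumer-id" are placeholder values
BrokerHosts hosts = new ZkHosts("zk1:2181");
SpoutConfig spoutConfig = new SpoutConfig(hosts, "my-topic", "/kafka-spout", "my-consumer-id");
spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
// how often offsets are committed to ZooKeeper (the interval checked in nextTuple)
spoutConfig.stateUpdateIntervalMs = 2000;

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 4);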


KafkaSpout delegates its ack, fail, and commit operations to PartitionManager as well; see the code:

@Override
public void ack(Object msgId) {
    KafkaMessageId id = (KafkaMessageId) msgId;
    PartitionManager m = _coordinator.getManager(id.partition);
    if (m != null) {
        m.ack(id.offset);
    }
}

@Override
public void fail(Object msgId) {
    KafkaMessageId id = (KafkaMessageId) msgId;
    PartitionManager m = _coordinator.getManager(id.partition);
    if (m != null) {
        m.fail(id.offset);
    }
}

@Override
public void deactivate() {
    commit();
}

@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(_spoutConfig.scheme.getOutputFields());
}

private void commit() {
    _lastUpdateMs = System.currentTimeMillis();
    for (PartitionManager manager : _coordinator.getMyManagedPartitions()) {
        manager.commit();
    }
}
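The KafkaMessageId used as the Storm message id here is just a (partition, offset) pair. The sketch below is inferred from its usage in next(), ack(), and fail() above; the exact class shape may differ by version:

// (partition, offset) pair used as the Storm message id, consistent with the
// id.partition / id.offset accesses in ack() and fail() above
static class KafkaMessageId {
    public Partition partition;
    public long offset;

    public KafkaMessageId(Partition partition, long offset) {
        this.partition = partition;
        this.offset = offset;
    }
}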

So PartitionManager is the core of KafkaSpout. It is very late now, past 3 a.m., so the PartitionManager analysis will come in a follow-up post. Good night.


