Let's dig into the KafkaSpout source code.
KafkaSpout starts by doing some initialization in its open method:
```java
// ...
_state = new ZkState(stateConf);
_connections = new DynamicPartitionConnections(_spoutConfig,
        KafkaUtils.makeBrokerReader(conf, _spoutConfig));

// using TransactionalState like this is a hack
int totalTasks = context.getComponentTasks(context.getThisComponentId()).size();
if (_spoutConfig.hosts instanceof StaticHosts) {
    _coordinator = new StaticCoordinator(_connections, conf, _spoutConfig, _state,
            context.getThisTaskIndex(), totalTasks, _uuid);
} else {
    _coordinator = new ZkCoordinator(_connections, conf, _spoutConfig, _state,
            context.getThisTaskIndex(), totalTasks, _uuid);
}
// ...
```
Some code before and after is omitted; the metric-related parts are not covered in this series. The main work here is initializing the ZooKeeper connection (ZkState) and the mapping between Kafka partitions and brokers (DynamicPartitionConnections). The DynamicPartitionConnections constructor takes a brokerReader; since we are using ZkHosts, a look at KafkaUtils shows that a ZkBrokerReader is used. Here is its code:
```java
public ZkBrokerReader(Map conf, String topic, ZkHosts hosts) {
    try {
        reader = new DynamicBrokersReader(conf, hosts.brokerZkStr, hosts.brokerZkPath, topic);
        cachedBrokers = reader.getBrokerInfo();
        lastRefreshTimeMs = System.currentTimeMillis();
        refreshMillis = hosts.refreshFreqSecs * 1000L;
    } catch (java.net.SocketTimeoutException e) {
        LOG.warn("Failed to update brokers", e);
    }
}
```
The refreshMillis field controls how often the partition information cached from ZooKeeper is refreshed:
```java
// ZkBrokerReader
@Override
public GlobalPartitionInformation getCurrentBrokers() {
    long currTime = System.currentTimeMillis();
    if (currTime > lastRefreshTimeMs + refreshMillis) {
        // refresh when more than refreshMillis have passed since the last update
        try {
            LOG.info("brokers need refreshing because " + refreshMillis + "ms have expired");
            cachedBrokers = reader.getBrokerInfo();
            lastRefreshTimeMs = currTime;
        } catch (java.net.SocketTimeoutException e) {
            LOG.warn("Failed to update brokers", e);
        }
    }
    return cachedBrokers;
}
```

Here is the DynamicBrokersReader code it calls:

```java
/**
 * Get all partitions with their current leaders
 */
public GlobalPartitionInformation getBrokerInfo() throws SocketTimeoutException {
    GlobalPartitionInformation globalPartitionInformation = new GlobalPartitionInformation();
    try {
        int numPartitionsForTopic = getNumPartitions();
        String brokerInfoPath = brokerPath();
        for (int partition = 0; partition < numPartitionsForTopic; partition++) {
            int leader = getLeaderFor(partition);
            String path = brokerInfoPath + "/" + leader;
            try {
                byte[] brokerData = _curator.getData().forPath(path);
                Broker hp = getBrokerHost(brokerData);
                globalPartitionInformation.addPartition(partition, hp);
            } catch (org.apache.zookeeper.KeeperException.NoNodeException e) {
                LOG.error("Node {} does not exist", path);
            }
        }
    } catch (SocketTimeoutException e) {
        throw e;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
    LOG.info("Read partition info from zookeeper: " + globalPartitionInformation);
    return globalPartitionInformation;
}
```
GlobalPartitionInformation is an iterable class that stores the mapping between partitions and brokers. The relationship between Kafka consumers and partitions is maintained in DynamicPartitionConnections, which records which partitions each consumer reads. This ConnectionInfo is initialized and updated in storm.kafka.ZkCoordinator. One thing worth noting is that each connection holds a SimpleConsumer:
```java
// storm.kafka.DynamicPartitionConnections
static class ConnectionInfo {
    SimpleConsumer consumer;
    Set<Integer> partitions = new HashSet();

    public ConnectionInfo(SimpleConsumer consumer) {
        this.consumer = consumer;
    }
}
```
Now look at the ZkCoordinator class, starting with its constructor:
```java
// storm.kafka.ZkCoordinator
public ZkCoordinator(DynamicPartitionConnections connections, Map stormConf,
                     SpoutConfig spoutConfig, ZkState state, int taskIndex, int totalTasks,
                     String topologyInstanceId, DynamicBrokersReader reader) {
    _spoutConfig = spoutConfig;
    _connections = connections;
    _taskIndex = taskIndex;
    _totalTasks = totalTasks;
    _topologyInstanceId = topologyInstanceId;
    _stormConf = stormConf;
    _state = state;
    ZkHosts brokerConf = (ZkHosts) spoutConfig.hosts;
    _refreshFreqMs = brokerConf.refreshFreqSecs * 1000;
    _reader = reader;
}
```
_refreshFreqMs controls how often the local view of the ZooKeeper partition information is refreshed. KafkaSpout's nextTuple method calls ZkCoordinator's getMyManagedPartitions on every invocation, and that method refreshes the partition information periodically according to _refreshFreqMs:
```java
// storm.kafka.ZkCoordinator
@Override
public List<PartitionManager> getMyManagedPartitions() {
    if (_lastRefreshTime == null || (System.currentTimeMillis() - _lastRefreshTime) > _refreshFreqMs) {
        refresh();
        _lastRefreshTime = System.currentTimeMillis();
    }
    return _cachedList;
}

@Override
public void refresh() {
    try {
        LOG.info(taskId(_taskIndex, _totalTasks) + "Refreshing partition manager connections");
        GlobalPartitionInformation brokerInfo = _reader.getBrokerInfo();
        List<Partition> mine = KafkaUtils.calculatePartitionsForTask(brokerInfo, _totalTasks, _taskIndex);

        Set<Partition> curr = _managers.keySet();
        Set<Partition> newPartitions = new HashSet<Partition>(mine);
        newPartitions.removeAll(curr);

        Set<Partition> deletedPartitions = new HashSet<Partition>(curr);
        deletedPartitions.removeAll(mine);

        LOG.info(taskId(_taskIndex, _totalTasks) + "Deleted partition managers: " + deletedPartitions.toString());

        for (Partition id : deletedPartitions) {
            PartitionManager man = _managers.remove(id);
            man.close();
        }
        LOG.info(taskId(_taskIndex, _totalTasks) + "New partition managers: " + newPartitions.toString());
        for (Partition id : newPartitions) {
            PartitionManager man = new PartitionManager(_connections, _topologyInstanceId,
                    _state, _stormConf, _spoutConfig, id);
            _managers.put(id, man);
        }
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
    _cachedList = new ArrayList<PartitionManager>(_managers.values());
    LOG.info(taskId(_taskIndex, _totalTasks) + "Finished refreshing");
}
```
The algorithm that assigns partitions to each consumer task is KafkaUtils.calculatePartitionsForTask(brokerInfo, _totalTasks, _taskIndex). Its main job is to take the number of parallel tasks, compare it with the current partitions, and work out which partitions this consumer task is responsible for reading; for the details of the assignment algorithm, refer to the Kafka documentation.
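To make the idea concrete, here is a minimal sketch of a round-robin assignment in the spirit of calculatePartitionsForTask: each task takes every totalTasks-th partition starting at its own index. The class and method names are hypothetical, and this is a simplified illustration, not the actual storm-kafka implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionAssignmentSketch {
    // Hypothetical helper: task `taskIndex` (out of `totalTasks`) claims
    // partitions taskIndex, taskIndex + totalTasks, taskIndex + 2*totalTasks, ...
    static List<Integer> partitionsForTask(int numPartitions, int totalTasks, int taskIndex) {
        List<Integer> mine = new ArrayList<Integer>();
        for (int p = taskIndex; p < numPartitions; p += totalTasks) {
            mine.add(p);
        }
        return mine;
    }

    public static void main(String[] args) {
        // 5 partitions shared by 2 spout tasks
        System.out.println(partitionsForTask(5, 2, 0)); // task 0 -> [0, 2, 4]
        System.out.println(partitionsForTask(5, 2, 1)); // task 1 -> [1, 3]
    }
}
```

This kind of deterministic assignment means every task can compute its own partition set independently, without coordinating with the other tasks.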
That covers the initialization done by KafkaSpout. Now let's move on to how data is fetched and emitted, starting with the nextTuple method:
```java
// storm.kafka.KafkaSpout
@Override
public void nextTuple() {
    List<PartitionManager> managers = _coordinator.getMyManagedPartitions();
    for (int i = 0; i < managers.size(); i++) {
        try {
            // in case the number of managers decreased
            _currPartitionIndex = _currPartitionIndex % managers.size();
            EmitState state = managers.get(_currPartitionIndex).next(_collector);
            if (state != EmitState.EMITTED_MORE_LEFT) {
                _currPartitionIndex = (_currPartitionIndex + 1) % managers.size();
            }
            if (state != EmitState.NO_EMITTED) {
                break;
            }
        } catch (FailedFetchException e) {
            LOG.warn("Fetch failed", e);
            _coordinator.refresh();
        }
    }

    long now = System.currentTimeMillis();
    if ((now - _lastUpdateMs) > _spoutConfig.stateUpdateIntervalMs) {
        commit();
    }
}
```
As the code above shows, all the real work happens in PartitionManager: the messages are read and emitted there, and the main logic lives in PartitionManager's next method:
```java
// returns false if it's reached the end of the current batch
public EmitState next(SpoutOutputCollector collector) {
    if (_waitingToEmit.isEmpty()) {
        fill();
    }
    while (true) {
        MessageAndRealOffset toEmit = _waitingToEmit.pollFirst();
        if (toEmit == null) {
            return EmitState.NO_EMITTED;
        }
        Iterable<List<Object>> tups = KafkaUtils.generateTuples(_spoutConfig, toEmit.msg);
        if (tups != null) {
            for (List<Object> tup : tups) {
                collector.emit(tup, new KafkaMessageId(_partition, toEmit.offset));
            }
            break;
        } else {
            ack(toEmit.offset);
        }
    }
    if (!_waitingToEmit.isEmpty()) {
        return EmitState.EMITTED_MORE_LEFT;
    } else {
        return EmitState.EMITTED_END;
    }
}
```
If the _waitingToEmit list is empty, fill() is called to read messages; then the messages are emitted one at a time. After each single emit the loop breaks and EMITTED_MORE_LEFT is returned to KafkaSpout's nextTuple, which then checks whether this partition's buffered messages have all been emitted; only once they have does it move on to read and emit the next partition's data.
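The interplay between nextTuple and next can be hard to follow from the code alone, so here is a small, self-contained simulation of it. FakePartition and its in-memory buffer are hypothetical stand-ins for PartitionManager and its fetched messages; only the EmitState handling mirrors the spout logic above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class EmitLoopSketch {
    enum EmitState { EMITTED_MORE_LEFT, EMITTED_END, NO_EMITTED }

    // Hypothetical stand-in for a PartitionManager holding buffered messages.
    static class FakePartition {
        final Deque<String> buffer;
        FakePartition(String... msgs) { buffer = new ArrayDeque<String>(Arrays.asList(msgs)); }

        // Emit one message per call, like PartitionManager.next.
        EmitState next(List<String> emitted) {
            String msg = buffer.pollFirst();
            if (msg == null) return EmitState.NO_EMITTED;
            emitted.add(msg);
            return buffer.isEmpty() ? EmitState.EMITTED_END : EmitState.EMITTED_MORE_LEFT;
        }
    }

    static int currIndex = 0;

    // One nextTuple-style pass: stay on the same partition while it still has
    // buffered messages, advance to the next partition once it is drained.
    static void nextTuple(List<FakePartition> managers, List<String> emitted) {
        for (int i = 0; i < managers.size(); i++) {
            currIndex = currIndex % managers.size();
            EmitState state = managers.get(currIndex).next(emitted);
            if (state != EmitState.EMITTED_MORE_LEFT) {
                currIndex = (currIndex + 1) % managers.size();
            }
            if (state != EmitState.NO_EMITTED) break;
        }
    }

    public static void main(String[] args) {
        List<FakePartition> managers = Arrays.asList(
                new FakePartition("a1", "a2"), new FakePartition("b1"));
        List<String> emitted = new ArrayList<String>();
        for (int call = 0; call < 3; call++) nextTuple(managers, emitted);
        System.out.println(emitted); // -> [a1, a2, b1]: partition 0 drained before moving on
    }
}
```

Each nextTuple call emits exactly one message, and the index only advances once the current partition's buffer is exhausted, which is why partition 0's two messages come out before partition 1's.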
Note that the spout does not emit all of a partition's messages and then commit the offset to ZooKeeper in one go; instead it emits one message at a time and, after each emit, checks whether the commit interval (the timed commit interval set at startup) has elapsed. I think the reason for this design is to keep failure handling under control.
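The interval-based commit check at the end of nextTuple can be sketched in isolation. The names and the 2000ms interval below are hypothetical; this is just the "commit only when the interval has elapsed" pattern, not the actual spout code.

```java
public class TimedCommitSketch {
    // Hypothetical stand-in for _spoutConfig.stateUpdateIntervalMs.
    static final long STATE_UPDATE_INTERVAL_MS = 2000L;
    static long lastUpdateMs = 0L;
    static int commits = 0;

    // Called after every emit, like the check at the end of nextTuple.
    static void maybeCommit(long nowMs) {
        if ((nowMs - lastUpdateMs) > STATE_UPDATE_INTERVAL_MS) {
            lastUpdateMs = nowMs;
            commits++; // the real spout writes each PartitionManager's offset to ZooKeeper here
        }
    }

    public static void main(String[] args) {
        // simulate nextTuple being invoked every 500ms for 5 seconds
        for (long t = 500; t <= 5000; t += 500) {
            maybeCommit(t);
        }
        System.out.println(commits); // commits fire at t=2500 and t=5000 -> 2
    }
}
```

Decoupling commits from individual emits this way bounds how much work is redone after a failure (at most one interval's worth of messages) without paying a ZooKeeper write per message.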
The ack, fail, and commit operations in KafkaSpout are all delegated to the PartitionManager. See the code:
```java
@Override
public void ack(Object msgId) {
    KafkaMessageId id = (KafkaMessageId) msgId;
    PartitionManager m = _coordinator.getManager(id.partition);
    if (m != null) {
        m.ack(id.offset);
    }
}

@Override
public void fail(Object msgId) {
    KafkaMessageId id = (KafkaMessageId) msgId;
    PartitionManager m = _coordinator.getManager(id.partition);
    if (m != null) {
        m.fail(id.offset);
    }
}

@Override
public void deactivate() {
    commit();
}

@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(_spoutConfig.scheme.getOutputFields());
}

private void commit() {
    _lastUpdateMs = System.currentTimeMillis();
    for (PartitionManager manager : _coordinator.getMyManagedPartitions()) {
        manager.commit();
    }
}
```
So PartitionManager is the core of KafkaSpout. It's very late now, past 3 a.m., so the analysis of PartitionManager will follow in a later post. Good night.
(V) Storm-Kafka Source Code: KafkaSpout