Storm Starter: RollingTopWords

Source: Internet
Author: User
Contents
  • Topology
  • RollingCountBolt
  • IntermediateRankingsBolt
  • TotalRankingsBolt

This topology computes the top N words. The same pattern applies to, for example, trending topics or trending images on Twitter.

Its implementations of sliding-window counting and top-N ranking are the interesting parts, so the code is analyzed in detail below.

Topology

This is a slightly more complex topology. The complexity shows mainly in its use of two different grouping methods, fieldsGrouping and globalGrouping.

    String spoutId = "wordGenerator";
    String counterId = "counter";
    String intermediateRankerId = "intermediateRanker";
    String totalRankerId = "finalRanker";
    builder.setSpout(spoutId, new TestWordSpout(), 5);
    builder.setBolt(counterId, new RollingCountBolt(9, 3), 4).fieldsGrouping(spoutId, new Fields("word"));
    builder.setBolt(intermediateRankerId, new IntermediateRankingsBolt(TOP_N), 4).fieldsGrouping(counterId, new Fields("obj"));
    builder.setBolt(totalRankerId, new TotalRankingsBolt(TOP_N)).globalGrouping(intermediateRankerId);

 

RollingCountBolt

RollingCountBolt comes first. It subscribes to the spout with fieldsGrouping on the word field, so the same word is always sent to the same task of the bolt. The field name is the one declared in declareOutputFields of the upstream component (here, the spout).
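To illustrate why grouping on a field keeps each word on one task, here is a minimal sketch that assumes a simplified hash-modulo scheme. The class FieldsGroupingSketch and the method taskFor are hypothetical names for this illustration and are not part of Storm; Storm's real grouping code may differ in detail.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not Storm's actual grouping code.
class FieldsGroupingSketch {

    // Pick a task index from the values of the grouping fields.
    // Math.floorMod keeps the result non-negative even for negative hash codes.
    static int taskFor(List<Object> groupingValues, int numTasks) {
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4; // the counter bolt in the topology runs with a parallelism of 4
        for (String word : Arrays.asList("nathan", "mike", "nathan", "golda", "mike")) {
            int task = taskFor(Arrays.<Object>asList(word), numTasks);
            System.out.println(word + " -> task " + task);
        }
    }
}
```

Because the task index depends only on the value of the grouping field, every occurrence of a word lands on the same task, which is what lets each task keep a complete count for its words.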

RollingCountBolt performs time-window-based counting, so it takes two parameters: the length of the sliding window in seconds and the emit frequency in seconds.

new RollingCountBolt(9, 3) means: every 3 seconds, emit the counts for the most recent 9-second sliding window.

1. Create the SlidingWindowCounter (SlidingWindowCounter and SlotBasedCounter are described below)

    counter = new SlidingWindowCounter<Object>(this.windowLengthInSeconds / this.windowUpdateFrequencyInSeconds);

How is the number of slots determined? For a 9-second window that emits every 3 seconds, 9 / 3 = 3 slots are required.

Between emits (that is, within each 3-second interval), countObjAndAck(tuple) is called for every incoming tuple and increments the count of that tuple's object in the current slot.

emitCurrentWindowCounts is triggered every 3 seconds. It slides the window (through getCountsThenAdvanceWindow) and emits the Map<obj, total count in the window> together with the actual window length.

Because the emit cannot be triggered at exactly 3-second intervals, there will be some drift, so the actual window length has to be reported along with the counts.
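One way to obtain the actual window length is to remember when the last few emits happened. The following is a minimal sketch under that assumption; WindowLengthTracker and its method are hypothetical names for this illustration and are not taken from storm-starter. Timestamps are passed in explicitly so the behavior is easy to follow.

```java
import java.util.Arrays;

// Hypothetical helper; names are illustrative, not from storm-starter.
class WindowLengthTracker {
    private final long[] emitTimesMillis; // circular buffer, one entry per slot
    private int next = 0;

    WindowLengthTracker(int numSlots, long startMillis) {
        emitTimesMillis = new long[numSlots];
        Arrays.fill(emitTimesMillis, startMillis);
    }

    // Call on every emit. Returns the whole seconds actually covered by the
    // window being emitted, then records this emit time.
    int markEmitAndGetWindowSeconds(long nowMillis) {
        long oldest = emitTimesMillis[next]; // the entry about to be overwritten is the oldest
        emitTimesMillis[next] = nowMillis;
        next = (next + 1) % emitTimesMillis.length;
        return (int) ((nowMillis - oldest) / 1000);
    }

    public static void main(String[] args) {
        WindowLengthTracker tracker = new WindowLengthTracker(3, 0L);
        long[] emitTimes = {3100L, 6050L, 9200L, 12400L, 16100L}; // slightly irregular ticks
        for (long t : emitTimes) {
            System.out.println("emit at " + t + " ms covers "
                + tracker.markEmitAndGetWindowSeconds(t) + " s");
        }
    }
}
```

With 3 slots, the first two emits cover less than the nominal 9 seconds because the window is still filling up, and a late tick makes a later window longer than 9 seconds. Downstream consumers can use the reported length to normalize the counts.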

 

2. TupleHelpers.isTickTuple(tuple) and tick tuples

What has not been covered so far is how the emit is triggered. This is worth noting because it relies on Storm's tick tuple feature.

This feature is very useful, for example for batched database writes or time-window statistics.

The "__system" component periodically sends a tuple on the "__tick" stream to the task.

The sending frequency is set by topology.tick.tuple.freq.secs (Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS), which can be configured in defaults.yaml.

It can also be configured in code through getComponentConfiguration():

    public Map<String, Object> getComponentConfiguration() {
        Map<String, Object> conf = new HashMap<String, Object>();
        // Ask Storm to send this component a tick tuple every emitFrequencyInSeconds seconds.
        conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, emitFrequencyInSeconds);
        return conf;
    }

Once this is configured, Storm periodically sends a tick tuple to the task.

isTickTuple determines whether a given tuple is a tick tuple.

    public static boolean isTickTuple(Tuple tuple) {
        return tuple.getSourceComponent().equals(Constants.SYSTEM_COMPONENT_ID)    // SYSTEM_COMPONENT_ID == "__system"
            && tuple.getSourceStreamId().equals(Constants.SYSTEM_TICK_STREAM_ID);  // SYSTEM_TICK_STREAM_ID == "__tick"
    }
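The tick pattern can be sketched without Storm on the classpath. The example below is a minimal illustration of the batched-write use case, assuming plain strings as stand-ins for a tuple's source component, stream ID, and value. TickBatchingSketch and its methods are hypothetical; a real bolt would receive Storm Tuple objects in execute.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types; a real bolt would work with Storm's Tuple.
class TickBatchingSketch {
    static final String SYSTEM_COMPONENT_ID = "__system";
    static final String SYSTEM_TICK_STREAM_ID = "__tick";

    private final List<String> buffer = new ArrayList<String>();
    private final List<List<String>> flushed = new ArrayList<List<String>>();

    static boolean isTick(String sourceComponent, String streamId) {
        return SYSTEM_COMPONENT_ID.equals(sourceComponent)
            && SYSTEM_TICK_STREAM_ID.equals(streamId);
    }

    // Same shape as a bolt's execute(): ticks flush, everything else is buffered.
    void execute(String sourceComponent, String streamId, String value) {
        if (isTick(sourceComponent, streamId)) {
            if (!buffer.isEmpty()) {
                flushed.add(new ArrayList<String>(buffer)); // e.g. one batched database write
                buffer.clear();
            }
        } else {
            buffer.add(value);
        }
    }

    List<List<String>> flushedBatches() {
        return flushed;
    }

    public static void main(String[] args) {
        TickBatchingSketch bolt = new TickBatchingSketch();
        bolt.execute("wordGenerator", "default", "a");
        bolt.execute("wordGenerator", "default", "b");
        bolt.execute("__system", "__tick", null); // flushes [a, b]
        bolt.execute("wordGenerator", "default", "c");
        bolt.execute("__system", "__tick", null); // flushes [c]
        System.out.println(bolt.flushedBatches());
    }
}
```

RollingCountBolt follows the same shape: a tick tuple triggers the emit of the current window counts, and every other tuple is counted.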

Finally, the output of this bolt is collector.emit(new Values(obj, count, actualWindowLengthInSeconds));

That is: the object, its count (the sum over the window), and the actual window length in seconds.

 

SlotBasedCounter

A slot-based counter. It is a generic class, so the type T of the objects being counted can be specified.

The class is quite simple. It maps each counted object to a group of slots (implemented as a long array) and supports increment and reset operations on any slot.

The key structure is Map<T, long[]> objToCounts. Each object maps to a long array of length numSlots, so each object has numSlots independent counts.

incrementCount: increments one slot of an object, creating the counts array if the object is seen for the first time.

getCount, getCounts: return the value of one slot of an object, or the sum of all slots for every object.

wipeSlot, resetSlotCountToZero: reset a given slot to 0 for all objects, or reset a given slot to 0 for one object.

wipeZeros: removes all objects whose total count is 0, to free memory.

    import java.io.Serializable;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    public final class SlotBasedCounter<T> implements Serializable {

        private static final long serialVersionUID = 4858185737378394432L;

        private final Map<T, long[]> objToCounts = new HashMap<T, long[]>();
        private final int numSlots;

        public SlotBasedCounter(int numSlots) {
            if (numSlots <= 0) {
                throw new IllegalArgumentException("Number of slots must be greater than zero (you requested " + numSlots
                    + ")");
            }
            this.numSlots = numSlots;
        }

        public void incrementCount(T obj, int slot) {
            long[] counts = objToCounts.get(obj);
            if (counts == null) {
                counts = new long[this.numSlots];
                objToCounts.put(obj, counts);
            }
            counts[slot]++;
        }

        public long getCount(T obj, int slot) {
            long[] counts = objToCounts.get(obj);
            if (counts == null) {
                return 0;
            }
            else {
                return counts[slot];
            }
        }

        public Map<T, Long> getCounts() {
            Map<T, Long> result = new HashMap<T, Long>();
            for (T obj : objToCounts.keySet()) {
                result.put(obj, computeTotalCount(obj));
            }
            return result;
        }

        private long computeTotalCount(T obj) {
            long[] curr = objToCounts.get(obj);
            long total = 0;
            for (long l : curr) {
                total += l;
            }
            return total;
        }

        /**
         * Reset the slot count of any tracked objects to zero for the given slot.
         *
         * @param slot
         */
        public void wipeSlot(int slot) {
            for (T obj : objToCounts.keySet()) {
                resetSlotCountToZero(obj, slot);
            }
        }

        private void resetSlotCountToZero(T obj, int slot) {
            long[] counts = objToCounts.get(obj);
            counts[slot] = 0;
        }

        private boolean shouldBeRemovedFromCounter(T obj) {
            return computeTotalCount(obj) == 0;
        }

        /**
         * Remove any object from the counter whose total count is zero (to free up memory).
         */
        public void wipeZeros() {
            Set<T> objToBeRemoved = new HashSet<T>();
            for (T obj : objToCounts.keySet()) {
                if (shouldBeRemovedFromCounter(obj)) {
                    objToBeRemoved.add(obj);
                }
            }
            for (T obj : objToBeRemoved) {
                objToCounts.remove(obj);
            }
        }
    }
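To make the per-slot bookkeeping concrete, here is a compact, self-contained sketch of the same idea. MiniSlotCounter is a hypothetical, simplified re-creation for illustration (String keys only, fewer methods) and is not the storm-starter class itself.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical, simplified re-creation of the slot-based counting idea.
class MiniSlotCounter {
    private final Map<String, long[]> objToCounts = new HashMap<String, long[]>();
    private final int numSlots;

    MiniSlotCounter(int numSlots) {
        this.numSlots = numSlots;
    }

    void increment(String obj, int slot) {
        long[] counts = objToCounts.get(obj);
        if (counts == null) {
            counts = new long[numSlots]; // first sighting: allocate one counter per slot
            objToCounts.put(obj, counts);
        }
        counts[slot]++;
    }

    long total(String obj) {
        long[] counts = objToCounts.get(obj);
        return counts == null ? 0 : sum(counts);
    }

    // Reset the given slot to zero for every tracked object.
    void wipeSlot(int slot) {
        for (long[] counts : objToCounts.values()) {
            counts[slot] = 0;
        }
    }

    // Drop every object whose slots are all zero.
    void wipeZeros() {
        Iterator<long[]> it = objToCounts.values().iterator();
        while (it.hasNext()) {
            if (sum(it.next()) == 0) {
                it.remove();
            }
        }
    }

    boolean isTracked(String obj) {
        return objToCounts.containsKey(obj);
    }

    private static long sum(long[] counts) {
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        MiniSlotCounter counter = new MiniSlotCounter(3);
        counter.increment("storm", 0);
        counter.increment("storm", 0);
        counter.increment("storm", 1);
        counter.increment("kafka", 0);
        System.out.println("storm total: " + counter.total("storm"));
        counter.wipeSlot(0);   // the oldest slot expires
        counter.wipeZeros();   // "kafka" had counts only in slot 0, so it is dropped
        System.out.println("storm total: " + counter.total("storm"));
        System.out.println("kafka tracked: " + counter.isTracked("kafka"));
    }
}
```

Wiping a slot is what expires old data: an object that was only seen in the wiped slot ends up with a total of zero and can then be removed by wipeZeros.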

SlidingWindowCounter

SlidingWindowCounter is a thin wrapper around SlotBasedCounter that adds the sliding-window concept through headSlot and tailSlot.

incrementCount: only the headSlot is incremented; the other slots hold the historical data of the window.

The core operation is getCountsThenAdvanceWindow.

1. Retrieve Map<T, Long> counts, which maps each object to the sum of its counts across all slots in the window.

2. Call wipeZeros to remove objects that are no longer used and free memory.

3. The most important step: clear the tailSlot and call advanceHead to slide the window.

advanceHead shows how a circular sliding window is implemented on top of an array.

    import java.io.Serializable;
    import java.util.Map;

    public final class SlidingWindowCounter<T> implements Serializable {

        private static final long serialVersionUID = -2645063988768785810L;

        private SlotBasedCounter<T> objCounter;
        private int headSlot;
        private int tailSlot;
        private int windowLengthInSlots;

        public SlidingWindowCounter(int windowLengthInSlots) {
            if (windowLengthInSlots < 2) {
                throw new IllegalArgumentException("Window length in slots must be at least two (you requested "
                    + windowLengthInSlots + ")");
            }
            this.windowLengthInSlots = windowLengthInSlots;
            this.objCounter = new SlotBasedCounter<T>(this.windowLengthInSlots);

            this.headSlot = 0;
            this.tailSlot = slotAfter(headSlot);
        }

        public void incrementCount(T obj) {
            objCounter.incrementCount(obj, headSlot);
        }

        /**
         * Return the current (total) counts of all tracked objects, then advance the window.
         *
         * Whenever this method is called, we consider the counts of the current sliding window to be available to and
         * successfully processed "upstream" (i.e. by the caller). Knowing this we will start counting any subsequent
         * objects within the next "chunk" of the sliding window.
         *
         * @return
         */
        public Map<T, Long> getCountsThenAdvanceWindow() {
            Map<T, Long> counts = objCounter.getCounts();
            objCounter.wipeZeros();
            objCounter.wipeSlot(tailSlot);
            advanceHead();
            return counts;
        }

        private void advanceHead() {
            headSlot = tailSlot;
            tailSlot = slotAfter(tailSlot);
        }

        // Wrap around to slot 0 after the last slot, which makes the array circular.
        private int slotAfter(int slot) {
            return (slot + 1) % windowLengthInSlots;
        }
    }
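The circular movement of headSlot and tailSlot is easier to see in a trace. The sketch below tracks a single object with a plain long array; SlidingWindowTrace is a hypothetical, stripped-down illustration of the same head/tail logic, not the storm-starter class.

```java
// Hypothetical single-object version of the head/tail sliding window logic.
class SlidingWindowTrace {
    private final long[] slots;
    private int headSlot = 0;
    private int tailSlot;

    SlidingWindowTrace(int numSlots) {
        slots = new long[numSlots];
        tailSlot = slotAfter(headSlot);
    }

    // New observations always go into the head slot.
    void increment() {
        slots[headSlot]++;
    }

    // Sum all slots, clear the tail slot, then make it the new head.
    long getCountThenAdvanceWindow() {
        long total = 0;
        for (long c : slots) {
            total += c;
        }
        slots[tailSlot] = 0;
        headSlot = tailSlot;
        tailSlot = slotAfter(tailSlot);
        return total;
    }

    int headSlot() {
        return headSlot;
    }

    private int slotAfter(int slot) {
        return (slot + 1) % slots.length;
    }

    public static void main(String[] args) {
        SlidingWindowTrace window = new SlidingWindowTrace(3);
        int[] arrivalsPerInterval = {2, 3, 1, 0, 0, 0};
        for (int arrivals : arrivalsPerInterval) {
            for (int i = 0; i < arrivals; i++) {
                window.increment();
            }
            int head = window.headSlot();
            long total = window.getCountThenAdvanceWindow();
            System.out.println("head=" + head + " windowTotal=" + total);
        }
    }
}
```

With 3 slots, the head cycles through 0, 1, 2, 0, 1, 2, and the window totals for the arrivals above are 2, 5, 6, 4, 1, 0. Each observation contributes to exactly three consecutive emits before its slot is cleared and reused.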
 
IntermediateRankingsBolt

This bolt ranks the intermediate results. Why add this step? Because the data volume is large: if everything were ranked directly on a single node, the load would be too heavy.

IntermediateRankingsBolt therefore filters out part of the data first.

fieldsGrouping is used here again, this time on the obj field, which ensures that the counts emitted for the same object in different time periods are sent to the same task.

IntermediateRankingsBolt extends AbstractRankerBolt (described below)

and implements updateRankingsWithTuple:

    void updateRankingsWithTuple(Tuple tuple) {
        Rankable rankable = RankableObjectWithFields.from(tuple);
        super.getRankings().updateWith(rankable);
    }

The logic is simple: the tuple is converted to a Rankable, and the rankings list is updated with it.

See AbstractRankerBolt below, which periodically emits the rankings list.

Rankable

Besides extending the Comparable interface, Rankable adds the getObject() and getCount() methods.

    public interface Rankable extends Comparable<Rankable> {
        Object getObject();
        long getCount();
    }

RankableObjectWithFields

RankableObjectWithFields implements the Rankable interface. It:

1. Provides the ability to convert a tuple into a Rankable object

A tuple is composed of several fields. The first field is used as the object and the second as the count. The rest are placed in List<Object> otherFields.

2. Implements the getObject() and getCount() methods defined by Rankable

3. Implements the comparison logic, including compareTo and equals

    public class RankableObjectWithFields implements Rankable {
        // ... (excerpt)
        public static RankableObjectWithFields from(Tuple tuple) {
            List<Object> otherFields = Lists.newArrayList(tuple.getValues());
            Object obj = otherFields.remove(0);          // first field: the object
            Long count = (Long) otherFields.remove(0);   // second field: its count
            return new RankableObjectWithFields(obj, count, otherFields.toArray());
        }
    }

Rankings

Rankings maintains the list to be ranked and provides operations on that list.

The core data structure, which stores the list of Rankable objects, is:

    List<Rankable> rankedItems = Lists.newArrayList();

It provides some simple operations, such as setting maxSize (the maximum list size) and getRankings (which returns rankedItems, the sorted list).

The core operation is:

    public void updateWith(Rankable r) {
        addOrReplace(r);
        rerank();
        shrinkRankingsIfNeeded();
    }

The upstream bolt keeps emitting (obj, count) pairs for each time window, so the relative order of the objects changes constantly. updateWith therefore does three things:

1. Replaces an existing Rankable object or adds a new one (each holds an object and its count)

2. Re-sorts the list (Collections.sort)

3. Because only the top N are needed, removes the entries beyond maxSize.
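The three steps can be sketched in a few lines. This is a minimal illustration that assumes objects are plain strings and ranks by count in descending order; TopNSketch and its Entry type are hypothetical and are not the storm-starter Rankings class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified illustration of add-or-replace, rerank, shrink.
class TopNSketch {
    static final class Entry {
        final String obj;
        final long count;

        Entry(String obj, long count) {
            this.obj = obj;
            this.count = count;
        }
    }

    private final int maxSize;
    private final List<Entry> ranked = new ArrayList<Entry>();

    TopNSketch(int maxSize) {
        this.maxSize = maxSize;
    }

    void updateWith(String obj, long count) {
        // 1. Add the entry, or replace the existing entry for the same object.
        Entry updated = new Entry(obj, count);
        int index = indexOf(obj);
        if (index >= 0) {
            ranked.set(index, updated);
        } else {
            ranked.add(updated);
        }
        // 2. Re-sort, highest count first.
        Collections.sort(ranked, new Comparator<Entry>() {
            public int compare(Entry a, Entry b) {
                return Long.compare(b.count, a.count);
            }
        });
        // 3. Keep only the top maxSize entries.
        while (ranked.size() > maxSize) {
            ranked.remove(ranked.size() - 1);
        }
    }

    List<String> topObjects() {
        List<String> result = new ArrayList<String>();
        for (Entry e : ranked) {
            result.add(e.obj);
        }
        return result;
    }

    private int indexOf(String obj) {
        for (int i = 0; i < ranked.size(); i++) {
            if (ranked.get(i).obj.equals(obj)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        TopNSketch rankings = new TopNSketch(2);
        rankings.updateWith("a", 5);
        rankings.updateWith("b", 3);
        rankings.updateWith("c", 4);
        System.out.println(rankings.topObjects()); // b has been pushed out by c
        rankings.updateWith("b", 10);
        System.out.println(rankings.topObjects()); // b re-enters at the top
    }
}
```

Re-sorting the whole list on every update is simple and is cheap enough when N is small. An object that falls out of the list is forgotten until a later update brings it back with a high enough count.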

AbstractRankerBolt

First, a Rankings object is created with topN as its parameter.

    private final Rankings rankings;

    public AbstractRankerBolt(int topN, int emitFrequencyInSeconds) {
        count = topN;
        this.emitFrequencyInSeconds = emitFrequencyInSeconds;
        rankings = new Rankings(count);
    }

As in RollingCountBolt, execute triggers the emit periodically, and the tick tuple frequency is configured through emitFrequencyInSeconds.

For all other tuples, the rankings are updated continuously through updateRankingsWithTuple.

updateRankingsWithTuple is an abstract method; the concrete update logic is implemented by the subclasses.

    public final void execute(Tuple tuple, BasicOutputCollector collector) {
        if (TupleHelpers.isTickTuple(tuple)) {
            emitRankings(collector);
        }
        else {
            updateRankingsWithTuple(tuple);
        }
    }

Finally, the entire rankings list is emitted.

    private void emitRankings(BasicOutputCollector collector) {
        collector.emit(new Values(rankings));
        getLogger().info("Rankings: " + rankings);
    }

TotalRankingsBolt

This bolt uses globalGrouping, meaning that all data is sent to a single task for the final ranking.

TotalRankingsBolt also extends AbstractRankerBolt.

    void updateRankingsWithTuple(Tuple tuple) {
        Rankings rankingsToBeMerged = (Rankings) tuple.getValue(0);
        super.getRankings().updateWith(rankingsToBeMerged);
    }

The only difference is that the updateWith parameter here is a Rankings object, that is, a list of Rankable items. The logic in Rankings is the same, except that it iterates over every item in the incoming list.
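Merging the partial rankings into a global top N can be sketched as follows. This is a minimal illustration that assumes each partial ranking is a map from object to count; MergeRankingsSketch and mergeTopN are hypothetical names and are not part of storm-starter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of merging per-task rankings into a global top N.
class MergeRankingsSketch {

    static List<String> mergeTopN(List<Map<String, Long>> partialRankings, int topN) {
        // Combine all partial rankings; a later count for the same object replaces an earlier one.
        final Map<String, Long> merged = new HashMap<String, Long>();
        for (Map<String, Long> partial : partialRankings) {
            merged.putAll(partial);
        }
        // Sort the objects by count, highest first.
        List<String> objects = new ArrayList<String>(merged.keySet());
        Collections.sort(objects, new Comparator<String>() {
            public int compare(String a, String b) {
                return Long.compare(merged.get(b), merged.get(a));
            }
        });
        // Keep only the first topN objects.
        return new ArrayList<String>(objects.subList(0, Math.min(topN, objects.size())));
    }

    public static void main(String[] args) {
        Map<String, Long> fromTask1 = new HashMap<String, Long>();
        fromTask1.put("a", 5L);
        fromTask1.put("b", 3L);
        Map<String, Long> fromTask2 = new HashMap<String, Long>();
        fromTask2.put("c", 9L);
        fromTask2.put("d", 1L);
        System.out.println(mergeTopN(Arrays.asList(fromTask1, fromTask2), 3));
    }
}
```

Because the intermediate rankers are fed through fieldsGrouping on obj, each object is ranked by exactly one intermediate task, so the merge only has to combine disjoint lists and cut the result down to N entries.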

The result is the global top-N rankings list.
