1. Concept
The message flow (stream) is the most critical abstraction in Storm. A message flow is an unbounded sequence of tuples, and these tuples are created and processed in parallel in a distributed manner. Defining a message flow mainly means defining the tuples in it: each field in the tuple is given a name, and corresponding fields of different tuples must have the same type. That is, the first field of any two tuples must be the same type, and so must the second field, but the first field and the second field can have different types from each other.
By default, a tuple field can be of type integer, long, short, byte, string, double, float, boolean, or byte array. You can also use custom types, as long as you implement the corresponding serializer.
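The rule about named, position-typed fields can be illustrated with a small sketch. This is not Storm code: StreamSchemaSketch and its methods are hypothetical names made up for illustration, and the check only mirrors the rule described above (same type at the same position for every tuple in one message flow).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Storm: models the tuple layout of one message flow.
final class StreamSchemaSketch {
    private final List<String> fieldNames;
    private final List<Class<?>> fieldTypes;

    StreamSchemaSketch(List<String> fieldNames, List<Class<?>> fieldTypes) {
        if (fieldNames.size() != fieldTypes.size()) {
            throw new IllegalArgumentException("each field needs exactly one type");
        }
        this.fieldNames = Collections.unmodifiableList(new ArrayList<>(fieldNames));
        this.fieldTypes = Collections.unmodifiableList(new ArrayList<>(fieldTypes));
    }

    // True when the values match the declared layout position by position.
    boolean accepts(List<?> values) {
        if (values.size() != fieldTypes.size()) {
            return false;
        }
        for (int i = 0; i < values.size(); i++) {
            if (!fieldTypes.get(i).isInstance(values.get(i))) {
                return false;
            }
        }
        return true;
    }

    // Looks up a value in a tuple by the name given to its field.
    Object valueOf(String fieldName, List<?> values) {
        int index = fieldNames.indexOf(fieldName);
        if (index < 0) {
            throw new IllegalArgumentException("unknown field: " + fieldName);
        }
        return values.get(index);
    }
}
```

With a schema of ("userid": String, "score": Integer), the tuple ("alice", 42) is accepted, while (42, "alice") is rejected because the types at each position no longer match.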
2. Message Distribution policy: Stream groupings
- Shuffle Grouping: Tuples in the stream are distributed randomly across the bolt's tasks, so that each task receives an equal share of tuples.
- Fields Grouping: Tuples are grouped by field. For example, when grouping by userid, tuples with the same userid always go to the same bolt task, while tuples with different userids may go to different tasks.
- All Grouping: Broadcast. Every tuple is sent to all of the bolt's tasks.
- Global Grouping: The entire stream goes to a single one of the bolt's tasks. More specifically, it goes to the task with the lowest ID.
- None Grouping: No grouping. This grouping means that the stream does not care who receives its tuples. At present it has the same effect as shuffle grouping. The one difference is that Storm is meant to run such a bolt in the same thread as the spout or bolt it subscribes to, when possible.
- Direct Grouping: A very special grouping, in which the sender of a tuple decides which task of the receiver handles it. Only message flows declared as direct streams can use this grouping, and tuples on such a stream must be emitted with the emitDirect method. A message processor can get the task IDs of its consumers through TopologyContext, or by keeping track of the return value of OutputCollector.emit, which returns the IDs of the tasks the tuple was sent to.
- Local or Shuffle Grouping: If the target bolt has one or more tasks in the same worker process, tuples are shuffled only among those in-process tasks. Otherwise, it behaves like a normal shuffle grouping.
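The groupings above differ only in how they choose the receiving task(s) for each tuple, which the following sketch tries to make concrete. It is a simulation under stated assumptions, not Storm's implementation: GroupingSketch and its method names are hypothetical, tasks are numbered 0 to numTasks-1, and fields grouping is assumed to be a hash of the field value modulo the task count.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical simulation, not Storm's own code: each method returns the
// indices of the target bolt's tasks that would receive one tuple.
final class GroupingSketch {
    private final int numTasks;
    private final Random random;

    GroupingSketch(int numTasks, long seed) {
        if (numTasks <= 0) {
            throw new IllegalArgumentException("numTasks must be positive");
        }
        this.numTasks = numTasks;
        this.random = new Random(seed);
    }

    // Shuffle grouping: one task chosen at random.
    List<Integer> shuffle() {
        return Collections.singletonList(random.nextInt(numTasks));
    }

    // Fields grouping (assumed hash-mod scheme): equal field values
    // always map to the same task.
    List<Integer> fields(Object fieldValue) {
        return Collections.singletonList(Math.floorMod(fieldValue.hashCode(), numTasks));
    }

    // All grouping: every task receives a copy of the tuple.
    List<Integer> all() {
        List<Integer> targets = new ArrayList<>();
        for (int task = 0; task < numTasks; task++) {
            targets.add(task);
        }
        return targets;
    }

    // Global grouping: always the task with the lowest ID.
    List<Integer> global() {
        return Collections.singletonList(0);
    }

    // Direct grouping: the sender names the receiving task itself.
    List<Integer> direct(int chosenTask) {
        if (chosenTask < 0 || chosenTask >= numTasks) {
            throw new IllegalArgumentException("no such task: " + chosenTask);
        }
        return Collections.singletonList(chosenTask);
    }

    // Local or shuffle grouping: shuffle among tasks in the same worker
    // process if there are any, otherwise fall back to a plain shuffle.
    List<Integer> localOrShuffle(List<Integer> localTasks) {
        if (localTasks.isEmpty()) {
            return shuffle();
        }
        return Collections.singletonList(localTasks.get(random.nextInt(localTasks.size())));
    }
}
```

For example, with four tasks, fields("user-42") returns the same single task on every call, all() returns all four tasks, and global() always returns task 0. None grouping is omitted because, as noted above, it currently behaves like shuffle grouping.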
2. Storm Message Flow