Storm is an open-source, distributed, real-time computing system that processes large volumes of streaming data simply and reliably. Storm is easy to deploy and operate, and, more importantly, applications can be developed in any programming language.
Storm: a real-time computing system
Low latency, high performance, distributed, scalable, fault-tolerant
Features: simple programming model, hot deployment, support for multiple programming languages, extensibility, fault tolerance, reliable message processing, high speed, and a local mode for development
Storm basic concepts:
Nimbus: The master daemon, responsible for resource allocation and task scheduling.
Supervisor: Runs on each worker node; it accepts the tasks that Nimbus assigns to it and starts and stops the worker processes it manages.
Worker: A process that runs the logic of specific processing components.
Task: Each spout/bolt thread in a worker is called a task. Since Storm 0.8, a task no longer corresponds to a physical thread: several tasks of the same spout/bolt may share one physical thread, which is called an executor.
Topology: A real-time application running in Storm. It is called a topology because the flow of messages between its components forms a logical topological structure.
Spout: The component that produces the source data in a topology. Typically, a spout reads data from an external data source and converts it into tuples inside the topology. A spout is an active role: its interface has a nextTuple() function, which the Storm framework calls over and over, and the user simply generates the source data inside it.
Bolt: A component that receives data in a topology and processes it. Bolts can perform any operation, such as filtering, applying functions, merging, or writing to a database. A bolt is a passive role: its interface has an execute(Tuple input) function, which is called whenever a message is received and in which the user performs whatever processing is needed.
Tuple: The basic unit of message delivery. It could have been a key-value map, but because the field names of the tuples passed between components are declared in advance, a tuple only needs to carry a list of values, filled in the order of the declared fields.
Stream: An unbounded sequence of tuples.
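The task/executor relationship described above can be illustrated with a small model. In Storm's Java API, the parallelism hint passed to setBolt sets the number of executors (threads) and setNumTasks sets the number of tasks. The Python function below is only a hypothetical sketch: it assumes tasks are split into contiguous, nearly equal groups, one group per executor, and it does not reproduce Storm's actual scheduler. The name assign_tasks is illustrative.

```python
from typing import List


def assign_tasks(num_tasks: int, num_executors: int) -> List[List[int]]:
    """Split task IDs 1..num_tasks into contiguous groups, one per executor.

    Hypothetical even split: when the division is uneven, the first executors
    each take one extra task.
    """
    if num_executors < 1 or num_tasks < num_executors:
        raise ValueError("this model needs at least one task per executor")
    base, extra = divmod(num_tasks, num_executors)
    groups, start = [], 1
    for i in range(num_executors):
        size = base + (1 if i < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


# e.g. setBolt(..., 2).setNumTasks(4): two executors (threads) sharing four tasks
print(assign_tasks(4, 2))  # [[1, 2], [3, 4]]
print(assign_tasks(5, 2))  # [[1, 2, 3], [4, 5]]
```

Keeping the task count fixed while varying the executor count is what lets one thread run several tasks of the same component.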
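To make the spout/bolt/tuple roles concrete, here is a minimal plain-Python model. It is not the Storm API: the classes StormTuple, SentenceSpout, and SplitBolt and the run_local driver are hypothetical, and the bolt returns its output tuples instead of emitting them through a collector. It shows the spout as the active side (its next_tuple is called repeatedly), the bolt as the passive side (its execute runs once per received tuple), and tuples as value lists whose field names were declared in advance.

```python
class StormTuple:
    """Hypothetical tuple: declared field names plus values in the same order."""

    def __init__(self, fields, values):
        self.fields = fields
        self.values = values

    def get_value_by_field(self, name):
        # Field names are known in advance, so a lookup is just a position lookup
        return self.values[self.fields.index(name)]


class SentenceSpout:
    """Active component: the driver calls next_tuple() over and over."""

    output_fields = ("sentence",)

    def __init__(self, sentences):
        self._source = iter(sentences)  # stands in for an external data source

    def next_tuple(self):
        sentence = next(self._source, None)
        if sentence is None:
            return None  # nothing new to emit on this call
        return StormTuple(self.output_fields, [sentence])


class SplitBolt:
    """Passive component: execute() runs once for each tuple it receives."""

    output_fields = ("word",)

    def execute(self, tup):
        # Returns new tuples instead of emitting through a collector (simplification)
        words = tup.get_value_by_field("sentence").split()
        return [StormTuple(self.output_fields, [w]) for w in words]


def run_local(spout, bolt, max_calls):
    # Toy driver: Storm keeps calling nextTuple() indefinitely; this loop stops after max_calls
    results = []
    for _ in range(max_calls):
        tup = spout.next_tuple()
        if tup is not None:
            results.extend(bolt.execute(tup))
    return results


out = run_local(SentenceSpout(["the cow", "jumped over"]), SplitBolt(), max_calls=5)
print([t.get_value_by_field("word") for t in out])  # ['the', 'cow', 'jumped', 'over']
```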
Storm usage scenarios:
1. Stream aggregation: combining two or more data streams into one, based on some common tuple fields.
2. Batch processing: for performance or other reasons, a group of tuples is processed together rather than one at a time.
3. BasicBolt
A. Reading an input tuple
B. Emitting zero or more tuples based on that input tuple
C. Acking the input tuple at the end of the execute method
Bolts that follow this pattern are usually functions or filters. The pattern is so common that Storm provides a dedicated interface for it: IBasicBolt.
4. In-memory caching combined with fields grouping
5. Streaming top N
6. Using TimeCacheMap to efficiently keep a cache of recently updated objects
7. Distributed RPC
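Stream aggregation on a common field can be sketched as a buffering join. The following is a hypothetical plain-Python model, not Storm code: the stream names "orders" and "users" and the JoinBolt class are illustrative. Each incoming value is matched against values already buffered from the other stream under the same key, then buffered itself. In a real topology, both input streams would be subscribed with a fields grouping on the join key so that matching tuples reach the same task.

```python
from collections import defaultdict


class JoinBolt:
    """Hypothetical inner join of an "orders" stream and a "users" stream on a shared key."""

    def __init__(self):
        # Per-stream buffers of values waiting for partners, keyed by the join field
        self.pending = {"orders": defaultdict(list), "users": defaultdict(list)}

    def execute(self, stream, key, value):
        other = "users" if stream == "orders" else "orders"
        joined = []
        for partner in self.pending[other][key]:
            order, user = (value, partner) if stream == "orders" else (partner, value)
            joined.append((key, order, user))
        # Remember this value so later tuples from the other stream can join with it
        self.pending[stream][key].append(value)
        return joined


bolt = JoinBolt()
print(bolt.execute("orders", 7, "book"))  # []
print(bolt.execute("users", 7, "alice"))  # [(7, 'book', 'alice')]
print(bolt.execute("orders", 7, "pen"))   # [(7, 'pen', 'alice')]
```

The buffers in this sketch grow without bound; a practical join would also expire old entries, for example with a time-based cache like the TimeCacheMap pattern listed as item 6.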
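Batch processing can be sketched as a bolt that buffers tuples and hands them off once a fixed-size group is full. This is a hypothetical plain-Python model: BatchingBolt, batch_size, and the flush callback are illustrative names. A size-based trigger is only one option, and a real bolt might also flush on a timer.

```python
class BatchingBolt:
    """Hypothetical bolt that buffers tuples and processes them in fixed-size groups."""

    def __init__(self, batch_size, flush):
        self.batch_size = batch_size
        self.flush = flush  # handles one full batch, e.g. a single bulk database write
        self.buffer = []

    def execute(self, tup):
        self.buffer.append(tup)
        if len(self.buffer) >= self.batch_size:
            batch, self.buffer = self.buffer, []
            self.flush(batch)
            # In a reliable topology the buffered input tuples would be acked here,
            # only after the whole batch has been processed.


writes = []
bolt = BatchingBolt(batch_size=3, flush=writes.append)
for i in range(7):
    bolt.execute(i)
print(writes)       # [[0, 1, 2], [3, 4, 5]]
print(bolt.buffer)  # [6]
```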
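The BasicBolt pattern (read an input, emit zero or more outputs, then ack the input) can be modelled as a wrapper that performs the ack automatically. This plain-Python sketch is hypothetical: AutoAckBolt, RecordingCollector, and keep_even are illustrative, and FailedException only mirrors the idea that Storm lets a basic bolt signal failure by throwing an exception of that name. User code supplies only the processing function; acking and failing are handled by the wrapper.

```python
class FailedException(Exception):
    """Raised by the user logic to signal that the input tuple should be failed."""


class RecordingCollector:
    """Stand-in for a collector: records emitted, acked and failed tuples."""

    def __init__(self):
        self.emitted = []
        self.acked = []
        self.failed = []


class AutoAckBolt:
    """Hypothetical wrapper reproducing the IBasicBolt pattern:
    read one input, emit zero or more outputs, then ack the input."""

    def __init__(self, process):
        self.process = process  # user logic: input tuple -> list of output tuples

    def execute(self, tup, collector):
        try:
            outputs = self.process(tup)
        except FailedException:
            collector.failed.append(tup)  # explicit failure: fail instead of ack
            return
        collector.emitted.extend(outputs)
        collector.acked.append(tup)  # ack happens automatically after processing


def keep_even(n):
    # A filter: pass even numbers through, drop odd ones, reject negatives
    if n < 0:
        raise FailedException("negative input")
    return [n] if n % 2 == 0 else []


collector = RecordingCollector()
bolt = AutoAckBolt(keep_even)
for n in [1, 2, -3, 4]:
    bolt.execute(n, collector)
print(collector.emitted, collector.acked, collector.failed)  # [2, 4] [1, 2, 4] [-3]
```

Note that the filtered-out input 1 is still acked: dropping a tuple is a successful outcome for a filter.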
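Items 4 and 5 often work together: a fields grouping sends each key to exactly one counting task, that task keeps the key's count in an in-memory cache, and a single merging task combines the per-task top-N lists. The plain-Python sketch below models this two-stage flow under stated assumptions: CRC32 is a hypothetical stand-in for Storm's own hashing, and counting_task_for and streaming_top_n are illustrative names. Because a key never spans two counting tasks, merging the local top-N lists gives the global top N (up to tie-breaking).

```python
import heapq
import zlib
from collections import Counter


def counting_task_for(word, num_tasks):
    # Fields grouping: the same word always goes to the same counting task.
    # CRC32 is a deterministic stand-in for Storm's own hashing.
    return zlib.crc32(word.encode("utf-8")) % num_tasks


def streaming_top_n(words, n, num_tasks):
    # Stage 1: each counting task keeps an in-memory cache of counts for its own keys
    local_counts = [Counter() for _ in range(num_tasks)]
    for word in words:
        local_counts[counting_task_for(word, num_tasks)][word] += 1
    # Each task reports only its local top n; since keys never span tasks,
    # the merged local lists contain the global top n
    partial = [pair for counts in local_counts for pair in counts.most_common(n)]
    # Stage 2: one merging task (global grouping) selects the overall top n
    return heapq.nlargest(n, partial, key=lambda pair: pair[1])


words = "a b a c a b d".split()
print(streaming_top_n(words, 2, num_tasks=3))  # [('a', 3), ('b', 2)]
```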
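The idea behind TimeCacheMap is to keep entries in a fixed number of buckets and periodically drop the oldest bucket as a whole, so no per-entry timestamps need to be checked. The class below is a hypothetical plain-Python model of that bucket-rotation scheme, not Storm's implementation: RotatingCache, rotate, and on_expire are illustrative names. In Storm a background thread drives the rotation on a timer, while here the caller invokes rotate() explicitly so the behaviour is deterministic.

```python
from collections import deque


class RotatingCache:
    """Hypothetical bucket-rotation cache illustrating the idea behind TimeCacheMap.

    Index 0 of self.buckets is the newest bucket. A timer would normally call
    rotate(); in this model the caller does it explicitly.
    """

    def __init__(self, num_buckets=3, on_expire=None):
        self.buckets = deque({} for _ in range(num_buckets))
        self.on_expire = on_expire

    def put(self, key, value):
        # Keep each key only in the newest bucket so an update resets its age
        for bucket in self.buckets:
            bucket.pop(key, None)
        self.buckets[0][key] = value

    def get(self, key):
        for bucket in self.buckets:
            if key in bucket:
                return bucket[key]
        return None

    def rotate(self):
        # Expire the whole oldest bucket at once instead of checking per-entry timestamps
        expired = self.buckets.pop()
        self.buckets.appendleft({})
        if self.on_expire is not None:
            for key, value in expired.items():
                self.on_expire(key, value)
        return expired


expired_keys = []
cache = RotatingCache(num_buckets=3, on_expire=lambda k, v: expired_keys.append(k))
cache.put("a", 1)
cache.rotate()
cache.put("b", 2)
cache.rotate()
cache.put("b", 3)  # refreshing "b" moves it back to the newest bucket
cache.rotate()     # "a" was not updated for three rotations and is expired
print(expired_keys, cache.get("a"), cache.get("b"))  # ['a'] None 3
```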
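Storm stream grouping mechanisms:
1. Shuffle grouping: tuples are distributed randomly across the bolt's tasks, so that each task receives roughly the same number of tuples.
2. Fields grouping: the stream is partitioned by the specified fields; tuples with the same values in those fields always go to the same task.
3. All grouping: every tuple is replicated to all of the bolt's tasks.
4. Global grouping: the entire stream goes to a single task of the bolt (the one with the lowest task ID).
5. None grouping: the consumer does not care how the stream is grouped; currently this is equivalent to shuffle grouping.
6. Direct grouping: the producer of a tuple decides which task of the consumer receives it.
7. Custom grouping: implement the CustomStreamGrouping interface to define your own grouping.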
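Each grouping answers one question: given a tuple and the number of target tasks, which task indices should receive it? Storm's CustomStreamGrouping likewise returns a list of target task IDs for each tuple from its chooseTasks method. The plain-Python functions below are a hypothetical model of that routing decision, not Storm's implementation. Tuples are modelled as dicts, task IDs as indices 0..num_tasks-1, CRC32 stands in for Storm's field hashing, and first_letter_grouping is an invented example of a custom grouping.

```python
import random
import zlib


def shuffle_grouping(tup, num_tasks, rng=random):
    # A uniform random pick approximates an even spread across tasks
    return [rng.randrange(num_tasks)]


# None grouping is currently treated the same as shuffle grouping
none_grouping = shuffle_grouping


def fields_grouping(tup, num_tasks, fields):
    # Tuples with equal values in the grouping fields always map to the same task
    key = "|".join(str(tup[f]) for f in fields)
    return [zlib.crc32(key.encode("utf-8")) % num_tasks]


def all_grouping(tup, num_tasks):
    # Replicate the tuple to every task
    return list(range(num_tasks))


def global_grouping(tup, num_tasks):
    # The whole stream goes to the task with the lowest ID (index 0 here)
    return [0]


def direct_grouping(tup, num_tasks, target_task):
    # The emitting component names the receiving task itself
    if not 0 <= target_task < num_tasks:
        raise ValueError("target task out of range")
    return [target_task]


def first_letter_grouping(tup, num_tasks):
    # Hypothetical custom grouping: route by the first letter of the "word" field
    return [(ord(str(tup["word"])[0].lower()) - ord("a")) % num_tasks]


t1 = {"word": "storm", "count": 1}
t2 = {"word": "storm", "count": 5}
print(fields_grouping(t1, 4, ["word"]) == fields_grouping(t2, 4, ["word"]))  # True
print(all_grouping(t1, 3))           # [0, 1, 2]
print(global_grouping(t1, 3))        # [0]
print(first_letter_grouping(t1, 4))  # [2]
```

The fields-grouping check shows why item 4 of the usage scenarios works: every tuple for the word "storm" lands on the same task, so that task's in-memory state for the word is complete.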