Apache Storm reads a raw stream of real-time data at one end, passes it through a series of small processing units, and outputs the processed, useful information at the other end.
This article describes the core concepts of Apache Storm.
Now let's take a closer look at the components of Apache Storm.
Tuple
The tuple is the main data structure in Storm. It is a list of ordered elements. By default, a tuple supports all data types. Typically, it is modeled as a set of comma-separated values and passed to the Storm cluster.
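As an illustration (plain Python, not the Storm API), a tuple can be thought of as an ordered list of values paired with a list of field names, so a downstream component can look values up by field:

```python
# Illustrative only (plain Python, not the Storm API): a Storm-style tuple
# modeled as a list of field names plus a matching list of values.
fields = ["user", "tweet"]
values = ["alice", "storm is fun"]

# Look up a value by field name, the way a bolt reads an input tuple.
tweet = values[fields.index("tweet")]
print(tweet)  # -> storm is fun
```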
Stream
A stream is an unbounded sequence of tuples.
Spout
A spout is the source of a stream. Typically, Storm accepts input data from a raw data source (such as the Twitter Streaming API, an Apache Kafka queue, a Kestrel queue, and so on); otherwise, you can write a spout to read data from your own data source. ISpout is the core interface for implementing spouts; some specific implementations are IRichSpout, BaseRichSpout, KafkaSpout, and so on.
Bolt
A bolt is a logical processing unit. Spouts pass data to bolts, and bolts process it to produce a new output stream. Bolts can perform actions such as filtering, aggregating, joining, and interacting with data sources and databases. A bolt receives data and can emit it to one or more other bolts. IBolt is the core interface for implementing bolts; common interfaces include IRichBolt, IBasicBolt, and so on.
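A minimal sketch of the bolt idea in plain Python (this simulates the concept only; it does not use the real Storm IBolt API): a split "bolt" receives one tuple and emits one new tuple per word.

```python
def split_bolt(tup):
    """Simulated bolt: receive a (user, tweet) tuple and
    emit one (word,) tuple per word in the tweet."""
    user, tweet = tup
    return [(word,) for word in tweet.split()]

emitted = split_bolt(("alice", "hello storm hello"))
print(emitted)  # -> [('hello',), ('storm',), ('hello',)]
```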
Let's look at a real-time example, "Twitter analytics", to see how it can be modeled in Apache Storm.
The input for "Twitter analytics" comes from the Twitter Streaming API. A spout reads users' tweets through the Twitter Streaming API and emits them as a stream of tuples. A single tuple from the spout contains a Twitter user name and a single tweet as comma-separated values. That stream of tuples is then forwarded to a bolt, which splits each tweet into individual words, computes the word counts, and saves the results to the configured data source. We can then get the results simply by querying the data source.
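The whole "Twitter analytics" flow can be sketched end to end in plain Python (a simulation of the topology's logic, not Storm code; the sample tweets are invented for illustration): a spout yields (user, tweet) tuples, a split bolt breaks tweets into words, and a count bolt keeps running totals.

```python
from collections import Counter

def tweet_spout():
    """Simulated spout: in a real topology this would read the
    Twitter Streaming API; here we yield two hard-coded tweets."""
    yield ("alice", "storm is fast")
    yield ("bob", "storm is fun")

def split_bolt(user, tweet):
    """Simulated bolt: emit one word per output tuple."""
    for word in tweet.split():
        yield word

counts = Counter()  # simulated count-bolt state
for user, tweet in tweet_spout():
    for word in split_bolt(user, tweet):
        counts[word] += 1

print(counts["storm"])  # -> 2
```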
Topology
Spouts and bolts are connected together to form a topology. Real-time application logic is specified in a Storm topology. Simply put, a topology is a directed graph in which the vertices are computations and the edges are data streams.
A simple topology starts with a spout, which emits data to one or more bolts. A bolt represents a node in the topology with the smallest processing logic, and the output of one bolt can be fed into another bolt as input.
Storm keeps the topology running until you terminate it. The main job of Apache Storm is to run topologies, and it can run any number of topologies at a given time.
Task
Now you have a basic idea of spouts and bolts. They are the smallest logical units of a topology, and a topology is built from a single spout and an array of bolts. They should be executed in the proper order for the topology to run successfully. Each spout or bolt executed by Storm is called a "task". In simple terms, a task is the execution of a spout or a bolt. At any given time, each spout and bolt can have multiple instances running in multiple separate threads.
Worker process
A topology runs in a distributed manner on multiple worker nodes. Storm spreads the tasks evenly across all the worker nodes. A worker node's role is to listen for jobs and to start or stop worker processes whenever a new job arrives.
Stream grouping
Streams flow from spouts to bolts, or from one bolt to another. Stream grouping controls how tuples are routed through the topology and helps us understand the flow of tuples in it. There are four built-in groupings, described below.
Shuffle grouping
In shuffle grouping, an equal number of tuples are distributed randomly across all the workers executing the bolt.
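A sketch of shuffle grouping in plain Python (a simulation, not Storm's implementation): each tuple is routed to a randomly chosen bolt task, so over many tuples the load evens out.

```python
import random

NUM_TASKS = 4

def shuffle_grouping(tuples, num_tasks, rng):
    """Route each tuple to a randomly chosen task, as shuffle grouping does."""
    routed = {task: [] for task in range(num_tasks)}
    for tup in tuples:
        routed[rng.randrange(num_tasks)].append(tup)
    return routed

rng = random.Random(42)  # seeded so the run is repeatable
routed = shuffle_grouping([("w%d" % i,) for i in range(1000)], NUM_TASKS, rng)

# With 1000 tuples over 4 tasks, each task receives roughly 250.
print(sorted(len(batch) for batch in routed.values()))
```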
Fields grouping
In fields grouping, the stream is partitioned by the fields specified in the grouping: tuples with the same value in those fields are always sent to the same task executing the bolt. For example, if the stream is grouped by the field "word", tuples with the same string "Hello" will always move to the same worker.
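A sketch of fields grouping in plain Python (a simulation, assuming routing by a stable hash of the grouping field): equal field values always map to the same task.

```python
import zlib

NUM_TASKS = 4

def fields_grouping(word, num_tasks):
    """Route by a stable hash of the grouping field, so tuples
    with equal field values always go to the same task."""
    return zlib.crc32(word.encode()) % num_tasks

a = fields_grouping("Hello", NUM_TASKS)
b = fields_grouping("Hello", NUM_TASKS)
print(a == b)  # -> True: every "Hello" tuple lands on the same task
```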
Global grouping
In global grouping, the entire stream is forwarded to a single bolt instance. This grouping sends the tuples generated by all instances of the source to one target instance (specifically, the task with the lowest ID).
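Global grouping, sketched in plain Python (a simulation): regardless of which task emits a tuple, it is always delivered to the single target task with the lowest ID.

```python
def global_grouping(task_ids):
    """All tuples go to the one task with the lowest ID."""
    return min(task_ids)

target = global_grouping([7, 3, 5])
print(target)  # -> 3
```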
All grouping
All grouping sends a single copy of each tuple to every instance of the receiving bolt. This kind of grouping is used to send signals to bolts, and it is useful for join operations.
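All grouping, sketched in plain Python (a simulation): every receiving task gets its own copy of each tuple, which is how a signal can be broadcast to all instances of a bolt.

```python
def all_grouping(tup, task_ids):
    """Broadcast one copy of the tuple to every receiving task."""
    return {task: tup for task in task_ids}

copies = all_grouping(("refresh-config",), [0, 1, 2])
print(len(copies))  # -> 3
```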