Apache Storm reads the raw stream of real-time data from one end

Source: Internet
Author: User

Apache Storm reads a raw stream of real-time data at one end, passes it through a series of small processing units, and outputs the processed, useful information at the other end.
  
(Figure: the core concepts of Apache Storm)
  
Now let's take a closer look at the components of Apache Storm.
  
Component description
  
Tuple

A tuple is the main data structure in Storm: an ordered list of elements. By default, a tuple can hold values of any data type. It is typically modeled as a set of comma-separated values and passed to the Storm cluster.
  
Stream

A stream is an unbounded sequence of tuples.
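As a rough illustration (plain Python, not the Storm API), a tuple can be modeled as an ordered list of values parsed from comma-separated input, and a stream as a sequence of such tuples:

```python
# Minimal sketch (plain Python, not the Storm API): a tuple is an
# ordered list of values; a stream is a sequence of tuples.

def parse_tuple(line):
    """Parse a comma-separated line into an ordered list of values."""
    return [field.strip() for field in line.split(",")]

# A (finite slice of a) stream: each element is one tuple.
raw_lines = ["alice, hello world", "bob, hello storm"]
stream = [parse_tuple(line) for line in raw_lines]

print(stream)  # [['alice', 'hello world'], ['bob', 'hello storm']]
```

In real Storm, streams are unbounded, so a spout would keep emitting tuples indefinitely rather than producing a finite list.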
  
Spouts

A spout is the source of a stream. Typically, Storm accepts input data from a raw data source (such as the Twitter Streaming API, an Apache Kafka queue, a Kestrel queue, and so on); otherwise, you can write your own spout to read data from a data source. ISpout is the core interface for implementing spouts; some specific implementations are IRichSpout, BaseRichSpout, KafkaSpout, and so on.
  
Bolts

A bolt is a logical processing unit. Spouts pass data to bolts, and bolts process it and produce a new output stream. Bolts can perform actions such as filtering, aggregating, joining, and interacting with data sources and databases. A bolt receives data and can emit it to one or more other bolts. IBolt is the core interface for implementing bolts; some common interfaces are IRichBolt, IBasicBolt, and so on.
  
Let's look at a real-time example, "Twitter analytics", to see how it is modeled in Apache Storm.

(Figure: structure of the "Twitter analytics" topology)
  
The input for "Twitter analytics" comes from the Twitter Streaming API. A spout reads users' tweets through the Twitter Streaming API and outputs them as a stream of tuples. A single tuple from the spout holds a Twitter username and a single tweet as comma-separated values. This stream of tuples is then forwarded to a bolt, which splits each tweet into individual words, calculates the word counts, and saves the information to the configured data source. We can then easily get results by querying that data source.
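The flow above can be sketched in plain Python (a real implementation would use Storm spouts and bolts; all names here are illustrative, and a dict stands in for the data source):

```python
# Hedged sketch of the "Twitter analytics" flow (plain Python, not Storm).
from collections import Counter

def spout():
    """Stand-in for a spout: emits (username, tweet) tuples."""
    yield ("alice", "hello storm")
    yield ("bob", "hello world")

def split_bolt(tuples):
    """Bolt 1: split each tweet into single-word tuples."""
    for user, tweet in tuples:
        for word in tweet.split():
            yield (word,)

def count_bolt(tuples):
    """Bolt 2: aggregate word counts (a dict stands in for the data source)."""
    return dict(Counter(word for (word,) in tuples))

result = count_bolt(split_bolt(spout()))
print(result)  # {'hello': 2, 'storm': 1, 'world': 1}
```

Querying the "data source" here is just a dict lookup, e.g. `result["hello"]` returns 2.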
  
Topology
  
Spouts and bolts are connected together to form a topology. Real-time application logic is specified inside a Storm topology. Simply put, a topology is a directed graph where the vertices are computations and the edges are streams of data.
  
A simple topology starts with a spout, which emits data to one or more bolts. A bolt represents a node in the topology with the smallest unit of processing logic, and the output of one bolt can be emitted to another bolt as input.
  
Storm keeps the topology running until you terminate it. Apache Storm's main job is to run topologies, and it can run any number of topologies at a given time.
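The "directed graph" view of a topology can be sketched as follows (plain Python, not Storm's internals; the component names and functions are illustrative). Each vertex is a processing function, and the edges dictionary records which downstream components consume each component's output stream:

```python
# Sketch: a topology as a directed graph where vertices are processing
# functions and edges carry streams of tuples (plain Python, not Storm).

def tweets_spout():
    return [("alice", "hello storm")]

def split(tuples):
    return [(w,) for _, tweet in tuples for w in tweet.split()]

def upper(tuples):
    return [(w.upper(),) for (w,) in tuples]

# Edges: component -> downstream components. Execution follows the edges.
topology = {"tweets": ["split"], "split": ["upper"], "upper": []}
components = {"tweets": tweets_spout, "split": split, "upper": upper}

def run(topology, components, start):
    """Run tuples through the graph, breadth-first from the spout."""
    output = {start: components[start]()}
    frontier = [start]
    while frontier:
        node = frontier.pop(0)
        for child in topology[node]:
            output[child] = components[child](output[node])
            frontier.append(child)
    return output

print(run(topology, components, "tweets")["upper"])
# [('HELLO',), ('STORM',)]
```

Real Storm executes this graph continuously and in parallel across a cluster, rather than in a single pass like this sketch.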
  
Task
  
Now you have a basic idea of spouts and bolts. They are the smallest logical units of a topology, and a topology is built from a single spout and an array of bolts. They must be executed in the proper order for the topology to run successfully. The execution of each spout and bolt by Storm is called a "task". In simple terms, a task is the execution of a spout or a bolt. At a given time, each spout and bolt can have multiple instances running in multiple separate threads.
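The idea of one bolt having multiple task instances in separate threads can be sketched like this (plain Python threads and queues, not Storm's execution model; names are illustrative):

```python
# Sketch: multiple task instances of one bolt running in separate
# threads, pulling tuples from a shared queue (plain Python, not Storm).
import queue
import threading

work = queue.Queue()
results = queue.Queue()

def bolt_task(task_id):
    """One task: one running instance of the bolt's logic."""
    while True:
        tup = work.get()
        if tup is None:            # sentinel: shut this task down
            break
        results.put((task_id, tup[0].upper()))

NUM_TASKS = 3
threads = [threading.Thread(target=bolt_task, args=(i,)) for i in range(NUM_TASKS)]
for t in threads:
    t.start()

for word in ["hello", "storm", "tasks", "demo"]:
    work.put((word,))
for _ in threads:                  # one sentinel per task
    work.put(None)
for t in threads:
    t.join()

out = sorted(w for _, w in list(results.queue))
print(out)  # ['DEMO', 'HELLO', 'STORM', 'TASKS']
```

Which task processes which tuple is nondeterministic here, just as Storm's scheduler decides which task instance receives each tuple.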
  
Process
  
A topology runs in a distributed manner on multiple worker nodes. Storm spreads the tasks evenly across all worker nodes. A worker node's role is to listen for jobs and to start or stop worker processes whenever a new job arrives.
  
Stream grouping
  
Data flows from spouts to bolts, or from one bolt to another. Stream grouping controls how tuples are routed in the topology and helps us understand the flow of tuples through it. Four of the built-in groupings are described below.
  
Shuffle (random) grouping
  
In a shuffle (random) grouping, an equal number of tuples is distributed randomly across all of the workers executing the bolt.

(Figure: shuffle grouping)
  
Field grouping
  
In a field grouping, tuples are partitioned by the values of one or more fields: tuples with the same field value are always sent forward to the same process executing the bolt. For example, if the stream is grouped by the field "word", tuples with the same string "Hello" will always go to the same worker.

(Figure: field grouping)
  
Global grouping
  
In a global grouping, all of the streams can be grouped and forwarded to one bolt. This grouping sends the tuples generated by all instances of the source to a single target instance (specifically, the task with the lowest ID).

(Figure: global grouping)
  
All grouping

An all grouping sends a single copy of each tuple to every instance of the receiving bolt. This kind of grouping is used to send signals to bolts, and it is useful for join operations.
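The four groupings above can be sketched as routing rules (plain Python, not Storm's internals; each function returns the target task index or indices for one tuple, and the task count is an illustrative assumption):

```python
# Sketch of the four stream groupings as routing rules (plain Python,
# not Storm's internals). Each returns the target task index/indices.
import random

NUM_TASKS = 4
TASK_IDS = list(range(NUM_TASKS))

def shuffle_grouping(tup):
    """Shuffle (random) grouping: pick a target task at random."""
    return [random.choice(TASK_IDS)]

def fields_grouping(tup, field_index=0):
    """Field grouping: the same field value always routes to the same task."""
    return [hash(tup[field_index]) % NUM_TASKS]

def global_grouping(tup):
    """Global grouping: every tuple goes to the task with the lowest ID."""
    return [min(TASK_IDS)]

def all_grouping(tup):
    """All grouping: broadcast a copy of the tuple to every task."""
    return TASK_IDS

print(fields_grouping(("hello", 1)) == fields_grouping(("hello", 2)))  # True
print(global_grouping(("x",)))  # [0]
print(all_grouping(("x",)))     # [0, 1, 2, 3]
```

The fields-grouping check shows why word counting works: every tuple carrying "hello" hashes to the same task, so that task sees the complete count for the word.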

