Apache Storm reads a raw stream of real-time data at one end, passes it through a series of small processing units, and outputs the processed, useful information at the other end.
This article describes the core concepts of Apache Storm.
Now let's take a closer look at the components of Apache Storm.
Tuple
The tuple is the main data structure in Storm. It is a list of ordered elements. By default, a tuple supports all data types. Typically, it is modeled as a set of comma-separated values and passed to the Storm cluster.
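As an illustration (plain Python, not the Storm API), a tuple can be thought of as an ordered list of values paired with a list of field names, so a downstream component can look values up by field:

```python
# Illustrative only (plain Python, not the Storm API): a Storm-style tuple
# modeled as a list of field names plus a matching list of values.
fields = ["user", "tweet"]
values = ["alice", "storm is fun"]

# Look up a value by field name, the way a bolt reads an input tuple.
tweet = values[fields.index("tweet")]
print(tweet)  # -> storm is fun
```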
Stream
A stream is an unbounded sequence of tuples.
Spout
A spout is the source of a stream. Typically, Storm accepts input data from a raw data source (such as the Twitter Streaming API, an Apache Kafka queue, a Kestrel queue, and so on); otherwise, you can write a spout to read data from your own data source. ISpout is the core interface for implementing spouts; some specific implementations are IRichSpout, BaseRichSpout, KafkaSpout, and so on.
Bolt
A bolt is a logical processing unit. Spouts pass data to bolts, and bolts process it to produce a new output stream. Bolts can perform actions such as filtering, aggregating, joining, and interacting with data sources and databases. A bolt receives data and can emit it to one or more other bolts. IBolt is the core interface for implementing bolts; common interfaces include IRichBolt, IBasicBolt, and so on.
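A minimal sketch of the bolt idea in plain Python (this simulates the concept only; it does not use the real Storm IBolt API): a split "bolt" receives one tuple and emits one new tuple per word.

```python
def split_bolt(tup):
    """Simulated bolt: receive a (user, tweet) tuple and
    emit one (word,) tuple per word in the tweet."""
    user, tweet = tup
    return [(word,) for word in tweet.split()]

emitted = split_bolt(("alice", "hello storm hello"))
print(emitted)  # -> [('hello',), ('storm',), ('hello',)]
```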
Let's look at a real-time example, "Twitter analytics", to see how it can be modeled in Apache Storm.
The input for "Twitter analytics" comes from the Twitter Streaming API. A spout reads users' tweets through the Twitter Streaming API and emits them as a stream of tuples. A single tuple from the spout contains a Twitter user name and a single tweet as comma-separated values. That stream of tuples is then forwarded to a bolt, which splits each tweet into individual words, computes the word counts, and saves the results to the configured data source. We can then get the results simply by querying the data source.
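The whole "Twitter analytics" flow can be sketched end to end in plain Python (a simulation of the topology's logic, not Storm code; the sample tweets are invented for illustration): a spout yields (user, tweet) tuples, a split bolt breaks tweets into words, and a count bolt keeps running totals.

```python
from collections import Counter

def tweet_spout():
    """Simulated spout: in a real topology this would read the
    Twitter Streaming API; here we yield two hard-coded tweets."""
    yield ("alice", "storm is fast")
    yield ("bob", "storm is fun")

def split_bolt(user, tweet):
    """Simulated bolt: emit one word per output tuple."""
    for word in tweet.split():
        yield word

counts = Counter()  # simulated count-bolt state
for user, tweet in tweet_spout():
    for word in split_bolt(user, tweet):
        counts[word] += 1

print(counts["storm"])  # -> 2
```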
Topology
Spouts and bolts are connected together to form a topology. Real-time application logic is specified in a Storm topology. Simply put, a topology is a directed graph in which the vertices are computations and the edges are data streams.
A simple topology starts with a spout, which emits data to one or more bolts. A bolt represents a node in the topology with the smallest processing logic, and the output of one bolt can be fed into another bolt as input.
Storm keeps the topology running until you terminate it. The main job of Apache Storm is to run topologies, and it can run any number of topologies at a given time.
Task
Now you have a basic idea of spouts and bolts. They are the smallest logical units of a topology, and a topology is built from a single spout and an array of bolts. They should be executed in the proper order for the topology to run successfully. Each spout or bolt executed by Storm is called a "task". In simple terms, a task is the execution of a spout or a bolt. At any given time, each spout and bolt can have multiple instances running in multiple separate threads.
Worker process
A topology runs in a distributed manner on multiple worker nodes. Storm spreads the tasks evenly across all the worker nodes. A worker node's role is to listen for jobs and to start or stop worker processes whenever a new job arrives.
Stream grouping
Streams flow from spouts to bolts, or from one bolt to another. Stream grouping controls how tuples are routed through the topology and helps us understand the flow of tuples in it. There are four built-in groupings, described below.
Shuffle grouping
In shuffle grouping, an equal number of tuples are distributed randomly across all the workers executing the bolt.
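A sketch of shuffle grouping in plain Python (a simulation, not Storm's implementation): each tuple is routed to a randomly chosen bolt task, so over many tuples the load evens out.

```python
import random

NUM_TASKS = 4

def shuffle_grouping(tuples, num_tasks, rng):
    """Route each tuple to a randomly chosen task, as shuffle grouping does."""
    routed = {task: [] for task in range(num_tasks)}
    for tup in tuples:
        routed[rng.randrange(num_tasks)].append(tup)
    return routed

rng = random.Random(42)  # seeded so the run is repeatable
routed = shuffle_grouping([("w%d" % i,) for i in range(1000)], NUM_TASKS, rng)

# With 1000 tuples over 4 tasks, each task receives roughly 250.
print(sorted(len(batch) for batch in routed.values()))
```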
Fields grouping
In fields grouping, the stream is partitioned by the fields specified in the grouping: tuples with the same value in those fields are always sent to the same task executing the bolt. For example, if the stream is grouped by the field "word", tuples with the same string "Hello" will always move to the same worker.
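A sketch of fields grouping in plain Python (a simulation, assuming routing by a stable hash of the grouping field): equal field values always map to the same task.

```python
import zlib

NUM_TASKS = 4

def fields_grouping(word, num_tasks):
    """Route by a stable hash of the grouping field, so tuples
    with equal field values always go to the same task."""
    return zlib.crc32(word.encode()) % num_tasks

a = fields_grouping("Hello", NUM_TASKS)
b = fields_grouping("Hello", NUM_TASKS)
print(a == b)  # -> True: every "Hello" tuple lands on the same task
```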
Global grouping
In global grouping, the entire stream is forwarded to a single bolt instance. This grouping sends the tuples generated by all instances of the source to one target instance (specifically, the task with the lowest ID).
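Global grouping, sketched in plain Python (a simulation): regardless of which task emits a tuple, it is always delivered to the single target task with the lowest ID.

```python
def global_grouping(task_ids):
    """All tuples go to the one task with the lowest ID."""
    return min(task_ids)

target = global_grouping([7, 3, 5])
print(target)  # -> 3
```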
All grouping
All grouping sends a single copy of each tuple to every instance of the receiving bolt. This kind of grouping is used to send signals to bolts, and it is useful for join operations.
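All grouping, sketched in plain Python (a simulation): every receiving task gets its own copy of each tuple, which is how a signal can be broadcast to all instances of a bolt.

```python
def all_grouping(tup, task_ids):
    """Broadcast one copy of the tuple to every receiving task."""
    return {task: tup for task in task_ids}

copies = all_grouping(("refresh-config",), [0, 1, 2])
print(len(copies))  # -> 3
```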