Storm is a real-time, distributed, and highly fault-tolerant computing system. Like Hadoop, Storm can handle large volumes of data, but Storm processes it in real time with high reliability, meaning that every piece of information gets processed. Storm is also fault tolerant and distributed, which allows it to scale across machines for large data-processing workloads.

Similarities and differences between Storm and Hadoop:
1. Once a Storm service is started, it runs continuously and will not stop unless someone deliberately shuts it down.
2. Real-time: Storm has low latency and keeps data in memory, while Hadoop uses disk as the intermediate storage medium.
3. Storm's latency is low because data stays in memory, is passed directly over the network, and is computed in memory, eliminating batch-processing wait time.
4. Storm's throughput is lower than Hadoop's, so it is not suitable for batch processing.

A Storm cluster consists primarily of a master node and a group of worker nodes, which are coordinated through ZooKeeper.
Storm system architecture:
• Master node: The master node runs a background daemon called Nimbus, which distributes code across the cluster, assigns tasks to nodes, and monitors for failures. It is very similar to Hadoop's JobTracker.
• Working node: Each worker node also runs a daemon, the Supervisor, which listens for work assigned to its machine and starts worker processes as required. Each worker node executes a subset of a topology. Coordination between Nimbus and the Supervisors is handled through a ZooKeeper system or cluster.
• Zookeeper: ZooKeeper is the service that performs the coordination between the Supervisors and Nimbus. The application's real-time logic, meanwhile, is encapsulated in a Storm "topology". A topology is a graph of spouts (data sources) and bolts (data operations) connected by stream groupings. The terms that have appeared here are explained in more depth below.
• Spout: In short, a spout reads data from a source and emits it into the topology. Spouts come in two kinds, reliable and unreliable: when Storm detects that a tuple (a list of data items) failed to be processed, a reliable spout will re-emit it, while an unreliable spout does not track whether processing succeeded and emits each tuple only once. The most important method in a spout is nextTuple(), which emits a new tuple into the topology; if there is no new tuple to emit, it simply returns.
• Bolt: All processing in a topology is done by bolts. A bolt can do anything: filtering, aggregation, accessing files/databases, and so on. A bolt receives data from spouts, processes it, and may emit tuples to another bolt when more complex stream processing is needed. The most important method in a bolt is execute(), which receives a new tuple as its parameter. Whether in a spout or a bolt, if tuples are emitted into multiple streams, those streams can be declared with declareStream().
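A matching bolt sketch is shown below, again assuming a recent Storm release. The SplitSentenceBolt name, the "longWords" stream, and the length-based routing rule are illustrative only; the point is execute() receiving a tuple and declareStream() declaring a second output stream.

```java
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

// Hypothetical bolt that splits sentences into words and routes
// long words to a separate, explicitly declared stream.
public class SplitSentenceBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map<String, Object> conf, TopologyContext context,
                        OutputCollector collector) {
        this.collector = collector;
    }

    // execute() is called with each new tuple as its parameter.
    @Override
    public void execute(Tuple tuple) {
        String sentence = tuple.getStringByField("sentence");
        for (String word : sentence.split(" ")) {
            if (word.length() > 5) {
                collector.emit("longWords", tuple, new Values(word));
            } else {
                collector.emit(tuple, new Values(word)); // default stream
            }
        }
        collector.ack(tuple); // acknowledge so a reliable spout does not re-emit
    }

    // Multiple output streams are declared with declareStream().
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));                    // default stream
        declarer.declareStream("longWords", new Fields("word")); // extra stream
    }
}
```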
• Topology: the package of computational logic; a graph consisting of spouts and bolts, in which the spouts and bolts are connected by stream groupings.
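As a rough illustration of how such a graph is assembled, the sketch below wires the hypothetical spout and bolt from above together with a TopologyBuilder and a shuffle stream grouping, assuming a recent Storm release and an in-process local test run.

```java
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.topology.TopologyBuilder;

// Hypothetical topology wiring the spout and bolt sketched above.
public class SentenceTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // The spout feeds data into the topology.
        builder.setSpout("sentences", new RandomSentenceSpout(), 1);

        // The bolt consumes the spout's default stream; shuffleGrouping is
        // one kind of stream grouping (tuples are distributed randomly).
        builder.setBolt("splitter", new SplitSentenceBolt(), 2)
               .shuffleGrouping("sentences");

        Config conf = new Config();
        conf.setDebug(true);

        // Run in-process for local testing, then shut down.
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("sentence-topology", conf, builder.createTopology());
        Thread.sleep(10000);
        cluster.shutdown();
    }
}
```

shuffleGrouping is only one of the available stream groupings; others, such as fieldsGrouping and allGrouping, control differently how tuples are routed between spouts and bolts.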