This article is a translation of the official documentation: http://storm.apache.org/documentation/Concepts.html.
Topology. A topology is roughly analogous to a MapReduce job. One important difference is that a MapReduce job eventually finishes, whereas a topology runs forever (until it is killed). Under the hood, a topology is a Thrift struct, so topologies can be defined in any language. In Java, the TopologyBuilder helper class is provided to assemble topologies.
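As a sketch of assembling and submitting a topology with TopologyBuilder (the spout and bolt class names SentenceSpout and SplitBolt are hypothetical placeholders; TopologyBuilder, setSpout, setBolt, and shuffleGrouping are the Storm API, assuming the org.apache.storm package names of Storm 1.x/2.x):

```java
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.topology.TopologyBuilder;

public class WordCountTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // A spout named "sentences" with 2 executors feeds a bolt named "split".
        builder.setSpout("sentences", new SentenceSpout(), 2);
        builder.setBolt("split", new SplitBolt(), 4)
               .shuffleGrouping("sentences"); // tuples are distributed randomly among tasks
        Config conf = new Config();
        // LocalCluster runs the topology in-process, useful for testing.
        try (LocalCluster cluster = new LocalCluster()) {
            cluster.submitTopology("word-count", conf, builder.createTopology());
            Thread.sleep(10_000); // let it run briefly before shutting down
        }
    }
}
```

This block requires the Storm dependency on the classpath, so it is a compile-level sketch rather than a standalone program.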
Stream. A stream is an unbounded sequence of tuples. When a stream is declared, it is given a schema that names each field of its tuples. By default, tuple fields can hold any of Java's primitive types, and you can register your own serializers for other types. A tuple is implemented in Clojure/Java as a named list whose element types are dynamic; Storm only needs to know how to serialize each value and how to deserialize it back.
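A sketch of how a stream's schema is declared and how a matching tuple is emitted (declareOutputFields, Fields, and Values are the Storm API; this fragment lives inside a spout or bolt class and is not standalone):

```java
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

// Declare the schema: every tuple on this component's default stream
// has two named fields, "word" and "count".
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("word", "count"));
}

// Elsewhere, emit a tuple matching that schema; field types are dynamic,
// so the second position can hold any serializable object.
// collector.emit(new Values("storm", 1));
```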
A tuple's fields can hold objects of any type. By default they are serialized with Kryo, a flexible and fast serialization library that supports primitive types, ArrayList, HashMap, HashSet, and Clojure collections out of the box. Why does a tuple support dynamic types? Hadoop requires statically typed keys and values, which pushes a large amount of type boilerplate onto the user; the API becomes huge and hard to use, and the only payoff is type safety, which is not worth the cost. Dynamic types work better here. Furthermore, static typing for tuples is not even a coherent strategy: a bolt may subscribe to multiple upstream streams, and although some reflection-based magic could recover the concrete types, it would never be as convenient as dynamic typing. Finally, dynamic types let Storm be used naturally from dynamic languages such as Clojure.
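When a tuple field holds a type Kryo does not know out of the box, the type (and optionally a custom Kryo serializer) can be registered on the topology's Config; registerSerialization is the Storm API, while MyEvent and MyEventSerializer are hypothetical placeholder classes:

```java
import org.apache.storm.Config;

Config conf = new Config();
// Register a custom type so Kryo can (de)serialize it when tuples
// cross worker boundaries; uses Kryo's default FieldSerializer.
conf.registerSerialization(MyEvent.class);
// Or pair the type with a hand-written Kryo serializer.
conf.registerSerialization(MyEvent.class, MyEventSerializer.class);
```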
Spout. A spout is a source of streams. Typically, a spout reads tuples from an external source and emits them into the topology. Spouts can be reliable or unreliable: a reliable spout replays tuples that failed to be fully processed, while an unreliable one does not. A spout can emit into multiple streams by declaring each one with the declareStream method of OutputFieldsDeclarer. The main method on a spout is nextTuple, which is called on the same single thread as ack and fail, so it must not block! The ack and fail methods also matter, but they are only invoked for reliable spouts.
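A minimal reliable spout sketch (BaseRichSpout, SpoutOutputCollector, and the open/nextTuple/ack/fail/declareOutputFields methods are the Storm API, assuming Storm 2.x signatures; SentenceSpout and the hard-coded sentence are illustrative):

```java
import java.util.Map;
import java.util.UUID;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class SentenceSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;

    @Override
    public void open(Map<String, Object> conf, TopologyContext ctx,
                     SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        // Emitting with a message id is what makes this spout "reliable":
        // Storm calls ack/fail with this id once the tuple tree completes or times out.
        String msgId = UUID.randomUUID().toString();
        collector.emit(new Values("the quick brown fox"), msgId);
    }

    @Override
    public void ack(Object msgId) {
        // Tuple fully processed: drop it from any pending/replay queue.
    }

    @Override
    public void fail(Object msgId) {
        // Processing failed or timed out: re-emit the tuple for msgId.
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }
}
```

An unreliable spout would simply call emit without a message id, in which case ack and fail are never invoked for its tuples.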
Bolt. Bolts hold the processing logic: they can do maps, filters, aggregations, joins, and so on. A bolt's main method is execute, which receives an input tuple; the bolt emits new tuples downstream through its OutputCollector handle. After a bolt has completely processed a tuple, it must call ack on it.
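A minimal bolt sketch that splits sentences into words (BaseRichBolt, OutputCollector, and the prepare/execute/declareOutputFields methods are the Storm API, assuming Storm 2.x signatures; SplitBolt and the "sentence"/"word" field names are illustrative):

```java
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class SplitBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map<String, Object> conf, TopologyContext ctx,
                        OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        for (String word : input.getStringByField("sentence").split(" ")) {
            // Anchoring the emit to the input tuple links the new tuple into
            // the input's tuple tree, so downstream failures reach the spout.
            collector.emit(input, new Values(word));
        }
        collector.ack(input); // tell Storm this input tuple is fully processed
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
```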
Why you should understand these Storm concepts