Storm provides a Clojure DSL for defining spouts, bolts, and topologies. Because the Clojure DSL can access everything the Java API exposes, Clojure developers can write Storm topologies without touching Java code. The DSL is defined in the backtype.storm.clojure namespace.
This section describes each part of the Clojure DSL, including:
1. Define Topologies
2. defbolt
3. defspout
4. Run topologies in local or cluster mode
5. Test Topologies
Define Topologies
To define a topology, use the topology function. It takes two arguments: a map of spout specs and a map of bolt specs. Each spec wires a component's code into the topology by specifying things such as its inputs and parallelism.
Let's look at an example in the storm-starter project:
(topology
 {"1" (spout-spec sentence-spout)
  "2" (spout-spec (sentence-spout-parameterized
                   ["the cat jumped over the door"
                    "greetings from a faraway land"])
                  :p 2)}
 {"3" (bolt-spec {"1" :shuffle "2" :shuffle}
                 split-sentence
                 :p 5)
  "4" (bolt-spec {"3" ["word"]}
                 word-count
                 :p 6)})
The spout and bolt maps map each component ID to its corresponding spec. Component IDs must be unique across both maps. As in the Java API, these IDs are used when declaring the inputs of the bolts in the topology.
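Because a component ID may appear in only one of the two maps, a quick sanity check can catch collisions before calling topology. The sketch below is a hypothetical helper, duplicate-component-ids, that is not part of backtype.storm.clojure; plain keywords stand in for real spout-spec and bolt-spec values so it runs without Storm.

```clojure
;; Hypothetical helper (not part of backtype.storm.clojure): returns the
;; component IDs that appear in both the spout map and the bolt map.
(defn duplicate-component-ids
  [spout-map bolt-map]
  ;; A set used as a function acts as a membership predicate.
  (sort (filter (set (keys spout-map)) (keys bolt-map))))

;; Keywords stand in for real spout-spec / bolt-spec values.
(def spouts {"1" :sentence-spout "2" :parameterized-spout})
(def bolts  {"3" :split-sentence "4" :word-count})

(duplicate-component-ids spouts bolts)
;; => ()
(duplicate-component-ids spouts {"2" :oops "4" :word-count})
;; => ("2")
```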
spout-spec
spout-spec takes as arguments the spout implementation (an object that implements IRichSpout) and optional keyword arguments. Currently the only option is :p, which specifies the parallelism of the spout. If :p is omitted, the spout runs as a single task.
bolt-spec
bolt-spec takes as arguments the input declaration of the bolt, the bolt implementation (an object that implements IRichBolt), and optional keyword arguments.
The input declaration is a map from stream IDs to stream groupings. A stream ID can take one of two forms:
1: [component-id stream-id]: subscribes to a specific stream of a component
2: component-id: subscribes to the default stream of a component
A stream grouping can be one of the following:
1: :shuffle: shuffle grouping; tuples are distributed randomly across the bolt's tasks
2: a vector of field names, such as ["id" "name"]: fields grouping; tuples with the same values for those fields go to the same task
3: :global: global grouping; the entire stream goes to the bolt's task with the lowest ID
4: :all: all grouping; every task of the bolt receives every tuple
5: :direct: direct grouping; the emitter decides which task of the consumer receives each tuple
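To make the fields grouping rule concrete, here is a simplified model in plain Clojure. It is not Storm's routing code: it assumes a task is chosen by hashing the values of the grouping fields and reducing the hash modulo the number of target tasks, and the names fields-grouping-task and tasks are hypothetical. What it does demonstrate is the guarantee stated above: tuples with equal values in the grouping fields always map to the same task.

```clojure
;; Simplified model of fields grouping, NOT Storm's actual routing code.
;; A tuple is modelled as a map from field name to value.
(defn fields-grouping-task
  [grouping-fields tuple-map task-ids]
  (let [values (mapv tuple-map grouping-fields)]   ; select grouping values
    ;; mod with a positive divisor is never negative, so the index is valid
    (nth task-ids (mod (hash values) (count task-ids)))))

(def tasks [7 8 9])   ; hypothetical task IDs of the consuming bolt

;; Same "word" value, different "count": both calls pick the same task.
(fields-grouping-task ["word"] {"word" "dog" "count" 1} tasks)
(fields-grouping-task ["word"] {"word" "dog" "count" 42} tasks)
```

The non-grouping field "count" does not influence the choice, which is why a word-count bolt fed by a fields grouping on "word" sees every occurrence of a given word in the same task.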
See https://github.com/nathanmarz/storm/wiki/Concepts for more information on stream groupings. The following example shows the different ways to declare inputs:
{["2" "1"] :shuffle
 "3" ["field1" "field2"]
 ["4" "2"] :global}
This input declaration subscribes to three streams: stream "1" of component "2" with a shuffle grouping, the default stream of component "3" with a fields grouping on "field1" and "field2", and stream "2" of component "4" with a global grouping.
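The two key forms can be made explicit by expanding an input declaration into [component-id stream-id grouping] triples. This is a hypothetical helper for illustration only, not something the DSL exposes, and it assumes the default stream is named "default".

```clojure
;; Hypothetical helper: expand a bolt-spec input declaration into
;; [component-id stream-id grouping] triples.
;; Assumption: the default stream ID is the string "default".
(def default-stream-id "default")

(defn expand-inputs
  [input-decl]
  (sort-by first
           (for [[k grouping] input-decl]
             (if (vector? k)
               [(first k) (second k) grouping]     ; [component stream] form
               [k default-stream-id grouping]))))  ; bare component ID form

(expand-inputs {["2" "1"] :shuffle
                "3" ["field1" "field2"]
                ["4" "2"] :global})
;; => (["2" "1" :shuffle] ["3" "default" ["field1" "field2"]] ["4" "2" :global])
```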
Like spout-spec, bolt-spec currently supports only the :p keyword argument, which specifies the parallelism of the bolt.
shell-bolt-spec
shell-bolt-spec is used to define bolts implemented in a non-JVM language. It takes as arguments the input declaration, the command-line program to run, the name of the file that implements the bolt, and an output declaration, followed by the same keyword arguments that bolt-spec accepts.
Here is an example of shell-bolt-spec:
(shell-bolt-spec {"1" :shuffle "2" ["id"]}
                 "python"
                 "mybolt.py"
                 ["outfield1" "outfield2"]
                 :p 25)
See https://github.com/nathanmarz/storm/wiki/Using-non-JVM-languages-with-Storm for more details on how non-JVM languages work with Storm.
defbolt
defbolt is used to define bolts in Clojure. Bolts must be serializable, which is why you cannot simply reify IRichBolt to implement a bolt (closures are not serializable). defbolt works around this restriction and provides a nicer syntax than implementing a Java interface directly.
At its most expressive, defbolt supports parameterized bolts and keeping state in a closure around the bolt implementation. It also provides shortcuts for defining bolts that don't need this extra functionality.
The signature of defbolt is:
(defbolt name output-declaration *option-map & impl)
Omitting the option map is equivalent to an option map of {:prepare false}.
Simple bolts
Let's start with the simplest form of defbolt. Here is a bolt that splits a tuple containing a sentence into one tuple per word:
(defbolt split-sentence ["word"] [tuple collector]
  (let [words (.split (.getString tuple 0) " ")]
    (doseq [w words]
      (emit-bolt! collector [w] :anchor tuple))
    (ack! collector tuple)))
Since the option map is omitted, this is a non-prepared bolt. The DSL only expects an implementation of the execute method of IRichBolt. The implementation takes two parameters, the tuple and the OutputCollector, followed by the body of execute. The DSL automatically type-hints the parameters, so you don't need to worry about reflection when using Java interop.
This binds split-sentence to an actual IRichBolt object, which you can use in a topology like this:
(bolt-spec {"1" :shuffle}
           split-sentence
           :p 5)
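The data transformation inside split-sentence can be tried without a running topology by pulling it out as a pure function. This is only an illustration: split-into-word-tuples is a hypothetical name, a plain string stands in for the incoming tuple, and each returned one-element vector corresponds to one emit-bolt! call.

```clojure
;; Hypothetical pure version of split-sentence's execute body.
(require '[clojure.string :as str])

(defn split-into-word-tuples
  [sentence]
  ;; Split on single spaces, as (.split s " ") does in the bolt,
  ;; and wrap each word in a vector: one output tuple per word.
  (mapv vector (str/split sentence #" ")))

(split-into-word-tuples "the cat jumped")
;; => [["the"] ["cat"] ["jumped"]]
```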
Parameterized bolts
Often you want to parameterize a bolt with additional arguments. For example, suppose you want a bolt that appends a suffix to every input string it receives, with the suffix chosen at runtime. You do this by including a :params option in the option map of defbolt, like this:
(defbolt suffix-appender ["word"] {:params [suffix]}
  [tuple collector]
  (emit-bolt! collector [(str (.getString tuple 0) suffix)] :anchor tuple)
  (ack! collector tuple))
Unlike the previous example, suffix-appender is bound to a function that returns an IRichBolt rather than to an IRichBolt object itself, because :params is specified in its option map. To use suffix-appender in a topology, you would write something like:
(bolt-spec {"1" :shuffle}
           (suffix-appender "-suffix")
           :p 10)
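The effect of :params is similar to an ordinary factory function that closes over its argument. The sketch below shows that idea in plain Clojure. It is an analogy only, since the code defbolt actually generates is different, and make-suffix-appender is a hypothetical name.

```clojure
;; Plain-Clojure analogue of {:params [suffix]}: the parameter is captured
;; when the factory is called; the returned function uses it for every input.
(defn make-suffix-appender
  [suffix]
  (fn [word] [(str word suffix)]))   ; returns one output tuple

(def append-suffix (make-suffix-appender "-suffix"))

(append-suffix "dog")
;; => ["dog-suffix"]
```

As with suffix-appender, the caller supplies the parameter once, when building the component, and not on every tuple.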
Prepared bolts
More complex bolts, such as those that perform joins and streaming aggregations, need to store state. You can do this by creating a prepared bolt, which you specify by including {:prepare true} in the option map. Consider, for example, this bolt that implements word counting:
(defbolt word-count ["word" "count"] {:prepare true}
  [conf context collector]
  (let [counts (atom {})]
    (bolt
     (execute [tuple]
       (let [word (.getString tuple 0)]
         (swap! counts (partial merge-with +) {word 1})
         (emit-bolt! collector [word (@counts word)] :anchor tuple)
         (ack! collector tuple))))))
The implementation of a prepared bolt is a function that takes the topology configuration, the TopologyContext, and the OutputCollector as input, and returns an implementation of the IBolt interface. This design lets you keep a closure around the implementations of execute and cleanup.
In this example, the word counts are stored in the closure in a map called counts. The bolt macro creates the IBolt implementation; it is a more concise way to implement the interface than reifying it, and it automatically type-hints all of the method parameters. This bolt implements the execute method, which updates the count in the map and emits the new word count.
Note that in prepared bolts the execute method takes only the tuple as input, because the OutputCollector is already in the closure. (In simple bolts, the collector is passed to execute as the second parameter.)
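The state-handling pattern of word-count can be reproduced in plain Clojure: an atom is created once, when the factory runs (playing the role of the prepare step), and the returned execute function closes over it. This is a model under that analogy, not the DSL itself; make-word-counter is a hypothetical name, and a word string stands in for the incoming tuple.

```clojure
;; Pure-Clojure model of word-count's state handling.
(defn make-word-counter
  []
  (let [counts (atom {})]            ; created once, like in prepare
    (fn execute [word]
      ;; merge-with + increments the existing count or starts it at 1
      (swap! counts (partial merge-with +) {word 1})
      [word (@counts word)])))       ; the tuple the bolt would emit

(def execute (make-word-counter))
(execute "dog")   ;; => ["dog" 1]
(execute "cat")   ;; => ["cat" 1]
(execute "dog")   ;; => ["dog" 2]
```

Each call to make-word-counter gets its own atom, just as each task of the bolt keeps its own counts.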
Output declarations
The Clojure DSL has a concise syntax for declaring the outputs of a bolt. The most general form is a map from stream ID to stream spec. For example:
{"1" ["field1" "field2"]
 "2" (direct-stream ["f1" "f2" "f3"])
 "3" ["f1"]}
The stream ID is a string, and the stream spec is either a vector of fields or a vector of fields wrapped in direct-stream. direct-stream marks the stream as a direct stream, that is, one consumed with a direct grouping.
If the bolt has only one output stream, you can declare its default stream with a vector instead of a map. For example:
["word" "count"]
This declares the bolt's output as the fields ["word" "count"] on the default stream ID.
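The vector shorthand can be read as a one-entry map keyed by the default stream ID. The hypothetical helper below spells that reading out; it is not a DSL function, and it assumes the default stream is named "default".

```clojure
;; Hypothetical helper: treat a vector output declaration as shorthand for
;; a one-entry map keyed by the default stream ID (assumed to be "default").
(defn normalize-output-decl
  [decl]
  (if (map? decl)
    decl                 ; already the general map form
    {"default" decl}))   ; vector shorthand for the default stream

(normalize-output-decl ["word" "count"])
;; => {"default" ["word" "count"]}
(normalize-output-decl {"1" ["field1" "field2"] "3" ["f1"]})
;; => {"1" ["field1" "field2"], "3" ["f1"]}
```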
Emitting, acking, and failing
Rather than using the Java methods on OutputCollector directly, the DSL provides a friendlier set of functions for working with OutputCollector:
emit-bolt!, emit-direct-bolt!, ack!, and fail!
1. emit-bolt!: takes the OutputCollector, the values to emit (a Clojure sequence), and the keyword arguments :anchor and :stream. :anchor can be a single tuple or a list of tuples, and :stream is the ID of the stream to emit to. Omitting the keyword arguments emits an unanchored tuple to the default stream.
2. emit-direct-bolt!: takes the OutputCollector, the ID of the task to send the tuple to, the values to emit, and the keyword arguments :anchor and :stream. It can only emit to streams declared as direct streams.
3. ack!: takes the OutputCollector and the tuple to ack.
4. fail!: takes the OutputCollector and the tuple to fail.
defspout
defspout is used to define spouts in Clojure. Like bolts, spouts must be serializable, so you cannot simply reify IRichSpout to implement a spout in Clojure. defspout works around this restriction and provides a nicer syntax than implementing the Java interface directly.
The signature of defspout is:
(defspout name output-declaration *option-map & impl)
If the option map is omitted, it defaults to {:prepare true}. The output declaration syntax is the same as for defbolt.
Here is an example defspout implementation from storm-starter:
(defspout sentence-spout ["sentence"]
  [conf context collector]
  (let [sentences ["a little brown dog"
                   "the man petted the dog"
                   "four score and seven years ago"
                   "an apple a day keeps the doctor away"]]
    (spout
     (nextTuple []
       (Thread/sleep 100)
       (emit-spout! collector [(rand-nth sentences)]))
     (ack [id]
       ;; You only need to define this method for reliable spouts
       ;; (such as one that reads off of a queue like Kestrel)
       ;; This is an unreliable spout, so it does nothing here
       ))))
The implementation takes the topology configuration, the TopologyContext, and the SpoutOutputCollector as input, and returns an ISpout object. Here, the nextTuple method emits a random sentence from sentences.
This spout is unreliable, so its ack and fail methods will never be called. A reliable spout adds a message ID when emitting tuples, and ack or fail is then called when the tuple is completed or fails, respectively.
emit-spout! takes the SpoutOutputCollector and the new tuple to emit, and accepts the keyword arguments :stream and :id. :stream specifies the stream to emit to, and :id specifies a message ID for the tuple (passed to the ack and fail callbacks). Omitting these arguments emits an unanchored tuple to the default stream.
There is also an emit-direct-spout! function, which emits a tuple to a direct stream and takes the ID of the target task as an additional second argument.
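A reliable spout typically remembers each message under the ID it passed as :id, so that the ack callback can forget it and the fail callback can find it again for re-emission. The sketch below models only that bookkeeping in plain Clojure. The names pending, record-emit!, handle-ack!, and handle-fail! are hypothetical, and no Storm API is called.

```clojure
;; Hypothetical bookkeeping for a reliable spout: each emitted message is
;; stored under its message ID until ack (drop it) or fail (look it up so
;; it can be re-emitted).
(def pending (atom {}))

(defn record-emit! [id sentence] (swap! pending assoc id sentence))
(defn handle-ack!  [id] (swap! pending dissoc id))
(defn handle-fail! [id] (get @pending id))   ; message to re-emit

(record-emit! 1 "a little brown dog")
(record-emit! 2 "the man petted the dog")
(handle-ack! 1)
(handle-fail! 2)
;; => "the man petted the dog"
@pending
;; => {2 "the man petted the dog"}
```

Keeping failed messages in pending until they are finally acked is one simple way to get at-least-once delivery; a queue-backed spout might instead return the message to its queue.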
Spouts can be parameterized just like bolts, in which case the symbol is bound to a function that returns an IRichSpout instead of the IRichSpout itself.
You can also declare an unprepared spout that defines only the nextTuple method.
Here is an example of an unprepared spout that emits random sentences, parameterized at runtime:
(defspout sentence-spout-parameterized ["word"] {:params [sentences] :prepare false}
  [collector]
  (Thread/sleep 500)
  (emit-spout! collector [(rand-nth sentences)]))
The following example shows how to use this spout in a spout-spec:
(spout-spec (sentence-spout-parameterized
             ["the cat jumped over the door"
              "greetings from a faraway land"])
            :p 2)
Running topologies in local mode or on a cluster
That's all there is to the Clojure DSL. To submit a topology in remote or local mode, use the StormSubmitter or LocalCluster class, just as you would from Java.
The easiest way to create a topology configuration is to use the backtype.storm.config namespace, which defines constants for all possible configuration options.
These constants have the same names as the static constants in the Config class, except with dashes instead of underscores. For example, here is a topology configuration that sets the number of workers to 15 and runs the topology in debug mode:
{TOPOLOGY-DEBUG true
 TOPOLOGY-WORKERS 15}
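The naming rule is mechanical: take the Java constant name and swap underscores for dashes. The tiny sketch below shows only that string transformation. java-config-name->clojure is a hypothetical helper; it does not look anything up in backtype.storm.config and says nothing about the values those constants hold.

```clojure
;; Sketch of the naming rule only: Java Config constant name -> Clojure
;; constant name, by replacing underscores with dashes.
(require '[clojure.string :as str])

(defn java-config-name->clojure
  [java-name]
  (str/replace java-name "_" "-"))

(java-config-name->clojure "TOPOLOGY_WORKERS")
;; => "TOPOLOGY-WORKERS"
```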