About Storm
Storm is a distributed real-time stream processing framework. It is most often used for real-time analytics, online machine learning, continuous computation, distributed RPC, ETL (BI analysis), and similar workloads. Comparable frameworks include Hadoop and Spark: Hadoop focuses on offline batch computation over massive data sets, while Spark is better suited to iterative computation. It is important to note that Storm does not process the data itself; it distributes your business program (logic) across many servers, so that pending messages are spread over those servers and processed concurrently, scaling out the program's load capacity.
Overview
In a nutshell, the Storm framework has two parts: the Storm program and the Storm cluster.
A Storm program consists of two kinds of components: a spout and one or more bolts; a topology can contain several bolts at once.
The Storm cluster likewise has two roles: Nimbus (the cluster master node) and the Supervisors (the cluster slave nodes).
Only the Storm program is described in detail here; the Storm cluster is simpler to understand, and its setup is only sketched at the end.
Storm Program
Storm's processing flow is straightforward: a spout emits tuples, and the tuples pass through one or more bolts in sequence.
Next I'll use an example to walk through writing a Storm program. To build against the Storm framework, import the Storm jar packages in advance; they can be downloaded from the Apache website.
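Alternatively, if the project is built with Maven, a dependency along the following lines pulls in the same jars. This is only a sketch: the version shown is illustrative, and the provided scope is used because the cluster supplies Storm's jars at run time.

<!-- hypothetical pom.xml fragment; pick the version matching your cluster -->
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-core</artifactId>
    <version>1.2.3</version>
    <scope>provided</scope>
</dependency>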
Business requirements are as follows:
Suppose we want to convert a lowercase product name to uppercase and append a suffix, e.g. converting iphone to IPHONE_suffix. We do this in two steps. The business process flow is as follows:
1. The spout reads the data, wraps it in a tuple, and emits it.
2. UpperBolt converts the product name to uppercase.
3. SuffixBolt appends a suffix to the uppercased product name.
4. TopoMain describes the topology's structure, creates the topology, and submits it to the cluster.
The spout's processing code is shown below. For simplicity, the spout does not pull data from an external source; it picks a product name at random from an internal array, hence the class name RandomSpout.
import java.util.Map;
import java.util.Random;

// Storm 1.x packages; releases before 1.0 use the backtype.storm.* prefix instead
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

public class RandomSpout extends BaseRichSpout {

    SpoutOutputCollector collector = null;
    String[] goods = {"iphone", "xiaomi", "meizu", "zhongxing", "huawei", "moto", "sangsung"};

    /**
     * Fetches a message and sends it to the next component; called by Storm in a loop.
     * Randomly picks a product name from goods, wraps it in a tuple, and emits it.
     */
    @Override
    public void nextTuple() {
        // randomly pick a product name
        Random random = new Random();
        String good = goods[random.nextInt(goods.length)];

        // wrap it in a tuple and emit it
        collector.emit(new Values(good));

        // sleep for a while
        Utils.sleep(2000);
    }

    // initialization; called only once, at startup
    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    /**
     * Defines the scheme (field names) of the emitted tuple.
     */
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("src_word"));
    }
}
The processing code for UpperBolt is as follows:
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

/**
 * Converts the original product name to uppercase and emits it again.
 * @author Shy
 */
public class UpperBolt extends BaseBasicBolt {

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        // get the data from the tuple -- the original product name
        String src_word = tuple.getString(0);
        // convert it to uppercase
        String upper_word = src_word.toUpperCase();
        // emit it
        collector.emit(new Values(upper_word));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("upper_word"));
    }
}
The processing code for SuffixBolt is as follows:
import java.io.FileWriter;
import java.io.IOException;
import java.util.Map;
import java.util.UUID;

import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Tuple;

/**
 * Appends a suffix to the product name and writes the result to a file.
 * @author Shy
 */
public class SuffixBolt extends BaseBasicBolt {

    FileWriter fileWriter = null;

    // called once when the bolt is initialized; opens one output file per bolt instance
    @Override
    public void prepare(Map stormConf, TopologyContext context) {
        try {
            fileWriter = new FileWriter("/home/hadoop/" + UUID.randomUUID());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        // get the data sent by the previous component out of the tuple
        String upper_word = tuple.getString(0);

        // append a suffix to the product name
        String result = upper_word + "_suffix";

        // save the result to the file
        try {
            fileWriter.append(result);
            fileWriter.append("\n");
            fileWriter.flush();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // declares the fields of the tuples this component emits -- empty, since this is the last bolt
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
    }
}
The processing code for TopoMain is as follows:
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.topology.TopologyBuilder;

/**
 * Describes the structure of the topology, creates it, and submits it to the cluster.
 * @author Shy
 */
public class TopoMain {

    public static void main(String[] args) throws Exception {

        TopologyBuilder topologyBuilder = new TopologyBuilder();

        // set the message source component to RandomSpout
        topologyBuilder.setSpout("randomspout", new RandomSpout(), 4);

        // set the processing component UpperBolt and subscribe it to RandomSpout's messages
        topologyBuilder.setBolt("upperbolt", new UpperBolt(), 4).shuffleGrouping("randomspout");

        // set the processing component SuffixBolt and subscribe it to UpperBolt's messages
        topologyBuilder.setBolt("suffixbolt", new SuffixBolt(), 4).shuffleGrouping("upperbolt");

        // create the topology
        StormTopology topo = topologyBuilder.createTopology();

        // create a Storm configuration object
        Config conf = new Config();
        // number of worker processes the cluster starts for this topology
        conf.setNumWorkers(4);
        conf.setDebug(true);
        conf.setNumAckers(0);

        // submit the topology to the Storm cluster
        StormSubmitter.submitTopology("demotopo", conf, topo);
    }
}
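During development the topology can also be tried in local mode without any cluster. The snippet below is a minimal sketch, assuming Storm 1.x (org.apache.storm.LocalCluster; older releases use backtype.storm.LocalCluster) and reusing the conf and topo objects built above; it replaces the StormSubmitter call:

// run the same topology in-process instead of submitting it to a cluster
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("demotopo", conf, topo);

// let it run for a while, then tear it down
Utils.sleep(30000);
cluster.killTopology("demotopo");
cluster.shutdown();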
Once the Storm program is written, it can be packaged into a jar and submitted to the Storm cluster to run.
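Submission uses the storm jar command. As an example, assuming the package is named demotopo.jar (a placeholder) and TopoMain is in the default package:

bin/storm jar demotopo.jar TopoMain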
Storm Cluster
The cluster setup consists of the following steps:
1. Install a ZooKeeper cluster first.
2. Upload the Storm installation package to the servers.
3. Unpack the installation package.
4. Modify the configuration file (see the sample storm.yaml after this list).
5. Start the cluster.
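For step 4, the file to edit is conf/storm.yaml. The following is a minimal sketch; the hostnames, directory, and ports are placeholders, and nimbus.seeds applies to Storm 1.x (releases before 1.0 use nimbus.host with a single hostname instead):

storm.zookeeper.servers:
  - "zk01"
  - "zk02"
  - "zk03"
nimbus.seeds: ["nimbus01"]
storm.local.dir: "/var/storm"
supervisor.slots.ports:
  - 6700
  - 6701
  - 6702
  - 6703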
To start the Storm cluster:
1. Start Nimbus first:
bin/storm nimbus 1>/dev/null 2>&1 &
2. Start the web UI process (served on port 8080 by default):
bin/storm ui 1>/dev/null 2>&1 &
3. Start the supervisor process on each node:
bin/storm supervisor 1>/dev/null 2>&1 &