Chapter 1 of Storm development using non-JVM languages

Sometimes you may want to develop a Storm project in a language that is not based on the JVM, either because you prefer another language or because you want to use a library written in one.

Storm is implemented in Java, and all the spouts and bolts in this book are written in Java. Can spouts and bolts be written in languages such as Python, Ruby, or JavaScript? The answer is yes! The multi-language protocol makes this possible.

The multi-language protocol is a special protocol implemented by Storm. It uses standard input and standard output as the communication channel with the process that does the work of a spout or bolt. Messages travel through the channel as JSON or as lines of plain text.
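To make the idea of a standard input/output channel concrete, here is a minimal sketch in Python, one of the languages the protocol is said to support. It is not Storm's implementation: the child program, the `seen_by_child` key, and the `round_trip` helper are all hypothetical, and the message format is simplified to a single JSON line.

```python
import json
import subprocess
import sys

# Hypothetical child program: read one JSON line from stdin, tag it,
# and write it back to stdout. A real Storm component would follow the
# full multi-language protocol instead.
CHILD_SOURCE = r"""
import json, sys
message = json.loads(sys.stdin.readline())
message["seen_by_child"] = True
sys.stdout.write(json.dumps(message) + "\n")
sys.stdout.flush()
"""


def round_trip(message):
    """Start the child, send one JSON message on stdin, parse its stdout."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        input=json.dumps(message) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


reply = round_trip({"number": 7})
```

The parent never calls into the child's code. Everything it knows about the child arrives as text on a pipe, which is why the child can be written in any language.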

Let's look at a spout and a bolt developed in a non-JVM language. In this example, the spout generates the numbers from 1 to 10,000 and the bolt filters for prime numbers. Both are implemented in PHP.

NOTE: In this example, we check for prime numbers in a very naive way. There are better and more complex methods, but they are beyond the scope of this example.
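For reference, a naive primality check of the kind the note describes could look like the following Python sketch. It uses plain trial division; the function name `is_prime` and the choice to accept strings are assumptions made for illustration, not the book's code.

```python
def is_prime(number):
    """Naive primality test by trial division (illustrative only)."""
    n = int(number)  # tuple values may arrive as strings
    if n < 2:
        return False
    divisor = 2
    # Only divisors up to the square root need to be tried.
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


primes_up_to_20 = [n for n in range(1, 21) if is_prime(n)]
```

This is slow for large inputs, but it is more than adequate for the range 1 to 10,000 used in the example.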

There is a PHP DSL implemented specifically for Storm (Translator's note: DSL stands for domain-specific language). In this example, we will show our own implementation. First, define the topology.

    ...
    TopologyBuilder builder = new TopologyBuilder();
    // The spout emits the numbers in the given range.
    builder.setSpout("numbers-generator", new NumberGeneratorSpout(1, 10000));
    // The bolt receives those numbers and passes on only the primes.
    builder.setBolt("prime-numbers-filter", new PrimeNumbersFilterBolt())
           .shuffleGrouping("numbers-generator");
    StormTopology topology = builder.createTopology();
    ...

NOTE: There is a way to define the topology in a non-JVM language. Since a Storm topology is a Thrift structure and Nimbus is a Thrift daemon, you can create and submit a topology in any language you want. However, this is beyond the scope of this book.

There is nothing new here. Let's take a look at the implementation of NumberGeneratorSpout.

    public class NumberGeneratorSpout extends ShellSpout implements IRichSpout {
        public NumberGeneratorSpout(Integer from, Integer to) {
            // Command line that Storm runs to start the PHP script;
            // the range is passed as script arguments.
            super("php", "-f", "NumberGeneratorSpout.php", from.toString(), to.toString());
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // Each emitted tuple has a single field.
            declarer.declare(new Fields("number"));
        }

        public Map getComponentConfiguration() {
            return null;
        }
    }

You may have noticed that this spout extends ShellSpout. This is a special class provided by Storm that helps you run and control spouts written in other languages. In this case, it tells Storm how to execute your PHP script.

The NumberGeneratorSpout PHP script emits tuples to standard output and reads ack or fail signals from standard input.

Before implementing the NumberGeneratorSpout.php script, let's look at how the multi-language protocol works.

The spout generates sequential numbers, counting from the from parameter up to the to parameter passed to the constructor.

Next, let's take a look at PrimeNumbersFilterBolt. This class implements the shell mentioned earlier: it tells Storm how to execute your PHP script. Storm provides a special ShellBolt class for this purpose, so the only thing you have to do is indicate how to run the script and declare the fields it emits.

    public class PrimeNumbersFilterBolt extends ShellBolt implements IRichBolt {
        public PrimeNumbersFilterBolt() {
            // Command line that Storm runs to start the PHP script.
            super("php", "-f", "PrimeNumbersFilterBolt.php");
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // Each emitted tuple has a single field.
            declarer.declare(new Fields("number"));
        }
    }

The constructor only tells Storm how to run the PHP script. It is equivalent to the following command.

    php -f PrimeNumbersFilterBolt.php

The PrimeNumbersFilterBolt.php script reads tuples from standard input, processes them, and writes emit, ack, or fail messages to standard output. Before going over this script, let's look in more detail at how the multi-language protocol works.

  1. Initiate a handshake
  2. Start a loop
  3. Read/write tuples

NOTE: There is a special way to log from your script using Storm's built-in logging mechanism, so you do not need to implement your own logging system.

Next, let's take a look at the details of each step and how to implement it using PHP.

Initiate handshake

To control the process (to start and stop it), Storm needs to know the process ID (PID) of the script it is executing. According to the multi-language protocol, the first thing that happens when your process starts is that Storm sends a piece of JSON data to standard input (Translator's note: standard input and output in this chapter are described from the point of view of the non-JVM language, so here it means the standard input of the PHP script). The JSON contains the Storm configuration, the topology context, and a PID directory. It looks like the following:

    {
        "conf": {
            "topology.message.timeout.secs": 3,
            // etc
        },
        "context": {
            "task->component": {
                "1": "example-spout",
                "2": "__acker",
                "3": "example-bolt"
            },
            "taskid": 3
        },
        "pidDir": "..."
    }
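As a small illustration of how a script might read this setup message, the Python sketch below parses a copy of it and looks up which component the current task belongs to. The `describe_setup` helper and its result keys are hypothetical, and `/tmp/example` is only a sample value for `pidDir`.

```python
import json

# Sample setup message modeled on the structure shown above.
# "/tmp/example" is an illustrative pidDir, not a value Storm guarantees.
SETUP_MESSAGE = """
{
    "conf": {"topology.message.timeout.secs": 3},
    "context": {
        "task->component": {
            "1": "example-spout",
            "2": "__acker",
            "3": "example-bolt"
        },
        "taskid": 3
    },
    "pidDir": "/tmp/example"
}
"""


def describe_setup(raw):
    """Pull out the pieces of the setup message a script usually needs."""
    setup = json.loads(raw)
    context = setup["context"]
    task_id = context["taskid"]
    return {
        # JSON object keys are strings, so the numeric task ID is converted.
        "component": context["task->component"][str(task_id)],
        "task_id": task_id,
        "pid_dir": setup["pidDir"],
        "timeout_secs": setup["conf"]["topology.message.timeout.secs"],
    }


summary = describe_setup(SETUP_MESSAGE)
```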

The script process must create an empty file in the directory specified by pidDir, named after its own process ID, and then write the PID to standard output in JSON format.

{"pid": 1234}

For example, if you receive /tmp/example as the pidDir and the PID of your script is 123, you should create an empty file at /tmp/example/123 and print the lines {"pid": 123}\n and end\n to standard output (Translator's note: the original text has a bare n here, which is presumably a typo for \n). This way, Storm can keep track of the PID and kill the script process when it shuts down. The following is the PHP implementation:

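    // Read the setup message that Storm sends first.
    $config = json_decode(read_msg(), true);
    $heartbeatdir = $config['pidDir'];

    // Create an empty file named after our PID in the PID directory.
    $pid = getmypid();
    fclose(fopen("$heartbeatdir/$pid", "w"));

    // Report the PID back to Storm.
    storm_send(["pid" => $pid]);
    flush();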
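The same handshake can be sketched in Python. This is an illustration under assumptions rather than a reference implementation: the `handshake` function is hypothetical, it takes the already parsed setup message, and it writes to whatever stream it is given so that it can be exercised without Storm.

```python
import io
import json
import os
import tempfile


def handshake(setup, stdout):
    """Create the empty PID file and report the PID, ending with 'end'."""
    pid = os.getpid()
    pid_file = os.path.join(setup["pidDir"], str(pid))
    with open(pid_file, "w"):
        pass  # the file only needs to exist; it stays empty
    stdout.write(json.dumps({"pid": pid}) + "\n")
    stdout.write("end\n")
    stdout.flush()  # make sure the reply is not left sitting in a buffer
    return pid_file


# Exercise the handshake against a temporary directory and an in-memory stream.
with tempfile.TemporaryDirectory() as pid_dir:
    captured = io.StringIO()
    pid_file = handshake({"pidDir": pid_dir}, captured)
    pid_file_created = os.path.isfile(pid_file)
    pid_file_size = os.path.getsize(pid_file)
```

In a real component the stream would be `sys.stdout` and the setup dictionary would come from the first message read on standard input.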

The handshake uses a read_msg function to handle messages read from standard input. According to the multi-language protocol, a message can be a single line or multiple lines of JSON text. A message is complete when Storm sends the line end\n.

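    // Read lines from standard input until the "end" line and
    // return everything before it as one message.
    function read_msg() {
        $msg = "";
        while (true) {
            $l = fgets(STDIN);
            if ($l === false) {
                // Standard input was closed, so stop instead of looping forever.
                exit(0);
            }
            $line = substr($l, 0, -1);   // strip the trailing newline
            if ($line == "end") {
                break;
            }
            $msg = "$msg$line\n";
        }
        return substr($msg, 0, -1);
    }

    // Send a message as JSON followed by the "end" line.
    function storm_send($json) {
        write_line(json_encode($json));
        write_line("end");
    }

    function write_line($line) {
        echo("$line\n");
    }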
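The framing rule (any number of lines, closed by a line containing only end) can be tried out in isolation. The Python sketch below mirrors the PHP helpers; the names `read_msg` and `send_msg` and the use of in-memory streams are choices made for this illustration.

```python
import io
import json


def read_msg(stream):
    """Collect lines until the 'end' line and return them as one string."""
    lines = []
    while True:
        line = stream.readline()
        if line == "":
            raise EOFError("stream closed before the 'end' line")
        line = line.rstrip("\n")
        if line == "end":
            return "\n".join(lines)
        lines.append(line)


def send_msg(message, stream):
    """Write a message as JSON followed by the 'end' line, then flush."""
    stream.write(json.dumps(message) + "\n")
    stream.write("end\n")
    stream.flush()


# Two framed messages; the first one spans two lines.
incoming = io.StringIO(
    '{"command":\n "next"}\nend\n'
    '{"command": "ack", "id": 1}\nend\n'
)
first = json.loads(read_msg(incoming))
second = json.loads(read_msg(incoming))

outgoing = io.StringIO()
send_msg({"command": "sync"}, outgoing)
```

Because the end line is the only delimiter, the reader does not need to know in advance how many lines a message has.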

NOTE: The flush() call is very important. Output may be buffered and written out only once a certain amount has accumulated. This means your script could hang forever waiting for input from Storm while Storm is waiting for output from your script. It is therefore important to flush the buffer as soon as your script writes anything.

Start loop and read/write tuples

This is the most important step of the whole job. How you implement it depends on whether you are developing a spout or a bolt.

For a spout, you should start emitting tuples. For a bolt, you loop over the incoming tuples: read each one, process it, emit results, and ack or fail it.

Let's take a look at the spout.

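    $from = intval($argv[1]);
    $to = intval($argv[2]);

    while (true) {
        // Wait for the next command from Storm.
        $msg = read_msg();
        $cmd = json_decode($msg, true);
        if ($cmd['command'] == 'next') {
            // Inclusive upper bound, so that $to itself is also emitted.
            if ($from <= $to) {
                storm_emit(array("$from"));
                // Storm replies to an emit with the IDs of the receiving tasks.
                $task_ids = read_msg();
                $from++;
            } else {
                // Nothing left to emit.
                sleep(1);
            }
        }
        // Tell Storm this command has been handled.
        storm_sync();
    }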

Get the from and to parameters from the command line and start iterating. Each time you receive a next message from Storm, it means you are ready to emit the next tuple.

Once you have sent all the numbers and there are no more tuples to send, just sleep for a while.

To make sure the script is ready for the next tuple, Storm waits for a sync\n line before sending the next command. To read a command, call read_msg() and parse the JSON.
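The next/emit/sync exchange can be simulated without Storm. The Python sketch below feeds a scripted sequence of commands to a hypothetical `run_spout` function and records what it writes. The inclusive upper bound, the stop on end of input, and the omission of the sleep are simplifications assumed for this illustration.

```python
import io
import json


def read_msg(stream):
    """Collect lines until the 'end' line and return them as one string."""
    lines = []
    while True:
        line = stream.readline()
        if line == "":
            raise EOFError("stream closed before the 'end' line")
        line = line.rstrip("\n")
        if line == "end":
            return "\n".join(lines)
        lines.append(line)


def send_msg(message, stream):
    stream.write(json.dumps(message) + "\nend\n")
    stream.flush()


def run_spout(start, stop, stdin, stdout):
    """Emit one number per 'next' command and answer every command with sync."""
    current = start
    while True:
        try:
            command = json.loads(read_msg(stdin))
        except EOFError:
            return  # input closed: nothing more to do in this simulation
        if command.get("command") == "next" and current <= stop:
            send_msg({"command": "emit", "tuple": [str(current)]}, stdout)
            read_msg(stdin)  # reply to the emit: IDs of the receiving tasks
            current += 1
        send_msg({"command": "sync"}, stdout)


# Scripted input: three 'next' commands, with a task-ID reply after each emit.
script = (
    '{"command": "next"}\nend\n[4]\nend\n'
    '{"command": "next"}\nend\n[4]\nend\n'
    '{"command": "next"}\nend\n'
)
stdout = io.StringIO()
run_spout(1, 2, io.StringIO(script), stdout)
sent = [json.loads(chunk) for chunk in stdout.getvalue().split("\nend\n") if chunk]
```

The third next command produces only a sync, because the range 1 to 2 is already exhausted by then.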

For bolts, there are a few differences.

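    while (true) {
        $msg = read_msg();
        // Decode big integers as strings so they do not lose precision.
        $tuple = json_decode($msg, true, 512, JSON_BIGINT_AS_STRING);
        // Only messages that carry an id are tuples to process.
        if (!empty($tuple["id"])) {
            if (isPrime($tuple["tuple"][0])) {
                storm_emit(array($tuple["tuple"][0]));
            }
            // Ack the tuple whether or not it was prime.
            storm_ack($tuple["id"]);
        }
    }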

Loop, reading tuples from standard input. Parse each JSON message and determine whether it is a tuple. If it is, check whether the number it carries is prime. If the number is prime, emit it again; otherwise, ignore it. In either case, ack the tuple.
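The bolt loop can be simulated in the same way as a spout. In the Python sketch below, `run_bolt`, `is_prime`, and the sample tuple IDs are hypothetical; the scripted input includes the task-ID reply that follows an emit, which the loop skips because it is not a tuple.

```python
import io
import json


def read_msg(stream):
    """Collect lines until the 'end' line and return them as one string."""
    lines = []
    while True:
        line = stream.readline()
        if line == "":
            raise EOFError("stream closed before the 'end' line")
        line = line.rstrip("\n")
        if line == "end":
            return "\n".join(lines)
        lines.append(line)


def send_msg(message, stream):
    stream.write(json.dumps(message) + "\nend\n")
    stream.flush()


def is_prime(number):
    """Naive trial division, good enough for this illustration."""
    n = int(number)
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def run_bolt(stdin, stdout):
    """Re-emit prime values and ack every tuple that arrives."""
    while True:
        try:
            message = json.loads(read_msg(stdin))
        except EOFError:
            return  # input closed: end of the simulation
        # Only JSON objects with an id are tuples; task-ID replies are lists.
        if isinstance(message, dict) and message.get("id"):
            value = message["tuple"][0]
            if is_prime(value):
                send_msg({"command": "emit", "tuple": [value]}, stdout)
            send_msg({"command": "ack", "id": message["id"]}, stdout)


script = (
    '{"id": "a1", "tuple": ["7"]}\nend\n'
    '[5]\nend\n'
    '{"id": "a2", "tuple": ["8"]}\nend\n'
)
stdout = io.StringIO()
run_bolt(io.StringIO(script), stdout)
sent = [json.loads(chunk) for chunk in stdout.getvalue().split("\nend\n") if chunk]
```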

NOTE: The JSON_BIGINT_AS_STRING option is passed to json_decode to work around a data conversion problem between Java and PHP. Some very large numbers sent by Java lose precision when decoded in PHP, which can cause problems. To avoid this, PHP is told to decode large numbers as strings, and numbers are written without double quotes when they are output in JSON messages. This option requires PHP 5.4.0 or later.
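The precision problem is easy to reproduce in any language that stores a large integer in a double-precision float. The Python sketch below is only an analogy for the PHP behavior described in the note: Python's own integers are exact, so the loss is forced with `float`, and `parse_int=str` plays the role that JSON_BIGINT_AS_STRING plays in PHP. The sample ID is made up.

```python
import json

# A made-up ID that is too large to be represented exactly as a double.
raw = '{"id": 12345678901234567890}'

# Keep integers as strings while decoding, so no digits can be lost.
as_text = json.loads(raw, parse_int=str)

# Show what happens when the same value passes through a double.
exact = int(as_text["id"])
through_double = int(float(exact))
precision_lost = through_double != exact
```

An ID that has silently changed no longer matches the tuple Storm sent, which is why keeping it as a string is the safer choice.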

The emit, ack, fail, and log messages are structured as follows:

Emit

    {
        "command": "emit",
        "tuple": ["foo", "bar"]
    }

The array contains the values of the tuple you are emitting.

Ack

    {
        "command": "ack",
        "id": 123456789
    }

The id is the ID of the tuple you are processing.

Fail

    {
        "command": "fail",
        "id": 123456789
    }

As with ack, the id is the ID of the tuple you are processing. (Translator's note: the original text says emit here. Judging from the surrounding JSON and the purpose of each message, it should be ack, so this is probably a typo.)

Log

    {
        "command": "log",
        "msg": "some message to be logged by storm."
    }
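The four message shapes above are simple enough to build with small helper functions. The Python sketch below is one possible arrangement; the helper names and the `encode` function are hypothetical, and only the JSON keys come from the message formats shown above.

```python
import json


def emit(values):
    return {"command": "emit", "tuple": list(values)}


def ack(tuple_id):
    return {"command": "ack", "id": tuple_id}


def fail(tuple_id):
    return {"command": "fail", "id": tuple_id}


def log(text):
    return {"command": "log", "msg": text}


def encode(message):
    """Serialize a message and append the 'end' line that closes it."""
    return json.dumps(message) + "\nend\n"


wire_ack = encode(ack(123456789))
wire_emit = encode(emit(["foo", "bar"]))
```

Keeping the construction of a message separate from its encoding makes it easy to check the structure of each message on its own.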

The following is the complete PHP code.

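// Your spout:

    <?php
    function read_msg() {
        $msg = "";
        while (true) {
            $l = fgets(STDIN);
            if ($l === false) {
                // Standard input was closed, so stop instead of looping forever.
                exit(0);
            }
            $line = substr($l, 0, -1);
            if ($line == "end") {
                break;
            }
            $msg = "$msg$line\n";
        }
        return substr($msg, 0, -1);
    }

    function write_line($line) {
        echo("$line\n");
    }

    function storm_emit($tuple) {
        $msg = array("command" => "emit", "tuple" => $tuple);
        storm_send($msg);
    }

    function storm_send($json) {
        write_line(json_encode($json));
        write_line("end");
    }

    function storm_sync() {
        storm_send(array("command" => "sync"));
    }

    function storm_log($msg) {
        $msg = array("command" => "log", "msg" => $msg);
        storm_send($msg);
        flush();
    }

    // Handshake: create the PID file and report the PID to Storm.
    $config = json_decode(read_msg(), true);
    $heartbeatdir = $config['pidDir'];
    $pid = getmypid();
    fclose(fopen("$heartbeatdir/$pid", "w"));
    storm_send(["pid" => $pid]);
    flush();

    $from = intval($argv[1]);
    $to = intval($argv[2]);

    while (true) {
        $msg = read_msg();
        $cmd = json_decode($msg, true);
        if ($cmd['command'] == 'next') {
            // Inclusive upper bound, so that $to itself is also emitted.
            if ($from <= $to) {
                storm_emit(array("$from"));
                $task_ids = read_msg();
                $from++;
            } else {
                sleep(1);
            }
        }
        storm_sync();
    }
    ?>

// Your bolt:

    <?php
    // Naive primality check. The original definition did not survive in
    // this copy of the listing, so this version is an assumption.
    function isPrime($number) {
        if ($number < 2) {
            return false;
        }
        for ($i = 2; $i * $i <= $number; $i++) {
            if ($number % $i == 0) {
                return false;
            }
        }
        return true;
    }

    function read_msg() {
        $msg = "";
        while (true) {
            $l = fgets(STDIN);
            if ($l === false) {
                // Standard input was closed, so stop instead of looping forever.
                exit(0);
            }
            $line = substr($l, 0, -1);
            if ($line == "end") {
                break;
            }
            $msg = "$msg$line\n";
        }
        return substr($msg, 0, -1);
    }

    function write_line($line) {
        echo("$line\n");
    }

    function storm_emit($tuple) {
        $msg = array("command" => "emit", "tuple" => $tuple);
        storm_send($msg);
    }

    function storm_send($json) {
        write_line(json_encode($json));
        write_line("end");
    }

    function storm_ack($id) {
        storm_send(["command" => "ack", "id" => "$id"]);
    }

    function storm_log($msg) {
        $msg = array("command" => "log", "msg" => "$msg");
        storm_send($msg);
    }

    // Handshake: create the PID file and report the PID to Storm.
    $config = json_decode(read_msg(), true);
    $heartbeatdir = $config['pidDir'];
    $pid = getmypid();
    fclose(fopen("$heartbeatdir/$pid", "w"));
    storm_send(["pid" => $pid]);
    flush();

    while (true) {
        $msg = read_msg();
        $tuple = json_decode($msg, true, 512, JSON_BIGINT_AS_STRING);
        if (!empty($tuple["id"])) {
            if (isPrime($tuple["tuple"][0])) {
                storm_emit(array($tuple["tuple"][0]));
            }
            storm_ack($tuple["id"]);
        }
    }
    ?>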

NOTE: All the script files must be stored in a subdirectory named multilang/resources in your project directory. This subdirectory is included in the jar file that is sent to the worker processes. If you do not put the scripts in this directory, Storm will not be able to run them and will throw an error.
