Common Storm patterns: Distributed RPC

Source: Internet
Author: User
Distributed RPC (DRPC) is used to parallelize the computation of function calls on Storm. For each function call, the topology running on the Storm cluster receives the function's arguments as an input stream and emits the results as an output stream.


DRPC is not so much a feature of Storm as a pattern built from Storm's basic primitives: streams, spouts, bolts, and topologies. DRPC could have been published as a library separate from Storm, but it is useful enough that it is bundled with Storm.


General overview
DRPC is coordinated by a DRPC server. Its overall workflow is as follows:

1. Receive an RPC request;
2. Send the request to the topology running on Storm;
3. Receive the result from the topology;
4. Return the result to the waiting client.

From the client's perspective, a DRPC call looks just like an ordinary RPC call. The following code shows a client calling the "reach" function via DRPC with the argument "http://twitter.com":


DRPCClient client = new DRPCClient("drpc-host", 3772);
String result = client.execute("reach", "http://twitter.com");

The internal DRPC workflow is as follows:

1. The client sends the DRPC server the name of the function to execute and its arguments;
2. The topology implementing that function uses a DRPCSpout to receive a stream of function calls from the DRPC server;
3. The DRPC server tags each function call with a unique ID;
4. The topology computes the result, and at the end a ReturnResults bolt connects to the DRPC server and hands it the result for that ID;
5. The DRPC server uses the ID to match the result to the client that is waiting for it, and returns the result to that client.

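
The ID-matching part of this workflow can be illustrated without Storm at all. The following is a minimal plain-Java sketch, not Storm's implementation: MiniDrpcServer, Request, fetchRequest, and MiniDrpcDemo are hypothetical names invented for this illustration, the function name is omitted because only one function is modeled, and a background thread stands in for the topology.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a DRPC server; not Storm's implementation.
class MiniDrpcServer {
    // One function call as the "topology" sees it: [id, args].
    static final class Request {
        final long id;
        final String args;
        Request(long id, String args) { this.id = id; this.args = args; }
    }

    private final AtomicLong nextId = new AtomicLong();
    private final Map<Long, CompletableFuture<String>> waiting = new ConcurrentHashMap<>();
    private final BlockingQueue<Request> requests = new LinkedBlockingQueue<>();

    // Client side: tag the call with a unique ID, queue it, and block until its result arrives.
    String execute(String args) {
        long id = nextId.incrementAndGet();
        CompletableFuture<String> future = new CompletableFuture<>();
        waiting.put(id, future);          // register before queuing so the result always finds it
        requests.add(new Request(id, args));
        try {
            return future.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("DRPC call " + id + " failed", e);
        } finally {
            waiting.remove(id);
        }
    }

    // Spout side: pull the next function call.
    Request fetchRequest() throws InterruptedException {
        return requests.take();
    }

    // Result side: the topology reports a result for an ID and the matching caller is released.
    void result(long id, String value) {
        CompletableFuture<String> future = waiting.get(id);
        if (future != null) {
            future.complete(value);
        }
    }
}

class MiniDrpcDemo {
    // Starts a daemon thread that plays the topology: it appends "!" to each argument.
    static MiniDrpcServer startWithExclaimWorker() {
        MiniDrpcServer server = new MiniDrpcServer();
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    MiniDrpcServer.Request r = server.fetchRequest();
                    server.result(r.id, r.args + "!");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();
        return server;
    }

    public static void main(String[] args) {
        MiniDrpcServer server = startWithExclaimWorker();
        System.out.println(server.execute("hello")); // prints "hello!"
    }
}
```

The map from ID to pending future is the key design point: the caller and the worker never talk to each other directly, so any number of calls can be in flight at once and each result still reaches the right caller.
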
LinearDRPCTopologyBuilder
Storm provides a topology builder, LinearDRPCTopologyBuilder, that automates almost all of the steps involved in DRPC, including:

Setting up the spout;
Returning the results to the DRPC server;
Providing bolts with functionality for aggregating over groups of tuples.

The following is a simple example: a DRPC topology that returns its input argument with "!" appended:


// Appends "!" to the argument of each DRPC request.
public static class ExclaimBolt extends BaseBasicBolt {
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        // Field 0 is the request ID; field 1 is the argument.
        String input = tuple.getString(1);
        collector.emit(new Values(tuple.getValue(0), input + "!"));
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("id", "result"));
    }
}

public static void main(String[] args) throws Exception {
    // "exclamation" is the DRPC function name that clients call.
    LinearDRPCTopologyBuilder builder = new LinearDRPCTopologyBuilder("exclamation");
    builder.addBolt(new ExclaimBolt(), 3);
    // ...
}

As the example shows, very little code is needed. When creating the LinearDRPCTopologyBuilder, you specify the name of the DRPC function that the topology implements, here "exclamation". A single DRPC server can coordinate many functions, and the function name distinguishes them from one another. The first bolt in the topology receives tuples with two fields: the first is the request ID and the second is the argument of the request.


LinearDRPCTopologyBuilder expects the last bolt to emit an output stream with two fields: the first is the request ID and the second is the result. In addition, all intermediate tuples must contain the request ID as their first field.


In this example, ExclaimBolt appends "!" to the second field of the input tuple. LinearDRPCTopologyBuilder handles the rest of the coordination: connecting to the DRPC server and sending the results back.
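
The rule that every tuple carries the request ID in its first field can be modeled in a few lines of plain Java. This is only a sketch of the convention, with no Storm classes involved: LinearChainSketch, stage, and run are hypothetical names, and tuples are modeled as Object[] {id, value}.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical plain-Java model of a linear chain; tuples are Object[] {id, value}.
class LinearChainSketch {
    // Wraps a function on the value so that the request ID stays in field 0.
    static UnaryOperator<Object[]> stage(UnaryOperator<String> fn) {
        return tuple -> new Object[] { tuple[0], fn.apply((String) tuple[1]) };
    }

    // Runs every [id, args] tuple through all stages and collects the results keyed by request ID.
    static Map<Object, String> run(List<Object[]> requests, List<UnaryOperator<Object[]>> stages) {
        Map<Object, String> resultsById = new LinkedHashMap<>();
        for (Object[] tuple : requests) {
            for (UnaryOperator<Object[]> s : stages) {
                tuple = s.apply(tuple);
            }
            resultsById.put(tuple[0], (String) tuple[1]);
        }
        return resultsById;
    }

    public static void main(String[] args) {
        List<Object[]> requests = new ArrayList<>();
        requests.add(new Object[] { 1L, "hello" });
        requests.add(new Object[] { 2L, "storm" });

        List<UnaryOperator<Object[]>> stages = new ArrayList<>();
        stages.add(stage(s -> s + "!"));
        stages.add(stage(String::toUpperCase));

        System.out.println(run(requests, stages)); // prints {1=HELLO!, 2=STORM!}
    }
}
```

Because the ID travels with every intermediate tuple, the final [id, result] pair can be matched back to its request no matter how many requests are flowing through the chain.
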


Local mode DRPC
DRPC can be run in local mode. The following code runs the example above in local mode:


// In-process stand-ins for the DRPC server and the Storm cluster.
LocalDRPC drpc = new LocalDRPC();
LocalCluster cluster = new LocalCluster();

cluster.submitTopology("drpc-demo", conf, builder.createLocalTopology(drpc));

System.out.println("Results for 'hello':" + drpc.execute("exclamation", "hello"));

cluster.shutdown();
drpc.shutdown();

First, create a LocalDRPC object, which simulates a DRPC server in process, just as LocalCluster simulates a Storm cluster in process. Then create a LocalCluster to run the topology in local mode. LinearDRPCTopologyBuilder has separate methods for creating local and remote topologies.


In local mode, LocalDRPC does not bind to any port, so the topology needs to be told which object to communicate with. That is why createLocalTopology takes the LocalDRPC object as input.


Once the topology is running, DRPC calls can be made through the execute method of LocalDRPC.


Remote mode DRPC
Running DRPC on an actual Storm cluster is just as easy. There are three steps:

1. Start the DRPC server(s);
2. Configure the locations of the DRPC servers;
3. Submit the DRPC topology to the Storm cluster.

First, start a DRPC server with the storm script:


bin/storm drpc

Then, configure the DRPC server addresses in the Storm cluster so that DRPCSpout knows where to read function calls from. This can be done either in the storm.yaml file or in the topology configuration. In storm.yaml it looks like this:


drpc.servers:
  - "drpc1.foo.com"
  - "drpc2.foo.com"

Finally, submit the DRPC topology with StormSubmitter. To run the example above in remote mode, the code is as follows:


StormSubmitter.submitTopology("exclamation-drpc", conf, builder.createRemoteTopology());

createRemoteTopology creates a topology suited to running on a Storm cluster.


A more complex example
The exclamation example above is a toy illustration of DRPC. Here is a more complex example that really benefits from the parallelism of a Storm cluster: computing the reach of a URL on Twitter, that is, the number of unique people exposed to that URL.


To compute reach, you need to:

1. Get all the people who tweeted the URL;
2. Get the followers of everyone from step 1;
3. Deduplicate the set of followers;
4. Count the unique followers from step 3.

A single reach computation can involve thousands of database calls and millions of follower records, so it is computationally intensive. With Storm it is easy to parallelize each step. On a single machine the computation may take minutes; on a Storm cluster, even the hardest URLs can be computed in a few seconds.
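
For reference, the same computation on a single machine can be sketched in plain Java as follows. The data is entirely made up for illustration: ReachSketch, TWEETERS_DB, FOLLOWERS_DB, and the URL and user names are hypothetical, and in-memory maps stand in for the database calls a real implementation would make.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory data; a real implementation would query databases.
class ReachSketch {
    static final Map<String, List<String>> TWEETERS_DB = new HashMap<>();
    static final Map<String, List<String>> FOLLOWERS_DB = new HashMap<>();
    static {
        TWEETERS_DB.put("foo.com/blog/1", Arrays.asList("sally", "bob", "tim"));
        FOLLOWERS_DB.put("sally", Arrays.asList("bob", "tim", "alice"));
        FOLLOWERS_DB.put("bob", Arrays.asList("sally", "alice", "cathy"));
        FOLLOWERS_DB.put("tim", Arrays.asList("alice"));
    }

    static int reach(String url) {
        Set<String> uniqueFollowers = new HashSet<>();
        // Look up everyone who tweeted the URL.
        for (String tweeter : TWEETERS_DB.getOrDefault(url, Collections.emptyList())) {
            // Collect each tweeter's followers; the set silently drops duplicates.
            uniqueFollowers.addAll(FOLLOWERS_DB.getOrDefault(tweeter, Collections.emptyList()));
        }
        // The reach is the number of distinct followers.
        return uniqueFollowers.size();
    }

    public static void main(String[] args) {
        System.out.println(reach("foo.com/blog/1")); // prints 5
    }
}
```

Both loops here are sequential, and the single HashSet must hold every follower. Those are the two costs that a distributed version spreads across many tasks.
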


The code for this example is in storm-starter. Here is how the topology is defined:


LinearDRPCTopologyBuilder builder = new LinearDRPCTopologyBuilder("reach");
// The second argument of addBolt is the parallelism of the bolt.
builder.addBolt(new GetTweeters(), 3);
builder.addBolt(new GetFollowers(), 12)
        .shuffleGrouping();
builder.addBolt(new PartialUniquer(), 6)
        .fieldsGrouping(new Fields("id", "follower"));
builder.addBolt(new CountAggregator(), 2)
        .fieldsGrouping(new Fields("id"));

The topology executes in four steps:
GetTweeters: gets the users who tweeted the URL. It transforms an input stream of [id, url] into an output stream of [id, tweeter]; each url tuple maps to many tweeter tuples.
GetFollowers: gets the followers of each tweeter. It transforms an input stream of [id, tweeter] into an output stream of [id, follower]. When someone follows several people who all tweeted the same URL, duplicate follower tuples are produced.
PartialUniquer: groups the followers stream by request ID and follower, so that the same follower always goes to the same task. Each task therefore receives a disjoint subset of the followers and counts the unique followers in its subset.
CountAggregator: receives the partial counts from each PartialUniquer task and sums them to complete the reach computation.
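
The reason the partial counts can simply be added is that grouping by follower sends equal followers to the same task, so no follower is counted by two tasks. Here is a minimal plain-Java sketch of that idea for a single request. It does not use Storm; PartialUniqueSketch, taskFor, and uniqueCount are hypothetical names, and hashing by String.hashCode is an assumption of this sketch rather than a statement about how Storm routes tuples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the partial-unique / aggregate split for one request.
class PartialUniqueSketch {
    // Mimics a fields grouping: equal follower names always map to the same task index.
    static int taskFor(String follower, int numTasks) {
        return Math.floorMod(follower.hashCode(), numTasks);
    }

    static int uniqueCount(List<String> followers, int numTasks) {
        List<Set<String>> perTask = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            perTask.add(new HashSet<>());
        }
        // Each "task" deduplicates only the followers routed to it.
        for (String follower : followers) {
            perTask.get(taskFor(follower, numTasks)).add(follower);
        }
        // The aggregator only adds the partial counts, because the per-task sets are disjoint.
        int total = 0;
        for (Set<String> seen : perTask) {
            total += seen.size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> followers =
                Arrays.asList("bob", "tim", "alice", "sally", "alice", "cathy", "alice");
        System.out.println(uniqueCount(followers, 6)); // prints 5
    }
}
```

The result is the same for any number of tasks. With a shuffle grouping instead, the same follower could land on several tasks and be counted more than once.
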
Here is the implementation of the PartialUniquer bolt:


public class PartialUniquer extends BaseBatchBolt {
    BatchOutputCollector _collector;
    Object _id;
    // Followers seen by this task for the current request ID.
    Set<String> _followers = new HashSet<String>();

    @Override
    public void prepare(Map conf, TopologyContext context, BatchOutputCollector collector, Object id) {
        _collector = collector;
        _id = id;
    }

    @Override
    public void execute(Tuple tuple) {
        // Field 1 is the follower; the set drops duplicates.
        _followers.add(tuple.getString(1));
    }

    @Override
    public void finishBatch() {
        // Called once all tuples for this request ID have been processed.
        _collector.emit(new Values(_id, _followers.size()));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("id", "partial-count"));
    }
}

PartialUniquer implements IBatchBolt by extending BaseBatchBolt. A batch bolt provides an API for processing a batch of tuples as a single unit. A new instance of the batch bolt is created for each request ID, and Storm takes care of cleaning up those instances.


Each time PartialUniquer receives a follower tuple in its execute method, it adds the follower to the HashSet held by the instance for that request ID.


Batch bolts also provide the finishBatch method, which is called once the task has processed all the tuples directed at it for the batch. In finishBatch, PartialUniquer emits a single tuple containing the number of unique followers seen by that task.


Under the hood, CoordinatedBolt is used to detect when a given bolt has received all of the tuples for a given request ID. CoordinatedBolt uses direct streams to manage this coordination.
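
One way to picture this kind of completion detection is a per-request bookkeeping rule: a task is done with a request once every upstream task has reported how many tuples it sent, and that many tuples have actually arrived. The sketch below is a simplified model of that rule under those assumptions, not Storm's CoordinatedBolt code; CompletionTracker, onTuple, and onCountReport are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of per-request completion tracking; not Storm's code.
class CompletionTracker {
    private static final class Progress {
        int reports;         // upstream tasks that have announced their tuple count
        int expectedTuples;  // sum of the announced counts
        int receivedTuples;  // data tuples actually seen so far
    }

    private final int numUpstreamTasks;
    private final Map<Object, Progress> byRequestId = new HashMap<>();

    CompletionTracker(int numUpstreamTasks) {
        this.numUpstreamTasks = numUpstreamTasks;
    }

    private Progress progress(Object requestId) {
        return byRequestId.computeIfAbsent(requestId, id -> new Progress());
    }

    // Called for every data tuple of the request; returns true once the batch is complete.
    boolean onTuple(Object requestId) {
        progress(requestId).receivedTuples++;
        return isComplete(requestId);
    }

    // Called when an upstream task reports how many tuples it sent to this task.
    boolean onCountReport(Object requestId, int tuplesSent) {
        Progress p = progress(requestId);
        p.reports++;
        p.expectedTuples += tuplesSent;
        return isComplete(requestId);
    }

    // Complete means: all upstream tasks have reported and all announced tuples have arrived.
    boolean isComplete(Object requestId) {
        Progress p = byRequestId.get(requestId);
        return p != null
                && p.reports == numUpstreamTasks
                && p.receivedTuples == p.expectedTuples;
    }
}

class CompletionTrackerDemo {
    public static void main(String[] args) {
        CompletionTracker tracker = new CompletionTracker(2);
        tracker.onTuple("req-1");
        tracker.onCountReport("req-1", 1);
        tracker.onCountReport("req-1", 1);
        // One of the two announced tuples is still in flight, so the batch is not done yet.
        System.out.println(tracker.isComplete("req-1")); // prints false
        System.out.println(tracker.onTuple("req-1"));    // prints true
    }
}
```

The check runs on both kinds of events because count reports and data tuples can arrive in any order. In a batch bolt, the moment this check first succeeds corresponds to the point at which finishBatch would be called.
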


The rest of the topology should be self-explanatory. Every step of the reach computation runs in parallel, and defining it as a DRPC topology is straightforward.


Non-linear DRPC topologies
LinearDRPCTopologyBuilder only handles "linear" DRPC topologies, where the computation is expressed as a sequence of steps, as in the reach example. It is not hard to imagine functions that require a more complicated topology with branching and merging of bolts. For now, such non-linear topologies require using CoordinatedBolt directly.


How LinearDRPCTopologyBuilder works
DRPCSpout emits [args, return-info], where return-info contains the host and port of the DRPC server as well as the unique ID that the DRPC server generated for the request.
The constructed topology consists of the following parts:
DRPCSpout
PrepareRequest (generates a request ID and creates a stream for the return info and a stream for the args)
CoordinatedBolt wrappers and direct groupings
JoinResult (joins the result with the return info)
ReturnResults (connects to the DRPC server and returns the result)
LinearDRPCTopologyBuilder is a good example of a higher-level abstraction built on top of Storm's basic primitives.

Advanced
KeyedFairBolt, for weaving together the processing of multiple requests at the same time;
How to use CoordinatedBolt directly.
