Hadoop custom RPC protocol

RPC stands for remote procedure call. Since Hadoop is a distributed system, its underlying communication library must provide basic RPC functionality, and Hadoop RPC plays the role of that low-level communication module: for example, the communication and coordination between the NameNode (NN) and DataNodes (DN), or between an ApplicationMaster (AM) and the ResourceManager (RM), all go through Hadoop RPC. Becoming familiar with Hadoop RPC deepens our understanding of how Hadoop modules talk to each other and lets us implement small distributed features of our own.

Hadoop RPC is covered in detail in many Hadoop books, and if you are interested in the principles behind it, reading the source code is the best way to understand them. Here, however, I treat it simply as an underlying communication library and use the interfaces that Hadoop RPC itself provides to build a simple communication model.

The Hadoop version used in this article is 2.4.1. RPC is a well-designed module, and Hadoop keeps it quite backward compatible. Compared with earlier releases, the main change for users of the RPC programming interface is that Hadoop now supports several serialization frameworks, such as Protocol Buffers and Avro, in addition to the early Writable framework. This article builds an RPC interface with the org.apache.hadoop.ipc package (under the hadoop-common project directory) and uses the RPC communication model based on the Writable serialization framework.
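As a quick illustration of where that choice surfaces in code, here is a minimal sketch of my own (not taken from the original project) that explicitly binds a protocol to the Writable-based engine; as far as I know, in Hadoop 2.x this is also the default when no engine is configured for a protocol. The Protocol interface it refers to is the one defined in the next section.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.RPC;
import org.apache.hadoop.ipc.WritableRpcEngine;

public class EngineSetup {
    public static Configuration newConf() {
        Configuration conf = new Configuration();
        // Bind the Protocol interface to the Writable-based RPC engine.
        // (In Hadoop 2.x this is the default, so the call is optional here.)
        RPC.setProtocolEngine(conf, Protocol.class, WritableRpcEngine.class);
        return conf;
    }
}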

 

First, define a communication protocol. Every communication protocol that uses Hadoop RPC must extend the VersionedProtocol interface, which is mainly used to attach version information to the protocol:

import org.apache.hadoop.ipc.VersionedProtocol;

public interface Protocol extends VersionedProtocol {
    public static final long versionID = 1L;

    public boolean writeFile(Info statics);
}

We have now defined a protocol for communication between the client and the server: the client initiates a call, the remote server processes it, and the result is returned. We also defined a version number for the protocol so that versions can be checked.

Here we want the client to execute writeFile(), passing a parameter of type Info to the method; after the server runs writeFile(), the result is returned to the client. Obviously this involves network communication: the Info parameter must be transmitted to the server over the network, and the server must send the execution result back to the client.

To make it easy to send objects across the network, Hadoop RPC relies internally on two main mechanisms:

1. Serialization layer: converts an object into a byte stream so it can be transmitted over the network.

2. Java reflection: after the byte stream arrives, reflection is used on the server side to recreate the object.

Thanks to these two mechanisms, the upper layer never sees how objects are transmitted; it simply appears that the server receives an object identical to the one created on the client.

From this brief introduction, we can see that the Info statics object must be serializable and must satisfy some basic conditions so that it can be created through reflection.

Now let's take a look at how to define the Info class:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class Info implements Writable {

    public Text fileName;
    public LongWritable taskNum;

    public Info() {
        this.fileName = new Text();
        this.taskNum = new LongWritable();
    }

    public void setFileName(String str) {
        this.fileName = new Text(str);
    }

    public void setTaskNum(Long num) {
        this.taskNum = new LongWritable(num);
    }

    public String getFileName() {
        return fileName.toString();
    }

    public Long getTaskNum() {
        return taskNum.get();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        fileName.write(out);
        taskNum.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        System.out.println("Come to invoke my readFields");
        fileName.readFields(in);
        taskNum.readFields(in);
        System.out.println("Read Success!!!");
    }
}

Note the following points:

1. To be serialized, an object must implement the Writable interface. Once this interface is implemented, object transmission in RPC can be regarded as transparent to us.

To serialize the object, you must override two methods: public void write(DataOutput out) throws IOException and public void readFields(DataInput in) throws IOException.

2. To be created by reflection on the server side, the class must define a constructor without parameters; the server calls it to generate an object instance. Also note that if the object contains member variables of custom types, be sure to instantiate them in that constructor. Otherwise an error is reported when readFields() is called later, because the uninitialized member variables cannot receive the data that readFields() reads. The failure looks like this:

Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
    at com.sun.proxy.$Proxy8.writeFile(Unknown Source)
    at SJTU.client.App.main(App.java:30)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcServerException): IPC server unable to read call parameters: null
    at org.apache.hadoop.ipc.Client.call(Client.java:1410)
    at org.apache.hadoop.ipc.Client.call(Client.java:1363)
    at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:240)
    ... 2 more
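To see both requirements in action without starting a server, here is a small test sketch of my own (not part of the original project) that serializes an Info object to a byte array and then rebuilds it the way the server side does: create an empty instance through reflection and let readFields() fill it in. If Info's no-argument constructor did not instantiate fileName and taskNum, the readFields() call here would fail in much the same way as the exception above.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;

public class InfoRoundTripTest {
    public static void main(String[] args) throws IOException {
        Info original = new Info();
        original.setFileName("hdfs://");
        original.setTaskNum(10L);

        // Serialization layer: turn the object into a byte stream.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        original.write(out);
        out.close();

        // Reflection: build an empty instance the way the server does,
        // then let readFields() fill in the member variables.
        Info copy = ReflectionUtils.newInstance(Info.class, new Configuration());
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));

        System.out.println(copy.getFileName() + " / " + copy.getTaskNum());
    }
}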

 

We have now defined a protocol and made the method parameter serializable. The remaining work is straightforward. The first step is to implement the protocol we defined earlier. Note that this implementation runs on the server, so pay attention to the server's runtime environment and context.

The following is the implementation of my protocol:

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

import org.apache.hadoop.HadoopIllegalArgumentException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.ProtocolSignature;
import org.apache.hadoop.ipc.RPC;
import org.apache.hadoop.ipc.RPC.Server;

public class WriteServer implements Protocol {

    public Server server;

    @Override
    public long getProtocolVersion(String protocol, long clientVersion) throws IOException {
        return Protocol.versionID;
    }

    @Override
    public ProtocolSignature getProtocolSignature(String protocol, long clientVersion,
            int clientMethodsHash) throws IOException {
        return new ProtocolSignature(Protocol.versionID, null);
    }

    @Override
    public boolean writeFile(Info statics) {
        // Create a file on the server and write the Info sent by the client into it.
        FileWriter writer;
        try {
            writer = new FileWriter("/home/chenershuile/hellowritable");
            BufferedWriter bw = new BufferedWriter(writer);
            bw.write(statics.getFileName());
            bw.write(statics.getTaskNum().toString());
            bw.close();
            writer.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return true; // return true once execution is complete
    }

    public void init() {
        try {
            this.server = new RPC.Builder(new Configuration())
                    .setBindAddress("master")
                    .setNumHandlers(5)
                    .setProtocol(Protocol.class)
                    .setPort(50071)
                    .setInstance(new WriteServer())
                    .build();
        } catch (HadoopIllegalArgumentException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public void run() {
        this.server.start();
        System.out.println("The server started at: " + this.server.getPort());
    }

    public void stop() {
        this.server.stop();
    }
}

Here I not only implemented the protocol but also built a server using the Builder provided by the org.apache.hadoop.ipc.RPC class. The server listens on port 50071.

If you are interested in what each set method does, you can look at the org.apache.hadoop.ipc.RPC class; it is quite simple and mainly sets the server configuration.

Pay attention to the setProtocol() and setInstance() methods: the former takes the class of the protocol handled by this server (the protocol is an interface), and the latter takes an instance of the class that implements that interface (new Impl()).

At this point the server-side work is done, and we can use the WriteServer class to stand up a working Hadoop RPC server.

public class App {
    public static void main(String[] args) {
        WriteServer writeServer = new WriteServer();
        writeServer.init();
        writeServer.run();
    }
}

Then package and run the jar; you can see the server start up and print the port it is listening on.

We can see that the server is up and running; now we need to create a client.

 

The process of creating a client is very simple. The code is as follows:

import java.io.IOException;
import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.RPC;

import SJTU.server.Protocol;
import SJTU.server.Info;

/**
 * Hello world!
 */
public class App {
    public static void main(String[] args) {
        InetSocketAddress addr = new InetSocketAddress("master", 50071);
        Configuration conf = new Configuration();
        try {
            Protocol proxy = RPC.getProxy(Protocol.class, Protocol.versionID, addr, conf);
            System.out.println("Start Client");
            String fileName = "hdfs://";
            Long taskNum = 10L;
            Info statics = new Info();
            statics.setFileName(fileName);
            statics.setTaskNum(taskNum);
            System.out.println(proxy.writeFile(statics));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

The core of the code is again a static method provided by org.apache.hadoop.ipc.RPC, getProxy(). Briefly, the parameters mean the following: the first two relate to the protocol (its class and version); addr is an InetSocketAddress representing the server address, which must of course match the server so that it can be reached; and conf, as I understand it, mainly supplies the relevant configuration. My understanding of everything conf does is incomplete, and I would welcome corrections from anyone more familiar with it.
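One detail the client above leaves out: the proxy returned by getProxy() holds a connection to the server, so it is good practice to release it when you are finished. A minimal variant of the client's try block (a sketch of mine, reusing the addr, conf, and statics variables from the code above) could close the proxy with RPC.stopProxy():

Protocol proxy = null;
try {
    proxy = RPC.getProxy(Protocol.class, Protocol.versionID, addr, conf);
    System.out.println(proxy.writeFile(statics));
} catch (IOException e) {
    e.printStackTrace();
} finally {
    if (proxy != null) {
        RPC.stopProxy(proxy); // closes the underlying connection to the server
    }
}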

 

To better understand the calling process, I added a lot of System.out.println() statements to the related classes to trace what happens, so my output looks a little unusual.

On the server side, the main classes involved are org.apache.hadoop.ipc.RPC and WritableRpcEngine; both are in the same package and together handle the Writable-based serialization and invocation.

The actual serialized reading of objects happens in ObjectWritable, which lives in the org.apache.hadoop.io package.

For object reflection, Hadoop adds another layer on top of the underlying java.lang.reflect: ReflectionUtils, in the org.apache.hadoop.util package.
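To get a feel for these two helpers, here is a rough sketch of my own (an assumption-laden illustration, not code from the article) that writes a method parameter with ObjectWritable and reads it back. ObjectWritable records the declared class next to the value, and on the reading side it uses reflection internally to create the instance before filling it with readFields():

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.ObjectWritable;

public class ObjectWritableSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();

        Info statics = new Info();
        statics.setFileName("hdfs://");
        statics.setTaskNum(10L);

        // Write the value together with its declared class, roughly as the RPC layer
        // does for each call parameter.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        ObjectWritable.writeObject(out, statics, Info.class, conf);
        out.close();

        // Read it back; ObjectWritable creates the instance via reflection internally.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        Info copy = (Info) ObjectWritable.readObject(in, conf);
        System.out.println(copy.getFileName() + " / " + copy.getTaskNum());
    }
}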

 

Finally, the program's execution looks as follows. Much of the extra output on the terminal comes from the modifications to the classes mentioned above; it helps us follow how reflection, proxying, communication, and serialization proceed. Of course, a more detailed analysis of the whole RPC flow is still needed.

Server: (terminal output screenshot omitted)

Client: (terminal output screenshot omitted)

 

Note that the Info object's readFields() is invoked on the server side, and that reflection, instantiation, and related operations also happen on the server. For details, refer to Hadoop internals material and the related classes mentioned above.
