[Reading Hadoop source code] [6] org.apache.hadoop.ipc: overall IPC structure and RPC


1. Preface

Hadoop RPC is implemented mainly with Java dynamic proxies and reflection. The source code lives under org.apache.hadoop.ipc, which contains the following main classes:

Client: the client side of the RPC service;

RPC: implements a simple RPC model;

Server: the abstract server class;

RPC.Server: the concrete server class;

VersionedProtocol: every class that uses the RPC service must implement this interface. It is used when a proxy is created to verify that the proxy object was built against the correct protocol version.
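To make the later sketches concrete, here is a minimal hypothetical protocol. The MyCalculatorProtocol name, the add method, and the versionID constant are invented for illustration and are not part of Hadoop; only the getProtocolVersion contract comes from VersionedProtocol itself (as it looks in the pre-YARN code this article describes).

import java.io.IOException;
import org.apache.hadoop.ipc.VersionedProtocol;

// Hypothetical protocol used in the sketches below; not part of Hadoop.
public interface MyCalculatorProtocol extends VersionedProtocol {
  // Hadoop protocols conventionally expose a versionID constant; the client
  // passes it as clientVersion, and the server reports it from getProtocolVersion().
  long versionID = 1L;

  // A remote method that clients can call over Hadoop RPC.
  int add(int a, int b) throws IOException;
}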

 

2. The Hadoop RPC flow in brief

In short, Hadoop RPC = dynamic proxy + a custom binary stream. Ignoring the details, the structure is roughly as follows.

The remote object implements a fixed interface, and only that interface is visible to clients; the implementation lives on the server. When a client wants to use that implementation, the call proceeds like this: a proxy object is generated dynamically from the interface; when a method is invoked on the proxy, the call is captured by the RPC layer, packaged into a call request, serialized into a data stream, and sent to the server. The server parses the call request from the stream, looks up the real implementation object of the requested interface, invokes the method on it, and returns the result to the client.
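The client half of that flow rests on the JDK's dynamic-proxy facility. The following self-contained sketch (plain JDK, no Hadoop classes, all names invented) shows the core idea: a proxy generated from an interface hands every method call to an InvocationHandler, which is exactly the hook Hadoop uses to serialize the method name and arguments into its binary stream.

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Arrays;

public class ProxySketch {
  interface Greeter {                       // stands in for the "protocol" interface
    String greet(String name);
  }

  public static void main(String[] args) {
    InvocationHandler handler = (proxy, method, methodArgs) -> {
      // In Hadoop this is where the call would be packaged into an Invocation,
      // serialized, and sent to the server; here we just fabricate a result.
      System.out.println("captured call: " + method.getName()
          + Arrays.toString(methodArgs));
      return "hello, " + methodArgs[0];
    };

    Greeter g = (Greeter) Proxy.newProxyInstance(
        Greeter.class.getClassLoader(), new Class<?>[] { Greeter.class }, handler);

    System.out.println(g.greet("world"));   // every call goes through the handler
  }
}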

 

3. What is in RPC.java?

The RPC class provides a simple RPC mechanism and exposes the following

Static methods:

1) The *Proxy methods

waitForProxy, getProxy, and stopProxy are the proxy-related methods. waitForProxy blocks until the remote server (typically the NameNode) has started and a connection can be established; it is mainly used by the SecondaryNameNode, the DataNodes, and the JobTracker.

Ø Function prototype

public static VersionedProtocol getProxy(
    Class<? extends VersionedProtocol> protocol,
    long clientVersion,
    InetSocketAddress addr,
    UserGroupInformation ticket,
    Configuration conf,
    SocketFactory factory) throws IOException;

Ø Parameter description

1) protocol: the interface of the RPC service that the server provides.

2) clientVersion: the protocol version expected by the client.

3) addr: the address of the RPC server.

4) ticket: the identity of the calling user (a UserGroupInformation object).

5) conf: the configuration.

6) factory: the socket factory used to create connections to the server.

 

stopProxy stops (releases) a proxy.

getProxy is the general method for obtaining a proxy. It creates a proxy instance and then asks it for the server's protocol version; that value is compared with the clientVersion passed to getProxy. If they match, the proxy is returned; otherwise a VersionMismatch exception is thrown.
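A client-side usage sketch following the prototype above and the hypothetical MyCalculatorProtocol from the preface; the host name, port, and ticket handling are placeholders, and the exact helper calls differ between Hadoop versions:

import java.net.InetSocketAddress;
import javax.net.SocketFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.RPC;
import org.apache.hadoop.security.UserGroupInformation;

public class RpcClientSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    InetSocketAddress addr = new InetSocketAddress("server.example.com", 9000);
    UserGroupInformation ticket = null;   // in real code, obtained from the security layer

    // getProxy compares the server's getProtocolVersion() with versionID;
    // on mismatch it throws a VersionMismatch exception.
    MyCalculatorProtocol calc = (MyCalculatorProtocol) RPC.getProxy(
        MyCalculatorProtocol.class,
        MyCalculatorProtocol.versionID,
        addr, ticket, conf, SocketFactory.getDefault());

    System.out.println(calc.add(1, 2));   // executed on the server via RPC

    RPC.stopProxy(calc);                  // release the proxy and its connection
  }
}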

2) getserver

Creates and returns a Server instance; it is used by the TaskTracker, the JobTracker, the NameNode, and the DataNodes (a usage sketch follows the parameter description below).

Ø Function prototype:

public static Server getServer(
    final Object instance,
    final String bindAddress,
    final int port,
    final int numHandlers,
    final boolean verbose,
    Configuration conf) throws IOException;

Ø Parameter description

1) instance: the object instance exposed by the RPC server, i.e. the implementation of the interface that RPC clients call.

2) bindAddress: the IP address on which the RPC server listens.

3) port: the port on which the RPC server listens.

4) numHandlers: the number of handler threads that process the call queue.

5) verbose: whether to log every call received by the server.

6) conf: the configuration.
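A matching server-side sketch, again built around the hypothetical MyCalculatorProtocol; the bind address, port, and handler count are arbitrary, while start() and join() are the usual way an org.apache.hadoop.ipc.Server is run:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.RPC;
import org.apache.hadoop.ipc.Server;

public class RpcServerSketch implements MyCalculatorProtocol {
  @Override
  public int add(int a, int b) {            // the real implementation the client reaches
    return a + b;
  }

  @Override
  public long getProtocolVersion(String protocol, long clientVersion)
      throws IOException {
    return MyCalculatorProtocol.versionID;  // reported back during getProxy's version check
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Server server = RPC.getServer(new RpcServerSketch(),   // instance
        "0.0.0.0", 9000,                                    // bindAddress, port
        5, true, conf);                                     // numHandlers, verbose, conf
    server.start();                                         // begin accepting calls
    server.join();                                          // block until stopped
  }
}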

 

3) call

A static method that sends a set of requests to a set of servers in parallel and collects the results. No class in the source tree appears to call it; the comments suggest it is meant as an expert-level interface for system administrators.

 

Static inner classes of RPC:

The descriptions above only say what the methods do, not how they do it. The implementation rests on five static inner classes of the RPC class:

RPC.ClientCache: caches Client objects;

RPC.Invocation: the parameter entity passed in each RPC call; an Invocation carries the method being called (its name, parameter types, and parameter values) together with the configuration;

RPC.Invoker: the class that actually performs a call on the client side. It uses Java's dynamic-proxy mechanism by implementing InvocationHandler, and it holds remoteId and client members: remoteId identifies the remote server being called, and the Client sends the serialized call to the code that implements it;

RPC.Server: the concrete subclass of org.apache.hadoop.ipc.Server. It implements the abstract call method: it takes the received Invocation, looks up the corresponding Method, and invokes it on the instance. This relies on reflection, which is what makes it possible to execute code that is not known until runtime (a sketch of this dispatch follows the list);

RPC.VersionMismatch: the exception thrown when the protocol versions do not match.
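A minimal illustration of that reflective dispatch, as self-contained JDK code; the interface, method name, and parameter values stand in for what RPC.Server deserializes from a received Invocation:

import java.lang.reflect.Method;

public class ReflectiveDispatchSketch {
  // Stand-ins for what the server deserializes from an Invocation.
  interface Calculator { int add(int a, int b); }
  static class CalculatorImpl implements Calculator {
    public int add(int a, int b) { return a + b; }
  }

  public static void main(String[] args) throws Exception {
    Object instance = new CalculatorImpl();              // the "instance" given to getServer
    String methodName = "add";                           // from the Invocation
    Class<?>[] paramTypes = { int.class, int.class };    // from the Invocation
    Object[] params = { 1, 2 };                          // from the Invocation

    // The essence of RPC.Server's call(): find the method on the protocol
    // interface and invoke it on the real implementation object.
    Method method = Calculator.class.getMethod(methodName, paramTypes);
    Object result = method.invoke(instance, params);
    System.out.println(result);                          // 3, returned to the client
  }
}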

 

Class diagram (figure omitted)

 

4. The protocol interfaces:

Hadoop follows a master-slave model. The master only accepts and responds to requests; it does not initiate them. A slave, while sending requests, may also accept requests from other slaves or from clients.

VersionedProtocol, as noted above, must be implemented by every class that wants to use the RPC service. Let's look at which interfaces extend it.

1) HDFS-related

ClientDatanodeProtocol: the interface between the client and a DataNode. It has very few operations; there is only a block-recovery method. What about the other data requests? The main data exchange between the client and DataNodes goes over a streaming socket, implemented in DataXceiver, so it is not covered here;

ClientProtocol: the interface between the client and the NameNode. All control-flow requests, such as creating and deleting files, go through it;

DatanodeProtocol: the interface between DataNodes and the NameNode, used for heartbeats, block reports, and the like;

NamenodeProtocol: the interface between the SecondaryNameNode and the NameNode;

InterDatanodeProtocol: the interface DataNodes use to talk to each other, for example to update block metadata.

2) MapReduce-related

InterTrackerProtocol: the interface between the TaskTrackers and the JobTracker, analogous to DatanodeProtocol;

JobSubmissionProtocol: the interface between the JobClient and the JobTracker, used to submit jobs, query jobs, and perform other job-related operations;

TaskUmbilicalProtocol: the interface between a child process and its parent. The child processes run the map and reduce tasks; the parent is the TaskTracker. Through this interface the child reports its running status to the parent (vocabulary note: "umbilical", as in umbilical cord, i.e. closely connected).

3) Others

AdminOperationsProtocol: provides some administrative operations, such as refreshing the JobTracker's node list;

RefreshAuthorizationPolicyProtocol, RefreshUserMappingsProtocol: I have not studied these yet.

 

Reference URLs

http://www.wikieno.com/2012/02/hadoop-ipc-rpc/

http://langyu.iteye.com/blog/1183337

http://jimmee.iteye.com/blog/1201398

http://jimmee.iteye.com/blog/1201982

http://jimmee.iteye.com/blog/1206201

http://jimmee.iteye.com/blog/1206598
