As a distributed computing framework, Hadoop relies heavily on RPC. Instead of using the RPC facility that ships with the JDK (RMI), Hadoop implements its own RPC mechanism.
Hadoop's RPC mechanism can be divided into three parts:
A. Communication protocol
B. Server
C. Client
The structure is as follows:
1 Communication Protocol
The "communication protocol" here is not a network protocol but the interface that the client and server agree on for communicating. Different functions require different interfaces. VersionedProtocol is the super-interface of all communication protocols, and it defines only one method, getProtocolVersion(String protocol, long clientVersion), which returns the protocol version that the server implements.
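As an illustration, the sketch below mirrors this arrangement with plain Java types so that it runs without Hadoop on the classpath. MyVersionedProtocol, EchoProtocol, EchoServer, and the compatible check are hypothetical names that only imitate the shape of VersionedProtocol; they are not Hadoop classes.

```java
import java.io.IOException;

class ProtocolDemo {
    // Client-side check: the connection is only usable if both sides speak the same version.
    static boolean compatible(MyVersionedProtocol server, long clientVersion) {
        try {
            long serverVersion =
                server.getProtocolVersion(EchoProtocol.class.getName(), clientVersion);
            return serverVersion == clientVersion;
        } catch (IOException e) {
            return false; // treat a failed version query as incompatible
        }
    }

    public static void main(String[] args) {
        EchoServer server = new EchoServer();
        System.out.println(compatible(server, EchoProtocol.versionID)); // true
        System.out.println(compatible(server, 2L));                     // false
        System.out.println(server.echo("hello"));                       // hello
    }
}

// Hypothetical stand-in with the same shape as Hadoop's VersionedProtocol.
interface MyVersionedProtocol {
    long getProtocolVersion(String protocol, long clientVersion) throws IOException;
}

// Hypothetical business protocol: each RPC interface extends the base interface.
interface EchoProtocol extends MyVersionedProtocol {
    long versionID = 1L; // change this whenever the interface changes
    String echo(String message);
}

// Server-side implementation of the hypothetical protocol.
class EchoServer implements EchoProtocol {
    @Override
    public long getProtocolVersion(String protocol, long clientVersion) {
        return versionID; // report the version this server implements
    }

    @Override
    public String echo(String message) {
        return message;
    }
}
```

Because every protocol extends the same base interface, the RPC layer can ask any server for its version before it makes business calls.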
2 Server
The server listens for client requests on a socket, extracts the method and parameters the client wants to invoke, calls the corresponding method through Java reflection, and returns the result to the client.
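The reflection step can be sketched with the standard java.lang.reflect API. The Calculator service and the dispatch helper are hypothetical; the real server decodes the method and parameters from the socket, whereas here they are passed in directly.

```java
import java.lang.reflect.Method;

class ReflectiveDispatch {
    // Hypothetical service object that the server has registered.
    public static class Calculator {
        public int add(int a, int b) { return a + b; }
        public String greet(String name) { return "hello, " + name; }
    }

    // Find the method the client asked for by name and parameter types,
    // then invoke it on the registered instance via reflection.
    static Object dispatch(Object instance, String methodName,
                           Class<?>[] paramTypes, Object[] args) {
        try {
            Method method = instance.getClass().getMethod(methodName, paramTypes);
            return method.invoke(instance, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot invoke " + methodName, e);
        }
    }

    public static void main(String[] args) {
        Calculator calc = new Calculator();
        System.out.println(dispatch(calc, "add",
            new Class<?>[] {int.class, int.class}, new Object[] {2, 3}));   // 5
        System.out.println(dispatch(calc, "greet",
            new Class<?>[] {String.class}, new Object[] {"rpc"}));         // hello, rpc
    }
}
```

Looking the method up by name and parameter types is what lets one generic server loop serve any protocol interface without protocol-specific dispatch code.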
The following describes the key classes on the server side:
2.1 org.apache.hadoop.ipc.Server
This abstract class implements the framework for listening for client connections, processing requests, and returning results to the client. The actual handling of each call is left to a concrete subclass.
2.2 org.apache.hadoop.ipc.RPC$Server
This class is the concrete implementation of org.apache.hadoop.ipc.Server and mainly implements the processing of client requests.
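The division of labour between these two classes follows the template-method pattern: the base class owns the request-handling skeleton and the subclass supplies only call(). The minimal sketch below assumes hypothetical MiniServer and UpperCaseServer classes and uses String in place of Writable; it is not the Hadoop source.

```java
class MiniServerDemo {
    public static void main(String[] args) {
        MiniServer server = new UpperCaseServer();
        System.out.println(server.handleRequest(Object.class, "abc")); // OK:ABC
        System.out.println(server.handleRequest(Object.class, ""));    // ERROR:empty param
    }
}

// Hypothetical miniature of the ipc.Server / RPC.Server split.
abstract class MiniServer {
    // Framework part: in Hadoop this skeleton is driven by the Listener and Handler threads.
    final String handleRequest(Class<?> protocol, String param) {
        long receiveTime = System.currentTimeMillis();
        try {
            return "OK:" + call(protocol, param, receiveTime);
        } catch (Exception e) {
            // Report the failure to the caller instead of letting it kill the server.
            return "ERROR:" + e.getMessage();
        }
    }

    // Subclass part; String stands in for Writable to keep the sketch self-contained.
    protected abstract String call(Class<?> protocol, String param, long receiveTime)
        throws Exception;
}

// Hypothetical concrete server, playing the role RPC.Server plays in Hadoop.
class UpperCaseServer extends MiniServer {
    @Override
    protected String call(Class<?> protocol, String param, long receiveTime) {
        if (param.isEmpty()) {
            throw new IllegalArgumentException("empty param");
        }
        return param.toUpperCase();
    }
}
```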
When the server starts, it launches several threads to handle client requests:
1) Listener thread
This thread listens for client connections and reads incoming data, then wraps the received data in a Call instance and places it in the request queue.
2) Handler thread
This thread takes Call requests off the request queue and processes each one by calling the abstract method
public abstract Writable call(Class<?> protocol, Writable param, long receiveTime)
and then returns the result to the client.
3) Responder thread
The Handler thread normally writes the response back to the client itself. If it cannot send all of the response data at once, the Responder thread takes over and writes the remaining data to the client.
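The Listener/Handler hand-off can be sketched with a BlockingQueue serving as the request queue. The Call record, the list of strings that stands in for socket input, and the upper-casing "business logic" are all assumptions for illustration, and the Responder's partial-write handling is omitted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ListenerHandlerDemo {
    // Hypothetical queued request; Hadoop's Server.Call plays this role.
    static final class Call {
        final int id;
        final String param;
        Call(int id, String param) { this.id = id; this.param = param; }
    }

    private static final Call POISON = new Call(-1, null); // tells a handler to stop

    static List<String> serve(List<String> requests, int handlerCount) {
        BlockingQueue<Call> callQueue = new LinkedBlockingQueue<>();
        List<String> responses = Collections.synchronizedList(new ArrayList<>());

        // Listener: wraps each incoming request in a Call and enqueues it.
        Thread listener = new Thread(() -> {
            try {
                for (int i = 0; i < requests.size(); i++) {
                    callQueue.put(new Call(i, requests.get(i)));
                }
                for (int h = 0; h < handlerCount; h++) {
                    callQueue.put(POISON); // one stop marker per handler
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "listener");

        // Handlers: take Calls off the queue, process them, and record the response.
        List<Thread> handlers = new ArrayList<>();
        for (int h = 0; h < handlerCount; h++) {
            handlers.add(new Thread(() -> {
                try {
                    while (true) {
                        Call call = callQueue.take(); // blocks until a Call is available
                        if (call == POISON) break;
                        responses.add(call.id + ":" + call.param.toUpperCase());
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "handler-" + h));
        }

        listener.start();
        handlers.forEach(Thread::start);
        try {
            listener.join();
            for (Thread t : handlers) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for threads", e);
        }

        List<String> sorted = new ArrayList<>(responses); // handlers finish in any order
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(serve(List.of("a", "b", "c"), 2)); // [0:A, 1:B, 2:C]
    }
}
```

Decoupling the single Listener from several Handlers through a queue means a slow call ties up only one Handler while the Listener keeps accepting new requests.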
3 Client
The core of Hadoop's RPC client is the class org.apache.hadoop.ipc.Client. Working together with the RPC class, which uses Java's dynamic proxy mechanism to generate a proxy for the server's business interface, it sends the invoked business method and its parameters to the server over a socket and waits for the server's response.
The client call sequence is shown below:
1) The RPC client first calls RPC.waitForProxy to obtain a dynamic proxy for the remote service interface. For example, the DataNode obtains a proxy to the NameNode as follows:
```java
this.namenode = (DatanodeProtocol) RPC.waitForProxy(
    DatanodeProtocol.class, DatanodeProtocol.versionID, nameNodeAddr, conf);
```
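As its name suggests, waitForProxy waits for the server to become reachable, retrying rather than failing on the first refused connection, so that a DataNode can start before its NameNode is ready. The loop below is a hedged sketch of that retry idea only; waitFor and Connector are hypothetical names, and the timeout and retry interval are illustrative parameters, not Hadoop's actual defaults.

```java
import java.net.ConnectException;

class WaitForDemo {
    // Hypothetical connection attempt that fails with ConnectException while the server is down.
    interface Connector<T> {
        T connect() throws ConnectException;
    }

    // Retry `connector` until it succeeds or `timeoutMillis` has passed,
    // sleeping `retryMillis` between attempts.
    static <T> T waitFor(Connector<T> connector, long timeoutMillis, long retryMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            try {
                return connector.connect();
            } catch (ConnectException e) {
                if (System.currentTimeMillis() >= deadline) {
                    throw new IllegalStateException("server not reachable before timeout", e);
                }
            }
            try {
                Thread.sleep(retryMillis); // server not up yet; back off and retry
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for server", ie);
            }
        }
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        String proxy = waitFor(() -> {
            if (++attempts[0] < 3) {
                throw new ConnectException("connection refused"); // server still starting
            }
            return "proxy";
        }, 1_000, 10);
        System.out.println(proxy + " after " + attempts[0] + " attempts"); // proxy after 3 attempts
    }
}
```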
2) Inside RPC, the proxy object is created with Java's dynamic proxy facility, Proxy.newProxyInstance:
```java
VersionedProtocol proxy = (VersionedProtocol) Proxy.newProxyInstance(
    protocol.getClassLoader(), new Class[] { protocol },
    new Invoker(addr, ticket, conf, factory));
```
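The same mechanism can be tried in isolation with java.lang.reflect.Proxy. GreetingProtocol and RecordingInvoker below are hypothetical: RecordingInvoker stands in for Hadoop's Invoker, but instead of sending the call over the network it records the method name and fabricates a reply locally.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

class DynamicProxyDemo {
    // Hypothetical business interface the client wants to call.
    interface GreetingProtocol {
        String greet(String name);
        int add(int a, int b);
    }

    // Plays the role of Hadoop's Invoker: every call on the proxy lands in invoke().
    static final class RecordingInvoker implements InvocationHandler {
        final List<String> sent = new ArrayList<>(); // method names that would go over the wire

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            if (method.getDeclaringClass() == Object.class) {
                return method.invoke(this, args); // toString/equals/hashCode stay local
            }
            sent.add(method.getName());
            // A real client would serialize the method and args and send them to the
            // server; this sketch fakes the server's reply locally instead.
            switch (method.getName()) {
                case "greet": return "hello, " + args[0];
                case "add":   return (Integer) args[0] + (Integer) args[1];
                default:      throw new UnsupportedOperationException(method.getName());
            }
        }
    }

    static GreetingProtocol newProxy(InvocationHandler handler) {
        return (GreetingProtocol) Proxy.newProxyInstance(
            GreetingProtocol.class.getClassLoader(),
            new Class<?>[] {GreetingProtocol.class},
            handler);
    }

    public static void main(String[] args) {
        RecordingInvoker invoker = new RecordingInvoker();
        GreetingProtocol proxy = newProxy(invoker);
        System.out.println(proxy.greet("rpc")); // hello, rpc
        System.out.println(proxy.add(2, 3));    // 5
        System.out.println(invoker.sent);       // [greet, add]
    }
}
```

The caller only ever sees the GreetingProtocol interface, which is why RPC code can treat a remote service as if it were a local object.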
3) After obtaining the proxy, the caller invokes business methods on it. Invoker implements InvocationHandler, so every business method call goes through
public Object invoke(Object proxy, Method method, Object[] args)
which hands the method and its arguments to the client:
```java
ObjectWritable value = (ObjectWritable) client.call(
    new Invocation(method, args), address, method.getDeclaringClass(), ticket);
```
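Before the client can send an Invocation, the method name and arguments have to be serialized. The codec below uses a hypothetical, simplified wire format built on DataOutputStream/DataInputStream (method name, argument count, then a type tag and value per argument). Hadoop's own Invocation relies on its Writable serialization instead, and this sketch only supports Integer and String arguments.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class InvocationCodec {
    // Hypothetical in-memory form of a call: method name plus arguments.
    static final class Invocation {
        final String methodName;
        final Object[] args;

        Invocation(String methodName, Object[] args) {
            this.methodName = methodName;
            this.args = args;
        }
    }

    // Made-up layout: method name, arg count, then a one-byte tag and value per argument.
    static byte[] encode(Invocation inv) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(inv.methodName);
            out.writeInt(inv.args.length);
            for (Object arg : inv.args) {
                if (arg instanceof Integer) {
                    out.writeByte('I');
                    out.writeInt((Integer) arg);
                } else if (arg instanceof String) {
                    out.writeByte('S');
                    out.writeUTF((String) arg);
                } else {
                    throw new IllegalArgumentException("unsupported argument type: " + arg);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reverse of encode: read fields back in exactly the order they were written.
    static Invocation decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String methodName = in.readUTF();
            Object[] args = new Object[in.readInt()];
            for (int i = 0; i < args.length; i++) {
                int tag = in.readByte();
                switch (tag) {
                    case 'I': args[i] = in.readInt(); break;
                    case 'S': args[i] = in.readUTF(); break;
                    default: throw new IOException("unknown type tag: " + tag);
                }
            }
            return new Invocation(methodName, args);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = encode(new Invocation("add", new Object[] {2, "x"}));
        Invocation back = decode(wire);
        System.out.println(back.methodName + " " + Arrays.toString(back.args)); // add [2, x]
    }
}
```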
4) Client.call wraps the parameter in a Call instance, obtains a connection to the server, sends the parameter to the server, and then blocks until the server returns the result:
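```java
Call call = new Call(param);
Connection connection = getConnection(addr, protocol, ticket, call);
connection.sendParam(call);   // send the parameter to the server
// ...
synchronized (call) {         // wait() requires holding the call's monitor
  while (!call.done) {
    call.wait();              // wait for the result (InterruptedException handling elided)
  }
  // ...
  return call.value;
}
```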
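The blocking wait in Client.call is the classic guarded-wait idiom: the caller waits on the Call's monitor until the thread that receives the response marks the call done and notifies it. The sketch below isolates that idiom; the Call class here and the callWithSimulatedServer helper are hypothetical, and a sleeping thread stands in for the network and the server.

```java
class CallWaitDemo {
    // Hypothetical pending-call object, modeled on the pattern shown in Client.call.
    static final class Call {
        private Object value;
        private boolean done;

        // Run by the thread that reads the server's response.
        synchronized void complete(Object value) {
            this.value = value;
            this.done = true;
            notifyAll(); // wake the caller blocked in awaitValue()
        }

        // Run by the RPC caller; blocks until complete() has been called.
        synchronized Object awaitValue() throws InterruptedException {
            while (!done) { // the loop also guards against spurious wakeups
                wait();
            }
            return value;
        }
    }

    // Simulates one round trip: a receiver thread delivers `reply` after `delayMillis`.
    static Object callWithSimulatedServer(Object reply, long delayMillis) {
        Call call = new Call();
        Thread receiver = new Thread(() -> {
            try {
                Thread.sleep(delayMillis); // stands in for network and server latency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            call.complete(reply);
        }, "receiver");
        receiver.start();
        try {
            return call.awaitValue();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the reply", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callWithSimulatedServer("result", 50)); // result
    }
}
```

Checking the done flag in a loop, rather than waiting once, means the caller still gets the right answer if the reply arrives before it starts waiting or if it wakes up spuriously.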