The Conceptual Model of RPC and an Analysis of Its Implementation

Source: Internet
Author: User

RPC is one of the technical cornerstones of today's distributed applications, cloud computing, and microservices, but how much do you really know about it? This article is a technical summary of RPC. At more than 5,000 words it is on the long side and may not suit reading in spare moments, so you may want to bookmark it first and read it carefully later.

The table of contents is as follows:

    • Definition
    • Origin
    • Goal
    • Classification
    • Structure
      • Model
      • Decomposition
      • Components
    • Implementation
      • Export
      • Import
      • Protocol
        • Encoding and Decoding
        • Message header
        • Message body
      • Transmission
      • Execution
      • Exceptions
    • Summary
    • References

Two years ago I wrote two articles about RPC. Looking back at them now, their structure and logic are somewhat messy, so I have reorganized and merged them into this single article. Readers who want to understand the principles of RPC may find it useful.

In recent years, service-oriented and microservice architectures have gradually become the mainstream for medium and large distributed systems, and RPC plays a key role in them. We use RPC, implicitly or explicitly, in everyday development. Programmers who are new to it find RPC rather mysterious, while those with years of experience using it are often still unclear about some of its principles. A lack of understanding at the level of principles often leads to misuse in development.

Definition

RPC stands for Remote Procedure Call, a method of inter-process communication. It allows a program to call a procedure or function in another address space (usually on another machine on a shared network) without the programmer explicitly coding the details of the remote invocation. That is, programmers write essentially the same calling code whether the function is local or remote.

Origin

The concept of RPC was introduced in the 1980s by Bruce Jay Nelson (reference [1]). What was the original motivation for developing RPC? In the paper Implementing Remote Procedure Calls (reference [2]), he mentions several points:

Simple: The semantics of RPC are clear and simple, which makes it easier to build distributed computations.
Efficient: Procedure calls appear simple enough to be efficient.
General: In single-machine computing, procedures are often the most important mechanism for communication between different parts of an algorithm.

Put simply, programmers are generally familiar with local procedure calls, so if RPC is made to look exactly like a local call, it is easier to accept and can be used without obstacles. Nelson's paper was published about 30 years ago, yet its view still seems far-sighted today, and the RPC frameworks we use now are basically built around this goal.

Goal

The primary goal of RPC is to make distributed computing (applications) easier to build, providing powerful remote invocation capabilities without losing the semantic simplicity of local calls. To achieve this goal, an RPC framework needs to provide a transparent calling mechanism so that the user does not have to explicitly distinguish between local and remote calls.

Classification

RPC calls are divided into the following two types:

    1. Synchronous invocation: the client waits for the call to finish and obtains the execution result.
    2. Asynchronous invocation: the client does not wait for the execution result to return, but it can still obtain the result through a callback notification. If the client does not care about the result, the call becomes a one-way asynchronous call, which returns no result.

The distinction between synchronous and asynchronous is whether the client waits for the server to finish executing and return the result.
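
The three calling styles can be sketched with the JDK's CompletableFuture. This is only an illustration under stated assumptions: SlowEcho is a hypothetical stand-in for a remote service, and the names callSync, callAsync, and callOneWay are invented here, not taken from any framework.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a remote service; a real one would sit behind a network call.
class SlowEcho {
    static String hi(String s) {
        try {
            Thread.sleep(50); // simulate network and execution latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "echo: " + s;
    }
}

class InvocationStyles {
    // Daemon threads, so the pool never keeps the JVM alive.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r, "rpc-demo");
        t.setDaemon(true);
        return t;
    });

    // Synchronous: the caller blocks until the result is available.
    static String callSync(String arg) {
        return SlowEcho.hi(arg);
    }

    // Asynchronous: the caller gets a future at once and may attach a callback to it.
    static CompletableFuture<String> callAsync(String arg) {
        return CompletableFuture.supplyAsync(() -> SlowEcho.hi(arg), POOL);
    }

    // One-way: fire and forget; no result is ever delivered to the caller.
    static void callOneWay(String arg) {
        POOL.execute(() -> SlowEcho.hi(arg));
    }

    public static void main(String[] args) {
        System.out.println(callSync("sync"));
        CompletableFuture<Void> printed = callAsync("async").thenAccept(System.out::println);
        callOneWay("one-way");
        printed.join(); // only so that the demo does not exit before the callback has run
    }
}
```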

Structure

Below we go step by step through the structure of RPC, from the theoretical model to the actual components.

Model

It was in Nelson's paper that a program implementing RPC was first described as consisting of five theoretical parts:

User
User-stub
RPCRuntime
Server-stub
Server

The five parts relate to each other as follows:

Here User is the client side. When User wants to make a remote call, it actually calls user-stub locally. user-stub encodes the interface, method, and parameters of the call according to the agreed protocol specification and passes them through the local RPCRuntime instance to the remote one. The remote RPCRuntime instance receives the request and hands it to server-stub, which calls Server locally; the result of the call is then returned to User.

Decomposition

The model above gives a coarse-grained conceptual structure for an RPC implementation. Here we refine it further and look at the components it should be made of.

The RPC server exports remote interface methods through RpcServer, and the client imports them through RpcClient. The client calls a remote interface method as if it were a local method: the RPC framework provides a proxy implementation of the interface, and the actual call is delegated to the proxy, RpcProxy. The proxy wraps up the invocation information and hands the call over to RpcInvoker for actual execution. On the client, RpcInvoker uses the connector RpcConnector to maintain the channel RpcChannel to the server, uses RpcProtocol to perform protocol encoding (encode), and sends the encoded request message to the server over the channel.

The RPC server's acceptor, RpcAcceptor, receives the call request from the client and uses the same RpcProtocol to perform protocol decoding (decode).
The decoded invocation information is passed to RpcProcessor, which controls the call process, and finally the call is delegated to RpcInvoker, which actually executes it and returns the result.

Components

We have now broken the RPC implementation structure down into components. The responsibilities of each component are described in detail below.

    1. RpcServer
      Responsible for exporting (export) the remote interface
    2. RpcClient
      Responsible for importing (import) the proxy implementation of the remote interface
    3. RpcProxy
      The proxy implementation of the remote interface
    4. RpcInvoker
      Client: responsible for encoding the invocation information, sending the call request to the server, and waiting for the result to return
      Server: responsible for invoking the concrete implementation of the server-side interface and returning the result
    5. RpcProtocol
      Responsible for protocol encoding and decoding
    6. RpcConnector
      Responsible for maintaining the connection channel between client and server and for sending data to the server
    7. RpcAcceptor
      Responsible for receiving client requests and returning the results
    8. RpcProcessor
      Responsible for controlling the call process on the server, including managing the call thread pool, timeouts, and so on
    9. RpcChannel
      The data transmission channel
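
To make the division of responsibilities concrete, the components could be written down as Java interfaces. The type names come from the list above, but every method signature here is an assumption made for illustration. RpcProxy is omitted because in Java it is typically an InvocationHandler or a generated class rather than a hand-written interface.

```java
// Every signature below is an illustrative assumption; only the type names come from the article.
interface RpcServer {
    void export(Class<?> rpcInterface, Object implementation);
}

interface RpcClient {
    <T> T refer(Class<T> rpcInterface);
}

interface RpcInvoker {
    Object invoke(String interfaceName, String methodName,
                  Class<?>[] parameterTypes, Object[] arguments) throws Exception;
}

interface RpcProtocol {
    byte[] encode(Object message);

    Object decode(byte[] bytes);
}

interface RpcChannel {
    void write(byte[] bytes);
}

interface RpcConnector {
    RpcChannel connect(String host, int port);
}

interface RpcAcceptor {
    void bind(int port);
}

interface RpcProcessor {
    void process(Object request, RpcChannel replyTo);
}
```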

Implementation

The conceptual model given in Nelson's paper became the reference standard for later implementations. CORBA (reference [3]), my first contact with distributed computing more than ten years ago, has an implementation structure basically similar to it. To support RPC across heterogeneous platforms, CORBA uses IDL (Interface Definition Language) to define remote interfaces and map them to specific platform languages.

Later, most cross-language RPC frameworks took the same approach, such as the familiar Web Service (SOAP) and, in recent years, the open-source Thrift. Most of them define interfaces in an IDL, provide tools that generate the user-stub and server-stub for each language platform, and supply RPCRuntime support through a framework library. However, each RPC framework seems to define its own IDL format, which further increases the learning cost for programmers. Web Service did attempt to establish an industry standard, but unfortunately the standard specifications are complex and inefficient; otherwise there would have been no need for more efficient RPC frameworks such as Thrift to appear.

IDL is a compromise forced by the need to support RPC across language platforms; solving a more general problem naturally leads to a more complex solution. RPC within a single platform clearly does not need an intermediate language. Java's native RMI, for example, is more direct and simple for Java programmers and lowers the learning cost.

Having decomposed the components and divided their responsibilities above, we now take an implementation of this conceptual model on the Java platform as the example and analyze in detail the factors that the implementation must consider.

Export

Export means exposing a remote interface: only an exported interface can be called remotely, and an interface that has not been exported cannot. The code for exporting an interface in Java might look like this:

DemoService demo = ...;
RpcServer server = ...;
server.export(DemoService.class, demo, options);

We can export the whole interface or, at a finer granularity, export only some of its methods, as follows:

// Export only the method in DemoService whose signature is hi(String s)
server.export(DemoService.class, "hi", new Class<?>[] { String.class }, options);

Java has one rather special kind of call, the polymorphic call: an interface may have several implementations, so which one does a remote call invoke? For a local call these semantics are provided implicitly by the reference polymorphism of the JVM, but for RPC, which crosses process boundaries, they cannot be implicit. If the DemoService interface above has two implementations, each implementation must be explicitly marked when it is exported, as follows:

DemoService demo = ...;
DemoService demo2 = ...;
RpcServer server = ...;
server.export(DemoService.class, demo, options);
server.export("demo2", DemoService.class, demo2, options);

Here demo2 is another implementation, which we export under the marker demo2.
The same marker must be passed in a remote call so that the correct implementation class is invoked, which resolves the semantics of polymorphic calls.
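
One way the server side might keep track of exported implementations is a registry keyed by marker and interface name. The sketch below is an assumption about how this could work, not the article's actual implementation: ExportRegistry, its lookup method, and the use of an empty string as the default marker are all invented for illustration, and the options argument is left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

interface DemoService {
    String hi(String s);
}

// Hypothetical registry; RpcServer.export(...) in the article is assumed to do something similar.
class ExportRegistry {
    private static final String DEFAULT_MARKER = ""; // assumed marker for unnamed exports
    private final Map<String, Object> exported = new ConcurrentHashMap<>();

    <T> void export(Class<T> rpcInterface, T implementation) {
        export(DEFAULT_MARKER, rpcInterface, implementation);
    }

    <T> void export(String marker, Class<T> rpcInterface, T implementation) {
        if (!rpcInterface.isInterface()) {
            throw new IllegalArgumentException(rpcInterface.getName() + " is not an interface");
        }
        String key = key(marker, rpcInterface.getName());
        if (exported.putIfAbsent(key, implementation) != null) {
            throw new IllegalStateException("already exported: " + key);
        }
    }

    // The marker travels with each request so the server can pick the right implementation.
    Object lookup(String marker, String interfaceName) {
        Object implementation = exported.get(key(marker, interfaceName));
        if (implementation == null) {
            throw new IllegalStateException("not exported: " + key(marker, interfaceName));
        }
        return implementation;
    }

    private static String key(String marker, String interfaceName) {
        return marker + "#" + interfaceName;
    }
}
```

A call that carries no marker resolves to the default export, while a call carrying "demo2" resolves to the second implementation.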

Import

Import is the counterpart of export: for client code to be able to make a call, it must obtain the method or procedure definitions of the remote interface. At present, most cross-language RPC frameworks use a code generator to produce user-stub code from the IDL definition, in which case the actual import is done at compile time. The cross-language RPC frameworks I have used, such as CORBA, Web Service, ICE, and Thrift, all work this way.

Code generation is the inevitable choice for cross-language RPC frameworks, whereas RPC within a single language platform can be implemented by sharing interface definitions.
The code for importing an interface in Java might look like this:

...;
DemoService demo = client.refer(DemoService.class);
demo.hi("how are you?");

import is a keyword in Java, so the code snippet uses refer to express importing an interface. The import here is essentially also code generation, but it happens at run time, which looks more concise than code generation at static compile time. Java offers at least two techniques for dynamic code generation: JDK dynamic proxies and bytecode generation. Dynamic proxies are easier to use than bytecode generation, but they perform worse than directly generated bytecode, while bytecode generation is much worse in terms of code readability. Weighing the two, for a low-level general-purpose framework I personally lean toward putting performance first.
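
A minimal sketch of the JDK dynamic proxy approach follows. java.lang.reflect.Proxy and InvocationHandler are standard JDK APIs; ProxyClient and RemoteCall are hypothetical names, and RemoteCall stands in for the whole "encode, send, wait for the reply" path, which is not shown.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

interface DemoService {
    String hi(String s);
}

// Hypothetical callback standing in for "encode, send, wait for the reply".
interface RemoteCall {
    Object call(String interfaceName, String methodName,
                Class<?>[] parameterTypes, Object[] arguments) throws Exception;
}

class ProxyClient {
    private final RemoteCall remote;

    ProxyClient(RemoteCall remote) {
        this.remote = remote;
    }

    // Named refer(...) as in the snippet above, because import is a Java keyword.
    <T> T refer(Class<T> rpcInterface) {
        InvocationHandler handler = (proxy, method, args) -> {
            // toString, hashCode and equals are answered locally instead of going remote.
            if (method.getDeclaringClass() == Object.class) {
                switch (method.getName()) {
                    case "toString": return "RpcProxy(" + rpcInterface.getName() + ")";
                    case "hashCode": return System.identityHashCode(proxy);
                    case "equals":   return proxy == args[0];
                    default: throw new UnsupportedOperationException(method.getName());
                }
            }
            // Every interface method is turned into invocation data and handed to the remote side.
            return remote.call(rpcInterface.getName(), method.getName(),
                               method.getParameterTypes(), args);
        };
        Object proxy = Proxy.newProxyInstance(
                rpcInterface.getClassLoader(), new Class<?>[] { rpcInterface }, handler);
        return rpcInterface.cast(proxy);
    }
}
```

With a RemoteCall that simply echoes what it receives, client.refer(DemoService.class).hi("x") reaches the callback with the interface name, the method name hi, and the argument "x", which is exactly the information the protocol layer has to encode.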

Protocol

The protocol is the way the data of an RPC invocation is packaged for network transport. It consists of three parts: encoding and decoding, the message header, and the message body.

Encoding and Decoding

Before initiating a call, the client proxy needs to encode the invocation information. This raises the questions of what information must be encoded and in what format it should be transmitted so that the server can complete the call. For efficiency, the less information encoded the better (less data to transmit), and the simpler the encoding rules the better (faster to execute).

Let's start by looking at what needs to be encoded.

Call encoding:

    1. Interface method
      Includes the interface name and the method name
    2. Method parameters
      Includes the parameter types and the parameter values
    3. Call attributes
      Includes call attribute information, such as additional implicit arguments and the call timeout

Return encoding:

    1. Return result
      The return value defined by the interface method
    2. Return code
      The exception return code
    3. Return exception information
      Information about the exception raised by the call
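
The two lists map naturally onto a pair of plain Java message types. The sketch below is illustrative only: RpcRequest, RpcResponse, their field names, and the return code values are assumptions, not the format of any particular framework.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical request type: one field per item in the call encoding list.
final class RpcRequest {
    final String interfaceName;            // interface name
    final String methodName;               // method name
    final Class<?>[] parameterTypes;       // parameter types, needed to pick the right overload
    final Object[] arguments;              // parameter values
    final Map<String, String> attachments; // additional implicit arguments
    final long timeoutMillis;              // call timeout

    RpcRequest(String interfaceName, String methodName, Class<?>[] parameterTypes,
               Object[] arguments, Map<String, String> attachments, long timeoutMillis) {
        if (parameterTypes.length != arguments.length) {
            throw new IllegalArgumentException("parameter types and values must match in number");
        }
        this.interfaceName = interfaceName;
        this.methodName = methodName;
        this.parameterTypes = parameterTypes.clone();
        this.arguments = arguments.clone();
        this.attachments = Collections.unmodifiableMap(new HashMap<>(attachments));
        this.timeoutMillis = timeoutMillis;
    }
}

// Hypothetical response type: one field per item in the return encoding list.
final class RpcResponse {
    static final int OK = 0;              // assumed code values
    static final int BUSINESS_ERROR = 1;
    static final int FRAMEWORK_ERROR = 2;

    final Object result;       // return value; null on failure or for void methods
    final int returnCode;      // return code
    final String errorMessage; // exception information; null on success

    private RpcResponse(Object result, int returnCode, String errorMessage) {
        this.result = result;
        this.returnCode = returnCode;
        this.errorMessage = errorMessage;
    }

    static RpcResponse ok(Object result) {
        return new RpcResponse(result, OK, null);
    }

    static RpcResponse error(int returnCode, String errorMessage) {
        return new RpcResponse(null, returnCode, errorMessage);
    }

    boolean isOk() {
        return returnCode == OK;
    }
}
```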

Message header

Besides the necessary invocation information, we may need some meta-information to make decoding easier and to allow for future extensions. Our encoded message is thus divided into two parts: meta-information and the information necessary for the invocation. When designing an RPC protocol message, the meta-information goes into the protocol message header and the necessary information goes into the protocol message body. The following is a conceptual design for the RPC protocol message header:

    • magic
      Protocol magic number, designed to aid decoding
    • header size
      Protocol header length, designed for extensibility
    • version
      Protocol version, designed for compatibility
    • st
      Serialization type of the message body
    • hb
      Heartbeat message flag, used for transport-layer heartbeats on long-lived connections
    • ow
      One-way message flag
    • rp
      Response message flag; when the bit is not set, the message is a request
    • status code
      Response message status code
    • reserved
      Reserved for byte alignment
    • message id
      Message ID
    • body size
      Message body length
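
The header fields listed above could be packed with java.nio.ByteBuffer as in the sketch below. The article does not give field widths, so everything concrete here is an assumption: the 24-byte layout, the magic value 0xCAFE, and the packing of hb, ow, and rp into three bits of a single flags byte.

```java
import java.nio.ByteBuffer;

// Assumed layout (24 bytes, big-endian): magic 2 | header size 2 | version 1 | st 1 |
// flags 1 (hb, ow, rp bits) | status code 1 | reserved 4 | message id 8 | body size 4
final class RpcHeader {
    static final short MAGIC = (short) 0xCAFE; // assumed magic number
    static final int SIZE = 24;                // assumed fixed header size
    static final int FLAG_HEARTBEAT = 1;       // hb
    static final int FLAG_ONE_WAY = 1 << 1;    // ow
    static final int FLAG_RESPONSE = 1 << 2;   // rp; unset means request

    byte version = 1;
    byte serializationType;
    boolean heartbeat;
    boolean oneWay;
    boolean response;
    byte statusCode;
    long messageId;
    int bodySize;

    ByteBuffer encode() {
        int flags = (heartbeat ? FLAG_HEARTBEAT : 0)
                  | (oneWay ? FLAG_ONE_WAY : 0)
                  | (response ? FLAG_RESPONSE : 0);
        ByteBuffer buf = ByteBuffer.allocate(SIZE); // ByteBuffer is big-endian by default
        buf.putShort(MAGIC);
        buf.putShort((short) SIZE);
        buf.put(version);
        buf.put(serializationType);
        buf.put((byte) flags);
        buf.put(statusCode);
        buf.putInt(0); // reserved
        buf.putLong(messageId);
        buf.putInt(bodySize);
        buf.flip();
        return buf;
    }

    static RpcHeader decode(ByteBuffer buf) {
        if (buf.remaining() < SIZE) throw new IllegalArgumentException("incomplete header");
        if (buf.getShort() != MAGIC) throw new IllegalArgumentException("bad magic number");
        int headerSize = buf.getShort() & 0xFFFF;
        if (headerSize < SIZE) throw new IllegalArgumentException("header too short: " + headerSize);
        RpcHeader header = new RpcHeader();
        header.version = buf.get();
        header.serializationType = buf.get();
        int flags = buf.get();
        header.heartbeat = (flags & FLAG_HEARTBEAT) != 0;
        header.oneWay = (flags & FLAG_ONE_WAY) != 0;
        header.response = (flags & FLAG_RESPONSE) != 0;
        header.statusCode = buf.get();
        buf.getInt(); // skip the reserved bytes
        header.messageId = buf.getLong();
        header.bodySize = buf.getInt();
        // A newer peer may send a longer header; the header size field lets us skip the extra bytes.
        buf.position(buf.position() + (headerSize - SIZE));
        return header;
    }
}
```

The last step of decode shows why a header size field helps with extension: an older decoder can still find the start of the body when a newer peer has appended fields to the header.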

Message body

The message body is often serialized, and the following serialization methods are commonly used:

    • xml
      such as Web Service SOAP
    • json
      such as JSON-RPC
    • binary
      such as Thrift, Hessian, Kryo, etc.

Once the format is determined, encoding and decoding are simple. Because the header has a fixed layout, we are more concerned with how the message body is serialized. For serialization we care about three aspects:

    1. Efficiency: the speed of serialization and deserialization; the faster the better.
    2. Length: the number of bytes after serialization; the smaller the better.
    3. Compatibility: whether serialization and deserialization remain compatible when, for example, a field is added to an interface parameter object.

These three goals often cannot all be satisfied at once, and the trade-offs depend on the implementation details of specific serialization libraries, which this article does not analyze further.

Transmission

After the protocol encoding is done, the encoded RPC request message must be transmitted to the server, and after executing the call the server returns a result message or an acknowledgment message to the client. The application scenario of RPC is essentially a reliable request-response message flow, similar to HTTP. A long-lived TCP connection is therefore the more efficient choice. Unlike HTTP, we define a unique ID for each message at the protocol level, which makes it easier to reuse a connection.

With long-lived connections, the first question is how many connections are needed between the client and the server. From the caller's point of view, single and multiple connections are used in the same way, and for applications that transmit little data a single connection is basically enough. The biggest difference between the two is that each connection has its own private send and receive buffers, so a large volume of data can be spread over the buffers of several connections for better throughput.

So if your data volume is not enough to keep the buffers of a single connection saturated, using multiple connections brings no noticeable improvement and only increases the overhead of connection management.

The connection is established and maintained by the client. If the client and the server are directly connected, the connection is generally not interrupted (except for physical link failures, of course). If they are connected through intermediate devices such as load balancers, those devices may drop a connection that has been inactive for some time. To keep each connection alive, heartbeat data must be sent on it periodically. The heartbeat message is an internal message of the RPC framework library; the protocol header described earlier has a dedicated heartbeat bit to mark it, and it is transparent to the business application.
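
The role of the per-message ID in connection reuse can be sketched as a table of pending calls on the client. PendingCalls and its methods are hypothetical names, and the BiConsumer passed to the constructor is an assumed stand-in for writing a frame to the channel.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BiConsumer;

// Hypothetical client-side bookkeeping for multiplexing many calls over one connection.
class PendingCalls {
    private final AtomicLong nextId = new AtomicLong();
    private final Map<Long, CompletableFuture<byte[]>> pending = new ConcurrentHashMap<>();
    private final BiConsumer<Long, byte[]> wire; // assumed: writes (message id, body) to the channel

    PendingCalls(BiConsumer<Long, byte[]> wire) {
        this.wire = wire;
    }

    CompletableFuture<byte[]> send(byte[] body) {
        long id = nextId.incrementAndGet();
        CompletableFuture<byte[]> future = new CompletableFuture<>();
        pending.put(id, future); // register before writing, so a fast reply is never lost
        future.whenComplete((result, error) -> pending.remove(id)); // also cleans up after failures
        try {
            wire.accept(id, body);
        } catch (RuntimeException e) {
            future.completeExceptionally(e);
        }
        return future;
    }

    // Called by the reader for every response frame; replies may arrive in any order.
    boolean onResponse(long id, byte[] body) {
        CompletableFuture<byte[]> future = pending.get(id);
        return future != null && future.complete(body);
    }

    int inFlight() {
        return pending.size();
    }
}
```

Because each reply is matched by ID rather than by arrival order, a slow call does not hold up later calls that share the same connection.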
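
A heartbeat keeper for one connection might look like the sketch below. It makes one design choice that the text does not prescribe: a heartbeat is sent only when nothing else has been written for a full interval, since ordinary traffic already keeps the connection active. HeartbeatKeeper and its methods are hypothetical names, and the clock is injected so that the logic can be exercised without real waiting.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical per-connection heartbeat logic.
class HeartbeatKeeper {
    private final long idleMillis;
    private final LongSupplier clock;     // e.g. System::currentTimeMillis
    private final Runnable sendHeartbeat; // assumed: writes a frame with the hb flag set
    private volatile long lastWriteAt;

    HeartbeatKeeper(long idleMillis, LongSupplier clock, Runnable sendHeartbeat) {
        this.idleMillis = idleMillis;
        this.clock = clock;
        this.sendHeartbeat = sendHeartbeat;
        this.lastWriteAt = clock.getAsLong();
    }

    // To be called whenever a business message is written to the connection.
    void onWrite() {
        lastWriteAt = clock.getAsLong();
    }

    // Sends a heartbeat only if the connection has been idle long enough; returns whether one was sent.
    boolean tick() {
        long now = clock.getAsLong();
        if (now - lastWriteAt < idleMillis) {
            return false;
        }
        sendHeartbeat.run();
        lastWriteAt = now;
        return true;
    }

    // Runs tick() periodically on a daemon thread; the caller shuts the timer down on close.
    ScheduledExecutorService start() {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "rpc-heartbeat");
            t.setDaemon(true);
            return t;
        });
        timer.scheduleAtFixedRate(this::tick, idleMillis, idleMillis, TimeUnit.MILLISECONDS);
        return timer;
    }
}
```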

Execution

All the client stub does is encode the message and transmit it to the server; the real call takes place on the server. In the decomposition above, the server stub was split into two components, RpcProcessor and RpcInvoker: one controls the calling process and the other performs the actual call. Again taking a Java implementation as the example, let us look at what each of them needs to do.

In Java, dynamically calling an interface's implementation is generally done through reflection. Besides the reflection built into the JDK, some third-party libraries offer better-performing reflective calls, so RpcInvoker encapsulates the implementation details of the reflective call.

What factors must be considered in controlling the calling process, and what call-control services should RpcProcessor provide? Here are a few points as food for thought:
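
With plain JDK reflection, the server-side call could be sketched as follows. ReflectiveInvoker is a hypothetical name. The method is looked up on the exported interface rather than on the implementation class, which is one way to ensure that only interface methods are reachable remotely.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

interface DemoService {
    String hi(String s);
}

// Hypothetical server-side invoker built on JDK reflection.
class ReflectiveInvoker {
    static Object invoke(Class<?> rpcInterface, Object implementation, String methodName,
                         Class<?>[] parameterTypes, Object[] arguments) throws Exception {
        // The parameter types select the right overload; NoSuchMethodException if none matches.
        Method method = rpcInterface.getMethod(methodName, parameterTypes);
        try {
            return method.invoke(implementation, arguments);
        } catch (InvocationTargetException e) {
            // Unwrap, so that the caller sees the exception thrown by the business code itself.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) throw (Exception) cause;
            if (cause instanceof Error) throw (Error) cause;
            throw e;
        }
    }
}
```

A production invoker would probably cache the Method lookups, since getMethod is comparatively slow.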

    1. Efficiency improvement
      Each request should be executed as soon as possible, so we cannot create a new thread for each request; a thread pool service is needed.
    2. Resource isolation
      When several remote interfaces are exported, a single interface must be prevented from occupying all thread resources and blocking calls to the other interfaces.
    3. Timeout control
      When an interface executes slowly and the client has already timed out, it is pointless for the server thread to keep executing.
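
The three points above can be combined in a small sketch: one fixed-size thread pool per exported interface, and a timeout that cancels the work. CallProcessor is a hypothetical name. For brevity the sketch blocks the calling thread while waiting, which a real framework would avoid on its I/O threads.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical call controller for the server side.
class CallProcessor {
    private final Map<String, ExecutorService> pools = new ConcurrentHashMap<>();
    private final int threadsPerInterface;

    CallProcessor(int threadsPerInterface) {
        this.threadsPerInterface = threadsPerInterface;
    }

    // Resource isolation: each interface gets its own pool, so one slow interface
    // cannot use up the threads that serve the others.
    private ExecutorService poolFor(String interfaceName) {
        return pools.computeIfAbsent(interfaceName, name ->
                Executors.newFixedThreadPool(threadsPerInterface, r -> {
                    Thread t = new Thread(r, "rpc-" + name);
                    t.setDaemon(true);
                    return t;
                }));
    }

    <T> T process(String interfaceName, Callable<T> call, long timeoutMillis) throws Exception {
        // Efficiency: pooled threads are reused instead of creating one thread per request.
        Future<T> future = poolFor(interfaceName).submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Timeout control: the client has given up, so interrupt the worker.
            future.cancel(true);
            throw e;
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            throw cause instanceof Exception ? (Exception) cause : e;
        }
    }
}
```

Note that cancel(true) only interrupts the worker thread; business code that never checks for interruption will keep running regardless.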

Exceptions

No matter how hard RPC tries to disguise remote calls as local calls, the two remain very different, and a remote call can fail in ways that a local call never does. Before discussing exception handling, let's compare local calls and RPC calls:

    1. A local call is certain to execute, whereas a remote call is not: the call message may never reach the server because of network problems.
    2. A local call throws only the exceptions declared by the interface, whereas a remote call may also throw runtime exceptions from the RPC framework.
    3. The performance of local and remote calls can differ greatly, depending on how large the inherent overhead of RPC is relative to the work done.

It is these differences that demand extra care when using RPC. When a call to a remote interface throws an exception, it may be a business exception or a runtime exception thrown by the RPC framework (a network outage, for example). A business exception means the server did execute the call but could not complete it normally for some reason, whereas with an RPC runtime exception the server may not have executed the call at all. The caller's exception-handling strategy therefore needs to distinguish the two.

The inherent overhead of RPC is several orders of magnitude higher than that of a local call: a local call costs on the order of nanoseconds, while an RPC costs on the order of milliseconds. An overly lightweight computing task is therefore not suitable for export as a remote interface served by a separate process. Only when the time spent on the computation itself is much greater than the inherent overhead of RPC is it worth exporting as a remote service.
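
One possible caller-side policy that treats the two kinds of exception differently is sketched below. RpcRuntimeException, its requestMayHaveExecuted flag, and RetryPolicy are hypothetical names introduced for this example. The policy retries only when the framework reports that the request never reached the server.

```java
import java.util.concurrent.Callable;

// Hypothetical exception type for failures raised by the RPC framework itself.
class RpcRuntimeException extends RuntimeException {
    final boolean requestMayHaveExecuted;

    RpcRuntimeException(String message, boolean requestMayHaveExecuted) {
        super(message);
        this.requestMayHaveExecuted = requestMayHaveExecuted;
    }
}

class RetryPolicy {
    static <T> T callWithRetry(Callable<T> call, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RpcRuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (RpcRuntimeException e) {
                if (e.requestMayHaveExecuted) {
                    throw e; // repeating could execute the call twice, unless it is idempotent
                }
                last = e; // the request never reached the server, so trying again is safe
            }
            // Any other exception is a business exception: the server did execute the call,
            // so it propagates to the caller unchanged and is not retried.
        }
        throw last;
    }
}
```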

Summary

At this point we have presented a conceptual framework for implementing RPC and analyzed in detail some of the implementation issues that need to be considered. No matter how elegant the concept of RPC is, "there are still a few snakes hiding in the grass". Only with a deep understanding of the nature of RPC can it be applied well.

Readers who have come this far may wonder whether an RPC framework library can really be built by following this conceptual model and implementation analysis. I can answer with certainty: yes. To learn and to validate the model, I developed a minimal RPC framework library based on it, and the code is on GitHub for interested readers. It is part of my own experimental open-source project at https://github.com/mindwind/craft-atom, in which craft-atom-rpc is the micro RPC framework library implemented according to this model. It has far less code than industrial-grade RPC frameworks, which makes it easy to read and learn from.

Finally, anyone who has read this far is surely a diligent reader. Thank you for your time; it makes writing this all the more worthwhile :).

References

[1] Bruce Jay Nelson. Bruce Jay Nelson
[2] Birrell, Nelson. Implementing Remote Procedure Calls. 1983
[3] CORBA. CORBA
[4] Dubbo. Dubbo

