How to design an RPC system

Last Update:2016-11-07 Source: Internet

Author: User


Copyright notice: This is an original article by Hanwei. Please cite the source when reprinting.
Original article link: https://www.qcloud.com/community/article/162

Source: Tengyun https://www.qcloud.com/community

RPC is a convenient programming model for network communication. Because it integrates closely with the programming language, it greatly reduces the complexity of handling network data and makes code more readable. However, an RPC system is itself fairly complex: constrained by the programming language, the network model, and usage habits, its design involves many compromises and tradeoffs. This article analyzes several popular RPC implementations to provide a reference for anyone designing an RPC system.

Because the low-level network development underneath RPC generally depends on the specific environment, and the ways of implementing it are very diverse yet invisible to the user, this article largely does not cover how to implement the network layer of an RPC system.

Understanding RPC (remote calls)

We encounter the concept of "remote invocation" in many operating systems and programming language ecosystems. In general, it means using a simple line of code to call a program on another computer over the network. For example:

RMI (Remote Method Invocation): calls a remote method. A "method" is usually attached to an object, so RMI refers to invoking a method of an object on a remote computer.
RPC (Remote Procedure Call): calls a remote procedure, that is, a specific function on another computer on the network.

A remote call is itself a network communication concept, characterized by wrapping network communication in a function-like call. Besides remote calls, network communication generally takes several other forms: packet processing, message queuing, stream filtering, and resource pulling. The following table compares them:

Approach	Programming method	Information encapsulation	Transport model	Typical application
Remote call	Call a function, pass parameters, and get a return value	Variables, types, and functions of the programming language	Send a request, get a response	Java RMI
Packet processing	Call send()/recv(); encode and decode byte data and process its content	Communication content is built into binary protocol packets	Send/receive	UDP programming
Message queuing	Call put()/get(); process the message objects received	Messages are encapsulated as objects or structures usable in the language	Put a message into a queue; take a message out	ActiveMQ
Stream filtering	Read or write a stream, processing each unit of the stream as it passes	Uniform data structure with a very small unit length	Connect; send/receive; process	Network video
Resource pulling	Supply a resource ID and get the resource content	Request and response each consist of header + body	Send a request, then wait for the response	WWW

Around the defining feature of remote invocation, calling a function, the industry has developed similar schemes in many languages, and some attempt to be cross-language. Although remote calls appear to be the easiest form of network programming to use, they also have obvious drawbacks. Understanding their advantages and disadvantages is therefore key to deciding whether to develop or adopt the remote-call model.

The advantages of remote invocation are:

Shielding the network layer. Because the network layer is hidden, different transport protocols and encoding protocols can be chosen freely. For example, the WebService scheme uses HTTP as the transport protocol plus SOAP as the encoding protocol, while REST schemes often use HTTP+JSON. Facebook's Thrift even lets you combine different transport and encoding protocols: you can use TCP + Google Protocol Buffers, or UDP + JSON, and so on. With the network layer shielded, the network part can be optimized independently according to actual needs without touching the business logic code, which is valuable for programs that must run in a variety of network environments.
Function-mapped protocol. You can write data structures and function definitions directly in a programming language instead of writing large amounts of protocol format definitions and packet-handling logic. For systems with very complex business logic, such as online games, this saves a great deal of time otherwise spent defining message formats. The function-call model is also very easy to learn: it requires no knowledge of communication protocols or flows, so less experienced programmers can easily get started with network programming.
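
As a rough illustration of the first advantage, the following Python sketch keeps the business function and the calling code unchanged while the encoding protocol is swapped. Everything in it is hypothetical and not taken from any of the frameworks named above: `LoopbackTransport` stands in for a real TCP/UDP/HTTP transport, and `JsonCodec`/`PickleCodec` stand in for encoding protocols.

```python
import json
import pickle


def add(a, b):
    """Hypothetical business function living on the server side."""
    return a + b


FUNCTIONS = {"add": add}


class JsonCodec:
    def encode(self, obj):
        return json.dumps(obj).encode("utf-8")

    def decode(self, data):
        return json.loads(data.decode("utf-8"))


class PickleCodec:
    # pickle is used only for illustration; it is unsafe for untrusted input.
    def encode(self, obj):
        return pickle.dumps(obj)

    def decode(self, data):
        return pickle.loads(data)


class LoopbackTransport:
    """Stand-in for a real transport: passes request bytes to a handler
    and returns the reply bytes, without touching the network."""

    def __init__(self, handler):
        self.handler = handler

    def round_trip(self, request_bytes):
        return self.handler(request_bytes)


def make_server(codec):
    """Build a request handler that decodes, dispatches, and encodes."""
    def handle(request_bytes):
        request = codec.decode(request_bytes)
        result = FUNCTIONS[request["name"]](*request["args"])
        return codec.encode({"result": result})
    return handle


def call(transport, codec, name, *args):
    """Client side: the caller never sees bytes or sockets."""
    request = codec.encode({"name": name, "args": list(args)})
    return codec.decode(transport.round_trip(request))["result"]


# The same call works with either encoding protocol.
for codec in (JsonCodec(), PickleCodec()):
    transport = LoopbackTransport(make_server(codec))
    print(type(codec).__name__, call(transport, codec, "add", 2, 3))
```

Only `make_server` and `call` know about the codec and transport; `add` and its caller would stay the same if either were replaced.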

Disadvantages of remote invocation:

Higher performance cost. Because network communication is wrapped as a "function", a lot of extra processing is needed, for example generating code in advance or using reflection, both of which cost additional CPU and memory. And to express complex data types, such as the variable-length string/map/list types, more descriptive information must be added to each packet, which increases packet size.
Unnecessary complexity. If you only have a specific, narrow need, such as transferring fixed files, you should use the HTTP/FTP protocol model. For monitoring or IM software, sending and receiving messages with a simple message encoding is faster and more efficient. For a proxy server, stream processing is much simpler. And for data broadcasting, a message queue handles the job easily, whereas remote calls can hardly do it at all.

Therefore, remote invocation is best suited to scenarios where business requirements change frequently and the network environment varies.
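
The packet-size cost listed among the disadvantages can be made concrete with a small comparison. This is only a sketch under assumed data: the field names `player_id`, `x`, and `y` are invented for illustration, the hand-written packet uses Python's `struct` module, and JSON stands in for a self-describing encoding.

```python
import json
import struct

# Hypothetical message: a player ID and a 2D position.
player_id, x, y = 1001, 12.5, -3.25

# Hand-written binary packet: one unsigned 32-bit integer and two
# 32-bit floats in network byte order. Both sides must already agree
# on this layout, so no descriptive information travels in the packet.
fixed_packet = struct.pack("!Iff", player_id, x, y)

# Self-describing encoding of the same values: field names and
# punctuation are sent along with the data on every message.
described_packet = json.dumps(
    {"player_id": player_id, "x": x, "y": y}
).encode("utf-8")

print("fixed layout bytes:    ", len(fixed_packet))
print("self-describing bytes: ", len(described_packet))
```

The fixed layout is compact but must be changed by hand on both sides whenever the message changes, which is the tradeoff the text describes.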

Core issues in an RPC scheme

Since the interface of a remote call is a "function", building this "function" requires decisions on three issues:

How to represent the "remote" information. "Remote" means another location on the network, so a network address is mandatory input. On a TCP/IP network, an IP address and port number identify the entry point of a running program, so specifying an IP address and port is required to initiate a remote call. However, a program may provide many functions and accept many remote calls with different meanings, so how the user selects among these entry points becomes another problem. The simplest approach is one port per call, but an IP address supports at most 65,535 ports, and other network functions need ports too, so this may not be enough. Moreover, a number standing for a function is hard to understand without looking it up in a table. So another approach is needed. Following object-oriented thinking, some schemes group functionality into objects: first specify the object, then specify the method. This matches how programmers think, and EJB is such a scheme. Once objects are used to define the target of a remote call, there must be a way to specify the remote object, and to do that some information must pass from the callee (the server) to the caller (the client). In the simplest scheme, the client sends a string as the object's "name" to the server; the server looks up the object registered under that name and, if found, uses some technique to "transfer" the object to the client, which can then invoke its methods. Of course, this transfer does not copy the whole object's data to the client. Instead, a token or handle representing the server-side object is sent to the client.
If an object-oriented model is not used, the remote function itself must still be located and transferred, because the function you call has to be found first and then exposed as an interface on the client side before it can be called. So the first important issue in remote-call design is how to express the "remote object" (meaning either an object in the object-oriented sense or just a function) so that it can be located on the network, and in what form the client calls it once it has been located.
How the function's interface should be expressed. Remote invocation is constrained by network communication, so it cannot always support every feature of a programming language. For example, pointer-type parameters of C functions cannot be passed over the network. The design must therefore define which language features may be used in a remote function definition and which may not. If the rules are too restrictive, ease of use suffers; if they are too permissive, remote-call performance may suffer. How to mark a function in the programming language as remotely callable is another question. Many schemes use a separate configuration file, while others add special annotations directly to the source code. Generally speaking, a compiled language such as C++ can only generate source code from a configuration file; virtual-machine languages such as C#/Java can use reflection together with a configuration file (or special annotations in the source code in place of the configuration file); scripting languages are simpler still and sometimes need no configuration file at all, because a script can describe itself. In short, what constraints the remote-call interface must satisfy is a question that needs careful thought.
How to implement network communication. The most important implementation detail of remote invocation is network communication. The question of what carries the remote call breaks down into two sub-questions: what kind of service program provides the network functionality, and what communication protocols are used? A remote-call system can implement communication by programming directly against TCP/IP, or it can delegate to other software such as a web server or a message queue server. It can also use a network communication framework such as the open-source Netty or MINA. Communication protocols generally have two layers. One is the transport protocol, such as TCP/UDP, the higher-level HTTP, or a custom transport protocol. The other is the encoding protocol, which defines how programming-language objects are serialized to and deserialized from a binary byte stream. Popular encodings include JSON and Google Protocol Buffers, and many languages, such as Java and C#, ship with their own serialization. Which of these to choose directly affects the performance and environmental compatibility of the remote-call system.
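
The name-then-handle lookup described in the first issue can be sketched as follows. This is a minimal in-process sketch, not how any particular framework does it: `Server`, `Counter`, and the integer handle format are all assumptions made for illustration, and the network hop between client and server is omitted.

```python
import itertools


class Counter:
    """Hypothetical service object; its state stays on the server."""

    def __init__(self):
        self.value = 0

    def increment(self, step):
        self.value += step
        return self.value


class Server:
    def __init__(self):
        self._by_name = {}      # registered name -> object
        self._by_handle = {}    # handle -> object
        self._next_handle = itertools.count(1)

    def register(self, name, obj):
        self._by_name[name] = obj

    def lookup(self, name):
        """Find the object registered under a name and return a small
        handle that represents it, instead of copying the object."""
        obj = self._by_name[name]
        handle = next(self._next_handle)
        self._by_handle[handle] = obj
        return handle

    def invoke(self, handle, method, *args):
        """Call a method on the object that the handle refers to."""
        return getattr(self._by_handle[handle], method)(*args)


server = Server()
server.register("counter", Counter())

# Client side: only the name, the handle, and the arguments would
# cross the network in a real system.
handle = server.lookup("counter")
server.invoke(handle, "increment", 2)
total = server.invoke(handle, "increment", 3)
print(handle, total)
```

Because the object itself never leaves the server, the second call sees the state left behind by the first.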

These three questions are the core choices every remote-call system must make. Each scheme chooses according to the constraints it faces. But there is no universal solution, because in a system this complex, the more features you accommodate, the higher the cost you pay in ease of use and performance overhead. Below, we look at several remote-call schemes in the industry to see how they balance these three issues.

Examples of industry solutions:

CORBA is an "old", ambitious scheme that tries to achieve cross-language communication along with remote calls, and it therefore has the highest complexity, but its design ideas were studied and adopted by many later schemes. To locate communication objects, it uses a URL to define a remote object, which is easy to accept in the Internet age. The content of its objects is limited to C language types and can only be passed by value, which is also easy to understand. To communicate across languages, a language used only for describing remote interfaces had to be designed outside of any programming language: the IDL (Interface Description Language). Interfaces are defined in this language-neutral form, and tools then generate code for each programming language automatically. This approach is almost the only option for compiled languages. CORBA makes no stipulation about communication itself, leaving it to each language implementation, which may be one reason it never became widely popular. CORBA in fact has a well-known successor: Facebook's Thrift framework. Thrift also uses an IDL compiler to generate remote-call code for multiple languages, and it ships complete implementations of the communication layer for C++, Java, and other languages, which makes it particularly attractive among open-source frameworks. Thrift's communication layer has another feature: it can combine different transport and encoding protocols, such as TCP/UDP/HTTP with JSON/BIN/PB, which lets it fit almost any network environment.
The Thrift model is similar. Here the stub ("pile code") is the function-form code that the client calls directly, and the skeleton ("skeleton code") is template code in which the programmer writes the actual remote service functionality, usually by filling in blanks or by inheritance (extension). This stub-skeleton model is standard in almost all remote-call schemes.
Java RMI is the remote-call scheme that ships with the Java virtual machine. It also uses URLs to locate remote objects, and it passes parameter values using Java's own serialization protocol. For interface description, because this is a Java-only scheme, Java's interface type is used directly as the definition language. The user provides the remote service by implementing this interface, while Java automatically generates the client-side calling code from the interface file. The underlying communication is implemented over TCP. Here the interface file serves both as Java's IDL and as the skeleton template that developers fill in with the remote service logic. The stub code is produced directly by the virtual machine, thanks to Java's reflection. Backed by the Java virtual machine, this scheme is very simple to use, and problems can be solved conveniently with standard Java programming techniques, but it runs only in a Java environment, which limits its applicability. You can't have your cake and eat it too: ease of use and applicability often conflict. This is very different from CORBA/Thrift, which pursue the widest applicability and are correspondingly less easy to use.
Windows RPC: RPC support in Windows came early and is fairly complete. It first looks up the object by GUID, then uses C language types to pass parameter values. Because the Windows API is primarily C, RPC uses an IDL to describe the interface, from which .h and .c files are generated to produce the stub and skeleton code. As for the communication mechanism, because it is built into the operating system, it is carried by the kernel's LPC mechanism, which is convenient for users. But it also means calls can only be made between Windows programs.
WebService & REST: In the Internet age, programs need to call each other over the Internet. The most popular protocol on the Internet is HTTP, together with the WWW service, so web services over HTTP are naturally the most popular scheme for cross-system calls. Because most of the Internet's infrastructure can be reused, developing and deploying a web service presents almost no difficulty. In general, it uses URLs to locate remote objects, while parameters are passed through a set of predefined types (primarily the C basic types) and object serialization. For interface generation, you can parse HTTP yourself directly, or you can use specifications such as WSDL or SOAP. In the REST approach, only the four operations PUT/GET/DELETE/POST are defined, and everything else is a parameter.
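
The stub-skeleton model that recurs in these schemes can be sketched in a few lines of Python. This is a hedged sketch rather than the generated code of any real framework: `GreeterSkeleton`, `Greeter`, `Stub`, and `serve` are hypothetical names, JSON is an assumed encoding, and a direct function call replaces the network.

```python
import json


class GreeterSkeleton:
    """Skeleton: the template the service author fills in by inheritance."""

    def greet(self, name):
        raise NotImplementedError


class Greeter(GreeterSkeleton):
    """The actual remote service, written by the programmer."""

    def greet(self, name):
        return "hello, " + name


def serve(impl, request_bytes):
    """Server side: decode the request, dispatch it to the
    implementation, and encode the reply."""
    request = json.loads(request_bytes)
    result = getattr(impl, request["method"])(*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


class Stub:
    """Client-side stub: turns ordinary method calls into request
    messages, so the caller writes what looks like a local call."""

    def __init__(self, send):
        self._send = send  # function that carries bytes to the server

    def __getattr__(self, method):
        def remote_method(*args):
            request = json.dumps(
                {"method": method, "args": list(args)}
            ).encode("utf-8")
            return json.loads(self._send(request))["result"]
        return remote_method


impl = Greeter()
# In a real system `send` would write to a socket; here it calls
# the server function directly.
greeter = Stub(lambda request: serve(impl, request))
print(greeter.greet("world"))
```

In compiled languages the stub would typically be generated from an IDL ahead of time; the dynamic `__getattr__` here plays the role that reflection plays in the Java RMI description above.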

Summarizing the RPC schemes above, the industry generally has the following options for the core issues of remote invocation:

Remote object location: use a URL, or look the object up through a name service.
Parameter passing: use the basic type definitions of C, or some agreed serialization/deserialization scheme.
Interface definition: write the interface definition file directly in some agreed format, or use an IDL to generate these interface files.
Communication carrier: use a server built on a specific protocol such as TCP/UDP, a custom communication model that users can develop themselves, or a higher-level transport such as HTTP or a message queue.

Solution Selection

Having identified the viable options for a remote-call system, we naturally need to weigh the pros and cons of each so that a design that truly fits the requirements can be chosen:

Describing the remote object: URLs are the standard for Internet access, easy for users to understand, and easy to extend later, because a URL is itself a string composed of multiple parts. A name service is the older approach but still has its advantage: it can carry features such as load balancing, disaster tolerance, capacity scaling, and custom routing, which makes complex location requirements easy to implement.
Describing the remote-call interface: if the system is limited to one language, operating system, or platform, it is most convenient to define the remote-call interface directly with the language's own interface description, or by annotating the source code. However, if compiled languages such as C++ must be supported, some kind of IDL is needed to generate source code for them.
Communication carrier: letting users customize the communication module provides the best applicability but also makes the system more complex to use. Carriers such as HTTP or a message queue are relatively simple to deploy, operate, and program against; the drawback is that there is little room to customize performance and transport characteristics.
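
To show why a name service makes load balancing and failover easy to add, here is a minimal in-memory sketch. It assumes a hypothetical `NameService` class with round-robin selection and a manual `mark_down` call; a real name service would add health checks, persistence, and its own network protocol.

```python
class NameService:
    """Hypothetical in-memory name service, for illustration only."""

    def __init__(self):
        self._instances = {}   # service name -> list of (host, port)
        self._healthy = set()  # addresses currently considered usable
        self._cursors = {}     # service name -> round-robin counter

    def register(self, name, address):
        self._instances.setdefault(name, []).append(address)
        self._healthy.add(address)

    def mark_down(self, address):
        """Exclude a failed instance from future lookups."""
        self._healthy.discard(address)

    def resolve(self, name):
        """Return one healthy instance, rotating among them."""
        candidates = [
            a for a in self._instances.get(name, []) if a in self._healthy
        ]
        if not candidates:
            raise LookupError("no healthy instance for " + name)
        cursor = self._cursors.get(name, 0)
        self._cursors[name] = cursor + 1
        return candidates[cursor % len(candidates)]


names = NameService()
names.register("orders", ("10.0.0.1", 9000))
names.register("orders", ("10.0.0.2", 9000))

first = names.resolve("orders")
second = names.resolve("orders")
names.mark_down(("10.0.0.1", 9000))
after_failure = names.resolve("orders")
print(first, second, after_failure)
```

The caller keeps asking for "orders" by name; which machine answers is decided inside the name service, which is the flexibility a fixed URL does not offer by itself.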

After analyzing the core issues, we also need to consider the application scenarios:

Object-oriented or procedural: for a procedural remote call, we only need to locate a function; for an object-oriented one, we need to locate an object. Since a function is stateless, locating it can be as simple as using a name, whereas an object's ID or handle must be found dynamically.
Cross-language or single-language: in a single-language scheme, header files or interface definitions can be handled entirely in that language; in a cross-language scheme, an IDL is indispensable.
Hybrid communication carrier or HTTP server carrier: a hybrid carrier can use low-level technologies such as TCP/UDP/shared memory to deliver optimal performance, but it is inevitably more cumbersome to use. Using an HTTP server is very simple, because there is plenty of open-source software and many libraries for WWW services, and the client can be debugged with a browser or a few JS pages; the drawback is lower performance.

Suppose we now want to design a remote-call system for a domain whose business logic changes a great deal, such as enterprise business applications or game servers. We would probably choose the following:

Locate remote objects through a name service: enterprise services require high availability, and a name service can identify and select an available service object when a name is queried. EJB (Enterprise JavaBeans) in the Java EE scheme uses a name service.
Use an IDL to generate interface definitions: the development languages of enterprise or game services may not be uniform, or a high-performance language such as C++ may be required, so only an IDL will do.
Use a hybrid communication carrier: although enterprise services may not seem to run on very complex networks, the network environments of different enterprises can vary widely, so a general-purpose system should take the trouble to provide a hybrid communication carrier that allows a choice among TCP, UDP, and other protocols.
