Translation by supermmx
Read the entire "Design for performance" series:
Part 1: Interfaces
Part 2: Reduce object creation
Part 3: Remote interfaces (March 23, 2001)
Part 3: Remote interfaces
Overview
Many Java performance problems stem from class design decisions made early in the design process, long before developers begin to think about performance. In this series, Brian Goetz discusses some common Java performance hazards and explains how to avoid them at design time. In this article, he examines performance issues specific to remote applications.
Remote Call Concept
In a distributed application, an object running on one system can call methods on an object running on another system. This is accomplished with a good deal of machinery that makes the remote object appear to be local. To access a remote object, you must first find it, which is done through a directory or naming service such as the RMI registry, JNDI, or the CORBA naming service.
When you obtain a reference to a remote object through a directory service, you do not get an actual reference to that object but a reference to a stub object that implements the same interface as the remote object. When you call a method on the stub, it marshals the method's parameters, converting them into a byte-stream representation in a process similar to serialization. The stub sends the marshaled parameters over the network to a skeleton object, which unmarshals them and calls the actual method on the real object. The method then returns a value to the skeleton, the skeleton marshals it and sends it to the stub, and the stub unmarshals it and passes it to the caller. Phew! That is a lot of work for a single call. Clearly, despite the surface similarity, a remote method call is far more expensive than a local one.
The description above glosses over some details that matter a great deal for performance. What happens when a remote method returns not a primitive type but an object? It depends. If the returned object is of a type that supports remote method calls, a stub object and a skeleton object are created for it, which also involves a lookup in the registry of remote objects, clearly an expensive operation. (Remote objects are subject to a distributed form of garbage collection, in which a maintenance thread in each participating JVM communicates with the maintenance threads in the other JVMs, passing reference information back and forth.) If the returned object does not support remote calls, all of its fields and every object it references must be marshaled, which can also be expensive.
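The marshal-and-unmarshal round trip described in this section can be imitated in a single JVM. The following is only a rough sketch, assuming plain Java serialization as a stand-in for what a stub and skeleton do; the class MarshalSketch, its methods, and the "concat" operation are hypothetical names invented for illustration, and no real network or RMI runtime is involved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch: imitates the work a stub and a skeleton do for one call.
class MarshalSketch {

    // "Stub" side: flatten the method name and its arguments into bytes.
    static byte[] marshal(String method, Object... args) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeUTF(method);
            out.writeObject(args);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // "Skeleton" side: rebuild the arguments and invoke the real method.
    static String unmarshalAndInvoke(byte[] request) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(request))) {
            String method = in.readUTF();
            Object[] args = (Object[]) in.readObject();
            if (method.equals("concat")) {
                return (String) args[0] + (String) args[1];
            }
            throw new IllegalArgumentException("unknown method: " + method);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] request = marshal("concat", "remote ", "call");
        System.out.println(request.length + " bytes would cross the network");
        System.out.println(unmarshalAndInvoke(request));
    }
}
```

Even in this toy version, a call that would be a single method dispatch locally turns into stream construction, object graph traversal, and a copy of every argument. A real remote call adds the network transfer and the same work again for the return value.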
Performance Comparison between remote and local method calls
Accessing a remote object has different performance characteristics from accessing a local one. Creating a remote object costs more than creating a local object: not only must the object itself be created if it does not already exist, but the stub and skeleton objects must also be created and made aware of each other.
Remote method calls also involve a network round trip. The marshaled parameters must be sent to the remote system, and the response must be marshaled and sent back before the caller regains control. The delays from marshaling, unmarshaling, network latency, and the actual remote call all add up, and the client usually waits for every one of these steps to complete. A remote call therefore depends heavily on the latency of the underlying network.
Different data types have different marshaling costs. Marshaling a primitive type is relatively cheap; marshaling a simple object such as a Point or a String costs more; marshaling a remote object costs more still; and marshaling an object that references many other objects (such as a collection) costs even more. This is in complete contrast to local calls, where passing a reference to a complex object costs no more than passing a reference to a simple one.
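One rough way to see the difference in marshaling cost is to compare how many bytes each kind of value occupies once serialized. The sketch below assumes serialized size is a reasonable proxy for marshaling cost, which ignores CPU time and the extra work needed for remote objects; the class MarshalCost and its nested Point type are hypothetical and exist only for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: serialized size as a stand-in for marshaling cost.
class MarshalCost {

    // Simple value type standing in for "a Point".
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Something that writes one value to the stream.
    interface Payload {
        void writeTo(ObjectOutputStream out) throws IOException;
    }

    // Serializes the payload into memory and reports the byte count.
    static int marshaledSize(Payload payload) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            payload.writeTo(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    static int primitiveSize() {
        return marshaledSize(out -> out.writeInt(42));
    }

    static int pointSize() {
        return marshaledSize(out -> out.writeObject(new Point(1, 2)));
    }

    static int listSize(int n) {
        List<Point> points = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            points.add(new Point(i, i));
        }
        return marshaledSize(out -> out.writeObject(points));
    }

    public static void main(String[] args) {
        System.out.println("int:             " + primitiveSize() + " bytes");
        System.out.println("Point:           " + pointSize() + " bytes");
        System.out.println("List of 1000:    " + listSize(1000) + " bytes");
    }
}
```

Locally, all three values would be passed as a single primitive or reference. Once marshaled, the collection is far larger than the single Point, which in turn is larger than the int.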
Interface Design is the key
A poorly designed remote interface can completely destroy a program's performance. Unfortunately, the characteristics that make a good interface for local objects may not suit remote objects. Creating a large number of temporary objects, as discussed in Parts 1 and 2 of this series, can also hamper distributed applications, but excessive round trips are an even bigger performance problem. As a result, calling one method that returns several values in a temporary object (such as a Point) may be more efficient than making several separate calls.
Some important performance guidelines for practical remote applications:
Beware of unnecessary round trips. If a caller needs to retrieve several related items at the same time, make it easy to do so in a single remote call if possible.
Be wary of returning remote objects when the caller may not need to keep a reference to the remote object.
Be wary of passing complex objects when the remote object does not need a copy of the object.
Fortunately, you can spot many of these problems simply by looking at the remote object's interface. The sequence of method calls required to perform any high-level action is evident from the class interface. If you see that a common high-level operation requires many consecutive remote method calls, treat it as a warning sign that you may need to revisit the class interface.
Tips for reducing the cost of remote calls
Consider the following example of an application that manages an organizational directory: a remote Directory object contains references to DirectoryEntry objects, each of which represents a phone book entry.
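import java.rmi.Remote;
import java.rmi.RemoteException;

// RMI requires every remote method to declare RemoteException.
public interface Directory extends Remote {
  DirectoryEntry[] getEntries() throws RemoteException;
  void addEntry(DirectoryEntry entry) throws RemoteException;
  void removeEntry(DirectoryEntry entry) throws RemoteException;
}
public interface DirectoryEntry extends Remote {
  String getName() throws RemoteException;
  String getPhoneNumber() throws RemoteException;
  String getEmailAddress() throws RemoteException;
}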
Now suppose you want to use Directory in a GUI email program. The program first calls getEntries() to obtain the list of entries, and then calls getName() on each entry to build the list of names to display. When the user selects one, the application calls getEmailAddress() on the corresponding entry to obtain the email address.
How many remote method calls must occur before you can write an email? You must call getEntries() once, getName() once for each entry in the address book, and getEmailAddress() once. Therefore, if there are N entries in the address book, you must make N + 2 remote calls. Note that you also need to create N + 1 remote object references, which is also a very expensive operation. If your address book has many entries, this not only makes the email window slow to open but also generates heavy network traffic and puts a high load on the directory server, leading to scalability problems.
Now consider an enhanced Directory interface:
public interface Directory extends Remote {
  String[] getNames() throws RemoteException;
  DirectoryEntry[] getEntries() throws RemoteException;
  DirectoryEntry getEntryByName(String name) throws RemoteException;
  void addEntry(DirectoryEntry entry) throws RemoteException;
  void removeEntry(DirectoryEntry entry) throws RemoteException;
}
How much does this reduce the cost for your email program? Now you can call Directory.getNames() once to get all the names at the same time, and call getEntryByName() only for the entry you want to send email to. This process requires three remote method calls rather than N + 2, and two remote objects rather than N + 1. If the address book has more than a few names, this reduction makes a big difference in application responsiveness, network load, and system load.
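The two call counts can be checked with a small in-process simulation. This is only a sketch under stated assumptions: the classes below are hypothetical stand-ins that increment a counter wherever a real RMI call would cross the network, and the example.com addresses are placeholder data.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// In-process stand-in for the remote Directory. Every method that would be
// a remote call bumps a counter instead. No real RMI is involved.
class RoundTripCount {
    static int remoteCalls = 0;

    static class Entry {
        private final String name;
        private final String email;
        Entry(String name, String email) { this.name = name; this.email = email; }
        String getName() { remoteCalls++; return name; }
        String getEmailAddress() { remoteCalls++; return email; }
    }

    static class Directory {
        private final Map<String, Entry> entries = new LinkedHashMap<>();
        Directory(int n) {
            for (int i = 0; i < n; i++) {
                String name = "user" + i;
                entries.put(name, new Entry(name, name + "@example.com"));
            }
        }
        Entry[] getEntries() { remoteCalls++; return entries.values().toArray(new Entry[0]); }
        String[] getNames() { remoteCalls++; return entries.keySet().toArray(new String[0]); }
        Entry getEntryByName(String name) { remoteCalls++; return entries.get(name); }
    }

    // Original interface: fetch all entries, ask each for its name,
    // then ask the selected entry for its address.
    static int originalDesign(int n) {
        remoteCalls = 0;
        Directory dir = new Directory(n);
        Entry[] all = dir.getEntries();
        for (Entry e : all) {
            e.getName();
        }
        all[0].getEmailAddress();
        return remoteCalls;
    }

    // Enhanced interface: fetch the names once, then only the selected entry.
    static int enhancedDesign(int n) {
        remoteCalls = 0;
        Directory dir = new Directory(n);
        String[] names = dir.getNames();
        dir.getEntryByName(names[0]).getEmailAddress();
        return remoteCalls;
    }

    public static void main(String[] args) {
        System.out.println(originalDesign(100)); // prints 102 (N + 2)
        System.out.println(enhancedDesign(100)); // prints 3
    }
}
```

The count for the enhanced design stays at three no matter how large the directory grows, while the original design grows linearly with the number of entries.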
This technique for reducing the cost of remote calls and reference passing is called using a secondary object identifier. Instead of returning a remote object, you pass around an identifying attribute of the object (in this example, its name) as a lightweight identifier for the object. The secondary identifier contains enough information about the object it describes that you only need to fetch the remote objects you actually need. In this example, a person's name is a good secondary identifier. As another example, in a securities portfolio management system, a purchase ID number might be a good secondary identifier.
Another way to reduce the number of remote calls is bulk retrieval. You can add a method to the Directory interface that retrieves several of the DirectoryEntry objects you need at once:
public interface Directory extends Remote {
  String[] getNames() throws RemoteException;
  DirectoryEntry[] getEntries() throws RemoteException;
  DirectoryEntry getEntryByName(String name) throws RemoteException;
  DirectoryEntry[] getEntriesByName(String[] names) throws RemoteException;
  void addEntry(DirectoryEntry entry) throws RemoteException;
  void removeEntry(DirectoryEntry entry) throws RemoteException;
}
Now you can retrieve not just one remote DirectoryEntry but all the entries you need with a single remote method call. Although this does not reduce the marshaling cost, it greatly reduces the number of network round trips. If network latency is significant, this yields a more responsive system (and can also reduce overall network usage).
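The saving in round trips can be illustrated with another counting sketch. It is a simplification under stated assumptions: the class BulkRetrieval is hypothetical, each entry is reduced to an email string to keep the code short, and a counter stands in for the network.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: counts round trips for one-at-a-time lookups versus
// a single bulk lookup. Entries are plain strings here for brevity.
class BulkRetrieval {
    static int roundTrips = 0;

    private static final Map<String, String> STORE = new HashMap<>();
    static {
        STORE.put("alice", "alice@example.com");
        STORE.put("bob", "bob@example.com");
        STORE.put("carol", "carol@example.com");
    }

    // Would be one network round trip per name.
    static String getEntryByName(String name) {
        roundTrips++;
        return STORE.get(name);
    }

    // Would be one network round trip for the whole batch.
    static String[] getEntriesByName(String[] names) {
        roundTrips++;
        String[] result = new String[names.length];
        for (int i = 0; i < names.length; i++) {
            result[i] = STORE.get(names[i]);
        }
        return result;
    }

    static int oneAtATime(String[] names) {
        roundTrips = 0;
        for (String name : names) {
            getEntryByName(name);
        }
        return roundTrips;
    }

    static int inBulk(String[] names) {
        roundTrips = 0;
        getEntriesByName(names);
        return roundTrips;
    }

    public static void main(String[] args) {
        String[] wanted = {"alice", "bob", "carol"};
        System.out.println(oneAtATime(wanted)); // prints 3
        System.out.println(inBulk(wanted));     // prints 1
    }
}
```

The same amount of data is returned either way; only the number of times the caller pays the network latency changes.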
A third technique for lightening the load on the RMI layer is to stop making DirectoryEntry a remote object and instead define it as an ordinary object with accessor methods for the name, address, email address, and other fields. (In a CORBA system, you would use the analogous objects-by-value mechanism.) Then, when the email application calls getEntryByName(), it gets the entry object by value: no stub or skeleton object needs to be created, and getEmailAddress() becomes a local call rather than a remote one.
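A by-value entry might look like the sketch below. It assumes the entry is made Serializable so that RMI copies it to the caller instead of exporting it; the names ReturnByValueSketch, Entry, and ValueDirectory are hypothetical, and copyByValue only imitates the copy with in-memory serialization rather than performing a real remote call.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical sketch of the return-by-value variant of the directory.
class ReturnByValueSketch {

    // An ordinary serializable object, not a remote one: its getters are
    // plain local calls on the caller's own copy.
    static class Entry implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String name;
        private final String phoneNumber;
        private final String emailAddress;

        Entry(String name, String phoneNumber, String emailAddress) {
            this.name = name;
            this.phoneNumber = phoneNumber;
            this.emailAddress = emailAddress;
        }
        String getName() { return name; }
        String getPhoneNumber() { return phoneNumber; }
        String getEmailAddress() { return emailAddress; }
    }

    // Only the directory itself remains remote.
    interface ValueDirectory extends Remote {
        String[] getNames() throws RemoteException;
        Entry getEntryByName(String name) throws RemoteException;
    }

    // Imitates what happens to a Serializable return value: the caller
    // receives a copy, not a stub.
    static Entry copyByValue(Entry original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Entry) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Entry onServer = new Entry("Ann", "555-0100", "ann@example.com");
        Entry onClient = copyByValue(onServer);
        System.out.println(onClient.getEmailAddress()); // local call on the copy
    }
}
```

The trade-off is that the caller holds a snapshot: later changes to the entry on the server are not visible through the copy.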
Of course, all of these techniques depend on understanding how the remote objects will actually be used, but with that understanding you do not even need to look at the implementation of the remote classes to spot some potentially serious performance problems.
Conclusion
The performance characteristics of distributed applications are fundamentally different from those of local applications. Many operations that are cheap in a local program are very expensive in a remote one. A poorly designed remote interface can cause severe scalability and performance problems for a program.
Fortunately, by examining common use cases and analyzing how often they trigger expensive operations (such as remote calls and remote object creation), it is easy to identify and fix many common distributed performance problems at design time. Judicious use of the techniques described here (secondary object identifiers, bulk retrieval, and return-by-value) can improve both user response time and the throughput of the entire system.
About the author
Brian Goetz is a professional software developer with more than 15 years of experience. He is a principal consultant at Quiotix, a software development and consulting firm located in Los Altos, Calif.