How are service best practices implemented under a distributed service framework?

Source: Internet
Author: User

After a business is servicified, performance, reliability, and other issues become increasingly apparent. Faced with the many challenges that follow servicification, how do we analyze them and arrive at practical best solutions?

Before servicification, business logic is usually invoked through local API calls, and local method calls incur little performance loss. After servicification, the service provider and consumer communicate over a remote network, which adds extra performance overhead: business call latency increases, and transient network failures and similar problems raise the risk of distributed call failures. If the service framework lacks sufficient fault-tolerance capability, the business failure rate will rise significantly.

Beyond performance and reliability, cross-node transactional consistency, the difficulty of troubleshooting failed distributed calls, and the operation and maintenance cost of a large number of microservices are also problems a distributed service framework must solve. This article analyzes the challenges faced after servicification and provides solutions and business best practices.

1 Performance and latency issues

Before servicification, the business typically uses local API calls, and local method calls incur little performance loss. After servicification, the service provider and consumer communicate over a remote network, which introduces additional performance overhead:

    1. The client needs to serialize the request message, which mainly consumes CPU resources.

    2. Serialization requires creating a binary array, which consumes JVM heap memory or off-heap memory.

    3. The client needs to send the serialized binary array to the server, consuming network bandwidth.

    4. After the server reads the code stream, it needs to deserialize the request datagram into a request object, consuming CPU resources.

    5. The server invokes the service provider's implementation class through reflection, and reflection itself has a significant performance impact.

    6. The server serializes the response result, consuming CPU resources.

    7. The server sends the response stream to the client, consuming network bandwidth.

    8. The client reads the response stream and deserializes it into a response message, consuming CPU resources.

Through this analysis, we find that a simple local method call, once switched to a remote service invocation, gains many additional processing steps that not only occupy system resources but also increase latency. Some complex applications are broken into multiple services, forming service call chains; if the service framework performs poorly, service call latency becomes large, and the performance and latency after servicification will not meet the business's requirements.

1.1 RPC Framework High performance design

There are three main factors that affect RPC framework performance.

1) I/O scheduling model: synchronous blocking I/O (BIO) or non-blocking I/O (NIO).

2) Serialization framework selection: text protocol, binary protocol, or compressed binary protocol.

3) Thread scheduling model: serial or parallel scheduling, lock contention or lock-free algorithms.

1. I/O scheduling model

In I/O programming, multi-threading or I/O multiplexing can be used to handle multiple client access requests at the same time. I/O multiplexing allows multiple client requests to be processed in a single thread by multiplexing the blocking of many I/O channels onto the same select call. Compared with the traditional multi-thread/multi-process model, the biggest advantage of I/O multiplexing is its small system overhead: the system does not need to create additional processes or threads, nor maintain their execution, which reduces maintenance workload and saves system resources.
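As a minimal illustration of the multiplexing idea, the sketch below uses the JDK's `Selector` so that a single thread dispatches both the accept event and the read event of a connection; the class and method names are invented for this example, and the blocking client exists only to drive the demo:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

public class MultiplexDemo {

    // One thread, one Selector: the accept event and the read event of the
    // connection are both dispatched from the same select() loop.
    static String echoOnce(String msg) {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            // A plain blocking client, just to drive the demo.
            SocketChannel client = SocketChannel.open(
                new InetSocketAddress("127.0.0.1", server.socket().getLocalPort()));
            client.write(ByteBuffer.wrap(msg.getBytes(StandardCharsets.UTF_8)));

            boolean echoed = false;
            while (!echoed) {
                selector.select();                      // block until a channel is ready
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isAcceptable()) {           // new connection: register for reads
                        SocketChannel ch = server.accept();
                        ch.configureBlocking(false);
                        ch.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {      // data arrived: echo it back
                        SocketChannel ch = (SocketChannel) key.channel();
                        ByteBuffer buf = ByteBuffer.allocate(256);
                        ch.read(buf);
                        buf.flip();
                        ch.write(buf);
                        echoed = true;
                    }
                }
                selector.selectedKeys().clear();
            }

            ByteBuffer reply = ByteBuffer.allocate(256);
            client.read(reply);
            reply.flip();
            client.close();
            return StandardCharsets.UTF_8.decode(reply).toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(echoOnce("ping"));   // prints "ping"
    }
}
```

The same selector loop would keep serving further connections if the `echoed` flag were removed; the point is that no thread is dedicated to any single client.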

Since JDK 1.5_update10, the NIO implementation uses epoll instead of the traditional select/poll, which greatly improves NIO communication performance. Its working principle is shown in Figure 1-1.

Figure 1-1 How non-blocking I/O works

Netty is an open-source, high-performance NIO communication framework. Its I/O thread, NioEventLoop, aggregates the multiplexer (Selector) and can concurrently process hundreds of client Channels. Because read and write operations are non-blocking, this improves I/O thread efficiency and avoids thread hangs caused by frequent I/O blocking. In addition, because Netty uses an asynchronous communication mode, one I/O thread can concurrently handle N client connections and their read/write operations, which fundamentally solves the problems of the traditional synchronous blocking one-connection-one-thread model; the performance, elasticity, and reliability of the architecture are all greatly improved.

Netty's careful design provides a number of performance-enhancing features that place it among the best of the NIO frameworks. Its performance optimization measures are summarized as follows:

  1. Zero copy: (1) Netty's receive and send ByteBuffers use direct buffers, reading and writing the socket with off-heap direct memory, so no second copy of the byte buffer is needed. If traditional heap buffers were used to read and write the socket, the JVM would copy the heap buffer into direct memory before writing to the socket; compared with off-heap direct memory, the message undergoes one extra buffer copy during sending. (2) Netty provides the composite buffer object CompositeByteBuf, which aggregates multiple ByteBuffer objects and lets the user operate on the combination as conveniently as on a single buffer, avoiding the traditional approach of merging several small buffers into one large buffer through memory copies. (3) Netty's file transfer uses the transferTo method, which sends the data in the file channel directly to the target channel, avoiding the memory copies caused by the traditional cyclic-write approach.

  2. Memory pools: with the development of JVM virtual machines and JIT just-in-time compilation, object allocation and collection has become a very lightweight task. But for buffers, the situation is slightly different, especially the allocation and collection of off-heap direct memory, which is a time-consuming operation. To reuse buffers as much as possible, Netty provides a buffer reuse mechanism based on memory pools. Performance tests show pooled ByteBuf allocation performing roughly 23 times better than unpooled ByteBuf, a figure strongly correlated with the specific usage scenario.

  3. Lock-free serial design: in most scenarios, parallel multi-threading improves system concurrency. However, if concurrent access to shared resources is handled improperly, severe lock contention results, which ultimately degrades performance. To avoid the performance loss of lock contention as much as possible, a serial design can be adopted: the processing of a message is completed in the same thread wherever possible, with no thread switching, so multi-thread contention and synchronization locks are avoided. To maximize performance, Netty adopts a serial lock-free design, performing serial operations inside the I/O thread to avoid performance degradation caused by multi-thread contention. On the surface, this serial design seems to give low CPU utilization and insufficient concurrency; however, by adjusting the thread parameters of the NIO thread pool, multiple serialized threads can be started to run in parallel at the same time. This locally lock-free serial thread design performs better than the one-queue, multiple-worker-threads model.

  4. Efficient concurrent programming: extensive and correct use of volatile, widespread use of CAS and the atomic classes, use of thread-safe containers, and improving concurrency performance through read-write locks.
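The lock-free serial idea in item 3 can be sketched without Netty: pin each connection to one single-threaded executor, so all of that connection's events run serially and its handler state needs no synchronization. The class, method names, and sizes below are invented for this illustration:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SerialNoLockDemo {

    static final int N_LOOPS = 2;                      // stand-ins for "I/O threads"
    final ExecutorService[] loops = new ExecutorService[N_LOOPS];
    final int[] counters = new int[16];                // per-connection state, no locks

    SerialNoLockDemo() {
        for (int i = 0; i < N_LOOPS; i++) {
            loops[i] = Executors.newSingleThreadExecutor();
        }
    }

    // A connection is pinned to exactly one single-threaded executor, so all
    // of its events run serially and its state needs no synchronization.
    void onMessage(int connId) {
        loops[connId % N_LOOPS].execute(() -> counters[connId]++);
    }

    int drain(int connId) {
        try {
            for (ExecutorService e : loops) {
                e.shutdown();
                e.awaitTermination(5, TimeUnit.SECONDS);
            }
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return counters[connId];
    }

    static int run(int connId, int events) {
        SerialNoLockDemo demo = new SerialNoLockDemo();
        for (int i = 0; i < events; i++) {
            demo.onMessage(connId);                    // may be submitted from any thread
        }
        return demo.drain(connId);
    }

    public static void main(String[] args) {
        System.out.println(run(7, 10000));             // prints 10000: no lost updates, no locks
    }
}
```

Even though `counters[connId]++` is not atomic, no update is ever lost, because every event for connection 7 executes on the same executor thread.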

2. High-performance serialization framework

The key factors that affect serialization performance are summarized as follows.

    1. The size of the code stream after serialization (network bandwidth occupancy).

    2. Serialization and deserialization performance (CPU resource consumption).

    3. Whether cross-language use is supported (integrating heterogeneous systems and switching development languages).

    4. Performance under concurrent calls: stability, linear scalability, and whether occasional latency spikes occur.

Compared with text protocols such as JSON, binary serialization frameworks perform better. Taking Java native serialization and Protobuf binary serialization as examples, a performance test comparison gives the results shown in Figure 1-2.

Figure 1-2 Serialization performance test comparison data

When selecting a serialization framework, if there is no special requirement, choose a well-performing binary serialization framework whenever possible. Whether to compress the code stream should be decided flexibly according to the communication content: for pictures, audio, and text files with a large amount of repeated content (such as fiction), the code stream can be compressed; commonly used techniques include compact encodings such as zig-zag and the like.
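As a rough illustration of why compact binary encodings produce smaller code streams than JDK-native serialization, the sketch below serializes the same hypothetical request object both ways and compares the byte counts (the `Order` class and its fields are invented for this example; the hand-rolled encoding plays the role of a compact binary protocol, writing fields only, with no class metadata):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerializationSizeDemo {

    // A typical request object; Serializable for the JDK-native path.
    static class Order implements Serializable {
        long orderId = 10001L;
        int quantity = 3;
        String sku = "BOOK-42";
    }

    static int jdkSerializedSize(Order o) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(o);            // writes class metadata + fields
            }
            return bos.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static int handRolledBinarySize(Order o) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (DataOutputStream dos = new DataOutputStream(bos)) {
                dos.writeLong(o.orderId);      // 8 bytes
                dos.writeInt(o.quantity);      // 4 bytes
                dos.writeUTF(o.sku);           // 2-byte length + UTF-8 bytes
            }
            return bos.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Order o = new Order();
        System.out.println("JDK native: " + jdkSerializedSize(o) + " bytes");
        System.out.println("compact binary: " + handRolledBinarySize(o) + " bytes");
    }
}
```

Real frameworks such as Protobuf add field tags and varint encoding on top of this idea, but the size gap versus JDK-native serialization comes mostly from dropping per-object class metadata.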

3. High-performance reactor threading model

The characteristics of the model are summarized as follows.

    1. There is a dedicated NIO thread, the acceptor thread, which listens on the server port and accepts TCP connection requests from clients.

    2. Network I/O operations (read, write, and so on) are performed by an NIO thread pool, which can be implemented with a standard JDK thread pool containing a task queue and N available threads; these NIO threads are responsible for reading, decoding, encoding, and sending messages.

    3. One NIO thread can handle N links at the same time, but each link corresponds to only one NIO thread, which prevents concurrent operation problems.

Because the reactor pattern uses asynchronous non-blocking I/O, no I/O operation causes blocking; in theory one thread can independently handle all I/O-related operations, so in most scenarios the multi-threaded reactor model can fully meet business performance requirements.

The reactor thread scheduling model works as shown in Figure 1-3.

Figure 1-3 High-performance reactor thread scheduling model

1.2 Business Best Practices

To ensure high performance, relying solely on the distributed service framework is not enough; the application must cooperate as well. Application-level high-performance service practices are summarized as follows:

    1. Where possible, use asynchronous or parallel service invocation to improve service throughput and effectively reduce service call latency.

    2. Whether for the NIO communication framework's thread pool or the back-end business thread pool, thread parameters must be configured reasonably. If you use the JDK's default thread pool, it is recommended that the maximum number of threads not exceed 20: the JDK thread pool by default has N threads contending for one synchronous blocking queue, and when the number of threads is too large, fierce lock contention results; performance then falls rather than improves.

    3. Minimize the size of the code stream to be transferred to improve performance. In a local call, parameter size has no effect on performance because access happens within the same heap memory. In cross-process communication, a complex object is often passed; if the service provider explicitly uses only a few of its fields or one aggregated object, do not pass the entire complex object. For example, object A holds 8 primitive-type fields and 2 complex objects B and C; if the service provider explicitly needs only the C object aggregated by A, the request parameter should be C, not the whole of A.

    4. Set appropriate client timeouts to prevent a slow-responding server from blocking business threads waiting for responses during traffic spikes, which causes messages for subsequent services to queue up and the failure to spread.

    5. Important services can be deployed into separate service thread pools, isolated from other non-core services, to guarantee the efficient operation of core services.

    6. Use lightweight OS containers such as Docker to deploy services, isolating services at the physical resource layer and avoiding the 20%-plus performance loss caused by traditional virtualization.

    7. Set a reasonable priority for service scheduling and make real-time adjustments based on on-line performance monitoring data.
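The asynchronous/parallel invocation of item 1 and the client timeout of item 4 can both be sketched with `CompletableFuture`; the two "services" below are stand-ins whose names and latencies are invented for this example:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelInvokeDemo {

    // Stand-ins for two independent remote services (names are illustrative).
    static String queryUser(int id)    { sleep(50); return "user-" + id; }
    static String queryBalance(int id) { sleep(50); return "balance-" + id; }

    static void sleep(long ms) {
        try { Thread.sleep(ms); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    static String portalPage(int id) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Fire both calls in parallel; total latency is ~max(50, 50) ms,
            // not 50 + 50 ms as with sequential synchronous calls.
            CompletableFuture<String> user =
                CompletableFuture.supplyAsync(() -> queryUser(id), pool);
            CompletableFuture<String> balance =
                CompletableFuture.supplyAsync(() -> queryBalance(id), pool);
            // Per-call client timeout: a slow provider cannot block us forever.
            return user.get(1, TimeUnit.SECONDS) + "," + balance.get(1, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(portalPage(7));   // prints "user-7,balance-7"
    }
}
```

In a real service framework the futures would come from the framework's asynchronous invocation API rather than a local thread pool, but the aggregation and timeout pattern is the same.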

2 Transactional Consistency Issues

Before servicification, the business uses local transactions: multiple local SQL calls can be encapsulated in one large transaction block. If a database operation fails, the preceding SQL operations can all be rolled back; only when every SQL operation succeeds is the transaction finally committed, which guarantees strong transactional consistency, as shown in Figure 2-1.

After servicification, the three database operations may be split into three independent database access services; the original local SQL calls evolve into remote service invocations, and transactional consistency can no longer be guaranteed, as shown in Figure 2-2.

Figure 2-2 Introduction of distributed transaction problems after service

If services A and B execute successfully, their SQL is committed; service C executes last, and its SQL operation fails. From the consumer application's point of view, the SQL operations of services A and B have already been committed while service C has been rolled back, resulting in transactional inconsistency. As Figure 2-2 shows, the inconsistency is caused mainly by the distributed deployment of services, so it is called the distributed transaction problem.

2.1 Distributed Transaction Design scheme

Typically, a distributed transaction is implemented on top of two-phase commit, whose working principle is shown in Figure 2-3.

Figure 2-3 Schematic diagram of two-phase commit

Phase 1: the global transaction manager sends a prepare request to all transaction participants, and each transaction participant replies to the global transaction manager whether it is ready.

Phase 2: after the global transaction manager receives replies from all transaction participants, if every participant can commit, it sends a commit request to all participants; otherwise it sends a rollback request. Each transaction participant commits or rolls back according to the global transaction manager's instruction.
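The two phases above can be sketched with an in-memory coordinator; the `Participant` interface and `Resource` class are invented for this illustration, and a real implementation would also persist transaction-state logs and handle coordinator failure:

```java
import java.util.Arrays;
import java.util.List;

public class TwoPhaseCommitDemo {

    interface Participant {
        boolean prepare();   // phase 1: vote yes/no
        void commit();       // phase 2a
        void rollback();     // phase 2b
    }

    static class Resource implements Participant {
        final boolean canPrepare;
        String state = "initial";
        Resource(boolean canPrepare) { this.canPrepare = canPrepare; }
        public boolean prepare()  { if (canPrepare) { state = "prepared"; return true; } return false; }
        public void commit()   { state = "committed"; }
        public void rollback() { state = "rolledback"; }
    }

    // The coordinator: commit only if every participant votes yes in phase 1.
    static boolean runTransaction(List<Participant> participants) {
        boolean allPrepared = true;
        for (Participant p : participants) {
            if (!p.prepare()) { allPrepared = false; break; }
        }
        for (Participant p : participants) {
            if (allPrepared) p.commit(); else p.rollback();
        }
        return allPrepared;
    }

    static String demo() {
        Resource a = new Resource(true), b = new Resource(true), c = new Resource(false);
        boolean committed = runTransaction(Arrays.asList(a, b, c));
        return committed + ":" + a.state + ":" + b.state + ":" + c.state;
    }

    public static void main(String[] args) {
        // One "no" vote from c forces every participant to roll back.
        System.out.println(demo());   // prints "false:rolledback:rolledback:rolledback"
    }
}
```

Note how a single failed prepare is enough to roll back work that other participants had already prepared; this all-or-nothing blocking behavior is exactly what makes two-phase commit expensive.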

The distributed transaction rollback principle is shown in Figure 2-4.

Figure 2-4 Schematic diagram of a distributed transaction rollback

Two-phase commit is a pessimistic locking strategy, and its performance is poor because each transaction participant must wait for the slowest-responding participant. The first problem is the cost of the protocol itself: the entire protocol process requires locking, such as locking records in the database, and a large volume of transaction-state logs must be persisted. More troubling is the vulnerability of two-phase commit in the event of failure; it has a fatal flaw: when the coordinator fails, the entire transaction must wait until the coordinator recovers, and if the coordinator suffers an unrecoverable error such as a disk failure, the transaction is permanently abandoned.

For a distributed service framework, distributed transactions need to be supported as a functional feature. In actual business use, however, if the problem can be solved through eventual consistency, do not demand strong consistency; and if distributed transactions can be avoided, try to avoid using them at the business layer.

2.2 Distributed Transaction Optimization

Since distributed transactions have so many drawbacks, why are we still using them? Is there a better solution to improve on or replace them? If we only optimize distributed transactions themselves, we find there is little room for improvement; after all, the bottleneck is the distributed transaction model itself.

So let's go back to the root of the problem: why do we need distributed transactions? Because we need to keep data consistent across resources. But do all business scenarios really require the strong consistency of distributed transactions? Most business scenarios can tolerate short-term inconsistency, and different businesses tolerate inconsistency for different lengths of time. Take a bank transfer as an example: a few minutes of inconsistency in the middle is generally understood and tolerated by users.

In most business scenarios, we can use eventual consistency instead of traditional strong consistency to try to avoid the use of distributed transactions.

A final-consistency scheme commonly used in practice takes an MQ with transactional-message capability as the middleman. It works as follows: before executing the local transaction, send a prepare message to the MQ, then execute the local transaction; if the local transaction commits successfully, send a commit message to the MQ, otherwise send a rollback message to cancel the earlier prepare message. The MQ delivers the message only after receiving the commit confirmation, so in all normal cases this keeps the local transaction and MQ delivery consistent.

However, distributed calls have many abnormal scenarios, such as network timeouts and VM crashes. If the system executes the local transaction successfully but has no time to send the commit message to the MQ, or the commit message is sent but lost due to a network timeout or similar cause, the MQ never receives the commit, and the prepared message will not be delivered.

To handle this, the MQ periodically calls back the message-producing system (a check-commit query) according to its policy, asking whether the message should be delivered or discarded; after the producing system confirms, the MQ delivers or discards the message. This guarantees consistency between the MQ and the producing system, and thereby consistency for the consuming system as well.
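The prepare/commit/check-back flow can be sketched with a toy in-memory MQ; the class and method names are invented for this illustration, and the producer's transaction log stands in for the real local database:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class TransactionalMqDemo {

    // A toy MQ that holds half (prepared) messages until commit/rollback.
    static class Mq {
        final Map<Long, String> prepared = new HashMap<>();
        final List<String> delivered = new ArrayList<>();
        long seq = 0;

        long prepare(String msg) { prepared.put(++seq, msg); return seq; }
        void commit(long id)     { String m = prepared.remove(id); if (m != null) delivered.add(m); }
        void rollback(long id)   { prepared.remove(id); }

        // Check-back: for prepared messages whose commit/rollback was lost,
        // ask the producer whether its local transaction actually committed.
        void checkBack(Function<Long, Boolean> producerCheck) {
            for (Long id : new ArrayList<>(prepared.keySet())) {
                if (producerCheck.apply(id)) commit(id); else rollback(id);
            }
        }
    }

    static String demo() {
        Mq mq = new Mq();
        Map<Long, Boolean> txLog = new HashMap<>();  // producer's local transaction log

        // Normal path: prepare -> local transaction -> commit message.
        long id1 = mq.prepare("order-created");
        txLog.put(id1, true);             // local transaction committed
        mq.commit(id1);

        // Failure path: producer crashes after its local commit,
        // before mq.commit() is sent; the confirm is lost.
        long id2 = mq.prepare("order-paid");
        txLog.put(id2, true);

        // The MQ resolves the in-doubt message by asking the producer.
        mq.checkBack(id -> txLog.getOrDefault(id, false));
        return mq.delivered.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints "[order-created, order-paid]"
    }
}
```

Both messages end up delivered: the first via the normal confirm, the second recovered by the check-back, which is what makes the scheme eventually consistent despite lost confirmations.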

3 Research and development team collaboration issues

After servicification, and especially after introducing a microservice architecture, the R&D team is split into multiple service groups, such as Amazon's two-pizza teams, with each team responsible for the development, testing, deployment, operation, and maintenance of its services.

As the number of services and R&D teams grows, cross-team collaboration becomes a factor constraining improvements in R&D efficiency.

3.1 Shared Services Registry

To facilitate development and testing, an online service registry shared by all services is often used. If a service that is still under development is published to the shared registry, it may cause some consumers' calls to fail.

Solution: let the developers of a service provider subscribe only to services (the service under development may depend on other services) rather than register the service being developed, and test the service under development through direct connection.

The working principle is shown in Figure 3-1.

Figure 3-1 Subscription only, not published
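In Dubbo-style frameworks, for example, this "subscribe only, do not register" behavior is a registry switch. The snippet below uses Dubbo's XML configuration syntax as an illustration; the ZooKeeper address is a placeholder, and the exact attribute names should be verified against your framework's documentation:

```xml
<!-- Development machine of a provider under development: subscribe to
     dependencies, but do not register the half-finished service into
     the shared registry. -->
<dubbo:registry address="zookeeper://127.0.0.1:2181" register="false" />
```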

3.2 Direct Connect Provider

In development and test environments, if a public service registry has not been set up, consumers cannot obtain the service provider's address list and can only do local unit testing or test against mock stubs.

Another scenario is that in integration testing, service providers tend to deploy multiple instances; if the provider has a bug that requires remote breakpoint debugging, two problems arise:

    1. With multiple provider instances deployed, the remote debugging address cannot be determined, and debugging is inefficient.

    2. Multiple consumers may share one set of test environments, and a provider paused at a breakpoint may be accidentally hit by requests from other consumers.

Workaround: bypass the registry and test only the specified service provider. This requires point-to-point direct connection: in direct-connection mode, the service interface ignores the provider list from the registry.
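In Dubbo-style frameworks, for instance, direct connection is expressed by giving the reference an explicit provider URL, which bypasses the registry for that interface. The snippet below is an illustration only; the interface name and address are placeholders:

```xml
<!-- Point-to-point direct connection: this consumer ignores the registry's
     provider list and always calls the specified instance. -->
<dubbo:reference id="orderService" interface="com.example.OrderService"
                 url="dubbo://10.0.0.5:20880" />
```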

3.3 Multi-Team Progress collaboration

Suppose the front-end web portal depends on back-end services A, B, C, and D, developed by four different R&D teams, and the portal needs a new feature online within 2 weeks. Teams A and B rank the portal's requirement as high priority internally and can meet the delivery date; teams C and D, which must develop other higher-priority services at the same time, rank it relatively low and cannot meet the 2-week delivery.

Before C and D deliver their versions, the portal can only complete mock testing against test stubs; because the C and D services cannot be really tested, the requirement cannot be delivered on schedule.

The more services an application depends on, the lower its feature delivery efficiency: delivery speed is determined by the latest-delivered dependent service. If the web portal depends on 100 back-end services, then as long as one core service is not delivered on schedule, the overall schedule slips.

Solution: the call chain can string together and display the dependencies among applications, services, and middleware. Taking the delivery date required at the head of the call chain as input, a dependency-management tool can automatically calculate the latest allowable delivery date for each service on the call chain. Through call-chain analysis and standardized dependency-calculation tools, requirement delays caused by manual prioritization errors can be avoided.

3.4 Service downgrade and mock test

In actual project development, because the development rhythms of groups and individual developers differ, it often happens that consumers wait for service providers to deliver a version for joint debugging; this mutual waiting slows project progress.

Solution: the service provider defines the interface first and provides it to the consumer. The consumer then combines service degradation with mock testing, implementing the fault-tolerant degradation business logic (business pass-through) in the mock test code. In this way both the mock test and the service-degradation business logic get developed, a double benefit.
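This "mock stub doubling as degradation logic" idea can be sketched as follows; the interface, class, and return values are invented for this example:

```java
public class MockDegradeDemo {

    // The interface the provider team defines up front (name is illustrative).
    interface RecommendService {
        String recommend(String userId);
    }

    // Mock stub doubling as the fault-tolerant degradation logic: instead of
    // failing, return a safe default result (business pass-through).
    static class RecommendServiceMock implements RecommendService {
        public String recommend(String userId) {
            return "default-recommendations";   // degraded but usable result
        }
    }

    static class Consumer {
        final RecommendService service;
        Consumer(RecommendService service) { this.service = service; }
        String homePage(String userId) {
            return "home[" + service.recommend(userId) + "]";
        }
    }

    static String demo() {
        // Until the real provider is delivered, wire in the mock; the same
        // class later serves as the degradation fallback in production.
        return new Consumer(new RecommendServiceMock()).homePage("u1");
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints "home[default-recommendations]"
    }
}
```

Because the consumer programs only against the interface, swapping the mock for the real provider (or back, on degradation) requires no change to consumer code.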

3.5 Collaborative debugging issues

In actual project development, the progress of the various R&D teams is rarely aligned. If the consumer simply waits for the service provider to deliver its version, human resources are often wasted and project progress suffers.

Solution: the distributed service framework provides a mock-stub management framework. While a peripheral service provider has not yet finished development, the route is switched to simulation mode and the mock stub is invoked automatically; at business integration testing and go-live, the route switches automatically to the real service provider. This can be implemented in conjunction with the service degradation function.

3.6 Interface Forward Compatibility

Because of online bug fixes, internal refactoring, and requirements changes, service providers often modify internal implementations, including but not limited to: interface parameter changes, parameter field changes, business logic changes, and data table structure changes.

In actual projects it often happens that the service provider modifies an interface or data structure but does not inform all consumers in time, causing service invocations to fail. Countermeasures are as follows:

    1. Develop, and strictly enforce, a service forward-compatibility specification to avoid incompatible changes or unauthorized modifications made without notifying surrounding teams.

    2. Interface compatibility technical guarantees: for example, Thrift IDL supports adding, modifying, and deleting fields; field definitions are position-independent; and the code stream supports out-of-order fields.
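As an illustration of how Thrift IDL tolerates interface evolution, consider the hypothetical struct below (the struct and field names are invented): every field has a stable numeric id, non-required fields are marked optional, an old reader simply skips ids it does not recognize, and a deleted field's id is retired rather than reused:

```thrift
struct Order {
  1: required i64 orderId,
  2: optional i32 quantity,
  // field 3 was removed in a later version; its id must never be reused
  4: optional string couponCode   // added later: old readers skip it safely
}
```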

4 Summary

After servicification, both the service framework itself and the business services face many challenges. This article has extracted some of the more important issues and given solutions and best practices. For issues not covered here, service framework developers and users need to explore in practice and find the service best practices suited to their own products.

