(This article still needs to be organized.)
2015.3.13 Revision
Distributed systems typically serve high request volumes, hold large amounts of data, must respond quickly, and must stay available for long periods. There is a lot to consider when designing a distributed backend service; this article collects some common design guidelines for reference.
Availability: the system serves requests correctly for as much of the time as possible; many online systems require 99%+ availability. High availability is usually achieved by redundant backups of critical components (e.g. cold and hot standby).

Performance: fast responses and low latency.

Reliability: the same request returns the same data, updates persist, and data is never lost.

Scalability: how much load the system can handle, and how easily storage and compute capacity can be added to cope with more work.

Manageability: the system is easy to operate, maintain, and change; problems are detected and handled promptly, and modifications and upgrades are straightforward.

Cost: hardware and software costs, deployment and maintenance costs, and learning costs.

Common tactics for meeting these requirements in practice:
| Criteria | Tactics |
| --- | --- |
| Availability | • Highly available clusters<br>• Lossy service: by carefully splitting the product flow, selectively sacrifice some data consistency and integrity so that core functions keep working; degrade or switch off secondary or non-essential features<br>• Asynchronous messaging<br>• Request throttling (drop traffic that exceeds processing capacity)<br>• Isolation: service isolation, data isolation, resource isolation<br>• Capacity planning: single-machine capacity (QPS) = maximum number of worker threads / average request response time; system capacity (QPS) = single-machine capacity × number of machines × r (capacity factor, keeping 30% in reserve for emergencies)<br>• Fast rejection: reject overload requests as early as possible<br>• Make big systems small: split a complex large system into independent, highly autonomous small systems with high cohesion and low coupling, so a change in one part does not drag in the whole |
| Performance | • Load balancing<br>• Caching and data pre-loading<br>• Read/write separation, data sharding, splitting databases, tables, and indexes<br>• Concurrent processing<br>• Asynchronous processing<br>• Timeout control (discard responses that time out)<br>• Fault tolerance for blocking calls |
| Scalability | • Elastic cloud computing platform<br>• Data level: read/write separation, data partitioning<br>• Application level: vertical scaling, horizontal replication, functional partitioning (stateless services), application sharding |
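The capacity-planning formula in the availability row can be worked through with concrete numbers. The figures below (200 worker threads, 50 ms average response time, 10 machines) are invented for illustration, not from the article:

```python
def machine_qps(worker_threads: int, avg_response_s: float) -> float:
    """Single-machine capacity: worker threads / average response time (seconds)."""
    return worker_threads / avg_response_s

def system_qps(per_machine: float, machines: int, capacity_factor: float = 0.7) -> float:
    """Cluster capacity, discounted to keep ~30% headroom for emergencies."""
    return per_machine * machines * capacity_factor

# Hypothetical numbers: 200 worker threads, 50 ms average response, 10 machines.
per_machine = machine_qps(200, 0.05)   # 4000 QPS on one machine
usable = system_qps(per_machine, 10)   # 28000 QPS usable across the cluster
```

Note that the 0.7 factor is one possible reading of "30% redundancy reserved"; the appropriate reserve depends on the workload.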
In the above table, whether the tactic is called partitioning or sharding, vertical or horizontal, it is really splitting along one of two directions:

Vertical scaling (scale up): give a single server a faster CPU, more memory, or more disk; the machine itself is upgraded.

Horizontal scaling (scale out): add more nodes; this requires the software and system architecture to support that kind of growth.
To explain the common strategies:

Caching: caches usually sit at the upper levels of the architecture so data can be returned as quickly as possible without heavy lower-level processing; both distributed caches and global caches are used.

Proxies: receive requests from clients and forward them to back-end servers; their duties include filtering requests, logging, coordinating resources, and transforming requests.

Load balancing: handles concurrent requests and routes each one to a single node, making the system scalable, keeping resources fully utilized, and keeping responses fast.
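A minimal round-robin load balancer illustrates the routing idea above; this is a sketch, and the node names (`app-1` etc.) are invented, not from the article:

```python
import itertools

class RoundRobinBalancer:
    """Hand each incoming request to the next backend node in turn."""

    def __init__(self, nodes):
        self._cycle = itertools.cycle(nodes)

    def pick(self) -> str:
        """Return the node that should handle the next request."""
        return next(self._cycle)

lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assigned = [lb.pick() for _ in range(6)]
# assigned == ["app-1", "app-2", "app-3", "app-1", "app-2", "app-3"]
```

Real load balancers add health checks and weighting, but the core routing loop is just this: each request goes to exactly one node, and the nodes share the load evenly.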
Addendum:
Improving responsiveness ultimately comes down to three tools: queues, caches, and partitioning (sharding):

Queues: relieve the pressure of concurrent write operations, improve system scalability, and are also the most common way to implement asynchronous systems;

Caches: caching modules at every level, from the file system to the database to memory, keep data close to where it is read;

Partitioning: keeps the size of frequently manipulated data sets reasonable as the system grows and data accumulates over the long term.
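The three tools above can be sketched together in a single process. This is a toy illustration, assuming only Python's standard library; all the names (`shard_for`, `put_async`, `get_cached`) are invented for this sketch, and a real system would also need cache invalidation on writes:

```python
import queue
import threading
from functools import lru_cache

# --- Partitioning: route each key to one of N shards by hashing ---
NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_for(key: str) -> dict:
    """Pick the shard that owns this key."""
    return shards[hash(key) % NUM_SHARDS]

# --- Queue: buffer writes so callers do not block on storage ---
write_queue = queue.Queue()

def writer_loop():
    """Single background consumer draining the write queue."""
    while True:
        item = write_queue.get()
        if item is None:          # sentinel: stop the worker
            write_queue.task_done()
            break
        key, value = item
        shard_for(key)[key] = value
        write_queue.task_done()

def put_async(key: str, value: str) -> None:
    """Enqueue a write and return immediately."""
    write_queue.put((key, value))

# --- Cache: memoize reads so hot keys skip the shard lookup ---
@lru_cache(maxsize=1024)
def get_cached(key: str):
    return shard_for(key).get(key)

threading.Thread(target=writer_loop, daemon=True).start()
put_async("user:1", "alice")
put_async("user:2", "bob")
write_queue.join()                # wait until the buffered writes land
```

The queue decouples the write path from storage, the hash shard keeps each dictionary small, and the LRU cache serves repeated reads of hot keys without touching the shard at all.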