Design Guidelines for Distributed Systems

Source: Internet
Author: User


2015.3.13 Revision

Distributed systems typically serve large volumes of requests, maintain large data sets, respond quickly, and stay available over long periods of time. There is a lot to consider when designing a distributed backend service; this article collects some common design guidelines for reference.

• Availability: the proportion of time the system can serve requests normally; online systems often require 99% availability or better. High availability is usually achieved through redundancy of critical components (e.g. cold and hot standby).
• Performance: fast responses and low latency.
• Reliability: the same request returns the same data, updates persist, and data is not lost.
• Scalability: how much traffic the system can handle, and how easily storage and computing capacity can be added to cope with more work.
• Manageability: the system is easy to operate, maintain, and change; problems are detected and handled promptly, and modifications and updates are easy to carry out.
• Cost: hardware and software costs, deployment and maintenance costs, and learning costs.

Common tactics for meeting these requirements in practice:

Availability

• Highly available clusters
• Lossy service: by carefully splitting the product workflow, selectively sacrifice some data consistency and integrity to keep the core functions operational; demote or turn off secondary or unnecessary services
• Asynchronous messaging
• Request rate limiting: drop traffic that exceeds the system's processing capacity
• Isolation: service isolation, data isolation, resource isolation
• Capacity planning: single-machine capacity (QPS) = maximum number of processing threads / average request response time; system capacity (QPS) = single-machine capacity × number of machines × r (capacity factor, with 30% redundancy reserved for emergencies)
• Fast rejection: reject overload requests as early as possible
• Large systems done small: split a complex large system into a number of independent, highly autonomous small systems to achieve high cohesion and low coupling, so that touching one part does not affect the whole
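The capacity-planning formula above can be sanity-checked with a small calculation. This is a minimal sketch; the thread count, response time, and 0.7 capacity factor (30% redundancy reserved) are illustrative assumptions, not recommendations:

```python
def single_machine_qps(worker_threads: int, avg_response_time_s: float) -> float:
    """Single-machine capacity: processing threads / average response time."""
    return worker_threads / avg_response_time_s

def system_qps(machine_qps: float, machines: int, capacity_factor: float = 0.7) -> float:
    """System capacity, keeping (1 - capacity_factor) in reserve for emergencies."""
    return machine_qps * machines * capacity_factor

# Example: 200 worker threads, 50 ms average response time per request.
per_machine = single_machine_qps(200, 0.05)   # ~4000 QPS per machine
total = system_qps(per_machine, 10)           # ~28000 QPS for 10 machines
print(per_machine, total)
```
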

Performance

• Load balancing
• Caching and data pre-loading
• Read/write separation; data sharding; splitting by database, by table, or by index
• Concurrent processing
• Asynchronous processing
• Timeout control: discard responses that exceed the timeout
• Fault tolerance: mask failures so they do not block processing
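The timeout-control tactic above can be sketched with Python's `concurrent.futures`: the caller abandons a backend response that exceeds a deadline instead of blocking on it. `slow_backend_call` and the timeout values here are hypothetical stand-ins for a real downstream request:

```python
import concurrent.futures
import time

# Shared worker pool; the caller never waits longer than timeout_s for a result.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def slow_backend_call(delay_s: float) -> str:
    """Stand-in for a downstream request; delay_s is illustrative."""
    time.sleep(delay_s)
    return "ok"

def call_with_timeout(delay_s: float, timeout_s: float = 0.1) -> str:
    """Return the backend result, or a degraded fallback if it is too slow."""
    future = _pool.submit(slow_backend_call, delay_s)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return "timed out"  # the slow response is discarded

print(call_with_timeout(0.01))  # fast call succeeds
print(call_with_timeout(0.5))   # slow call is abandoned
```

Note that the abandoned task still occupies a pool thread until it finishes, which is why timeout control is usually paired with rate limiting and fast rejection.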

Scalability

• Elastic cloud computing platforms
• Data plane: read/write separation, data partitioning
• Application plane: vertical scaling, horizontal replication, functional partitioning (stateless services), application sharding
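Application sharding from the list above can be sketched as deterministic hash-based routing: the same key always lands on the same partition. `shard_for` and the key format are hypothetical; real systems often use consistent hashing instead, so that changing the shard count moves fewer keys:

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key deterministically to one of num_shards partitions."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Same key always routes to the same shard; different keys spread out.
print(shard_for("user:42", 8))
print(shard_for("user:42", 8) == shard_for("user:42", 8))  # True
```
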

In the table above, whether we speak of partitions or shards for services and data, or of vertical versus horizontal, these are really the two directions of splitting a system:

Vertical scaling: increase the CPU speed, memory, or disk of a single server; scale up by upgrading the machine. Horizontal scaling: add more nodes; the software and system architecture must be designed to support this kind of extension.

To explain the common strategies:

Caching: caches usually sit at the upper levels of the architecture so that data can be returned as quickly as possible without going through heavy lower-level processing; both distributed caches and global caches are used.
Proxies: receive requests originating from clients and forward them to back-end servers; typical features include filtering requests, logging, coordinating resources, and request conversion.
Load balancing: distribute concurrent requests across nodes, making the system scalable, keeping resources fully utilized, and keeping responses fast.
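The load-balancing strategy above can be sketched as a simple round-robin picker. The class name and node addresses are illustrative; production balancers also account for node health and current load:

```python
import itertools

class RoundRobinBalancer:
    """Distribute requests across backend nodes in turn."""

    def __init__(self, nodes):
        self._cycle = itertools.cycle(nodes)

    def next_node(self):
        """Return the next backend to route a request to."""
        return next(self._cycle)

lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print([lb.next_node() for _ in range(4)])
# → ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1']
```
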

Addendum:

Ultimately, the ways to improve responsiveness come down to three techniques: queues, caches, and partitioning (sharding):
Queues: relieve the pressure of concurrent write operations and improve system scalability; they are also the most common means of implementing asynchronous systems.
Caches: caching modules at every level, from the file system to the database to memory, serve frequently read data close to where it is requested.
Partitioning: keeps the scale of the frequently manipulated data set reasonable as the system grows and data accumulates over the long term.
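The queue technique above can be sketched with a producer that returns immediately while a worker drains writes at its own pace. This is a minimal in-process sketch using Python's standard library; a real system would use a durable message queue:

```python
import queue
import threading

write_queue = queue.Queue()
applied = []

def writer_worker():
    """Drain the queue and apply writes at the pace storage can sustain."""
    while True:
        item = write_queue.get()
        if item is None:       # sentinel: shut down the worker
            break
        applied.append(item)   # stand-in for the real write operation
        write_queue.task_done()

t = threading.Thread(target=writer_worker)
t.start()
for i in range(3):
    write_queue.put(f"update-{i}")  # producers return immediately
write_queue.put(None)
t.join()
print(applied)
```
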
