(This article still needs to be organized.)
2015.3.13 Revision
Distributed systems typically serve high request volumes, hold large amounts of data, must respond quickly, and must stay available for long periods. There is a lot to consider when designing a distributed backend service; this article collects some common design guidelines for reference.
Availability: the system serves requests correctly for as much of the time as possible; many online systems require 99%+ availability. High availability is usually achieved by redundant backups of critical components (e.g. cold and hot standby).

Performance: fast responses and low latency.

Reliability: the same request returns the same data, updates persist, and data is never lost.

Scalability: how much load the system can handle, and how easily storage and compute capacity can be added to cope with more work.

Manageability: the system is easy to operate, maintain, and change; problems are detected and handled promptly, and modifications and upgrades are straightforward.

Cost: hardware and software costs, deployment and maintenance costs, and learning costs.

Common tactics for meeting these requirements in practice:
| Criteria | Tactics |
| --- | --- |
| Availability | • Highly available clusters<br>• Lossy service: by carefully splitting the product flow, selectively sacrifice some data consistency and integrity so that core functions keep working; degrade or switch off secondary or non-essential features<br>• Asynchronous messaging<br>• Request throttling (drop traffic that exceeds processing capacity)<br>• Isolation: service isolation, data isolation, resource isolation<br>• Capacity planning: single-machine capacity (QPS) = maximum number of worker threads / average request response time; system capacity (QPS) = single-machine capacity × number of machines × r (capacity factor, keeping 30% in reserve for emergencies)<br>• Fast rejection: reject overload requests as early as possible<br>• Make big systems small: split a complex large system into independent, highly autonomous small systems with high cohesion and low coupling, so a change in one part does not drag in the whole |
| Performance | • Load balancing<br>• Caching and data pre-loading<br>• Read/write separation, data sharding, splitting databases, tables, and indexes<br>• Concurrent processing<br>• Asynchronous processing<br>• Timeout control (discard responses that time out)<br>• Fault tolerance for blocking calls |
| Scalability | • Elastic cloud computing platform<br>• Data level: read/write separation, data partitioning<br>• Application level: vertical scaling, horizontal replication, functional partitioning (stateless services), application sharding |
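The capacity-planning formula in the availability row can be worked through with concrete numbers. The figures below (200 worker threads, 50 ms average response time, 10 machines) are invented for illustration, not from the article:

```python
def machine_qps(worker_threads: int, avg_response_s: float) -> float:
    """Single-machine capacity: worker threads / average response time (seconds)."""
    return worker_threads / avg_response_s

def system_qps(per_machine: float, machines: int, capacity_factor: float = 0.7) -> float:
    """Cluster capacity, discounted to keep ~30% headroom for emergencies."""
    return per_machine * machines * capacity_factor

# Hypothetical numbers: 200 worker threads, 50 ms average response, 10 machines.
per_machine = machine_qps(200, 0.05)   # 4000 QPS on one machine
usable = system_qps(per_machine, 10)   # 28000 QPS usable across the cluster
```

Note that the 0.7 factor is one possible reading of "30% redundancy reserved"; the appropriate reserve depends on the workload.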
In the above table, whether the tactic is called partitioning or sharding, vertical or horizontal, it is really splitting along one of two directions:

Vertical scaling (scale up): give a single server a faster CPU, more memory, or more disk; the machine itself is upgraded.

Horizontal scaling (scale out): add more nodes; this requires the software and system architecture to support that kind of growth.
To explain the common strategies:

Caching: caches usually sit at the upper levels of the architecture so data can be returned as quickly as possible without heavy lower-level processing; both distributed caches and global caches are used.

Proxies: receive requests from clients and forward them to back-end servers; their duties include filtering requests, logging, coordinating resources, and transforming requests.

Load balancing: handles concurrent requests and routes each one to a single node, making the system scalable, keeping resources fully utilized, and keeping responses fast.
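A minimal round-robin load balancer illustrates the routing idea above; this is a sketch, and the node names (`app-1` etc.) are invented, not from the article:

```python
import itertools

class RoundRobinBalancer:
    """Hand each incoming request to the next backend node in turn."""

    def __init__(self, nodes):
        self._cycle = itertools.cycle(nodes)

    def pick(self) -> str:
        """Return the node that should handle the next request."""
        return next(self._cycle)

lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assigned = [lb.pick() for _ in range(6)]
# assigned == ["app-1", "app-2", "app-3", "app-1", "app-2", "app-3"]
```

Real load balancers add health checks and weighting, but the core routing loop is just this: each request goes to exactly one node, and the nodes share the load evenly.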
Addendum:
Improving responsiveness ultimately comes down to three tools: queues, caches, and partitioning (sharding):

Queues: relieve the pressure of concurrent write operations, improve system scalability, and are also the most common way to implement asynchronous systems;

Caches: caching modules at every level, from the file system to the database to memory, keep data close to where it is read;

Partitioning: keeps the size of frequently manipulated data sets reasonable as the system grows and data accumulates over the long term.
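The three tools above can be sketched together in a single process. This is a toy illustration, assuming only Python's standard library; all the names (`shard_for`, `put_async`, `get_cached`) are invented for this sketch, and a real system would also need cache invalidation on writes:

```python
import queue
import threading
from functools import lru_cache

# --- Partitioning: route each key to one of N shards by hashing ---
NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_for(key: str) -> dict:
    """Pick the shard that owns this key."""
    return shards[hash(key) % NUM_SHARDS]

# --- Queue: buffer writes so callers do not block on storage ---
write_queue = queue.Queue()

def writer_loop():
    """Single background consumer draining the write queue."""
    while True:
        item = write_queue.get()
        if item is None:          # sentinel: stop the worker
            write_queue.task_done()
            break
        key, value = item
        shard_for(key)[key] = value
        write_queue.task_done()

def put_async(key: str, value: str) -> None:
    """Enqueue a write and return immediately."""
    write_queue.put((key, value))

# --- Cache: memoize reads so hot keys skip the shard lookup ---
@lru_cache(maxsize=1024)
def get_cached(key: str):
    return shard_for(key).get(key)

threading.Thread(target=writer_loop, daemon=True).start()
put_async("user:1", "alice")
put_async("user:2", "bob")
write_queue.join()                # wait until the buffered writes land
```

The queue decouples the write path from storage, the hash shard keeps each dictionary small, and the LRU cache serves repeated reads of hot keys without touching the shard at all.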