Distributed System Design Concepts

Source: Internet
Author: User
Tags app service hadoop mapreduce

The following describes in detail the author's understanding of several distributed system design concepts:

1. Distributed systems place very low requirements on server hardware

This shows mainly in the following two areas:

Reliability: server hardware does not have to be highly reliable. Hardware failures are allowed to happen and are tolerated in software, so the high reliability of a distributed system is guaranteed by software rather than by hardware.

Performance: servers do not need high-frequency CPUs, large memory, or high-performance storage. The performance bottleneck of a distributed system is the network overhead of communication between nodes, so even a faster individual server would still spend its time waiting on network I/O.

In general, Internet companies reduce data center costs by building distributed clusters from a large number of inexpensive PC servers rather than from a few high-performance servers. Google, for example, takes data center cost control to the extreme: its servers have no chassis, and the motherboards are fully customized to carry only the most basic components. The early custom motherboards even omitted the power switch and USB ports. A partition installed on the motherboard isolates the CPU so that cold air blows only over the CPU and not over memory, hard drives, and other components that do not need cooling, which minimizes the power spent on cooling.
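The first point, tolerating hardware failure in software, can be illustrated with a minimal failover sketch. It assumes each replica is just a callable and that a failed node raises an exception; the names `NodeDown`, `call_with_failover`, `broken_node`, and `healthy_node` are hypothetical and exist only for this illustration.

```python
from typing import Callable, Optional, Sequence


class NodeDown(Exception):
    """Raised by a (hypothetical) replica that cannot serve the request."""


def call_with_failover(replicas: Sequence[Callable[[str], str]], request: str) -> str:
    """Try each replica in turn and return the first successful response."""
    last_error: Optional[Exception] = None
    for replica in replicas:
        try:
            return replica(request)
        except NodeDown as err:
            # This node failed; remember why and move on to the next one.
            last_error = err
    raise RuntimeError("all replicas failed") from last_error


def broken_node(request: str) -> str:
    # Simulates a server whose hardware has failed.
    raise NodeDown("simulated hardware failure")


def healthy_node(request: str) -> str:
    return f"handled {request}"


result = call_with_failover([broken_node, healthy_node], "GET /index")
print(result)
```

The caller never sees the failed node as long as at least one replica answers, which is the sense in which reliability comes from software rather than from the hardware.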

2. Distributed systems emphasize horizontal scalability

Horizontal scalability (scale out) means improving the overall performance of the cluster by increasing the number of servers. Vertical scalability (scale up) means improving the overall performance of the cluster by increasing the performance of each server. The upper limit of vertical scaling is obvious: the performance of a single server cannot be improved indefinitely, and network overhead, not server performance, is the biggest bottleneck of a distributed system. Horizontal scaling has far more headroom, because servers can always be added to the cluster with little effort. Moreover, a distributed system tries to ensure that scaling out increases overall cluster performance as close to linearly as possible. For example, scaling a cluster of 10 servers out to 100 servers of the same kind should raise the overall performance of the distributed system to nearly 10 times the original.

In Internet companies' data centers, the upper limit for horizontally scaling a single distributed system is generally on the order of tens of thousands of servers. The basic unit of a Google data center, the cell, consists of roughly 20,000 servers. Each cell is managed as a whole by one distributed management system, Borg, and each data center is made up of multiple cells.
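The "nearly 10 times" figure can be made concrete with a simple model. The sketch below assumes cluster throughput is the server count times per-server throughput times a scaling efficiency factor; the 95% efficiency and the 1000 requests per second are made-up illustrative numbers, not measurements.

```python
def cluster_throughput(servers: int, per_server: float, efficiency: float) -> float:
    """Estimate cluster throughput under a simple quasi-linear scaling model.

    efficiency is the fraction of ideal linear scaling actually achieved
    (1.0 would be perfectly linear). This model is an assumption for
    illustration, not a property of any particular system.
    """
    if servers < 1:
        raise ValueError("servers must be at least 1")
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return servers * per_server * efficiency


# Hypothetical numbers: 1000 requests/s per server, 95% efficiency at 100 nodes.
small = cluster_throughput(10, 1000.0, 1.0)
large = cluster_throughput(100, 1000.0, 0.95)
speedup = large / small
print(f"speedup from 10 to 100 servers: about {speedup:.1f}x")
```

With these assumed numbers the speedup comes out a little under 10x, which is what quasi-linear means here: close to the ideal, with some loss to coordination and network overhead.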

3. Distributed systems do not allow single points of failure (no single point of failure)

A single point of failure exists when an application service runs as only one instance on one server: once that server goes down, the service goes down with it and becomes completely unavailable. For example, if a website runs as a single instance on one server, the site becomes unavailable as soon as that server goes down. Likewise, if all data lives on a single server, all of it becomes inaccessible once that server breaks.

Because the servers in a distributed system are inexpensive PC servers whose hardware cannot be guaranteed to be 100% reliable, a distributed system assumes by default that any server may fail at any time. At the same time, the system must provide a highly reliable service and cannot allow a single point of failure. So every application service in a distributed system runs as multiple instances on multiple nodes, and every piece of data has multiple replicas stored on different nodes. The probability that several nodes fail at the same time, taking down all instances of a service or making all replicas of a piece of data unreadable, is then very low, which effectively prevents single points of failure.

In general, servers should not be run at full load, because a server that runs at full load for a long time is significantly more likely to fail. A distributed system therefore uses many low-performance PC servers and spreads the load across all of them as evenly as possible, so that no server is heavily loaded and the cluster as a whole stays stable.
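The effect of replication on availability can be estimated with a short calculation. The sketch below assumes node failures are independent and that each node is down with the same probability; both are simplifying assumptions, and the 1% figure is purely illustrative.

```python
def prob_all_replicas_down(p_node_down: float, replicas: int) -> float:
    """Probability that every replica is down at once.

    Assumes independent failures with the same per-node probability,
    so the result is p_node_down raised to the number of replicas.
    """
    if not 0.0 <= p_node_down <= 1.0:
        raise ValueError("p_node_down must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return p_node_down ** replicas


# Hypothetical: each node is unavailable 1% of the time.
for n in (1, 2, 3):
    print(f"{n} replica(s): {prob_all_replicas_down(0.01, n):.0e}")
```

Under these assumptions each extra replica multiplies the chance of total unavailability by the per-node failure probability. Real failures are often correlated (shared rack, power, or network), so in practice the improvement is smaller than this model suggests.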

4. Distributed systems minimize inter-node communication overhead

As mentioned earlier, the overall performance bottleneck of a distributed system is its internal network overhead. Network transfer is currently much slower than a CPU reading from memory or a local hard disk, so reducing network traffic and letting the CPU work on in-memory data or local disk data as much as possible can significantly improve the performance of a distributed system. A typical example is Hadoop MapReduce, which assigns each compute task to a node that already holds the data to be processed, thus avoiding the transfer of that data over the network.
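The idea of moving computation to the data can be sketched as a small scheduling function. This is not Hadoop's actual scheduler; it is a simplified illustration in which `assign_tasks`, the block names, and the node names are all hypothetical. Each task goes to the least-loaded node that holds its data block, and only falls back to a remote node when no holder is available.

```python
from typing import Dict, List, Set


def assign_tasks(block_locations: Dict[str, Set[str]], nodes: List[str]) -> Dict[str, str]:
    """Assign one task per data block, preferring nodes that store the block."""
    load = {node: 0 for node in nodes}
    assignment: Dict[str, str] = {}
    for block in sorted(block_locations):
        # Nodes in the cluster that already hold a replica of this block.
        local = [n for n in nodes if n in block_locations[block]]
        # Fall back to any node (a remote read) only if no holder exists.
        candidates = local or nodes
        chosen = min(candidates, key=lambda n: load[n])
        assignment[block] = chosen
        load[chosen] += 1
    return assignment


blocks = {
    "b1": {"n1", "n2"},
    "b2": {"n1"},
    "b3": {"n3"},
    "b4": {"n9"},  # held only by a node outside this cluster
}
plan = assign_tasks(blocks, ["n1", "n2", "n3"])
for block, node in plan.items():
    where = "local" if node in blocks[block] else "remote"
    print(block, "->", node, f"({where})")
```

Only the task for `b4` needs a network read in this example; every other task runs where its data already lives.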

5. Application services in a distributed system are best made stateless

The state of an application service is the data that the running program holds in memory as a result of handling service requests. Application services in a distributed system are best designed to be stateless. If a service is stateful, then when a server outage takes the service program down, the in-memory data is lost, which is clearly not a highly reliable service. A stateless service instead writes any data that must be kept to dedicated storage, so the service program can restart without losing data and the distributed system can recover the service after a server outage.

For example, in a website backend, the session information of logged-in users can be stored in a cache service such as Redis or memcache, so that no backend instance holds user login state itself. Restarting a backend instance then does not lose any login state. If session information were instead kept in the memory of the backend instance, the login state would be lost as soon as the instance the user logged in to went down.

All in all, distributed systems are the preferred platform for enterprise applications in the big data era. Their good scalability, especially horizontal scalability (scale out), lets them respond flexibly to ever-changing enterprise requirements and lowers the server hardware requirements for enterprise customers. It also makes elastic scaling (auto-scaling) at the application service level genuinely achievable.
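The session example can be sketched as follows. To keep the code self-contained, `SessionStore` is an in-memory stand-in for an external cache such as Redis or memcache; its `get`/`set` interface, the `WebBackend` class, and the `session:` key prefix are assumptions made for this illustration, not the API of any real client library.

```python
import uuid
from typing import Dict, Optional


class SessionStore:
    """Stand-in for an external cache service (hypothetical interface)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class WebBackend:
    """A stateless backend instance: it keeps no session data of its own."""

    def __init__(self, store: SessionStore) -> None:
        self._store = store

    def login(self, user: str) -> str:
        token = uuid.uuid4().hex
        # Session state goes to the shared store, not into this instance.
        self._store.set(f"session:{token}", user)
        return token

    def whoami(self, token: str) -> Optional[str]:
        return self._store.get(f"session:{token}")


store = SessionStore()
backend_a = WebBackend(store)
token = backend_a.login("alice")

del backend_a                    # simulate the instance crashing
backend_b = WebBackend(store)    # a fresh instance takes over
user = backend_b.whoami(token)
print(user)
```

Because the login state lives in the shared store, any instance can serve the user's next request, and losing one instance loses nothing. In a real deployment the store itself must also be replicated, or it becomes the single point of failure.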
