Characteristics and design concepts of distributed systems

Source: Internet
Author: User
Tags app service hadoop mapreduce

Distributed systems are nothing new: a variety of them already existed in the 1970s and 1980s. It was only in the Internet era, however, that distributed systems came to shine, and Google in particular has taken their use to the extreme. Google's entire software architecture is built on a wide variety of distributed systems, such as Borg, MapReduce, and BigTable. These systems are what allow Google to respond to highly concurrent requests and to process massive amounts of data. Apache Hadoop, Spark, Mesos, and other distributed systems have made big data processing technology widely accessible, so that more enterprise customers can experience the convenience of distributed systems.


First, the characteristics of distributed systems


The most important feature of a distributed system is scalability: it can be extended to meet changing demand. Enterprise application requirements change constantly over time, which places high demands on the enterprise application platform. The platform must be able to adapt to those changes, that is, it must be scalable. Take consumer-facing (2C) mobile Internet applications as an example: as an Internet company's business grows, the business becomes more complex, the number of concurrent user requests rises, and the volume of data to process increases. The enterprise application platform must adapt to these changes and support highly concurrent access and massive data processing. A distributed system scales well: adding servers increases the overall processing capacity of the system so that it can meet the computing needs of business growth.


The core idea of a distributed system is to have multiple servers work together to accomplish tasks that a single server cannot handle, especially tasks involving high concurrency or large data volumes. A distributed system is a loosely coupled network of independent servers. Each server is a separate PC, and the servers are connected through an internal network, which is generally fairly fast. Because the servers in a distributed cluster are loosely coupled through the internal network, communication between nodes carries some network overhead, so distributed systems try to reduce inter-node communication as much as possible. In addition, because network transmission is the bottleneck, the performance of a single node has little effect on the overall performance of the distributed system. For example, in a distributed application, the performance differences between individual application services caused by the choice of programming language are negligible compared with the network overhead. For this reason, the nodes of a distributed system are generally not high-performance servers but ordinary PC servers of average performance. The overall performance of a distributed system is improved by scaling out (adding more servers) rather than by scaling up (improving the performance of each node).


The most striking feature of distributed systems is that they are cheap and efficient: a cluster of inexpensive PC servers can match or exceed the processing performance of a mainframe at a much lower cost. This is also what makes distributed systems most attractive. The hardware of a low-cost PC server is far less reliable than that of a mainframe, so a distributed system tolerates hardware faults in software and relies on software to ensure the high reliability of the system as a whole.

The biggest benefit of distributed systems is elastic scaling at the application service level, as opposed to elastic scaling at the computing resource level. Public cloud (IaaS) vendors generally provide elastic scaling of computing resources: virtual hosts can easily be added or removed, and their performance configuration raised or lowered. What enterprise customers really need, however, is elastic scaling at the application service layer: the number of backend application service instances should change dynamically as traffic fluctuates, which is something IaaS vendors do not provide. Take a mobile Internet short-video sharing application: access peaks between 11 p.m. and 1 a.m., when hundreds of thousands of users are online simultaneously, and the backend application service has to expand to thousands of instances to cope with such highly concurrent requests. After the peak, it can shrink back to dozens of instances. With a distributed system it is easy to schedule application service instances, from dozens to hundreds or even thousands, and so truly achieve elastic scaling of application services.
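The scaling decision described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `desired_instances`, the per-instance capacity of 100 requests per second, and the instance limits are all hypothetical and do not come from any particular platform.

```python
import math


def desired_instances(requests_per_sec, capacity_per_instance,
                      min_instances=1, max_instances=5000):
    """Return how many service instances the current traffic calls for.

    All parameters are illustrative assumptions, not values from a real system.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    # Round up so that the last partial batch of requests still gets an instance.
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    # Clamp to the configured floor and ceiling.
    return max(min_instances, min(max_instances, needed))


# Hypothetical traffic figures: each instance is assumed to serve 100 requests/s.
peak = desired_instances(200_000, 100, min_instances=20)    # evening peak
off_peak = desired_instances(3_000, 100, min_instances=20)  # after the peak
print(peak, off_peak)  # 2000 30
```

A real scheduler would also smooth the traffic signal over time to avoid scaling up and down on every short spike; that is left out here to keep the sketch small.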


Second, the design concepts of distributed systems


The above is a brief introduction to the basics of distributed systems. The following describes several distributed system design concepts as the author understands them:


1. Distributed systems place very low requirements on server hardware


This shows up mainly in the following two respects:


No requirement on hardware reliability: server hardware is allowed to fail, and hardware failures are tolerated in software. The high reliability of a distributed system is therefore guaranteed by software.


No requirement on hardware performance: servers do not need high-frequency CPUs, large amounts of memory, high-performance storage, and so on. Because the performance bottleneck of a distributed system lies in the network overhead of communication between nodes, even a server with better hardware would still end up waiting on network I/O.


In general, the large data centers of Internet companies build their distributed clusters from large numbers of inexpensive PC servers rather than from a few high-performance servers, which lowers data center costs. Google, for example, takes data center cost control to the extreme: its servers have no chassis, and the motherboards are fully customized and carry only the most basic components. Early custom motherboards even left out the power switch and USB ports. A partition is installed on the motherboard to isolate the CPU, so that cold air blows only over the CPU and not over components that do not need cooling, such as memory and hard drives, which minimizes the power spent on cooling.


2. Distributed systems emphasize horizontal scalability


Horizontal scalability (scale out) means improving the overall performance of the cluster by increasing the number of servers. Vertical scalability (scale up) means improving the overall performance of the cluster by increasing the performance of each server. The upper limit of vertical scaling is obvious: the performance of a single server cannot be improved indefinitely, and in any case the biggest bottleneck of a distributed system is network overhead, not server performance. Horizontal scaling has far more headroom, since servers can always be added to the cluster easily. Moreover, distributed systems try to ensure that scaling out increases the overall performance of the cluster almost linearly. For example, scaling a cluster of 10 servers out to 100 servers of the same kind should raise the overall performance of the distributed system to nearly 10 times the original.
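To make "nearly 10 times" concrete, the sketch below uses an Amdahl-style model in which a small fraction of the work cannot be spread across nodes. The model and the overhead value of 0.0005 are assumptions chosen purely for illustration; they are not measurements and do not come from the original article.

```python
def cluster_speedup(nodes, overhead=0.0005):
    """Speedup of `nodes` servers over one server in an Amdahl-style model.

    `overhead` is a hypothetical fraction of the work that does not parallelize
    (coordination, network communication); 0 would give perfectly linear scaling.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return nodes / (1 + overhead * (nodes - 1))


small = cluster_speedup(10)
large = cluster_speedup(100)
# Ratio of the 100-server cluster to the 10-server cluster under this model.
print(round(large / small, 2))  # 9.57
```

Under these assumed numbers the 100-server cluster delivers a little less than 10 times the throughput of the 10-server cluster, which is what quasi-linear scaling means in practice: the gap to the ideal factor grows with the share of work that requires inter-node communication.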


In an Internet company's data center, the upper limit of horizontal scaling for a single distributed system is generally on the order of ten thousand servers. The basic unit of a Google data center, the cell, consists of about 20,000 servers. Each cell is managed as a whole by one instance of the distributed management system Borg, and each data center is made up of multiple cells.


3. Distributed systems do not allow single points of failure (no single point of failure)


A single point of failure exists when an application service has only one instance running on one server: once that server goes down, the application service goes down with it and the whole service becomes unavailable. For example, if a website runs as a single copy on a single server, the web service inevitably becomes unavailable when that server goes down. Likewise, if all data is stored on a single server, all of it becomes inaccessible once that server breaks.


Because the servers in a distributed system are inexpensive PC servers, their hardware cannot be guaranteed to be 100% reliable, so a distributed system assumes by default that any server may fail at any time. At the same time, a distributed system must provide a highly reliable service and cannot allow single points of failure. Every application service running on a distributed system therefore has multiple instances running on multiple nodes, and every piece of data has multiple replicas stored on different nodes. The probability that several nodes fail at the same time, taking down all instances of an application service or making all replicas of a piece of data unreadable, is very low, which effectively prevents single points of failure.
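The effect of replication on availability can be estimated with a short calculation. The sketch below assumes that node failures are independent and that every node fails with the same probability; both are simplifying assumptions (real failures are often correlated, for example when nodes share a rack or a power supply), and the figure of 1% is hypothetical.

```python
def prob_all_replicas_down(p_node_failure, replicas):
    """Probability that every replica is down at once, assuming independent failures."""
    if not 0 <= p_node_failure <= 1:
        raise ValueError("p_node_failure must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return p_node_failure ** replicas


def replicas_needed(p_node_failure, target):
    """Smallest replica count whose combined failure probability is <= target."""
    if not 0 < p_node_failure < 1 or not 0 < target < 1:
        raise ValueError("probabilities must be strictly between 0 and 1")
    replicas = 1
    while prob_all_replicas_down(p_node_failure, replicas) > target:
        replicas += 1
    return replicas


# Hypothetical: each node is unavailable 1% of the time.
for n in (1, 2, 3):
    print(n, prob_all_replicas_down(0.01, n))
print(replicas_needed(0.01, 1e-5))  # 3
```

With these assumed numbers, each additional replica reduces the chance of losing every copy by a factor of one hundred, which is why a small number of replicas on unreliable hardware can still yield a reliable service.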


In general, servers should not be run at full load: a server that runs at full load for a long time is significantly more likely to fail. Distributed systems therefore use many lower-performance PC servers and spread the load across all of them as evenly as possible, so that no single server is heavily loaded and the cluster as a whole stays stable.


4. Distributed systems minimize inter-node communication overhead


As mentioned earlier, the overall performance bottleneck of a distributed system is internal network overhead. At present, network transmission is still slower than the CPU reading from memory or from a local hard disk. Reducing network communication, so that the CPU works as much as possible on data in memory or on the local disk, can therefore significantly improve the performance of a distributed system. The typical example is Hadoop MapReduce, which assigns each compute task to a node that already holds the data to be processed, thus avoiding transferring the data over the network.
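The idea of moving computation to the data can be sketched as follows. This is a deliberately simplified illustration and not Hadoop's actual scheduler: the function `assign_tasks`, the block and node names, and the tie-breaking rule are all hypothetical.

```python
def assign_tasks(block_locations, node_load=None):
    """Assign each data block's task to a node that stores the block locally.

    block_locations maps a block id to the list of nodes holding a replica.
    Among those nodes, the one with the fewest assigned tasks is chosen, so
    no data has to cross the network and the load stays spread out.
    """
    load = dict(node_load or {})  # copy so the caller's dict is not modified
    assignment = {}
    for block_id, nodes in block_locations.items():
        if not nodes:
            raise ValueError(f"block {block_id!r} has no replica")
        # Pick the least-loaded node among those that hold the block.
        chosen = min(nodes, key=lambda node: load.get(node, 0))
        assignment[block_id] = chosen
        load[chosen] = load.get(chosen, 0) + 1
    return assignment


# Hypothetical cluster: three blocks, each replicated on two of three nodes.
blocks = {
    "block-1": ["node-a", "node-b"],
    "block-2": ["node-b", "node-c"],
    "block-3": ["node-a", "node-c"],
}
print(assign_tasks(blocks))
# {'block-1': 'node-a', 'block-2': 'node-b', 'block-3': 'node-c'}
```

Every task runs on a node that already has its input, so only the (usually much smaller) results need to travel over the network.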


5. Application services in distributed systems are best made stateless


The state of an application service is the data that the running program holds in memory as a result of handling service requests. Distributed application services are best designed to be stateless. If an application is stateful, then once a server outage takes down the application service, the data in memory is lost, and that is clearly not a highly reliable service. When an application service is designed to be stateless, the program keeps the data it needs to retain in dedicated storage, so the service can be restarted without losing data and the distributed system can restore the service after a server outage.


For example, when designing a website backend, the session information of logged-in users can be stored in a cache service such as Redis or memcache, so that no backend instance keeps login state itself. Restarting a backend process then does not lose any user's login state. If, on the other hand, session information is kept in the memory of the backend process, then once the backend instance that a user logged in to goes down, that user's login state is inevitably lost.
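The session example can be sketched as below. To keep the code self-contained, a small in-memory class stands in for the external cache; in a real deployment this role would be played by a service such as Redis or memcache running outside the backend process. The class and method names (`SessionStore`, `WebBackend`, `login`, `current_user`) are hypothetical.

```python
import uuid


class SessionStore:
    """In-memory stand-in for an external cache service (an assumption here)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class WebBackend:
    """A stateless backend instance: it keeps no login state of its own."""

    def __init__(self, store):
        self._store = store  # all session data lives in the shared store

    def login(self, username):
        session_id = uuid.uuid4().hex
        self._store.set(session_id, {"user": username})
        return session_id

    def current_user(self, session_id):
        session = self._store.get(session_id)
        return session["user"] if session else None


store = SessionStore()
backend_1 = WebBackend(store)
sid = backend_1.login("alice")

# Simulate backend_1 going down: a fresh instance can still serve the session.
backend_2 = WebBackend(store)
print(backend_2.current_user(sid))  # alice
```

Because the backend objects hold nothing but a reference to the store, any instance can serve any request, and instances can be added, removed, or restarted freely.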


All in all, distributed systems are the preferred platform for enterprise applications in the big data era. Their good scalability, especially horizontal scalability (scale out), lets them respond flexibly to ever-changing enterprise requirements while lowering the server hardware requirements for enterprise customers, and makes it possible to truly achieve elastic scaling (auto-scaling) at the application service level.


Reposted from (http://www.infoq.com/cn/articles/features-and-design-concept-of-distributed-system/)

Copyright notice: This article is the blogger's original work and may not be reproduced without the blogger's permission.
