I. Preface
In computing, when a single machine hits a performance bottleneck, there are two ways out. One is to pile on hardware and upgrade the configuration further; the other is to go distributed and scale horizontally. Either way, of course, costs money.
Today I will talk about the architecture of distributed systems as I understand it.
II. Two approaches to distributed systems
There are many kinds of distributed systems, such as distributed file systems, distributed databases, distributed web services, and distributed computing. They differ from one another, but is the underlying distributed thinking the same?
1. A simple example
Suppose we have a server that can handle 1 million requests per second. The requests may be page access over HTTP, file downloads over TCP, SQL executed through JDBC, RPC interface calls, and so on. Now the incoming load is 2 million requests per second. Clearly the server cannot hold up: it will refuse requests, or even crash and go down. What can we do? If one machine cannot solve the problem, use two. So we add a machine, and each one bears 1 million. If requests keep growing and two are not enough, make it three. We call this horizontal expansion (scaling out). Distributing the requests evenly among the machines is load balancing.
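The arithmetic behind this example can be written down as a small sketch. The function names below are made up for illustration, and the sketch assumes identical servers and perfectly even load balancing, which real systems only approximate.

```python
import math

def servers_needed(total_rps: int, capacity_rps: int) -> int:
    """Smallest number of identical servers that can absorb total_rps."""
    return math.ceil(total_rps / capacity_rps)

def load_per_server(total_rps: int, servers: int) -> float:
    """Requests per second each server sees under perfectly even balancing."""
    return total_rps / servers

if __name__ == "__main__":
    n = servers_needed(2_000_000, 1_000_000)
    print(n, load_per_server(2_000_000, n))
```

With the numbers from the text, 2 million requests per second against a capacity of 1 million per server gives two servers, each carrying 1 million.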
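Another example: we now have two kinds of data requests, 900,000 per second for data 1 and 800,000 per second for data 2. The single machine above cannot handle the total of 1.7 million either. We could add a machine and load balance, so that each machine processes 450,000 requests for data 1 and 400,000 for data 2. But splitting every kind of request this way is troublesome. It is simpler to let one machine process data 1 and the other process data 2. This solves the problem just as well, and we call it vertical splitting.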
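A vertical split amounts to a fixed routing table from request type to machine. The following is a minimal sketch of that idea; the server names `server-a` and `server-b` and the keys `data1` and `data2` are placeholders, not anything from a real system.

```python
# Hypothetical routing table: each kind of request is owned by one machine.
ROUTES = {"data1": "server-a", "data2": "server-b"}

def route(kind: str) -> str:
    """Return the server that owns this kind of request."""
    return ROUTES[kind]

def load_by_server(demand: dict) -> dict:
    """Sum the requests per second that land on each server."""
    load = {}
    for kind, rps in demand.items():
        server = route(kind)
        load[server] = load.get(server, 0) + rps
    return load

if __name__ == "__main__":
    print(load_by_server({"data1": 900_000, "data2": 800_000}))
```

Each machine ends up below the assumed capacity of 1 million requests per second, without any single request type being divided between machines.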
Horizontal expansion and vertical splitting are the two basic ways of thinking about distributed architecture. They are not an either-or choice; more often the two are combined. A practical scenario is described below, and it reflects how many internet companies structure their systems.
2. Practical examples
The computer system at my company is very large, and naturally it is one big distributed system. To make organization and management easier, the company splits the whole technical division into departments by business and platform: orders, members, merchants, and so on. Each department has its own web server cluster and database server cluster. Links on the same site may be served by different servers and databases, because site access and the underlying database access are assigned to different server clusters. This is a typical vertical split by business. When a department's servers cannot keep up, they are scaled out elastically, and that is horizontal expansion.
At the database layer some tables are very large, with data volumes in the hundreds of millions of records. Purely adding servers is not necessarily the best answer here; the table itself can be split. For example, it can be split horizontally by user ID: taking the ID modulo the number of tables distributes users across multiple tables, and these tables can live on different servers. Splitting vertically by business and horizontally by user is a common scheme in distributed databases.
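Splitting by user ID modulo can be sketched in a few lines. The table count, the table naming scheme `user_N`, and the database server names below are assumptions chosen for illustration only.

```python
TABLE_COUNT = 4                 # assumed number of user tables
DB_SERVERS = ["db-0", "db-1"]   # hypothetical database servers

def table_for(user_id: int) -> str:
    """Pick the table by taking the user ID modulo the table count."""
    return f"user_{user_id % TABLE_COUNT}"

def server_for(user_id: int) -> str:
    """Spread the tables themselves over the available servers."""
    return DB_SERVERS[(user_id % TABLE_COUNT) % len(DB_SERVERS)]

if __name__ == "__main__":
    for uid in (10, 7, 1001):
        print(uid, table_for(uid), server_for(uid))
```

One design consequence worth noting: with plain modulo, changing the number of tables moves most users to a different table, so the table count is usually fixed up front or the data is migrated when it changes.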
III. Load balancing
We have said that distribution solves the performance problem. The problem that comes with it is how to distribute, that is, how to load balance. When a client makes a request, which server in the distributed system should it ask? The common practice is for an intermediary server to assign a target server to the client.
Two different distributed systems illustrate this: FastDFS, a distributed file system, and a distributed RPC middleware.
- A file download request in FastDFS proceeds like this:
1. The client asks the tracker which storage can serve the specified file;
2. The tracker returns a usable storage;
3. The client communicates directly with the storage to complete the download.
Here the tracker is the load-balancing server, and the storage is the server that stores files and handles upload and download requests.
- The RPC middleware, Hedwig, works in a similar way:
1. The client asks ZooKeeper which server can execute the request;
2. ZooKeeper returns an available server;
3. The client completes the RPC directly with that server.
ZooKeeper is a coordination service for distributed systems, used here for service lookup and load balancing. It is an open-source counterpart of Google's Chubby and an important component of Hadoop and HBase.
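Both flows follow the same three steps: ask an intermediary, receive a server address, then talk to that server directly. A minimal in-memory sketch of the pattern follows. The `Registry` class and `call` function are invented for illustration and are not the FastDFS or ZooKeeper API; the "direct communication" is replaced by a formatted string.

```python
import random

class Registry:
    """Stand-in for the intermediary (the tracker or ZooKeeper role)."""

    def __init__(self):
        self._servers = {}

    def register(self, service: str, address: str) -> None:
        """A server announces that it can handle the given service."""
        self._servers.setdefault(service, []).append(address)

    def lookup(self, service: str) -> str:
        """Steps 1 and 2: the client asks, the registry returns one server."""
        candidates = self._servers.get(service)
        if not candidates:
            raise LookupError(f"no server available for {service}")
        return random.choice(candidates)

def call(registry: Registry, service: str, request: str) -> str:
    address = registry.lookup(service)
    # Step 3: a real client would now open a connection to `address`.
    return f"{address} handled {request}"

if __name__ == "__main__":
    registry = Registry()
    registry.register("file-download", "10.0.0.1:9000")
    registry.register("file-download", "10.0.0.2:9000")
    print(call(registry, "file-download", "get /img/1.jpg"))
```

The intermediary only answers the lookup; the payload never passes through it, which keeps it from becoming the bottleneck.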
In the HTTP world, the commonly heard Nginx is also a load-balancing server; it sits in front of a distributed set of web servers. The specific load-balancing algorithms, such as round robin and hashing, are not covered in depth here.
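For readers who want a concrete picture of the two algorithms just named, here is a rough sketch of each. It is not how Nginx implements them; the class and function names are made up, and SHA-256 is just one possible choice of hash.

```python
import hashlib
import itertools

class RoundRobin:
    """Hand out servers in a fixed rotating order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def pick(self) -> str:
        return next(self._cycle)

def pick_by_hash(servers, key: str) -> str:
    """Map the same key (for example a client IP) to the same server."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

if __name__ == "__main__":
    servers = ["web-1", "web-2", "web-3"]
    rr = RoundRobin(servers)
    print([rr.pick() for _ in range(4)])
    print(pick_by_hash(servers, "203.0.113.7"))
```

Round robin spreads requests evenly regardless of who sends them, while hashing keeps one client on one server, which helps when the server holds per-client state.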
IV. Synchronization
In a distributed system, once load balancing is solved, the next problem is data consistency, which has to be ensured through synchronization. The synchronization method is chosen according to the scenario and its requirements.
In a distributed file system, for example when a picture on a product page is modified, the synchronization requirement is low. A delay of a few seconds or even a few minutes is acceptable, because it generally causes no loss. So synchronization can be done simply by scanning at a fixed interval and comparing file modification timestamps; consistency is sacrificed to gain efficiency.
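A timestamp-based scan can be sketched as below. This is a single pass over one flat directory on a local disk, assuming a scheduler calls it at a fixed interval; the function name `sync_once` is made up, and a real file system would copy between machines rather than between folders.

```python
import os
import shutil

def sync_once(src_dir: str, dst_dir: str) -> list:
    """Copy files that are missing or newer in src_dir; return their names."""
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for name in os.listdir(src_dir):
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name)
        if not os.path.isfile(src):
            continue  # this sketch ignores subdirectories
        if not os.path.exists(dst) or os.path.getmtime(src) > os.path.getmtime(dst):
            shutil.copy2(src, dst)  # copy2 also copies the modification time
            copied.append(name)
    return sorted(copied)
```

Between two scans the replica may serve the old picture, which is exactly the delay the text accepts in exchange for a cheap mechanism.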
A distributed database at a bank is different: even the slightest inconsistency is unacceptable, and full consistency is ensured even at the cost of performance, for example through locking.
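The trade-off can be illustrated with a lock around a transfer. The sketch below uses an in-process `threading.Lock` as a stand-in for whatever locking a real distributed database uses; the `Bank` class is invented for illustration and says nothing about how an actual banking system is built.

```python
import threading

class Bank:
    """Toy ledger where every transfer holds one global lock."""

    def __init__(self, balances):
        self._balances = dict(balances)
        self._lock = threading.Lock()

    def transfer(self, src: str, dst: str, amount: int) -> bool:
        # Only one transfer runs at a time: slower, but never inconsistent.
        with self._lock:
            if self._balances[src] < amount:
                return False
            self._balances[src] -= amount
            self._balances[dst] += amount
            return True

    def balance(self, name: str) -> int:
        with self._lock:
            return self._balances[name]

    def total(self) -> int:
        with self._lock:
            return sum(self._balances.values())
```

Because all transfers are serialized, the total amount of money stays constant no matter how many threads run, and throughput is what pays for that guarantee.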
Paxos is widely regarded as the classic consistency algorithm. Chubby is built on Paxos, and ZooKeeper uses Zab, a closely related protocol, as the core of its consistency guarantee. The algorithm is hard to understand, and I do not fully understand it myself, so I will not go into it here.
V. Conclusion
After working with so many distributed systems, I find that their design ideas are remarkably similar. Perhaps all paths lead to the same principle in the end.
Extended Reading
- Yihaodian: practice and key steps of horizontal database sharding for the order system
- A compendium of load-balancing scheduling algorithms
- The Paxos algorithm in distributed systems
Author: First Dragon
Original link: https://chulung.com/article/architecture-of-distributed-system
This article was automatically synced to Cnblogs by Metaclblog on 2017-07-17 09:05:11
This article is released under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Reprints must retain the attribution and link.