Chen Yu (Giantchen_at_gmail)
blog.csdn.net/Solstice t.sina.com.cn/giantchen
Chen Yu's series of articles on distributed systems: http://blog.csdn.net/Solstice/category/802325.aspx
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License (CC BY-NC-ND).
http://creativecommons.org/licenses/by-nc-nd/3.0/
Convention: This article considers only Linux systems. The "service programs" it refers to are written in C++ or Java and compiled into executables (a binary or a jar). A program normally reads a configuration file at startup (or obtains its configuration in some other way), and each service process of the same program may have a slightly different configuration file. Because the word "server" has several meanings, to avoid confusion this article uses "host" for server hardware and "server-side program/process" for service software (or names the software specifically, such as Web Server and Sudoku Solver, both of which are service software).
Before getting to the point, consider a fictional but typical example: Sudoku Solver. (Sudoku Solver is a homogeneous, stateless service; migrating the state of processes in a distributed system is not the subject of this article.)
Suppose your company's distributed system includes a dedicated Sudoku-solving service program, Sudoku Solver, which your team develops and maintains. Typically, Web Servers use the service provided by Sudoku Solver: a user submits a Sudoku puzzle through a web page, and the Web Server asks a Sudoku Solver for the answer. Each Web Server talks to multiple Sudoku Solvers to achieve load balancing. The message flow of the system is roughly as follows; each rounded rectangle is a process, and each runs on its own host:
The Web Server in the figure above should not be understood simply as httpd + CGI; it stands for any client of Sudoku Solver and may itself be a stateful service program.
Of course, the system did not start out this way; it evolved in several steps.
In the beginning (a), there was only one Sudoku Solver and only one Web Server, a simple one-to-one (1:1) relationship;
subsequently (b), as business volume grew, a single Sudoku Solver host was overwhelmed, so several Sudoku Solvers were deployed, giving a one-to-many (1:n) relationship;
later (c), a single Web Server could not keep up either, so several Web Servers were deployed, forming the many-to-many (m:n) relationship we saw at first;
the situation in (d) is left to the end of the article.
To deploy and run Sudoku Solver in a distributed system, the following questions need to be considered:
How is Sudoku Solver deployed to multiple hosts? Do the executables have to be copied over by hand? What about the libraries the program uses? What about the configuration files?
How is the Sudoku Solver service program started? If each Solver's configuration file is slightly different (for example, each Solver has its own service name), are the configuration files generated automatically?
How is Sudoku Solver's listening port configured? How do we ensure that it does not clash with other service programs?
If the program crashes, who restarts it? Can it be restarted automatically? Can development/operations staff receive an alert in time?
If we want to restart Sudoku Solver deliberately, do we have to log on to the host and kill it, or can it be controlled remotely?
If we want to upgrade the Sudoku Solver program, how is it redeployed? How do we avoid interrupting the service (as far as possible)?
How does the Web Server know the addresses of the Sudoku Solvers? Are they written statically into the Web Server's configuration file?
If the host running a Sudoku Solver has a hardware failure, do the administrators learn of it immediately? Can the Web Server automatically fail over to the other live Solvers?
After new Sudoku Solvers are deployed, can the Web Server start using them automatically, without being restarted? (Restarting a Web Server may not seem like a big problem, but here we assume the client is a stateful service that should avoid restarts as far as possible.)
Can the program be safely decommissioned? For example, if the company leaves the Sudoku-solving business, will shutting down all Sudoku Solvers affect other businesses?
These questions fall roughly into a few areas: deploying (including upgrading) executables and configuration files, monitoring process status, and managing service processes. Together they can be called the operations and maintenance of a distributed system.
Depending on a company's size and technical level, I divide the operations of distributed systems into several realms; what follows is my brief description of each.
Realm 1: Full manual operation
This is probably the level of a university laboratory. The distributed system is small, perhaps a dozen or so machines, and it is implemented by students.
The system is built entirely by hand, and host IP addresses are statically configured.
Deployment: After compiling, manually copy the executables to each machine, or put them in a shared NFS directory. Configuration files are also edited by hand and copied to each machine (or placed on NFS in a separate directory for each Sudoku Solver).
Management: Start the process manually, specifying the path to the configuration file on the command line. To restart a process, log on to the host and kill it.
Upgrade: To upgrade Sudoku Solver, manually log on to each host, copy the new executable over the original, and restart.
Configuration: The ip:port of each Sudoku Solver is written into the Web Server's configuration file. If a new Sudoku Solver is deployed, the Web Server will most likely have to be restarted before it can use it.
Monitoring: None. The system is not a real business application; it is used only for study and research. When something seems wrong, log on to the host to take a look and fix the problem by hand.
This level amounts to "playing house": the system is not really robust, but it is good enough to run experiments and publish papers.
Realm 2: Using fragmented automation scripts and third-party components
This is probably the level of a start-up company whose system is already in commercial use. The company's development effort is focused on implementing the core business and adding new features, and it has no time yet to think about operating the system efficiently; operations and maintenance may be handled part-time by the developers or the network administrator. The company already has a basic development process: code is kept in a centralized version control tool (for example SVN), and there is a fairly formal QA sign-off process.
The company intranet has DNS, which resolves hostnames to IP addresses, and host IP addresses are configured by DHCP. The company's hosts have fairly uniform hardware and software: for example, the hardware is all x86-64, the operating system is uniformly Ubuntu 10.04 LTS, and the installed packages and third-party libraries are identical on every host (same version numbers), so that any program can be started on any host without host-specific configuration.
Assume that every host is already set up for SSH key authentication or GSSAPI, so that no password has to be typed. To run md5sum on host1, host2, host3, and host4 and check whether the Sudoku Solver executable is the same on each machine, you can run this on your own machine:
for h in host1 host2 host3 host4; do ssh $h md5sum /path/to/sudokusolver/version/bin/sudoku-solver; done
The company's technical staff are able to automate some operations tasks with standard Linux tools such as cron, at, logrotate, and RRDtool.
Deployment: An executable must be signed off and released by QA before it can be deployed to the production environment (if necessary, QA signs the MD5 of the executable). For reliability, executables should not be placed on NFS (if NFS fails, the whole system is paralyzed). Instead, rsync can copy the executable to a local directory on each host (executables are large and not well suited to being stored directly in the version control repository), and md5sum can check that the copy is identical to the source. The deployment step should be automated with a script (for example, ssh $host rsync /path/to/source/on/nfs /path/to/local/copy/). So that a C++ executable can be copied to any host, it is usually statically linked, which avoids failures caused by .so version mismatches.
Sudoku Solver's configuration files are kept in the version control tool; each Solver instance may have its own branch, and every modification must be committed. The configuration file used at program startup must be checked out from SVN and must not be edited by hand (to reduce human error).
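The copy-then-verify step for executables described above could be scripted roughly as follows. This is only a minimal sketch: the function names md5_of_file and deploy_and_verify are my own inventions for illustration, and shutil.copy2 stands in for the rsync call so that the example runs on a single machine.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def deploy_and_verify(source, dest_dir, signed_md5):
    """Copy `source` into `dest_dir` and compare it with the QA-signed MD5.

    shutil.copy2 is a local stand-in for rsync (an assumption of this sketch).
    Raises ValueError if the copied file does not match the signed digest.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / Path(source).name
    shutil.copy2(source, dest)
    actual = md5_of_file(dest)
    if actual != signed_md5:
        raise ValueError(f"MD5 mismatch for {dest}: {actual} != {signed_md5}")
    return dest
```

Checking against the digest that QA signed, rather than against whatever file happens to be at the source path, also catches the case where the source itself is the wrong build.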
Management: The first time a process is started, its configuration file is checked out from SVN. On restart, the configuration file can be read from the local working copy (so that an SVN server failure does not affect the system); an svn update is needed only after the configuration has changed. Service processes are managed as daemons (by /sbin/init or upstart) and are restarted automatically right after a crash (using the respawn feature). Service processes usually start when the host boots (via /etc/init.d) and can be operated remotely over SSH: to restart the service process on hostA, for example, run ssh hostA /etc/init.d/sudoku-solver restart on your own machine. Process management is decentralized: which services run on a host is determined entirely by that host's /etc/init.d directory. Migrating a service from one host to another requires logging on to both hosts and doing some manual configuration.
Upgrade: Executables are also managed under a versioning scheme (not necessarily in SVN), and when a new version is released it is strictly forbidden to overwrite the existing executable. For example, if the version now running is
/path/to/sudokusolver/1.0.0/bin/sudoku-solver
then the new version of Sudoku Solver is released to
/path/to/sudokusolver/1.1.0/bin/sudoku-solver
The reason is that, for a C++ service program, overwriting the executable while the program is running may cause a bus error some time later, and the program crashes with SIGBUS. In addition, if the program produces a core dump, the post-mortem analysis must pair the core file with the exact executable that produced it. If that executable has been overwritten, the post-mortem cannot be done.
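The never-overwrite rule can be enforced by the release script itself. The sketch below is one possible way to do it, under my own assumptions: the function name release is hypothetical, and the directory layout simply follows the /path/to/sudokusolver/<version>/bin/sudoku-solver pattern from the example above.

```python
import os
import shutil
from pathlib import Path


def release(root, version, new_binary):
    """Install new_binary as <root>/<version>/bin/sudoku-solver.

    Refuses to touch a version directory that already exists, so a running
    executable (and the binary matching any core dump) is never overwritten.
    """
    version_dir = Path(root) / version
    # exist_ok=False makes mkdir raise FileExistsError for an existing release.
    version_dir.mkdir(parents=True, exist_ok=False)
    bin_dir = version_dir / "bin"
    bin_dir.mkdir()
    target = bin_dir / "sudoku-solver"
    shutil.copy2(new_binary, target)
    os.chmod(target, 0o755)
    return target
```

Releasing 1.1.0 leaves the 1.0.0 tree untouched, and an attempt to release 1.0.0 a second time fails loudly instead of silently replacing the file.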
Configuration: The Web Server's configuration file contains the host:port of each Sudoku Solver (an improvement on Realm 1: it relies on DNS, which usually has primary and standby servers and is highly reliable). However, the Web Server's configuration file and Sudoku Solver's configuration file are independent of each other: if a Sudoku Solver is added or moved to another host, then besides changing Sudoku Solver's configuration file, the configuration files of all the Web Servers that use it must be changed as well. This is workable while the system is small; as the system grows, the dependencies between services become obscure, and shutting down one service program may accidentally break a service owned by another group, as in Meng's example in "Understanding SOA Governance Through a True Story".
Monitoring: The company uses open-source monitoring tools (Monit, for example) to monitor the resource usage (memory, CPU, disk space, network bandwidth, and so on) of each host. If necessary, plug-ins can be written so that these tools also monitor our own service programs (Sudoku Solver). But such monitoring tools are usually only observers: they are independent of the process management tools and can watch but not act. They also have their own configuration files, which must be kept in sync with Sudoku Solver's configuration. Monit can manage processes, but it decides whether a service process is working by periodic polling, so it does not necessarily detect a problem immediately (there may be a delay of several seconds).
In this realm the distributed system is basically usable, but some hidden dangers remain.
Fragmented configuration
Each service has its own separate configuration, but the system as a whole has no global deployment configuration (for example, which service programs should run on which hosts).
The configuration of a service program and the configurations of the clients that use it are independent. If Sudoku Solver is moved to another host, you must change not only Sudoku Solver's configuration but also the configuration of the Web Servers that use it and the Monit configuration that monitors it. Forget any one of these and the system fails.
Dependencies between service programs in a distributed system are a headache. "Depending on" is manageable (a program's author knows which other services the program depends on), but "being depended on" is tricky (how do I know whether stopping my program will bring down some other system in the company?). This also shows why it pays to use TCP as the only means of IPC: if all communication goes over TCP, then to find out which programs are using my Sudoku Solver (assuming it listens on port 9981), I just run netstat -tpn | grep 9981 to see the current clients, or have Sudoku Solver log every accept(2) and review the log over a week or a month to learn which programs use Sudoku Solver.
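If the netstat check is run regularly, it helps to turn its output into a list of client hosts. The following is a rough sketch that assumes the usual column layout of netstat -tpn on Linux (protocol, two queue columns, local address, foreign address, state); the function name clients_of_port is made up for this example. Unlike a plain grep 9981, it only counts connections whose local port is 9981, so outgoing connections to some other service on port 9981 are not mistaken for clients.

```python
def clients_of_port(netstat_output, port):
    """Return the sorted set of remote hosts connected to our local `port`.

    Expects text in the column layout of `netstat -tpn`:
    proto, recv-q, send-q, local address, foreign address, state, ...
    """
    suffix = f":{port}"
    clients = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP connection line.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if state == "ESTABLISHED" and local.endswith(suffix):
            # Drop the client's ephemeral port, keep only its address.
            clients.add(foreign.rsplit(":", 1)[0])
    return sorted(clients)
```

The input would typically come from running netstat through subprocess on the Solver's host; collecting the results over a week or a month gives the same picture as the accept(2) log.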
Decentralized process management
If hostA has a hardware failure, how can it be quickly replaced with standby server hardware? Can the Sudoku Solver that ran on it first be migrated to an idle hostB, and the Web Servers then be told to use the Sudoku Solver on hostB? In this realm, the "notify the Web Server" step means restarting the Web Server.
Realm 3: Self-made cluster management system, centralized configuration
This is probably the level of a fairly mature large company.
The decentralized process management of Realm 2 can no longer meet the need for business agility, so the company starts to integrate its existing operations tools and develop its own cluster management software. I have not yet found open-source cluster management software that meets my requirements, so what follows describes a fictitious distributed system management package named Zurg (the name comes from the sci-fi movie "The Fifth Element", with slightly different spelling; Zurg is also a villain in Toy Story).
Zurg's architecture is simple, a typical master/slave architecture; see the description of "managing a Linux server cluster" in Chen Yu's article on where multithreaded servers are applicable.
The functional requirements of Zurg are discussed in the article on engineering development methods for distributed systems.
In this realm, day-to-day operations and maintenance no longer require running ssh over and over; common tasks can be done through Zurg.
Deployment: Simply send an instruction to the Master, and the Master commands the Slaves to rsync the new executable from the specified location to a local directory.
Process management and monitoring: Process management and monitoring are Zurg's main functions, and here Zurg has some advantages over general-purpose open-source tools. Because Sudoku Solver is fork()ed by Zurg Slave, Zurg Slave is notified immediately when Sudoku Solver crashes (as the parent process it receives SIGCHLD), so it can report the status to the administrators and restart the process right away. This is much quicker than Monit's polling. (Some preparation can also be done before fork() so that Zurg Slave can obtain Sudoku Solver's liveness more easily.)
For safety, Zurg Slave can verify the MD5 of an executable before starting it, which prevents the wrong version of a service program from running in the production environment.
Zurg Master can provide a web page showing whether the service programs in the cluster are running normally, and an interface (possibly HTTP) so that we can write scripts to control Zurg Master.
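Why a parent process learns of a crash without polling can be shown in a few lines. The sketch below is not Zurg's implementation (Zurg is fictitious); spawn and wait_for_exit are hypothetical names, and a real slave would react to SIGCHLD asynchronously instead of blocking in waitpid. The kernel mechanism is the same, though: the parent is told as soon as the child terminates, and it can tell a crash (death by signal) from a normal exit.

```python
import os
import signal


def spawn(argv):
    """Fork a child that execs argv; return the child's pid to the parent."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # reached only if exec itself failed
    return pid


def wait_for_exit(pid):
    """Block until the child terminates and describe how it ended."""
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        # Killed by a signal: this is what a crash (SIGSEGV, SIGBUS, ...) looks like.
        return ("signal", signal.Signals(os.WTERMSIG(status)).name)
    return ("exit", os.WEXITSTATUS(status))
```

A supervisor built on this can log the result, alert the administrators, and call spawn again, all within milliseconds of the crash.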
Upgrade: To restart Sudoku Solver deliberately, send an instruction to Zurg Master; there is no need to ssh and kill. Zurg records every start of a service process on each host for later analysis. With the manual /etc/init.d approach of Realm 2, you would have to collect logs from each machine to find out when Sudoku Solver was restarted.
In addition, a GUI program can be developed to run on the operators' desktops, so that restarting Sudoku Solver on multiple hosts takes only a few mouse clicks.
Configuration: The fragmented configuration files are replaced by one centralized Zurg configuration file.
The Zurg configuration file determines which service runs on which host; Zurg Master reads it and orders each Zurg Slave to start the appropriate service programs. For example, if the configuration file specifies that Sudoku Solver runs on host1, host2, and host3, then Zurg Master tells the Zurg Slaves on host1, host2, and host3 to start Sudoku Solver. (Of course, the Zurg Slave on each host still has to be started by /etc/init.d; all other service programs are started by it.)
More importantly, the dependencies between service programs are expressed directly in the Zurg configuration file. For example, the Zurg configuration file states that Web Server depends on Sudoku Solver. The Web Server's configuration file is then generated by Zurg Master (which may use a template engine that reads a configuration template for the Web Server), with the host:port of each Sudoku Solver filled in automatically by Zurg Master. If Sudoku Solver is migrated from hostA to hostB, only one place needs to change (the Zurg configuration), and the configurations of both Sudoku Solver and Web Server are regenerated automatically by Zurg Master. This greatly reduces the chance of mistakes.
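To make the idea of generated configuration concrete, here is a small sketch. Everything in it is assumed for illustration: the CLUSTER dictionary stands in for the central Zurg configuration, the key names (hosts, port, depends_on) and the template text are invented, and Python's string.Template plays the part of the template engine.

```python
from string import Template

# Hypothetical central description: where each service runs and what it uses.
CLUSTER = {
    "sudoku_solver": {"hosts": ["hostA", "host2", "host3"], "port": 9981},
    "web_server": {"hosts": ["host4"], "port": 8080,
                   "depends_on": ["sudoku_solver"]},
}

# Hypothetical per-service configuration templates.
TEMPLATES = {
    "web_server": Template("listen_port = $port\nsudoku_solver = $sudoku_solver\n"),
}


def endpoints(cluster, service):
    """List the host:port endpoints of a service."""
    spec = cluster[service]
    return [f"{host}:{spec['port']}" for host in spec["hosts"]]


def render_config(cluster, service):
    """Fill the service's template with its port and its dependencies' endpoints."""
    spec = cluster[service]
    values = {"port": spec["port"]}
    for dep in spec.get("depends_on", []):
        values[dep] = ",".join(endpoints(cluster, dep))
    return TEMPLATES[service].substitute(values)
```

Moving a Solver from hostA to hostB then means editing one entry in CLUSTER and rendering again; the Web Server's file follows automatically because it is derived from the same data.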
In this realm, the day-to-day management of the distributed system is basically mature, but there is still considerable room for improvement in fault tolerance and load balancing.
At present the biggest obstacle is DNS, which limits fast failover. For example, if hostA has a hardware failure, Zurg Master can immediately start Sudoku Solver on hostB, but how are the Web Servers told to use the service on hostB? Modifying the DNS entry (resolving hostA's name to hostB's IP address) may take several minutes to take effect, because DNS has no push mechanism.
If our thinking is confined to host:port, we end up with high-availability solutions that look advanced but are actually clumsy. For example, we might play tricks in the kernel to let two machines share the same IP address, and use a dedicated heartbeat link to decide which host serves requests and which is the standby. If the active host fails, service can switch to the standby quickly (within seconds); because the hostname and IP address stay the same, clients need no reconfiguration or restart and complete the failover simply by reconnecting over TCP. Going a little further down this wrong path, one might even try to migrate TCP connections to the standby so that clients do not have to disconnect and reconnect.
Load balancing is also constrained by DNS.
If the existing 4 Sudoku Solvers are overwhelmed and 4 more are deployed, how is each Web Server told to add the new Sudoku Solvers to its connection pool?
There are some ad hoc means. For example, each Web Server can have a management interface through which Sudoku Solver addresses are added and removed dynamically. With this management interface we can also carry out planned online migrations. For example, to proactively migrate a Sudoku Solver from hostA to hostB, we start Sudoku Solver on hostB, add hostB:9981 to the Web Server's connection pool through the management interface, remove hostA:9981 from the connection pool, and finally stop the Sudoku Solver on hostA. This works for planned Sudoku Solver upgrades and avoids interrupting the Web Server's service. For failover it is less convenient: Zurg Master would have to understand the Web Server's management interface, which introduces a circular dependency into the system. (Normally Zurg Master should not know or access the interface details of the service programs it manages, so that upgrading Sudoku Solver does not require upgrading Zurg Master.)
This approach requires the Web Server to be developed with a suitable maintenance and probing channel built in, as recommended in Chen Yu's article on building easily maintainable distributed programs.
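The planned migration above comes down to three operations on the Web Server's connection pool, performed in a safe order. A minimal sketch follows, with a made-up ConnectionPool class that only tracks addresses and hands them out round-robin; a real pool would also own the TCP connections, and the management interface would call add and remove on behalf of the operator.

```python
class ConnectionPool:
    """Round-robin pool of solver addresses, adjustable at run time."""

    def __init__(self, addresses):
        self._addresses = list(addresses)
        self._next = 0

    def add(self, address):
        """Add a solver address; adding a known address is a no-op."""
        if address not in self._addresses:
            self._addresses.append(address)

    def remove(self, address):
        """Remove a solver address; raises ValueError if it is unknown."""
        self._addresses.remove(address)

    def pick(self):
        """Return the next address in round-robin order."""
        if not self._addresses:
            raise LookupError("no Sudoku Solver available")
        address = self._addresses[self._next % len(self._addresses)]
        self._next += 1
        return address


def migrate(pool, old_address, new_address):
    """Add the new solver before removing the old one, so the pool is never empty."""
    pool.add(new_address)
    pool.remove(old_address)
```

Doing the add first is the whole point of the ordering in the text: at every moment during the migration the Web Server has at least one usable Solver.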
Another ad hoc method: when it starts, each Sudoku Solver inserts or updates its own host:port in a database table. The Web Server's configuration then contains not host:port but a SELECT statement that looks up the host:port of the Sudoku Solvers it depends on; the Web Server can also be notified promptly of changes to the Sudoku Solver address list through database triggers. When Sudoku Solvers are added or removed, the Web Server reacts almost immediately, and there is no need to add or remove Sudoku Solver addresses by hand through a management interface. Here the database plays the role of a naming service, and its availability directly affects the availability of the whole system.
Realm 3 is the darkness before the dawn: once a naming service is introduced uniformly and DNS is put aside, the fault tolerance and load balancing problems are solved.
Realm 4: Combination of cluster management and naming service
This is the level of the industry's leading companies.
As analyzed above, cluster management software like Zurg greatly simplifies the day-to-day operation of a distributed system, but it has a major defect: it cannot achieve fast failover. Once the system grows beyond a certain size, machine failures become markedly more frequent and fast automatic failover becomes necessary; otherwise the operations staff spend all their time fighting fires.
Simple, fast failover requires no special programming skill and no kernel hacking; it only requires letting go of the traditional DNS mindset, shaking off the shackles of host:port, and replacing DNS with a naming service built specifically for distributed systems.
The job of the naming service is to resolve a service_name into a list of ip:port addresses. For example, a query for "sudoku_solver" returns host1:9981, host2:9981, host3:9981.
The biggest difference between a naming service and DNS is that it can push new address information to clients. For example, the Web Server subscribes to "sudoku_solver", and whenever the address list of sudoku_solver changes, the Web Server receives the update immediately. The Web Server does not need to poll; it simply waits for notifications.
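The resolve-and-subscribe contract can be sketched in a few lines. This is an in-process stand-in only: the class name NamingService and its methods are hypothetical, and a real naming service would push updates to clients over TCP and be replicated across several servers.

```python
class NamingService:
    """Toy naming service: service_name -> list of "ip:port", with push updates."""

    def __init__(self):
        self._addresses = {}    # service_name -> list of "ip:port"
        self._subscribers = {}  # service_name -> list of callbacks

    def resolve(self, service_name):
        """Return the current address list (empty if the name is unknown)."""
        return list(self._addresses.get(service_name, []))

    def subscribe(self, service_name, callback):
        """Register a callback and deliver the current list to it at once."""
        self._subscribers.setdefault(service_name, []).append(callback)
        callback(self.resolve(service_name))

    def update(self, service_name, addresses):
        """Replace the address list and push it to every subscriber."""
        self._addresses[service_name] = list(addresses)
        for callback in self._subscribers.get(service_name, []):
            callback(list(addresses))
```

A client such as the Web Server would pass a callback that rebuilds its connection pool from the new list; it never asks "has anything changed?", it is simply told.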
Who is responsible for updating the naming service?
In Realm 2, Sudoku Solver would register itself with the naming service. In Realm 3, because Sudoku Solver is started by Zurg, Zurg knows which hosts Sudoku Solver runs on and updates the naming service itself; Sudoku Solver does not have to do anything.
How are the availability and consistency of the naming service guaranteed?
There is no doubt that once this scheme is adopted, the naming service becomes critical to the normal operation of the system, and its availability bounds the availability of the whole system. The naming service must not run on a single server; for reliability, a group of servers (usually 5) should provide the service together, which of course requires solving the consistency problem. The currently accepted way to build a highly available naming service is the Paxos algorithm, and there are some open-source implementations (ZooKeeper, Keyspace, Doozer).
What is the impact on program design?
If the company's network library was designed with the naming service in mind, the change is transparent to programs. Configuration files contain service_name instead of host:port, and the network library resolves it into a list of ip:port addresses.
Why does the Muduo network library not encapsulate DNS resolution?
On the one hand, DNS resolution with gethostbyname() and getaddrinfo() is blocking, and I have not had time to write a non-blocking DNS library. On the other hand, DNS does not play a significant role in large-scale distributed systems; I would rather spend the time implementing a naming service and writing a name resolution library for it.
In Realm 3, each project team has its own hosts, which run only that project's service programs, so the TCP port of each service program can be statically allocated (for example, Sudoku Solver always uses port 9981) without worrying about port conflicts. If the company keeps growing, it will sooner or later exhaust the 16-bit port namespace, and assigning a port number to a new project will become a problem.
In Realm 4 this limit is removed: a service program can run on any host in the company without worrying about port conflicts, because Zurg picks a free port on the current host when starting Sudoku Solver and saves the chosen port in the naming service. TCP ports are thus configured dynamically, and the Web Server adapts without difficulty to Sudoku Solvers running on different ports.
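One common way to obtain a free port is to bind to port 0 and let the kernel choose. The sketch below illustrates that, plus the registration step, under loud assumptions: the function names are invented, a plain dictionary stands in for the naming service, and the listening socket is created in the same process for simplicity, whereas in the scheme described above Zurg would choose the port and hand it to the Sudoku Solver it starts.

```python
import socket


def listen_on_free_port(host="127.0.0.1"):
    """Open a listening TCP socket on a kernel-chosen free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 asks the kernel for any unused port
    sock.listen()
    return sock, sock.getsockname()[1]


def start_and_register(registry, service_name, host="127.0.0.1"):
    """Listen on a free port and record host:port under service_name.

    `registry` is a dict standing in for the naming service (an assumption).
    """
    sock, port = listen_on_free_port(host)
    registry.setdefault(service_name, []).append(f"{host}:{port}")
    return sock
```

Keeping the socket open from the moment the port is chosen avoids the race in which another program grabs the port between selection and use.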
(To be continued. Next I intend to talk about the design of heartbeat protocols in distributed systems.)