Build a distributed program that is easy to maintain


Chen Shuo (giantchen_at_gmail)

blog.csdn.net/solstice

Abstract: This post contains nothing new. It only elaborates on one slide from a talk I gave last year at the Pearl River Delta Technology Salon.

"Easy to maintain" in the title refers to supportability, not maintainability. The former is seen from the perspective of operations staff: the program is easy to manage and imposes little day-to-day manual work. The latter is seen from the perspective of developers: the code is easy to read and modify.

In the previous article, "Process identification in distributed systems", I made the following point: every long-running process in a distributed system that talks to other machines should provide a management interface, a maintenance probe through which its full state can be inspected from outside. One concrete approach is to build an HTTP server into the program.

Today I want to explain why this is necessary. There are two aspects: 1) why a service program needs a built-in monitoring interface; 2) why HTTP is a convenient protocol for it.

Necessity

The idea of building a monitoring interface into a program is inspired by Linux procfs. On Linux you need no special tools to inspect kernel state: ls and cat on the files under /proc are enough. Which processes are running, which files each process has open, how much memory and CPU each process uses, how many threads each process has started, which TCP connections currently exist, how many bytes each NIC has sent and received: all of this can be found under /proc. The Linux kernel uses procfs as a single interface that exposes its state thoroughly, which makes the operating system easy to monitor.

However, procfs has two obvious shortcomings:

    • It only exposes system-level data and cannot show the internal data of an individual process;
    • It is a local file system, so you must log in to the machine to read it. When you manage many machines, this adds up to considerable work.
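
To make the "just cat a file" point concrete, here is a minimal Python sketch that parses the `Key: value` lines of a `/proc/<pid>/status` file. The helper names `parse_proc_status` and `read_proc_status` are made up for illustration, the sample text is heavily abbreviated, and reading the real file only works on Linux.

```python
from pathlib import Path


def parse_proc_status(text):
    """Parse the 'Key:<tab>value' lines of a /proc/<pid>/status file."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            fields[key.strip()] = value.strip()
    return fields


def read_proc_status(pid="self"):
    """Read and parse /proc/<pid>/status (Linux only)."""
    return parse_proc_status(Path(f"/proc/{pid}/status").read_text())


# Abbreviated sample in the same format, so the parser can be tried anywhere.
SAMPLE = "Name:\tcat\nState:\tR (running)\nThreads:\t1\nVmRSS:\t     980 kB\n"

if __name__ == "__main__":
    info = parse_proc_status(SAMPLE)
    print(info["Name"], info["Threads"], info["VmRSS"])
```

Everything here still happens on the local machine and only covers what the kernel knows about the process, which is exactly the limitation described above.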

For example, suppose I want to know the following about a service process I wrote myself:

    • How many TCP connections it has accepted so far
    • How many connections are currently active (this one is visible through procfs)
    • How many responses it has sent in total
    • The average size, in bytes, of the input and output data per request
    • The average response time per request, in milliseconds
    • The average number of active (concurrent) requests in the process
    • The peak number of concurrent requests
    • The number of active requests on a given connection
    • The number of live xxxrequest objects in the process
    • The number of database connections the process has open, and how long each has been alive
    • The program keeps a hash map of the currently active requests; I want to print it
    • A request seems stuck at some step; I want to print that request's state inside the process

These are legitimate needs, and they can only be met if the program actively exposes its own state. Otherwise, even after logging in to the machine over SSH, you cannot see any useful information from inside the process. (You can't attach GDB every time, can you? Attaching pauses the service process so that it stops responding, to say nothing of how awkward it is to print a hash map from GDB.)
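
One way to have such numbers ready is for the service to keep them in a small thread-safe object that a management interface can render on demand. The sketch below is only an illustration under that assumption: the class `ServerStats` and its method names are hypothetical, not taken from muduo or any other library, and it covers just a few of the items in the list.

```python
import threading


class ServerStats:
    """Thread-safe counters a service updates as it handles requests."""

    def __init__(self):
        self._lock = threading.Lock()
        self.accepted_connections = 0
        self.active_requests = 0
        self.peak_active_requests = 0
        self.total_responses = 0
        self._total_latency_ms = 0.0

    def on_connection_accepted(self):
        with self._lock:
            self.accepted_connections += 1

    def on_request_start(self):
        with self._lock:
            self.active_requests += 1
            self.peak_active_requests = max(
                self.peak_active_requests, self.active_requests)

    def on_request_done(self, latency_ms):
        with self._lock:
            self.active_requests -= 1
            self.total_responses += 1
            self._total_latency_ms += latency_ms

    def snapshot(self):
        """Return a consistent copy of all counters as a plain dict."""
        with self._lock:
            avg = (self._total_latency_ms / self.total_responses
                   if self.total_responses else 0.0)
            return {
                "accepted_connections": self.accepted_connections,
                "active_requests": self.active_requests,
                "peak_active_requests": self.peak_active_requests,
                "total_responses": self.total_responses,
                "avg_latency_ms": avg,
            }
```

The request-handling code calls the `on_*` hooks, and whatever management interface the program has only needs to call `snapshot()` and format the result.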

Convenience

If a program has to actively expose its internal state, which way is the most convenient? HTTP, of course. It has the following advantages:

    • The program becomes a TCP server, so it can be reached remotely without logging in to the machine.
    • A TCP server has another benefit: it is a safe and convenient way to prevent a program from being started twice, because a second instance fails to bind the port that is already in use.
    • A bare-bones HTTP implementation is simple and adds little burden to the server program. See the muduo::net::HttpServer example.
    • No special client program is needed; an ordinary web browser is enough.
    • A client is easy to write in a scripting language, which makes automated collection and analysis of state straightforward.
    • HTTP is a text protocol, so in an emergency you can use telnet or wget from the command line (for example, when you SSH into a company server from home to fix a production problem and no web browser is at hand).
    • The URL path makes it easy to view selected information instead of dumping the entire state of the process. See the muduo::net::Inspector example.
    • HTTP lends itself to aggregation: one browser page can embed several iframes and show the status of several processes at a glance.
    • Besides GET, you can implement PUT/POST/DELETE where needed and use HTTP to control and modify the process state, making the program controllable as well as observable ("observable" and "controllable" are terms borrowed from automatic control).
    • If needed, REST can be used for more advanced aggregation; see "REST-style monitoring" in my talk.
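
As a rough sketch of several of these points together (a tiny embedded HTTP server, URL paths that select part of the state, and a script-side client), here is a Python version built on the standard library. It is not how muduo's Inspector is implemented; the paths, the `STATE` contents, and the helper names `start_inspector` and `fetch` are all assumptions made for illustration.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical process state; a real service would expose its own data.
STATE = {
    "/stats/connections": {"accepted": 42, "active": 3},
    "/stats/requests": {"total": 1000, "peak_concurrent": 17},
}


class InspectorHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The URL path selects which part of the state to return.
        if self.path == "/":
            body, status = "\n".join(sorted(STATE)) + "\n", 200
        elif self.path in STATE:
            body, status = json.dumps(STATE[self.path], sort_keys=True) + "\n", 200
        else:
            body, status = "not found\n", 404
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_inspector(port=0):
    """Start the inspector in a daemon thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), InspectorHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Bypass any HTTP proxy configured in the environment for local requests.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch(port, path):
    """Script-side client: fetch one status page as text."""
    with _opener.open(f"http://127.0.0.1:{port}{path}") as resp:
        return resp.read().decode("utf-8")
```

After `server = start_inspector()`, the chosen port is `server.server_address[1]`; `fetch(port, "/")` lists the available pages and `fetch(port, "/stats/requests")` returns just that part of the state. A real service would pass a fixed, well-known port instead of 0.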

Moreover, the distributed systems discussed here are infrastructure running inside the corporate firewall, and the security of HTTP should be ensured by that firewall. Just as you would not expose your Hadoop master or memcached to the Internet, plain HTTP is fine inside the company as long as nobody deliberately sabotages it.

Examples

In the talk I used Google as an example.

Of course, we cannot see what the status pages of Google's internal servers look like, so let's look at another example, Hadoop. Hadoop has four main services: namenode, datanode, jobtracker, and tasktracker. Each has a built-in HTTP status page. The default HTTP ports are:

    • Namenode 50070
    • Datanode 50075
    • Jobtracker 50030
    • Tasktracker 50060

If a machine runs both datanode and tasktracker, we can query their status at http://hostname:50075 and http://hostname:50060.
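
Because these are ordinary HTTP pages, polling them from a script takes only a few lines. The following Python sketch builds the status URLs from the default ports listed above and checks whether a page answers. The function names `status_urls` and `is_up` are hypothetical, and `hostname` stands for whatever machine you want to check.

```python
import urllib.request

# Default status ports as listed above.
DEFAULT_PORTS = {
    "namenode": 50070,
    "datanode": 50075,
    "jobtracker": 50030,
    "tasktracker": 50060,
}


def status_urls(host, services):
    """Build the status page URL of each named service on one host."""
    return [f"http://{host}:{DEFAULT_PORTS[s]}/" for s in services]


def is_up(url, timeout=2.0):
    """Return True if the status page answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError and socket timeouts are OSError subclasses
        return False


if __name__ == "__main__":
    for url in status_urls("hostname", ["datanode", "tasktracker"]):
        print(url, "up" if is_up(url) else "down")
```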

Exceptions

If a built-in HTTP service is inconvenient, a simple telnet-style service is not hard to build, along the lines of memcached's stats command.
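
A minimal sketch of such a line-based service in Python follows. It imitates the shape of memcached's `stats` reply (`STAT <name> <value>` lines terminated by `END`), but the counter names and values are made up, and the helpers `start_stats_server` and `query` are hypothetical names used only for this illustration.

```python
import socket
import socketserver
import threading

# Hypothetical counters; a real service would report its own.
STATS = {"curr_connections": 3, "total_connections": 42}


class StatsHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Read one command per line until the client quits or disconnects.
        for raw in self.rfile:
            command = raw.strip().decode("ascii", "replace")
            if command == "stats":
                for name, value in STATS.items():
                    self.wfile.write(f"STAT {name} {value}\r\n".encode("ascii"))
                self.wfile.write(b"END\r\n")
            elif command == "quit":
                return
            else:
                self.wfile.write(b"ERROR\r\n")


class StatsServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True


def start_stats_server(port=0):
    """Start the service in a daemon thread; port 0 picks a free port."""
    server = StatsServer(("127.0.0.1", port), StatsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def query(port, command):
    """Send one command, then quit, and return everything the server replied."""
    with socket.create_connection(("127.0.0.1", port), timeout=2) as conn:
        conn.sendall(command.encode("ascii") + b"\r\nquit\r\n")
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("ascii")
```

With the server running, typing `stats` into a telnet session connected to the same port gives the same reply that `query(port, "stats")` returns.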

If the service already exposes its functionality through RPC, there is no need for a built-in HTTP service; add one more RPC call that does the same job. This RPC can be named admin(). Its input is something like a URL, and it returns the page content for that URL, either as text or in the RPC framework's native message format.
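
To illustrate the shape of such a call, the sketch below registers an `admin` function with Python's standard-library XML-RPC server. XML-RPC is only a stand-in for whatever RPC framework the service really uses, and the page paths and contents are invented for the example.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical pages, keyed by a URL-like path.
PAGES = {
    "/status": "OK\n",
    "/stats/requests": "total 1000\npeak_concurrent 17\n",
}


def admin(path):
    """Return the page for a URL-like path, as an HTTP interface would."""
    if path == "/":
        return "\n".join(sorted(PAGES)) + "\n"
    return PAGES.get(path, "not found: " + path + "\n")


def start_rpc_server(port=0):
    """Serve admin() over XML-RPC in a daemon thread; port 0 picks a free port."""
    server = SimpleXMLRPCServer(("127.0.0.1", port), logRequests=False)
    server.register_function(admin, "admin")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_rpc_server()
    port = server.server_address[1]
    proxy = xmlrpc.client.ServerProxy(f"http://127.0.0.1:{port}/")
    print(proxy.admin("/stats/requests"), end="")
    server.shutdown()
    server.server_close()
```

Keeping the argument URL-like means the same dispatch table could back both an HTTP page and the RPC call.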

Summary

Leaving a maintenance channel in a distributed program is necessary. It helps with routine operations and with troubleshooting. Conversely, if no such channels are built in during development, operations staff work blind: every process is a black box, its state can only be reconstructed (guessed) by combing through logs, and productivity is extremely low.
