A distributed monitoring system supporting thousands of servers (original manuscript)

Source: Internet
Author: User

Shangwei Super

If you were going to do something special, you didn't do it for whatever reason, but you were cool enough!

Requirements analysis: As enterprises continue to grow, most of them have set up branch offices and subsidiaries. Because the head office needs to know the state of the network equipment, hosts, and other resources at its subsidiaries, the IT operations and maintenance department is required to monitor hosts and other resources on networks in different regions.

Functional Analysis:

1. A monitoring system often needs to integrate asset management and to present business and functional information logically. Analyzing its data gives feedback on investment and return, which provides a basis for planning and using assets rationally.

2. These monitoring items are very detailed; try to implement as many of them as possible!

[Figure: http://s3.51cto.com/wyfs02/M01/79/C0/wKiom1aaUEWSX2XzAADCoRMsgk4587.png]

3. We can use sockets so that SSH operations on hosts in different locations can be performed from the web side, similar to what Xshell does. For hosts that have failed, we can then log in and carry out the necessary operations.

4. Visualization: this can be implemented with RRDtool, which many well-known open source monitoring tools use.

5. Monitoring templates are customized to the business needs of each company, and they can of course be modified through the web interface. Because we want to monitor at large scale, suppose there are four computer rooms in Guangzhou, Shanghai, Shenzhen, and Beijing, each with 1000 hosts. You certainly do not want users to modify hosts one by one, so we make a template and apply changes in batches, and we adjust servers with special requirements individually.

6. We need alarms: SMS, mail, IM, and other interfaces, plus a programmable interface so that third-party alarm media with custom functionality can be added.
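The template idea in item 5 can be sketched as follows. This is only an illustration under assumptions: the metric names, the settings keys (interval_s, warn, crit), and the host names are hypothetical and do not come from the original design.

```python
from copy import deepcopy

# Hypothetical template: metric name -> settings. All names are illustrative.
BASE_TEMPLATE = {
    "cpu": {"interval_s": 10, "warn": 80, "crit": 95},
    "memory": {"interval_s": 30, "warn": 85, "crit": 95},
}


def effective_config(template, override=None):
    """Return a copy of the template with a per-host override merged on top."""
    config = deepcopy(template)
    for metric, settings in (override or {}).items():
        config.setdefault(metric, {}).update(settings)
    return config


def batch_update(template, metric, **changes):
    """Change one metric in the template; every host using it sees the change."""
    template[metric].update(changes)


# A server with special requirements overrides only what differs.
overrides = {"db01.shenzhen": {"memory": {"warn": 70}}}

batch_update(BASE_TEMPLATE, "cpu", warn=75)
db01 = effective_config(BASE_TEMPLATE, overrides.get("db01.shenzhen"))
web01 = effective_config(BASE_TEMPLATE, overrides.get("web01.beijing"))
```

Keeping overrides separate from the template means one batch change reaches all hosts without erasing the few special cases.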

The architecture of the implementation is as follows:

[Figure: architecture diagram. Image: http://s5.51cto.com/wyfs02/M00/79/C0/wKiom1aaUCuR8w-wAAHYWNunyzE204.png]

Here are the specific ideas. Suppose our headquarters is in Shenzhen and we have branches in Guangzhou, Shanghai, and Beijing, with 1000 servers at each site. First, there is a master monitoring server at headquarters, which provides the web interface. Each site also has its own monitoring server; to the monitored servers at that site, the local monitoring server acts as their primary server.

We have 4000 servers in total, and each one has different monitoring content. We cannot configure them one at a time; that would be crazy! So there is a master database at the Shenzhen headquarters, and the web interface works directly against it. When we make a batch change to the servers, each monitored host reads its configuration information from the database. To improve availability and balance load, we also need master-slave backups of the primary monitoring server and the local monitoring servers, and of the primary database and the local databases.

As the figure shows, the MySQL database is only responsible for storing monitoring configuration information, while Redis stores the large volume of monitoring data and, of course, log information.

Tirhandle

1. This is the program that processes monitoring data. After it has processed the monitoring data from all sites, it appends the results to the alarm dictionary in the Redis database.

2. Every 10 seconds it notifies the monitoring servers to write their monitoring data to Redis: the primary monitoring server writes to the master Redis, and each regional monitoring server writes to its trunk Redis.

3. It reads the newly written data from the Redis database and performs the monitoring processing.
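The three Tirhandle steps above can be sketched roughly as below. A plain dictionary stands in for Redis so the example runs on its own; the key names, the threshold values, and the cursor-based way of finding "newly written" data are all assumptions made for illustration.

```python
from collections import defaultdict

# In-memory stand-in for Redis; key names and layout are hypothetical.
store = {"samples": [], "alarms": defaultdict(list)}

THRESHOLDS = {"cpu": 90.0, "disk": 95.0}  # illustrative limits, in percent


def write_samples(samples):
    """What a monitoring server would do when told to write to Redis."""
    store["samples"].extend(samples)


def process_new_samples(cursor):
    """Check samples written since `cursor`; return the new cursor."""
    new = store["samples"][cursor:]
    for host, metric, value in new:
        limit = THRESHOLDS.get(metric)
        if limit is not None and value >= limit:
            # Append to the alarm dictionary, keyed by host.
            store["alarms"][host].append((metric, value))
    return cursor + len(new)


write_samples([("web01", "cpu", 42.0), ("db01", "cpu", 97.5)])
cursor = process_new_samples(0)
write_samples([("db01", "disk", 99.0)])
cursor = process_new_samples(cursor)
```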

Action Center: reads the data that needs an alert from the master Redis and then sends alerts according to defined rules.

Below I write down how I think this can be implemented.

First, the client sends monitoring data to the server side

Do we send every second? Every 10 seconds? Every minute? That is the frequency. We can make a template and store it in the configuration information in the MySQL database; the client reads it at startup and then sends monitoring data according to those rules.
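As a small sketch of per-metric sending frequency: the function below decides which metrics are due at a given second. The metric names and intervals are hypothetical stand-ins for whatever the client would load from the MySQL configuration at startup.

```python
def due_metrics(intervals, tick):
    """Return the metrics that should be sent at second `tick`."""
    return sorted(m for m, every in intervals.items() if tick % every == 0)


# Hypothetical values, as the client might load them from MySQL at startup.
intervals = {"cpu": 1, "memory": 10, "disk": 60}

schedule = {t: due_metrics(intervals, t) for t in (1, 10, 60)}
```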

Does the client process the data first and send only the data that needs an alarm to the server side, or does it send everything directly without processing? This choice can also be stored in the configuration information in the MySQL database.
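The two sending strategies can be expressed as a configurable mode. The mode names "all" and "alerts_only" are invented for this sketch, as are the sample values.

```python
def select_for_sending(samples, thresholds, mode):
    """'all' forwards everything; 'alerts_only' forwards only threshold breaches."""
    if mode == "all":
        return list(samples)
    if mode == "alerts_only":
        # Metrics without a configured threshold never count as a breach.
        return [(m, v) for m, v in samples if v >= thresholds.get(m, float("inf"))]
    raise ValueError(f"unknown mode: {mode!r}")


samples = [("cpu", 97.0), ("memory", 40.0), ("load", 3.2)]
thresholds = {"cpu": 90.0, "memory": 85.0}

everything = select_for_sending(samples, thresholds, "all")
alerts = select_for_sending(samples, thresholds, "alerts_only")
```

Filtering on the client reduces traffic, at the cost of the server no longer holding the full history for graphs.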

Second, receiving instructions from the server

We can use sockets to implement SSH-like functionality. Of course, Linux systems ship with SSH software by default, so the point of the socket-based implementation is to offer this through a web page.

Third, the Tirhandle monitoring data handler notifies the monitoring server every 10 seconds to write to the Redis database

We can set this up as a scheduled task on the monitoring server.

Fourth, the monitoring server sends its clients' monitoring data to Redis

A script runs at a set frequency on the monitoring server; this can also be a scheduled task. Would it be faster if Redis and the monitoring server were on the same machine? If the amount of monitoring data is large, they must be deployed on different servers.

Fifth, the Tirhandle monitoring data handler reads the monitoring information in Redis and processes it

As an alternative to reading from Redis, you can set up a network shared folder and write to it at the same time as writing to the database, so that the Tirhandle monitoring data handler can work on the data directly.

Of course, the data can also be copied to the server that runs the Tirhandle monitoring data handler.

Define some rules in Tirhandle, namely the alarm thresholds.

If (the threshold is reached) do (store the alarm in a file and send it at a certain frequency; the rule can specify how often it is sent and how many times it is sent).
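A threshold rule with a resend frequency and a maximum number of sends might look like the sketch below. The class and field names are hypothetical, and resetting the counter when the value recovers is an added assumption, not something the text specifies.

```python
from dataclasses import dataclass


@dataclass
class RepeatRule:
    threshold: float
    resend_every_s: int  # minimum gap between two notifications
    max_sends: int       # stop after this many notifications


class AlarmState:
    def __init__(self, rule):
        self.rule = rule
        self.sent = 0
        self.last_sent_at = None

    def should_send(self, value, now):
        """Return True if a notification should go out for this sample."""
        if value < self.rule.threshold:
            # Assumption: a recovered value resets the counters.
            self.sent, self.last_sent_at = 0, None
            return False
        if self.sent >= self.rule.max_sends:
            return False
        if (self.last_sent_at is not None
                and now - self.last_sent_at < self.rule.resend_every_s):
            return False
        self.sent += 1
        self.last_sent_at = now
        return True


state = AlarmState(RepeatRule(threshold=90, resend_every_s=300, max_sends=2))
decisions = [state.should_send(95, t) for t in (0, 60, 300, 600)]
```

Here the value stays above the threshold the whole time: the alarm goes out at 0 s, is held back at 60 s, goes out again at 300 s, and is then capped by max_sends.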

Sixth, the monitoring server reads the monitoring configuration information from the MySQL server

Given the appropriate permissions, it can read the configuration directly from the database.

Seventh, the trunk server synchronizes monitoring data to the master server every minute

We can set up a scheduled task on the trunk server.

Eighth, based on the above, there are at least two files on the master Redis: one holds all the monitoring data, and the other holds the alarms raised by the Tirhandle monitoring data handler. That means we can look at all the monitoring data. Of course, that file will become very large, so we can define a rule that keeps only three months of data.
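The three-month retention rule can be sketched as a simple prune over timestamped records. The record layout is hypothetical and "three months" is approximated as 90 days; with a real Redis, key expiry (EXPIRE) or trimming a sorted set by score (ZREMRANGEBYSCORE) would be possible ways to get a similar effect.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=90)  # "three months", approximated as 90 days


def prune(records, now):
    """Keep only (timestamp, payload) records inside the retention window."""
    cutoff = now - RETENTION
    return [r for r in records if r[0] >= cutoff]


now = datetime(2015, 7, 28, 10, 37)
records = [
    (datetime(2015, 3, 1), "old sample"),
    (datetime(2015, 7, 1), "recent sample"),
]
kept = prune(records, now)
```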

Ninth, the Action Center reads the data that needs an alert from the master Redis and alerts according to defined rules: if (the alarm condition is reached) do (alert over a given protocol). For a mail alarm, for example, you can configure an SMTP mailbox.
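One possible shape for the Action Center is a registry of alarm media, each a plain callable. In this sketch the alarm dictionary layout, the media names, and the addresses are all hypothetical; the mail channel only builds the message, and actually sending it would go through something like smtplib.SMTP(...).send_message(msg) against a configured SMTP server.

```python
from email.message import EmailMessage


def build_mail(alarm, sender, recipient):
    """Build (but do not send) a mail notification for one alarm."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"[ALARM] {alarm['host']} {alarm['metric']}"
    msg.set_content(f"{alarm['metric']} on {alarm['host']} is {alarm['value']}")
    return msg


class ActionCenter:
    def __init__(self):
        self.channels = {}  # media name -> callable(alarm)

    def register(self, name, send):
        self.channels[name] = send

    def dispatch(self, alarm):
        """Send the alarm over each requested medium; return the media used."""
        used = []
        for name in alarm["media"]:
            if name in self.channels:
                self.channels[name](alarm)
                used.append(name)
        return used


outbox = []  # stands in for a real SMTP connection
center = ActionCenter()
center.register(
    "mail",
    lambda a: outbox.append(build_mail(a, "monitor@example.com", "ops@example.com")),
)
alarm = {"host": "db01", "metric": "cpu", "value": 97.5, "media": ["mail", "sms"]}
used = center.dispatch(alarm)
```

Registering media as callables is one way to leave room for the third-party alarm media mentioned in the functional analysis.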

We also have to solve some dependency problems. If the network is down, there will be a flood of alarm messages. There are many solutions; for example, we can define some rules in the Tirhandle monitoring data handler:

If (the network is down) do (no alarm)

If (the host is down) do (I/O, services, temperature, memory, and so on are all unavailable anyway, so do not alarm on them)
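The two dependency rules above could be sketched like this. The alarm layout and the convention that a metric named "host" marks the host-down alarm itself are assumptions for illustration.

```python
def filter_alarms(alarms, network_up, hosts_down):
    """Drop alarms that are only a symptom of a larger outage."""
    if not network_up:
        return []  # rule 1: network is down, so raise no alarms
    kept = []
    for alarm in alarms:
        is_host_alarm = alarm["metric"] == "host"
        if alarm["host"] in hosts_down and not is_host_alarm:
            continue  # rule 2: host is down, its I/O, memory, etc. are implied
        kept.append(alarm)
    return kept


alarms = [
    {"host": "db01", "metric": "host"},
    {"host": "db01", "metric": "memory"},
    {"host": "db01", "metric": "io"},
    {"host": "web01", "metric": "cpu"},
]
result = filter_alarms(alarms, network_up=True, hosts_down={"db01"})
```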

That is the basic idea!

Timestamp: 2015/7/28 10:37


This article is from the "Make a few" blog; please keep this source attribution: http://9399369.blog.51cto.com/9389369/1735682
