Cook looked through solution monitoring System (IV)

Source: Internet
Author: User
Tags: snmp

Let's talk about the architecture of the monitoring system. Discussion group: 365534424. This article is authorized for publication only on 51reboot and 51cto.

Architecture is too big a word, so let's narrow the scope and talk only about the overall, macro-level structure of the monitoring system. At this level the Web part is reduced to a single module, because it is responsible for unified system management and operations functions.

The simplest architecture is as follows.


This is the first layer of the monitoring system architecture. By analogy with Baidu Maps, think of it as the country-level map. At this coarsest granularity there are three modules: Web, data acquisition, and data processing.

[Figure: http://s3.51cto.com/wyfs02/M02/71/16/wKiom1XEhkqCAA9CAADAnDMfnGs137.jpg]

Push and pull

Let's focus on the path from the data acquisition module to the data processing and alarm module.

Push versus pull is a choice that comes up often in technology selection. In a client/server structure, the pull model means the client asks for the information: the server processes the request sent by the user terminal and returns the result the user asked for. In the push model, the server "pushes" the information to the client on its own initiative. In both cases the data flows from the server to the client; what differs is who initiates the operation. Seen as a relationship between "source" and "user", information flow can therefore be divided into two modes: information push and information pull.

The two models are compared in the table below.

[Table image "Push.png": http://s3.51cto.com/wyfs02/M01/71/12/wKioL1XEhfHDYYtgAAKY8-8CPjQ088.jpg]

One benefit of push is good timeliness. The drawback is that the server side needs more complex state management, and the delivery rate also becomes a headache. The advantage of pull is that the server and its state management are simple; the drawback is that timeliness cannot be controlled. In a monitoring system, suppose every monitoring item has to be pushed from the server side to the client. If the client's host is shut down, the push cannot be delivered, and the server side has to record the failure, retry, and handle the other failure cases. If instead the client actively pulls, then as soon as its host boots the client pulls its items immediately. The delivery rate is certainly better and server-side management is simpler. The drawback is that a new monitoring item cannot take effect immediately; it has to wait for the client's next pull.

Here is a classic example, and a question I always like to ask when I interview people. When I ask it, I mainly want to see the candidate's logical thinking.

Question: everyone has used Weibo. You can follow people, and people can follow you. When you publish a post, the people who follow you get a notification; when someone you follow publishes a post, you get a notification. Is that notification pushed to your Weibo client (browser or mobile app), or pulled by it?

Candidate: (some will say) Push.

Interviewer: OK, then let me ask: Chen Yao has more than 50 million followers on Sina Weibo. When she publishes one post, do you push more than 50 million messages, one to each account?

Candidate: Well, then it's timed pull.

Interviewer: Are you sure? Tens of millions of clients, all pulling?

Candidate: Uh... (At this point black lines appear on the candidate's forehead.)

Candidate: Then what should I do?

With push, a single post by Chen Yao produces 50 million messages for the system to process. If she posts 100 times a day, Sina Weibo would probably go crazy. And this does not yet account for the many clients that are not logged in, whose messages must be cached, or for the many clients that cannot be notified on the first attempt, whose failures must be handled.

With pull, once a large number of users are on the production system, storage and caching become a big challenge.

For the details you can Google further; there are in fact many solutions to this problem.

Experienced developers will surely agree with one of my claims: when two technical solutions are in dispute, the final answer is often a third solution that blends the two. It is like two strongly opposed negotiators whose final outcome is an integrated or compromise solution. Push and pull can likewise be combined to complement each other. Depending on how they are combined, there are four push-pull modes:
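The trade-off in this interview question can be made concrete with a toy model. The sketch below is only an illustration: the class names `PushFeed` and `PullFeed` are hypothetical and say nothing about how Weibo is actually built. It shows that push does its work at post time, once per follower, while pull does its work at read time, once per followed account.

```python
from collections import defaultdict


class PushFeed:
    """Push model (fan-out on write): each post is copied to every follower's inbox."""

    def __init__(self):
        self.followers = defaultdict(set)   # author -> set of followers
        self.inbox = defaultdict(list)      # user -> list of (author, text)
        self.writes = 0                     # number of inbox writes performed

    def follow(self, user, author):
        self.followers[author].add(user)

    def post(self, author, text):
        for user in self.followers[author]:
            self.inbox[user].append((author, text))
            self.writes += 1

    def read(self, user):
        return list(self.inbox[user])


class PullFeed:
    """Pull model (fan-out on read): posts are stored once and gathered at read time."""

    def __init__(self):
        self.following = defaultdict(set)   # user -> set of authors followed
        self.posts = defaultdict(list)      # author -> list of texts
        self.lookups = 0                    # number of per-author lookups performed

    def follow(self, user, author):
        self.following[user].add(author)

    def post(self, author, text):
        self.posts[author].append(text)

    def read(self, user):
        feed = []
        for author in sorted(self.following[user]):
            self.lookups += 1
            feed.extend((author, text) for text in self.posts[author])
        return feed


push, pull = PushFeed(), PullFeed()
for i in range(1000):                       # 1,000 followers stand in for 50 million
    push.follow(f"fan{i}", "star")
    pull.follow(f"fan{i}", "star")
push.post("star", "hello")                  # costs 1,000 inbox writes right away
pull.post("star", "hello")                  # costs nothing until somebody reads
```

Scaled to the numbers in the example, 50 million followers and 100 posts a day would mean 50,000,000 × 100 = 5 billion inbox writes per day under pure push, while pure pull moves that cost to every read by every client.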

    • Push first, then pull: the server pushes first, and then the client does a targeted pull;

    • Pull first, then push: based on what the client pulled, the server then does a targeted push;

    • Pull within push: during a push, the client may interrupt at any time and pull more targeted information;

    • Push within pull: based on what the client pulled, the server proactively pushes the latest related information.


Push/pull choices in several open source monitoring systems

Zabbix: uses an agent. With active checks, the agent pushes data to the server on its own initiative, so from the client's point of view it is push. (Zabbix agents also support passive checks, where the server polls the agent.)

Cacti: uses the SNMP protocol. There is no dedicated client; or rather, the client is the SNMP agent on the monitored host. From the client's point of view, it is pull.

Ganglia: from the client's point of view, it is push.

In the monitoring system I ran in production, we combined push and pull to get both timeliness and delivery rate. Described from the client's point of view: so that a monitoring item takes effect quickly, the server pushes a notification to the client immediately after the change is made on the web side. But push alone has a delivery-rate problem, for example when the client's host has crashed or is restarting, or when the network is unreachable at that moment. So the client also supports timed pull: it periodically contacts the server on its own initiative to fetch the monitoring items that should currently be in effect for it.
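The combination described above can be sketched as follows. This is a minimal in-process model, not the production system: `ConfigServer`, `Client`, the version counter, and the `online` flag are all assumptions introduced for illustration. The push is best effort, and the timed pull is what repairs a missed push.

```python
class ConfigServer:
    """Holds the monitoring items clients should run, with a version counter."""

    def __init__(self):
        self.version = 0
        self.items = {}
        self.clients = []

    def register(self, client):
        self.clients.append(client)

    def update(self, items):
        """A change made on the web side: bump the version and push it out."""
        self.version += 1
        self.items = dict(items)
        for client in self.clients:
            if client.online:               # push is best effort; offline clients miss it
                client.apply(self.version, self.items)

    def fetch(self):
        return self.version, dict(self.items)


class Client:
    def __init__(self, server):
        self.server = server
        self.online = True
        self.version = 0
        self.items = {}
        server.register(self)

    def apply(self, version, items):
        # Ignore stale or duplicate deliveries, so push and pull can overlap safely.
        if version > self.version:
            self.version, self.items = version, dict(items)

    def timed_pull(self):
        # Meant to be called periodically, for example from a timer.
        if self.online:
            self.apply(*self.server.fetch())


server = ConfigServer()
a, b = Client(server), Client(server)
b.online = False                            # b's host is down when the change happens
server.update({"cpu_percent_max": 95})      # a receives the push, b does not
missed = dict(b.items)                      # still empty: the push never arrived
b.online = True
b.timed_pull()                              # the periodic pull catches b up
```

Because `apply` only accepts a newer version, a client that receives both the push and a later pull ends up in the same state as one that received only the pull.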


Hash

Why bring up hash all of a sudden? Let me first introduce the concept. If you read the definition and still don't understand it, go face the wall: you clearly did not study your data structures course well.

I bring up hash because it relates to the high-availability architecture introduced later.

Don't search Baidu for the transliterated word directly; the results you get are about huskies.

Use the keyword "hash" instead.

Hash is usually translated by meaning (as "scatter") and sometimes simply transliterated. A hash function takes an input of arbitrary length (also called the pre-image) and, through a hash algorithm, transforms it into a fixed-length output, the hash value. The transformation is a compressing map: the space of hash values is usually much smaller than the input space, so different inputs may hash to the same output, and the input cannot be uniquely determined from the hash value. Put simply, it is a function that compresses a message of any length into a fixed-length message digest.

Hashing is a very basic but very widely used technique in algorithms, especially with large data volumes.

I stress hash here because one of its functions is to scatter: to spread the input across several places. Any mention of hash has to include consistent hashing, whose great advantage is protecting the cache hit ratio when nodes are added or removed. It is often used in memory caches, CDNs, and other storage systems.

In essence, a hash distributes inputs across different output channels according to some calculation rule.
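To make the scattering idea and the consistent-hash remark concrete, here is a small sketch. Everything in it is an illustrative assumption: the helper names, the use of SHA-256 as the hash, and the choice of 100 virtual points per node. It compares plain "hash modulo node count" with a consistent-hash ring when a fourth node is added.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Map a string to a fixed-size integer (first 8 bytes of its SHA-256 digest)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


def modulo_bucket(key: str, nodes: list) -> str:
    """Plain hash partitioning: hash the key, then take it modulo the node count."""
    return nodes[stable_hash(key) % len(nodes)]


class HashRing:
    """Consistent hashing: nodes and keys share one ring; a key goes to the next node point."""

    def __init__(self, nodes, replicas=100):
        # Each node is placed on the ring at `replicas` pseudo-random points.
        self.ring = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self.points = [point for point, _ in self.ring]

    def lookup(self, key: str) -> str:
        # First node point after the key's hash, wrapping around at the end.
        index = bisect.bisect(self.points, stable_hash(key)) % len(self.ring)
        return self.ring[index][1]


keys = [f"host-{i}" for i in range(2000)]
old_nodes = ["n1", "n2", "n3"]
new_nodes = old_nodes + ["n4"]

# How many keys change owner when n4 joins, under each scheme?
moved_modulo = sum(modulo_bucket(k, old_nodes) != modulo_bucket(k, new_nodes) for k in keys)
old_ring, new_ring = HashRing(old_nodes), HashRing(new_nodes)
moved_ring = sum(old_ring.lookup(k) != new_ring.lookup(k) for k in keys)
```

With the modulo scheme most keys change owner when the node count changes, so a cache in front of those nodes goes cold. With the ring, the only keys that move are the ones the new node takes over, which is why the hit ratio survives a resize much better.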


Stateless and stateful

Let's use a stateless protocol to get a feel for what stateless means.

The state of a protocol is its ability for the next transmission to "remember" information from this one. HTTP is the typical example: it does not keep the information transmitted over one connection for the next connection.

Because a Web server is accessed concurrently by many browsers, HTTP was designed to improve the server's capacity for concurrent access: the Web server sends HTTP response messages and documents without saving any state information about the browser process that made the request. A browser may well request the same object twice within a few seconds; the server does not refuse the second request on the grounds that it has already sent a reply, but simply sends the object again. Because the Web server keeps no information about the browser process that sent the request, HTTP is a stateless protocol.


Hash and state in the monitoring system

In data processing, the monitoring system mainly filters out abnormal data and raises alarms. For example, a server whose CPU utilization exceeds 95% needs an alert. But suppose the server hosting the data processing module goes down at that moment: that abnormal data point is likely to be lost.

A common alarm condition in a monitoring system is: CPU utilization above 95% counts as one exception, and 3 exceptions within 5 minutes trigger an alarm to ops.

There are two numbers to track here: 5 minutes and 3 times. The outage mentioned above can lose an abnormal data point. If there were 3 exceptions in 5 minutes and one was lost, no alarm would be raised. This is a stateful scenario.

When there is state, automatic failover or load balancing has to carry the state along.

The session problem is the typical case. If the web tier is load-balanced across multiple hosts, storing sessions locally causes trouble, because load-balanced scheduling may send a user's successive requests to different hosts. HTTP itself is stateless and fits load-balanced scheduling, but the session is a stateful construct, so it must be placed in shared storage.

Now go back to the architecture diagram mentioned earlier. Data flows into the data calculation and alarm module. How do we make this module highly available?

The answer is to hash the incoming monitoring data across different instances of the data calculation and alarm module, and preferably to keep the computation stateless or only weakly stateful.
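The rule above (3 exceptions within 5 minutes) can be written as a small sliding-window check, which also makes the state visible. This is a sketch under assumptions: the class name `ThresholdAlarm`, timestamps in seconds, and the sample values are all made up for illustration.

```python
from collections import deque


class ThresholdAlarm:
    """Fire when `min_hits` samples exceed `threshold` within `window` seconds."""

    def __init__(self, threshold=95.0, min_hits=3, window=300):
        self.threshold = threshold
        self.min_hits = min_hits
        self.window = window
        self.hits = deque()   # timestamps of recent abnormal samples: this is the state

    def observe(self, timestamp, value):
        """Feed one sample; return True if the alarm condition holds."""
        if value > self.threshold:
            self.hits.append(timestamp)
        # Drop abnormal samples that have fallen out of the window.
        while self.hits and timestamp - self.hits[0] >= self.window:
            self.hits.popleft()
        return len(self.hits) >= self.min_hits


samples = [(0, 97), (60, 98), (120, 99)]    # (seconds, CPU %), all above 95%

alarm = ThresholdAlarm()
fired = [alarm.observe(t, v) for t, v in samples]

# Same data, but the sample at t=60 is lost in an outage: the alarm never fires.
lossy = ThresholdAlarm()
fired_after_loss = [lossy.observe(t, v) for t, v in samples if t != 60]
```

The `hits` queue is exactly the state the text is talking about: if it lives only in the memory of one process, it is lost with that process.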
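A minimal sketch of this routing idea follows. The names `AlarmWorker` and `pick_worker`, the use of CRC32 as the hash, and the host names are assumptions for illustration, not the article's implementation. The point is that a stable hash of the host name sends all data for one host to the same instance, so whatever per-host state exists stays in one place.

```python
import zlib
from collections import defaultdict


class AlarmWorker:
    """One instance of the data calculation and alarm module (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.abnormal = defaultdict(int)   # host -> abnormal samples seen so far

    def handle(self, host, cpu):
        if cpu > 95:
            self.abnormal[host] += 1
        return self.abnormal[host]


def pick_worker(host, workers):
    """Route by a stable hash of the host name so one host always hits one worker."""
    return workers[zlib.crc32(host.encode("utf-8")) % len(workers)]


workers = [AlarmWorker(f"worker-{i}") for i in range(3)]
stream = [("web-01", 97), ("db-01", 50), ("web-01", 98), ("web-01", 99)]

# Per-host abnormal counts as seen by whichever worker owns that host.
counts = [pick_worker(host, workers).handle(host, cpu) for host, cpu in stream]
```

CRC32 is used here rather than Python's built-in `hash()` because string hashes from `hash()` are randomized per process by default, so two router processes could disagree about where a host belongs. Routing alone does not protect the counters when a worker dies, which is why the text recommends keeping the computation stateless or weakly stateful.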


To be continued. You are welcome to join the operations development discussion group, group number 365534424.




This article is from the "Reboot operation and Maintenance development" blog. Reprinting is declined.
