Exploring Monitoring Systems (II)

Last Update:2015-07-24 Source: Internet

Author: User

Tags snmp

You are welcome to join the DevOps development discussion group, group number 365534424.

Definition of scalability

Scalability is a design metric for a software system's processing capacity. High scalability implies elasticity: as the system expands and grows, the software stays healthy, and with little change, or even just by adding hardware, the processing capacity of the whole system can grow linearly while achieving high throughput and low latency.

Scalability and pure performance tuning are fundamentally different. Scalability is a balance of high performance, low cost, maintainability, and many other factors. It aims for smooth, linear performance growth and focuses on scaling the system horizontally, using low-cost servers for distributed computing, whereas ordinary performance optimization only improves the performance metrics of a single machine. What the two have in common is that both choose between throughput and latency according to the application's characteristics. And of course, a horizontally scaled, partitioned system is constrained by the CAP theorem.

The tension between scalability and over-design

Specifically, let us discuss the scalability of a monitoring system. By this we mean that the system can scale with the number of monitored objects without changing its architecture. With 1,000 servers it uses this architecture; with 10,000 servers it still uses this architecture; ideally, at 100,000 you only need to add servers, and the architecture stays the same. It sounds great, and it is every system architect's dream. But when reality meets the ideal, the ideal turns out to be hard to reach. First, for a designer working at a company with only a few hundred servers, it is hard to imagine that the system might one day run on tens of thousands of servers. Second, one principle of architectural design is to avoid over-design as much as possible. Suppose an operations developer at a company with a few hundred servers told the boss that he could build a system supporting tens of thousands of servers, but that it would take more time to architect and refactor. I think the boss would conclude that his DevOps developer had gone insane.

Still, software design advocates good architecture and good scalability. Otherwise the value of architecture design is greatly discounted, and the cost of code reuse and system implementation grows linearly as the scale expands. A good architecture can be iterated on, and it in turn guides lower-level system design, but the lower level cannot predict the higher level. This is one of the purposes of architecture. So later in this article I will share some of my experience with monitoring system architecture.

A competent designer, standing at a scale of a few hundred servers, can take the situation at a few thousand into account. Thinking about tens of thousands would be overthinking. This is not to say that an architecture supporting tens of thousands of servers is bad, only that if he cannot know what that scale demands, there is no need to think that far ahead. If we could know in advance, then, as Zhen Huan would say, that would of course be excellent.

But clearly the number of things to consider keeps growing. With a few dozen servers you may only need to consider one machine room; with a few hundred there will be two or three; with a few thousand you may still be within 10 IDCs; but with tens of thousands you are likely to have more than 15 IDCs. Geographical distribution also becomes wider, so carrier coverage, network complexity, and business complexity will be completely different. It is unrealistic to force a DevOps developer to imagine all of this from nothing. He has not experienced the real demand scenarios, so he cannot anticipate such a variety of problems. This is why I fully understand the attitude of some big companies toward open source projects. In the mild case they modify a project before using it. Going further, they may start a separate branch and make their changes there. The more they change, the more the branch diverges from the trunk, until they have in effect built their own wheel. Sometimes it is not that they want to reinvent the wheel; the open source wheels simply do not fit.

Scalability of monitoring

For a monitoring system specifically, what does scalability apply to? Let's start at the beginning. The input of a monitoring system is the monitoring data collected from the monitored objects. This data passes through a series of processing steps and is finally stored for post-mortem and offline analysis, while the main function along the way is real-time alerting. The whole process can be seen as stream computation. When it comes to stream computing, Storm naturally comes to mind. This is another idea I have had: move all of the processing onto Storm, and so on and so forth... but that is getting off topic. Looking closely, though, Storm is a stream computing platform and it is distributed, and one feature of a distributed architecture is good scalability. As the number of servers grows, the scalability required of the intermediate data processing layer is scalability of computing capacity. Simply put, when there is more data, it should be enough to add servers or upgrade them.
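The flow described above (receive, process, store, alert) can be sketched as a small streaming pipeline. This is only an illustration in plain Python, not Storm code: the `Sample` record, the `host metric value` line format, and the threshold of 90 are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass
class Sample:
    # Hypothetical record layout, not taken from the article.
    host: str
    metric: str
    value: float


def parse(lines: Iterable[str]) -> Iterator[Sample]:
    """Turn raw 'host metric value' lines into Sample records, one at a time."""
    for line in lines:
        host, metric, value = line.split()
        yield Sample(host, metric, float(value))


def check(samples: Iterable[Sample], threshold: float,
          alerts: List[str]) -> Iterator[Sample]:
    """Record an alert for each sample above the threshold, then pass it on."""
    for s in samples:
        if s.value > threshold:
            alerts.append(f"ALERT {s.host} {s.metric}={s.value}")
        yield s


def run_pipeline(lines: Iterable[str],
                 threshold: float = 90.0) -> Tuple[List[Sample], List[str]]:
    """Chain the stages; 'stored' stands in for the persistent store."""
    alerts: List[str] = []
    stored = list(check(parse(lines), threshold, alerts))
    return stored, alerts


raw = ["web01 cpu 42.0", "web02 cpu 97.5", "db01 cpu 12.0"]
stored, alerts = run_pipeline(raw)
```

Each stage handles one record at a time and keeps no memory of earlier records, which is the property that would let a distributed platform run many copies of a stage in parallel.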

There are several boundaries where scalability must be supported. The first is the entrance, also called the data receiving end. When data flows in from outside, the first thing to consider for good scalability is the receiving stage. Data can enter the monitoring system over TCP, UDP, SNMP, HTTP, and other protocols. At a scale of tens of thousands of servers, this is where the engineering is put to the test. Using SNMP or HTTP is of course possible, but both are application-layer protocols and inevitably bring extra overhead. Take HTTP as an example: if we use a server such as Nginx or Apache, scalability comes naturally. Once the data is received, it is written to a single store (whether that store is a cache or permanent storage). This step is stateless, so it scales naturally. If one Nginx instance cannot carry the load, add one more, then another, then 10 more. This solves the scalability problem at the entrance.
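Here is a minimal sketch of why a stateless receiving stage scales by adding instances. The `Receiver` class and the round-robin `dispatch` function are hypothetical stand-ins for web server instances behind a load balancer. A plain Python list plays the role of the shared store.

```python
from itertools import cycle
from typing import Iterable, List, Tuple


class Receiver:
    """Hypothetical stateless receiver: it only forwards data to a shared store."""

    def __init__(self, name: str, store: List[Tuple[str, str]]):
        self.name = name
        self.store = store  # shared store; nothing is kept between requests

    def handle(self, payload: str) -> None:
        self.store.append((self.name, payload))


def dispatch(payloads: Iterable[str], receivers: List[Receiver]) -> None:
    """Hand payloads to receivers in round-robin order, as a load balancer might."""
    for payload, receiver in zip(payloads, cycle(receivers)):
        receiver.handle(payload)


payloads = [f"sample-{i}" for i in range(6)]

# One receiver handles everything.
store_one: List[Tuple[str, str]] = []
dispatch(payloads, [Receiver("r1", store_one)])

# Three receivers share the same work; the stored data is the same.
store_three: List[Tuple[str, str]] = []
dispatch(payloads, [Receiver(f"r{i}", store_three) for i in (1, 2, 3)])
```

Because no receiver remembers anything between requests, any instance can take any request, so adding instances changes only who does the work and not what ends up in the store.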

The other part that needs to scale is the storage stage, which mainly means persistent storage of monitoring data. As noted above, data reception and computation can be made to scale by the means described. Storage will then inevitably become the bottleneck. This is true of many systems: the front end scales through web servers, but in the end everything reads from and writes to one database. Even with read-write splitting there is still a single master database, and the master is under enormous pressure.

Here I recommend using some kind of distributed storage to solve the problem. I do not particularly recommend something as peculiar as MongoDB, though, because its write capability is not very good. Later versions made some improvements that alleviate the problem, but note that they only alleviate it.
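One common way distributed storage spreads write pressure is to partition data by key, so that no single master receives every write. The sketch below is an assumption-laden illustration and does not describe any particular database: `ShardedStore` and `shard_for` are hypothetical names, each shard is an in-memory dictionary, and the host name is used as the partition key.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (the same on every run)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Hypothetical store that spreads writes across several independent shards."""

    def __init__(self, num_shards: int):
        self.num_shards = num_shards
        self.shards: List[Dict[str, List[float]]] = [
            defaultdict(list) for _ in range(num_shards)
        ]

    def write(self, host: str, value: float) -> None:
        # Each write touches exactly one shard, chosen by the host key.
        self.shards[shard_for(host, self.num_shards)][host].append(value)

    def read(self, host: str) -> List[float]:
        return list(self.shards[shard_for(host, self.num_shards)].get(host, []))


store = ShardedStore(num_shards=4)
for i in range(100):
    store.write(f"host-{i:03d}", float(i))
store.write("host-007", 99.0)
```

With simple modulo hashing, changing the number of shards remaps most keys; consistent hashing is a common refinement when shards must be added without moving most of the data.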

In summary, for scalability, our idea is: distributed, stateless.

To be continued

This article is from the "Reboot Operation Development" blog. Please contact the author before reproducing it.

This article is an English version of an article originally published in Chinese on aliyun.com.
