Impressions from a three-year integrated operations and maintenance monitoring project


In 2011, a customer approached us and wanted to work with us to build an effective operations monitoring platform. At our first meeting, the customer's ideas and mine turned out to coincide. As a technical consultant of many years, I have long held that the more fine-grained an operations and maintenance monitoring platform is, the more time it takes to digest, and that this digestion is needed not only by the customer but also by the team that has to face the new monitoring environment. So in 2011, after detailed discussions with the customer, we planned the whole project over three years, with each year's work scheduled around February. The expected results and the customer's requirements were also very clear:

1. Monitor all infrastructure devices in the data center

Covers all Windows/Unix systems, SQL Server/Oracle databases, Exchange/IIS/Citrix applications, and various network devices

2. Monitor the production business systems from a business-led perspective

Involves SAP R/3 and EBS

3. Establish an effective mapping between the business monitoring layer and the infrastructure layer

4. Integrate with the mail and SMS alarm platforms and the display screen

5. Emphasize availability, while also highlighting business performance

6. Involve only a small amount of secondary development (customization) and little maintenance work

It is plain to see that what the customer wanted was simple: they did not want to spend too many people on maintaining the system, hence as little secondary development as possible.

The emphasis on availability, together with the highlighting of business performance, also shows what the customer wanted: to use the operations and maintenance monitoring system to minimize MTTR (mean time to repair), thereby safeguarding production uptime and minimizing unnecessary economic losses.

Next came product selection and the question of how to complete the work in three phases. At the time the customer had only some knowledge of operations and maintenance, much of it learned from their overseas headquarters. They had seen the real economic benefit that the headquarters' mature monitoring practice brought to the enterprise, and so they set out to do the same. To keep things consistent across the enterprise, we talked in detail with the headquarters engineers and learned that headquarters had adopted the HP OpenView solution and had been running it for six years. By now it has basically achieved intelligent, automated operations. (Sometimes I really admire the patience of our foreign colleagues; some things simply take time before they change fundamentally.) The engineer I spoke with maintains, single-handedly, a data center with several hundred servers. So I joked with the customer that we had not planned for an intelligent, automated operations monitoring system, and the customer replied that there was no need to make it intelligent; his colleagues would supply all the intelligence. Heh.

Therefore, the HP OpenView solution was adopted. HP's operations and maintenance monitoring platform is one I have always been fond of and have focused on the most: it has broad coverage, is simple to operate, is highly customizable, and has good local support. It is, after all, a brand-name product.

Because this was the customer's first system of this kind, the goal of phase one was to integrate it into the customer's own workflow as quickly as possible and to show its value in the shortest possible time. Once the customer was able to use it, the goal of phase two was to scale out phase one, extending its effect to the entire data center and the other sub-data centers. Phase three then developed vertically: business-layer monitoring was mapped effectively onto infrastructure-layer monitoring, completing the journey to a business-centric integrated operations and maintenance monitoring platform.

The architecture of the plan is as follows:

Phase one was therefore kept very simple, using HP OM and NNMi. Because the monitoring is agent-based, only a few servers and a few dozen network devices were monitored, and the integration with the mail and SMS platforms was completed. I really appreciated the customer's patience: within a short time their technical staff became familiar with the main functions of the monitoring platform, learned how to maintain it, and carried out their daily work according to the workflow we had planned. In the first week after the project went live, it helped resolve two early-morning outages of production network devices: the alarms arrived immediately, the technical staff dealt with the failures right away, and the production line was not affected.

In 2012 we moved smoothly into phase two. Along with the expansion, we added application monitoring through the corresponding HP OM SPIs, which basically covered all devices in the data center and the sub-centers. The monitoring approach continued from phase one: availability first, supplemented by performance.

Following our advice, the customer's operations monitoring platform does not track particularly detailed indicators such as database cache hit rate, application error pages, or excessive lock requests. It tracks only data that directly reflects availability, such as status and number of connections. This approach keeps technical staff from being confused by complex monitoring metrics and from being jolted awake at midnight by every performance alarm. This is not to say that performance monitoring is unimportant, but in day-to-day infrastructure monitoring, too much performance monitoring tends to increase alert fatigue, which in turn undermines the all-important availability monitoring, especially when the monitored objects are very numerous. Of course, individual engineers did raise their own performance requirements, and for the equipment they were responsible for we refined and extended the performance monitoring individually.

In phase two we added a trial-run stage to the project schedule for tuning and optimizing the monitoring. Early in the trial run we ran into a large number of alarm storms; the technicians complained that the flood of text messages nearly crashed their phones. Heh. This was, of course, our oversight: we had forgotten to turn on message suppression. For monitoring at large scale, simple and effective monitoring policies, together with good message-suppression policies, are important.
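The kind of message suppression discussed above can be sketched in a few lines of Python. This is only an illustration of the idea, not how HP OM implements it: the class `AlarmSuppressor`, the 300-second window, and the node and event names are all assumptions made for the example. The first alarm for a given node and event is forwarded, and repeats inside the window are counted but not sent.

```python
from dataclasses import dataclass, field


@dataclass
class AlarmSuppressor:
    """Forward the first alarm per (node, event), then stay quiet for a window.

    Hypothetical helper for illustration; not part of any HP product API.
    """

    window_seconds: float = 300.0
    _last_sent: dict = field(default_factory=dict)
    suppressed_count: int = 0

    def should_notify(self, node: str, event: str, timestamp: float) -> bool:
        key = (node, event)
        last = self._last_sent.get(key)
        # A repeat inside the window is suppressed and only counted.
        if last is not None and timestamp - last < self.window_seconds:
            self.suppressed_count += 1
            return False
        self._last_sent[key] = timestamp
        return True


def simulate_storm():
    """A made-up flapping switch raises the same event every 10 s for 10 minutes."""
    suppressor = AlarmSuppressor(window_seconds=300.0)
    sent = [
        t for t in range(0, 600, 10)
        if suppressor.should_notify("switch-01", "link_down", float(t))
    ]
    return sent, suppressor.suppressed_count


sent_times, suppressed = simulate_storm()
print(sent_times, suppressed)  # [0, 300] 58
```

In this simulated storm, 60 raw events turn into 2 text messages. A real policy would probably also want to escalate when the suppressed count keeps growing, so that a persistent fault is not silenced.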

Then in 2013 the customer gave notice to start phase three, which they had been looking forward to. Over the previous two years they had integrated infrastructure monitoring into their daily work, and it had served well in safeguarding the sustained high performance of the production environment. But the remaining problems also stood out: monitoring cannot reach 100% coverage, and because infrastructure monitoring focuses on availability, monitoring from a pure business perspective was the blind spot. However complex infrastructure monitoring is, its goal is to ensure that the business it supports runs well. The purpose of business-layer monitoring is to detect the availability and performance of the business proactively, directly from the business perspective.

The principle is simple: HP LoadRunner can record and develop scripts for business transactions over many protocols. With such a set of scripts, HP BPM can proactively probe the availability and performance of these services unattended, and the results can then be analyzed.
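The active probing idea can be illustrated with a minimal Python sketch. It is not LoadRunner or HP BPM code; `run_probe`, `ProbeResult`, the 3-second threshold, and the transaction names are all hypothetical. A scripted business transaction is represented by a plain callable, and the probe records whether it succeeded and how long it took.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProbeResult:
    name: str
    available: bool
    response_seconds: float
    status: str  # "ok", "slow", or "failed"


def run_probe(
    name: str,
    transaction: Callable[[], None],
    slow_threshold_seconds: float = 3.0,
    clock: Callable[[], float] = time.perf_counter,
) -> ProbeResult:
    """Run one scripted transaction and classify its availability and speed."""
    start = clock()
    try:
        transaction()  # stands in for a recorded business script
        available = True
    except Exception:
        # Any error in the script counts as the business being unavailable.
        available = False
    elapsed = clock() - start

    if not available:
        status = "failed"
    elif elapsed > slow_threshold_seconds:
        status = "slow"
    else:
        status = "ok"
    return ProbeResult(name, available, elapsed, status)


def broken_login() -> None:
    """Made-up transaction that always fails."""
    raise ConnectionError("login page did not respond")


print(run_probe("create_order", lambda: None).status)  # ok
print(run_probe("login", broken_login).status)         # failed
```

Run from a scheduler at a fixed interval, a probe like this yields an availability and response-time series per business transaction, which is the kind of data the business layer needs. The `clock` parameter is there only so the timing can be replaced in tests.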

This gives the customer a simpler and more intuitive view of the health of their production business systems. Combined with infrastructure-layer monitoring, it can also uncover many major failures.

In phase three, much of the time went into surveying the business, because the later mapping between business and infrastructure depends on this information, and it ultimately determines how effective the monitoring is. Our survey went into great detail; see the example diagram (note: the figure is only a schematic and does not represent any customer's system architecture; please do not hotlink it, thank you):

These detailed business-flow surveys ultimately let us build a very detailed and complete monitoring topology map that links the business layer with the infrastructure layer (not representing any customer environment):


Such a topology map lets the customer see at a glance the overall health of the business and infrastructure layers. It is top-down, 360-degree monitoring.
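As a rough sketch of what such a mapping makes possible, the Python below rolls infrastructure status up to the business layer. The service names, component names, and the three-level severity scale are invented for the example, and treating an unmonitored component as "warning" is an assumption, not something the project prescribes. Each business service simply takes the worst status among the components it depends on.

```python
# Severity order used for the roll-up (an assumption for this sketch).
SEVERITY = {"ok": 0, "warning": 1, "critical": 2}


def business_health(dependencies: dict, component_status: dict) -> dict:
    """Give each business service the worst status among its components."""
    health = {}
    for service, components in dependencies.items():
        # A component with no reported status is treated as "warning",
        # so gaps in coverage show up instead of being hidden.
        statuses = [component_status.get(c, "warning") for c in components]
        health[service] = max(statuses, key=SEVERITY.__getitem__, default="ok")
    return health


# Hypothetical mapping produced by a business-flow survey.
dependencies = {
    "order_entry": ["app-01", "db-01", "switch-01"],
    "reporting": ["app-02", "db-01"],
}
component_status = {
    "app-01": "ok",
    "app-02": "ok",
    "db-01": "warning",
    "switch-01": "critical",
}

print(business_health(dependencies, component_status))
# {'order_entry': 'critical', 'reporting': 'warning'}
```

A worst-status roll-up is the simplest possible rule. Redundant components would need a different one, since one failed node in a cluster should not turn the whole business red.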

Here is another picture:


Above, the HP BSM business monitoring layer automatically generates a tree-like monitoring view based on the collection layer, HP BPM. Green indicates that the business is highly available and performing well.

This interface supports drag-and-drop. When the mouse hovers over a business, a small panel (the red box) pops up automatically and shows the running status of the business and its current response time.

At this point, through three steps over three years, the customer has obtained a fairly comprehensive, business-oriented integrated monitoring platform. Over nearly three years we worked with the customer and learned from each other: we learned how to integrate an operations monitoring system more efficiently into a working environment while safeguarding the data center as fully as possible, and the customer learned how to use such a platform effectively to turn IT value into business value, and then into value for the enterprise.

Although we have done many similar projects, this is one of the few customers willing to try, learn, and integrate with such a practical mindset, so that the operations and maintenance monitoring platform serves the enterprise more effectively. The customer is now preparing to continue the work, and the next step is monitoring automation. In fact, already in phase one, at the customer's request, we embedded small automatic troubleshooting scripts or programs in some monitoring policies, and in most cases they worked well. The customer also understands that automated operations monitoring is not about cutting headcount; it is about freeing technical staff to devote more energy to the places that need them more, in service of the enterprise's IT and even its business.

The following are some other monitoring views, which likewise do not represent any customer:

Business Segmentation Report


HP BSM can provide reference data for business optimization by showing visually how much time each business transaction spends in the different communication stages, without requiring a dedicated transaction diagnosis.

Business code Diagnostics

Drill down to the method layer


And so on.

End

