Operation and Maintenance tool system diagram

Source: Internet
Author: User
Tags: logstash

[Figure: Operation and Maintenance tool system diagram]


Operations process management tools

1. Release and change process management tool

This tool serves as the system interface for working with other roles and provides approval steps to control the risk of release changes. Process management tools do not execute specific business operations; they act only as a ticketing system that tracks each process and ensures it is closed out.

2. Alarm and incident management tools

Alarms that indicate damage to the business automatically create tickets, which are escalated to incident tickets after manual confirmation. Ticketing alarms and incidents ensures that each process is closed out and that lessons are drawn from every failure, and it provides KPIs for measuring business availability.
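
The ticket flow above can be sketched as a small state machine. This is only an illustration: the `AlarmTicket` class, its field names, and the rule that an escalated ticket needs a summary before it can be closed are assumptions made for the example, not part of any tool named in this article.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    OPEN = "open"
    INCIDENT = "incident"
    CLOSED = "closed"


@dataclass
class AlarmTicket:
    """Hypothetical ticket record; all names here are illustrative."""
    alarm: str
    status: Status = Status.OPEN
    confirmed_by: Optional[str] = None
    summary: Optional[str] = None

    def escalate(self, operator: str) -> None:
        # Escalation needs a named person, mirroring manual confirmation.
        if self.status is not Status.OPEN:
            raise ValueError("only open tickets can be escalated")
        self.confirmed_by = operator
        self.status = Status.INCIDENT

    def close(self, summary: str) -> None:
        # Closing the loop: an escalated ticket cannot be closed
        # without a written summary of what was learned.
        if self.status is Status.INCIDENT and not summary.strip():
            raise ValueError("escalated tickets need a summary")
        self.summary = summary
        self.status = Status.CLOSED


ticket = AlarmTicket("checkout error rate high")
ticket.escalate("alice")
ticket.close("root cause: bad configuration push")
print(ticket.status)  # Status.CLOSED
```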


Operations release and change tools

1. Version management tool (database)

Every release should start with version management. Version packages produced by development are first checked in to the version management tool and then distributed from it to the production environment. This eliminates the practice of publishing by running rsync from one server to another.

2. Configuration management tools (database)

Version plus configuration equals the state of each machine in the production environment. The coarsest-grained configuration management works at the IP level, which amounts to asset management for machines, grouping them by business concepts such as business, module, and region. Finer-grained configuration management tracks individual processes and the configuration associated with each process.

3. Configuration and release tools

These tools deploy the specified version, combined with its configuration, to the machines in the production environment. The approaches to distributing versions and configuration differ considerably: ssh/Fabric represents the script-centric approach, while Puppet/Chef represents the configuration-centric approach.

4. Production state synchronization tool

To prevent the production state from drifting away from the records in the management tools, a tool is needed to report the actual state of the production environment.
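
Detecting drift comes down to comparing what the management tools have recorded with what the machines report. The sketch below assumes both sides can be expressed as a mapping from host to key/value settings; the function name `find_drift` and the sample hosts and keys are made up for illustration.

```python
def find_drift(recorded, actual):
    """Return, per host, the keys whose recorded and reported values differ.

    Both arguments map host -> {setting: value}. A setting missing on one
    side shows up as None on that side.
    """
    drift = {}
    for host in sorted(recorded.keys() | actual.keys()):
        want = recorded.get(host, {})
        have = actual.get(host, {})
        diff = {
            key: (want.get(key), have.get(key))
            for key in sorted(want.keys() | have.keys())
            if want.get(key) != have.get(key)
        }
        if diff:
            drift[host] = diff
    return drift


recorded = {
    "10.0.0.1": {"version": "1.4.2", "max_conn": "512"},
    "10.0.0.2": {"version": "1.4.2"},
}
actual = {
    "10.0.0.1": {"version": "1.4.2", "max_conn": "1024"},
    "10.0.0.2": {"version": "1.4.2"},
}
print(find_drift(recorded, actual))
# {'10.0.0.1': {'max_conn': ('512', '1024')}}
```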

5. Service scheduling tool

Release changes often require a serial process: module A first, then module B. With many machines, operations that can run concurrently should do so, and operations that cannot must be guaranteed to run serially. Many release change processes also require operations beyond the managed servers, such as changing DNS records in the cloud. A service scheduling tool is therefore needed to orchestrate the configuration and release tools, the process ticketing tools, and the APIs of other systems into a single process.
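
A minimal sketch of that scheduling rule, assuming each stage is a named list of hosts and the per-host operation is an ordinary Python callable: stages run strictly one after another, while the hosts inside a stage run concurrently. The names `run_stage`, `run_pipeline`, and `deploy` are hypothetical, and a real scheduler would also call ticketing and DNS APIs at the appropriate steps.

```python
from concurrent.futures import ThreadPoolExecutor


def run_stage(action, hosts, max_workers=8):
    """Run one action on every host of a stage concurrently.

    Results come back in host order. If any host fails, its exception
    is re-raised here once the results are collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(action, hosts))


def run_pipeline(stages, action):
    """Run stages serially; a failing stage stops all later stages."""
    log = []
    for name, hosts in stages:
        log.append((name, run_stage(action, hosts)))
    return log


def deploy(host):
    # Placeholder for the real per-host release step.
    return host + ":ok"


stages = [("module A", ["a1", "a2"]), ("module B", ["b1"])]
print(run_pipeline(stages, deploy))
# [('module A', ['a1:ok', 'a2:ok']), ('module B', ['b1:ok'])]
```

Letting the exception propagate is a deliberate choice in this sketch: module B is never touched when module A fails, which is the point of the serial ordering.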

6. Resource management and isolation tools

Tools such as Xen/KVM let operations partition resources more flexibly, for example by quickly starting and stopping virtual machines or moving IP addresses within the IDC. Tools such as LXC/Docker let operations partition resources further, down to the process level. Finer-grained resource control and isolation provides better resource utilization and makes it easier to deploy and scale resources.

7. Unified interface for release changes

Wrap all the underlying tools behind a simple interface for carrying out standardized release change operations.



Operations monitoring and alarm tools

1. Acquisition tools

These usually read log files, but they can also poll a database or another system's interface on a schedule. A popular open source solution is Logstash.

2. Collection tools

The acquisition tools report to the collection tool. Alternatively, developers modify the code directly so that it reports metrics to the collection tool. A popular open source solution for this step is again Logstash.

3. Statistics and storage tools

A metric may be reported once per call, in which case the statistics tool counts the number of calls per minute. A value may also be reported every 5 seconds, in which case the statistics tool computes the maximum value within each minute. Statistics tools exist to make reporting simpler. A popular open source solution is StatsD, and some large companies have built custom solutions on top of Storm.
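
Both cases above reduce to the same operation: group raw reports into one-minute buckets, then apply a reducer to each bucket. The sketch below is not how StatsD or Storm implement it; the function name `aggregate_per_minute` and the `(unix_timestamp, value)` sample format are assumptions chosen for the example.

```python
from collections import defaultdict


def aggregate_per_minute(samples, reducer):
    """Group (unix_ts, value) samples into one-minute buckets and reduce each.

    The result maps the bucket start time (seconds) to the reduced value.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[int(ts) // 60 * 60].append(value)
    return {start: reducer(values) for start, values in sorted(buckets.items())}


# One report per call: summing the 1s gives the number of calls per minute.
calls = [(0, 1), (10, 1), (59, 1), (60, 1)]
print(aggregate_per_minute(calls, sum))  # {0: 3, 60: 1}

# One reading every few seconds: keep the maximum within each minute.
readings = [(0, 0.2), (5, 0.9), (10, 0.4), (65, 0.3)]
print(aggregate_per_minute(readings, max))  # {0: 0.9, 60: 0.3}
```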

4. Time series database

All time series metrics are stored in this database. A database used for monitoring and alarming must support very large volumes of data, but it has no strict ACID requirements.

5. Operations events database

All alarms are recorded here, including alarms received from other systems, along with every change made to the production environment. This data supports locating the cause of an alarm.

6. Metric anomaly detection tool

Using a mathematical model, this tool detects whether a metric has deviated from its stable past pattern and infers from that a change in the state of the production environment.
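
The article does not say which mathematical model is used. As one simple possibility, the sketch below flags a value that lies more than a chosen number of standard deviations from the mean of recent history (a z-score test). The function name, the threshold of 3.0, and the sample data are all assumptions; production systems typically also account for seasonality and trend.

```python
from statistics import mean, stdev


def is_anomalous(history, value, threshold=3.0):
    """Return True if value deviates strongly from the recent history.

    Uses a z-score: distance from the mean of history, measured in
    sample standard deviations.
    """
    if len(history) < 2:
        return False  # not enough data to estimate the spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold


history = [100, 102, 98, 101, 99]
print(is_anomalous(history, 101))  # False
print(is_anomalous(history, 150))  # True
```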

7. Probing tools

These tools periodically issue a PING or HTTP GET to simulate real users, detect that a service is interrupted, and generate alarms. They also produce metrics that are reported to the collection system. Probing is divided into local and remote. Local probing can detect machine-local problems such as a read-only disk. Remote probing can simulate the geographic distribution of users, so network link conditions are also covered by the test.
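
A single HTTP GET check can be sketched with the standard library alone. The function below returns both a pass/fail result (for alarming) and the latency (for reporting as a metric). The name `http_probe`, the 3-second timeout, and the injectable `fetch` parameter are choices made for this example; a real prober would run this on a schedule and from several locations.

```python
import time
from urllib.request import urlopen


def http_probe(url, timeout=3.0, fetch=urlopen):
    """Issue one HTTP GET and return (ok, latency_seconds).

    `fetch` defaults to urllib's urlopen and can be swapped out in tests.
    """
    start = time.monotonic()
    try:
        with fetch(url, timeout=timeout) as resp:
            ok = 200 <= resp.status < 400
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses,
        # so refused connections, DNS failures and 4xx/5xx replies land here.
        ok = False
    return ok, time.monotonic() - start
```

A scheduler would call `http_probe` at a fixed interval, raise an alarm after some number of consecutive failures, and send the latency to the collection system.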

8. Alarm convergence tool

This tool combines alarms from all sources, converges them by frequency, and performs root cause analysis. The result is consolidated into a report that prompts manual repair.
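
Frequency convergence, the simplest of those steps, can be sketched as collapsing repeats of the same alarm within a time window into one entry with a count. The tuple layout `(unix_ts, source, name)`, the 300-second window, and the function name `converge` are assumptions for illustration; root cause analysis is not covered by this sketch.

```python
def converge(alarms, window=300):
    """Collapse repeats of the same (source, name) alarm within `window` seconds.

    `alarms` is an iterable of (unix_ts, source, name) tuples. Returns a
    list of dicts holding the first-seen time and the repeat count.
    """
    groups = []
    latest = {}  # (source, name) -> index in groups of the most recent group
    for ts, source, name in sorted(alarms):
        key = (source, name)
        idx = latest.get(key)
        if idx is not None and ts - groups[idx]["first_seen"] < window:
            groups[idx]["count"] += 1
        else:
            latest[key] = len(groups)
            groups.append(
                {"source": source, "name": name, "first_seen": ts, "count": 1}
            )
    return groups


alarms = [
    (0, "db1", "disk_full"),
    (60, "db1", "disk_full"),
    (120, "web1", "http_5xx"),
    (400, "db1", "disk_full"),
]
for group in converge(alarms):
    print(group)
# {'source': 'db1', 'name': 'disk_full', 'first_seen': 0, 'count': 2}
# {'source': 'web1', 'name': 'http_5xx', 'first_seen': 120, 'count': 1}
# {'source': 'db1', 'name': 'disk_full', 'first_seen': 400, 'count': 1}
```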

9. Automatic alarm repair tool

This tool receives alarms and handles them automatically. It helps operations carry out routine recovery actions, such as switching to a standby database after a rack failure. Where the business itself has not been built for high availability, it can replace faulty machines, move IP addresses, and perform other repairs in the production environment, improving business availability to some extent.

10. Alarm notification tool

Important alarms need to be escalated to a phone call. This requires highly available notification interfaces for phone, SMS, and so on.

11. Unified monitoring and alarm interface

This interface hides the various lower-layer tools and provides unified agent installation, metric collection settings, metric curve display, and alarm queries: one place to see every problem in the production environment.


This article is from the "A few" blog. Reprinting is declined.
