Build CMDB from Scratch: What is CMDB?

Source: Internet
Author: User
How to build a CMDB
In the early stages of a business system, it may need only one or a few machines and a handful of operations and maintenance (O&M) staff. Day-to-day work only involves checking whether machines are down, whether the network is reachable, and whether services are alive, so the O&M scenarios are relatively simple. As the business develops and its scale keeps growing, O&M scenarios become more and more complex: deployments and changes call for staged releases, monitoring must cover the business along multiple dimensions, and teams must also handle capacity planning, fault mitigation (stop-loss), and diagnosis. When devices and services must be operated at large scale, the core value of a CMDB lies in organizing and managing all kinds of O&M resources efficiently and, by connecting the various O&M platforms, improving O&M efficiency.

CMDB is short for configuration management database. It is often regarded as the foundation for building other ITIL (IT Infrastructure Library) processes, and it takes on several key functions: data integration, synchronization, and visualization. Building an O&M CMDB usually involves the following three steps:

Resource management model abstraction: O&M involves many kinds of resource objects, such as data centers, machines, network devices, and applications. How should these O&M resources be abstracted and modeled so that they support as many O&M scenarios as possible?
Resource data integration and synchronization: O&M resources go through complex processes and life cycles in support of O&M work. For example, a machine goes from budget request to purchase and warehousing, then to racking, service deployment, and business monitoring, and this whole process involves several O&M platforms such as asset management, deployment, and monitoring. How can resource data be integrated across them to improve O&M efficiency while keeping the data consistent and accurate?
Supporting upper-layer business O&M: Building the O&M CMDB is not the end; on the contrary, it is only the beginning. The data delivers real value to the business only when it is oriented toward consumption by concrete scenarios. How should the CMDB support upper-layer business O&M, in areas such as data visualization, automation, intelligent O&M, and data operations, to maximize the value of resource data?
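One common way to approach the integration problem in the second step is to merge what each platform reports into a single configuration item (CI) per machine, and to flag disagreements rather than silently overwrite them. The following is a minimal Python sketch that assumes each platform's records are keyed by machine serial number (SN) and that exactly one platform is authoritative for each field. That ownership policy, the platform names, and the field names are all hypothetical illustrations and are not prescribed by the article.

```python
from typing import Dict, List, Tuple

# Hypothetical policy: exactly one platform is authoritative for each field.
FIELD_OWNER: Dict[str, str] = {
    "model": "asset",
    "rack": "asset",
    "service": "deployment",
    "status": "monitoring",
}


def merge_records(records: Dict[str, List[dict]]) -> Tuple[Dict[str, dict], List[str]]:
    """Merge per-platform records into one configuration item (CI) per serial number.

    `records` maps a platform name to the records it reports; every record has an "sn" key.
    Returns the merged CIs and a list of human-readable conflict descriptions.
    """
    merged: Dict[str, dict] = {}
    # Pass 1: copy each field only from the platform that owns it.
    for platform, items in records.items():
        for rec in items:
            ci = merged.setdefault(rec["sn"], {"sn": rec["sn"]})
            for key, value in rec.items():
                if key != "sn" and FIELD_OWNER.get(key) == platform:
                    ci[key] = value
    # Pass 2: report non-owning platforms whose values disagree with the owner's value.
    conflicts: List[str] = []
    for platform, items in records.items():
        for rec in items:
            ci = merged[rec["sn"]]
            for key, value in rec.items():
                owner = FIELD_OWNER.get(key)
                if key == "sn" or owner == platform or key not in ci:
                    continue
                if ci[key] != value:
                    conflicts.append(f"{rec['sn']}.{key}: {platform}={value!r} vs {owner}={ci[key]!r}")
    return merged, conflicts


records = {
    "asset": [{"sn": "SN001", "model": "M1", "rack": "A-01"}],
    "deployment": [{"sn": "SN001", "service": "web", "rack": "A-02"}],
    "monitoring": [{"sn": "SN001", "status": "up"}],
}
cis, problems = merge_records(records)
```

In this example, the deployment platform's stale rack value is reported as a conflict instead of replacing the asset platform's value. Fields that have no owner in `FIELD_OWNER` are simply ignored. A real CMDB would need an explicit policy for such fields.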
We discuss each step in turn below.

Resource management model
First, O&M objects fall into two categories. One is infrastructure, including data centers, racks, servers, network equipment, and security equipment. The other is the services built on that infrastructure, including applications, middleware, and domain names. The resource management model accordingly has two parts: the abstract modeling of each resource type together with the associations between them, and the organization and management of resources. We explain each below.

1. Resource abstraction model

Take machine O&M as an example.

Machine resources have the following types of information:

Basic attributes, such as hardware information (serial number (SN), manufacturer, model) and maintenance information (production date, warranty period);
Configuration information, such as the installed operating system and the assigned IP address;
Runtime status information, such as the machine's running state and resource utilization.
A business resource likewise has the following types of information:

Basic attributes, such as the service name, service description, owner, and maintainers;
Configuration information, such as the deployed version, deployment path, startup parameters, and open ports;
Runtime status information, such as the service's running state and resource utilization.
This analysis shows that physical infrastructure resources and virtual resources such as services and domain names alike essentially contain the following three types of information:

Basic attributes: describe the resource;
Configuration information: indicates how the resource is used;
Runtime status information: indicates the resource's current state.
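Because machines and services share these three information types, a single generic record can hold either kind of resource. The following is a minimal Python sketch of that idea. The class, its field names, and the sample values are our own illustration and are not an actual CMDB schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Resource:
    """A generic CMDB resource split into the three information types (illustrative schema)."""
    resource_type: str  # e.g. "machine" or "service"
    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)  # basic attributes: describe the resource
    config: Dict[str, Any] = field(default_factory=dict)      # configuration: how the resource is used
    status: Dict[str, Any] = field(default_factory=dict)      # runtime status: current state

    def update_status(self, **observed: Any) -> None:
        """Merge newly observed runtime values; attributes and config are left untouched."""
        self.status.update(observed)


machine = Resource(
    "machine", "host-001",
    attributes={"sn": "SN001", "manufacturer": "VendorA", "warranty_until": "2026-12-31"},
    config={"os": "Linux", "ip": "10.0.0.11"},
)
service = Resource(
    "service", "web-frontend",
    attributes={"owner": "alice", "description": "storefront"},
    config={"version": "1.4.2", "port": 8080},
)
machine.update_status(state="running", cpu_util=0.35)
```

Keeping runtime status in its own field reflects the fact that it typically changes much more often than attributes or configuration, and is usually written by a different system, such as monitoring.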
Besides these three types, another important kind of information is the relationships between resources, which can also be called a resource knowledge graph. For example, a business that provides a Web service externally uses a domain name resource, depends on a MySQL database service, and is deployed on a group of machines; those machines are installed in a data center and connected to network devices.

This is only a simple example. A complete resource knowledge graph is far more complex and can provide richer relational data, bringing greater value to O&M. In practice, however, we usually need only a particular topology (view) of it. For example, network O&M mainly uses the network topology formed by machines and switches, while business O&M mainly uses the topology of upstream and downstream call relationships between services, which helps with fault diagnosis.
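A resource knowledge graph can be stored as typed (source, relation, target) edges. A topology view is then just the subgraph restricted to certain relation types. The following is a minimal Python sketch that encodes the Web-service example and derives two views from it. All resource names and relation labels are hypothetical. The breadth-first walk over the dependency view lists everything a service transitively relies on, which is the kind of query fault diagnosis needs.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

# (source, relation, target) triples; names and relation labels are illustrative only.
Edge = Tuple[str, str, str]

EDGES: List[Edge] = [
    ("shop-web", "uses", "shop.example.com"),
    ("shop-web", "depends_on", "mysql-orders"),
    ("shop-web", "deployed_on", "host-1"),
    ("shop-web", "deployed_on", "host-2"),
    ("host-1", "located_in", "dc-nanjing"),
    ("host-2", "located_in", "dc-nanjing"),
    ("host-1", "connected_to", "switch-a"),
    ("host-2", "connected_to", "switch-a"),
    ("mysql-orders", "deployed_on", "host-3"),
    ("host-3", "connected_to", "switch-b"),
]


def view(edges: Iterable[Edge], relations: Set[str]) -> Dict[str, List[str]]:
    """Project the full graph onto a topology view that keeps only the given relation types."""
    adj: Dict[str, List[str]] = defaultdict(list)
    for src, rel, dst in edges:
        if rel in relations:
            adj[src].append(dst)
    return dict(adj)


def reachable(adj: Dict[str, List[str]], start: str) -> List[str]:
    """Breadth-first walk returning every node `start` transitively points to in this view."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


network_view = view(EDGES, {"connected_to"})
dependency_view = view(EDGES, {"depends_on", "deployed_on"})
```

When `shop-web` misbehaves, `reachable(dependency_view, "shop-web")` gives the candidate causes to inspect: the database service and the machines underneath both services.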

2. Resource organization and management

With the resource abstraction model in place, we still need to organize resources from the perspective of the organization and its people to make daily O&M more efficient. In large and medium-sized enterprises there is generally a dedicated systems department that operates data centers, machines, networks, and so on in a unified way, and a separate O&M department (SRE) that operates the business systems. Following this O&M model, we therefore organize and manage resources in two groups, infrastructure resources and business resources:

Infrastructure resources: Again, take machine O&M as an example. For fault repair, machines need to be classified and managed by manufacturer; for hardware monitoring, monitoring must be configured in batch for a set of servers that support the same collection protocol. Machines therefore need to be grouped along different dimensions.
Business resources: These are usually more complex. A business system may contain a large number of applications, and several applications may together form a business subsystem. Business monitoring usually uses the module as its configuration unit, so a monitoring configuration takes effect on all instances under that module. Business deployment is different: it may deploy to separate clusters per environment, release by data center, or run A/B tests by user segment. Each of these corresponds to a different resource view required by a different O&M scenario.
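Grouping along different dimensions amounts to indexing the same set of resources by different attributes. The following is a minimal Python sketch of that idea. The vendor names and protocol labels are hypothetical example data, and the helper `group_by` is our own illustration.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical machine inventory; "protocol" is the hardware-monitoring collection protocol.
machines: List[Dict[str, str]] = [
    {"name": "host-1", "manufacturer": "VendorA", "protocol": "IPMI"},
    {"name": "host-2", "manufacturer": "VendorB", "protocol": "IPMI"},
    {"name": "host-3", "manufacturer": "VendorA", "protocol": "SNMP"},
]


def group_by(items: List[Dict[str, str]], dimension: str) -> Dict[str, List[str]]:
    """Group resource names by the value of one attribute (one grouping dimension)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for item in items:
        groups[item[dimension]].append(item["name"])
    return dict(groups)


# Fault repair: route each vendor's machines to that vendor.
by_vendor = group_by(machines, "manufacturer")
# Hardware monitoring: configure collection once per protocol group.
by_protocol = group_by(machines, "protocol")
```

The same helper applied to business instances, with keys such as environment or data center, yields the different views described for business resources.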
Finally, both infrastructure and business resources have isolation requirements. For example, a business system may need exclusive use of a batch of machines so that other business systems cannot preempt those resources and degrade its performance; this is in effect a requirement for per-tenant resource isolation. Based on many years of O&M experience on Baidu's intranet, we take the business as the core and abstract a resource organization and management model in the form of a device tree plus a business tree, covering machine and other device management scenarios as well as business service management scenarios:

Tenant: used for resource isolation. A tenant contains many kinds of O&M resources and can also be called a resource namespace.

Device tree: organizes and manages physical devices.

Device: a collective term for physical equipment such as switches, routers, firewalls, and servers. Each device corresponds to one of the physical device models in the resource model above.
Device group: a set of devices grouped according to O&M needs. For example, machine O&M staff can group machines by model, and network O&M staff can group network devices by type.
Business tree: organizes and manages business resources.

Instance: the smallest unit of deployment and monitoring. Tags can be attached to instances and, combined with Services, meet flexible management needs. For example, service monitoring aggregates data from instances in the same data center, and service deployment deploys and upgrades separately for the test environment and the production environment.
Application: a collection of instances that provide the same service. The concept is similar to what is commonly called a module, such as a Web application.
Service: a set of instances grouped along one dimension. It is defined by a Tag Selector and represents one dimensional view of an application. For example, by cluster, an application can be divided into development, test, and production clusters; by data center, into a Nanjing data center and a Shanghai data center; and by version, into V1, V2, V3, and so on.
Subsystem: a large business system may deploy many application modules; subsystems group these applications to make them easier to organize and manage.
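The article does not specify how a Tag Selector is evaluated. The following is a minimal Python sketch that assumes equality-only selectors, in the style of Kubernetes label selectors, and scopes every Service to one tenant (namespace) and one application. All tenant, application, and tag names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Instance:
    name: str
    tenant: str       # resource namespace used for isolation
    application: str  # the application this instance belongs to
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Service:
    """A view of one application, defined by an equality-based tag selector (assumed semantics)."""
    name: str
    tenant: str
    application: str
    selector: Dict[str, str]

    def members(self, instances: List[Instance]) -> List[str]:
        # An instance matches when it is in the same tenant and application
        # and carries every key/value pair in the selector.
        return [
            i.name
            for i in instances
            if i.tenant == self.tenant
            and i.application == self.application
            and all(i.tags.get(k) == v for k, v in self.selector.items())
        ]


instances = [
    Instance("web-0", "mall", "web", {"env": "prod", "dc": "nanjing", "version": "v1"}),
    Instance("web-1", "mall", "web", {"env": "prod", "dc": "shanghai", "version": "v2"}),
    Instance("web-2", "mall", "web", {"env": "test", "dc": "nanjing", "version": "v2"}),
    Instance("web-9", "other", "web", {"env": "prod", "dc": "nanjing", "version": "v1"}),
]
prod = Service("web-prod", "mall", "web", {"env": "prod"})
nanjing = Service("web-nanjing", "mall", "web", {"dc": "nanjing"})
```

One instance can belong to several Services at once; here `web-0` is in both. This lets one application present a per-environment view to deployment and a per-data-center view to monitoring. The tenant check keeps `web-9`, which belongs to another tenant, out of every `mall` view.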
With this resource abstraction model and organization method in place, we use the O&M of a medium-sized e-commerce company as an example to briefly illustrate how resources are organized and managed. Suppose the company has two departments: the systems department, responsible for machine O&M and network O&M, and the O&M department, responsible for business O&M.

Responsibilities of the systems department's O&M staff: machine O&M monitors the company's machines by vendor to make sure they run normally, and network O&M makes sure all network devices are working normally.
Responsibilities of the O&M department's staff: business O&M is responsible for change management, monitoring management, capacity planning, and fault management for the online mall system's business; for security, different applications should be assigned to different personnel.
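The requirement that different applications be managed by different people can be expressed as a mapping from applications in the business tree to the people allowed to operate them. The following is a minimal Python sketch of that mapping. The application names, the subsystem grouping, and the user names are all hypothetical, and a real deployment would back this with its own access-control service.

```python
from typing import Dict, Set

# Hypothetical assignment of maintainers to applications in the mall business tree.
APP_MAINTAINERS: Dict[str, Set[str]] = {
    "web": {"alice", "bob"},
    "order": {"carol"},
    "payment": {"dave"},
}

# Hypothetical subsystem grouping of the mall applications.
SUBSYSTEMS: Dict[str, Set[str]] = {
    "storefront": {"web"},
    "trade": {"order", "payment"},
}


def can_change(user: str, app: str) -> bool:
    """Allow a change only if the user maintains that specific application."""
    return user in APP_MAINTAINERS.get(app, set())


def subsystem_maintainers(subsystem: str) -> Set[str]:
    """Everyone who maintains at least one application in the subsystem."""
    people: Set[str] = set()
    for app in SUBSYSTEMS.get(subsystem, set()):
        people |= APP_MAINTAINERS.get(app, set())
    return people
```

Checking permissions per application rather than per subsystem keeps, for example, a storefront maintainer from changing the payment application, even though both belong to the same mall business.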