1 Alarm evaluation
Periodically evaluates the alarm list and raises an alert when a monitored value is found that does not satisfy its configured limit.
There are three types of evaluation service: the default service, the single-process evaluation service, and the distributed evaluation service. Which one runs depends on the configuration; the default setting is "default".
Alarm status
Name | Database code | Corresponding database action field
UNKNOWN | insufficient data | insufficient_data_actions
OK | ok | ok_actions
ALARM | alarm | alarm_actions
1.1 Service
1.1.0 AlarmService
Base class. The other services inherit from it; it implements the basic evaluation function.
1.1.1 AlarmEvaluationService (default service)
- Starts the evaluation timer based on the alarm list (currently enabled alarms)
- Starts the load-balancing service and the heartbeat timer
1.1.2 SingletonAlarmService (single-process evaluation service)
Evaluates in a single process. Its processing capacity is limited, and under a high data volume it will fall behind or shut down, so it is not recommended.
- Evaluates based on the alarm list (currently enabled alarms)
1.1.3 PartitionedAlarmService (distributed evaluation service)
PartitionedAlarmService
It implements a collaboration protocol (PartitionCoordinator) among multiple evaluator processes over RPC. This lets the processing capacity of the alarm service grow through horizontal scaling and provides simple load balancing and high availability.
PartitionCoordinator
Multiple ceilometer-alarm-evaluator processes may be started, and they cooperate with one another. The process that started earliest is elected master, and the master's main job is to assign alarms to the other processes. Each process performs three tasks on a recurring schedule:
- Publish presence: broadcast its status to the other processes over RPC to tell them it is alive. Each process records the last active time of every other process.
- Check whether it can become master: each process keeps its list of the other processes' statuses up to date and uses that list to decide whether it should be master. There is only one criterion: whichever process started earliest is master.
- Evaluate: check the alarms this process is responsible for, call the ceilometerclient interface to fetch the monitoring data for each alarm's meter, then make the judgment, send the alarm, and so on.
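The election rule above (earliest start wins, peers that stop reporting are ignored) can be sketched as follows. This is a minimal illustration, not the PartitionCoordinator code: the Evaluator class, the 30-second timeout, and the round-robin assign_alarms policy are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Evaluator:
    """Hypothetical stand-in for one ceilometer-alarm-evaluator process."""
    name: str
    start_time: float
    # Time at which a presence message was last received from each peer.
    last_seen: dict = field(default_factory=dict)
    # Start time each peer reported in its presence message.
    peer_start: dict = field(default_factory=dict)

    def receive_presence(self, peer_name, peer_start_time, now):
        """Record a presence broadcast from another process."""
        self.last_seen[peer_name] = now
        self.peer_start[peer_name] = peer_start_time

    def is_master(self, now, timeout=30.0):
        """Master if no live peer started earlier (timeout is an assumed value)."""
        live_starts = [
            start for peer, start in self.peer_start.items()
            if now - self.last_seen[peer] <= timeout
        ]
        return all(self.start_time < start for start in live_starts)


def assign_alarms(alarm_ids, workers):
    """Split alarm ids across workers round-robin (an assumed policy)."""
    assignment = {worker: [] for worker in workers}
    for index, alarm_id in enumerate(alarm_ids):
        assignment[workers[index % len(workers)]].append(alarm_id)
    return assignment
```

Because a silent peer drops out of the live list, a later-started process takes over as master once the earlier one stops publishing presence, which is where the high availability comes from.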
1.2 Alarm
1.2.1 Combination
Combination alarm: combines the results of multiple indicators and acts on the combined result.
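As a rough illustration of combining several results into one, the sketch below merges a list of states with "and" or "or". The function name combine_states and the rule that any "insufficient data" input makes the result "insufficient data" are assumptions for the example, not confirmed behavior.

```python
def combine_states(states, op="and"):
    """Combine several alarm states into one (illustrative rule only)."""
    if any(state == "insufficient data" for state in states):
        return "insufficient data"
    fired = [state == "alarm" for state in states]
    triggered = all(fired) if op == "and" else any(fired)
    return "alarm" if triggered else "ok"
```

With "and", every input must be in the alarm state; with "or", one is enough.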
1.2.2 Threshold
Monitors one or more indicators. When a monitored value is greater than, less than, or equal to the threshold (or meets another configured comparison), the action for the resulting alarm state is triggered.
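A threshold check of this kind might look like the sketch below. It assumes one aggregated value per evaluation period, treats the alarm as firing only when every period breaches the threshold, and leaves the state unchanged on a mixed result; the function name and these rules are illustrative simplifications.

```python
import operator

# Comparison operators keyed by their short names.
COMPARATORS = {
    "gt": operator.gt, "ge": operator.ge,
    "lt": operator.lt, "le": operator.le,
    "eq": operator.eq, "ne": operator.ne,
}


def evaluate_threshold(values, comparison_operator, threshold, current_state="ok"):
    """Return the next state given one statistic value per period."""
    if not values:
        return "insufficient data"
    compare = COMPARATORS[comparison_operator]
    breaches = [compare(value, threshold) for value in values]
    if all(breaches):
        return "alarm"
    if not any(breaches):
        return "ok"
    # Mixed result: keep the current state (assumed behavior).
    return current_state
```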
2 Alarms
The alarm function checks meter data against the rules defined in the alarm object and issues an alarm when the data meets the conditions. The initial alarm state is OK. If the state changes to UNKNOWN or ALARM, a state-update record is written to the alarm_history table and the action for the corresponding state is triggered. If the current state is ALARM and the state after evaluation is still ALARM, the corresponding action is not triggered.
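The transition rule described here (act only when the state actually changes) can be sketched as below. The alarm is modeled as a plain dict and the history as a list; both shapes, and the function name refresh_state, are assumptions for illustration rather than the real storage layer.

```python
# Maps each state code to the field holding its actions.
ACTION_FIELD = {
    "ok": "ok_actions",
    "alarm": "alarm_actions",
    "insufficient data": "insufficient_data_actions",
}


def refresh_state(alarm, new_state, history):
    """Apply new_state; return the actions to trigger (none if unchanged)."""
    if alarm["state"] == new_state:
        return []  # e.g. alarm -> alarm: no history row, no action
    history.append(
        {"alarm_id": alarm["alarm_id"], "from": alarm["state"], "to": new_state}
    )
    alarm["state"] = new_state
    return list(alarm.get(ACTION_FIELD[new_state], []))
```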
2.1 Log
Writes the notification to the log at level INFO.
2.2 Rest
Invokes the action configured for the given alarm state over HTTP, usually by calling a specified address to report the state.
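A minimal sketch of such an HTTP call using only the Python standard library is shown below. The payload field names (alarm_id, previous, current, reason) and both function names are assumptions for the example; the actual body sent by the service may differ.

```python
import json
import urllib.request


def build_notification(url, alarm_id, previous, current, reason):
    """Build a JSON POST request reporting a state change (assumed payload)."""
    body = json.dumps({
        "alarm_id": alarm_id,
        "previous": previous,
        "current": current,
        "reason": reason,
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_notification(request, timeout=5):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Building the request separately from sending it keeps the payload easy to inspect without a live endpoint.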
2.3 Test
For testing only; not used in practice.
2.4 Trust
Calls the Keystone interface for authentication, then sends the notification using the same method as Rest.
3 Issues that may be encountered
- Wrong time period selected when creating an alarm. If the alarm should be monitored all the time rather than only during a certain period, this field can simply be left unset.
- The combination of conditions used when creating an alarm needs careful thought: the interval (period/evaluation_periods), the time range (time_constraints), the alarm type, and the per-state actions (xx_actions).
- When creating an alarm, set the initial state to OK.
- Alarm rule settings. In general: when the [avg/max/min/...] of [field] in the [meter_name] records over [evaluation_periods] periods is [greater than (gt), less than (lt), equal to (eq) ...] the [threshold] value, the condition is met and the alarm state needs to be updated. Rule example:
"threshold_rule": {
    "comparison_operator": "gt",  # greater than
    "evaluation_periods": 2,  # together with period, determines the evaluation time window
    "exclude_outliers": False,
    "meter_name": "disk.device.read.requests",
    "period": 10,
    "query": [  # query rules
        {
            "field": "resource_id",
            "op": "eq",
            "type": "string",
            "value": "fc0e5394-0276-413e-8d81-e3324df35a12-vda"
        }
    ],
    "statistic": "avg",  # how the meter's volume is aggregated, e.g. average, maximum, minimum
    "threshold": 990  # threshold
}
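To make the relationship between period and evaluation_periods concrete, the sketch below computes the time window such a rule would cover. It assumes the window is simply period multiplied by evaluation_periods and ignores any extra look-back the service may apply; evaluation_window is a hypothetical helper.

```python
from datetime import datetime, timedelta


def evaluation_window(rule, now):
    """Return (start, end) of the span covered by the rule (simplified)."""
    span = timedelta(seconds=rule["period"] * rule["evaluation_periods"])
    return now - span, now


rule = {"period": 10, "evaluation_periods": 2}
start, end = evaluation_window(rule, datetime(2015, 1, 1, 0, 1, 0))
print(end - start)  # length of the evaluation window
```

With the rule in the example, two 10-second periods give a 20-second window.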
Openstack-ceilometer-alarm operating mechanism