Bosun Quick Start
This document is a quick installation guide for Bosun. By following it, you can set up a complete Bosun service that collects metrics from the machines you specify and raises alerts on them.
Bosun
This guide installs Bosun with Docker. If you prefer not to use Docker, you can download the Bosun binaries from bosun.org, but you will then need to install OpenTSDB and HBase yourself.
Installing Docker
If Docker is not installed on your system, follow the installation instructions at https://docs.docker.com/installation/. Once it is installed, don't forget to start the Docker daemon.
Installing Bosun
Once Docker is installed, you can install Bosun with the following command (it may require sudo):
docker run -d -p 8070:8070 stackexchange/bosun
This command tells Docker to start Bosun as a background (detached) container and to publish its port 8070 on the host. After about 15 seconds the Bosun server will be up, and you can open it in a browser at http://yourip:8070.
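Instead of waiting a fixed 15 seconds, you can poll the server until it answers. The Python sketch below does this using only the standard library. It assumes that a plain GET on the base URL returns HTTP 200 once Bosun is ready. `wait_for_bosun` is a hypothetical helper, not part of Bosun.

```python
import time
import urllib.request


def wait_for_bosun(base_url, timeout=60.0, interval=3.0, opener=urllib.request.urlopen):
    """Poll base_url until it answers HTTP 200 or `timeout` seconds pass.

    `opener` defaults to urllib.request.urlopen; it is a parameter only so
    the function can be exercised without a live server.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener(base_url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # urllib.error.URLError (connection refused, DNS failure, HTTP
            # error statuses) and socket timeouts all subclass OSError,
            # so "not up yet" lands here.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example (needs the container from the command above):
# wait_for_bosun("http://yourip:8070")
```

The call returns True as soon as the page loads, or False after 60 seconds without a successful response.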
Push data to Bosun
Even with no slave connected, the Bosun server itself produces plenty of data. The following sections also explain how to start a Bosun slave.
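You can also push your own data points instead of relying on a slave. The sketch below assumes that Bosun accepts OpenTSDB-style JSON on `/api/put`. That format is OpenTSDB's: each point has `metric`, `timestamp`, `value` and at least one tag. The helper names and the `tutorial.demo` metric are hypothetical.

```python
import json
import time
import urllib.request


def make_datapoint(metric, value, tags, timestamp=None):
    """Build one data point in OpenTSDB's /api/put JSON shape."""
    if not tags:
        # OpenTSDB rejects points that carry no tags at all.
        raise ValueError("at least one tag (e.g. host) is required")
    return {
        "metric": metric,
        "timestamp": int(time.time() if timestamp is None else timestamp),
        "value": value,
        "tags": dict(tags),
    }


def push_points(base_url, points):
    """POST a list of points to <base_url>/api/put and return the HTTP status."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/put",  # assumed endpoint, see lead-in
        data=json.dumps(points).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


# Example (needs a running server; "tutorial.demo" is a made-up metric):
# push_points("http://yourip:8070",
#             [make_datapoint("tutorial.demo", 42, {"host": "web01"})])
```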
Checking Data in Bosun
Assuming a Bosun slave has been started and has connected to the server on port 8070, the Bosun server will receive a variety of information from it. You can view the currently connected slave nodes at http://docker-server-ip:8070/items. If you see a long list of metrics, congratulations: Bosun is collecting data. The slave nodes currently producing data are listed at the bottom of the page or in the second column. Click a slave, then click "Available Metrics", to see which kinds of data it can be monitored on, such as CPU and memory.
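The same check can be scripted. The sketch below assumes that Bosun returns a JSON array of metric names at `/api/metric`. It also assumes a per-tag variant at `/api/metric/host/<name>`. Verify both endpoints against your Bosun version before relying on them. The URL-building and parsing helpers work offline; `fetch_metrics` needs a live server.

```python
import json
import urllib.parse
import urllib.request


def metrics_url(base_url, host=None):
    """URL listing all metric names, or only those reported by one host.

    Assumes the /api/metric and /api/metric/{tagk}/{tagv} endpoints.
    """
    base = base_url.rstrip("/")
    if host is None:
        return base + "/api/metric"
    return base + "/api/metric/host/" + urllib.parse.quote(host, safe="")


def parse_metric_list(body, prefix=""):
    """Decode a JSON array of metric names, keeping those that start with prefix."""
    return sorted(name for name in json.loads(body) if name.startswith(prefix))


def fetch_metrics(base_url, host=None, prefix=""):
    """Fetch and filter the metric list from a running server."""
    with urllib.request.urlopen(metrics_url(base_url, host), timeout=10) as resp:
        return parse_metric_list(resp.read().decode("utf-8"), prefix)


# Example (needs a running server; "web01" is a placeholder host):
# print(fetch_metrics("http://docker-server-ip:8070", host="web01", prefix="os."))
```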
New Alert
Our server is now collecting all kinds of information, but the key job of a monitoring system is to raise an alarm when something abnormal happens. Alerting is also one of Bosun's core features.
Bosun provides a workflow that makes it easier to design, test, and deploy an alert. The navigation bar on the Bosun home page contains "Items", "Graph", "Expression", "Rule" and "Test Config", which correspond to the steps for creating a new alert. Typically you first select an item (a metric), which is the basis of the alert. Next, you look at the graph of that metric to understand how it behaves. You then turn the graph's query into an expression and organize the expression into a rule. Finally, you test the rule and, if it has no errors, push it to the Bosun server.
Here is an example of a new alert: we monitor CPU usage and raise an alarm when CPU idle is too low. The metric we use is "os.cpu". When a machine's average CPU usage over the past hour is too high (that is, its idle time is too low), we send an alarm. Open the Bosun home page and start configuring as described below.
Items
Click the "Items" tab to see all the items Bosun is currently monitoring. Click "os.cpu" to go to the "Graph" page.
Graph
On the Graph page, Bosun has preloaded charts for all slaves. To view a single slave, enter its machine name in the Host input box and click the blue "Query" button. Bosun then shows that machine's CPU usage over the last hour.
You should now see the CPU utilization curve. At the bottom of the page is a "Queries" area, which shows the expression used to generate the current curve.
The Queries area also contains "Expression" and "Rule" links, which open the Expression and Rule pages for the query behind this curve. In this tutorial, click the "Expression" link.
Expression
On the Expression page, you can adjust the size of the result set through the query criteria. The query expression on this page should look like q("sum:rate..."). With this statement, Bosun queries the CPU usage of the specified machine over the last hour. Click the "Show" button to see the statement's result set; each result is a timestamp and value pair.
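The string inside q() follows OpenTSDB's query syntax: an aggregator, an optional rate with counter options, and then the metric with its tag filter. The second and third q() arguments give the start and end of the time range, and an empty end means "now". The Python sketch below assembles the exact expression used in this tutorial. The function names are hypothetical, and the comment on `rate{counter,,1}` paraphrases OpenTSDB's rate options.

```python
def opentsdb_query(metric, tags, aggregator="sum", counter=True,
                   counter_max="", reset_value="1"):
    """Build an OpenTSDB query string such as sum:rate{counter,,1}:os.cpu{host=web01}."""
    parts = [aggregator]
    if counter:
        # rate{counter,<counter_max>,<reset_value>}: treat the series as a
        # monotonically increasing counter and convert it to a per-second
        # rate. An empty counter_max keeps OpenTSDB's default; the reset
        # value makes a drop (e.g. a restart) report 0 instead of a spike.
        parts.append("rate{counter,%s,%s}" % (counter_max, reset_value))
    tag_filter = ",".join("%s=%s" % (k, v) for k, v in sorted(tags.items()))
    parts.append("%s{%s}" % (metric, tag_filter))
    return ":".join(parts)


def bosun_q(query, start="1h", end=""):
    """Wrap a query in Bosun's q() function; an empty end means "now"."""
    return 'q("%s", "%s", "%s")' % (query, start, end)


# bosun_q(opentsdb_query("os.cpu", {"host": "your-system-here"}))
# builds the same q(...) expression shown on the Expression page.
```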
For an alert, we care less about the individual points in the result set than about their average. To get the average, wrap the query in the avg() function, turning the first expression below into the second:
q("sum:rate{counter,,1}:os.cpu{host=your-system-here}", "1h", "")
avg(q("sum:rate{counter,,1}:os.cpu{host=your-system-here}", "1h", ""))
Click the blue "Test" button. The result becomes a single number: the average CPU utilization over this period. With this average, we can detect CPU idle that is too low (usage that is too high). Click the "Rule" button.
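To make the numbers concrete, here is a rough Python model of what the expression computes. It takes per-second rates between consecutive counter samples, then their mean. This is an illustration under simplifying assumptions: a drop in the counter is treated as a reset that yields 0. It is not Bosun's or OpenTSDB's actual code.

```python
def counter_rate(samples):
    """Per-second rates between consecutive (timestamp, value) counter samples.

    A decrease is treated as a counter reset and yields 0, roughly what the
    rate{counter,,1} option asks OpenTSDB to do.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t1 <= t0:
            continue  # skip duplicate or out-of-order timestamps
        rates.append(0.0 if v1 < v0 else (v1 - v0) / (t1 - t0))
    return rates


def avg(values):
    """Mean of a series, analogous to reducing a q() result with avg()."""
    if not values:
        raise ValueError("empty series")
    return sum(values) / len(values)
```

For samples [(0, 0), (60, 3000), (120, 9000)], the rates are 50.0 and 100.0 per second and the average is 75.0.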
Rule
The Rule page has two input boxes: the alert box and the template box. The alert box shows the alert rule we just created, and the template box defines what happens when the alert fires, such as sending an email. With the rule as it currently stands, Bosun will always report the alert as critical. This is because crit and warn are evaluated as booleans, and we assigned them the CPU average, which is always non-zero; any non-zero value counts as true. We need to add comparisons, as follows:
alert cpu.is.too.high {
    template = test
    $metric = q("sum:rate{counter,,1}:os.cpu{host=your-system-here}", "1h", "")
    $avgcpu = avg($metric)
    crit = $avgcpu > 80
    warn = $avgcpu > 60
}
If the machine's CPU utilization is above 80%, a critical alert is triggered; if it is above 60%, a warning is triggered. So far the alert is of limited use because it monitors only one machine. You can monitor a different machine by changing the host value, or monitor all machines by setting host to *. If you want to exclude particular machines, you can add an exclusion to the alert body, but this tutorial does not cover that in detail.
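The threshold logic and the effect of host=* can be modeled in a few lines of Python. This sketch assumes that crit is checked before warn, so a host averaging above 80 is reported as critical rather than as a warning. It also assumes that every host matched by the wildcard is evaluated as its own alert instance. The host names in the usage note are hypothetical.

```python
from collections import defaultdict


def severity(avg_cpu, crit=80.0, warn=60.0):
    """Mirror the crit/warn lines of the rule: crit is checked first, then warn."""
    if avg_cpu > crit:
        return "critical"
    if avg_cpu > warn:
        return "warning"
    return "normal"


def evaluate_by_host(points, crit=80.0, warn=60.0):
    """Group (host, value) points by host, average each group, and classify it.

    Models host=*: each matched host gets its own status.
    """
    groups = defaultdict(list)
    for host, value in points:
        groups[host].append(value)
    return {host: severity(sum(vals) / len(vals), crit, warn)
            for host, vals in sorted(groups.items())}
```

For example, points [("web01", 90), ("web01", 80), ("web02", 10), ("db01", 65)] yield critical for web01 (average 85), warning for db01, and normal for web02.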
Click the "Test" button, and all critical, warning, and normal results are listed below. Click the "email" button to preview the alert email that would be sent. The default email template is not very readable; you can change it to the following:
template test {
    subject = {{.Last.Status}}: {{.Alert.Name}} on {{.Group.host}}
    body = `<p>Alert: {{.Alert.Name}} triggered on {{.Group.host}}</p>`
}
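Bosun templates use Go template syntax, so each {{.X.Y}} field is filled with data from the alert instance. The toy Python renderer below handles only that plain field-access form, which is enough to preview the subject line above. It is not Go's template engine. The status, alert name, and host values are hypothetical.

```python
import re

# Matches {{.A.B.C}} field references (optionally with inner spaces).
_FIELD = re.compile(r"\{\{\s*\.([A-Za-z_][\w.]*)\s*\}\}")


def render_fields(template, context):
    """Replace {{.A.B}} placeholders with values looked up in nested dicts.

    Only field access is supported; an unknown field raises KeyError.
    """
    def lookup(match):
        value = context
        for part in match.group(1).split("."):
            value = value[part]
        return str(value)
    return _FIELD.sub(lookup, template)


subject = "{{.Last.Status}}: {{.Alert.Name}} on {{.Group.host}}"
# Hypothetical values for one alert instance.
ctx = {"Last": {"Status": "critical"},
       "Alert": {"Name": "cpu.is.too.high"},
       "Group": {"host": "web01"}}
# render_fields(subject, ctx) gives "critical: cpu.is.too.high on web01"
```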