"Xiaomi Open Source monitoring System" Open-falcon

Source: Internet
Author: User
Tags: virtual environment

1) Advantages

• Powerful and flexible data collection: auto discovery, support for falcon-agent and SNMP, user-initiated push, user-defined plug-ins, and an OpenTSDB-like data model (timestamp, endpoint, metric, key-value tags)
• Horizontal scalability: data collection, alarm judgment, and historical data storage and querying are supported in every collection period
• Efficient alerting policy management: an efficient portal, policy templates, template inheritance and overriding, multiple alarm modes, and callback support
• User-friendly alarm settings: maximum number of alarms, alarm levels, alarm recovery notifications, alarm pauses, different thresholds for different time periods, and maintenance windows
• Efficient graph component: a single machine supports reporting, archiving, and storing 2 million metrics (at a 1-minute interval)
• Efficient historical data query component: uses an RRDtool-style data archiving strategy and returns a year of historical data for hundreds of metrics within seconds
• Dashboard: multi-dimensional data display and user-defined screens
• High availability: the whole system has no core single point of failure, is easy to operate and deploy, and scales horizontally
• Development languages: the entire backend is written in Golang; the portal and the dashboard are written in Python
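To make the OpenTSDB-like data model concrete, here is a minimal sketch of one sample as (timestamp, endpoint, metric, key-value tags). The class name `DataPoint` and the `series_key` helper are hypothetical and chosen only for illustration; they are not taken from the Open-Falcon code base.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DataPoint:
    """One sample in an OpenTSDB-like model: timestamp, endpoint, metric, tags."""
    timestamp: int   # Unix time in seconds
    endpoint: str    # the monitored entity, for example a hostname
    metric: str      # name of the measured quantity
    value: float
    tags: Dict[str, str] = field(default_factory=dict)  # key-value tags

    def series_key(self) -> str:
        """Hypothetical identifier: endpoint/metric plus tags sorted by key."""
        tag_part = ",".join(f"{k}={v}" for k, v in sorted(self.tags.items()))
        base = f"{self.endpoint}/{self.metric}"
        return f"{base}/{tag_part}" if tag_part else base


point = DataPoint(1500000000, "web-01", "cpu.idle", 93.5, {"idc": "bj", "core": "0"})
print(point.series_key())  # web-01/cpu.idle/core=0,idc=bj
```

Sorting the tags means that two samples carrying the same tags in a different order map to the same series, which is one plausible way to keep series identifiers stable.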

2) Architecture diagram

2.1) Architecture diagram from the official website

2.2) User-drawn architecture diagram

2.3) Basic Components


Graphing components (component name, function, remarks)

agent
  Function:
  1. The agent collects machine monitoring metrics and pushes them to transfer every 60 seconds.
  2. The agent and transfer keep a long-lived connection so that data is transferred faster.
  3. The agent provides an HTTP interface, /v1/push, which receives data pushed manually by users and forwards it to transfer.

graph
  Function:
  1. The graph component stores graphing data and historical data.
  2. transfer sends the data it receives to graph.
  Remarks:
  1. Multiple instances can be deployed as a cluster.
  2. Needs to connect to the graph database.

transfer
  Function:
  1. Receives data from the agent and forwards it to the backend graph and judge components.

query
  Function:
  1. Queries the data held by each graph instance and provides a unified HTTP query interface. After the query component receives a user's request, it queries the data from multiple graph instances on the backend, aggregates it, and returns it to the user.

task
  Function:
  1. Index updates, including "full update" and "junk index cleanup" of the graph index.
  2. Collects state data from Falcon's own service components, mainly the internal state of the transfer, graph, and task services.
  Remarks:
  1. Needs to connect to the graph database.

dashboard
  Function:
  1. Front-end interface.
  Remarks:
  1. Needs a Python virtual environment.
  2. Needs to connect to the dashboard database.
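The agent's /v1/push interface described in the component list above can be exercised with a few lines of standard-library Python. This is a sketch under stated assumptions: the port 1988 and the JSON field names (`endpoint`, `metric`, `timestamp`, `step`, `value`, `counterType`, `tags`) do not appear in this article and are assumed from commonly documented Open-Falcon usage, so verify them against your agent's configuration before relying on them.

```python
import json
import time
import urllib.request


def build_push_payload(endpoint, metric, value, step=60, tags=""):
    """Build a one-item payload; the field names are assumptions, see the text."""
    return [{
        "endpoint": endpoint,          # which host or entity the value belongs to
        "metric": metric,              # name of the measured quantity
        "timestamp": int(time.time()),
        "step": step,                  # reporting period in seconds
        "value": value,
        "counterType": "GAUGE",
        "tags": tags,                  # for example "module=api,idc=bj"
    }]


def push(payload, agent_url="http://127.0.0.1:1988/v1/push", timeout=5):
    """POST the payload as JSON to the agent and return the HTTP status code."""
    request = urllib.request.Request(
        agent_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


payload = build_push_payload("web-01", "app.requests", 42, tags="module=api")
print(json.dumps(payload, indent=2))
# push(payload)  # uncomment on a host where an agent is actually listening
```

Building the payload separately from sending it keeps the example runnable on a machine without an agent; only the commented-out `push` call touches the network.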

Alarm components (component name, function, remarks)

sender
  Function:
  1. Alarm sending module; controls concurrency and provides a buffer queue for sending.

uic (fe)
  Function:
  1. User group management and single sign-on.
  Remarks:
  1. Needs to connect to the uic database.

portal
  Function:
  1. Web front end for configuring alert policies and managing machine groups.
  Remarks:
  1. Needs to connect to the falcon_portal database.
  2. Needs a Python virtual environment.

hbs
  Function:
  1. Heartbeat server. HBS listens on two addresses: an HTTP address and an RPC address that agents use to communicate with HBS.
  Remarks:
  1. Needs to connect to the falcon_portal database.

judge
  Function:
  1. Alarm judgment module. judge depends on HBS, so HBS has to be set up first.
  Remarks:
  1. Multiple instances can be deployed.

links
  Function:
  1. links is a component written for the alarm merge feature. If you do not want to use alarm merging, this component does not need to be installed.
  Remarks:
  1. Needs to connect to the falcon_links database.

alarm
  Function:
  1. Alarm event processor.
  Remarks:
  1. The alarm module handles alarm events: judge writes the alarm events it generates to Redis, and alarm reads them from Redis. This module is highly business-specific, so each company can rewrite it to suit its own needs.

nodata
  Function:
  1. nodata detects reporting anomalies in monitoring data. It works together with the real-time alarm module judge as follows: a collection timeout is configured for nodata; when the timeout is reached, nodata generates default mock data; the user configures a corresponding alarm policy; and an alarm is raised when the mock data is received. Detecting reporting anomalies for collection items is a necessary complement to the judge module and makes its real-time alarms more reliable and complete.

aggregator
  Function:
  1. Cluster aggregation module. Aggregates the values of a metric across all machines in a cluster, providing a cluster-level view of monitoring.
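The nodata flow described above (timeout, default mock data, alarm policy) can be sketched in a few lines. This is an illustration of the idea only, not the component's actual implementation; the function names, the tuple layout, and the "value below threshold" policy are all hypothetical.

```python
from typing import Dict, List, Tuple


def generate_mock_points(
    last_seen: Dict[str, int],
    now: int,
    timeout: int,
    default_value: float,
) -> List[Tuple[str, int, float]]:
    """Return (endpoint, timestamp, value) mock points for endpoints that went silent."""
    mocks = []
    for endpoint, ts in sorted(last_seen.items()):
        if now - ts > timeout:  # nothing reported within the timeout
            mocks.append((endpoint, now, default_value))
    return mocks


def should_alarm(value: float, threshold: float) -> bool:
    """Stand-in for a user-configured policy: alarm when value is below threshold."""
    return value < threshold


# web-01 reported 30 s ago, web-02 reported 150 s ago; the timeout is 120 s.
last_seen = {"web-01": 1000, "web-02": 880}
mocks = generate_mock_points(last_seen, now=1030, timeout=120, default_value=-1.0)
for endpoint, ts, value in mocks:
    if should_alarm(value, threshold=0.0):
        print(f"ALARM {endpoint}: mock value {value} at {ts}")
```

Choosing a default value that real data never takes (here -1.0) lets an ordinary threshold policy double as a "no data" detector, which matches the cooperation between nodata and judge described in the text.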

3) Installation

Environment configuration

Install Redis: either the source build or the RPM package works, depending on your situation.

Install MySQL:
1. Either the source build or the RPM package works, depending on your situation.
2. Initialize the MySQL table structure by loading each schema file in turn:
   mysql -uroot -proot < 1_uic-db-schema.sql
   mysql -uroot -proot < 2_portal-db-schema.sql
   mysql -uroot -proot < 3_dashboard-db-schema.sql
   mysql -uroot -proot < 4_graph-db-schema.sql
   mysql -uroot -proot < 5_alarms-db-schema.sql
   mysql -uroot -proot < links-db-schema.sql

Back-end configuration

Download the official source package:
1. Download open-falcon-v0.2.1.tar.gz, open-falcon-v0.2.0.tar.gz.
2. Unzip to /home/work/open-falcon.
3. Modify the configuration files; typically you need to specify the database password.
4. Start Open-Falcon:
   ./open-falcon start (start all services)
   ./open-falcon check (check service startup status)

Front-end configuration

Download the source package:
1. Fetch the source with git clone.
2. Install the dependency packages: python-devel, openldap-devel, mysql-devel, virtualenv (download the tar package from the Python website), and yum groupinstall "Development tools".
3. Install the front-end Flask modules listed in pip_requirements.txt:
   cat pip_requirements.txt
   Flask==0.10.1
   Flask-Babel==0.9
   Jinja2==2.7.2
   Werkzeug==0.9.4
   gunicorn==19.1.1
   python-dateutil==2.2
   requests==2.3.0
   mysql-python
   python-ldap
4. Modify the rrd/config.py configuration file:
   • Modify PORTAL_DB_* to match your environment. The default user name is root, the default password is "", and you also need to set the database host.
   • Modify ALARM_DB_* to match your environment. The default user name is root, the default password is "", and you also need to set the database host.
   • API_ADDR = os.environ.get("API_ADDR", "http://10.59.2.133:8080/api/v1"): change the IP to that of your Open-Falcon server.
5. Start the dashboard:
   ./control start
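The dashboard settings mentioned above follow a common pattern: read a value from the environment and fall back to a default. The sketch below illustrates that pattern only. The helper `env_setting` is hypothetical, and the exact variable names are assumed from the portal_db_* and api_addr settings named in the text, so check them against your copy of rrd/config.py.

```python
import os


def env_setting(name: str, default: str) -> str:
    """Return the environment variable `name`, or `default` when it is unset."""
    return os.environ.get(name, default)


# Names below are assumptions; only the API address default comes from the text.
PORTAL_DB_HOST = env_setting("PORTAL_DB_HOST", "127.0.0.1")
PORTAL_DB_USER = env_setting("PORTAL_DB_USER", "root")
PORTAL_DB_PASS = env_setting("PORTAL_DB_PASS", "")
API_ADDR = env_setting("API_ADDR", "http://10.59.2.133:8080/api/v1")

print(API_ADDR)
```

With this pattern the same file can be used unchanged across machines: exporting `API_ADDR` before starting the dashboard overrides the default without editing the source.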

4) Monitoring a client host

1. Unzip open-falcon-v0.2.0.tar.gz.
2. Modify the agent configuration file so that the heartbeat and transfer addresses point to the server IP:
   "heartbeat": {
       "enabled": true,
       "addr": "10.59.2.133:6030",
       "interval": …,
       "timeout": …
   },
   "transfer": {
       "enabled": true,
       "addrs": ["10.59.2.133:8433"],
       "interval": …,
       "timeout": 1000
   }
3. Start the agent service:
   ./open-falcon start agent
   Then check agent.log to confirm that the agent is running normally.
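When many client hosts need the same change, the address edits in step 2 can be applied programmatically. The following is a minimal sketch that works on an already loaded configuration dictionary; the function name `point_agent_at_server` and the trimmed sample configuration are hypothetical, while the ports 6030 and 8433 are the ones shown in the text.

```python
import copy
import json


def point_agent_at_server(config: dict, server_ip: str) -> dict:
    """Return a copy of config with heartbeat and transfer addresses set to server_ip."""
    updated = copy.deepcopy(config)  # leave the caller's dictionary untouched
    updated.setdefault("heartbeat", {}).update(
        {"enabled": True, "addr": f"{server_ip}:6030"}
    )
    updated.setdefault("transfer", {}).update(
        {"enabled": True, "addrs": [f"{server_ip}:8433"]}
    )
    return updated


# A trimmed stand-in for the agent configuration; other keys are preserved as-is.
original = {
    "heartbeat": {"enabled": False, "addr": "127.0.0.1:6030"},
    "transfer": {"enabled": False, "addrs": ["127.0.0.1:8433"], "timeout": 1000},
}
patched = point_agent_at_server(original, "10.59.2.133")
print(json.dumps(patched, indent=2))
```

In practice the dictionary would come from `json.load` on the agent's configuration file and be written back with `json.dump`; the file name is left out here because the text does not state it.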

