1) Advantages
• Powerful and flexible data collection: auto-discovery; support for falcon-agent and SNMP; support for proactive push by users; user-defined plug-ins; an OpenTSDB-like data model (timestamp, endpoint, metric, key-value tags)
• Horizontal scalability: data collection, alarm evaluation, and historical data storage and querying are supported in every collection cycle
• Efficient alerting policy management: an efficient portal, policy templates, template inheritance and override, multiple alarm modes, and callback support
• User-friendly alarm settings: maximum number of alarms, alarm levels, alarm recovery notifications, alarm pauses, different thresholds for different time periods, and maintenance windows
• High-efficiency graph component: a single machine supports reporting, archiving, and storage of 2 million metrics (1-minute cycle)
• Efficient historical data query component: uses the RRDtool data archiving strategy and returns a year of historical data for hundreds of metrics within seconds
• Dashboard: multi-dimensional data display and user-defined screens
• High availability: the whole system has no core single point of failure, is easy to operate and deploy, and scales horizontally
• Development language: the entire backend is written in Go; the portal and the dashboard are written in Python
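The RRDtool-style archiving mentioned above is what keeps long-range queries fast: fine-grained samples are consolidated into coarser archived points, so a one-year query reads far fewer values than were originally reported. The sketch below only illustrates that idea in Python. The function name, the consolidation factor, and the sample values are hypothetical, and it is not the graph component's actual code.

```python
from statistics import mean


def consolidate(values, factor, cf=mean):
    """Collapse every `factor` consecutive samples into one archived point.

    `cf` is the consolidation function (for example mean, max, or min).
    A trailing window with fewer than `factor` samples is dropped.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    archived = []
    for start in range(0, len(values) - factor + 1, factor):
        window = values[start:start + factor]
        archived.append(cf(window))
    return archived


# Hypothetical samples: six raw points consolidated three at a time.
raw = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
averaged = consolidate(raw, 3)        # one averaged point per window
peaks = consolidate(raw, 3, cf=max)   # one peak value per window
```

Keeping several archives at different resolutions trades a little storage for query speed: recent data stays detailed while older data is read from the coarser archives.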
2) Architecture diagram
2.1) Official website architecture diagram
2.2) User-drawn architecture diagram
2.3) Basic Components
Component name (graphing components) | Function | Remarks |
agent | 1. Collects machine monitoring metrics and pushes them to transfer every 60 seconds. 2. Keeps a long-lived connection to transfer so that data is delivered faster. 3. Provides an HTTP interface, /v1/push, that receives data pushed manually by users and forwards it to transfer. | |
graph | 1. Stores graphing data and historical data. 2. transfer sends the data it receives to graph. | Multiple instances can be deployed as a cluster. Needs to connect to the graph database. |
transfer | 1. Receives data from the agent and forwards it to the backend graph and judge components. | |
query | 1. Queries the graph data and provides a unified HTTP query interface: after receiving a user request, the query component fetches data from multiple graph instances on the backend, aggregates it, and returns it to the user. | |
task | 1. Index updates for the graph index, including "full update" and "junk index cleanup". 2. Collects state data from Falcon's own service components, mainly the internal state of three services: transfer, graph, and task. | Needs to connect to the graph database. |
dashboard | Front-end interface. | Needs a Python virtual environment. Needs to connect to the dashboard database. |
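The agent's /v1/push interface described in the table can be exercised with a short Python sketch. It builds a data point following the (timestamp, endpoint, metric, key-value tags) model and posts it to a local agent. The listen address 127.0.0.1:1988, the JSON field names (step, value, counterType, and the comma-joined tags string), and the sample metric are assumptions not stated in this article; check them against your agent's configuration and documentation before relying on them.

```python
import json
import time
import urllib.request

# Assumed agent HTTP address; the article only names the /v1/push path.
AGENT_PUSH_URL = "http://127.0.0.1:1988/v1/push"


def build_point(endpoint, metric, value, tags=None, step=60, timestamp=None):
    """Build one data point; the field names here are assumptions."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted((tags or {}).items()))
    return {
        "endpoint": endpoint,
        "metric": metric,
        "timestamp": int(time.time()) if timestamp is None else int(timestamp),
        "step": step,              # reporting period in seconds
        "value": value,
        "counterType": "GAUGE",    # assumed counter type name
        "tags": tag_str,
    }


def push(points, url=AGENT_PUSH_URL, timeout=5):
    """POST a list of points to the agent, which forwards them to transfer."""
    body = json.dumps(points).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Usage (requires a running agent):
# push([build_point("host01", "app.qps", 12, tags={"module": "api"})])
```

Pushing through the local agent rather than directly to transfer reuses the agent's existing long-lived connection, which is the path the table describes.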
Component name (alarm components) | Function | Remarks |
sender | 1. Alarm sending module: controls concurrency and provides a buffer queue for sending. | |
uic (Fe) | 1. User and group management, single sign-on. | 1. Needs to connect to the uic database. |
portal | 1. Web front end for configuring alert policies and managing machine groups. | Needs to connect to the falcon-portal database. Needs a Python virtual environment. |
hbs | 1. Heartbeat server: HBS exposes two addresses, an HTTP address and an RPC address that agents use to communicate with HBS. | 1. Needs to connect to the falcon-portal database. |
judge | 1. Alarm judgment module. judge relies on HBS, so HBS has to be set up first. | 1. Multiple instances can be deployed. |
links | 1. A component written for the alarm merge feature. If you do not want to use alarm merging, this component does not need to be installed. | 1. Needs to connect to falcon_links. |
alarm | 1. Alarm event processor. | 1. The alarm module handles alarm events: judge writes the alarm events it generates to Redis, and alarm reads them from Redis. The logic in this module is business-specific, so each company can rewrite it to suit its own needs. |
nodata | 1. Detects reporting anomalies in monitoring data. nodata works together with the real-time alarm module judge. The process is: a nodata configuration is set up for a collection item; when data collection times out, nodata generates default mock data; the user configures a matching alarm policy; an alarm is raised when the mock data is received. Detecting reporting anomalies for collection items is a necessary complement to judge and makes its real-time alarming more reliable and complete. | |
aggregator | 1. Cluster aggregation module: aggregates the values of a metric across all machines in a cluster, providing monitoring from a cluster perspective. | |
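The nodata flow in the table (collection timeout, default mock data, alarm policy on the mock value) can be sketched in a few lines of Python. Everything here is illustrative: the function names, the counter naming, the 180-second timeout, and the mock value of -1 are hypothetical choices, not the nodata module's actual implementation or defaults.

```python
def mock_missing(last_seen, now, timeout, mock_value=-1.0):
    """Return a mock point for every counter silent for more than `timeout` seconds.

    `last_seen` maps a counter name to the timestamp of its last report.
    """
    mocks = []
    for counter, ts in sorted(last_seen.items()):
        if now - ts > timeout:
            mocks.append({"counter": counter, "timestamp": now, "value": mock_value})
    return mocks


def should_alarm(point, threshold=0.0):
    """Stand-in for a judge policy: alarm when the value falls below the threshold."""
    return point["value"] < threshold


# Hypothetical state: host02 stopped reporting 300 seconds ago.
last_seen = {"host01/agent.alive": 940, "host02/agent.alive": 700}
mocks = mock_missing(last_seen, now=1000, timeout=180)
alarms = [m["counter"] for m in mocks if should_alarm(m)]
```

Choosing a mock value that a healthy counter never reports lets an ordinary threshold policy double as a "no data" alarm, which is why nodata can reuse judge instead of needing its own alerting path.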
3) Installation
Environment configuration

Install Redis: either the source code or the RPM package works, depending on your situation.

Install MySQL:
1. Either the source code or the RPM package works, depending on your situation.
2. Initialize the MySQL table structure by importing each schema file:
   mysql -uroot -proot < 1_uic-db-schema.sql
   mysql -uroot -proot < 2_portal-db-schema.sql
   mysql -uroot -proot < 3_dashboard-db-schema.sql
   mysql -uroot -proot < 4_graph-db-schema.sql
   mysql -uroot -proot < 5_alarms-db-schema.sql
   mysql -uroot -proot < links-db-schema.sql

Back-end configuration

Download the official source package:
1. Download open-falcon-v0.2.1.tar.gz, open-falcon-v0.2.0.tar.gz.
2. Unzip to /home/work/open-falcon.
3. Modify the configuration file; typically you need to specify the database password.
4. Start Open-Falcon:
   ./open-falcon start (start all services)
   ./open-falcon check (check service startup status)

Front-end configuration

Download the source package:
1. git clone
2. Install the dependency packages: python-devel, openldap-devel, mysql-devel, virtualenv (download the tar package from the Python website), and yum groupinstall "Development tools".
3. Install the front-end Flask modules listed in pip_requirements.txt:
   cat pip_requirements.txt
   flask==0.10.1
   flask-babel==0.9
   jinja2==2.7.2
   werkzeug==0.9.4
   gunicorn==19.1.1
   python-dateutil==2.2
   requests==2.3.0
   mysql-python
   python-ldap
4. Modify the rrd/config.py configuration file:
   ## Adjust PORTAL_DB_* to your environment: the default user name is root, the default password is "", and set the database host.
   ## Adjust ALARM_DB_* to your environment: the default user name is root, the default password is "", and set the database host.
   ## API_ADDR = os.environ.get("API_ADDR", "http://10.59.2.133:8080/api/v1"): change the address to the Open-Falcon server IP.
5. Start the dashboard:
   ./control start
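The dashboard settings above follow an "environment variable with a default" pattern, so they can be overridden without editing the file. The Python sketch below illustrates that pattern under stated assumptions: the individual key names (PORTAL_DB_HOST, PORTAL_DB_USER, and so on) and the 127.0.0.1 host default are guesses modeled on the portal and alarm database settings; only the root user, the empty password, and the API address value come from the steps above.

```python
import os


def load_dashboard_config(env=None):
    """Read dashboard settings from the environment, falling back to defaults.

    The key names are assumed for illustration; compare them with the
    names actually present in your rrd/config.py.
    """
    env = os.environ if env is None else env
    return {
        "PORTAL_DB_HOST": env.get("PORTAL_DB_HOST", "127.0.0.1"),
        "PORTAL_DB_USER": env.get("PORTAL_DB_USER", "root"),
        "PORTAL_DB_PASS": env.get("PORTAL_DB_PASS", ""),
        "ALARM_DB_HOST": env.get("ALARM_DB_HOST", "127.0.0.1"),
        "ALARM_DB_USER": env.get("ALARM_DB_USER", "root"),
        "ALARM_DB_PASS": env.get("ALARM_DB_PASS", ""),
        # Point this at the Open-Falcon server's API address.
        "API_ADDR": env.get("API_ADDR", "http://10.59.2.133:8080/api/v1"),
    }


defaults = load_dashboard_config({})
overridden = load_dashboard_config({"PORTAL_DB_PASS": "s3cret"})
```

Exporting the variables before starting the dashboard keeps passwords out of the checked-out source tree, at the cost of having to set them in whatever launches the service.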
4) Monitored client host
1. Unzip open-falcon-v0.2.0.tar.gz.
2. Modify the agent configuration file:
   "heartbeat": {
       "enabled": true,
       "addr": "10.59.2.133:6030",      # server IP
       "interval": ...,
       "timeout": ...
   },
   "transfer": {
       "enabled": true,
       "addrs": ["10.59.2.133:8433"],   # server IP
       "interval": ...,
       "timeout": 1000
   }
3. Start the agent service:
   ./open-falcon start agent
   Check agent.log to confirm that the agent is running normally.
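When many client hosts need the same change, the agent configuration edit can be scripted. The following Python sketch rewrites the heartbeat and transfer addresses in a configuration dictionary using the ports shown above (6030 and 8433). The function name and the file path in the usage comment are hypothetical, and the sketch assumes the configuration is plain JSON.

```python
import copy
import json


def point_agent_at_server(cfg, server_ip, hbs_port=6030, transfer_port=8433):
    """Return a copy of the agent config that reports to `server_ip`.

    Only the enabled flags and addresses are touched; other keys such as
    interval and timeout keep whatever values the file already has.
    """
    patched = copy.deepcopy(cfg)
    patched.setdefault("heartbeat", {}).update(
        enabled=True, addr=f"{server_ip}:{hbs_port}"
    )
    patched.setdefault("transfer", {}).update(
        enabled=True, addrs=[f"{server_ip}:{transfer_port}"]
    )
    return patched


# Usage (the path is an assumption; use your agent's real config file):
# with open("agent/config/cfg.json") as f:
#     cfg = json.load(f)
# with open("agent/config/cfg.json", "w") as f:
#     json.dump(point_agent_at_server(cfg, "10.59.2.133"), f, indent=4)
```

Working on a copy leaves the loaded configuration untouched, so the change can be inspected or diffed before it is written back.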
"Xiaomi Open Source monitoring System" Open-falcon