Objective
This article does not cover every step of the implementation, but to a certain extent it can broaden your perspective: by walking through each link in the chain, it shows you a different way of doing things.
Business Scale
- 8 platforms
- 100+ platform servers
- Multiple cluster groups
- 600+ microservices
- n+ users
Problems Faced
With the rise of distributed microservices and container technology, traditional monitoring systems face many problems:
- How to monitor containers
- How to monitor microservices
- How to analyze and calculate cluster performance
- How to manage the large number of agent-side configuration scripts
These are the thorny problems traditional monitoring has to face. So how do we solve them? This is where GPE comes in, and it is the focus of the analysis below.
System Monitoring
- Monitoring targets: system logs, servers, containers, and the runtime metrics of system software
- Log architecture: ELK (Elasticsearch + Logstash + Kibana + Redis)
- Monitoring architecture: GPE (Grafana + Prometheus + Exporter + Consul)
- Alerting: mail, SMS, DingTalk, and custom webhooks (see the sketch below), with the monitoring center staffed 7x24
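For the custom-webhook channel, a minimal Python sketch might look like the following; it assumes a DingTalk group robot, and the access token is a placeholder you would supply yourself:

```python
import requests

# Hypothetical DingTalk robot webhook; replace the access_token with your own.
DINGTALK_WEBHOOK = "https://oapi.dingtalk.com/robot/send?access_token=<your-token>"

def send_alert(message: str) -> None:
    """Push a plain-text alert into the on-call DingTalk group."""
    payload = {"msgtype": "text", "text": {"content": f"[ALERT] {message}"}}
    resp = requests.post(DINGTALK_WEBHOOK, json=payload, timeout=5)
    resp.raise_for_status()

if __name__ == "__main__":
    send_alert("order-service: ERROR rate exceeded threshold in the last 5 minutes")
```

The same function shape works for any other webhook receiver; only the URL and payload format change.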
ELK Logging
With the prevalence of distributed services and the ever finer splitting of functional modules, the importance of logs is self-evident for both development and operations. But when it comes to storing, analyzing, and viewing logs, 100 companies may have 200 practices: some rarely log at all, some never distinguish log levels, some write to text files and never look back, and some dump everything into a MySQL database and only dig through it after a user complains or a problem surfaces.
So how do you record logs properly and gracefully? I believe most of you are familiar with ELK, and many may already have worked with it. For small and medium-sized internet startups, building a log analysis system on ELK is indeed a good choice.
Architecture diagram
Core components
ELK consists of the "Three Musketeers" Elasticsearch, Logstash, and Kibana. These are just the basic components; to flesh out the architecture, we added Redis as a buffer queue and configured SendMail to send alerts on abnormal logs.
Elasticsearch
Elasticsearch is a Lucene-based search server. It provides a distributed, multi-user-capable full-text search engine behind a RESTful web interface. Its features include distributed operation, zero configuration, automatic discovery, automatic index sharding, index replication, and a RESTful-style interface.
Logstash
Logstash is a data collection and analysis tool that can gather, parse, and store the logs generated by a system. In 2013 Logstash was acquired by the Elasticsearch company, and "ELK Stack" became the official name for the combination.
Kibana
Kibana is an open-source analytics and visualization platform for searching and viewing data stored in Elasticsearch indices.
Workflow
- Logstash (shipper) monitors and filters, in real time, the log information collected from each service
- Logstash (shipper) sends the collected logs (INFO, DEBUG, ERROR, WARN, etc.) to Redis
- Logstash (indexer) reads the log information from Redis and sends it to Elasticsearch, classified by log type (see the sketch after this list)
- Logstash (indexer) filters out ERROR logs and alerts development and operations staff by mail or another webhook channel
- Kibana reads the Elasticsearch data and presents it on pages with custom searches
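As a rough illustration of the shipper/indexer split around the Redis buffer, here is a minimal Python sketch using the redis and elasticsearch client libraries (8.x-style index call); the hostnames, queue key, and index naming are assumptions, and a real deployment would of course use Logstash pipelines rather than hand-rolled code:

```python
import json

import redis
from elasticsearch import Elasticsearch

# Hypothetical hosts; adjust to your environment.
r = redis.Redis(host="redis.internal", port=6379, db=0)
es = Elasticsearch("http://es.internal:9200")

QUEUE_KEY = "logstash:queue"   # shippers LPUSH structured log events onto this list

def ship(event: dict) -> None:
    """Shipper side: push one structured log event into the Redis buffer."""
    r.lpush(QUEUE_KEY, json.dumps(event))

def index_forever() -> None:
    """Indexer side: block on the Redis queue and write events to Elasticsearch by level."""
    while True:
        _, raw = r.brpop(QUEUE_KEY)          # blocks until an event arrives
        event = json.loads(raw)
        index = f"logs-{event.get('level', 'info').lower()}"
        es.index(index=index, document=event)
```

The Redis list acts purely as a buffer: shippers stay fast even when Elasticsearch is busy, and the indexer drains the queue at its own pace.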
GPE Monitoring
ELK mainly collects, analyzes, and raises early warnings on the business logs of the services in our platform, which are generally gathered and written to text files by logging components (Log4j, Log4j2, Logback). But for monitoring and alerting on the system itself and on application software, this stack is clearly not suitable. Here I recommend the GPE "Three Musketeers"; of course, GPE is a combination I put together myself.
Architecture diagram
Core components
Grafana, Prometheus, and Exporters (a family of plugins) form this custom "Three Musketeers". Of course, to make the integrated monitoring solution smoother and more complete, we added the Consul registry for service discovery, which lets services be added dynamically, and we use mail, DingTalk, and webhooks for abnormal alerts.
The GPE combination is only one possible implementation: Grafana together with Telegraf (from InfluxData) can also collect a great many metrics and achieve richer dashboard monitoring and alerting.
Grafana
Grafana is an out-of-the-box visualization tool with a full-featured metrics panel and graph editor. It offers flexible and rich graphing options, can mix multiple styles on one dashboard, and supports multiple data sources.
Prometheus
Prometheus is an open-source service monitoring system that collects data from remote machines over HTTP and stores it in a local time-series database. Its main features:
- A multi-dimensional data model (a time series is identified by a metric name and a set of key/value labels)
- A flexible query language over these dimensions (PromQL)
- No reliance on distributed storage; single server nodes are autonomous
- Time-series data collected via a pull model over HTTP
- Time-series data can also be pushed through the Pushgateway (see the sketch after this list)
- Scrape targets discovered via service discovery or static configuration
- Multiple modes of graphing and dashboard support
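For the Pushgateway path mentioned above, a minimal sketch with the official prometheus_client Python library might look like this; the gateway address and job name are assumptions:

```python
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

# Hypothetical metric for a short-lived batch job that Prometheus cannot scrape directly.
registry = CollectorRegistry()
duration = Gauge("batch_job_duration_seconds",
                 "Duration of the nightly batch job in seconds",
                 registry=registry)
duration.set(42.3)

# Prometheus later scrapes the Pushgateway instead of the short-lived job itself.
push_to_gateway("pushgateway.internal:9091", job="nightly_batch", registry=registry)
```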
As shown in the architecture diagram, Prometheus collects monitoring data from the exporters installed on the remote machines.
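An exporter is simply an HTTP endpoint exposing /metrics for Prometheus to pull. A minimal custom exporter sketch with prometheus_client, assuming a made-up application metric and port 9100, could look like this:

```python
import time

from prometheus_client import Gauge, start_http_server

# Hypothetical application metric; real hosts would typically run node_exporter instead.
QUEUE_DEPTH = Gauge("app_queue_depth", "Current depth of the internal work queue")

def read_queue_depth() -> int:
    """Placeholder for however the application measures its queue."""
    return 7

if __name__ == "__main__":
    start_http_server(9100)           # exposes /metrics for Prometheus to pull
    while True:
        QUEUE_DEPTH.set(read_queue_depth())
        time.sleep(15)
```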
Consul
Consul has multiple components, but as a whole it is a tool for discovering and configuring services in your infrastructure. It provides several key features:
- Service discovery: some Consul clients provide a service, such as an API or MySQL, and other clients use Consul to discover the providers of that service. Using DNS or HTTP, applications can easily find the services they depend on.
- Health checks: the Consul client can run health checks associated either with a given service (e.g., does the service return a 200 OK) or with the local node (e.g., is memory utilization below 90%). Operators can use this information to monitor cluster health, and the service discovery component uses it to route traffic away from unhealthy hosts.
- Key/value storage: applications can use Consul's hierarchical key/value store for many purposes, including dynamic configuration, feature flagging, coordination, leader election, and more. A simple HTTP API makes this component easy to use.
- Multi-datacenter: Consul has excellent support for multiple datacenters, which means Consul users do not have to build extra layers of abstraction to grow into multiple regions.
Consul is designed to be friendly to both the DevOps community and application developers, and it is well suited to modern, scalable infrastructure.
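In GPE, each exporter (or a small sidecar script) registers itself with the local Consul agent so Prometheus can discover it. A minimal sketch against Consul's HTTP API is shown below; the agent address, service name, and IP are assumptions:

```python
import requests

# Hypothetical Consul agent address and service details.
CONSUL_AGENT = "http://consul.internal:8500"

service = {
    "Name": "node-exporter",
    "ID": "node-exporter-web01",
    "Address": "10.0.0.11",
    "Port": 9100,
    "Check": {
        # Consul marks the service unhealthy if /metrics stops answering.
        "HTTP": "http://10.0.0.11:9100/metrics",
        "Interval": "15s",
        "Timeout": "5s",
    },
}

resp = requests.put(f"{CONSUL_AGENT}/v1/agent/service/register", json=service, timeout=5)
resp.raise_for_status()
```

Prometheus can then pick up the registered target through its Consul service-discovery configuration, so new machines show up in monitoring without editing Prometheus itself.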
Workflow
- Exporter components register themselves with the Consul registry
- Prometheus pulls the list of targets from the Consul registry
- The exporter components collect metrics from the servers or system software
- Grafana is configured with Prometheus as a data source and combines the collected data with custom panels to build the monitoring dashboard
- Grafana raises monitoring alerts through its alerting settings (a query sketch follows this list)
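Grafana talks to Prometheus through the same HTTP query API that Prometheus exposes to everyone else. As a rough illustration (the Prometheus address is an assumption), the snippet below asks which scrape targets are currently down:

```python
import requests

# Hypothetical Prometheus address; "up" is a built-in metric that is 1 for healthy targets.
PROMETHEUS = "http://prometheus.internal:9090"

resp = requests.get(f"{PROMETHEUS}/api/v1/query", params={"query": "up == 0"}, timeout=5)
resp.raise_for_status()

for result in resp.json()["data"]["result"]:
    # Each result names an instance whose exporter is currently unreachable.
    print("DOWN:", result["metric"].get("instance"))
```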
Summary
As mentioned at the beginning, this article is not a step-by-step installation or usage tutorial; such tutorials are easy to find online, and even if you hit a pit or two, I believe that as a programmer you can dig yourself out. This is just a pointer, and I hope more of you can take something away from it.
I remember a spring many years ago, when websites were still static pages with no images, no dazzling effects, and no 24-hour service, yet programmers were so happy. With nothing but the web page "Three Musketeers", we swayed through our youth on the internet, at our fingertips, on the BBS. If one day I grow old with nothing to depend on, please leave me in the waves of the internet.
Nowadays, with cloud computing, distributed systems, and microservices everywhere, programmers, are you tired of your own CRUD? Whether or not you still bother arguing with product manager Wang, come back to the top of this article and read it again. Who says programmers do nothing but pound out code all day? It is time to find another sky of your own.
Disclaimer: some of the text above is adapted from material found online.