I. Overview
ELK has become the most popular centralized logging solution. It consists mainly of Beats, Logstash, Elasticsearch, and Kibana, components that together provide a one-stop solution for real-time log collection, storage, and display. This article covers the common ELK deployment architectures and how to solve the problems that come with them.
Filebeat: a lightweight data-collection engine that consumes very few resources. A newer member of the ELK family, it can replace Logstash as the log-collection engine on the application server, and it supports sending collected data to queues such as Kafka and Redis.
Logstash: a data-collection engine that is heavier than Filebeat but integrates a large number of plug-ins, supports a rich set of data sources, and can filter, parse, and format the collected log data.
Elasticsearch: a distributed data search engine based on Apache Lucene. It supports clustering and provides centralized data storage and analysis along with powerful search and aggregation capabilities.
Kibana: a data-visualization platform that lets you view the data in Elasticsearch in real time and provides rich charting and statistics features.
II. Common ELK Deployment Architectures
2.1. Logstash as Log Collector
This is a relatively primitive deployment architecture: a Logstash instance is deployed on each application server as the log collector; Logstash filters, parses, and formats the collected data and sends it to Elasticsearch for storage, and Kibana is finally used for visualization. The weakness of this architecture is that Logstash consumes significant server resources, so it increases the load on the application servers.
2.2. Filebeat as Log Collector
This architecture differs from the first in that the log collector on the application side is replaced with Filebeat. Filebeat is lightweight and consumes few server resources, which makes it well suited as the log collector on the application server; Filebeat is generally used together with Logstash. This is also the most commonly used architecture today.
2.3. Deployment Architecture with a Cache Queue
Building on the second architecture, this one introduces a Kafka message queue (other message queues work too): the data collected by Filebeat is sent to Kafka, and Logstash then reads the data from Kafka. It mainly targets log collection under large data volumes. The cache queue primarily improves data safety and balances the load on Logstash and Elasticsearch.
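A minimal sketch of how the queue can be wired in, assuming a Kafka broker at kafka:9092 and a topic named app-logs (both hypothetical): Filebeat publishes to Kafka, and Logstash consumes from the same topic.

```
# filebeat.yml -- send collected logs to Kafka instead of directly to Logstash
output.kafka:
  hosts: ["kafka:9092"]   # hypothetical broker address
  topic: "app-logs"       # hypothetical topic name
```

```
# logstash.conf -- consume the same topic from Kafka
input {
  kafka {
    bootstrap_servers => "kafka:9092"
    topics => ["app-logs"]
  }
}
```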
2.4. Summary of the Three Architectures
The first deployment architecture is now rarely used because of its resource consumption, and the second is currently the most widely used. As for the third, I personally see no need to introduce a message queue unless there are other requirements, because under large data volumes Filebeat already uses a backpressure-sensitive protocol when sending data to Logstash or Elasticsearch: if Logstash is busy processing data, it tells Filebeat to slow down its read rate; once the congestion is resolved, Filebeat returns to its original speed and continues sending data.
III. Problems and Solutions
Question: How do I merge the multiple lines of a single log entry?
Application logs are usually printed in a specific format, and the data belonging to one log entry may span multiple lines, so when using ELK to collect logs, the multiple lines belonging to the same entry need to be merged.
Solution: use the multiline merge plug-in in Filebeat or Logstash.
When using the multiline plug-in, note that different ELK deployment architectures use it differently. With the first architecture in this article, multiline must be configured in Logstash; with the second, multiline should be configured in Filebeat, which removes the need to configure it in Logstash.
1. Configuring multiline in Filebeat:
pattern: the regular expression to match.
negate: defaults to false, meaning lines that match pattern are merged into the previous line; true means lines that do not match pattern are merged into the previous line.
match: after merges the lines to the end of the previous line; before merges them to the beginning of the following line.
For example:
pattern: '\['
negate: true
match: after
This configuration merges lines that do not match the pattern into the end of the previous line.
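Put together in filebeat.yml, a minimal sketch might look like this (the log path is hypothetical; the multiline options are the three described above):

```
filebeat.prospectors:            # "filebeat.inputs" in newer Filebeat versions
  - paths:
      - /var/log/app/*.log       # hypothetical log path
    multiline.pattern: '\['      # lines starting with "[" begin a new entry
    multiline.negate: true       # merge the lines that do NOT match the pattern
    multiline.match: after       # ...onto the end of the previous line
```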
2. Configuring multiline in Logstash:
(1) The what property configured in Logstash takes the value previous, which is equivalent to after in Filebeat, or next, which is equivalent to before in Filebeat.
(2) In the pattern "%{LOGLEVEL}\s*", LOGLEVEL is one of Logstash's prebuilt grok patterns; many common patterns are predefined. For details see: https://github.com/logstash-p ...
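For the first architecture, where Logstash reads the log files directly, a minimal sketch using the multiline codec might look like this (the log path is a hypothetical example):

```
input {
  file {
    path => "/var/log/app/*.log"    # hypothetical log path
    codec => multiline {
      pattern => "%{LOGLEVEL}\s*"   # a new entry starts with a log level
      negate => true                # lines NOT matching the pattern...
      what => "previous"            # ...are merged into the previous line
    }
  }
}
```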
Question: How do I replace the time field displayed in Kibana with the time from the log message?
By default, the time field we see in Kibana is inconsistent with the time in the log message, because the default time-field value is the current time at collection, so it needs to be replaced with the time parsed from the log message.
Solution: use the grok filter together with the date plug-in.
Configure the grok filter and the date plug-in in the filter section of the Logstash configuration file. For example:
To match a log entry in the format "DEBUG [DefaultBeanDefinitionDocumentReader:106] Loading bean definitions", the time field of the log can be parsed in either of two ways:
① By referencing an external pattern file, e.g. a file named customer_patterns whose content is:
CUSTOMER_TIME %{YEAR}%{MONTHNUM}%{MONTHDAY}\s+%{TIME}
Note: the content format is: [custom pattern name] [regular expression]
You can then reference the pattern file in Logstash (see the sketch below).
② As an inline configuration item, with the rule (?&lt;custom pattern name&gt;regex), for example: (?&lt;customer_time&gt;%{YEAR}%{MONTHNUM}%{MONTHDAY}\s+%{TIME})
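A minimal filter sketch combining both options with the date plug-in (the patterns directory and the yyyyMMdd HH:mm:ss time layout are assumptions based on the pattern above):

```
filter {
  grok {
    # Option ①: load CUSTOMER_TIME from the external pattern file
    patterns_dir => ["/etc/logstash/patterns"]   # hypothetical directory holding customer_patterns
    match => { "message" => "%{CUSTOMER_TIME:customer_time}" }
    # Option ②, the inline equivalent:
    # match => { "message" => "(?<customer_time>%{YEAR}%{MONTHNUM}%{MONTHDAY}\s+%{TIME})" }
  }
  date {
    match => ["customer_time", "yyyyMMdd HH:mm:ss"]   # assumed time layout
    target => "@timestamp"                            # overwrite Kibana's default time field
  }
}
```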
Question: How do I view the data of a specific system-log module in Kibana?
In general, the log data displayed in Kibana mixes data from different system modules, so how can I view or filter only the log data of a specified module?
Solution: add a field that identifies the system module, or build a separate ES index per system module.
1. Add a field that identifies the system module; in Kibana you can then filter each module's data on that field.
This is explained using the second deployment architecture. In Filebeat, logs from different system modules are identified by adding a log_from field, as in the sketch below.
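A minimal sketch of this in filebeat.yml (the module name and log path are hypothetical):

```
filebeat.prospectors:
  - paths:
      - /var/log/order-service/*.log   # hypothetical log path
    fields:
      log_from: order-service          # custom field identifying the module
    fields_under_root: true            # place log_from at the top level of the event
```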
2. Configure a corresponding ES index for each system module, then create the matching index patterns in Kibana; you can then select each module's data from the index-pattern drop-down box on the page.
This is explained using the second deployment architecture, in two steps:
① In Filebeat, identify each system module with document_type, as in the sketch below.
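A sketch, assuming Filebeat 5.x, where prospectors still support the document_type option (the module names and paths are hypothetical):

```
filebeat.prospectors:
  - paths:
      - /var/log/order-service/*.log   # hypothetical path
    document_type: order-service       # becomes the "type" field on each event
  - paths:
      - /var/log/user-service/*.log
    document_type: user-service
```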
② Modify the output configuration in Logstash: add an index property to output and use %{type} in it, so that a separate ES index is created per document_type value, as in the sketch below.
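A minimal sketch of the corresponding output section (the host address and index naming scheme are assumptions):

```
output {
  elasticsearch {
    hosts => ["localhost:9200"]         # hypothetical ES address
    index => "%{type}-%{+YYYY.MM.dd}"   # one index per document_type value, per day
  }
}
```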
IV. Summary
This article introduced the three deployment architectures of ELK for real-time log analysis and the problems each one solves; of the three, the second is the most popular and most commonly used. It then described some common problems and solutions when using ELK for log analysis. Finally, note that ELK can be used not only for centralized querying and management of distributed log data, but also for scenarios such as project monitoring and server-resource monitoring.