I. Introduction to Logstash
Logstash is an open source data collection engine with real-time pipelining capabilities. It can dynamically unify data from disparate sources and normalize it to destinations of your choice.
II. The Logstash processing flow
The Logstash processing flow can be broadly divided into three stages, input ----> filter ----> output, that is: data collection ----> analysis/parsing ----> data output. Some of the functions and concepts involved are discussed in more detail later:
Here we explain the concept of the codec. Codec is a concept introduced in Logstash 1.3.0 (the word comes from the initials of coder/decoder). Before that, Logstash only supported plain text input, which was then processed with filters. Now we can handle different types of data during the input stage, all thanks to the codec setting. So a misconception needs to be corrected here: Logstash is not just an input | filter | output data flow, but an input | decode | filter | encode | output data flow. The codec is used to decode and encode events.
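For example, here is a minimal sketch of where the codec sits in that flow, assuming JSON arrives on standard input (the plugin choices are only illustrative):
input {
  stdin {
    codec => json        # decode each incoming line as JSON during the input stage
  }
}
output {
  stdout {
    codec => rubydebug   # encode the event in a readable debug format on the way out
  }
}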
The Logstash event also needs to be introduced: it is similar to an object in Java, and during processing we can assign values to its fields, and so on.
III. Introduction to the Logstash architecture
Having described the Logstash data flow above, we now introduce the Logstash architecture; here I cover the 6.x architecture.
From the architecture we can see that there can be multiple inputs, and the queue distributes data to different pipelines. A pipeline can be understood as a worker thread in the program; there can be multiple pipelines, and each pipeline is independent of the others. Each pipeline is composed of a batcher, filters, and outputs. The batcher fetches data from the queue in batches, and the batch size is configurable.
The pipeline configuration is as follows:
pipeline.workers: 8 — the number of pipeline worker threads, i.e. the threads that run the filter and output stages; the default is the number of CPU cores; the command-line flag is -w
pipeline.batch.size: 125 — the number of documents the batcher collects per batch; the default is 125 (for output to ES a batch of roughly 10-20 MB is recommended, from which you can work out a document count). It can be adjusted according to the output; larger batches occupy more heap space, which can be tuned in jvm.options. The command-line flag is -b
pipeline.batch.delay: 5 — how long the batcher waits before flushing an incomplete batch, in milliseconds; the command-line flag is -u
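A minimal logstash.yml sketch with the three settings above (the values are illustrative, not recommendations):
pipeline.workers: 8        # filter/output worker threads (-w)
pipeline.batch.size: 125   # documents per batch handed to the workers (-b)
pipeline.batch.delay: 5    # milliseconds to wait before flushing an incomplete batch (-u)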
Next let's look at the design ideas behind the queue. After introducing a queue, the first thing we need to ensure is that data is consumed properly: the output sends an ACK to the queue to tell it that these Logstash events have been processed, which guarantees that each piece of data is consumed properly. This is a common technique with message queues. Now let's talk about the queue categories:
1.In memory (in RAM)
Held in memory with a fixed size; it cannot handle a process crash, machine downtime, and so on, which can result in data loss.
2. Persistent queue on disk (persisted to disk)
Can survive a process crash and ensures data is not lost; guarantees that data is consumed at least once; acts as a buffer and can take the place of Kafka and other message queues in that role.
Let's take a look at how the persistent queue provides this guarantee, following the data from the moment it reaches the queue: first the queue backs the data up to disk, then the queue returns a response to the input; after the data has finally been output, an ACK is sent back to the queue, and when the queue receives that message it starts deleting the data backed up on disk. This is how data persistence is guaranteed.
In terms of performance, the persistent queue is basically no different from the in-memory queue:
The queue is primarily configured as follows:
queue.type: persisted (the default is memory)
queue.max_bytes: 4gb (the default is 1gb)
IV. Introduction to the Logstash configuration
The configuration files we will use are under the config directory: logstash.yml and jvm.options. In addition, versions 6.0 and above have pipelines.yml, which exists to run multiple pipelines in the same process; see the official documentation for details. Next we mainly introduce the following parameters of the logstash.yml configuration:
node.name: the node name; the default is the host name
path.data: the folder where persistent data is stored; by default it is under the Logstash home directory
path.config: the directory containing the pipeline configuration files
path.logs: the log directory
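Putting these together, a logstash.yml sketch might look like the following (the paths are illustrative):
node.name: logstash-node-1
path.data: /var/lib/logstash
path.config: /etc/logstash/conf.d
path.logs: /var/log/logstash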
The queue and pipeline settings described in section III can also be set in this file; we will not go over them again here. For other detailed configuration parameters, refer to the official documentation.
In jvm.options we can tune the JVM parameters according to our own machine's situation.
There are also the pipeline configuration files, which define the data processing flow and end with .conf.
V. Pipeline configuration
The pipeline configuration supports the following data types:
Boolean: isFailed => true
Number: age => 33
String: name => "Hello"
Array: users => [{age => 11, name => wtz}, {age => ..., name => myt}] or path => ["/var/log/error.log", "/var/log/warn.log"]
Hash: match => { "field1" => "value1" "field2" => "value2" }
Comments: everything after # on a line is a comment
The properties (fields) of a Logstash event can be referenced in the configuration, mainly in the following two ways (see the sketch after this list):
1. Direct field reference – use the [] syntax; nested fields take multiple layers of brackets
2. Field reference inside a string – the sprintf style, implemented with %{}
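A small sketch of both reference styles, assuming an event that carries a nested [response][status] field (the field names are illustrative):
filter {
  mutate {
    # 1. direct reference: [] syntax, nested fields use multiple brackets
    copy => { "[response][status]" => "status_code" }
    # 2. sprintf-style reference: %{} inside a string value
    add_field => { "summary" => "request returned %{[response][status]}" }
  }
}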
The configuration file also supports conditional judgment (if/else) syntax:
The supported expression operators are as follows (a combined example follows the list):
1. Comparison: == != < > <= >=
2. Regular expression match: =~ !~
3. Inclusion (string or array): in, not in
4. Boolean operators: and, or, nand, xor, !
5. Grouping operator (for very complex conditions; effectively just parentheses): ()
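A conditional sketch combining several of these operators (the field names are illustrative):
filter {
  if [status] >= 500 or [message] =~ /timeout/ {
    mutate { add_tag => ["server_error"] }   # comparison plus regex match, joined with "or"
  } else if [loglevel] in ["debug", "trace"] {
    drop { }                                 # "in" tests membership in an array
  }
}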
VI. Plugins in detail
In section II we introduced the Logstash data processing flow, which involves a number of plugins. Next we introduce plugin configuration, divided into the following parts:
Input Plugin Description:
1.stdin
The simplest input; reads data from standard input, as in the sketch below.
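The smallest possible pipeline, reading from standard input and printing events to standard output:
input {
  stdin { }                        # one event per line typed on stdin
}
output {
  stdout { codec => rubydebug }    # print each event for inspection
}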
2.file
Reads data from files; a sketch of the common parameters follows below:
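A minimal sketch (the path is illustrative; start_position controls where reading begins when a file is first discovered):
input {
  file {
    path => ["/var/log/nginx/access.log"]   # file(s) to watch; glob patterns are allowed
    start_position => "beginning"           # read from the start on first discovery (the default is "end")
  }
}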
3.kafka
This is the configuration we use, for your reference; you can also refer to the official documentation:
input {
  kafka {
    bootstrap_servers => "service address"
    group_id => "consumer group ID"
    topics => ["list of subscribed topics"]
    codec => "json"
    auto_offset_reset => "earliest"   # if the saved Kafka offset is out of range, start from the earliest offset
    consumer_threads => 1             # number of consumer threads
  }
}
In addition, you can look at the official documentation for other input plugins and choose the ones appropriate to your own use;
Filter Plugin Introduction
1.grok
Parses and structures arbitrary text. Grok is currently the best way in Logstash to parse unstructured log data into something structured and queryable, using roughly 120 built-in patterns;
You can also read the article "Do you really understand grok?"
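For example, a grok sketch that parses a simple access-log style line such as "55.3.244.1 GET /index.html 15824 0.043" into named fields (the sample format is illustrative):
filter {
  grok {
    # combine built-in patterns to turn the raw message into structured fields
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
  }
}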
2.mutate
Performs general transformations on event fields: you can rename, remove, replace, and modify fields in the event, as sketched below.
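A small mutate sketch showing a few of these transformations (the field names are illustrative):
filter {
  mutate {
    rename       => { "hostname" => "host_name" }   # rename a field
    replace      => { "env" => "production" }       # replace a field's value
    remove_field => ["temp_debug_info"]             # delete a field
  }
}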
3.drop
Completely drops an event, for example a debug event.
4.clone
Duplicates an event; fields may be added or removed from the copy.
5.geoip
Adds geographical location information derived from an IP address, for example:
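A geoip sketch, assuming the client IP is stored in a field called client_ip:
filter {
  geoip {
    source => "client_ip"   # field containing the IP address to look up
    target => "geo"         # where to store the resulting location information
  }
}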
For more filter plugins you can read the official documentation.
Output Plugin Introduction
The most common output is to ES, configured as follows:
output {
  elasticsearch {
    hosts => ["address"]
    index => "index name"
    document_type => "document type"
    template_overwrite => true   # whether to overwrite the existing index template
  }
}
Codec Plugin Introduction
1. plain: reads the content as-is, without any parsing
2. rubydebug: outputs the Logstash event in Ruby debug format, convenient for debugging
3. line: handles line-oriented content, one event per line
4. json: handles JSON-formatted content
5. multiline: merges multi-line data into a single event
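For example, a multiline sketch that folds indented continuation lines (such as Java stack traces) into the preceding event (the path and pattern are illustrative):
input {
  file {
    path => ["/var/log/app/app.log"]
    codec => multiline {
      pattern => "^\s"      # a line starting with whitespace...
      what => "previous"    # ...belongs to the previous line's event
    }
  }
}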
VII. Some final words
References and learning sources: the imooc course and the official documentation. You are welcome to join my QQ group: 438836709
You are also welcome to follow my WeChat public account:
I am also hard at work writing an OAuth 2.0 article and demo! If this was helpful, please give it a like. Thank you!