I. Introduction to Logstash
Logstash is an open source data collection engine with real-time pipelining capabilities. It can dynamically unify data from disparate sources and normalize it to destinations of your choice.
II. The Logstash processing flow
The Logstash processing flow can be broadly divided into three stages, input ----> filter ----> output, that is: data collection ----> analysis/parsing ----> data output. Some of the functions and concepts involved are discussed in more detail later:
Here we explain the concept of the codec. Codec is a concept introduced in Logstash 1.3.0 (the word comes from the initials of coder/decoder). Before that, Logstash only supported plain text input, which was then processed with filters. Now we can handle different types of data during the input stage, all thanks to the codec setting. So a misconception needs to be corrected here: Logstash is not just an input | filter | output data flow, but an input | decode | filter | encode | output data flow. The codec is used to decode and encode events.
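For example, here is a minimal sketch of where the codec sits in that flow, assuming JSON arrives on standard input (the plugin choices are only illustrative):
input {
  stdin {
    codec => json        # decode each incoming line as JSON during the input stage
  }
}
output {
  stdout {
    codec => rubydebug   # encode the event in a readable debug format on the way out
  }
}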
The Logstash event also needs to be introduced: it is similar to an object in Java, and during processing we can assign values to its fields, and so on.
III. Introduction to the Logstash architecture
Having described the Logstash data flow above, we now introduce the Logstash architecture; here I cover the 6.x architecture.
From the architecture we can see that there can be multiple inputs, and the queue distributes data to different pipelines. A pipeline can be understood as a worker thread in the program; there can be multiple pipelines, and each pipeline is independent of the others. Each pipeline is composed of a batcher, filters, and outputs. The batcher fetches data from the queue in batches, and the batch size is configurable.
The pipeline configuration is as follows:
pipeline.workers: 8 — the number of pipeline worker threads, i.e. the threads that run the filter and output stages; the default is the number of CPU cores; the command-line flag is -w
pipeline.batch.size: 125 — the number of documents the batcher collects per batch; the default is 125 (for output to ES a batch of roughly 10-20 MB is recommended, from which you can work out a document count). It can be adjusted according to the output; larger batches occupy more heap space, which can be tuned in jvm.options. The command-line flag is -b
pipeline.batch.delay: 5 — how long the batcher waits before flushing an incomplete batch, in milliseconds; the command-line flag is -u
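A minimal logstash.yml sketch with the three settings above (the values are illustrative, not recommendations):
pipeline.workers: 8        # filter/output worker threads (-w)
pipeline.batch.size: 125   # documents per batch handed to the workers (-b)
pipeline.batch.delay: 5    # milliseconds to wait before flushing an incomplete batch (-u)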
Next let's look at the design ideas behind the queue. After introducing a queue, the first thing we need to ensure is that data is consumed properly: the output sends an ACK to the queue to tell it that these Logstash events have been processed, which guarantees that each piece of data is consumed properly. This is a common technique with message queues. Now let's talk about the queue categories:
1.In memory (in RAM)
Held in memory with a fixed size; it cannot handle a process crash, machine downtime, and so on, which can result in data loss.
2. Persistent queue on disk (persisted to disk)
Can survive a process crash and ensures data is not lost; guarantees that data is consumed at least once; acts as a buffer and can take the place of Kafka and other message queues in that role.
Let's take a look at how the persistent queue provides this guarantee, following the data from the moment it reaches the queue: first the queue backs the data up to disk, then the queue returns a response to the input; after the data has finally been output, an ACK is sent back to the queue, and when the queue receives that message it starts deleting the data backed up on disk. This is how data persistence is guaranteed.
In terms of performance, the persistent queue is basically no different from the in-memory queue:
The queue is primarily configured as follows:
queue.type: persisted (the default is memory)
queue.max_bytes: 4gb (the default is 1gb)
IV. Introduction to the Logstash configuration
The configuration files we will use are under the config directory: logstash.yml and jvm.options. In addition, versions 6.0 and above have pipelines.yml, which exists to run multiple pipelines in the same process; see the official documentation for details. Next we mainly introduce the following parameters of the logstash.yml configuration:
node.name: the node name; the default is the host name
path.data: the folder where persistent data is stored; by default it is under the Logstash home directory
path.config: the directory containing the pipeline configuration files
path.logs: the log directory
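Putting these together, a logstash.yml sketch might look like the following (the paths are illustrative):
node.name: logstash-node-1
path.data: /var/lib/logstash
path.config: /etc/logstash/conf.d
path.logs: /var/log/logstash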
The queue and pipeline settings described in section III can also be set in this file; we will not go over them again here. For other detailed configuration parameters, refer to the official documentation.
In jvm.options we can tune the JVM parameters according to our own machine's situation.
There are also the pipeline configuration files, which define the data processing flow and end with .conf.
V. Pipeline configuration
The pipeline configuration supports the following data types:
Boolean: isFailed => true
Number: age => 33
String: name => "Hello"
Array: users => [{age => 11, name => wtz}, {age => ..., name => myt}] or path => ["/var/log/error.log", "/var/log/warn.log"]
Hash: match => { "field1" => "value1" "field2" => "value2" }
Comments: everything after # on a line is a comment
The properties (fields) of a Logstash event can be referenced in the configuration, mainly in the following two ways (see the sketch after this list):
1. Direct field reference – use the [] syntax; nested fields take multiple layers of brackets
2. Field reference inside a string – the sprintf style, implemented with %{}
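A small sketch of both reference styles, assuming an event that carries a nested [response][status] field (the field names are illustrative):
filter {
  mutate {
    # 1. direct reference: [] syntax, nested fields use multiple brackets
    copy => { "[response][status]" => "status_code" }
    # 2. sprintf-style reference: %{} inside a string value
    add_field => { "summary" => "request returned %{[response][status]}" }
  }
}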
The configuration file also supports conditional judgment (if/else) syntax:
The supported expression operators are as follows (a combined example follows the list):
1. Comparison: == != < > <= >=
2. Regular expression match: =~ !~
3. Inclusion (string or array): in, not in
4. Boolean operators: and, or, nand, xor, !
5. Grouping operator (for very complex conditions; effectively just parentheses): ()
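A conditional sketch combining several of these operators (the field names are illustrative):
filter {
  if [status] >= 500 or [message] =~ /timeout/ {
    mutate { add_tag => ["server_error"] }   # comparison plus regex match, joined with "or"
  } else if [loglevel] in ["debug", "trace"] {
    drop { }                                 # "in" tests membership in an array
  }
}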
VI. Plugins in detail
In section II we introduced the Logstash data processing flow, which involves a number of plugins. Next we introduce plugin configuration, divided into the following parts:
Input Plugin Description:
1.stdin
The simplest input; reads data from standard input, as in the sketch below.
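The smallest possible pipeline, reading from standard input and printing events to standard output:
input {
  stdin { }                        # one event per line typed on stdin
}
output {
  stdout { codec => rubydebug }    # print each event for inspection
}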
2.file
Reads data from files; a sketch of the common parameters follows below:
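A minimal sketch (the path is illustrative; start_position controls where reading begins when a file is first discovered):
input {
  file {
    path => ["/var/log/nginx/access.log"]   # file(s) to watch; glob patterns are allowed
    start_position => "beginning"           # read from the start on first discovery (the default is "end")
  }
}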
3.kafka
This is the configuration we use, for your reference; you can also refer to the official documentation:
input {
  kafka {
    bootstrap_servers => "service address"
    group_id => "consumer group ID"
    topics => ["list of subscribed topics"]
    codec => "json"
    auto_offset_reset => "earliest"   # if the saved Kafka offset is out of range, start from the earliest offset
    consumer_threads => 1             # number of consumer threads
  }
}
In addition, you can look at the official documentation for other input plugins and choose the ones appropriate to your own use;
Filter Plugin Introduction
1.grok
Parses and structures arbitrary text. Grok is currently the best way in Logstash to parse unstructured log data into something structured and queryable, using roughly 120 built-in patterns;
You can also read the article "Do you really understand grok?"
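For example, a grok sketch that parses a simple access-log style line such as "55.3.244.1 GET /index.html 15824 0.043" into named fields (the sample format is illustrative):
filter {
  grok {
    # combine built-in patterns to turn the raw message into structured fields
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
  }
}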
2.mutate
Performs general transformations on event fields: you can rename, remove, replace, and modify fields in the event, as sketched below.
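A small mutate sketch showing a few of these transformations (the field names are illustrative):
filter {
  mutate {
    rename       => { "hostname" => "host_name" }   # rename a field
    replace      => { "env" => "production" }       # replace a field's value
    remove_field => ["temp_debug_info"]             # delete a field
  }
}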
3.drop
Completely drops an event, for example a debug event.
4.clone
Duplicates an event; fields may be added or removed from the copy.
5.geoip
Adds geographical location information derived from an IP address, for example:
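A geoip sketch, assuming the client IP is stored in a field called client_ip:
filter {
  geoip {
    source => "client_ip"   # field containing the IP address to look up
    target => "geo"         # where to store the resulting location information
  }
}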
For more filter plugins you can read the official documentation.
Output Plugin Introduction
The most common output is to ES, configured as follows:
output {
  elasticsearch {
    hosts => ["address"]
    index => "index name"
    document_type => "document type"
    template_overwrite => true   # whether to overwrite the existing index template
  }
}
Codec Plugin Introduction
1. plain: reads the content as-is, without any parsing
2. rubydebug: outputs the Logstash event in Ruby debug format, convenient for debugging
3. line: handles line-oriented content, one event per line
4. json: handles JSON-formatted content
5. multiline: merges multi-line data into a single event
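For example, a multiline sketch that folds indented continuation lines (such as Java stack traces) into the preceding event (the path and pattern are illustrative):
input {
  file {
    path => ["/var/log/app/app.log"]
    codec => multiline {
      pattern => "^\s"      # a line starting with whitespace...
      what => "previous"    # ...belongs to the previous line's event
    }
  }
}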
VII. Some final words
References and learning sources: the imooc course and the official documentation. You are welcome to join my QQ group: 438836709
You are also welcome to follow my WeChat public account:
I am also hard at work writing an OAuth 2.0 article and demo! If this was helpful, please give it a like. Thank you!