Logstash Quick Start
Introduction
Logstash is a tool for receiving, processing, and forwarding logs. It supports system logs, webserver logs, error logs, and application logs; in short, any kind of log you can throw at it. Sounds amazing, doesn't it?
In the typical use case (ELK), Elasticsearch serves as the backend data store and Kibana as the front end for reports, while Logstash acts as the porter in this process. Together they form a powerful pipeline for storing data, querying reports, and parsing logs. Logstash provides a wide range of input, filter, codec, and output plugins, so users can easily build powerful functionality. Okay, let's get started.
Dependency: Java. Logstash only requires a Java Runtime Environment (JRE). Run java -version on the command line; it should display something like the following:
java -version
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
We recommend a recent JRE version to ensure that Logstash runs successfully. You can get an open-source JRE from http://openjdk.java.net, or download the Oracle JDK from the official website: http://www.oracle.com/technetwork/java/index.html. Once the JRE is installed on your system, we can continue.
The first step is to download Logstash:
curl -O https://download.elasticsearch.org/logstash/logstash/logstash-1.4.2.tar.gz
Now you should have a file named logstash-1.4.2.tar.gz. Decompress it.
tar zxvf logstash-1.4.2.tar.gz
cd logstash-1.4.2
Now let's run:
bin/logstash -e 'input { stdin { } } output { stdout {} }'
Now we can enter some characters in the command line, and then we will see the output of logstash:
hello world
2013-11-21T01:22:14.405+0000 0.0.0.0 hello world
OK, quite interesting... In the preceding example, we defined an input named "stdin" and an output named "stdout"; no matter what characters we type, Logstash echoes them back in a structured format. Note that we used the -e parameter, which lets Logstash be configured directly from the command line. This is especially handy for quickly and repeatedly testing whether a configuration is correct, without writing a configuration file.
Let's try a more interesting example. First, press CTRL-C to exit the previously running Logstash, then run Logstash again with the following command:
bin/logstash -e 'input { stdin { } } output { stdout { codec => rubydebug } }'
Let's enter some more characters. This time we type "goodnight moon":
goodnight moon
{
       "message" => "goodnight moon",
    "@timestamp" => "2013-11-20T23:48:05.335Z",
      "@version" => "1",
          "host" => "my-laptop"
}
In the preceding example, we reconfigured the "stdout" output (adding a "codec" parameter) to change the way Logstash renders its output. Similarly, by adding or modifying inputs, outputs, and filters in your configuration, you can massage the log data into whatever shape you like and store it in a format that makes later queries easier.
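For instance, here is a minimal sketch along the same lines (the choice of the json codec here is just an illustration, not part of the original walkthrough) that prints each event as a single JSON line instead of the pretty-printed rubydebug hash:
bin/logstash -e 'input { stdin { } } output { stdout { codec => json } }'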
Now you can use Elasticsearch to store the logs. You might say: "This looks nice, but typing characters by hand and echoing them to the console isn't very practical." Fair enough, so next we will set up Elasticsearch to store the log data that is fed into Logstash. If you haven't installed Elasticsearch yet, you can download the RPM/DEB package, or fetch the tar package manually with the following commands:
curl -O https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.1.1.tar.gz
tar zxvf elasticsearch-1.1.1.tar.gz
cd elasticsearch-1.1.1/
./bin/elasticsearch
Note: this article uses Logstash 1.4.2 and Elasticsearch 1.1.1. Different Logstash versions recommend different Elasticsearch versions, so check which Logstash version you are running!
For more information about how to install and set Elasticsearch, refer to the Elasticsearch official website. Because we mainly introduce how to get started with Logstash, Elasticsearch's default installation and configuration have already met our requirements.
Elasticsearch is now running and listening on port 9200 (you got all that, right?). With a small change, Logstash can use Elasticsearch as its backend. The defaults are sufficient for both Logstash and Elasticsearch, so we omit the optional settings and simply declare elasticsearch as the output:
bin/logstash -e 'input { stdin { } } output { elasticsearch { host => localhost } }'
Type a few characters, and Logstash will process them as before (though this time you won't see any output, since stdout is no longer configured as an output):
you know, for logs
We can use the curl command to send a request to check whether ES has received the data:
curl 'http://localhost:9200/_search?pretty'
The returned content is as follows:
{ "took" : 2, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 1, "max_score" : 1.0, "hits" : [ { "_index" : "logstash-2013.11.21", "_type" : "logs", "_id" : "2ijaoKqARqGvbMgP3BspJA", "_score" : 1.0, "_source" : {"message":"you know, for logs","@timestamp":"2013-11-21T18:45:09.862Z","@version":"1","host":"my-laptop"} } ] }}
Congratulations, now you have successfully used Elasticsearch and Logstash to collect log data.
As a side note, another useful tool for inspecting your Logstash data (that is, the data in Elasticsearch) is the elasticsearch-kopf plugin. For more information, see the Elasticsearch plugins documentation. To install elasticsearch-kopf, just run the following command from the directory where Elasticsearch is installed:
bin/plugin -install lmenezes/elasticsearch-kopf
Then visit http://localhost:9200/_plugin/kopf to browse the data, settings, and mappings stored in Elasticsearch!
Multiple outputs
As a simple example, let's re-run Logstash with both stdout and elasticsearch configured as outputs:
bin/logstash -e 'input { stdin { } } output { elasticsearch { host => localhost } stdout { } }'
After entering a few phrases, the input is echoed back to your terminal and also saved in Elasticsearch! (You can verify this with curl or the kopf plugin.)
Default configuration: daily indices
You will find that Logstash is smart enough to create indices in Elasticsearch on its own. By default an index is created per day, named in the logstash-YYYY.MM.DD format, and at midnight (GMT) Logstash rolls over to a new index based on the timestamp. Knowing how far back you need to trace your data helps you decide how much of it to keep. Of course, you can also move older data elsewhere (re-index it) so it stays easy to query, or, if you simply want to delete data older than a certain period, use Elasticsearch Curator.
Next we will dig into more advanced configuration. The following sections cover some of the core features of Logstash and how to interact with the Logstash engine.
The life of an event
Inputs, outputs, codecs, and filters make up the core of the Logstash configuration. Logstash builds an event processing pipeline that extracts data from your logs and stores it in Elasticsearch, laying the foundation for efficient queries. To give you a quick feel for the many options Logstash offers, let's first go over the most commonly used configurations. For more information, see the Logstash event pipeline documentation.
Inputs
Inputs are the mechanism for passing log data into Logstash. Commonly used inputs include the following (a minimal example follows the list):
- File: reads from a file on the filesystem, much like the UNIX command "tail -0a"
- Syslog: listens on the well-known port 514 for syslog messages and parses them according to the RFC3164 standard
- Redis: reads from a redis server, supporting both redis channels (publish/subscribe) and redis lists. Redis is typically used as a "broker" in a centralized Logstash installation, queueing events until the Logstash consumers pick them up.
- Lumberjack: receives events over the lumberjack protocol, now used by logstash-forwarder.
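To make the syntax concrete, here is a minimal, hypothetical input sketch (the log path and TCP port below are illustrative assumptions, not taken from this article) showing a file input and a tcp input declared side by side:
input {
  file { path => "/var/log/myapp.log" }          # tail a log file, much like "tail -0a"
  tcp  { port => 5140 type => "network_logs" }   # accept events over a plain TCP socket
}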
Filters
Filters act as intermediate processing elements in the Logstash chain. They are often combined with conditionals to act on events that match particular criteria. Commonly used filters include the following (a small example follows the list):
- Grok: parses arbitrary text and structures it. Grok is currently the best way in Logstash to turn unstructured log data into something structured and queryable, with more than 120 built-in patterns to choose from.
- Mutate: performs general transformations on event fields. You can rename, remove, replace, and modify fields while an event is being processed.
- Drop: drops an event entirely so it is not processed further, for example debug events.
- Clone: makes a copy of an event, optionally adding or removing fields in the process.
- Geoip: adds geographic information about IP addresses (used by Kibana for map visualizations on the front end)
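As a small illustration of how filters combine with conditionals, here is a hypothetical sketch (the loglevel field and its "debug" value are assumptions made up for the example) that discards debug events and retags everything else:
filter {
  if [loglevel] == "debug" {
    drop { }                                # silently discard debug events
  }
  mutate {
    replace => { "type" => "app_event" }    # overwrite the type field on the remaining events
  }
}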
Outputs
Outputs are the final phase of the Logstash pipeline. An event can be sent to multiple outputs during processing, but once all outputs have run, the event has finished its life. Commonly used outputs include the following (a small example follows the list):
- Elasticsearch: if you plan to store your data efficiently and query it easily and simply... Elasticsearch is the way to go. Yes, we may be slightly biased here.
- File: Save event data to a file.
- Graphite: sends event data to Graphite, a popular open-source tool for storing and graphing metrics: http://graphite.wikidot.com/.
- Statsd: sends data to statsd, a service that aggregates statistics such as counters and timers sent over UDP and forwards the aggregates to one or more backend services. If you are already using statsd, this output will be useful to you.
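For illustration, here is a minimal output sketch (the file path is a hypothetical assumption) that sends every event both to Elasticsearch and to a flat file on disk:
output {
  elasticsearch { host => localhost }        # index events in the local Elasticsearch
  file { path => "/tmp/logstash_out.log" }   # also append each event to a file
}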
Codecs
Codecs are stream filters that can operate as part of an input or an output. Codecs let you easily separate the transport of your messages from the serialization process. Popular codecs include json, msgpack, and plain (text); two examples are listed below, followed by a small sketch.
- Json: encode/decode data in json format
- Multiline: merges multiple lines into a single event, for example Java exception and stack trace messages
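As an example, here is a hypothetical sketch of the multiline codec attached to a file input (the path and pattern are illustrative assumptions) that folds indented continuation lines, such as Java stack traces, into the preceding event:
input {
  file {
    path => "/var/log/myapp/app.log"
    codec => multiline {
      pattern => "^\s"       # a line starting with whitespace...
      what => "previous"     # ...belongs to the previous event
    }
  }
}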
For complete configuration information, see the "plugin configuration" section of the Logstash documentation.
Now for more fun with Logstash. Specifying the configuration on the command line with the -e parameter is very common, but it becomes unwieldy as the configuration grows. In that case, we create a simple configuration file and tell Logstash to use it. For example, create a file named "logstash-simple.conf" in the same directory as Logstash, with the following content:
input { stdin { } }
output {
  elasticsearch { host => localhost }
  stdout { codec => rubydebug }
}
Next, run the following command:
bin/logstash -f logstash-simple.conf
We can see Logstash running with the configuration file we just created, which is more convenient. Note that we used the -f parameter to read the configuration from a file, instead of the -e parameter used earlier to pass it on the command line. This was a very simple example; of course, we will go on to write more complex ones.
Filters
Filters are an in-line processing mechanism that massage the data into the shape you need. Let's look at an example, the grok filter. Create a configuration file named logstash-filter.conf with the following content:
input { stdin { } }
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
output {
  elasticsearch { host => localhost }
  stdout { codec => rubydebug }
}
Execute Logstash according to the following parameters:
bin/logstash -f logstash-filter.conf
Paste the following line of information to your terminal (of course, Logstash will process this standard input ):
127.0.0.1 - - [11/Dec/2013:00:01:45 -0800] "GET /xampp/status.php HTTP/1.1" 200 3891 "http://cadenza/xampp/navi.php" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0"
You will see feedback similar to the following:
{ "message" => "127.0.0.1 - - [11/Dec/2013:00:01:45 -0800] \"GET /xampp/status.php HTTP/1.1\" 200 3891 \"http://cadenza/xampp/navi.php\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0\"", "@timestamp" => "2013-12-11T08:01:45.000Z", "@version" => "1", "host" => "cadenza", "clientip" => "127.0.0.1", "ident" => "-", "auth" => "-", "timestamp" => "11/Dec/2013:00:01:45 -0800", "verb" => "GET", "request" => "/xampp/status.php", "httpversion" => "1.1", "response" => "200", "bytes" => "3891", "referrer" => "\"http://cadenza/xampp/navi.php\"", "agent" => "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0\""}
As you can see, Logstash (with the grok filter) was able to split a line of log data (in Apache "combined log" format) into separate fields. This is extremely useful later when parsing and querying our own log data; for example, reporting on HTTP response codes or IP addresses becomes trivial. Very few log formats are not covered by grok's patterns, so if you are trying to parse a common format, chances are someone has already done the work. For details on the matching rules, see logstash grok patterns.
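To make that benefit concrete, here is a hypothetical query sketch (the index wildcard and the field value are assumptions, not from this article) asking Elasticsearch for every event whose parsed response field equals 200:
curl 'http://localhost:9200/logstash-*/_search?q=response:200&pretty'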
The other filter used here is the date filter. It parses the timestamp found in the log line and uses it as the event's timestamp, regardless of when the data was actually collected by Logstash. You may have noticed that the @timestamp field in this example is set to December 11, 2013, even though Logstash processed the event some time after the log line was generated. In other words, the field is taken from the log itself rather than from the moment Logstash handled the event, which is handy when backfilling older logs.
Practical example: Apache logs (from a file)
Now let's configure something genuinely useful: Apache access logs! We will read the log file from the local machine and use conditionals to process only the events that match our needs. First, create a configuration file named logstash-apache.conf with the following content (adjust the file name and path as needed):
input { file { path => "/tmp/access_log" start_position => beginning }}filter { if [path] =~ "access" { mutate { replace => { "type" => "apache_access" } } grok { match => { "message" => "%{COMBINEDAPACHELOG}" } } } date { match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ] }}output { elasticsearch { host => localhost } stdout { codec => rubydebug }}
Next, create the input file (in this example "/tmp/access_log") and use the following log lines as its content (or use logs generated by your own webserver):
71.141.244.242 - kurt [18/May/2011:01:48:10 -0700] "GET /admin HTTP/1.1" 301 566 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3"
134.39.72.245 - - [18/May/2011:12:40:18 -0700] "GET /favicon.ico HTTP/1.1" 200 1189 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2; .NET4.0C; .NET4.0E)"
98.83.179.51 - - [18/May/2011:19:35:08 -0700] "GET /css/main.css HTTP/1.1" 200 1837 "http://www.safesand.com/information.htm" "Mozilla/5.0 (Windows NT 6.0; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1"
Use the -f parameter to run the example above:
bin/logstash -f logstash-apache.conf
You can see that the Apache log data has been imported into ES. Logstash reads and processes the file specified in the configuration; any content appended to the file is picked up and eventually stored in ES. As a bonus, the value of the type field in the data is replaced with "apache_access" (as specified in the configuration).
This configuration only has Logstash watching the Apache access_log, which in practice is rarely enough. You may also want to watch the error_log; all it takes is changing one line in the configuration above, as shown below:
input { file { path => "/tmp/*_log"...
Now Logstash is processing both the error log and the access log. However, if you inspect your data (with elasticsearch-kopf, for example), you will notice that the access_log entries are broken into discrete fields while the error_log entries are not. That is because our grok filter is configured to match only the COMBINEDAPACHELOG format, so only logs in that format are split into fields automatically. Wouldn't it be nice if we could parse each kind of log according to its own format? Right.
You may also notice that Logstash did not reprocess events it had already seen in the file. Logstash records its position in each file and only processes newly added lines. Neat!
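If you ever do want a file re-read from the top, that position tracking can be bypassed; the following is a hypothetical sketch (the sincedb_path trick is an assumption about your needs, not part of this article's example):
input {
  file {
    path => "/tmp/access_log"
    start_position => beginning
    sincedb_path => "/dev/null"   # do not persist the read position, so the file is re-read on every start
  }
}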
Conditionals
The previous example introduced the concept of conditionals, which should already be familiar from most programming languages: Logstash supports if, else if, and else statements. Let's use them to label each event with the kind of log file it came from (access_log, error_log, and any other file whose name ends in "log"):
input { file { path => "/tmp/*_log" }}filter { if [path] =~ "access" { mutate { replace => { type => "apache_access" } } grok { match => { "message" => "%{COMBINEDAPACHELOG}" } } date { match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ] } } else if [path] =~ "error" { mutate { replace => { type => "apache_error" } } } else { mutate { replace => { type => "random_logs" } } }}output { elasticsearch { host => localhost } stdout { codec => rubydebug }}
You have probably noticed that we use the "type" field to mark each event, but we never actually parse the "error" or "random" type logs... In practice there may be many kinds of error logs; we leave that as an exercise for the reader, and it depends on the logs you actually have.
Syslog
OK, now let's move on to a very practical example: syslog. Syslog is one of the most common use cases for Logstash, and one it handles very well (as long as the messages roughly conform to RFC3164). Syslog is the de facto UNIX network logging standard: clients send log messages either to a local file or to a central log server over the network. For this example you don't need to set up a working syslog instance; we will fake it from the command line so you can see what happens.
First, let's create a simple configuration file that implements Logstash + syslog, named logstash-syslog.conf:
input {
  tcp {
    port => 5000
    type => syslog
  }
  udp {
    port => 5000
    type => syslog
  }
}
filter {
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }
    syslog_pri { }
    date {
      match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}
output {
  elasticsearch { host => localhost }
  stdout { codec => rubydebug }
}
Run logstash:
bin/logstash -f logstash-syslog.conf
Normally a client would connect to port 5000 on the Logstash server and send its log data there. For this simple demonstration, we just telnet to the Logstash server and type log lines (similar to entering log data on standard input in the earlier examples). Open a new shell window and run the following command:
telnet localhost 5000
Copy and paste the following sample lines (you may use other text, but then the grok filter might not parse it correctly):
Dec 23 12:11:43 louis postfix/smtpd[31499]: connect from unknown[95.75.93.154]
Dec 23 14:42:56 louis named[16000]: client 199.48.164.7#64817: query (cache) 'amsterdamboothuren.com/MX/IN' denied
Dec 23 14:30:01 louis CRON[619]: (www-data) CMD (php /usr/share/cacti/site/poller.php >/dev/null 2>/var/log/cacti/poller-error.log)
Dec 22 18:28:06 louis rsyslogd: [origin software="rsyslogd" swVersion="4.2.0" x-pid="2253" x-info="http://www.rsyslog.com"] rsyslogd was HUPed, type 'lightweight'.
Then you can see the output result in the window where you run Logstash. The information is processed and parsed!
{ "message" => "Dec 23 14:30:01 louis CRON[619]: (www-data) CMD (php /usr/share/cacti/site/poller.php >/dev/null 2>/var/log/cacti/poller-error.log)", "@timestamp" => "2013-12-23T22:30:01.000Z", "@version" => "1", "type" => "syslog", "host" => "0:0:0:0:0:0:0:1:52617", "syslog_timestamp" => "Dec 23 14:30:01", "syslog_hostname" => "louis", "syslog_program" => "CRON", "syslog_pid" => "619", "syslog_message" => "(www-data) CMD (php /usr/share/cacti/site/poller.php >/dev/null 2>/var/log/cacti/poller-error.log)", "received_at" => "2013-12-23 22:49:22 UTC", "received_from" => "0:0:0:0:0:0:0:1:52617", "syslog_severity_code" => 5, "syslog_facility_code" => 1, "syslog_facility" => "user-level", "syslog_severity" => "notice"}
Congratulations! You are now a qualified Logstash user: you can configure it, run it, and send events to it with ease. There is still plenty more worth exploring as you go.