ELK deployment reference

Tags: syslog, apache, log, kibana, logstash, filebeat


Brief Introduction:

ELK is composed of three open-source tools:

Elasticsearch is an open-source distributed search engine that features: distributed, zero-configuration, automatic discovery, automatic index sharding, index copy mechanism, restful APIs, and multiple data sources, automatically search for loads.

Logstash is a fully open-source tool that collects, filters, and stores your logs for later use (such as searching).

Kibana is also an open-source, free tool. It provides a friendly web interface for the log data handled by Logstash and Elasticsearch, helping you summarize, analyze, and search important log data.

Scenario Analysis:

Logs mainly include system logs, application logs, and security logs. Operations staff and developers can learn about a server's software and hardware state through its logs, and check for configuration errors and the causes of failures. Regular log analysis gives insight into server load, performance, and security, so timely measures can be taken to correct problems.

Generally, logs are stored separately on different devices. If you manage dozens or hundreds of servers and still use the traditional method of logging on to each server in turn to view logs, the process is cumbersome and inefficient. Centralized log management becomes imperative, for example using open-source syslog to collect and aggregate the logs from all servers.

Once logs are managed centrally, log statistics and retrieval become the troublesome part. We can usually use Linux commands such as grep, awk, and wc for retrieval and statistics, but this approach falls short for more demanding queries, sorting, and statistics across a large number of machines.

The open-source real-time log analysis platform ELK solves the problems above nicely. Other platforms or tools could of course be used; only ELK is discussed here. Official website: https://www.elastic.co

At the time of writing, the latest stable version available on the official website is 5.4.0.

Expected results:

1. System messages logs are imported into elasticsearch via a local beat (without processing the data) and can be queried through kibana.

2. Apache access logs are imported into elasticsearch via a remote beat (with the data processed), so that kibana can search on any field of the log and combine that with fuzzy search; in other words, the apache log is stored in elasticsearch in JSON form, field by field.

3. Nginx access logs, apache access logs, and system logs from different clients are processed with different matching conditions (regular expressions) and imported into elasticsearch. Simple regular expressions have to be written for the nginx and system logs.

Important Notes:

1. The versions of the ELK components must be consistent.

2. It is recommended that the operating system versions of all nodes be consistent, using the current stable CentOS 7.3 where possible. The three ELK nodes should be provisioned somewhat higher than the other nodes, here 2C4G versus 2C2G; with too little memory the software will not run well. All nodes must be able to access the Internet, since software packages need to be downloaded and installed.

3. Disable the firewall and selinux.

4. For uniformity, all ELK software is installed from tar packages. Installing with yum, especially logstash, can run into many pitfalls.

5. The build process itself is not difficult; what is hard is getting the various components to talk to each other, and the more advanced ways of using ELK.

Note:

The purpose of this article is to get you started. For more advanced ELK applications and usage, please refer to the official website or other technical documents. All applications are deployed on separate hosts here so they can be moved into Docker containers later; of course, you can also deploy everything on one server.

Details:

IP address      Host name       Purpose                        Installed software
192.168.2.25    Apache          Client                         httpd, filebeat
192.168.2.26    Nginx           Client                         nginx, filebeat
192.168.2.27    Logstash        Log analysis and processing    logstash, filebeat
192.168.2.28    Elasticsearch   Data storage                   elasticsearch
192.168.2.30    Kibana          Data query                     kibana

Installation steps:

1. Install the JDK on the three ELK nodes. The JDK can be downloaded from the official Oracle website; the version does not have to match mine exactly.

 
 
  1. rpm -ivh jdk-8u102-linux-x64.rpm

2. Install the elasticsearch Node
 
 
  1. wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.4.0.tar.gz
  2. tar zxvf elasticsearch-5.4.0.tar.gz
  3. mv elasticsearch-5.4.0 /usr/local/elasticsearch
  4. cd /usr/local/elasticsearch/config
  5. # Back up the default elasticsearch configuration file to guard against modification errors
  6. cp elasticsearch.yml elasticsearch.yml.default

After editing:
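The screenshot of the edited file is not included here. As a minimal sketch of what elasticsearch.yml might contain for this topology (the cluster and node names are illustrative assumptions; the bind address matches the elasticsearch node in the table above):

  cluster.name: my-elk                       # illustrative cluster name (assumption)
  node.name: elk-node1                       # illustrative node name (assumption)
  path.data: /usr/local/elasticsearch/data
  path.logs: /usr/local/elasticsearch/logs
  network.host: 192.168.2.28                 # bind to the elasticsearch node's address
  http.port: 9200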


Add an elasticsearch user, because the tar-package installation of elasticsearch must be run as a normal (non-root) user.

 
 
  1. useradd elasticsearch
  2. chown -R elasticsearch:elasticsearch /usr/local/elasticsearch

Open the /etc/sysctl.conf file, add the following line, and then apply it:

 
 
  1. vm.max_map_count = 655360
  2. sysctl -p /etc/sysctl.conf

Open the /etc/security/limits.conf file and raise the limits on open file handles and processes:

 
 
  1. * soft nofile 65536
  2. * hard nofile 65536
  3. * soft nproc 65536
  4. * hard nproc 65536
  5. su - elasticsearch
  6. cd /usr/local/elasticsearch
  7. bin/elasticsearch

The first startup takes some time because initialization is required. If it does not start successfully, check the elasticsearch logs. Note that the above is only a foreground start for debugging; for normal use, stop it and start it again in the background, as sketched below.
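Once the foreground test succeeds, a sketch of starting it in the background (the -d flag daemonizes elasticsearch; nohup bin/elasticsearch & works as well):

  1. su - elasticsearch
  2. cd /usr/local/elasticsearch
  3. bin/elasticsearch -d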

Check whether the port is enabled


Curl simple test
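The screenshots are not included here; as a sketch, the two checks can be run like this (the IP is the elasticsearch node from the table above):

  1. netstat -antp | grep 9200
  2. curl http://192.168.2.28:9200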

3. Install the logstash Node
 
 
  1. wget https://artifacts.elastic.co/downloads/logstash/logstash-5.4.0.tar.gz
  2. tar zxvf logstash-5.4.0.tar.gz
  3. mv logstash-5.4.0 /usr/local/logstash

Also download filebeat on the logstash node and start it. It watches the data source file for newly appended content, which logstash then processes and uploads to elasticsearch.

 
 
  1. wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-5.4.0-linux-x86_64.tar.gz
  2. tar zxvf filebeat-5.4.0-linux-x86_64.tar.gz
  3. mv filebeat-5.4.0-linux-x86_64 /usr/local/filebeat
  4. cd /usr/local/filebeat
  5. cp filebeat.yml filebeat.yml.default

Edit the filebeat.yml file with the following content:
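The screenshot of the file is not included here. A minimal sketch, assuming the monitored file is the locally created messages-log described below and that output goes to the logstash beats input on port 5044 (both the path and the host are assumptions):

  filebeat.prospectors:
  - input_type: log
    paths:
      - /usr/local/filebeat/messages-log     # assumed location of the test file
  output.logstash:
    hosts: ["192.168.2.27:5044"]             # the logstash node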


Start the filebeat Service

 
 
  1. cd /usr/local/filebeat
  2. ./filebeat &

Note that filebeat does not listen on a port; to confirm it is running, check its log output and its process.



Create a local file named messages-log and populate it with a few entries from the local system messages log, for example as sketched below:
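The original command is not shown; one way to populate the file, assuming it lives under /usr/local/filebeat as in the filebeat.yml sketch above, is simply to copy a few lines from the local system messages log:

  1. tail -4 /var/log/messages > /usr/local/filebeat/messages-log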

Note that filebeat's record of which files it monitors (and how far it has read) is stored in /usr/local/filebeat/data/registry.

Finally, create the test.conf configuration file that logstash will be started with. The content is as follows:

A logstash configuration has three sections: input, filter, and output. Generally, at least input and output need to be configured.
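The screenshot of test.conf is not included here. A minimal sketch consistent with the steps that follow (beats input on port 5044, output to the elasticsearch node plus stdout so events are also printed to the screen; the index name is an assumption):

  input {
    beats {
      port => 5044
    }
  }
  output {
    elasticsearch {
      hosts => ["192.168.2.28:9200"]
      index => "test-%{+YYYY.MM.dd}"         # assumed index name
    }
    stdout {
      codec => rubydebug                     # prints each event to the screen
    }
  }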

Here we choose not to modify logstash's default logstash.yml configuration file.

cd /usr/local/logstash

First, test logstash by starting it without specifying a configuration file:

 
 
  1. bin/logstash -e 'input { stdin { } } output { stdout {} }'

Manually type hello world; logstash echoes hello world back as an event.

Then start logstash with the test.conf configuration file. Note that it is started in the foreground here to make debugging easier; a sketch of the command follows.
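A sketch of the foreground start command:

  1. bin/logstash -f test.conf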


Check whether port 5044 and port 9600 are enabled.


Wait a while and output appears on the screen: this is the stdout output defined at the end of test.conf.


The data is also written into elasticsearch as configured. Let's verify it:

Note that only one record is shown here; to view the complete data, use kibana.
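As a sketch, the verification can be done with a simple query against the elasticsearch node (the index name must match whatever test.conf writes to):

  1. curl 'http://192.168.2.28:9200/_cat/indices?v'
  2. curl 'http://192.168.2.28:9200/test-*/_search?pretty&size=1'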


4. Install the kibana Node

 
 
  1. wget https://artifacts.elastic.co/downloads/kibana/kibana-5.4.0-linux-x86_64.tar.gz
  2. tar zxvf kibana-5.4.0-linux-x86_64.tar.gz
  3. mv kibana-5.4.0-linux-x86_64 /usr/local/kibana
  4. cd /usr/local/kibana/config
  5. cp kibana.yml kibana.yml.default

Edit the kibana.yml configuration file:
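The screenshot is not included here; a minimal sketch of the relevant settings (addresses follow the table above):

  server.port: 5601
  server.host: "192.168.2.30"
  elasticsearch.url: "http://192.168.2.28:9200"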


Start the kibana Service

 
 
  1. bin/kibana

Check whether the port is enabled


Open your browser and go to http://192.168.2.30:5601

Click the Create button, then click the Discover tab at the top. Note that if no data appears, check the imported @timestamp against the current time: by default kibana only displays the last 15 minutes of data, so if your data is older, select a suitable time range. In kibana you can see that all four entries from messages-log were imported normally. This completes the first of the expected results. But so far we have only proved that the pipeline works; there is more to do. Also note that an index can only be created in kibana after data has been imported into elasticsearch.


Now we need to implement the second expected result. First we clear the data in elasticsearch. Strictly speaking, deleting the data is not necessary; the point here is just to show where elasticsearch stores its data.

rm -rf /usr/local/elasticsearch/data/nodes

Stop the elasticsearch service and start it again; the deleted nodes directory is initialized and created anew. Refresh the Discover page in kibana and widen the time range to the last five years: no data is found.


5. Install the apache node. For a simple test, I install it directly with yum.
 
 
  1. yum install httpd -y
  2. systemctl start httpd

Access http://192.168.2.25 in a browser and the default apache page is displayed, which produces access log entries. To keep the demonstration simple, I keep only 6 log entries: most with status code 200, one with 403, and one with 404. Now we need to import this data into elasticsearch through logstash and query it through kibana.


Install filebeat on the apache node as a client

For installation steps, refer to the above

The configuration file is as follows:
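The screenshot is not included here. A sketch of a filebeat.yml for this stage, reading only the apache access log and shipping it to the logstash node (the path is the default location for a yum-installed httpd; adjust if needed):

  filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/httpd/access_log
  output.logstash:
    hosts: ["192.168.2.27:5044"]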


Start the filebeat Service

 
 
  1. ./filebeat &

Stop the logstash service and create a new test02.conf configuration file, this time with an added filter section. Here the apache log is matched with a grok regular expression so that each field of the log is imported as its own JSON field. The content is as follows:
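The screenshot is not included here; a sketch of what test02.conf might look like (the index name is an assumption):

  input {
    beats {
      port => 5044
    }
  }
  filter {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
  }
  output {
    elasticsearch {
      hosts => ["192.168.2.28:9200"]
      index => "apache-log-%{+YYYY.MM.dd}"   # assumed index name
    }
    stdout {
      codec => rubydebug
    }
  }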


The %{COMBINEDAPACHELOG} pattern used in the filter is provided by logstash by default. Its location is:

/usr/local/logstash/vendor/bundle/jruby/1.9/gems/logstash-patterns-core-4.0.2/patterns/grok-patterns


In the grok-patterns file there are two related patterns. The first, COMMONAPACHELOG, matches the log format apache uses when it runs as a backend server behind nginx; the second, COMBINEDAPACHELOG, reuses COMMONAPACHELOG and adds two more fields, and matches the log format apache uses as a front-facing web server. Since apache is the web server here, I use the COMBINEDAPACHELOG pattern; use COMMONAPACHELOG when apache is a backend behind nginx. Inside each %{...} expression, a colon separates the two parts: before the colon is a pattern name defined in grok-patterns, and after it is a field name you can choose yourself. Each %{...} matches one piece of the log line.

Check whether the configuration file test02.conf has a syntax error before starting logstash.
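A sketch of the syntax check (-t, i.e. --config.test_and_exit, only validates the configuration and exits):

  1. bin/logstash -f test02.conf -t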


Now logstash is started for real. Because of the volume of output, only one record is captured here.



We can see that every field of the apache log has been imported into elasticsearch in JSON form. Some extra fields are also added; the most confusing pair is timestamp and @timestamp: the former is the apache access time, while the latter can be understood as the logstash processing time, which is stored in UTC and therefore differs from our Beijing time by eight hours. That field is rarely needed. We can also check the record count in kibana: 6 hits means 6 records, which matches the number of entries in the access_log above exactly.


Click the arrow next to any record and then the JSON tab; every field of the apache log, such as the request, status code, and request size, is stored as a JSON field.


Try a fuzzy search: search for status code 404 within a certain access time range.


Search for status codes between 400 and 499.
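With the default COMBINEDAPACHELOG field names, the status code ends up in the response field, so the two searches above can be typed into the kibana search bar roughly as:

  response:404
  response:[400 TO 499]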

The lesson is that as search conditions become more demanding, the only workable approach is to split the data into fields when storing it in elasticsearch, so that searches return exactly what we need. This essentially completes the second expected result.

Next, we want to store the apache, nginx, and system logs in elasticsearch according to their different log formats. Every machine first collects its own system log, and then collects the log of whatever service it runs: here the apache node collects apache and system logs, and the nginx node collects nginx and system logs.

6. Install the nginx node and use nginx as the front-end reverse proxy server.

 
 
  1. yum install epel-release -y
  2. yum install nginx -y
First, let's take a look at the default nginx log format.



Generally, three forwarding parameters are added to the log, including the address returned by the backend server, the status code returned by the backend program, and the response time of the backend program.
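The screenshot of the modified log format is not included here. A sketch of what it might look like in nginx.conf (the first three lines are nginx's stock main format; $upstream_addr, $upstream_status, and $upstream_response_time are appended):

  log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                  '$status $body_bytes_sent "$http_referer" '
                  '"$http_user_agent" "$http_x_forwarded_for" '
                  '$upstream_addr $upstream_status $upstream_response_time';
  access_log /var/log/nginx/access.log main;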


Note that logstash's grok patterns do not include an nginx log format by default. However, nginx plays essentially the same web-server role as apache, so many field patterns can be reused; we therefore add a COMMONNGINXLOG pattern to the grok-patterns file.


COMMONNGINXLOG %{COMBINEDAPACHELOG} %{QS:x_forwarded_for} (?:%{HOSTPORT1:upstream_addr}|-) (%{STATUS:upstream_status}|-) (%{BASE16FLOAT:upstream_response_time}|-)


The $http_x_forwarded_for value can be matched with the existing QS pattern, just as in the apache format. The other patterns referenced above must exist; HOSTPORT1 and STATUS, for example, are names that logstash does not ship by default, so we define them ourselves with regular expressions and add them to the grok-patterns file:
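The screenshot of the additions is not included here. As a sketch, the simplest definitions that match the single-upstream sample logs shown later (these two lines are assumptions, not patterns shipped with logstash):

  STATUS %{NUMBER}
  HOSTPORT1 %{IPORHOST}:%{POSINT}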


Save and exit; the COMMONNGINXLOG pattern can now be referenced directly.

Now define a pattern for the system log. Although syslog-related patterns exist by default, they do not quite meet our needs, so we write a simple regular expression by hand and add it to the grok-patterns file:

SYSLOG %{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}

You can also test patterns with the Grok Debugger or Grok Constructor tools. When testing a custom regular expression in the Grok Debugger, use its option for adding custom patterns.

Now you need to debug nginx forwarding requests to the apache server for processing. That is, nginx is the frontend reverse proxy, and apache is the backend server.

Edit the nginx main configuration file nginx.conf and change the location / block to the following:
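The screenshot is not included here; a sketch of a reverse-proxy location block pointing at the apache node (the header lines are the usual ones and can be adjusted):

  location / {
      proxy_pass http://192.168.2.25:80;
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  }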


Start the nginx Service

 
 
  1. systemctl start nginx

Install filebeat on nginx (refer to the above steps)

The filebeat configuration file of nginx is as follows:
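The screenshot is not included here. A sketch with two prospectors, one for the nginx access log and one for the test system-log file, each tagged with a document_type so logstash can tell them apart (the paths and type names are assumptions and must match the conditionals in test03.conf later):

  filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/nginx/access.log
    document_type: nginx-log
  - input_type: log
    paths:
      - /usr/local/filebeat/messages_log
    document_type: system-log
  output.logstash:
    hosts: ["192.168.2.27:5044"]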


The content of the new messages_log file of nginx is as follows:


Modify apache's main configuration file httpd.conf to change the log format: since apache now acts as a backend web server, it does not need to record the agent and similar information.

Uncomment this line:

CustomLog "logs/access_log" common

and comment out this line:

CustomLog "logs/access_log" combined

Create a test.html file under apache's default /var/www/html directory; its content can be anything you like:


Restart the apache service

systemctl restart httpd

Access the nginx service.

The nginx log then shows the following entries, which indicates everything is normal:

 
 
  1. 192.168.9.106 - - [10/May/2017:09:14:28 +0800] "GET /test.html HTTP/1.1" 200 14 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36" "-" 192.168.2.25:80 200 0.002
  2. 192.168.9.106 - - [10/May/2017:09:14:28 +0800] "GET /favicon.ico HTTP/1.1" 404 209 "http://192.168.2.26/test.html" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36" "-" 192.168.2.25:80 404 0.001

The apache log shows the following entries, indicating that it is normal:

 
 
  1. 192.168.2.26 - - [10/May/2017:09:14:31 +0800] "GET /test.html HTTP/1.0" 200 14
  2. 192.168.2.26 - - [10/May/2017:09:14:31 +0800] "GET /favicon.ico HTTP/1.0" 404 209

The filebeat configuration file of apache is as follows:
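The screenshot is not included here. As with the nginx client, a sketch (paths and type names are assumptions matching test03.conf below):

  filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/httpd/access_log
    document_type: apache-log
  - input_type: log
    paths:
      - /usr/local/filebeat/messages_log
    document_type: system-log
  output.logstash:
    hosts: ["192.168.2.27:5044"]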


The content of the new messages_log file of apache is as follows:

All the configuration and test files are now ready. The nginx and apache access logs hold two entries each, and each client's system log file holds two entries, eight entries in total.

Finally, the main event: the test03.conf configuration file for logstash is as follows:
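The screenshot is not included here. A sketch of what test03.conf could look like, branching on the document_type values set in the filebeat configurations (the type names, the use of COMMONAPACHELOG for the backend apache log, and the custom COMMONNGINXLOG and SYSLOG patterns follow the description above; the index naming is an assumption):

  input {
    beats {
      port => 5044
    }
  }
  filter {
    if [type] == "nginx-log" {
      grok {
        match => { "message" => "%{COMMONNGINXLOG}" }
      }
    } else if [type] == "apache-log" {
      grok {
        match => { "message" => "%{COMMONAPACHELOG}" }  # backend server: no agent/referer
      }
    } else if [type] == "system-log" {
      grok {
        match => { "message" => "%{SYSLOG}" }
      }
    }
  }
  output {
    elasticsearch {
      hosts => ["192.168.2.28:9200"]
      index => "%{type}-%{+YYYY.MM.dd}"      # one index per log type (assumed naming)
    }
    stdout {
      codec => rubydebug
    }
  }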


Note that the apache grok match has been changed because apache now acts as a backend server. To verify that the import is correct, clear the data in elasticsearch as well as the filebeat registry information on the nginx and apache clients. Stop the filebeat service before deleting the registry file and start it again afterwards. The steps for clearing the data are sketched below.
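A sketch of the cleanup steps (paths are the ones mentioned earlier in this article):

  1. rm -rf /usr/local/filebeat/data/registry     # on each client, with filebeat stopped
  2. rm -rf /usr/local/elasticsearch/data/nodes   # on the elasticsearch node, with the service stopped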

Start logstash. Taking a single system log entry as an example, you can see that system logs are also imported into elasticsearch split into fields according to their log format.


From kibana, we can see that there are exactly 8 data records.



Let's look at the JSON form of a system log entry after import. According to the SYSLOG pattern it is split mainly into four fields: syslog_timestamp, syslog_hostname, syslog_program, and syslog_message.




Let's take a look at the json format after nginx logs are imported. Here, the nginx fields are not explained in detail.


With that, the three expected results have all been achieved. Of course, this is only a very basic setup and configuration; please consult the official documentation or other technical material for more advanced ELK usage.

FAQ summary:

1. New content appended to a monitored file is imported repeatedly

This is usually caused by editing the file directly to add the new content. The correct way to append is: echo "xxx" >> filename

2. No data is found in Kibana although it exists in elasticsearch

The time range selected for the query is probably wrong.

3. Logstash startup is very slow

Install the epel source, install haveged, start it, and restart logstash.
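A sketch of those steps:

  1. yum install epel-release -y
  2. yum install haveged -y
  3. systemctl start haveged
  4. systemctl enable haveged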

4. Logstash installed with yum starts, but data cannot be imported into elasticsearch

Elasticsearch and kibana installed with yum generally work without major problems, but logstash does not: starting it with a specified configuration file does not behave well, and there are many pitfalls.

5. After import, the data is not split into JSON fields according to the specified regular expression

Generally this means the data format and the regular expression do not match.

