Logstash transmitting Nginx logs via Kafka (iii)

Source: Internet
Author: User
Tags: nginx, server, logstash

A single Logstash process can read, parse, and output data on its own. In a production environment, however, running a Logstash process on every application server and sending data directly to Elasticsearch is not the first choice: first, a large number of client connections puts extra pressure on Elasticsearch; second, network jitter affects the Logstash processes, which in turn affects the production applications; third, operations teams may be unwilling to deploy Java on production servers, or to let Logstash compete with business code for JVM resources.

Therefore, in practice the Logstash processes are split into two roles. The process that runs on the application server is kept as lightweight as possible and only reads and forwards data; this role is called the shipper. The process that runs on a standalone server, parses and processes the data, and writes it to Elasticsearch is called the indexer.

Kafka is a high-throughput distributed publish-subscribe log service with high availability, high performance, distribution, high scalability, and durability. Unlike Redis, which suits lightweight message queuing, Kafka keeps its message queue on disk, so there is no need to worry about buffered messages being lost. Kafka is therefore also the recommended message queue for production environments. In addition, if the company already runs a Kafka service, Logstash can plug into it quickly, saving the trouble of building a queue from scratch.

I. Logstash setup

For detailed setup steps, refer to Logstash installation and setup (i).

II. Configuring the shipper

The shipper is the Logstash process running on the Nginx server. It reads the log file with the logstash-input-file plugin and writes it to the Kafka cluster with the logstash-output-kafka plugin.

Logstash uses a Ruby gem called filewatch to watch for file changes. The library supports glob expansion of file paths and keeps a database file (.sincedb) to track the current read position of each log file being watched.

For each monitored file, the .sincedb file records the inode, the major device number, the minor device number, and the read position (pos).
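As a purely illustrative example (the numbers here are made up), each record in .sincedb is one whitespace-separated line per file, in the order inode, major device number, minor device number, and byte position:

    262250 0 51713 1631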

Input configuration example:

input {
    file {
        path => "/var/log/nginx/log_access.log"
        type => "nginx-access"
        discover_interval => 15                    # how often Logstash checks the monitored path for new files; default 15 seconds
        sincedb_path => "/etc/logstash/.sincedb"   # location of the .sincedb file
        start_position => "beginning"              # where to start reading the file from
    }
}

Additional configuration details:

• exclude: files that should not be monitored can be excluded here.
• close_older: if a monitored file has not been updated within this time, close the handle that watches it. Default: 3600 s, i.e. one hour.
• ignore_older: on each check of the file list, ignore files whose last modification time is older than this value. Default: 86400 s, i.e. one day.
• sincedb_path: path of the .sincedb file. Default: $HOME/.sincedb.
• sincedb_write_interval: how often the .sincedb file is written. Default: 15 s.
• stat_interval: how often the monitored files are checked for updates. Default: 1 s.
• start_position: where Logstash starts reading from. The default is the end of the file, so Logstash behaves like tail -F; setting it to "beginning" makes Logstash read the whole file from the start, like cat, and then continue following it like tail -F.

Several of these options are combined in the sketch below.
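As a minimal sketch (the paths and values are illustrative assumptions, not taken from the original article), an input block using several of these options might look like:

input {
    file {
        path => "/var/log/nginx/*.log"             # glob expansion is supported
        exclude => "*.gz"                          # skip rotated, compressed files
        type => "nginx-access"
        start_position => "beginning"              # read existing content first, then keep following the file
        ignore_older => 86400                      # skip files not modified within the last day
        close_older => 3600                        # release handles of files idle for an hour
        stat_interval => 1                         # check for updates every second
        sincedb_path => "/etc/logstash/.sincedb"
    }
}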

Output configuration example:

The following configuration enables basic use of the Kafka producer. For more detailed producer settings, see the producer section of the official Kafka documentation.

output {
    kafka {
        bootstrap_servers => "localhost:9092"     # Kafka broker address used by the producer
        topic_id => "nginx-access-log"            # Kafka topic to write to
        compression_type => "snappy"              # message compression mode; default is none, gzip and snappy are also available
    }
}

Other logstash-output-kafka configuration options:

• compression_type: message compression mode. Default: none; valid values are none, gzip, and snappy.
• acks: message acknowledgement mode. Default: 1; valid values are 0, 1, and all (all in-sync replicas must acknowledge the message).
• send_buffer_bytes: size of the TCP send buffer used when sending data.

A sketch using these options follows.
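As an illustrative sketch only (not from the original article, and option values may need adjusting for your plugin version), these producer options can be added to the basic output shown earlier:

output {
    kafka {
        bootstrap_servers => "localhost:9092"
        topic_id => "nginx-access-log"
        compression_type => "gzip"      # trade CPU for smaller messages on the wire
        acks => "1"                     # leader acknowledgement only; use "all" for stronger durability
    }
}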

The default codec for both the Logstash Kafka input and output plugins is json, so pay attention to the encoding format on both sides. During message passing, Logstash by default adds a timestamp and hostname to the encoded message. If you do not want that extra information (typically when you are simply forwarding messages), you can use a configuration such as the following:

output {
    kafka {
        codec => plain {
            format => "%{message}"
        }
    }
}

III. Building and configuring Kafka

For building and configuring Kafka, refer to Kafka cluster setup.

IV. Configuring the indexer

The indexer uses the logstash-input-kafka plugin to read data from the Kafka cluster.

Input Configuration Example:

input {
    kafka {
        zk_connect => "localhost:2181"       # Zookeeper address
        topic_id => "nginx-access-log"       # topic name in Kafka; remember to create the topic first
        group_id => "nginx-access-log"       # consumer group; default is "logstash"
        codec => "plain"                     # must match the codec used by the shipper's output
        consumer_threads => 1                # number of consumer threads
        decorate_events => true              # add metadata to the event: message size, source topic, and consumer group
        type => "nginx-access-log"
    }
}

More logstash-input-kafka options can be found in the official Logstash documentation.

Logstash processes data as an input | decode | filter | encode | output stream. The configuration above sets the codec to "plain", so Logstash acts as a pure forwarder and does not re-encode the original message. The rich set of filter plugins is an important part of what makes Logstash powerful: they provide not only filtering but also complex logic processing, and can even inject new Logstash events into the downstream flow. Only the logstash-output-elasticsearch configuration is listed here.
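The original article skips the filter stage, but as a sketch, an indexer filter for Nginx access logs in the default combined format might use the stock COMBINEDAPACHELOG grok pattern (the conditional matches the type set in the Kafka input above):

filter {
    if [type] == "nginx-access-log" {
        grok {
            match => { "message" => "%{COMBINEDAPACHELOG}" }     # parse the combined access log line into fields
        }
        date {
            match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]   # use the request time as the event timestamp
        }
    }
}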

Logstash-output-elasticsearch Configuration Example:

output {
    elasticsearch {
        hosts => ["localhost:9200"]                    # Elasticsearch addresses; separate multiple addresses with commas
        index => "logstash-%{type}-%{+YYYY.MM.dd}"     # index naming; uppercase letters are not supported (except "Logstash")
        document_type => "%{type}"                     # document type
        workers => 1
        flush_size => 20000                            # number of events sent to Elasticsearch in one batch
        idle_flush_time => 10                          # interval (seconds) after which a batch is sent even if flush_size is not reached
        template_overwrite => true                     # when true, the template in Elasticsearch is overwritten with the one configured in Logstash
    }
}

With this, the logs on the Nginx server have been forwarded to Elasticsearch.
