A single Logstash process can handle reading, parsing, and outputting data on its own. In a production environment, however, running a full Logstash process on every application server and sending data directly to Elasticsearch is not the first choice: first, a large number of client connections puts extra pressure on Elasticsearch; second, network jitter can affect the Logstash processes, which in turn affects the production applications; third, operations teams may be unwilling to deploy Java on production servers, or to let Logstash compete with business code for resources.
Therefore, in practice the Logstash process is split into two different roles. The role that runs on the application server, does as little work as possible, and only reads and forwards logs is called the shipper; the role that runs on separate servers, parses and processes the data, and is responsible for writing to Elasticsearch is called the indexer.
Kafka is a high-throughput distributed publish-subscribe log service with high availability, high performance, distribution, high scalability, and durability. Unlike Redis, which suits lightweight message queuing, Kafka keeps its message queue on disk, so buffering large volumes of messages is not a problem. Kafka is therefore also the recommended message queue for production environments. In addition, if your company already runs a Kafka service, Logstash can plug into it quickly, saving the trouble of building a queue from scratch.
I. Building Logstash
For detailed setup steps, refer to Logstash Installation and Setup (I).
II. Configuring the Shipper
The shipper is the Logstash process running on the Nginx server. It reads logs with the logstash-input-file plugin and writes them to the Kafka cluster with the logstash-output-kafka plugin.
Logstash uses a Ruby gem library called filewatch to listen for file changes. The library supports glob expansion of file paths and records the current read position of each monitored log file in a .sincedb database file.
The .sincedb file records the inode, major device number, minor device number, and byte position (pos) of each monitored file.
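As a rough illustration (the values below are hypothetical and the exact layout varies slightly between Logstash versions), each line of the .sincedb file describes one monitored file:

    # inode  major  minor  pos
    261236   0      64768  67408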
Input configuration example:
input {
  file {
    path => "/var/log/nginx/log_access.log"
    type => "nginx-access"
    discover_interval => 15                    # how often Logstash checks for new files under the monitored path; default 15 seconds
    sincedb_path => "/etc/logstash/.sincedb"   # location of the sincedb file
    start_position => "beginning"              # where to start reading the file
  }
}
Additional configuration details:
exclude: files that should not be monitored can be excluded here.
close_older: if a monitored file has not been updated within this time, the handle listening to it is closed. Default: 3600 s, i.e. one hour.
ignore_older: each time the file list is checked, files whose last modification time exceeds this value are ignored. Default: 86400 s, i.e. one day.
sincedb_path: path of the .sincedb file; defaults to $HOME/.sincedb.
sincedb_write_interval: how often the sincedb file is written; default 15 s.
stat_interval: how often the monitored files are checked for updates; default 1 s.
start_position: by default Logstash starts reading from the end of the file, much like tail -F; set it to "beginning" to read the file from the start like cat, after which reading continues like tail -F.
A combined sketch using some of these options follows below.
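The following sketch shows how these options fit into the file input; the specific values are chosen only for illustration and are not from the original article:

    input {
      file {
        path => "/var/log/nginx/*.log"
        exclude => "*.gz"                 # do not monitor rotated, compressed logs
        close_older => 3600               # close handles of files idle for one hour
        ignore_older => 86400             # skip files not modified within one day
        sincedb_path => "/etc/logstash/.sincedb"
        sincedb_write_interval => 15      # write the sincedb file every 15 seconds
        stat_interval => 1                # check monitored files for updates every second
        start_position => "end"           # behave like tail -F
      }
    }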
Output configuration example:
The following configuration achieves basic use of the Kafka producer. For more detailed producer settings, see the producer section of the official Kafka documentation.
output {
  kafka {
    bootstrap_servers => "localhost:9092"   # Kafka broker address the producer connects to
    topic_id => "nginx-access-log"          # Kafka topic to write to
    compression_type => "snappy"            # message compression mode; default is none, options are gzip and snappy
  }
}
Other logstash-output-kafka configuration options:
compression_type: message compression mode; default is none, valid values are none, gzip, and snappy.
acks: message acknowledgement mode; default is 1, valid values are 0, 1, and all (all means every in-sync replica must confirm the write).
send_buffer_bytes: size of the TCP send buffer used when transmitting data.
A sketch combining these producer options follows below.
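As a rough sketch (the option values are illustrative, not from the original article), a more conservative producer configuration might look like this:

    output {
      kafka {
        bootstrap_servers => "localhost:9092"
        topic_id => "nginx-access-log"
        compression_type => "gzip"        # trade CPU for smaller messages on the wire
        acks => "all"                     # wait for all in-sync replicas to acknowledge
        send_buffer_bytes => 131072       # 128 KB TCP send buffer
      }
    }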
Both the input and output sides of the logstash-kafka plugins default to the json codec, so pay attention to the encoding format on input and output. When passing messages along, Logstash by default adds its own timestamp and hostname fields to the encoded message. If you do not want that information added (typically when simply forwarding messages), you can use a configuration like the following:
output {
  kafka {
    codec => plain {
      format => "%{message}"
    }
  }
}
III. Building and Configuring Kafka
For building and configuring the Kafka cluster, refer to Kafka Cluster Setup.
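The topic used in this article still has to be created before the shipper writes to it. As a hedged example (script paths and the ZooKeeper address depend on your installation and Kafka version), the scripts shipped with older Kafka distributions could be used roughly as follows:

    # create the topic used by the shipper and indexer
    bin/kafka-topics.sh --create --zookeeper localhost:2181 \
      --replication-factor 1 --partitions 1 --topic nginx-access-log

    # verify that messages from the shipper are arriving
    bin/kafka-console-consumer.sh --zookeeper localhost:2181 \
      --topic nginx-access-log --from-beginning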
IV. Configuring the Indexer
The indexer uses the logstash-input-kafka plugin to read data from the Kafka cluster.
Input configuration example:
input {
  kafka {
    zk_connect => "localhost:2181"     # ZooKeeper address
    topic_id => "nginx-access-log"     # topic name in Kafka; remember to create the topic first
    group_id => "nginx-access-log"     # consumer group; the default is "logstash"
    codec => "plain"                   # must match the codec of the shipper's output
    consumer_threads => 1              # number of consumer threads
    decorate_events => true            # add metadata about the message to the event: message size, topic, and consumer group
    type => "nginx-access-log"
  }
}
More logstash-input-kafka options can be found in the official Logstash documentation.
Logstash processes data as a pipeline of input | decode | filter | encode | output. The configuration above uses codec => "plain", which means Logstash acts as a plain forwarder and does not re-encode the original message. The rich set of filter plugins is an important factor behind Logstash's power: they provide not just filtering but complex logic processing, and can even add new Logstash events into the downstream flow. Filters are not covered in detail here; only the logstash-output-elasticsearch configuration is listed.
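Purely as an illustrative aside (not part of the original configuration, and assuming the Nginx access log uses the default combined log format), a minimal grok filter on the indexer might look like this:

    filter {
      grok {
        # parse combined-format access log lines into structured fields
        match => { "message" => "%{COMBINEDAPACHELOG}" }
      }
    }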
Logstash-output-elasticsearch Configuration Example:
output {
  elasticsearch {
    hosts => ["localhost:9200"]                  # Elasticsearch addresses; separate multiple addresses with commas
    index => "logstash-%{type}-%{+YYYY.MM.dd}"   # index name; uppercase letters are not supported (except for "Logstash")
    document_type => "%{type}"                   # document type
    workers => 1
    flush_size => 20000                          # number of events sent to Elasticsearch in one batch
    idle_flush_time => 10                        # interval in seconds after which a batch is sent even if flush_size is not reached
    template_overwrite => true                   # when true, overwrite a custom template with the Logstash template
  }
}
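Once the indexer is running, one quick way to confirm that documents are arriving (assuming Elasticsearch listens on localhost:9200) is to list the indices:

    curl 'localhost:9200/_cat/indices?v'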
At this point, the logs on the Nginx server have been forwarded to Elasticsearch.