In general, the client side of a log-collection scheme needs an additional agent installed to collect logs, such as Logstash or Filebeat. That extra program complicates the environment and consumes resources. Is there a way to collect logs without installing anything extra? Rsyslog is the answer you're looking for!
Rsyslog
Rsyslog is a high-speed log collection and processing service featuring high performance, security, and modularity. It can receive log input from a variety of sources (e.g., file, TCP, UDP, unix socket) and, after processing, output the results to different destinations (e.g., MySQL, MongoDB, Elasticsearch, Kafka), handling more than a million log entries per second.
Rsyslog is an enhanced, upgraded version of syslog and is installed by default on common Linux distributions, so no additional installation is required.
Collect Nginx Logs
The flow of ELK collecting logs through Rsyslog is as follows:
- The pipeline is: Nginx --syslog--> Rsyslog --omkafka--> Kafka --> Logstash --> Elasticsearch
- Nginx sends its generated logs to the Rsyslog server via the syslog system service; Rsyslog writes the received logs to Kafka through the omkafka module; Logstash reads the Kafka queue and writes to Elasticsearch; users retrieve the logs stored in Elasticsearch via Kibana
- Rsyslog ships with the system and needs no installation, so the client does not have to install any additional application throughout the process
- The server side also has Rsyslog installed by default, but the omkafka module is not included; if you need Rsyslog to write to Kafka, you must install this module first
- The omkafka module is supported in Rsyslog v8.7.0 and later, so first check your Rsyslog version with the command:
rsyslogd -v
If the version is too old, upgrade it as follows.
Rsyslog Upgrade
1. Add the key for the Rsyslog repository
# apt-key adv --recv-keys --keyserver keys.gnupg.net AEF0CF8E
2. Add the Rsyslog repository address
echo "deb http://debian.adiscon.com/v8-stable wheezy/" >> /etc/apt/sources.listecho "deb-src http://debian.adiscon.com/v8-stable wheezy/" >> /etc/apt/sources.list
3. Upgrade Rsyslog Service
# apt-get update && apt-get -y install rsyslog
Add Omkafka Module
1. Install the build tools; autoreconf below needs them, otherwise the configure file cannot be generated
# apt-get -y install pkg-config autoconf automake libtool unzip
2. omkafka requires a number of dependency packages to be installed
# apt-get -y install libdbi-dev libmysqlclient-dev postgresql-client libpq-dev libnet-dev librdkafka-dev libgrok-dev libgrok1 libpcre3-dev libtokyocabinet-dev libglib2.0-dev libmongo-client-dev libhiredis-dev
# apt-get -y install libestr-dev libfastjson-dev uuid-dev liblogging-stdlog-dev libgcrypt-dev
# apt-get -y install flex bison librdkafka1 librdkafka-dev librdkafka1-dbg
3. Compile and install the Omkafka module
# mkdir tmp && cd tmp
# git init
# git pull git@github.com:VertiPub/omkafka.git
# autoreconf -fvi
# ./configure --sbindir=/usr/sbin --libdir=/usr/lib --enable-omkafka && make && make install && cd ..
Rsyslog Collects Nginx Logs: Client-Side Nginx Configuration
log_format jsonlog '{'
    '"host": "$host",'
    '"server_addr": "$server_addr",'
    '"http_x_forwarded_for":"$http_x_forwarded_for",'
    '"remote_addr":"$remote_addr",'
    '"time_local":"$time_local",'
    '"request_method":"$request_method",'
    '"request_uri":"$request_uri",'
    '"status":$status,'
    '"body_bytes_sent":$body_bytes_sent,'
    '"http_referer":"$http_referer",'
    '"http_user_agent":"$http_user_agent",'
    '"upstream_addr":"$upstream_addr",'
    '"upstream_status":"$upstream_status",'
    '"upstream_response_time":"$upstream_response_time",'
    '"request_time":$request_time'
'}';
access_log syslog:server=rsyslog.domain.com,facility=local7,tag=nginx_access_log,severity=info jsonlog;
1. Nginx supports sending logs to syslog only in v1.10 and later, so make sure your Nginx version is 1.10 or above
2. To reduce the processing load on Logstash as well as the complexity of the overall configuration, we have Nginx output its logs directly in JSON format
3. We abandon recording Nginx logs to a text file and instead use syslog to send the logs directly to the remote Rsyslog server for subsequent processing. Another very important benefit of this is that we no longer need to think about splitting and periodically deleting Nginx logs (normally we would use the logrotate service to rotate logs daily and delete them periodically for ease of administration, so the disk does not fill up)
4. access_log is output directly to the syslog service; the parameters are explained as follows:
- syslog: indicates that the logs are sent to a syslog service
- server: the address of the Rsyslog server that receives the logs sent via syslog; the protocol defaults to UDP and the port to 514
- facility: specifies the type of log message, such as AUTH for authentication, CRON for scheduled tasks, and LOCAL0-7 for program-defined use; it carries no special meaning here and needs no deep study, the default value is LOCAL7
- tag: adds a tag to the log, mainly so the server side can distinguish which service or client a log came from; for example, we set the tag to nginx_access_log. If multiple services write logs to Rsyslog at the same time with different tags configured, the Rsyslog server can use this tag to tell which logs belong to Nginx (see the sketch after this list)
- severity: defines the log level, such as debug, info, notice; for access_log the default is info
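As a minimal sketch of the tag-based separation mentioned above, a second service (here a hypothetical admin vhost; the tag value nginx_admin_access_log is an assumption, not from the original setup) would simply use a different tag so the Rsyslog server can tell the two log streams apart:

access_log syslog:server=rsyslog.domain.com,facility=local7,tag=nginx_admin_access_log,severity=info jsonlog;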
Server-side Rsyslog configuration
# cat /etc/rsyslog.d/rsyslog_nginx_kafka_cluster.conf
module(load="imudp")
input(type="imudp" port="514")

# nginx access log ==> rsyslog server(local) ==> kafka
module(load="omkafka")
template(name="nginxLog" type="string" string="%msg%")

if $inputname == "imudp" then {
    if ($programname == "nginx_access_log") then
        action(type="omkafka"
            template="nginxLog"
            broker=["10.82.9.202:9092","10.82.9.203:9092","10.82.9.204:9092"]
            topic="rsyslog_nginx"
            partitions.auto="on"
            confParam=[
                "socket.keepalive.enable=true"
            ]
        )
}

:rawmsg, contains, "nginx_access_log" ~
1. Add a configuration file dedicated to handling Nginx logs in the /etc/rsyslog.d directory
2. The important settings in the rsyslog configuration file are explained below:
- module: loads modules; here we need the imudp module to receive the log data the Nginx server sends via syslog, and the omkafka module to write the logs to Kafka
- input: opens the UDP protocol on port 514; the TCP protocol can also be opened, and the two can coexist (a minimal sketch follows this numbered list)
- template: defines a template named nginxLog; a template can define the log format, but since what we pass in is already JSON, no extra formatting is needed; note that template names must be unique
- action: the processing applied after matching inputname imudp and programname nginx_access_log (that is, the tag we set in the Nginx configuration); here the matched logs are written to the Kafka cluster via the omkafka module; for more detailed omkafka settings, refer to the omkafka module's official documentation
- :rawmsg, contains: the last line means to discard logs containing nginx_access_log; without it, Rsyslog would by default also record all of these logs locally (e.g., in /var/log/messages); since we output the logs to Kafka, there is no need to keep them locally
3. The omkafka module checks whether the topic exists in Kafka and creates it if it does not, so there is no need to create the Kafka topic manually
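As a minimal sketch of the TCP coexistence mentioned in the input explanation above (imtcp is Rsyslog's standard TCP input module; the port choice is up to you), the following two lines could be added alongside the UDP input:

module(load="imtcp")
input(type="imtcp" port="514")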
Server-side Logstash configuration
input {
    kafka {
        bootstrap_servers => "10.82.9.202:9092,10.82.9.203:9092,10.82.9.204:9092"
        topics => ["rsyslog_nginx"]
    }
}

filter {
    mutate {
        gsub => ["message", "\\x", "\\\x"]
    }
    json {
        source => "message"
    }
    date {
        match => ["time_local","dd/MMM/yyyy:HH:mm:ss Z"]
        target => "@timestamp"
    }
}

output {
    elasticsearch {
        hosts => ["10.82.9.205", "10.82.9.206", "10.82.9.207"]
        index => "rsyslog-nginx-%{+YYYY.MM.dd}"
    }
}
Important configuration parameters are explained as follows:
- input: configures the Kafka cluster address and the topic name
- filter: some filtering strategies; since what goes into Kafka is already JSON, no extra processing is needed; the only thing to note is that if the log contains Chinese (for example, Chinese content in the URL), the escaped \x sequences must be replaced, otherwise JSON parsing will fail
- output: configures the addresses of the ES server cluster and the index; the index is automatically split by day
Integration Testing
After the configuration is complete, restart the Rsyslog and Nginx services, then access Nginx to generate logs
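A minimal sketch of this step (service names are the usual Debian defaults; the curl target is just a placeholder to generate one access-log entry):

# service rsyslog restart
# service nginx restart
# curl -s http://localhost/ > /dev/null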
1. Check whether Kafka has created the topic normally
# bin/kafka-topics.sh --list --zookeeper 127.0.0.1:2181
__consumer_offsets
rsyslog_nginx
2. Check whether the topic is receiving logs properly
# bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic rsyslog_nginx
{"host": "domain.com","server_addr": "172.17.0.2","http_x_forwarded_for":"58.52.198.68","remote_addr":"10.120.89.84","time_local":"28/Aug/2018:14:26:00 +0800","request_method":"GET","request_uri":"/","status":200,"body_bytes_sent":1461,"http_referer":"-","http_user_agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36","upstream_addr":"-","upstream_status":"-","upstream_response_time":"-","request_time":0.000}
3. Add the index in Kibana and check whether the data has reached Elasticsearch; if the first two steps are normal but Kibana cannot find the index or the index has no data, it is most likely a basic problem such as a wrong index name, so check carefully
Kibana Query Display
Open the rsyslog-nginx-* index added in Kibana and select @timestamp to create the index pattern
On the Discover page you can see the change in log volume over time very intuitively and apply simple filters using the fields on the left. For example, to see the URIs of all requests with status 404, click Add next to request_uri and status; those two fields then appear on the right. Click the plus sign under status code 404 to view only requests with status 404. Clicking Auto-refresh at the top lets you set the page's automatic refresh interval
Combining queries with various conditions can satisfy all kinds of requirements, such as requests per second, bandwidth usage, error ratio, slow responses, top IPs, top URLs, and so on, and this information can easily be visualized as charts and saved to a dashboard (see the sketch below for one example query)
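As a minimal sketch of one such query run directly against Elasticsearch instead of the Kibana UI (the host, the index pattern, and the request_uri.keyword sub-field are assumptions based on the configuration above and ES default dynamic mapping), this counts the top 10 URIs returning status 404:

curl -s 'http://10.82.9.205:9200/rsyslog-nginx-*/_search' -H 'Content-Type: application/json' -d '
{
  "size": 0,
  "query": { "term": { "status": 404 } },
  "aggs": {
    "top_404_uri": { "terms": { "field": "request_uri.keyword", "size": 10 } }
  }
}'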
Final Words
- The Nginx access log is an absolute treasure for a website: changes in log volume reveal traffic trends; analyzing status codes tells us how reliable the service we provide is; tracking specific campaign URLs gives a real-time view of a campaign's popularity; and combining conditions can provide advice and help for site operations, making the site more user-friendly and easier to use
- A single-point Rsyslog service can be made highly available by deploying multiple Rsyslog instances behind layer-3 load balancing. In our experience, though, the Rsyslog service is very stable: it has run for over a year, handling roughly 200,000 logs per minute, without a single outage. If you don't want that complexity, you can write a script that checks the Rsyslog service status in the background and automatically pulls it back up if it goes down (a minimal sketch follows this list)
- We used the UDP protocol throughout. First, Nginx's syslog mode supports only UDP by default; I searched the official site and found no way to use TCP, which I think also reflects UDP's much better performance compared with TCP. Second, with TCP, network instability could lead to endless retries or waits that affect Nginx's stability. As for content so long that it exceeds the Ethernet frame length, be aware that such messages may be truncated over UDP
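A minimal sketch of such a watchdog, assuming rsyslogd is managed with the service command (the script path, log path, and check interval are all hypothetical):

#!/bin/bash
# Hypothetical watchdog: restart rsyslog if the daemon disappears.
while true; do
    if ! pgrep -x rsyslogd > /dev/null; then
        echo "$(date '+%F %T') rsyslogd is down, restarting" >> /var/log/rsyslog_watchdog.log
        service rsyslog start
    fi
    sleep 10
done

Run it in the background with something like nohup /usr/local/bin/rsyslog_watchdog.sh & so it keeps checking after you log out.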
If you found this article helpful, please share it with more people. If you'd like to read more, check out the following articles:
- ELK: Building a MySQL Slow-Log Collection Platform
- Docker-based DevOps practices for small and medium teams