One, the problem scenario:
We have multiple access logs on Apache Server,apache that are delivered to Rsyslog Center and presented to users via Rsyslog client. In recent times, there have been many users who have feedback on their access log incomplete, there is a missing phenomenon.
Second, find the reasons and solutions:
1, deployment monitoring
We gave a rsyslog center a monitor, a 2min clock curl 5 times the specified URL, and then see if Rsyslog Center appears this log, the number of loss per 2min display in graphite. To allow us to have an intuitive control of the loss ratio of logs.
2, analysis and monitoring chart
The log is found to be lost, and the loss ratio is not fixed.
3, analysis log
A request is processed by which Apache, we have a field in log to record. Analysis of Rsyslog Center I monitored the log of the URL and found a strange phenomenon. For example: Monitoring found 4 consecutive hours of lost logs, and this four hours log in the processing of the monitoring URL does not appear a certain Apache server. And the next 2 hours, this Apache server appeared again. And then another Apache server is not in Rsyslog center. The log of the monitored URL is normal in the Apache server log, and there is no loss.
4, packet capture analysis
In the Rsyslog center through the capture packet analysis, found in the Lost log, in the Rsyslog center server grabbed the packet is able to catch, so at the time that the issue of lost logs and the server has a relationship.
5, according to analysis, that is rsyslog performance can not keep up with the loss of log, so do the following actions:
The main queue and action queue of rsyslog are tuned;
Rsyslog upgrade to the latest stable version;
The log of each Apache group is divided into another machine for processing;
The result: The problem remains, the log-off situation has not changed
6, parsing rsyslog error log
Rsyslog Client Side no error, all normal;
Rsyslog Server segment error is as follows:
Nov 16:02:01 90887294 bytes, max msg size is 4096, truncating ...
Nov 16:02:41 Framing Error in received TCP Message:delimiter are not SP, but have ASCII value 46.
Nov 16:02:49 uncompression of a message failed with return code-3-enable debug logging if you need further infor Mation. Message ignored.
Nov 16:03:13 Framing Error in received TCP Message:delimiter are not SP, but have ASCII value 45.
Nov 16:03:13 Framing Error in received TCP Message:delimiter are not SP, but have ASCII value 46.
Nov 16:03:24 Framing Error in received TCP Message:delimiter are not SP, but have ASCII value 35.
Nov 16:03:24 Framing Error in received TCP Message:delimiter are not SP but have ASCII value-27.
A strange phenomenon that should not occur:
(1) Log in unexpectedly there will be received oversize message error, our log is a single send, and a log is the maximum 2kB, less than maxmessagesize, should not appear oversize situation.
(2) Access log with <158> is found in Rsyslog Center, and the log appears as multiple lines next to each other. Looking at the size, the total size of a number of logs coincides with the maxmessagesize. And,<158> is part of the Rsyslog protocol, is the LOCAL3 info log, Rsyslog unexpectedly did not parse.
Find Rsyslog Error:
1, received oversize message appears
It is obvious that Rsyslog received the message to a single message greater than the maxmessagesize size, so truncated. Keep the maxmessagesize, the rest are discarded. A log of 90MB was discarded.
2, the Framing error in the received TCP Message:delimiter is the not SP and has ASCII value:
See this log error, according to source analysis, is because of the use of TCP Message with Octet-counter one way. See Rsyslog configuration, and supportoctetcountedframing This parameter, the default is on. If set to ' on ', The Legacy octed-counted framing (similar to RFC5425 framing) is activated.
Then we'll take a look at octed-counted framing:
By looking at the official documentation, we know that it is a workaround for TCP that sends multiple messages in a single message, and in this way, Rsyslog client sends this large message, The length of the message can be added to the first package, in the form of "length <158>message". This successfully solves the problem of sending multiple messages in a single message. If there is no space between length and <158>, it will report an error similar to Framing error in received TCP Message:delimiter are not sp.
Source Address:https://github.com/rsyslog/rsyslog/blob/a9133e3e8c48491eeb7b87ec27b780532aa04649/runtime/tcps_sess.c#L386
Rsyslog about octed-counted framing explanation:http://www.rsyslog.com/doc/v8-stable/configuration/modules/omfwd.html
Third, problem solving
Summarize all of the above information, and then according to the business scenario of our cloud platform hundreds of thousands of application. We infer that:
It is possible that a user's debug or info log contains the carriage return control \ n, and we rsyslog the client segment of the configuration file escapecontrolcharactersonreceive is off, that is, the control is not escaped, So the log containing the \ n control is sent to Rsyslog center. and Rsyslog Center is based on the carriage return control \ n to determine whether it is a log. If the user's log contains similar fields:get/123/id=123&\n90887294--sdf, then Rsyslog will take "get/123/id=123&" as a log, and 90887294 as the length of the next message , and there will be an error just Framing error . Rsyslog will wait to receive 90887294byte of data, and then determine that it is greater than maxmessagesize 4KB, then rsyslog only save the 4KB log from 90887294 onwards, the remaining 90887294-4*1024= 90883198byte of data is truncated and discarded. This has the error of received oversize message.
issues:https://github.com/rsyslog/rsyslog/issues/111
Workaround:
1, open the Rsyslog client's escapecontrolcharactersonreceive to on state. Avoid this problem by escaping the carriage return control
Impact: Converts the \ n in the user log to a three-bit octal number #012 that begins with #. (Basic ASCII table conversion)
2, turn off the Rsyslog center supportoctetcountedframing to off status. That is, octet data streams are not supported.
Impact: Log is truncated when there is \ n in the user log.
Weird question explanation:
1, the first question has been answered.
2, there is a continuous <158> log, because Rsyslog Center accepts 90887294 bytes of data, only 4KB data, and 4KB data may have more than one log, and each log is <158> start, Rsyslog does not escape <158> as part of the message body.
This article is from the "Good" blog, please be sure to keep this source http://leejia.blog.51cto.com/4356849/1714815
Analysis and resolution of Rsyslog Center's strange lost log problem