Extending the Log System: Flume LineDeserializer

Unless otherwise noted, the articles on this blog are original. If you reproduce this one, please cite the source: http://blog.csdn.net/yanghua_kobe/article/details/46595401

Continuing the discussion of our log system: as mentioned in the previous article, we chose Flume NG for log collection. The application writes its logs to its own log file or to a specified folder (log files roll over daily), and a Flume agent then collects from those log files.

Deserializer Introduction

Flume abstracts each log record as an event. We collect logs from log files using a customized version of SpoolDirectorySource (our version supports collecting appended writes to the current day's log file). Converting each log record from the source into an event requires a deserializer. Every deserializer used by a Flume source must implement the EventDeserializer interface, which defines the readEvent/readEvents methods for reading events from the various log sources.

Flume mainly provides two deserializers:

(1) AvroEventDeserializer: a deserializer that parses Avro container files. It generates one Flume event per record in the Avro file and stores the Avro-encoded binary record in the event body.

(2) LineDeserializer: a deserializer for text log files that treats each line, terminated by "\n", as a single log record.

LineDeserializer's Flaws

In most cases SpoolDirectorySource is used together with LineDeserializer. However, when a single log record spans multiple lines, such as an exception stack trace or a message that itself contains "\n" characters, defining a record as one line no longer works. Consider a record like this:

[2015-06-22 13:14:28,780] [ERROR] [SysName] [Subsys or component] [Thread-9] [com.messagebus.client.handler.common.CommonLoopHandler]-*-stacktrace-*-: com.rabbitmq.client.ShutdownSignalException: clean channel shutdown; protocol method: #method<channel.close>(reply-code=200, reply-text=OK, class-id=0, method-id=0)
    at com.rabbitmq.client.QueueingConsumer.handle(QueueingConsumer.java:203)
    at com.rabbitmq.client.QueueingConsumer.nextDelivery(QueueingConsumer.java:220)
    at com.messagebus.client.handler.common.CommonLoopHandler.handle(CommonLoopHandler.java:34)
    at com.messagebus.client.handler.consume.ConsumerDispatchHandler.handle(ConsumerDispatchHandler.java:17)
    at com.messagebus.client.handler.MessageCarryHandlerChain.handle(MessageCarryHandlerChain.java:72)
    at com.messagebus.client.handler.consume.RealConsumer.handle(RealConsumer.java:44)
    at com.messagebus.client.handler.MessageCarryHandlerChain.handle(MessageCarryHandlerChain.java:72)
    at com.messagebus.client.handler.consume.ConsumerTagGenerator.handle(ConsumerTagGenerator.java:22)
    at com.messagebus.client.handler.MessageCarryHandlerChain.handle(MessageCarryHandlerChain.java:72)
    at com.messagebus.client.handler.consume.ConsumePermission.handle(ConsumePermission.java:37)
    at com.messagebus.client.handler.MessageCarryHandlerChain.handle(MessageCarryHandlerChain.java:72)
    at com.messagebus.client.handler.consume.ConsumeParamValidator.handle(ConsumeParamValidator.java:17)
    at com.messagebus.client.handler.MessageCarryHandlerChain.handle(MessageCarryHandlerChain.java:72)
    at com.messagebus.client.carry.GenericConsumer.run(GenericConsumer.java:50)
    at java.lang.Thread.run(Thread.java:744)
Caused by: com.rabbitmq.client.ShutdownSignalException: clean channel shutdown; protocol method: #method<channel.close>(reply-code=200, reply-text=OK, class-id=0, method-id=0)

Of course, you could also post-process the log content so that each record is written on a single line, but that means customizing the logging framework, which is sometimes outside your control. So the better option here is to customize the log collector.

Locating the Problem in the Source

Let's look at the core implementation of LineDeserializer in the Flume source code:

  private String readLine() throws IOException {
    StringBuilder sb = new StringBuilder();
    int c;
    int readChars = 0;
    while ((c = in.readChar()) != -1) {
      readChars++;

      // FIXME: support \r\n
      if (c == '\n') {
        break;
      }

      sb.append((char) c);

      if (readChars >= maxLineLength) {
        logger.warn("Line length exceeds max ({}), truncating line!",
            maxLineLength);
        break;
      }
    }

    if (readChars > 0) {
      return sb.toString();
    } else {
      return null;
    }
  }

First a StringBuilder is constructed, then characters are read one at a time. If the character is the newline "\n", the record is considered complete and the loop exits; otherwise the character is appended to the StringBuilder. The number of characters read is counted as well: once it reaches the configured maximum line length, the loop also exits.

The main problem is the rule that treats the newline "\n" as the end of a record. To handle exception logs, we need another way to decide where a log record ends.
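To make the problem concrete, here is a minimal, self-contained sketch in plain Java with no Flume dependency. It splits text at every "\n", which is roughly what a line-based deserializer does. The class name LineSplitDemo and the shortened sample records are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class LineSplitDemo {

    // Splits text into records at every '\n', mimicking line-based reading.
    public static List<String> splitByNewline(String text) {
        List<String> records = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n') {
                records.add(sb.toString());
                sb.setLength(0);
            } else {
                sb.append(c);
            }
        }
        if (sb.length() > 0) {
            records.add(sb.toString()); // last line without a trailing newline
        }
        return records;
    }

    public static void main(String[] args) {
        // Two logical records: one ERROR with a stack trace, one INFO.
        String log =
            "[2015-06-22 13:14:28,780] [ERROR] [Thread-9] ShutdownSignalException: clean channel shutdown\n"
          + "\tat com.rabbitmq.client.QueueingConsumer.handle(QueueingConsumer.java:203)\n"
          + "\tat java.lang.Thread.run(Thread.java:744)\n"
          + "[2015-06-22 13:14:29,001] [INFO] [Thread-9] reconnected\n";
        for (String record : splitByNewline(log)) {
            System.out.println("event: " + record);
        }
    }
}
```

Each stack-trace line comes out as its own "event", so the exception is no longer tied to the record that produced it.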

Solution Ideas

Given that we use [] to delimit the fields of a log record, almost every record starts with "[". Our approach is therefore: on reading a newline "\n", read one more character. If that character is "[", the record is an ordinary one that has just ended, so we step back one character (we read one too many, so the pointer has to return to its previous position) and exit the loop. If it is not "[", the record is treated as an exception log or some other multi-line log, so we keep reading and repeat the same check at the next newline. If your log format does not lend itself to this (for example, records do not start with a fixed character), you may need to configure the log appender to write a particular symbol at the end of each record and detect the end of a record by that symbol. In some cases, matching with a regular expression is also an option.
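The regular-expression variant mentioned above can be sketched as follows. This is only an illustration under stated assumptions: it works on text already held in memory rather than on a stream, it assumes every record starts with "[", and the class name RegexRecordSplitter is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class RegexRecordSplitter {

    // A newline ends a record only when the next character is '['.
    // (?=\[) is a lookahead, so the '[' stays with the following record.
    private static final Pattern RECORD_BOUNDARY = Pattern.compile("\\n(?=\\[)");

    public static List<String> split(String text) {
        List<String> records = new ArrayList<>();
        for (String part : RECORD_BOUNDARY.split(text)) {
            // Drop the newline left at the very end of the input, if any.
            String record = part.endsWith("\n")
                    ? part.substring(0, part.length() - 1)
                    : part;
            if (!record.isEmpty()) {
                records.add(record);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String log = "[t1] [ERROR] boom\n\tat a.B.c(B.java:1)\nCaused by: x\n[t2] [INFO] ok\n";
        for (String record : split(log)) {
            System.out.println("--- record ---");
            System.out.println(record);
        }
    }
}
```

The lookahead keeps the stack-trace lines attached to the record that precedes them, because a newline followed by a tab or by "Caused by" is not treated as a boundary.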

Custom Implementation

To keep this extensible, we made the look-ahead character configurable and named the setting newLineStartPrefix. We created a new deserializer class, MultiLineDeserializer. Most of its logic is the same as LineDeserializer; the main change is a re-implementation of the readLine method shown above, as follows:

    private String readLine() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        int readChars = 0;
        while ((c = in.readChar()) != -1) {
            readChars++;

            // FIXME: support \r\n
            if (c == '\n') {
                // read one character ahead
                c = in.readChar();
                if (c == -1) {
                    break;
                } else if (c == this.newLineStartPrefix) {
                    // step back one position so the prefix is read again as
                    // the start of the next record (this assumes the prefix
                    // character occupies a single byte in the file encoding)
                    long currentPosition = in.tell();
                    in.seek(currentPosition - 1);
                    break;
                }
                // otherwise the record continues; note that the '\n' itself
                // is not appended, only the character that follows it
            }

            sb.append((char) c);

            if (readChars >= maxLineLength) {
                logger.warn("Line length exceeds max ({}), truncating line!",
                        maxLineLength);
                break;
            }
        }

        if (readChars > 0) {
            return sb.toString();
        } else {
            return null;
        }
    }
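The same look-ahead idea can be tried outside Flume. The following is a minimal sketch, not the MultiLineDeserializer itself: it assumes java.io.PushbackReader in place of Flume's input stream, and the class name LookAheadLineReader and the helper readAll are made up for illustration. Unlike the method above, it keeps the newlines inside a multi-line record, which is an assumption about the desired output rather than something the original code does.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class LookAheadLineReader {

    private final PushbackReader in;
    private final char newLineStartPrefix;
    private final int maxLineLength;

    public LookAheadLineReader(String text, char newLineStartPrefix, int maxLineLength) {
        this.in = new PushbackReader(new StringReader(text), 1);
        this.newLineStartPrefix = newLineStartPrefix;
        this.maxLineLength = maxLineLength;
    }

    // Returns the next logical record, or null at end of input.
    public String readRecord() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        int readChars = 0;
        while ((c = in.read()) != -1) {
            readChars++;
            if (c == '\n') {
                int next = in.read();      // look one character ahead
                if (next == -1) {
                    break;                 // the newline was the last character
                }
                in.unread(next);           // always put the character back
                if (next == newLineStartPrefix) {
                    break;                 // the next record starts here
                }
                // otherwise the newline belongs to a multi-line record
            }
            sb.append((char) c);
            if (readChars >= maxLineLength) {
                break;                     // truncate over-long records
            }
        }
        return readChars > 0 ? sb.toString() : null;
    }

    // Convenience helper: reads every record from an in-memory string.
    public static List<String> readAll(String text, char prefix, int maxLineLength) {
        LookAheadLineReader reader = new LookAheadLineReader(text, prefix, maxLineLength);
        List<String> records = new ArrayList<>();
        try {
            String record;
            while ((record = reader.readRecord()) != null) {
                records.add(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        String log = "[t1] [ERROR] boom\n\tat a.B.c(B.java:1)\n[t2] [INFO] ok\n";
        for (String record : readAll(log, '[', 8192)) {
            System.out.println("--- record ---");
            System.out.println(record);
        }
    }
}
```

PushbackReader.unread plays the role of the tell/seek pair: it returns the look-ahead character to the stream, so no byte-position arithmetic is needed.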

A small digression: since we had already customized sources and sinks, I assumed a deserializer could be customized the same way, by specifying the fully qualified name of the custom deserializer in the agent's deserializer setting. When I tried it, however, this did not work and produced an error (I also could not find any documentation on custom deserializers on the official Flume site). So the only option was to extend the source code itself, recompile it, and rebuild the jar.

The source shows why extending the deserializer from a third-party package did not work. Clone the source from GitHub and open the following class in the flume-ng-core module: org.apache.flume.serialization.EventDeserializerType. It is clear at a glance:

  public enum EventDeserializerType {
    LINE(LineDeserializer.Builder.class),
    MULTILINE(MultiLineDeserializer.Builder.class),
    AVRO(AvroEventDeserializer.Builder.class),
    OTHER(null);

    private final Class<? extends EventDeserializer.Builder> builderClass;

    EventDeserializerType(Class<? extends EventDeserializer.Builder> builderClass) {
      this.builderClass = builderClass;
    }

    public Class<? extends EventDeserializer.Builder> getBuilderClass() {
      return builderClass;
    }
  }

You must explicitly add your deserializer to this enum and specify its Builder class; the enum name you define here is what you put in the agent's deserializer configuration item. We only need to create a new MultiLineDeserializer class in the serialization package, implement the logic, and then compile and package the flume-ng-core module to produce a new jar. In a Flume binary distribution, the jars built from each module are placed under the lib folder. Simply replace the original flume-ng-core jar with the repackaged one and restart the agent to see the effect.

One more thing to keep in mind: LineDeserializer has a parameter, maxLineLength, that defines the maximum number of characters in a log line. A line that reaches this length is truncated at that point. Because a record can now span multiple lines, and an exception stack trace is much longer than an ordinary log line, this value needs to be increased; you can set it to 8192, for example.
