Flume-based Log Collection System (Part 1): Architecture and Design
Question Guide:
1. Compared with Scribe, where does Flume-NG have advantages?
2. What issues should be considered in the architecture design?
3. How is an Agent crash handled?
4. Does a Collector crash affect the system?
5. What reliability measures does Flume-NG provide?
Meituan's log collection system is responsible for collecting all of Meituan's business logs and providing offline data to the Hadoop platform and real-time data streams to the Storm platform. It is designed and built on top of Flume.
This "Flume-based Log Collection System" series presents the architecture design and practical experience of Meituan's log collection system in two parts.
Part 1, Architecture and Design, focuses on the overall architecture of the log collection system and the reasoning behind the design.
Part 2, Improvement and Optimization, focuses on the problems encountered during actual deployment and use, and the functional modifications and optimizations made to Flume to address them.
1 Log Collection System Introduction
Log collection is the cornerstone of big data.
Many companies' business platforms generate large amounts of log data every day. Collecting these business logs for offline and online analysis systems is exactly what a log collection system does. High availability, high reliability and scalability are the basic characteristics a log collection system must have.
The commonly used open source log collection systems include Flume and Scribe. Flume is a highly available, highly reliable, distributed system for collecting, aggregating and transmitting massive amounts of log data; originally developed at Cloudera, it is now an Apache project. Scribe is Facebook's open source log collection system, which provides a scalable, highly fault-tolerant and simple solution for distributed log collection and processing.
2 Comparison of Common Open Source Log Collection Systems
The following compares the common open source log collection systems Flume and Scribe. For Flume, the comparison mainly uses Apache Flume-NG as the reference. We also divide a typical log collection system into three layers (Agent layer, Collector layer and Store layer) for the comparison.
Comparison item | Flume-NG | Scribe
Language | Java | C++
Fault tolerance | Between Agent and Collector and between Collector and Store, with three configurable levels of reliability guarantees | Between Agent and Collector and between Collector and Store
Load balancing | LoadBalance and Failover modes between Agent and Collector and between Collector and Store | None
Scalability | Good | Good
Agent richness | Rich agents, including Avro/Thrift sockets, text files, tail, etc. | Mainly a Thrift port
Store richness | Can write directly to HDFS, text, console, TCP; supports text and sequence formats with compression when writing to HDFS | Provides buffer, network, file (HDFS, text), etc.
Code structure | Well-structured framework with clear modules, easy to develop on | Simple code
3 Meituan Log Collection System Architecture
Meituan's log collection system is responsible for collecting all of Meituan's business logs and providing offline data to the Hadoop platform and real-time data streams to the Storm platform. It is designed and built on top of Flume, and currently collects and processes on the order of terabytes of log data per day.
The figure below shows the overall framework of Meituan's log collection system.
A. The whole system is divided into three layers: the Agent layer, the Collector layer and the Store layer. Each machine in the Agent layer runs one agent process responsible for collecting that machine's logs. The Collector layer is deployed on central servers and is responsible for receiving the logs sent by the Agent layer and writing them to the appropriate Store layer according to routing rules. The Store layer is responsible for providing permanent or temporary log storage, or for directing the log stream on to other servers.
B. Agents send to Collectors using a LoadBalance strategy that spreads all logs evenly across the Collectors, which achieves load balancing and also handles the failure of any single Collector.
C. The Collector layer has three main targets: SinkHdfs, SinkKafka and SinkBypass, providing offline data to HDFS and real-time log streams to Kafka and Bypass respectively. SinkHdfs is further divided into three sinks, SinkHdfs_b, SinkHdfs_m and SinkHdfs_s, according to log volume in order to improve write performance to HDFS, as described later.
D. In the Store layer, HDFS stores all logs permanently; Kafka stores the most recent 7 days of logs and provides real-time log streams to the Storm system; Bypass provides real-time log streams to other servers and applications.
The figure below is the module decomposition diagram of Meituan's log collection system, detailing the relationships between sources, channels and sinks inside the Agent, Collector and Bypass.
A. Module naming rules: all sources start with src, all channels start with ch, and all sinks start with sink.
B. Channels uniformly use the DualChannel developed in-house at Meituan (the reasons are detailed later); for logs that need to be filtered out, NullChannel is used (the reasons are detailed later).
C. Communication between modules uniformly uses the Avro interface.
4 Architectural Design Considerations
The following analyzes this architecture in terms of availability, reliability, scalability and compatibility.
4.1 Availability
For a log collection system, availability refers to the proportion of time, over a fixed period, during which the system runs without failure. To improve availability, we need to eliminate single points of failure in the system and increase its redundancy. The following describes the availability considerations of Meituan's log collection system.
4.1.1 The Agent dies
There are two cases of an agent dying: the machine crashes, or the agent process dies.
If the machine crashes, the processes that generate logs die as well, so no new logs are produced and the question of failing to provide service does not arise.
When only the agent process dies, the availability of the system is indeed reduced. In this case, we improve availability in three ways. First, all agents are started under supervise; if the process dies, it is restarted immediately to restore service. Second, all agents are monitored for liveness, and an alarm is raised immediately when an agent dies. Finally, for very important logs, it is recommended that the application write the logs directly to disk and that the agent pick up the newly generated log files with a SpoolDir source.
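For illustration, a minimal agent configuration along those lines might look as follows. This is only a sketch: the host names, paths and ports are placeholders rather than Meituan's actual settings, and the supervise wrapper is an external process supervisor that is not part of this file.

agent.sources = src_spool
agent.channels = ch_file
agent.sinks = sink_avro

# pick up log files the application has finished writing to local disk
agent.sources.src_spool.type = spooldir
agent.sources.src_spool.spoolDir = /data/applogs/ready
agent.sources.src_spool.channels = ch_file

# persist events on disk so an agent restart does not lose them
agent.channels.ch_file.type = file

# forward events to a collector over Avro
agent.sinks.sink_avro.type = avro
agent.sinks.sink_avro.hostname = collector01.example.com
agent.sinks.sink_avro.port = 4545
agent.sinks.sink_avro.channel = ch_file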
4.1.2 Collector dead.
Because the hub server provides a peer-to-peer and undifferentiated service, the agent accesses the collector and makes a loadbalance and retry mechanism. So when a collector fails to provide a service, the agent's retry policy sends the data to the other available collector. So the entire service is unaffected.
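As a sketch, this agent-side behaviour corresponds to a Flume sink group with a load-balancing processor; the collector host names and ports below are placeholders.

agent.sinks = sink_c1 sink_c2
agent.sinkgroups = g1
agent.sinkgroups.g1.sinks = sink_c1 sink_c2
# spread events across collectors and back off from failed ones
agent.sinkgroups.g1.processor.type = load_balance
agent.sinkgroups.g1.processor.backoff = true
agent.sinkgroups.g1.processor.selector = round_robin

agent.sinks.sink_c1.type = avro
agent.sinks.sink_c1.hostname = collector01.example.com
agent.sinks.sink_c1.port = 4545
agent.sinks.sink_c1.channel = ch_file

agent.sinks.sink_c2.type = avro
agent.sinks.sink_c2.hostname = collector02.example.com
agent.sinks.sink_c2.port = 4545
agent.sinks.sink_c2.channel = ch_file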
4.1.3 HDFS planned downtime
We provide a switch option in the collector's HdfsSink that makes the collector stop writing to HDFS and cache all events into the FileChannel; this covers planned HDFS downtime.
4.1.4 HDFS crashes or becomes unreachable
If HDFS crashes or becomes unreachable, the collector can no longer write to HDFS. Because we use DualChannel, the collector can cache the events it receives into the FileChannel, persist them on disk and keep providing service. Once HDFS recovers, the events cached in the FileChannel are sent on to HDFS. This mechanism is similar to Scribe's and provides good fault tolerance.
4.1.5 The Collector slows down, or the Agent/Collector network slows down
If the collector process slows down (for example because machine load is too high) or the network between agent and collector slows down, the agent may not be able to send data to the collector fast enough. In this case we likewise use DualChannel, but on the agent side: the agent caches the events it receives into the FileChannel, persists them on disk and keeps providing service. Once the collector recovers, the events cached in the FileChannel are sent to the collector.
4.1.6 HDFS slows down
When Hadoop is running many jobs with a large number of read and write operations, reading from and writing to HDFS often becomes slow. This is common, since there are peak usage periods every week.
We also use DualChannel to handle HDFS slowing down. When HDFS writes quickly, all events pass only through the MemoryChannel, which reduces disk I/O and gives higher performance. When HDFS writes slowly, all events pass only through the FileChannel, which provides a large data cache.
4.2 Reliability
For a log collection system, reliability means that Flume guarantees the reliable delivery of events while the data stream is in transit.
In Flume, all events are stored in an agent's channel and then sent to the next agent in the data flow or to the final storage service. When are the events in an agent's channel deleted? Only when they have been saved into the next agent's channel or into the final storage service. This is the basic single-hop message delivery semantics with which Flume provides point-to-point reliability in a data flow.
So how does Flume implement this basic message delivery semantics?
First, transactional exchange between agents. Flume uses transactions to guarantee the reliable delivery of events. Sources and sinks are wrapped in transactions provided by the store that holds the events or by the channel, which ensures that events are delivered reliably from point to point in the data flow. In a multi-hop data flow, as shown in the figure below, the sink of the upstream agent and the source of the downstream agent both run inside transactions, guaranteeing that data moves reliably from one channel to the next.
Second, channel persistence in the data flow. In Flume, a MemoryChannel can lose data (when the agent dies), whereas a FileChannel is persistent and provides a logging mechanism similar to MySQL's to ensure that data is not lost.
4.3 Scalability
For a log collection system, scalability means the system can scale linearly: when log volume grows, capacity can be added simply by adding machines.
For a Flume-based log collection system, every layer of the design must be able to scale its service linearly. The scalability of each layer is described below.
4.3.1 Agent Layer
At the agent layer, each machine deploys one agent, so the layer can scale horizontally without limit. On the one hand, an agent's log collection capacity is bounded by the performance of its machine, and normally one agent is enough to serve a single machine. On the other hand, with many machines the throughput may be bounded by the backend collectors, but since traffic from agents to collectors is load balanced, the collector layer can be expanded linearly to raise capacity.
4.3.2 Collector Layer
At the collector layer, traffic from agents to collectors is load balanced and the collectors provide undifferentiated service, so the layer can scale linearly. Its performance is mainly bounded by what the store layer can sustain.
4.3.3 Store Layer
At the store layer, HDFS and Kafka are both distributed systems that can scale linearly. Bypass is a temporary application that serves only certain categories of logs, so its performance is not a bottleneck.
4.4 Channel Selection
Flume 1.4.0 officially provides MemoryChannel and FileChannel to choose from. Their advantages and disadvantages are as follows:
MemoryChannel: all events are kept in memory. The advantage is high throughput. The disadvantages are limited capacity and that the data in memory is lost if the agent dies.
FileChannel: all events are kept in files. The advantages are large capacity and that data can be recovered after a crash. The disadvantage is that it is slow.
The two channels have opposite strengths and weaknesses, each suited to different scenarios. For most applications, however, we want the channel to provide both high throughput and a large cache. Based on this we developed DualChannel.
DualChannel: built on top of MemoryChannel and FileChannel. When the number of events accumulated in the channel is below a threshold, all events are kept in the MemoryChannel and the sink reads from the MemoryChannel. When the number of events accumulated in the channel exceeds the threshold, all events are automatically stored in the FileChannel and the sink reads from the FileChannel. In this way we benefit from the MemoryChannel's high throughput when the system runs normally, and from the FileChannel's large cache when the system encounters problems.
4.5 Compatibility with Scribe
From the beginning of the design we required every type of log to have a corresponding category, and the Flume agent provides both an AvroSource and a ScribeSource service. This keeps the system compatible with the previous Scribe-based setup and reduces the cost of change for the business teams.
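A sketch of exposing both services on the agent is shown below; the port numbers are placeholders, and the ScribeSource class ships as an optional Flume module, so verify the class name and its parameters against your Flume build.

agent.sources = src_avro src_scribe

agent.sources.src_avro.type = avro
agent.sources.src_avro.bind = 0.0.0.0
agent.sources.src_avro.port = 4141
agent.sources.src_avro.channels = ch_file

agent.sources.src_scribe.type = org.apache.flume.source.scribe.ScribeSource
agent.sources.src_scribe.port = 1463
agent.sources.src_scribe.channels = ch_file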
4.6 Permission Control
In the current log collection system we use only the simplest permission control: only registered categories may enter the storage system, so the current permission control amounts to category filtering.
If the permission control were placed on the agent side, the advantage would be better control over garbage data entering the system. The disadvantage is that configuration changes are troublesome: every newly added log would require restarting the agent or reloading its configuration.
If the permission control is placed on the collector side, the advantage is that configuration changes and reloads are easy. The disadvantage is that some unregistered data may travel between agent and collector.
Considering that log transfer between agent and collector is not a system bottleneck, and that the current log collection system is an internal system in which security is a secondary concern, we chose collector-side control.
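The article does not show how the category filtering is implemented. One plausible way to do it on the collector side is a custom Flume interceptor that drops events whose category header is not registered; the sketch below assumes the category travels in the "category" event header and reads a hypothetical comma-separated "categories" property.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

public class CategoryWhitelistInterceptor implements Interceptor {
    private final Set<String> allowedCategories;

    public CategoryWhitelistInterceptor(Set<String> allowedCategories) {
        this.allowedCategories = allowedCategories;
    }

    @Override
    public void initialize() { }

    @Override
    public Event intercept(Event event) {
        String category = event.getHeaders().get("category");
        // returning null drops the event
        return (category != null && allowedCategories.contains(category)) ? event : null;
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        List<Event> kept = new ArrayList<Event>();
        for (Event e : events) {
            if (intercept(e) != null) {
                kept.add(e);
            }
        }
        return kept;
    }

    @Override
    public void close() { }

    public static class Builder implements Interceptor.Builder {
        private Set<String> allowed;

        @Override
        public void configure(Context context) {
            // hypothetical "categories" property: a comma-separated whitelist
            allowed = new HashSet<String>(
                    Arrays.asList(context.getString("categories", "").split(",")));
        }

        @Override
        public Interceptor build() {
            return new CategoryWhitelistInterceptor(allowed);
        }
    }
}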
4.7 Providing a Real-Time Stream
Some businesses, such as real-time recommendation and anti-crawler services, need to process real-time data streams. So we want Flume to be able to export a real-time stream to the Kafka/Storm system.
A very important requirement is that the real-time data stream must not be slowed down by the speed of other sinks. For this we isolate the streams by configuring separate channels in the collector, and DualChannel's large capacity ensures that log processing is not held back by slow sinks.
5 System Monitoring
Monitoring is essential for a large, complex system. With well-designed monitoring we can know at any time, with nothing more than a mobile phone, whether the system is operating normally. For Meituan's log collection system we have built multi-dimensional monitoring to guard against unknown anomalies.
5.1 Send speed, congestion and HDFS write speed
By sending this data to Zabbix we can chart the send volume, the congestion and the HDFS write speed, and for unexpected congestion we raise an alarm and investigate the cause.
Below is a chart of the Flume collector's HdfsSink writing data to HDFS:
Below is a chart of the number of events in the Flume collector's FileChannel:
5.2 Monitoring the state of Flume writing to HDFS
Flume first writes to HDFS as .tmp files. For particularly important logs we check, roughly every 15 minutes, whether each collector has produced its .tmp files; for any collector or log that has not produced a .tmp file as expected, we investigate. This lets us discover Flume and log anomalies in time.
5.3 Log size anomaly monitoring
For important logs we monitor, hour by hour, whether the log volume fluctuates sharply compared with the same hour a week earlier, and raise an alert when it does. These alerts have effectively caught anomalous logs, and several times caught anomalies in the applications sending the logs; we fed this back to the owners promptly and helped them fix problems in their systems early.
From the above it can be seen that the Flume-based Meituan log collection system is a distributed service with high availability, high reliability and scalability.
Flume-based Log Collection System (Part 2): Improvement and Optimization
Question Guide:
1. What problems did Flume have?
2. What features were added on top of open source Flume?
3. How was the Flume system tuned?
1 Summary of Flume Problems
The main problems encountered while using Flume are as follows:
A. Channel "acclimatization": using a fixed-size MemoryChannel often raised queue-size-insufficient exceptions at log peaks, while using FileChannel caused I/O-busy problems.
B. HdfsSink performance: writing logs to HDFS with HdfsSink was slow at peak times.
C. System management: configuration upgrades, module restarts, and so on.
2 Flume Improvements and Optimizations
As the problems above show, native Flume could not meet some of our requirements, so on top of the open source version we added features, fixed bugs and tuned Flume. Some key points are described below.
2.1 Adding a Zabbix Monitoring Service
On the one hand, Flume itself provides HTTP and Ganglia monitoring services, whereas we mainly use Zabbix for monitoring. We therefore added a Zabbix monitoring module to Flume and integrated it seamlessly with the SA team's monitoring service.
On the other hand, we trimmed the Flume metrics, sending only the metrics we need to Zabbix to avoid putting pressure on the Zabbix server. What we currently care about most is whether Flume delivers the application-side logs to HDFS promptly; the corresponding metrics are:
Source: the number of events received and the number of events processed
Channel: the number of events backlogged in the channel
Sink: the number of events processed
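For reference, the stock HTTP and Ganglia reporting mentioned above is enabled with JVM system properties when the agent is started; for example (host and port are placeholders):

# JSON metrics served over HTTP
-Dflume.monitoring.type=http -Dflume.monitoring.port=34545
# or Ganglia reporting
-Dflume.monitoring.type=ganglia -Dflume.monitoring.hosts=gmond01.example.com:8649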
2.2 Adding automatic index creation to HdfsSink
First, the files our HdfsSink writes to Hadoop are stored with LZO compression. HdfsSink reads the list of codec classes from the Hadoop configuration file and uses whichever compression codec is configured; we currently compress the data with LZO. Choosing LZO rather than bz2 compression is based on the following test data:
Event size (Byte) | sink.batch-size / hdfs.batchSize | Compression format | Total data size (GB) | Time (s) | Average events/s | Compressed size (GB)
544 | 10000 | bz2 | 9.1 | 2448 | 6833 | 1.36
544 | 10000 | lzo | 9.1 | 612 | 27333 | 3.49
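For reference, a sketch of the corresponding HdfsSink compression settings, using the standard HdfsSink parameters (the LZO codec must be available on the Hadoop classpath and registered in the Hadoop configuration):

lc.sinks.sink_hdfs.hdfs.fileType = CompressedStream
lc.sinks.sink_hdfs.hdfs.codeC = lzo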
Second, our HdfsSink adds automatic index creation after an LZO file is closed. Hadoop provides an index mechanism for LZO so that compressed files become splittable, allowing Hadoop jobs to process the data files in parallel. HdfsSink itself supports LZO compression, but the LZO files it writes are not indexed, so we added an indexing step after the file is closed.
/**
 * Rename bucketPath file from .tmp to permanent location.
 */
private void renameBucket() throws IOException, InterruptedException {
    if (bucketPath.equals(targetPath)) {
        return;
    }
    final Path srcPath = new Path(bucketPath);
    final Path dstPath = new Path(targetPath);
    callWithTimeout(new CallRunner<Object>() {
        @Override
        public Object call() throws Exception {
            if (fileSystem.exists(srcPath)) { // could block
                LOG.info("Renaming " + srcPath + " to " + dstPath);
                fileSystem.rename(srcPath, dstPath); // could block
                // index the dstPath lzo file
                if (codeC != null && ".lzo".equals(codeC.getDefaultExtension())) {
                    LzoIndexer lzoIndexer = new LzoIndexer(new Configuration());
                    lzoIndexer.index(dstPath);
                }
            }
            return null;
        }
    });
}
2.3 Adding a switch to HdfsSink
We added switches to HdfsSink and DualChannel. When the switch is turned on, HdfsSink stops writing to HDFS, and the data is written only to the FileChannel inside DualChannel. This strategy is used during planned HDFS maintenance downtime.
2.4 Adding DualChannel
Flume itself provides MemoryChannel and FileChannel. MemoryChannel is fast, but its cache is limited in size and not persistent; FileChannel is exactly the opposite. We want to combine the advantages of both: when the sink keeps up and the channel has not accumulated many logs, use MemoryChannel; when the sink is slow and the channel has to cache the logs sent by the applications, use FileChannel. So we developed DualChannel, which switches intelligently between the two channels.
Its specific logic is as follows:
/*
 * putToMemChannel indicates whether events are put into memChannel or fileChannel
 * takeFromMemChannel indicates whether events are taken from memChannel or fileChannel
 */
private AtomicBoolean putToMemChannel = new AtomicBoolean(true);
private AtomicBoolean takeFromMemChannel = new AtomicBoolean(true);

void doPut(Event event) {
    if (switchon && putToMemChannel.get()) {
        // write the event into memChannel
        memTransaction.put(event);
        // threshold is the configured queue-size limit for staying in memChannel
        if (memChannel.isFull() || fileChannel.getQueueSize() > threshold) {
            putToMemChannel.set(false);
        }
    } else {
        // write the event into fileChannel
        fileTransaction.put(event);
    }
}

Event doTake() {
    Event event = null;
    if (takeFromMemChannel.get()) {
        // take an event from memChannel
        event = memTransaction.take();
        if (event == null) {
            takeFromMemChannel.set(false);
        }
    } else {
        // take an event from fileChannel
        event = fileTransaction.take();
        if (event == null) {
            takeFromMemChannel.set(true);
            putToMemChannel.set(true);
        }
    }
    return event;
}
2.5 Adding NullChannel
Flume provides NullSink, which discards unneeded logs directly without storing them. However, the source must first store the events in a channel before NullSink can take them out and discard them. To improve performance, we moved this step forward into the channel itself, so we developed NullChannel.
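The article does not show the NullChannel code. A minimal sketch, assuming Flume-NG's BasicChannelSemantics base classes, could simply discard everything it is offered:

import org.apache.flume.Event;
import org.apache.flume.channel.BasicChannelSemantics;
import org.apache.flume.channel.BasicTransactionSemantics;

public class NullChannel extends BasicChannelSemantics {

    @Override
    protected BasicTransactionSemantics createTransaction() {
        return new NullTransaction();
    }

    private static class NullTransaction extends BasicTransactionSemantics {
        @Override
        protected void doPut(Event event) {
            // drop the event immediately instead of buffering it
        }

        @Override
        protected Event doTake() {
            // nothing is ever stored, so there is nothing to take
            return null;
        }

        @Override
        protected void doCommit() { }

        @Override
        protected void doRollback() { }
    }
}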
2.6 Adding KafkaSink
To support providing real-time data streams to Storm, we added KafkaSink to write the real-time stream to Kafka. Its basic logic is as follows:
public class KafkaSink extends AbstractSink implements Configurable {
    private String zkConnect;
    private Integer zkTimeout;
    private Integer batchSize;
    private Integer queueSize;
    private String serializerClass;
    private String producerType;
    private String topicPrefix;

    private Producer producer;

    @Override
    public void configure(Context context) { /* read and validate the configuration */ }

    @Override
    public synchronized void start() { /* initialize the producer */ }

    @Override
    public synchronized void stop() { /* close the producer */ }

    @Override
    public Status process() throws EventDeliveryException {
        Status status = Status.READY;
        Channel channel = getChannel();
        Transaction tx = channel.getTransaction();
        try {
            tx.begin();
            // take up to batchSize events from the channel;
            // obtain the category from each event's header, generate the topic,
            // and group the events into a map keyed by topic;
            // send the data in the map to Kafka through the producer
            tx.commit();
        } catch (Exception e) {
            tx.rollback();
            throw new EventDeliveryException(e);
        } finally {
            tx.close();
        }
        return status;
    }
}
2.7 Fixing a compatibility issue with Scribe
When scribed sends packets larger than 4096 bytes to Flume through ScribeSource, it first sends an empty dummy packet to check that the server is responsive. Flume's ScribeSource, however, returned TRY_LATER for packets with LogEntry.size() == 0; scribed treated this as an error and disconnected, then retried in a loop and never actually sent the data. ScribeSource's Thrift interface now returns OK for size-0 packets, so the data is sent normally.
3 Flume System Tuning Experience
3.1 Basic parameter tuning experience
HdfsSink's default serializer appends a newline at the end of each line; since our logs already carry their own newline, this produced a blank line after every log entry. We changed the configuration so that newlines are not appended automatically:
lc.sinks.sink_hdfs.serializer.appendNewline = false
Increase the capacity of the MemoryChannel to make full use of its fast processing.
Increase the batchSize of HdfsSink to improve throughput and reduce the number of flushes to HDFS.
Increase HdfsSink's callTimeout appropriately to avoid unnecessary timeout errors.
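Taken together, these knobs correspond roughly to the following configuration sketch (the values are illustrative, not our production settings):

lc.channels.ch_mem.type = memory
lc.channels.ch_mem.capacity = 1000000

lc.sinks.sink_hdfs.hdfs.batchSize = 10000
lc.sinks.sink_hdfs.hdfs.callTimeout = 60000
lc.sinks.sink_hdfs.serializer.appendNewline = false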
3.2 Optimizing how HdfsSink obtains the filename
HdfsSink's path parameter specifies where logs are written in HDFS; it can reference escape (formatting) parameters so that logs are written to dynamic directories, which makes log management easier. For example, we write logs into per-category directories, stored by day and by hour:
lc.sinks.sink_hdfs.hdfs.path = /user/hive/work/orglog.db/%{category}/dt=%Y%m%d/hour=%H
For every event it handles, HdfsSink determines from the configuration which HDFS path and filename the event should be written to. The default implementation replaces the variables in the configuration by regular-expression matching to obtain the real path and filename. Because this is done for every event, it is time-consuming: in our tests, this operation alone took about 6 to 8 seconds for 200,000 logs.
Since our paths and filenames follow a fixed pattern, they can instead be obtained by string concatenation, which is dozens of times faster than regular-expression matching. With string concatenation, the same operation on 200,000 logs takes only a few hundred milliseconds.
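The idea can be sketched as follows; the method name and parameters are illustrative, and the real code lives inside our modified HdfsSink:

// Build the per-event HDFS path by plain string concatenation instead of
// regex substitution of the escape sequences, since the pattern is fixed.
String buildPath(String category, String day, String hour) {
    return new StringBuilder("/user/hive/work/orglog.db/")
            .append(category)
            .append("/dt=").append(day)
            .append("/hour=").append(hour)
            .toString();
}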
3.3 HdfsSink b/m/s optimization
In our initial design, all logs were written to HDFS through a single channel and a single HdfsSink. Let's look at what is wrong with that.
First, let's look at the logic by which HdfsSink sends data:

// take up to batchSize events from the channel
for (txnEventCount = 0; txnEventCount < batchSize; txnEventCount++) {
    // append each log to the bucketWriter matching its category
    bucketWriter.append(event);
}

for (BucketWriter bucketWriter : writers) {
    // then call flush on each bucketWriter to flush its data to HDFS
    bucketWriter.flush();
}
Suppose our system has 100 categories and batchSize is set to 200,000. Then for every 200,000 events, append or flush operations have to be performed on 100 files.
Second, our logs basically follow the 80/20 rule: 20% of the categories produce 80% of the system's log volume. So for most categories, each batch of 200,000 events may contain only a few of their logs, yet still requires a flush to HDFS.
This makes HdfsSink extremely inefficient at writing to HDFS. The figure below shows the hourly send volume of a single channel and the time spent writing to HDFS.
Given this practical scenario, we classified the logs by volume into big, middle and small categories, so that small logs no longer get flushed frequently along with the large ones; the improvement is significant. The figure below shows the hourly send volume of the big queue after the split and the time spent writing to HDFS. A configuration sketch of such a split follows.
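One way such a split can be wired up is with a multiplexing channel selector on the collector's source; the category and channel names below are illustrative, and the actual Meituan routing rules are not shown in the article.

lc.sources.src_avro.channels = ch_hdfs_b ch_hdfs_m ch_hdfs_s
lc.sources.src_avro.selector.type = multiplexing
lc.sources.src_avro.selector.header = category
# high-volume categories go to the "big" queue, medium ones to "middle"
lc.sources.src_avro.selector.mapping.category_big_1 = ch_hdfs_b
lc.sources.src_avro.selector.mapping.category_mid_1 = ch_hdfs_m
# everything else falls into the "small" queue
lc.sources.src_avro.selector.default = ch_hdfs_s

lc.sinks.sink_hdfs_b.channel = ch_hdfs_b
lc.sinks.sink_hdfs_m.channel = ch_hdfs_m
lc.sinks.sink_hdfs_s.channel = ch_hdfs_s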
Flume (NG) custom sink implementation and property injection
Question Guide:
1. How do we implement a custom sink on the Flume side that saves logs according to our own rules?
2. How do we read a rootPath value from the Flume configuration file?
Recently I needed to use Flume to collect remote logs, so I learned some of the most basic Flume usage; only that is recorded here.
The overall idea of remote log collection is: on the remote side, a custom log4j appender sends the messages to the Flume side, and on the Flume side a custom sink saves the logs according to our rules.
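The client-side appender is not shown in this note. A roughly comparable setup using the Log4jAppender that ships with Flume's flume-ng-log4jappender module would look like this (host and port are placeholders):

log4j.rootLogger = INFO, flume
log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.Hostname = flume-agent.example.com
log4j.appender.flume.Port = 41414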
Custom Sink Code:
public class LocalFileLogSink extends AbstractSink implements Configurable {
    private static final Logger logger = LoggerFactory
            .getLogger(LocalFileLogSink.class);
    private static final String PROP_KEY_ROOTPATH = "rootPath";
    private String rootPath;

    public void setRootPath(String rootPath) {
        this.rootPath = rootPath;
    }

    @Override
    public void configure(Context context) {
        String rootPath = context.getString(PROP_KEY_ROOTPATH);
        setRootPath(rootPath);
    }

    @Override
    public Status process() throws EventDeliveryException {
        logger.debug("Do process.");
        // the body of process() is shown below
        return Status.READY;
    }
}
The sink implements the Configurable interface so that, during initialization, it can obtain the values of configured parameters from the context via the configure method. Here we want to read the rootPath value, i.e. the root path under which logs are saved, from the Flume configuration file. The following is configured in flume-conf.properties:
agent.sinks = loggerSink
agent.sinks.loggerSink.rootPath = ./logs
loggerSink is the name of the custom sink. When reading the value of the key, we only need the part after "loggerSink.", i.e. rootPath here.
The actual business logic is implemented by overriding the process method inherited from AbstractSink. The channel is obtained from the base class's getChannel method, and events can then be taken from it for processing.
Channel ch = getChannel();
Transaction txn = ch.getTransaction();
txn.begin();
try {
    logger.debug("Get event.");
    Event event = ch.take();
    // ... save the event under rootPath according to our rules ...
    txn.commit();
    status = Status.READY;
    return status;
} finally {
    logger.info("Trx close.");
    txn.close();
}