===================================
(Continued from: Architecture Design: Inter-system Communication (32) -- Other messaging middleware and scenario applications (2))
5-7, Solution Three: a non-intrusive scheme
In the previous two solutions, in order for the business system to integrate the log collection function, we had to write at least some code on the business system side. Although a well-designed code structure can reduce, or even completely isolate, the coupling between this code and the business code, the business development team still has to spend effort maintaining it, and whenever the business system is deployed, the configuration of this code has to be adjusted accordingly.
Here we introduce a non-intrusive log collection scheme. We all know that when a business system is accessed, it leaves traces of that access. For example, when a visitor opens a "Product Details" page (say its URL is A), the Nginx access log will contain a corresponding record for the request on port 80; and if the "Product Details" page is not completely static, the code running in the business service will also write the corresponding access information to its log4j file (assuming the developers use log4j). All we have to do is find a piece of software that collects these log messages and stores them in a suitable place, so that the data analysis platform can later use the data for analysis.
Of course, to ensure that the log information carries complete original attributes, the business system developers and the operations staff should agree in advance on a mutually acceptable log format, as well as the log file storage location, storage rules, and other details.
5-7-1, Flume Introduction
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application.
The text above is quoted from the Apache Flume official website (http://flume.apache.org/). The gist is that Flume is a distributed, highly reliable, highly available service for efficiently collecting and aggregating log data; its architecture is based on data flows and is simple and flexible. The non-intrusive log collection scheme we are about to introduce is based on Apache Flume.
Apache Flume is very simple to use, and the official user guide (http://flume.apache.org/FlumeUserGuide.html) is sufficient for you to understand how to use it and how it works, so this article does not specifically describe Flume's installation and basic usage, and instead tries to weave the use of Flume into the explanation of the example. If you want to learn more about Flume's design and implementation, I still recommend reading the Flume source code; the official user documentation points out several key implementation classes, and from these classes you can work out the various design patterns that Flume uses.
5-7-2, Solution design
Flume and the business service system work independently on their physical servers; at the operating-system level they are two separate processes with no direct relationship. Flume only monitors data flows, which can come from a file on the operating system, a specified network port, or an RPC service (such an origin is called a source in Flume). When the monitored file, network port, or RPC service produces new data, Flume transfers this data, according to a pre-written configuration, to a specified destination (called a sink in Flume). The destination can be a network address, a file system, or another piece of software. The buffer that carries the data flow between source and sink is called the channel.
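To make the relationship between these three concepts concrete, the following is a minimal, self-contained sketch of a Flume agent properties file (the netcat source, the port number, and the component names are assumptions chosen purely for illustration; the configuration actually used in this solution is given in section 5-7-3):

# one agent with one source, one channel and one sink (all names are arbitrary)
agent.sources = s1
agent.channels = c1
agent.sinks = t1

# source: listen on a local TCP port (netcat source, for demonstration only)
agent.sources.s1.type = netcat
agent.sources.s1.bind = 127.0.0.1
agent.sources.s1.port = 44444
agent.sources.s1.channels = c1

# channel: in-memory buffer that carries events from the source to the sink
agent.channels.c1.type = memory
agent.channels.c1.capacity = 1000

# sink: print every received event to the Flume log/console
agent.sinks.t1.type = logger
agent.sinks.t1.channel = c1

Every Flume configuration in the rest of this section follows exactly this three-part structure; only the types of source and sink change.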
The Apache Flume official website gives an example of a source and a sink: Flume uses an HTTP Source to receive data sent in over the HTTP protocol, and on the sink side uses an HDFS Sink to write the data taken from the channel into HDFS. Based on the Apache Flume working characteristics described above, we design log collection Solution Three along the following lines:
- The generation and collection of log data
The business system runs on the 140, 141 and 142 physical nodes and produces log4j files. Of course, you can also directly use the native log files of services such as JBoss or Tomcat as the log data source. In some scenarios we need to analyze the HTTP requests on a proxy service such as Nginx; in that case the Nginx access.log file can be used as the source of the log data. You can also monitor multiple files on each physical node at the same time, depending on your design needs.
Apache Flume is also installed on the 140, 141 and 142 physical nodes. Its task on each node is the same: read data changes from the monitored log file and send them through the configured channel to the specified sink. In our setup, each node monitors changes to the log4j file and, via a Thrift RPC sink connected to the channel, transfers the data to port 6666 on the remote server 192.168.61.138.
The physical node 192.168.61.138 is responsible for collecting, on port 6666 via Thrift RPC, the log data sent from the 140, 141 and 142 physical nodes, and for transferring it through a channel to a suitable storage target. That target might be HDFS, an MQ, some kind of object storage system (such as Ceph), or even the local file system.
5-7-3, Solution configuration process
As explained above, the main task of the Apache Flume on the 192.168.61.140 physical node is to monitor the log4j log file of the business service; when the log file produces new data, that data is sent through the channel configured in Flume to the specified sink. The configuration is as follows:
agent.sources = s1
agent.channels = c1
agent.sinks = t1

# source ===========================
# changes to the log4j.log file are used as the Flume source
agent.sources.s1.type = exec
agent.sources.s1.channels = c1
agent.sources.s1.command = tail -f /logs/log4j.log

# channel ==================================
# the channel that connects source and sink
agent.channels.c1.type = memory
# channel capacity (example value)
agent.channels.c1.capacity = 1000

# sink t1 ===================================
# data coming through the channel is sent via Thrift RPC to port 6666 on the 138 node
agent.sinks.t1.type = thrift
agent.sinks.t1.channel = c1
agent.sinks.t1.hostname = 192.168.61.138
agent.sinks.t1.port = 6666
The 192.168.61.141 and 192.168.61.142 physical nodes also host the business service, and the business service outputs its logs to the same log4j location, so the Apache Flume configuration on these two nodes is identical to the Apache Flume configuration on the 140 node above. The configuration of the other two physical nodes is therefore not repeated here.
It is also important to note the Linux tail command configured for agent.sources.s1.command. The tail command can show changes to a file as they happen, but if you give it only the -f option it starts from the last 10 lines of the file. With that configuration, when Flume starts it will treat the 10 lines already present in the log4j file as newly received log data, resulting in false sends.
To work around this problem, you can add the -n option and specify that monitoring should start from the very end of the file:
# You should use:
tail -f -n 0 /logs/log4j.log

# Note: the command "tail -f /logs/log4j.log" is equivalent to:
# tail -f -n 10 /logs/log4j.log
The Flume on the 192.168.61.138 node is used to collect the log data transmitted from nodes 140 to 142 via Thrift RPC. After this data is collected, the Flume on the 138 node stores it in a suitable location; possible locations include HDFS, HBase, the local file system, Apache Kafka, and so on.
agent.sources = s1
agent.channels = c1
agent.sinks = t1

# thrift ==================
# listen on port 6666 of this node via Thrift RPC in order to receive data
agent.sources.s1.type = thrift
agent.sources.s1.channels = c1
agent.sources.s1.bind = 0.0.0.0
agent.sources.s1.port = 6666

# sink HDFS ==============
# agent.sinks.t1.type = hdfs
# agent.sinks.t1.channel = c1
# agent.sinks.t1.hdfs.path = hdfs://ip:port/events/%y-%m-%d/%H%M/%S
# agent.sinks.t1.hdfs.filePrefix = events-
# agent.sinks.t1.hdfs.round = true
# agent.sinks.t1.hdfs.roundValue = 10
# agent.sinks.t1.hdfs.roundUnit = minute

# sink =====================
# to check that the whole configuration is correct, output to the console first
agent.sinks.t1.type = logger
agent.sinks.t1.channel = c1

# channel =================
agent.channels.c1.type = memory
# channel capacity (example value)
agent.channels.c1.capacity = 1000
In the configuration file above, in order to check whether the collection pipeline is set up correctly, we first use logger as the sink, so that the collected data is printed to the Flume console. The commented-out lines show the configuration that would use HDFS as the sink instead.
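To actually start the agent with the above configuration, the standard flume-ng launcher can be used; the configuration file name (collect.conf) and the agent name (agent) below are simply the ones assumed in this article's examples:

# start the aggregation agent on the 138 node and keep the logger sink output visible on the console
flume-ng agent \
    --conf ./conf \
    --conf-file ./conf/collect.conf \
    --name agent \
    -Dflume.root.logger=INFO,console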
5-8, Optimizing Solution Three
In Solution Three as described in the previous section, the weakest point is the 138 node, which carries the log data aggregation task. There is only one such aggregation node in the whole log collection architecture, and once the 138 node goes down for any reason, the whole architecture fails. Even if the 138 node works stably, it is still very likely to become a performance bottleneck, because it has to take in the log data from multiple physical nodes at the same time. So we need to find a way to strengthen this weak point of Solution Three.
5-8-1, High-availability modes supported by Flume
Fortunately, Apache Flume provides us with two very simple and practical high-availability modes: load_balance mode and failover mode. Both modes describe how multiple sinks work together:
- load_balance mode: this mode provides load balancing across multiple sinks. The load_balance processor maintains a list of active sinks and, based on this list, uses either a round_robin (round-robin) or random selection mechanism (round_robin by default) to distribute data across the sink group. These two options are sufficient for most cases; if you have special scheduling requirements, you can implement a custom selection mechanism by extending the AbstractSinkSelector class.
- failover mode: this mode provides failover capability across multiple sinks. The failover processor maintains two sink lists, a failover list and a live list. In failover mode, Flume prefers the sink with the highest priority as the primary sending target. When that sink keeps failing, Flume moves it into the failover list and assigns it a freeze time. After the freeze time has elapsed, Flume tries to send data through this sink again, and once a send succeeds, the sink is moved back into the live list.
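Although only load_balance mode is configured later in this article, a failover sink group is declared in much the same way. The sketch below shows the general shape of such a configuration; the sink names lt1/lt2 and the priority and maxpenalty values are assumptions chosen for illustration:

# group two sinks and let them work in failover mode
agent.sinkgroups = g1
agent.sinkgroups.g1.sinks = lt1 lt2
agent.sinkgroups.g1.processor.type = failover
# the sink with the higher priority value is preferred as the primary sending target
agent.sinkgroups.g1.processor.priority.lt1 = 10
agent.sinkgroups.g1.processor.priority.lt2 = 5
# maximum freeze time (in milliseconds) before a failed sink is tried again
agent.sinkgroups.g1.processor.maxpenalty = 10000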
To make sure the performance pressure on the data aggregation node can be shared, we use load_balance mode to further explain how to optimize the data aggregation node.
5-8-2, using load_balance mode
In this optimization of Solution Three, we use a new node (192.168.61.139) together with the original 138 node to form a group of load-sharing nodes, which jointly take on the task of aggregating the log data. The front-end log monitoring nodes (the 140, 141 and 142 nodes) then also need corresponding configuration file changes.
5-8-3, load_balance configuration process
- Modifying the 192.168.61.140 node
agent.sources = s1
agent.channels = c1
# two sinks
agent.sinks = lt1 lt2
agent.sinkgroups = g1

# source ===========================
# the data source is still new data in the log4j log file
agent.sources.s1.type = exec
agent.sources.s1.channels = c1
agent.sources.s1.command = tail -f -n 0 /logs/log4j.log

# sink lt1 ===================================
agent.sinks.lt1.type = thrift
agent.sinks.lt1.channel = c1
agent.sinks.lt1.hostname = 192.168.61.138
agent.sinks.lt1.port = 6666

# sink lt2 ==================================
agent.sinks.lt2.type = thrift
agent.sinks.lt2.channel = c1
agent.sinks.lt2.hostname = 192.168.61.139
agent.sinks.lt2.port = 6666

# channel ==================================
agent.channels.c1.type = memory
# channel capacity (example value)
agent.channels.c1.capacity = 1000

# sinkgroup ===============================
# group the two sinks lt1 and lt2 into one sink group and let them work in load_balance mode
agent.sinkgroups.g1.sinks = lt1 lt2
agent.sinkgroups.g1.processor.type = load_balance
agent.sinkgroups.g1.processor.backoff = true
agent.sinkgroups.g1.processor.selector = random
The configuration of the 141 and 142 log data monitoring nodes is the same as the configuration of the 140 node, so it is likewise not repeated here.
- Adding the 192.168.61.139 node
agent.sources = s1
agent.channels = c1
agent.sinks = t1

# thrift ==================
agent.sources.s1.type = thrift
agent.sources.s1.channels = c1
agent.sources.s1.bind = 0.0.0.0
agent.sources.s1.port = 6666

# sink =====================
agent.sinks.t1.type = logger
agent.sinks.t1.channel = c1

# channel =================
agent.channels.c1.type = memory
# channel capacity (example value)
agent.channels.c1.capacity = 1000
The Flume configuration on the newly added 139 node is identical to the Flume configuration on the original 138 node. This ensures that no matter which aggregation node the log data is sent to, it can be stored correctly.
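Since both aggregation nodes currently use the logger sink, the whole pipeline can be verified by appending a line to the monitored file on any front-end node and watching the consoles of the 138 and 139 nodes; the test text below is, of course, just an arbitrary example:

# on node 140 (or 141/142): simulate a new log entry being written by the business service
echo "test log line $(date)" >> /logs/log4j.log
# the event should then appear in the logger output of either the 138 or the 139 node,
# depending on which sink the load_balance processor selects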
5-9, Limitations of Solution Three
Solution Three also has its limitations: it is not suitable for an open log collection system. In other words, if your log collection system is meant to be like the "Baidu Webmaster Statistics Tool", designed from the start to be published on the Internet and used by all kinds of external websites, then this architecture, which relies on operating-system-level log changes and uses third-party software to complete the collection, is not applicable.
In addition, in Solution Three we use Thrift RPC for network communication. This approach can be used in a real production environment, but more configuration items need to be specified. The following two links list the configuration properties available when using Thrift as the source and as the sink, respectively:
http://flume.apache.org/FlumeUserGuide.html#thrift-source
http://flume.apache.org/FlumeUserGuide.html#thrift-sink
In addition to Thrift RPC, the author also recommends using Avro.
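For reference, switching the earlier example from Thrift to Avro only changes the type of the source and the sink; a sketch under that assumption (host, port and component names taken from the earlier examples) looks like this:

# on the front-end nodes: send events to the aggregation node via Avro instead of Thrift
agent.sinks.t1.type = avro
agent.sinks.t1.channel = c1
agent.sinks.t1.hostname = 192.168.61.138
agent.sinks.t1.port = 6666

# on the aggregation node: receive events through an Avro source
agent.sources.s1.type = avro
agent.sources.s1.channels = c1
agent.sources.s1.bind = 0.0.0.0
agent.sources.s1.port = 6666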
6, Scenario application -- Online game: bullet trajectory log feature
TODO: this is a teaser; subsequent articles will discuss it.
7, Preview of the next article
After "Architecture design: Inter-system Communication (--MQ): Message Protocol (top)" 14 articles, we basically introduced the basic knowledge of Message Queuing and the use of actual combat. We turn to the knowledge of ESB Enterprise service bus from below.
Architecture Design: Inter-system Communication (33)--Other message middleware and scenario applications (3)