First, Logstash
Logstash is a flexible data transport and processing system that was responsible for data collection before Beats appeared. Logstash's job is to take data of all kinds and, through configured transformation rules, feed it into Elasticsearch in a unified form. Written in Ruby (and running on the JVM via JRuby), Logstash is very flexible, but its performance has always been a point of criticism.
Because Logstash is not well suited to lightweight data collection and its performance as an agent falls short, Elastic released the Beats series of lightweight collection components. With Beats, Elastic completed its ecosystem and technology stack and established itself as a leader in the big data market.
Second, Elastic Stack Beats series
Beats is the family of lightweight, single-purpose data shippers in the Elastic Stack: each Beat collects one kind of data and pushes it to Logstash or Elasticsearch.
Beats Architecture:
Beats is a platform built with Golang. Its core library, libbeat, provides the APIs for connecting to Elasticsearch and Logstash, for configuring inputs, and for implementing data collection. It encapsulates an output module (the publisher) that is responsible for sending the collected data to Logstash or Elasticsearch. Because Go has channels built into the language, the data-collection logic and the publisher communicate through a channel, which keeps the coupling to a minimum: a collector can be developed without knowing anything about the publisher, and the data is still delivered to the server as if by magic. Libbeat also encapsulates configuration-file handling, logging, and daemonization, so developers only need to focus on extending their own Beat's collection logic.
Beats is a group of lightweight collectors; the ones we typically use are the following:
1) Filebeat: collects files and directories, mainly used for gathering log data.
2) Metricbeat: collects metrics, both from the operating system and from many middleware products; mainly used to monitor the performance of systems and software.
3) Packetbeat: captures network packets and analyzes protocols to monitor request/response communication between systems, collecting information that is hard to obtain in conventional ways.
4) Winlogbeat: collects data specifically from the Windows event log.
5) Heartbeat: checks connectivity between systems, e.g. ICMP, TCP, and HTTP availability monitoring.
6) You can also generate your own Beat with the Beat generator.
1. Filebeat
Filebeat is built on the Beats platform and is used in log-collection scenarios. It is the next-generation replacement for Logstash Forwarder: faster, more stable, lightweight, and low-overhead. It can conveniently send data to Logstash or directly to Elasticsearch; a minimal configuration sketch is shown at the end of this subsection.
1) Robustness
After an abnormal interruption and restart, Filebeat continues from the position where it last stopped (the log offset is recorded in the ${filebeat_home}/data/registry file).
2) Adaptive transmission speed, preventing Logstash/ES overload
Filebeat transmits data using a backpressure-sensitive protocol: when Logstash is busy, Filebeat slows down its read and transfer rate, and once Logstash recovers, Filebeat returns to its original speed.
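As an illustration, here is a minimal filebeat.yml sketch; the paths and hosts are placeholders, and the exact key names vary slightly between Filebeat versions (older releases use filebeat.prospectors instead of filebeat.inputs):

filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/*.log            # log files to tail

output.elasticsearch:             # or output.logstash for the Logstash route
  hosts: ["localhost:9200"]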
2. Metricbeat
Metricbeat is a lightweight tool for collecting system-level performance metrics. It gathers system metrics such as CPU, memory, and disk, as well as metrics from services such as Redis and Nginx; a minimal configuration sketch follows this list.
1) By deploying Metricbeat on Linux, Windows, or macOS hosts, you can collect statistics such as CPU, memory, file system, disk IO, and network IO.
2) It supports collecting metrics from services such as Apache, Nginx, MongoDB, MySQL, PostgreSQL, Redis, and ZooKeeper with zero dependencies; the corresponding module only needs to be enabled in the configuration file.
3) If you manage your services with Docker, a single Metricbeat container on the host can collect statistics about every container on that Docker host by reading cgroups information directly from the proc file system; this requires no special permissions and does not depend on the Docker API.
4) Metricbeat is a member of the Elastic Stack family and works seamlessly with the rest of ELK: for example, use Logstash for further processing, Elasticsearch for analysis, or Kibana to create and share dashboards.
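A minimal metricbeat.yml sketch for the system module might look like the following; the hosts are placeholders, and service modules such as nginx or redis are enabled in the same way:

metricbeat.modules:
  - module: system
    metricsets: ["cpu", "memory", "filesystem", "diskio", "network"]
    period: 10s

output.elasticsearch:
  hosts: ["localhost:9200"]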
3. Packetbeat
Packetbeat is a lightweight network packet analyzer. By capturing packets, Packetbeat can analyze the network interactions of an application and send the captured data to Logstash or Elasticsearch; a minimal configuration sketch follows this list.
1) Packetbeat makes it easy to monitor and parse network protocols such as HTTP in real time and to understand how traffic flows through your network. Packetbeat is passive: it adds no latency, requires no code changes, and does not interfere with your other infrastructure.
2) Packetbeat supports a variety of application-layer protocols, such as HTTP, DNS, MySQL, ICMP, PostgreSQL, Redis, and more.
3) Packetbeat can capture packets on the target server in real time, decode them, correlate requests with responses, enrich the fields, and send the results to Elasticsearch in JSON format.
4) Packetbeat is a member of the Elastic Stack family and works seamlessly with the rest of ELK: for example, use Logstash for further processing, Elasticsearch for analysis, or Kibana to create and share dashboards.
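As a sketch, a minimal packetbeat.yml might look like this; the ports and hosts are placeholders, and older Packetbeat releases configure each protocol under packetbeat.protocols.http rather than as a list:

packetbeat.interfaces.device: any   # capture on all interfaces

packetbeat.protocols:
  - type: http
    ports: [80, 8080]
  - type: mysql
    ports: [3306]

output.elasticsearch:
  hosts: ["localhost:9200"]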
4. Winlogbeat
Winlogbeat is a lightweight Windows event log collection tool that sends Windows events to Elasticsearch or Logstash.
If you run Windows servers, you can actually learn a lot from the Windows event log: for example, logons (event 4624), failed logons (4625), a plugged-in USB storage device (4663), or newly installed software (11707). Winlogbeat can be configured to read from any event log channel and delivers the raw event data in a structured format, which makes it easy to filter and aggregate the results in Elasticsearch; a minimal configuration sketch is shown at the end of this subsection.
Winlogbeat is a member of the Elastic Stack family and works seamlessly with the rest of ELK: for example, use Logstash for further processing, Elasticsearch for analysis, or Kibana to create and share dashboards.
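A minimal winlogbeat.yml sketch along those lines; the channel names and the event_id filter are only examples, and the hosts are placeholders:

winlogbeat.event_logs:
  - name: Security
    event_id: 4624, 4625, 4663    # logons, failed logons, object access
  - name: Application
  - name: System

output.elasticsearch:
  hosts: ["localhost:9200"]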
5. Heartbeat
Heartbeat is a heartbeat-detection tool that mainly monitors the availability of services, i.e. whether a given address is reachable (as the official site puts it: for a given list of URLs, Heartbeat asks, are you alive?). The results can be analyzed further with the rest of the Elastic Stack; a minimal configuration sketch follows this list.
1) Whether you are probing services on the same host or services elsewhere on the network, Heartbeat can easily generate uptime and response-time data, and configuration changes do not require restarting Heartbeat.
2) Heartbeat pings via ICMP, TCP, and HTTP, and also supports TLS, authentication, and proxies. Thanks to simple DNS resolution, you can monitor all the hosts behind a load-balanced server.
3) Today's infrastructure, services, and hosts are often adjusted dynamically. Heartbeat makes it easy to automate adding and removing monitoring targets through a simple, file-based interface; configuration files are picked up automatically after modification.
4) Heartbeat is a member of the Elastic Stack family and works seamlessly with the rest of ELK: for example, use Logstash for further processing, Elasticsearch for analysis, or Kibana to create and share dashboards.
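A minimal heartbeat.yml sketch; the hosts and URLs are placeholders, and depending on the Heartbeat version the HTTP monitor takes either urls or hosts:

heartbeat.monitors:
  - type: icmp
    hosts: ["myhost.example.com"]
    schedule: '@every 5s'
  - type: http
    urls: ["http://localhost:9200"]
    schedule: '@every 10s'

output.elasticsearch:
  hosts: ["localhost:9200"]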
6. Create a Beat of your own
You can use the Beats generator to generate your own beats based on official documentation
Https://www.elastic.co/cn/blog/build-your-own-beat
Third, Fluentd
Fluentd is a fully open-source, free log collection tool that supports collecting log information from more than 125 types of systems. Its architecture is outlined below.
Fluentd can be divided into two parts: client and server. The client is a program installed on the machines being monitored; it reads information such as log files and sends it to the Fluentd server. The server is the collector: on the Fluentd server we can configure how the collected data is filtered and processed and finally route it to the next hop. The next hop can be a data store such as MongoDB or Amazon S3, or another data-processing platform such as Hadoop.
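To illustrate the client/server split, here is a hedged sketch of a forwarder and an aggregator configuration; the hostnames, paths, and tags are placeholders, and writing to MongoDB requires the fluent-plugin-mongo output plugin:

# forwarder (client) side: tail an application log and forward it to the aggregator
<source>
  @type tail
  path /var/log/myapp.log
  pos_file /var/log/td-agent/myapp.log.pos
  tag myapp.access
  format none
</source>

<match myapp.**>
  @type forward
  <server>
    host aggregator.example.com
    port 24224
  </server>
</match>

# aggregator (server) side: receive forwarded events and store them in MongoDB
<source>
  @type forward
  port 24224
</source>

<match myapp.**>
  @type mongo
  host 127.0.0.1
  database fluentd
  collection access
</match>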
1. Install & Start
Because installing Fluentd from scratch is somewhat troublesome, the stable distribution commonly used in industry is actually td-agent, provided by Treasure Data:
curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-trusty-td-agent2.sh | sh
The Fluentd (td-agent) service can be started, stopped, and restarted with commands such as start, stop, and restart. The default Fluentd configuration file is /etc/td-agent/td-agent.conf.
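For example, on an init-based Ubuntu installation of td-agent 2 (on systemd hosts the equivalent systemctl commands are used instead):

$ sudo /etc/init.d/td-agent start
$ sudo /etc/init.d/td-agent status
$ sudo /etc/init.d/td-agent restart
$ sudo /etc/init.d/td-agent stop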
2. Post Sample Logs via HTTP
By default, the /etc/td-agent/td-agent.conf file already contains a basic configuration for td-agent: it can receive data POSTed via HTTP, route it, and write it to /var/log/td-agent/td-agent.log.
You can try posting data with the following curl command:
$ curl -X POST -d 'json={"json":"message"}' http://localhost:8888/debug.test
After it runs, the test data we posted can be found on the last line of the output log.
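For example, a quick way to check is to look at the last line of td-agent's log:

$ tail -n 1 /var/log/td-agent/td-agent.log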
3. Syntax of Config
In Fluentd the configuration file is very important: it defines what Fluentd will do.
Open the /etc/td-agent/td-agent.conf file to see the details of the configuration. The basic configuration directives fall into the following categories (a short skeleton is shown after this list):
source: defines an input.
match: defines the output target, such as writing to a file or sending to a specified destination.
filter: an event-processing pipeline that runs between input and output.
system: system-level settings.
@include: includes other files, similar to import in Java or Python.
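A short skeleton combining the directives not shown in the examples below; the log level and the conf.d path are placeholders, and the options accepted inside <system> vary slightly between Fluentd versions:

<system>
  log_level info            # global settings
</system>

# pull in additional configuration fragments
@include conf.d/*.conf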
1) Source: Define Input
Fluentd supports multiple inputs. Each input configuration must contain a type (@type), such as a TCP input or an HTTP input; the type specifies which input plugin to use. In the following example, two input sources are defined: one receives TCP traffic on port 24224, the other receives HTTP data on port 9880.
# Receive events from 24224/tcp
# - used by log forwarding and the fluent-cat command
<source>
  @type forward
  port 24224
</source>

# http://this.host:9880/myapp.access?json={"event":"data"}
<source>
  @type http
  port 9880
</source>
The input plugin specified by source submits an event with the three attributes {tag, time, record} to the Fluentd engine, completing the data input.
2) Match: defines the output target, such as writing to a file or sending to a specified destination
match configures the matching rules for the data stream and the action to perform when a match succeeds, similar to a routing table entry. For example, the following configuration applies the file action to events whose tag matches myapp.access and writes the data to files under the path /var/log/fluent/access.
# Match events tagged with "myapp.access" and
# store them to /var/log/fluent/access.%Y-%m-%d
# Of course, you can control how you partition your data
# with the time_slice_format option.
<match myapp.access>
  @type file
  path /var/log/fluent/access
</match>
The standard actions include file, forward, and so on: file writes to files, while forward forwards events to the next hop.
The match pattern design is similar to ordinary wildcard matching; the specific rules are as follows:
*: matches a single tag part; for example, a.* matches a.b but not a.b.c.
**: matches zero or more tag parts; for example, a.** matches a, a.b, and a.b.c.
{x,y,z}: matches x, y, or z (an OR relationship).
In addition, these can be combined, for example a.{b,c,d}.* and so on. When several patterns are listed inside one match, they are OR'ed, that is, the action is performed as long as any one of them matches. For example, <match a b> matches a and b, while <match a.**> matches a, a.b, and a.b.c. A small example follows.
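For instance, a single match block with several patterns (the stdout output here is just for illustration):

# events tagged a, a.b, a.b.c, or b.d would all match this one block
<match a.** b.*>
  @type stdout
</match>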
3) Logging
Fluentd supports two kinds of logging configuration: a global one and a per-plugin one; an example of both is shown below.
The supported log output levels are as follows:
fatal, error, warn, info, debug, trace
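A hedged sketch of both styles; depending on the Fluentd version the per-plugin parameter is written as @log_level (newer) or log_level (older):

<system>
  log_level info            # global log level
</system>

<match debug.**>
  @type stdout
  @log_level debug          # per-plugin log level, overrides the global setting
</match>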
4) Fluentd has five types of plugins, namely:
Input: reads input data; configured through the source section.
Parser: parses the input data.
Output: writes the output data; configured through the match section.
Formatter: formats the output messages.
Buffer: caches data for output plugins.
Each type includes a variety of plugins; for example, the input type includes the following plugins (a tail example is shown below):
in_forward, in_http, in_tail, in_exec, in_syslog, in_scribe
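For example, a hedged in_tail source that follows an Nginx access log; the path and tag are placeholders, and newer Fluentd versions express the format as a <parse> section instead:

<source>
  @type tail
  path /var/log/nginx/access.log
  pos_file /var/log/td-agent/nginx-access.log.pos
  tag nginx.access
  format nginx              # built-in parser for the default Nginx log format
</source>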
5) Route
Route refers to the processing path of data inside Fluentd. The general flows are:
Input -> Filter -> Output
Input -> Filter -> Output with Label
That is, the input plugin obtains the data, the filter processes it, and the output plugin then forwards it. Re-emitting packets/events is also supported, for example re-routing an event after modifying its tag. The typical cases are (a label-based sketch follows this list):
Re-route an event by tag
Re-route an event by record content
Re-route an event to another label
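A hedged sketch of the label flow; the tag, the added field, and the stdout output are only illustrative, and record_transformer ships with Fluentd:

<source>
  @type forward
  port 24224
  @label @APP               # events from this source enter the @APP label
</source>

<label @APP>
  <filter myapp.**>
    @type record_transformer
    <record>
      environment production   # add a static field to every event
    </record>
  </filter>

  <match myapp.**>
    @type stdout
  </match>
</label>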
4. Use case
Here we pick one of the simplest use cases to describe how Fluentd is used: collecting Docker container logs with Fluentd.
First, create a config file that defines Fluentd's behavior; it can be named in_docker.conf.
<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

<match **>
  @type stdout
</match>
Then save the file and run Fluentd with the following command:
$ fluentd -c in_docker.conf
If it starts successfully, the output looks like this:
$ fluentd -c in_docker.conf
2015-09-01 15:07:12 -0600 [info]: reading config file path="in_docker.conf"
2015-09-01 15:07:12 -0600 [info]: starting fluentd-0.12.15
2015-09-01 15:07:12 -0600 [info]: gem 'fluent-plugin-mongo' version '0.7.10'
2015-09-01 15:07:12 -0600 [info]: gem 'fluentd' version '0.12.15'
2015-09-01 15:07:12 -0600 [info]: adding match pattern="**" type="stdout"
2015-09-01 15:07:12 -0600 [info]: adding source type="forward"
2015-09-01 15:07:12 -0600 [info]: using configuration file: <ROOT>
  <source>
    @type forward
    port 24224
    bind 0.0.0.0
  </source>
  <match **>
    @type stdout
  </match>
</ROOT>
2015-09-01 15:07:12 -0600 [info]: listening fluent socket on 0.0.0.0:24224
Then launch a Docker container. If you have not installed the Docker engine before, please install it first. Because Docker natively supports a fluentd logging driver, the Fluentd client side can be hooked up simply by passing the log-driver option to the run command.
$ docker run --log-driver=fluentd ubuntu echo "Hello Fluentd!"
Hello Fluentd!
In the command above, ubuntu is an image; if it is not present locally, the Docker engine downloads it automatically and creates a container from it. After the container starts, check the default output file, /var/log/td-agent/td-agent.log; the last line contains the echoed output.
Summary
Fluentd is an excellent open-source, free log collection tool that currently supports collecting log information from more than 125 types of systems. Combined with other data-processing platforms, Fluentd can be used to build a big data collection and processing platform and, on top of it, a commercial solution.
Fourth, Comparison of Fluentd & Logstash
Logstash supports all major log types, has the richest plugin ecosystem, and can be flexibly customized, but its performance is weaker and the JVM easily leads to high memory usage.
Fluentd also supports all major log types, has many plugins available, and offers better performance.
References:
https://www.jianshu.com/p/9c26bd9f6ebd
https://juejin.im/entry/58bad514ac502e006bf70517
http://soft.dog/2015/12/24/beats-basic/
http://www.muzixing.com/tag/fluentd.html