First, the conclusions up front:
1. Go performance can be very good, but some standard library packages perform poorly, regexp and encoding/json in particular. Where performance requirements are high, they should be optimized or replaced according to the actual situation.
2. On-CPU/off-CPU flame graphs are a sharp weapon for program performance analysis. Generating them is tedious (especially off-CPU), but they are definitely worth having!
Previously we used Logstash as the client program for collecting log files. Logstash is powerful, with rich data-processing plugins and good extensibility, but because it is implemented in JRuby its performance is worrying. Filebeat is a later, more lightweight log-file collection client implemented in Go: it performs well and consumes few resources, but has almost no parsing capability. The usual scenario is therefore to collect with Filebeat, parse with Logstash, and then upload to Kafka or Elasticsearch. It is worth noting that both Logstash and Filebeat are excellent open-source products from Elastic.
To improve client-side log collection performance, reduce data transfer and deployment complexity, and make fuller use of Go's performance advantage for log parsing, we decided to develop a plugin for Filebeat that parses the company's log format specification, so that Filebeat can directly replace Logstash.
With the background out of the way, here is the implementation and optimization process.
Version 1
The simplest implementation is to parse the logs with regexp, the regular expression package that ships with Go. Even this is about 30% faster than Logstash (which also parses the canonical log format through a purpose-built plugin). A minimal sketch of the approach follows.
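The sketch below shows the idea, assuming a hypothetical space-delimited format with time, level, module and message fields; the pattern and field names are illustrative, not the company's actual specification.

```go
package main

import (
	"fmt"
	"regexp"
)

// Illustrative pattern only; the real log format specification is internal.
var logPattern = regexp.MustCompile(
	`^(?P<time>\S+) (?P<level>\S+) (?P<module>\S+) (?P<msg>.*)$`)

// parseLine extracts named fields from one log line with the stdlib regexp package.
func parseLine(line string) map[string]string {
	m := logPattern.FindStringSubmatch(line)
	if m == nil {
		return nil
	}
	fields := make(map[string]string, len(m)-1)
	for i, name := range logPattern.SubexpNames() {
		if i > 0 && name != "" {
			fields[name] = m[i]
		}
	}
	return fields
}

func main() {
	fmt.Println(parseLine("2017-05-20T12:00:00+08:00 INFO httpd request handled"))
}
```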
The performance test here focuses on the bottleneck, the collection-and-parsing stage: how long it takes to collect and parse one million logs of the specified format and length while restricted to a single CPU core (to minimize the resource footprint on the business application's server). The test machine is a PC with a 3.2 GHz CPU. To avoid the influence of disk I/O and the page cache, both the input file and the output file are placed in /dev/shm. The single-core limit for Filebeat is applied by setting the environment variable GOMAXPROCS=1 at startup.
This version takes 122 seconds to process one million logs, about 8,200 logs per second.
Version 2
Next, try some optimizations to see whether the Go plugin's performance can be pushed further. The first thought was to replace the regexp package. Linux has PCRE, a C implementation of regular expressions, and the third-party package github.com/glenn-brown/golang-pkg-pcre/src/pkg/pcre wraps the PCRE library for Go. On CentOS you need to install the pcre-devel package first. A sketch of the replacement follows.
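A sketch of the swapped-in parser, assuming the wrapper's documented MustCompile/MatcherString/GroupString API; check the package's own docs for the exact signatures in your version. The pattern and field names are again illustrative.

```go
package main

import (
	"fmt"

	"github.com/glenn-brown/golang-pkg-pcre/src/pkg/pcre"
)

// Illustrative pattern; numbered capture groups are used here.
var logPattern = pcre.MustCompile(`^(\S+) (\S+) (\S+) (.*)$`, 0)

// parseLine extracts fields from one log line via the PCRE C library binding.
func parseLine(line string) map[string]string {
	m := logPattern.MatcherString(line, 0)
	if !m.Matches() {
		return nil
	}
	return map[string]string{
		"time":   m.GroupString(1),
		"level":  m.GroupString(2),
		"module": m.GroupString(3),
		"msg":    m.GroupString(4),
	}
}

func main() {
	fmt.Println(parseLine("2017-05-20T12:00:00+08:00 INFO httpd request handled"))
}
```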
This version's processing time is 97 seconds, roughly a 25% improvement over the first version.
Version 3
The third version does not use regular expressions at all: relying on the fixed log format, it splits the line and extracts the fields with strings.Index(), as sketched below. The processing time of this version is 70 seconds, a further improvement of nearly 40%.
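A minimal sketch of the strings.Index() approach, under the same illustrative format as above:

```go
package main

import (
	"fmt"
	"strings"
)

// parseLine splits a space-delimited log line with strings.Index instead of a
// regular expression; the fixed format makes this safe. Field names are illustrative.
func parseLine(line string) map[string]string {
	fields := make(map[string]string, 4)
	for _, name := range []string{"time", "level", "module"} {
		i := strings.Index(line, " ")
		if i < 0 {
			return nil // malformed line
		}
		fields[name] = line[:i]
		line = line[i+1:]
	}
	fields["msg"] = line // the remainder of the line is the message
	return fields
}

func main() {
	fmt.Println(parseLine("2017-05-20T12:00:00+08:00 INFO httpd request handled"))
}
```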
Version 4
Is there room for further improvement? Yes: the JSON package Filebeat uses for serialized output. Our logs are uploaded in JSON format, and Filebeat uses Go's built-in encoding/json package, which is based on reflection and has long been criticized for its performance. If JSON serialization is optimized, the gain can be significant. Since our log format is fixed, the parsed fields are fixed as well, so JSON serialization can be generated for a fixed log structure without resorting to inefficient reflection. Go has several third-party packages that generate JSON serialization/deserialization code for a given struct; we use easyjson (https://github.com/mailru/easyjson). After installing easyjson, run the easyjson command against the source file containing the log structure definition; it generates an xxx_easyjson.go file with Marshal/Unmarshal methods for that struct, as sketched below. This reduces the processing time to 61 seconds, a further 15% improvement.
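A sketch of the workflow with a hypothetical LogEntry struct (the real field set is the company's internal format):

```go
package logformat

// LogEntry is an illustrative stand-in for the fixed log structure. After
// installing easyjson, running `easyjson -all logentry.go` generates
// logentry_easyjson.go with MarshalJSON/UnmarshalJSON methods that build the
// JSON directly instead of going through reflection.
//easyjson:json
type LogEntry struct {
	Time   string `json:"time"`
	Level  string `json:"level"`
	Module string `json:"module"`
	Msg    string `json:"msg"`
}
```

Once the file is generated, calling entry.MarshalJSON() takes the generated code path rather than reflection-based encoding/json.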
At this point, staring at the code, I could not see any further big win. Time for the other protagonist of this article, the flame graph, to make its entrance.
The flame graph is an effective tool for performance analysis; see http://www.brendangregg.com/flamegraphs.html for an introduction. The commonly seen flame graph is the on-CPU flame graph, used to analyze which function calls consume the CPU.
After installing the FlameGraph (https://github.com/brendangregg/FlameGraph) tool, run the test once against the current version of the program and follow its instructions to generate the on-CPU flame graph, shown below.
FlameGraph works for C and Go programs alike. For a Go program you can also use the built-in net/http/pprof package as the data source and install Uber's go-torch (https://github.com/uber/go-torch) tool, which calls the FlameGraph scripts automatically to produce the on-CPU flame graph; this is slightly simpler to run. See the go-torch documentation for instructions.
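For reference, a minimal sketch of exposing the pprof endpoint that go-torch (or go tool pprof) can sample from; the address is an arbitrary example:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	// Serve the profiling endpoints so an external tool can pull CPU samples
	// from http://localhost:6060/debug/pprof/profile while the program runs.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	select {} // stand-in for the program's real work
}
```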
In the graph, the vertical direction is the function call stack and the horizontal width of each block is its share of CPU time; the wide blocks near the top deserve attention. The colors of the blocks are random and carry no meaning.
Two large blocks of CPU time stand out. One is the output-processing part, a little of which is JSON handling; that has already been optimized and there is nothing more to do there. The other is stranger: common.MapStr.Clone(), which actually accounts for 40% of the CPU time. Looking further up the stack, most of it is spent in Errorf. One look at the code makes the reason obvious.
common.MapStr is the structure that carries the log content through the pipeline. Its Clone() method decides whether a sub-value is a nested MapStr by checking whether toMapStr() returns an error, so an error object gets constructed for nearly every non-map value, and the cost of creating those error objects is clearly considerable. The obvious fix is to move the type check from toMapStr() into Clone() and avoid generating the errors altogether; a sketch of the idea follows.
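The sketch below illustrates the idea rather than reproducing the exact libbeat source: the slow path allocates an error for every non-map value, while the fixed Clone() uses a plain type switch on the hot path.

```go
package common

import "fmt"

// MapStr mirrors libbeat's map-based event structure.
type MapStr map[string]interface{}

// toMapStr reports whether a value is a nested map, signalling "no" with an
// error; building that error is the costly part seen in the flame graph.
func toMapStr(v interface{}) (MapStr, error) {
	switch sub := v.(type) {
	case MapStr:
		return sub, nil
	case map[string]interface{}:
		return MapStr(sub), nil
	}
	return nil, fmt.Errorf("expected map but type is %T", v)
}

// cloneSlow shows the original pattern: toMapStr is called for every value,
// so an error object is allocated for every scalar field.
func (m MapStr) cloneSlow() MapStr {
	result := MapStr{}
	for k, v := range m {
		if sub, err := toMapStr(v); err == nil {
			v = sub.cloneSlow()
		}
		result[k] = v
	}
	return result
}

// Clone shows the fix: a type switch decides whether the value is a nested
// map, so no error value is ever created on this hot path.
func (m MapStr) Clone() MapStr {
	result := MapStr{}
	for k, v := range m {
		switch sub := v.(type) {
		case MapStr:
			v = sub.Clone()
		case map[string]interface{}:
			v = MapStr(sub).Clone()
		}
		result[k] = v
	}
	return result
}
```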
Version 5
Regenerate the flame graph for the modified code.
This time common.MapStr.Clone() has almost disappeared from the graph, confirming that the CPU time it consumes is now negligible.
The test time shrinks to 46 seconds, a saving of 33% over the previous version, a big improvement!
One unresolved issue remained: with the program restricted to one core, CPU utilization during the test only reached about 82%. Could locks be getting in the way?
This is where the off-CPU flame graph comes in. The off-CPU flame graph analyzes where time goes when the program is not making efficient use of the CPU; see http://www.brendangregg.com/FlameGraphs/offcpuflamegraphs.html for a detailed introduction. Data collection is more complicated than for the on-CPU flame graph; the openresty-systemtap-toolkit (https://github.com/openresty/openresty-systemtap-toolkit) provided by the OpenResty author agentzh can be used. The installation of the kernel-devel and debuginfo packages is not described in detail on that project's page, so I record it here.
kernel-devel is straightforward; install it directly with yum:
sudo yum install -y kernel-devel
For debuginfo on CentOS 7, first enable the debuginfo repository:
sudo vim /etc/yum.repos.d/CentOS-Debuginfo.repo
Set enabled=1 in it, then install:
sudo debuginfo-install kernel
You may also get an error when installing:
Invalid GPG key from file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-Debug-7: No Key found in given key data
If so, download the key from https://www.centos.org/keys/RPM-GPG-KEY-CentOS-Debug-7 and write it to /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-Debug-7.
After installation, follow the toolkit's instructions to generate the off-CPU flame graph, shown below:
I cannot fully interpret this graph yet, but it is already clear that writes to the registry file (which Filebeat uses to record the list of harvested files and their offsets) take a noticeable share. So I tried raising Filebeat's spool_size (the registry file is updated once every spool_size events) to 10240, five times the default, and during the test the CPU could then reach above 95%. Placing the registry file in /dev/shm/ also solves the problem of the CPU not being saturated during the test.
This rules out the guess that improper locking was hurting performance. In a real deployment, the spool_size setting should be decided from test data combined with the characteristics of the output side (such as writing to Kafka).
At this point the optimization is complete, with performance up to three times that of the original version! The specific performance figures for each version are shown below.
A few clarifications are needed:
1. Filebeat development is based on version 5.3.1, with Go 1.8.
2. Logstash was tested with the -w 1 parameter, that is, a single worker process, without restricting it to one core.
3. The execution time includes program startup time (Logstash takes nearly 20 seconds to start).
The final optimization result: for logs of the specified format and length, parsing throughput reaches 25,000 logs per second on a PC, and even on production servers with lower CPU frequencies it can reach 20,000 per second.