Linux thread library performance test and analysis

   I. Preface
In the Linux 2.6.x kernel, the improvement in scheduling performance is one of the most striking changes [1]. NPTL (Native POSIX Thread Library) [2] uses new kernel features to rewrite the Linux thread library, replacing the long-standing and much-criticized LinuxThreads [3] as the preferred thread library in glibc.
  
How does NPTL perform? What are the obvious improvements over LinuxThreads? Before analyzing NPTL in depth, this article fully evaluates both thread libraries, together with kernel features such as preemption (Preemptible Kernel) and Hyper-Threading [4]. The results show that NPTL is well worth the expectations of, and adoption by, a large number of server systems.
  
   II. Benchmark
1. Test platform
The hardware platform for this test is an Inspur NF420R server [7] with four Hyper-Threading-enabled Intel Xeon 2.2 GHz processors and 4 GB of memory. On the software side, the Slackware 9.0 distribution [8] was selected; the kernel source code used was obtained from www.kernel.org.
  
2. Targeted test: LMBench
LMBench is a multi-platform open-source benchmark suite [5] for evaluating overall system performance, but it does not support threads. It includes two benchmarks relevant here: lat_proc, which evaluates the cost of process creation and termination, and lat_ctx, which evaluates the overhead of process switching. LMBench has a well-designed benchmark framework: by modifying only the specific target programs (such as lat_proc.c and lat_ctx.c), we can reuse LMBench's timing and statistics machinery to obtain the thread-library performance data we care about.
  
Based on the lat_proc and lat_ctx algorithms, this article implements the lat_thread and lat_thread_ctx benchmarks. lat_thread converts lat_proc to use threads: fork() is replaced by pthread_create(), and wait() is replaced by pthread_join(). lat_thread_ctx follows the lat_ctx evaluation algorithm (see the lat_ctx manual page) but creates threads instead of processes, still using pipes for communication and synchronization.
  
lat_thread null
  
The null parameter indicates that each thread performs no actual work and returns immediately after creation.
  
lat_thread_ctx -s <size> <#threads>
  
The size parameter has the same meaning as in lat_ctx: each thread allocates size KB of data. The #threads parameter is the number of threads, that is, the total number of threads participating in token passing, which corresponds to the program load.
  
3. Comprehensive test: VolanoMark
VolanoMark is a pure-Java benchmark used to test the overall performance of the system scheduler and the threading environment [6]. It creates a Java chat room that simulates a client/server workload, and evaluates the overall performance of a host by the average number of messages delivered per second (the larger the value, the better the performance). VolanoMark results depend on the Java virtual machine; this article uses Sun Java SDK 1.4.2 as the Java platform, with VolanoMark version 2.5.0.9.
  
   III. Test results
The test plan covers three kernels: 2.4.26, 2.6.6 with kernel preemption enabled, and 2.6.6 without kernel preemption. By configuring the kernel and the NF420R BIOS, three SMP scales are realized: single processor (UP), 4-CPU SMP (SMP4), and 8-CPU SMP enabled by Hyper-Threading (SMP8*). For each combination of kernel configuration and SMP scale, a set of data is collected for both LinuxThreads and NPTL using lat_thread, lat_thread_ctx, and VolanoMark. Because NPTL cannot be used on 2.4.x kernels, those entries are left blank.
    
  
   IV. Result analysis
1. LinuxThreads vs NPTL: Thread creation/destruction overhead
The UP and SMP4 test data under the 2.6.6 kernel with preemption enabled:
  
[Figure: thread creation/destruction latency for LinuxThreads vs NPTL, not reproduced in this copy]
In terms of thread creation/destruction overhead, NPTL is a significant improvement (roughly a six-fold reduction). In fact, NPTL no longer needs a user-level manager thread, as LinuxThreads did, to handle thread creation and destruction [9], so it is easy to understand why its overhead here drops so sharply.
  
At the same time, as the figure shows, creating threads on a single CPU is consistently faster than on multiple CPUs.
  
2. LinuxThreads vs NPTL: thread switching overhead
Again, the UP and SMP4 data under the 2.6.6 kernel with preemption enabled:
  
[Figure: thread-switching latency for LinuxThreads vs NPTL, not reproduced in this copy]
As the number of threads participating in lat_thread_ctx grows, thread-switching overhead rises sharply on a single processor for both libraries, while under SMP it grows slowly. In this respect, LinuxThreads and NPTL perform essentially the same.
  
3. Kernel impact
  
[Figures: four charts comparing thread performance under the 2.4.26, 2.6.6/preemptible, and 2.6.6/non-preemptible kernels, not reproduced in this copy]
From the above four figures, we can draw two conclusions:
  
Kernel preemption is a strong guarantee that Linux can better support real-time applications, but it has little effect on thread performance and may even cost a little: after all, the overhead of the preemption locks cannot be ignored.
Upgrading the kernel alone does not change the performance of the LinuxThreads library. For a server system, therefore, you cannot expect higher performance merely by compiling and running a new kernel.
  
    
  
Figure 8
  
[Two charts on the Hyper-Threading impact, not reproduced in this copy]
As shown earlier, enabling Hyper-Threading support has almost no impact on thread creation/destruction performance, and these two charts further show that it has no significant impact on thread-switching overhead either. Hyper-Threading is an optimization internal to the CPU and is quite different from a real dual CPU. Many studies have shown that, without cooperation between the kernel and user applications, Hyper-Threading does not bring large performance changes. Unless you run a heavily loaded, integrated server system (such as a busy database server), buying Hyper-Threading-capable CPUs will not bring much benefit.
  
4. Comprehensive Performance
  
Figure 9
  
The preceding analyses show, item by item, where the thread library's performance improves. The VolanoMark test gives an approximation of the combined effect in an integrated application environment, in particular the impact of the thread library and the kernel on overall system performance under network-service workloads.
  
Figure 9 combines the VolanoMark results for the two thread libraries across different kernels and processor counts. Three points can be observed:
  
NPTL greatly improves the overall performance of server systems in SMP environments (by more than 65%), while the impact on single-processor systems is relatively small (about 10%).
The 2.6 kernel's preemption feature has little impact on overall performance (within about ±1%) and may even cause a slight degradation in some cases.
Hyper-Threading has a slightly negative effect under LinuxThreads and a slightly positive one under NPTL, but the effect is small either way (5%-6%).
The first two points are fully consistent with the targeted LMBench results. The deviation in the third point actually suggests that Hyper-Threading does accelerate the overall server environment.
  
   V. Summary
Our evaluation provides a valuable reference for Linux users, especially server users:
  
If you are running a multiprocessor system, do not hesitate to upgrade your kernel; remember that you must upgrade your thread library at the same time, as it is usually tightly coupled to glibc.
If your system runs no real-time applications, do not turn on the kernel-preemption switch; it will only make your system slower.
Think carefully about whether you need Hyper-Threading. Even if you have bought Hyper-Threading-capable CPUs, disabling the feature may sometimes suit your workload better.
  