Now that NPTL has become glibc's preferred thread library, its performance has attracted a great deal of attention. This article compares the performance of NPTL and LinuxThreads and comprehensively evaluates the effects of Hyper-Threading and kernel preemption on threading performance.
First, the preface
In the Linux 2.6.x kernel, the improvement in scheduling performance is one of the most compelling changes [1]. NPTL (Native POSIX Thread Library) [2] rewrites the Linux thread library using the new kernel features, replacing the long-standing and controversial LinuxThreads [3] as glibc's preferred thread library.
How does NPTL perform? What are its clear improvements over LinuxThreads? Before analyzing NPTL in depth, this article presents a comprehensive performance evaluation of the two thread libraries, together with the effects of kernel preemption and Hyper-Threading [4] in the kernel. The results show that NPTL is well worth anticipating and adopting on server systems.
Second, Benchmark
1. Test Platform
The hardware platform for this test is an Inspur NF420R server [7] with four Hyper-Threading-enabled Intel Xeon 2.2 GHz processors and 4 GB of memory. The Linux distribution is Slackware 9.0 [8], using kernel source code from www.kernel.org.
2. Targeted test: lmbench
lmbench is a multi-platform open source benchmark suite [5] for evaluating system performance, but it has no support for threads. Two of its benchmarks measure process performance: lat_proc evaluates process creation and termination, and lat_ctx evaluates process switching overhead. lmbench has a well-structured benchmark framework: by modifying only the specific target programs (such as lat_proc.c and lat_ctx.c), we can reuse lmbench's timing and statistics facilities to obtain the thread-library performance data we care about.
Based on the lat_proc and lat_ctx algorithms, this article implements two benchmarks, lat_thread and lat_thread_ctx. lat_thread is lat_proc converted to threads: fork() is replaced with pthread_create(), and wait() with pthread_join(). lat_thread_ctx keeps the lat_ctx measurement method (see the lat_ctx man page) but rewrites process creation as thread creation; pipes are still used for communication and synchronization.
lat_thread null
The null parameter indicates that the thread does not perform any actual operations and returns immediately after creation.
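The null case described above can be sketched roughly as follows. This is a minimal illustration, not the article's lat_thread source: the names `null_thread`, `now_us`, and `measure_thread_latency_us` are hypothetical, and `clock_gettime()` stands in for lmbench's own timing and statistics harness, which is not reproduced here.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Thread body for the "null" case: do nothing and return immediately. */
static void *null_thread(void *arg)
{
    (void)arg;
    return NULL;
}

/* Monotonic time in microseconds (assumed stand-in for lmbench timing). */
static double now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
}

/*
 * Create and join `iterations` null threads one after another, where
 * lat_proc would call fork() and wait().
 * Returns the mean create+join latency in microseconds, or -1.0 on error.
 */
double measure_thread_latency_us(int iterations)
{
    if (iterations <= 0)
        return -1.0;

    double start = now_us();
    for (int i = 0; i < iterations; i++) {
        pthread_t tid;
        if (pthread_create(&tid, NULL, null_thread, NULL) != 0)
            return -1.0;
        if (pthread_join(tid, NULL) != 0)
            return -1.0;
    }
    return (now_us() - start) / iterations;
}
```

Calling `measure_thread_latency_us(10000)` from a program linked with the threading library under test gives one average figure per run; a real benchmark would repeat the measurement and report a statistic over several runs, as lmbench does.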
lat_thread_ctx -s <size> <#threads>
The size parameter has the same meaning as in lat_ctx and indicates how large each thread is (in the implementation, each thread allocates size KB of data). The #threads parameter is the number of threads, that is, the total number of threads that take part in passing the token, which corresponds to the program load.
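The token-passing arrangement can be sketched along these lines. It is an illustration under stated assumptions rather than the article's lat_thread_ctx code: `struct ring_worker`, `ring_worker_main`, `now_us`, and `measure_token_pass_us` are hypothetical names, the main thread is assumed to close the ring, each thread is assumed to touch its size KB of data on every hop, and the pipe overhead is not subtracted from the result.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical per-thread state; not taken from the original benchmark. */
struct ring_worker {
    int read_fd;          /* pipe end the token arrives on */
    int write_fd;         /* pipe end the token is forwarded to */
    size_t data_size;     /* bytes of private data touched per hop */
    unsigned char *data;
};

static double now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
}

/* Forward the token until the upstream pipe reports end-of-file. */
static void *ring_worker_main(void *arg)
{
    struct ring_worker *w = arg;
    char token;
    while (read(w->read_fd, &token, 1) == 1) {
        for (size_t i = 0; i < w->data_size; i++)
            w->data[i]++;             /* touch the per-thread data */
        if (write(w->write_fd, &token, 1) != 1)
            break;
    }
    close(w->write_fd);               /* pass end-of-file downstream */
    return NULL;
}

/*
 * Mean time per token hop in microseconds, or -1.0 on error.
 * Setup failures return early without full cleanup to keep the sketch short.
 */
double measure_token_pass_us(int nthreads, size_t size_kb, int rounds)
{
    if (nthreads <= 0 || rounds <= 0)
        return -1.0;
    int npipes = nthreads + 1;        /* the main thread closes the ring */
    int (*pipes)[2] = malloc(npipes * sizeof *pipes);
    pthread_t *tids = malloc(nthreads * sizeof *tids);
    struct ring_worker *w = calloc(nthreads, sizeof *w);
    if (!pipes || !tids || !w)
        return -1.0;
    for (int i = 0; i < npipes; i++)
        if (pipe(pipes[i]) != 0)
            return -1.0;
    for (int i = 0; i < nthreads; i++) {
        w[i].read_fd = pipes[i][0];
        w[i].write_fd = pipes[i + 1][1];
        w[i].data_size = size_kb * 1024;
        w[i].data = calloc(w[i].data_size + 1, 1);
        if (!w[i].data ||
            pthread_create(&tids[i], NULL, ring_worker_main, &w[i]) != 0)
            return -1.0;
    }

    char token = 't';
    int ok = 1;
    double start = now_us();
    for (int r = 0; r < rounds && ok; r++)
        ok = write(pipes[0][1], &token, 1) == 1 &&
             read(pipes[nthreads][0], &token, 1) == 1;
    double elapsed = now_us() - start;

    close(pipes[0][1]);               /* end-of-file cascades around the ring */
    for (int i = 0; i < nthreads; i++) {
        pthread_join(tids[i], NULL);
        free(w[i].data);
    }
    for (int i = 0; i < npipes; i++)
        close(pipes[i][0]);
    free(pipes);
    free(tids);
    free(w);
    return ok ? elapsed / ((double)rounds * npipes) : -1.0;
}
```

For example, `measure_token_pass_us(16, 4, 1000)` would correspond to 16 threads of 4 KB each. The figure it returns includes the cost of the pipe read and write as well as the thread switch, so it is an upper bound on switching overhead rather than the corrected value lat_ctx reports.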
3. Comprehensive test: Volanomark
Volanomark is a pure Java benchmark designed to test the overall performance of the system scheduler and threading environment [6]. It builds a Java chat room that simulates a client/server workload and evaluates the host's overall performance by the average number of messages sent per second (higher is better). Volanomark results depend on the Java virtual machine platform; this article uses Sun Java SDK 1.4.2 as the test Java platform and Volanomark version 2.5.0.9.
Third, test results
In the test plan, the kernels fall into three categories: 2.4.26, 2.6.6 with kernel preemption, and 2.6.6 without kernel preemption. The system configurations also fall into three categories: single processor (UP), 4-CPU SMP (SMP4), and a virtual 8-CPU SMP with Hyper-Threading enabled (SMP8*). For each combination of kernel configuration and SMP size, lat_thread, lat_thread_ctx, and Volanomark are run to obtain one set of data each for LinuxThreads and NPTL. Some entries are empty because NPTL cannot be used on the 2.4.x kernel.