Fault environment:
Red Hat Enterprise Linux AS release 3 (Taroon Update 4)
Kernel-2.4.21-27.EL
Glibc-2.3.2-95.30
Apache-2.0.53
[root@test10 root]# /usr/apache2/bin/httpd -V
Server version: Apache/2.0.53
Server built: Sep 9 2005 16:27:28
Server's Module Magic Number: 20020903:9
Architecture: 32-bit
Server compiled with....
 -D APACHE_MPM_DIR="server/mpm/prefork"
 -D APR_HAS_SENDFILE
 -D APR_HAS_MMAP
 -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
 -D APR_USE_SYSVSEM_SERIALIZE
 -D APR_USE_PTHREAD_SERIALIZE
 -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
 -D APR_HAS_OTHER_CHILD
 -D AP_HAVE_RELIABLE_PIPED_LOGS
 -D HTTPD_ROOT="/usr/apache2"
 -D SUEXEC_BIN="/usr/apache2/bin/suexec"
 -D DEFAULT_PIDLOG="logs/httpd.pid"
 -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
 -D DEFAULT_LOCKFILE="logs/accept.lock"
 -D DEFAULT_ERRORLOG="logs/error_log"
 -D AP_TYPES_CONFIG_FILE="conf/mime.types"
 -D SERVER_CONFIG_FILE="conf/httpd.conf"
Fault symptom:
On Linux AS 3, when load is applied with a stress-testing tool, some httpd child processes occasionally reach 100% CPU usage and do not recover after the load is removed; a few httpd child processes always remain at high CPU usage. The problem does not occur when the same test is run on Linux AS 4. In addition, the problem does not occur on either system when testing with JMeter.
Preliminary analysis:
Because the problem does not occur on Linux AS 4, we concluded that the test tool and test script were not at fault. The Apache monitoring page showed that the child processes with high CPU usage were in the reading state. Some similar problems have been reported and closed on the official Apache Bugzilla (attributed to the reporters' own modules, but without a detailed description).
Further analysis of the problem:
Because the externally visible symptoms gave no further leads, the only option left was GDB. The Apache documentation covers debugging, so we attached GDB to a faulty process while the fault was occurring and printed its call stack (shown below):
(gdb) bt
#0 0x002738a0 in _IO_un_link_internal () from /lib/tls/libc.so.6
#1 0x0027496c in _IO_default_finish_internal () from /lib/tls/libc.so.6
#2 0x002713ad in _IO_new_file_finish () from /lib/tls/libc.so.6
#3 0x002666d5 in fclose@@GLIBC_2.1 () from /lib/tls/libc.so.6
#4 0x004f3523 in CPakReader::Close () from /usr/lib/httpd/modules/mod_pak.so
#5 0x004f2c50 in CPakReader::~CPakReader () from /usr/lib/httpd/modules/mod_pak.so
#6 0x004f3e29 in mod_pak_method_handler () from /usr/lib/httpd/modules/mod_pak.so
#7 0x0807c2f2 in ap_run_handler (r=0x8a56518) at config.c:153
#8 0x0807c80a in ap_invoke_handler (r=0x8a56518) at config.c:364
#9 0x0806bf8f in ap_process_request (r=0x8a56518) at http_request.c:249
#10 0x08068049 in ap_process_http_connection (c=0x8a523e0) at http_core.c:251
#11 0x0808556e in ap_run_process_connection (c=0x8a523e0) at connection.c:43
#12 0x0807ae6b in child_main (child_num_arg=145082000) at prefork.c:610
#13 0x0807af88 in make_child (s=0x897f800, slot=0) at prefork.c:704
#14 0x0807b06f in startup_children (number_to_start=5) at prefork.c:722
#15 0x0807b77d in ap_mpm_run (_pconf=0x897b0a8, plog=0x89b3188, s=0x897f800) at prefork.c:941
#16 0x08080732 in main (argc=1, argv=0xbfffb324) at main.c:618
The next step is to find our own module and its functions in the stack, and then add more detailed diagnostic output to those functions. In this example, the faulty httpd process appears to be stuck in a loop that never finishes inside the C library's fclose call. Because the fault does not occur in the AS 4 environment, we first suspected that glibc's thread-local storage implementation has a bug in the NPTL environment of AS 3 that is triggered under some boundary condition. The stack, however, shows that the fclose call is made while the module's reader object is being destroyed (frames #4 and #5). After adding debugging code, we found that fclose returned -1 (EOF): the fault was essentially caused by closing a file stream pointer that had already been closed. Calling fclose twice on the same stream is undefined behavior in C, which is consistent with the symptom appearing on one glibc version and not on another. After the code was modified and retested, the problem no longer occurred.
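The kind of fix described above, never passing an already-closed stream to fclose, can be sketched as follows. This is only an illustration under assumptions: the real reader class in mod_pak is not shown in this article, so the class name PakReaderSketch and its members Open, Close, and IsOpen are hypothetical. The idea is to reset the stored pointer right after the first fclose so that any later Close call, including the one made by the destructor, becomes a no-op.

```cpp
#include <cstdio>

// Hypothetical stand-in for the module's reader class; names are
// illustrative and not taken from the real mod_pak source.
class PakReaderSketch {
public:
    PakReaderSketch() : fp_(nullptr) {}

    // The destructor reuses Close(), which is safe to call repeatedly.
    ~PakReaderSketch() { Close(); }

    // Copying would let two objects own (and close) the same stream.
    PakReaderSketch(const PakReaderSketch&) = delete;
    PakReaderSketch& operator=(const PakReaderSketch&) = delete;

    // Opens a file for reading, closing any previously opened stream.
    bool Open(const char* path) {
        Close();
        fp_ = std::fopen(path, "rb");
        return fp_ != nullptr;
    }

    // Returns 0 if the stream was closed successfully or was not open,
    // and EOF if fclose reported an error.
    int Close() {
        if (fp_ == nullptr) {
            return 0;  // already closed: do not call fclose again
        }
        int rc = std::fclose(fp_);
        // The stream is unusable after fclose whether or not it
        // succeeded, so the pointer is always reset.
        fp_ = nullptr;
        return rc;
    }

    bool IsOpen() const { return fp_ != nullptr; }

private:
    std::FILE* fp_;
};
```

With this guard, a handler that calls Close explicitly and then lets the object go out of scope triggers only one fclose. Checking the return value of Close in debug builds would also surface the kind of EOF result that exposed the original bug.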
Experience summary:
1. Test with several different testing tools whenever possible, so that defects in one tool do not leave problems hidden in the system.
2. For infrequent errors and faults, preserve the failure scene for analysis whenever possible, and record the configuration used for each test in detail.
3. Because resource leaks and repeated releases are high-frequency bugs, developers are advised to introduce automated source code scanning and coverage tools to find such bugs as early as possible.
4. Because the Linux 2.6 kernel is superior to the Linux 2.4 kernel in native kernel thread support, preemptive process scheduling, and code maintenance, and supports Java more efficiently, prefer AS 4 or another release based on kernel 2.6.x (or later) if the application deployment platform has no special requirements.