How to set up core dumps
1. What a core dump is for
(1) The greatest benefit of a core dump is that it saves the state of the process at the moment the problem occurs.
(2) With the executable file and the core dump, you can inspect the state of the process.
(3) Once the core dump has been obtained, the problem can be debugged even if the environment cannot be reproduced.
2. Enable core dumps
2.1 Check whether core dumps are enabled
Enter the following command in a terminal to check whether core dumps are enabled.
# ulimit -c
0
-c reports the size limit of the core file. It is currently 0, which means core dumps are disabled.
The limit can be changed to 1G. bash counts this limit in 1024-byte blocks, so 1G is 1048576 blocks:
# ulimit -c 1048576
It can also be set to unlimited:
# ulimit -c unlimited
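The same limit can also be inspected and raised from inside a program. The following is a minimal sketch, assuming a POSIX system that provides getrlimit and setrlimit; the helper names core_limit_is_zero and raise_core_limit_to_hard are illustrative and do not come from any library. Unlike the shell builtin, these calls count in bytes, and an unprivileged process can raise its soft limit only as far as the hard limit.

```c
#include <sys/resource.h>

/* Return 1 if the soft core-file limit is 0 (core dumps disabled),
 * 0 if it is nonzero, or -1 if the limit could not be read. */
int core_limit_is_zero(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    return rl.rlim_cur == 0 ? 1 : 0;
}

/* Raise the soft limit to the hard limit, the most an unprivileged
 * process is allowed to request. Returns 0 on success, -1 on error. */
int raise_core_limit_to_hard(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Calling raise_core_limit_to_hard() early in main should have roughly the same effect for that process as running ulimit -c in the shell before starting it, provided the hard limit itself is not 0.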
2.2 A test example
Example source code:
#include <stdio.h>

int main (void)
{
    int *a = NULL;
    *a = 0x1;   /* writing through a NULL pointer raises SIGSEGV */
    return 0;
}
After saving the above source code as a.c, compile it with debugging information to produce the executable a.out:
# gcc -g a.c -o a.out
If necessary, make a.out executable, then run it:
# ./a.out
It will print:
Segmentation fault (core dumped)
This means that a core file for a.out has been generated in the current directory.
Note: the trailing "(core dumped)" indicates that the dump file was generated successfully.
# file core*
core: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from './a.out'
coredump: UTF-8 Unicode C program text
To debug a core file with GDB, start GDB as follows:
# gdb -c ./core ./a.out
GNU gdb (GDB) 7.1-ubuntu
...
Core was generated by `./a.out'.
Program terminated with signal 11, Segmentation fault.
#0  0x00000000004004dc in main () at a.c:6
6           *a = 0x1;
Line 6 of a.c received signal 11 (SIGSEGV). Use GDB's list command to view the nearby source code.
(gdb) l 5
1 #include <stdio.h>
2
3 int main (void)
4 {
5     int *a = NULL;
6     *a = 0x1;
7     return 0;
8 }
By default GDB looks for both files in the current directory; you can also give explicit paths for core and a.out.
That completes the test.
2.3 Making the setting permanent
The method described above only takes effect in the current shell and is lost after a reboot. To make the setting permanent:
# vi /etc/profile
Then add the following line to profile:
ulimit -c 1048576
(This is 1G, because bash counts the limit in 1024-byte blocks. A dump larger than the limit is not written in full, so it may be unusable.)
Or:
ulimit -c unlimited
This takes effect after the machine is restarted. Alternatively, use the source command to apply it to the current shell immediately:
# source /etc/profile
3. Specify the file name and directory of the core dump
By default, the core file produced when a process dumps core is written to the working directory of that process (often the directory the program was started from), and the file name is fixed as core. Obviously, if several programs produce core files, or the same program crashes several times, the same core file is overwritten each time.
The path and file name of the core file can be changed through kernel parameters.
Set the sysctl variable kernel.core_pattern in the /etc/sysctl.conf file:
# vi /etc/sysctl.conf
Then add the following two lines to sysctl.conf:
kernel.core_pattern = /var/core/core_%e_%p
kernel.core_uses_pid = 0
Save and exit.
A note on /proc/sys/kernel/core_uses_pid: if this file contains 1, the process ID is still appended to the core file name even when core_pattern does not contain %p.
The format specifiers, including %e and %p used here, mean:
%c  maximum size of the dump file
%e  file name of the executable being dumped
%g  real group ID of the dumped process
%h  host name
%p  PID of the dumped process
%s  number of the signal that caused the dump
%t  time of the dump (seconds since January 1, 1970)
%u  real user ID of the dumped process
Use the following command to apply the change immediately:
# sysctl -p /etc/sysctl.conf
Create the core directory under /var and run the a.out program; a core file named in the specified format is generated under /var/core/. To view the dump file:
# ls /var/core
core_a.out_2834
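To make the naming scheme concrete, the sketch below expands a pattern such as /var/core/core_%e_%p in user space. It is only an illustration of the substitution: the real expansion is done by the kernel, the function name expand_core_pattern is made up for this example, and only %e, %p and %% are handled.

```c
#include <stdio.h>
#include <string.h>

/* Expand a subset of core_pattern specifiers (%e, %p, %%) into out.
 * Illustration only: the real expansion happens inside the kernel.
 * Returns 0 on success, -1 if out is too small or a specifier is
 * not handled by this sketch. */
int expand_core_pattern(const char *pattern, const char *exe, long pid,
                        char *out, size_t outlen)
{
    size_t used = 0;

    if (outlen == 0)
        return -1;
    out[0] = '\0';

    for (const char *p = pattern; *p != '\0'; p++) {
        char piece[64];
        const char *text = piece;

        if (*p != '%') {
            piece[0] = *p;                  /* ordinary character */
            piece[1] = '\0';
        } else {
            p++;
            if (*p == 'e')
                text = exe;                 /* executable file name */
            else if (*p == 'p')
                snprintf(piece, sizeof piece, "%ld", pid);  /* PID */
            else if (*p == '%')
                strcpy(piece, "%");         /* literal percent sign */
            else
                return -1;                  /* unknown or trailing '%' */
        }

        size_t len = strlen(text);
        if (used + len >= outlen)
            return -1;                      /* no room for text + NUL */
        memcpy(out + used, text, len + 1);
        used += len;
    }
    return 0;
}
```

With the pattern from sysctl.conf above, the executable name a.out and PID 2834, this produces /var/core/core_a.out_2834, the name shown in the ls output.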
4. Manually forcing a process to produce a core dump
When a program crashes, its process produces a core file, and developers can use that file to find the cause of the bug. Most core dumps, however, are produced because the program crashed.
Some bugs do not crash the program. With a deadlock, for example, the program misbehaves but no core dump is produced. If the environment does not allow debugging with GDB, are we out of options?
In general, such processes can be monitored by a watchdog. When the watchdog finds that a process has not updated its heartbeat for a long time, it can send the process a signal that makes it dump core. Under the default Linux signal dispositions, SIGQUIT, SIGABRT, SIGFPE and SIGSEGV all make the process generate a core file, so the core dump lets us tell whether a deadlock occurred. If the process installs a handler for one of these signals, that signal no longer produces a core dump; in practice, though, few programs install handlers for SIGQUIT, SIGABRT, SIGFPE or SIGSEGV.
In another case the process is neither deadlocked nor blocked, but we need to stop at a given location to read some variables or other information, and the customer or production environment does not allow a long debugging session. We then need a snapshot of the process at that point, taken as a core dump. GDB can produce the core dump manually: attach to the process, set a breakpoint at the desired location, and when the breakpoint is hit, generate a core dump immediately with GDB's gcore command. This gives a snapshot of the process at that location.
1. Find the ID of the process you want to send a signal to:
# ps -ef | grep qemu
root 3207 3206 10:32 pts/1 00:00:18 /usr/local/bin/qemu-system-x86
The PID of qemu-system-x86 is 3207.
2. Use kill(1) to send the signal:
# /bin/kill -s QUIT 3207
Sending other signals works the same way; replace QUIT with ABRT, TERM or KILL on the command line. Of these, only QUIT and ABRT produce a core dump by default.
Important: killing processes at random is a bad idea, especially init(8), whose PID is 1 and which is very special. On some systems, running /bin/kill -s KILL 1 shuts the system down abruptly. Always check the arguments you gave to kill(1) before pressing the Return key.
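A watchdog of the kind described above could send the signal from code instead of the command line. The following is a minimal sketch, assuming a POSIX system; request_core_dump and demo_signal_stuck_child are hypothetical names, and the blocked child only stands in for a deadlocked process. Whether a core file is actually written still depends on the ulimit -c setting and on core_pattern.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Ask a hung process to dump core by sending it SIGQUIT.
 * PIDs <= 1 are refused so that init and process groups are never hit.
 * Returns 0 on success, -1 on error. */
int request_core_dump(pid_t pid)
{
    if (pid <= 1)
        return -1;
    return kill(pid, SIGQUIT);
}

/* Demonstration: start a child that blocks forever (standing in for a
 * deadlocked process), signal it, and return the number of the signal
 * that terminated it, or -1 on failure. */
int demo_signal_stuck_child(void)
{
    int status;
    pid_t child;

    /* Make sure the child inherits the default SIGQUIT action. */
    signal(SIGQUIT, SIG_DFL);

    child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        for (;;)
            pause();            /* never makes progress */
    }

    if (request_core_dump(child) != 0)
        return -1;
    if (waitpid(child, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

The PID check mirrors the warning about init: a real watchdog would also want to confirm that the PID still belongs to the process it is monitoring before sending anything. Running the demonstration may leave a core file in the working directory if core dumps are enabled.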
5. Using a core dump for debugging
When a segmentation fault occurs on a Linux server where debugging is not permitted, the core dump comes in handy: copy the resulting core file to a local machine and debug it there.
First, configure the server using the permanent method described above. Then, when the program crashes, a core file is generated in the program's working directory (or the directory you specified). Copy the core file to the local machine, preferably into the same directory as the executable of the crashed process; otherwise, give GDB the path explicitly.
Here's how:
Method One:
Enter the command # gdb <executable> <core file>
For example:
# gdb /usr/local/bin/qemu-system-x86_64 /var/core/core-3207-qemu-system-x86
Then enter l at the (gdb) prompt to list the source code around the main function.
Method Two:
(1) In a terminal, enter the command # gdb [-c] <core file>
Example: gdb -c /var/core/core-3207-qemu-system-x86
(2) Then, at the (gdb) prompt, enter file <executable>
For example: (gdb) file /usr/local/bin/qemu-system-x86_64
(3) You can then use backtrace, thread and other commands to inspect the error, just as if the program had run locally up to the crash point.
You can also enter where, which shows the line the program was executing when it crashed.
6. Enable core dumps for the entire system
(to be continued ...)
(6.1) Edit /etc/profile to turn on core dumps for all users who log in to the system
First, check what the machine is running:
# uname -a
Linux ubuntu240 2.6.32-21-server #32-Ubuntu SMP Fri Apr 09:17:34 UTC all x86_64 GNU/Linux
Second, look at the default limits. If the core file size is 0, no core file is generated even when the program crashes.
# ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 20
file size               (blocks, -f) unlimited
pending signals                 (-i) 16382
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
Viewing stack information
When the program stops, the first thing to find out is where it stopped. When a function is called, the call's return address, the function arguments and the function's local variables are pushed onto the stack. GDB commands let you inspect the current stack.
Here are some GDB commands for viewing the function call stack:
backtrace
bt
Prints the whole of the current function call stack. For example:
(gdb) bt
#0  func (n=250) at tst.c:6
#1  0x08048524 in main (argc=1, argv=0xbffff674) at tst.c:30
#2  0x400409ed in __libc_start_main () from /lib/libc.so.6
The call chain can be read from the output above: __libc_start_main --> main() --> func()
backtrace <n>
bt <n>
n is a positive integer; only the innermost n frames of the stack are printed.
backtrace <-n>
bt <-n>
-n is a negative integer; only the outermost n frames of the stack are printed.
To look at a particular frame, you need to switch the current frame. Generally, when the program stops, the innermost frame (the top of the stack) is the current one; to see the details of a frame further out, first switch to it.
frame <n>
f <n>
n is an integer starting from 0, the frame number in the stack. For example, frame 0 is the top of the stack and frame 1 is the second frame.
up <n>
Moves n frames up the stack, toward the outermost frame. Without n, moves up one frame.
down <n>
Moves n frames down the stack, toward the innermost frame. Without n, moves down one frame.
The commands above print information about the frame they move to. If you do not want that output, use these three commands instead:
select-frame <n> corresponds to the frame command.
up-silently <n> corresponds to the up command.
down-silently <n> corresponds to the down command.
To view information about the current frame, use the following GDB commands:
frame or f
This prints the frame number, the current function name, the function's argument values, the file and line number where the function is located, and the statement the function is currently executing.
info frame
info f
This command prints more detailed information about the current frame, most of it internal runtime addresses: the function's address, the address of the calling function, the address of the called function, the language the current function is written in, the addresses and values of the function's arguments, the addresses of local variables, and so on. For example:
(gdb) info f
Stack level 0, frame at 0xbffff5d4:
 eip = 0x804845d in func (tst.c:6); saved eip 0x8048524
 called by frame at 0xbffff60c
 source language c.
 Arglist at 0xbffff5d4, args: n=250
 Locals at 0xbffff5d4, Previous frame's sp is 0x0
 Saved registers:
  ebp at 0xbffff5d4, eip at 0xbffff5d8
info args
Prints the argument names of the current function and their values.
info locals
Prints all local variables of the current function and their values.
info catch
Prints the exception handling information for the current function.
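To try the stack commands above, it helps to have a program with at least two nested calls. The following is a small practice sketch, not the tst.c used for the output shown earlier; the function names sum_below and report_sum are made up for this example.

```c
#include <stdio.h>

/* Innermost function: when stopped here, `bt` lists it as frame #0,
 * `info args` shows n and `info locals` shows sum and i. */
int sum_below(int n)
{
    int sum = 0;
    int i;

    for (i = 0; i < n; i++)
        sum += i;
    return sum;
}

/* Caller: reachable from sum_below with `up` or `frame 1`. */
int report_sum(int n)
{
    int result = sum_below(n);

    printf("sum below %d = %d\n", n, result);
    return result;
}
```

If these functions are compiled with -g and report_sum(250) is called from main, setting a breakpoint with break sum_below and running the program should stop it with sum_below as frame 0, report_sum as frame 1 and main as frame 2, which is enough to exercise bt, frame, up, down, info args and info locals.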