One, what is a coredump
We often hear that a program "dumped core" and that the problem has to be located and fixed. In most cases this means that the program exited or stopped abnormally at run time because of some exception or bug and, when certain conditions were met (why conditions are needed is analyzed below), produced a file called core.
Normally the core file contains the program's run-time memory, register state, stack pointer, memory management information and function call stack information. It can be understood as the program's working state at that moment, saved to a file. Many program errors produce a core file. By analyzing this file with a tool we can find the call stack at the moment the program exited abnormally, locate the problem and fix it promptly.
Two, storage location of coredump files
By default the core file is stored in the same directory as the executable and the file name is core. The following command shows where core files are written:
cat /proc/sys/kernel/core_pattern
The default value is core.
Note: "the same directory" here really means the current working directory of the process when the core file is created, which is usually the directory of the program. If the program calls chdir, however, the working directory may change, and the core file is then created under the path passed to chdir. Many cases where a program crashed but no core file could be found are related to chdir. Of course, a crashing program does not necessarily produce a core file at all.
For example, a program that calls chdir("/data/coredump/wd") before it crashes leaves its core file in /data/coredump/wd, not in the directory of the executable file.
You can change where coredump files are stored with the following command, for example so that the core file is written as /data/coredump/core:
echo "/data/coredump/core" > /proc/sys/kernel/core_pattern
Note that the current user must have write permission to /proc/sys/kernel/core_pattern.
By default, the core file produced by the kernel is placed in the same directory as the program, and the file name is fixed as core. Obviously, if several programs produce core files, or the same program crashes several times, the same core file is overwritten again and again, so the core files generated by different programs should be named separately.
By modifying kernel parameters we can specify the file name of the coredump file that the kernel generates. For example, the following command makes the kernel generate core dump files named in the format core.filename.pid:
echo "/data/coredump/core.%e.%p" > /proc/sys/kernel/core_pattern
With this configuration, the name of the resulting core file carries the name of the crashed program and its process ID: %e and %p are replaced with the program file name and the process ID.
If the pattern contains the directory separator "/", the core file is placed in the specified directory. Note that there is one more coredump-related kernel setting, /proc/sys/kernel/core_uses_pid. If this file contains 1, the process ID is still appended to the generated core dump file name, even when core_pattern does not contain %p.
Three, how to tell whether a file is a coredump file
On Unix-like systems the coredump file itself is mainly in ELF format, so we can check it with the readelf command (readelf -h prints the ELF file header).
The Type field of the ELF file header reads: CORE (Core file)
A quick check is also possible with the simple file command, which reports the file as an ELF core file.
Four, summary of the conditions for producing a coredump
1, To produce a coredump, first check ulimit -c for the current session. If it is 0, no coredump is produced and the limit has to be changed:
ulimit -c unlimited (coredumps can be produced and are not limited in size)
If you want to limit the size of the core file, you can specify:
ulimit -c [size]
The unit of size is blocks, generally 1 block = 512 bytes (the block size depends on the shell; bash outside POSIX mode counts in 1024-byte units).
For example:
ulimit -c 4 (Note that if the size is too small, no core file may be produced. The author set ulimit -c 1 and the system generated no core file; 1, 2 and 3 all failed to produce a core, and at least 4 was needed to generate one.)
However, a ulimit set this way is valid only for the current session. To make it take effect system-wide, set the following:
- Add the following line to /etc/profile, which allows coredump files to be generated:
ulimit -c unlimited
- Add the following line to rc.local, which places the coredump file generated when a program crashes under the /data/coredump/ directory:
echo /data/coredump/core.%e.%p > /proc/sys/kernel/core_pattern
Note that rc.local is stored in different directories in different environments; on SUSE it may be /etc/rc.d/rc.local.
For more on using the ulimit command, see: http://baike.baidu.com/view/4832100.htm
These settings require root permissions. On Ubuntu, each time you open a new terminal you need to re-enter the ulimit command above to set the core size to unlimited.
2, The current user, that is, the user running the program, must have write permission to the directory where the core file is written, and the directory must have enough free space.
3, Several scenarios in which no core file is produced:
The core file won't be generated if
(a) the process was set-user-ID and the current user is not the owner of the program file, or
(b) the process was set-group-ID and the current user is not the group owner of the file,
(c) the user does not have permission to write in the current working directory,
(d) the file already exists and the user does not have permission to write to it, or
(e) the file is too big (recall the RLIMIT_CORE limit in section 7.11).
The permissions of the core file (assuming that the file doesn't already exist) are usually user-read and user-write, although Mac OS X sets only user-read.
Five, common causes of a coredump
There are many reasons why a program dumps core. Here is a summary of the most common ones, based on experience:
1, Memory access out of bounds
a) An array is accessed out of bounds because an incorrect subscript is used.
b) When a string is scanned, the string terminator is relied on to detect the end of the string, but the string was not properly terminated.
c) String functions such as strcpy, strcat, sprintf, strcmp and strcasecmp read or write past the end of the target buffer. Functions such as strncpy, strlcpy, strncat, strlcat, snprintf, strncmp and strncasecmp should be used instead to keep reads and writes within bounds.
2, A multithreaded program uses thread-unsafe functions.
The following reentrant functions should be used instead of their non-reentrant counterparts, which are easy to misuse:
asctime_r(3c) gethostbyname_r(3n) getservbyname_r(3n)
ctermid_r(3s) gethostent_r(3n) getservbyport_r(3n)
ctime_r(3c) getlogin_r(3c) getservent_r(3n)
fgetgrent_r(3c) getnetbyaddr_r(3n) getspent_r(3c)
fgetpwent_r(3c) getnetbyname_r(3n) getspnam_r(3c)
fgetspent_r(3c) getnetent_r(3n) gmtime_r(3c)
gamma_r(3m) getnetgrent_r(3n) lgamma_r(3m)
getauclassent_r(3) getprotobyname_r(3n) localtime_r(3c)
getauclassnam_r(3) getprotobynumber_r(3n) nis_sperror_r(3n)
getauevent_r(3) getprotoent_r(3n) rand_r(3c)
getauevnam_r(3) getpwent_r(3c) readdir_r(3c)
getauevnum_r(3) getpwnam_r(3c) strtok_r(3c)
getgrent_r(3c) getpwuid_r(3c) tmpnam_r(3s)
getgrgid_r(3c) getrpcbyname_r(3n) ttyname_r(3c)
getgrnam_r(3c) getrpcbynumber_r(3n)
gethostbyaddr_r(3n) getrpcent_r(3n)
3, Data read and written by multiple threads is not protected by a lock.
Global data that is accessed by several threads at the same time must be protected by a lock; otherwise a coredump is easily caused.
4, Illegal pointers
a) Using a null pointer.
b) Converting pointers freely. Given a pointer to a piece of memory, do not cast it to a pointer to some structure or type unless you are sure the memory was originally allocated as that structure or type, or as an array of it. Instead, copy the memory into an object of that structure or type and then access that object. The reason is that if the start address of the memory is not aligned as the structure or type requires, accessing it can easily cause a core dump through a bus error.
5, Stack overflow
Do not use large local variables (local variables are allocated on the stack). They can easily overflow the stack and damage the system's stack and heap structures, resulting in inexplicable errors.
Six, using GDB to locate a coredump
Many tools can analyze a coredump, and most Unix-like systems now ship tools for analyzing coredump files, but the tool we use most often is gdb.
Here we use a small program as an example to show how to locate the problem.
1, Segmentation fault (segfault)
- We write a piece of code that writes to an address protected by the system.
- Compile and run it. Note that it must be compiled with the -g option.
As you can see, when 12 is entered, the system reports a segmentation fault and core dumped.
- We go to the directory where the core file was generated, first confirm that the file is in core format, and then start GDB to debug it.
From the screenshot (red box) you can see that the program was terminated by signal 11, and the backtrace command (or where) shows the function call stack: the program had reached line 5 of Coremain.cpp, where scanf is called, and scanf in turn calls the _IO_vfscanf_internal() function.
Next we continue by debugging the program itself with GDB.
Remember a few common GDB commands:
l (list): display the source code, with line numbers;
b (break) x: x is a line number; sets a breakpoint at that line;
p (print) x: x is a variable name; prints the value of variable x;
r (run): start the program and run until a breakpoint is reached;
n (next): execute the next step;
c (continue): continue execution;
q (quit): exit GDB.
Start GDB; note that the program needs to be compiled with the -g option.
Note: SIGSEGV, default action Core, Invalid memory reference
Seven, notes
1, Viewing source code in GDB
Show source code
GDB can print the source code of the program being debugged, provided that the program was compiled with the -g option so that source information is included in the executable. Otherwise you will not see the source. When the program stops, GDB reports the file and line at which it stopped. You can use the list command to print the source code. Here are the GDB commands for viewing source code.
list <linenum>
Displays the source around line linenum.
list <function>
Displays the source of the function named function.
list
Displays the source after the current line.
list -
Displays the source before the current line.
By default 10 lines are printed: generally the 5 lines above and the 5 lines below the current line, or 2 lines above and 8 lines below when a function is displayed. You can change the number of lines shown with the following commands.
set listsize <count>
Sets the number of source lines displayed at a time.
show listsize
Shows the current listsize setting.
The list command also has the following forms:
list <first>,<last>
Displays the source from line first to line last.
list ,<last>
Displays the source from the current line to line last.
list +
Displays the source that follows.
In general, list can be followed by these arguments:
<linenum> a line number.
<+offset> a positive offset from the current line number.
<-offset> a negative offset from the current line number.
<filename:linenum> a line in a given file.
<function> a function name.
<filename:function> a function in a given file.
<*address> the address of a statement in memory while the program is running.
2, The meaning of some common signals
SIGABRT: generated when the abort function is called. The process terminates abnormally.
SIGBUS: indicates an implementation-defined hardware fault.
SIGEMT: indicates an implementation-defined hardware fault. The name EMT comes from the PDP-11 emulator trap instruction.
SIGFPE: indicates an arithmetic exception, such as division by 0 or floating-point overflow.
SIGILL: indicates that the process has executed an illegal hardware instruction. In 4.3BSD this signal was generated by the abort function; SIGABRT is now used for that.
SIGIOT: indicates an implementation-defined hardware fault. The name IOT comes from the PDP-11 mnemonic for the input/output trap instruction. Earlier versions of System V generated this signal from the abort function; SIGABRT is now used for that.
SIGQUIT: generated when the user presses the terminal quit key (usually Ctrl-\) and delivered to all processes in the foreground process group. This signal not only terminates the foreground process group (as SIGINT does) but also produces a core file.
SIGSEGV: indicates that the process made an invalid memory access. The name SEGV stands for "segmentation violation".
SIGSYS: indicates an invalid system call. For some unknown reason the process executed a system call instruction, but the argument indicating the type of system call was invalid.
SIGTRAP: indicates an implementation-defined hardware fault. The signal name comes from the PDP-11 TRAP instruction.