Source: https://www.cnblogs.com/fifteen/archive/2012/03/20/2407449.html
I have recently been analyzing core dump files at work, found this good post, and reposted it to my blog O(∩_∩)O~
PS:
Where can you get dbx?
It is part of the bos.adt.debug fileset:
    # lslpp -w /usr/bin/dbx
      File                 Fileset          Type
      -------------------------------------------
      /usr/bin/dbx         bos.adt.debug    Symlink
The following is reposted from: http://www.aixchina.net/?6141/viewspace-18882
I. Introduction to core dump analysis
Environment variable settings
Basic per-user limits, including the maximum core file size, can be configured in the /etc/security/limits file. Alternatively, use ulimit to change the core size limit for the current environment.
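A process can also adjust its own core size limit before it does any real work. The following is a minimal sketch, assuming the POSIX getrlimit/setrlimit interface with RLIMIT_CORE is available; raise_core_limit is a hypothetical helper name and is not part of the original article.

```cpp
#include <sys/resource.h>  // getrlimit, setrlimit, RLIMIT_CORE (POSIX)
#include <cstdio>          // std::perror

// Hypothetical helper: raise the soft core-size limit of the calling
// process to its hard limit. Returns true on success.
bool raise_core_limit() {
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        std::perror("getrlimit");
        return false;
    }
    // An unprivileged process may raise its soft limit up to the hard limit.
    lim.rlim_cur = lim.rlim_max;
    if (setrlimit(RLIMIT_CORE, &lim) != 0) {
        std::perror("setrlimit");
        return false;
    }
    return true;
}
```

Calling raise_core_limit() early in main has roughly the effect of running ulimit -c with the hard limit before starting the program; it cannot exceed the hard limit configured for the user.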
By default, an application process writes its core dump to a file named core. To keep processes in the same working directory from overwriting each other's core files, define the environment variable CORE_NAMING=true before starting the process; the dump is then written to a file named core.pid.ddhhmmss. Use the file core command to see which program generated a given core.
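When many such files accumulate, the name itself tells you which process dumped and when. The sketch below splits a name of the form core.pid.ddhhmmss into its fields, taking dd, hh, mm and ss to be day of month, hours, minutes and seconds. CoreName and parse_core_name are hypothetical names used only for illustration.

```cpp
#include <cstdio>   // std::sscanf
#include <string>

// Fields encoded in a name of the form core.<pid>.<ddhhmmss>.
struct CoreName {
    long pid;
    int day, hour, minute, second;
};

// Hypothetical helper: returns true and fills `out` only when the whole
// of `name` matches the pattern.
bool parse_core_name(const std::string& name, CoreName& out) {
    int consumed = 0;
    int fields = std::sscanf(name.c_str(), "core.%ld.%2d%2d%2d%2d%n",
                             &out.pid, &out.day, &out.hour,
                             &out.minute, &out.second, &consumed);
    // %n does not count as a converted field, so a full match yields 5.
    return fields == 5 && consumed == static_cast<int>(name.size());
}
```

For a made-up name such as core.352276.14142022 this yields pid 352276 and a timestamp of day 14, 14:20:22. A plain core file name is rejected.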
By default, an application process dump contains all shared memory. To exclude shared memory contents from the dump, set the environment variable CORE_NOSHM=true before starting the process.
The system parameter fullcore controls whether a complete core is generated when a program dumps core. To avoid losing information, enabling fullcore is recommended. Use lsattr -El sys0 to check whether fullcore is enabled, and chdev -l sys0 -a fullcore=true to enable it. A program can also request a full dump itself by calling the sigaction routine with the SA_FULLDUMP flag, as in the following test program:
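Fullcore setup example
    // test.C
    #include <iostream>
    #include <signal.h>

    int main(int argc, char* argv[])
    {
        char str[10];
        struct sigaction s;

        // Keep the default action for SIGSEGV, but request a full core dump.
        s.sa_handler = SIG_DFL;
        s.sa_mask.losigs = 0;      // AIX sigset_t fields: empty signal mask
        s.sa_mask.hisigs = 0;
        s.sa_flags = SA_FULLDUMP;  // AIX-specific flag
        sigaction(SIGSEGV, &s, (struct sigaction *) NULL);

        std::cout << "Input str!\n" << std::endl;
        std::cin >> str;           // no bounds check: long input overflows str
        return 0;
    }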
Finding the core dump
The core of an application process is written to its current working directory; the application can change that directory with the chdir function. Use the procwdx command to see the current working directory of a process. A system dump is written to lg_dumplv and is copied to the /var/adm/ras/ directory on reboot if there is enough space; otherwise it stays in lg_dumplv, where it may be overwritten at any time.
Use errpt -a to view detailed error entries. Look for the identifiers C0AA5338 SYSDUMP (system dump) and B6048838 CORE_DUMP (process core) to find the process that generated the core and the location of the core file. Use snap -ac to collect the system dump information.
Collecting core dump information
Where possible, the most convenient approach is to run dbx directly on the machine where the core dump occurred. In that case, do not log in as root to run dbx; run it as the user that owns the application. The core may depend on libraries from the application's runtime environment, and running as that user ensures the application's environment variables are in effect.
To analyze a core from a production machine in the lab, you need to collect some related information. Process core analysis requires at least the application executable, and sometimes the runtime shared libraries as well. To collect the complete set of core-related information, run snapcore <core path and name> <executable path and name>, for example snapcore ./core ./a.out, and then retrieve the resulting .pax.Z file from /tmp/snapcore.
A normal collection run looks like this:
snapcore collection process
    # snapcore ./core ./a.out
    Core file "./core" created by "a.out"
    pass1() in progress ....
    Calculating space required.
    Total space required is 14130 kbytes.
    Checking for available space ...
    Available space is 807572 kbytes
    pass1 complete.
    pass2() in progress ....
    Collecting fileset information.
    Collecting error report of CORE_DUMP errors.
    Creating readme file.
    Creating archive file ...
    Compressing archive file ....
    pass2 completed.
    Snapcore completed successfully. Archive created in /tmp/snapcore.
    # cd /tmp/snapcore
    # ls
    snapcore_352276.pax.Z
    # uncompress snapcore_352276.pax.Z
    # ls
    snapcore_352276.pax
    # pax -r -f snapcore_352276.pax
    # ls
    (Make sure that files and directories like the following are present: the executable, core, errpt, lslpp, the usr directory, and so on.)
    README  errpt.out  usr  a.out  lslpp.out  core  snapcore_352276.pax
    #
II. Example of using dbx to analyze a core dump
dbx is the command-line, source-level debugger on AIX. This document covers only some basic dbx analysis commands; for details, see the description of dbx in "General Programming Concepts: Writing and Debugging Programs".
Preliminary analysis
Example:
    # dbx ./test core
    Type 'help' for help.
    warning: The core file is not a fullcore. Some info may
    not be available.
    [using memory image in core]
    reading symbolic information ...warning: no source compiled with -g

    Segmentation fault in raise at 0xd022e1e4
    0xd022e1e4 (raise+0x40) 80410014         lwz   r2,0x14(r1)
Show where the process was executing when the core was generated (a specific line number is shown if the program was compiled with -g):
    (dbx) where
    raise(??) at 0xd022e1e4
    main(0x1, 0x2ff22d48) at 0x100019c4
Note:
If you are analyzing a core file away from the machine that produced it, use snapcore to collect the related core information. For dependent libraries, add the -p oldpath=newpath:... option to remap the library paths (the core dump scene can be fully reproduced only when all dependent libraries are loaded). See the dbx help documentation for more information.
    # cd /tmp/snapcore
    # dbx -p /=./ a.out core
    Type 'help' for help.
    [using memory image in core]
    reading symbolic information ...warning: no source compiled with -g

    IOT/Abort trap in raise at 0xd01f4f60
Listing source information
List the program source (list; requires compiling with -g and starting dbx with -I to specify the source search path) or the assembly code (listi):
    (dbx) listi main
    0x10001924 (main)       7c0802a6        mflr   r0
    0x10001928 (main+0x4)   bfa1fff4        stmw   r29,-12(r1)
    0x1000192c (main+0x8)   90010008         stw   r0,0x8(r1)
    0x10001930 (main+0xc)   9421ffa0        stwu   r1,-96(r1)
    0x10001934 (main+0x10)  83e20064         lwz   r31,0x64(r2)
    0x10001938 (main+0x14)  90610078         stw   r3,0x78(r1)
    0x1000193c (main+0x18)  9081007c         stw   r4,0x7c(r1)
    0x10001940 (main+0x1c)  83a20068         lwz   r29,0x68(r2)
Listing variable contents
Example code:
    #include <iostream>
    #include <signal.h>
    #include <stdlib.h>   // abort

    int g_test = 0;

    int testfunc(int &para)
    {
        para++;
        return 0;
    }

    int main(int argc, char* argv[])
    {
        struct sigaction s;
        s.sa_handler = SIG_DFL;
        s.sa_mask.losigs = 0;
        s.sa_mask.hisigs = 0;
        s.sa_flags = SA_FULLDUMP;   // request a full core dump (AIX-specific)
        sigaction(SIGSEGV, &s, (struct sigaction *) NULL);

        char str[10];
        g_test = 0;

        testfunc(g_test);   // g_test becomes 1
        abort();            // terminates the process with a core dump
    }
    # xlC test.C -g
Using the global variable g_test as an example:
    (dbx) print g_test            shows the value of g_test
    (dbx) print sizeof(g_test)    shows the size of g_test
    (dbx) whatis g_test           shows the type of g_test
    (dbx) print &g_test           shows the address of g_test
    (dbx) &g_test/16x             displays 16 consecutive halfwords (2 bytes each) starting at the address of g_test
If the program was not compiled with -g, dbx cannot report the type, size, and similar information for g_test, but you can still get the address of g_test and examine the memory stored at that address.
For example:
    # ./a.out
    IOT/Abort trap(coredump)
    # dbx ./a.out core
    Type 'help' for help.
    [using memory image in core]
    reading symbolic information ...

    IOT/Abort trap in raise at 0xd03365bc
    0xd03365bc (raise+0x40) 80410014         lwz   r2,0x14(r1)
    (dbx) print g_test
    1
    (dbx) whatis g_test
    int g_test;
    (dbx) print sizeof(g_test)
    4
    (dbx) print &g_test
    0x20000428
    (dbx) &g_test/16x
    0x20000428:  0000 0001 0000 0000 0000 0000 0000 0000
    0x20000438:  0000 0000 0000 0000 0000 0000 0000 0000
Listing register contents
List the contents of the registers:
    (dbx) registers
The following simulates a simple core dump caused by storing a value to address 0:
    # dbx ./a.out core
    Type 'help' for help.
    warning: The core file is not a fullcore. Some info may
    not be available.
    [using memory image in core]
    reading symbolic information ...warning: no source compiled with -g

    Segmentation fault in main at 0x10000348
    0x10000348 (main+0x18) 90640000         stw   r3,0x0(r4)
    (dbx) where
    main(0x1, 0x2ff22ccc) at 0x10000348
    (dbx) registers
      $r0:0x00000000  $stkp:0x2ff22bf0   $toc:0x20000414    $r3:0x00000012
      $r4:0x00000000    $r5:0x2ff22cd4    $r6:0xdeadbeef    $r7:0x2ff22ff8
      $r8:0x00000000    $r9:0x04030000   $r10:0xf0577538   $r11:0xdeadbeef
     $r12:0xdeadbeef   $r13:0xdeadbeef   $r14:0x00000001   $r15:0x2ff22ccc
     $r16:0x2ff22cd4   $r17:0x00000000   $r18:0xdeadbeef   $r19:0xdeadbeef
     $r20:0xdeadbeef   $r21:0xdeadbeef   $r22:0xdeadbeef   $r23:0xdeadbeef
     $r24:0xdeadbeef   $r25:0xdeadbeef   $r26:0xdeadbeef   $r27:0xdeadbeef
     $r28:0xdeadbeef   $r29:0xdeadbeef   $r30:0xdeadbeef   $r31:0xdeadbeef
     $iar:0x10000348   $msr:0x0000d0b2    $cr:0x22282489  $link:0x100001b4
     $ctr:0xdeadbeef   $xer:0x20000020
            Condition status = 0:e 1:e 2:e 3:l 4:e 5:g 6:l 7:lo
            [unset $noflregs to view floating point registers]
            [unset $novregs to view vector registers]
    in main at 0x10000348
    0x10000348 (main+0x18) 90640000         stw   r3,0x0(r4)
    (dbx) print $r3
    0x00000012
    (dbx) print $r4
    (nil)
This example is fairly simple: the final assembly instruction, stw r3,0x0(r4), shows that the program dumped core because it tried to store the value of register r3 to address 0 (0 + r4, where r4 is 0).
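The raw instruction word that dbx prints next to the disassembly (90640000 here) can be decoded by hand to confirm which registers are involved. The sketch below assumes the standard PowerPC D-form layout (a 6-bit primary opcode, two 5-bit register fields, and a signed 16-bit displacement) and recognizes only lwz (opcode 32) and stw (opcode 36), the two forms that appear in these transcripts. decode_dform is a hypothetical helper, not a dbx facility.

```cpp
#include <cstdint>
#include <cstdio>   // std::snprintf
#include <string>

// Decode a 32-bit PowerPC D-form instruction word. Only lwz (32) and
// stw (36) are handled; every other opcode is reported as "unknown".
std::string decode_dform(std::uint32_t word) {
    unsigned opcode = word >> 26;            // primary opcode (top 6 bits)
    unsigned reg    = (word >> 21) & 0x1f;   // source (stw) or target (lwz) register
    unsigned base   = (word >> 16) & 0x1f;   // base address register
    int disp = static_cast<std::int16_t>(word & 0xffff);  // signed displacement

    const char* mnemonic = nullptr;
    if (opcode == 32) mnemonic = "lwz";
    if (opcode == 36) mnemonic = "stw";
    if (mnemonic == nullptr) return "unknown";

    char buf[48];
    std::snprintf(buf, sizeof buf, "%s r%u,%d(r%u)", mnemonic, reg, disp, base);
    return buf;
}
```

decode_dform(0x90640000) returns "stw r3,0(r4)", matching the faulting instruction, and decode_dform(0x80410014) returns "lwz r2,20(r1)" (the displacement is printed in decimal, so 0x14 appears as 20).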
Viewing multithreading information
If the following environment variables are left at their default value of OFF, the system disables the corresponding debug lists entirely, which means the matching dbx commands will not show any objects:
AIXTHREAD_MUTEX_DEBUG
AIXTHREAD_COND_DEBUG
AIXTHREAD_RWLOCK_DEBUG
For example, to turn on AIXTHREAD_MUTEX_DEBUG:
    export AIXTHREAD_MUTEX_DEBUG=ON
- Viewing thread information
    (dbx) print $t1     // print basic information for thread $t1
    (dbx) attribute
    (dbx) condition
    (dbx) mutex
    (dbx) rwlock
    (dbx) thread
For example:
(thread_id = 1, State_u = 4, priority = $, policy = other, attributes = 0x20001078)
- Switching the current thread (by default, the current thread is the one that received the signal that triggered the core)
    (dbx) thread current [tid]
For example (> marks the current thread at the time of the core dump):
    (dbx) thread
     thread  state-k  wchan       state-u  k-tid  mode  held  scope  function
     $t1     wait     0x31bbb558  running  10321  k     no    pro    _ptrgl
     $t2     wait     0x311fb958  running   6275  k     no    pro    _ptrgl
    >$t3     run                  running   6985  k     no    pro    _p_nsleep
     $t4     wait     0x31bbbb18  running   6571  k     no    pro    _ptrgl
     $t5     wait     0x311fb9d8  running   7999  k     no    pro    _ptrgl
     $t6     wait     0x31bf8f98  running   8257  k     no    pro    _ptrgl
     $t7     wait     0x311fba18  running   8515  k     no    pro    _ptrgl
     $t8     wait     0x311fb7d8  running   8773  k     no    pro    _ptrgl
     $t9     wait     0x311fbb18  running   9031  k     no    pro    _ptrgl
     $t10    wait     0x311fb898  running   9547  k     no    pro    _ptrgl
     $t11    wait     0x311fb818  running   9805  k     no    pro    _ptrgl
     $t12    wait     0x311fba58  running  10063  k     no    pro    _ptrgl
     $t13    wait     0x311fb8d8  running  10579  k     no    pro    _ptrgl
    (dbx) thread current 3
    (dbx) where
    _p_nsleep(??, ??) at 0xd005f740
    raise.nsleep(??, ??) at 0xd022de3c
    sleep(??) at 0xd0260344
    helper(??) at 0x100005ac
    (dbx) thread current 4
    warning: Thread is in kernel mode, not all registers can be accessed.
    (dbx) where
    ptrgl._ptrgl() at 0xd020e470
    raise.nsleep(??, ??) at 0xd022de3c
    raise.nsleep(??, ??) at 0xd022de3c
    sleep(??) at 0xd0260344
    helper(??) at 0x100005ac
    (dbx)
Limitations of core dump analysis
Do not expect core dump analysis to solve every problem. Here is a simple buffer overflow example: because the overflow overwrites the call stack information, the basis for locating the fault is lost entirely:
    /tmp#> xlC test.C -g -o test2
    /tmp#>
    /tmp#> ./test
    Input str!

    012345678901234567890123456789
    Segmentation fault(coredump)
    /tmp#> dbx ./test2 core
    Type 'help' for help.
    [using memory image in core]
    reading symbolic information ...

    Segmentation fault in test2. at 0x34353634
    0x34353634 (???) warning: Unable to access address 0x34353634 from core
    (dbx) where
    warning: Unable to access address 0x34353634 from core
    warning: Unable to access address 0x34353634 from core
    warning: Unable to access address 0x34353630 from core
    warning: Unable to access address 0x34353630 from core
    warning: Unable to access address 0x34353634 from core
    warning: Unable to access address 0x34353634 from core
    warning: Unable to access address 0x34353630 from core
    warning: Unable to access address 0x34353630 from core
    warning: Unable to access address 0x34353634 from core
    warning: Unable to access address 0x36373841 from core
    test2.() at 0x34353634
    warning: Unable to access address 0x36373839 from core
    warning: Unable to access address 0x36373839 from core
    (dbx)
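Even when the stack is unusable, the bogus addresses themselves can be a clue. A quick check is to read them as text: the sketch below renders a 32-bit value as its four bytes, most significant first (AIX on POWER is big-endian). word_as_ascii is a hypothetical helper introduced for illustration.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: show the four bytes of a 32-bit value as
// characters, most significant byte first. Non-printable bytes are
// shown as '.'.
std::string word_as_ascii(std::uint32_t word) {
    std::string out;
    for (int shift = 24; shift >= 0; shift -= 8) {
        unsigned char byte = static_cast<unsigned char>((word >> shift) & 0xff);
        out += (byte >= 0x20 && byte < 0x7f) ? static_cast<char>(byte) : '.';
    }
    return out;
}
```

word_as_ascii(0x34353634) gives "4564" and word_as_ascii(0x36373839) gives "6789". Addresses made entirely of ASCII digits are consistent with the typed input string having overwritten saved data on the stack, which points to an overflow of the input buffer rather than a bad pointer computed by the program.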
System Dump Analysis
Environment variable settings
The system's current dump configuration can be viewed with sysdumpdev -l:
    /#> sysdumpdev -l
    primary              /dev/hd6
    secondary            /dev/sysdumpnull
    copy directory       /var/adm/ras
    forced copy flag     TRUE
    always allow dump    FALSE
    dump compression     ON
Note that on older versions of AIX, "always allow dump" may be off (FALSE) by default. Turning it on is recommended so that a dump can be taken when the system crashes: use the sysdumpdev -K command, or the smitty menu System Environments -> Change / Show Characteristics of System Dump.
sysdumpdev -L shows statistics about the most recent dump generated by the system:
    #> sysdumpdev -L
    0453-039
    Device name:         /dev/hd6
    Major device number: 10
    Minor device number: 2
    Size:                18885120 bytes
    Uncompressed Size:   113724523 bytes
    Date/Time:           Sat Jul 14:20:22 BEIST 2007
    Dump status:         0
    dump completed successfully
    Dump copy filename:  /var/adm/ras/vmcore.2.Z
To make sure the dump device can hold the dump when the system crashes, size the dump device appropriately. Use sysdumpdev -e to estimate the space a system dump would need. The generally recommended dump device size is 1.5 times the sysdumpdev -e estimate.
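The sizing rule is simple arithmetic. As a small sketch, the function below applies the 1.5x factor to an estimate given in bytes; recommended_dump_device_bytes is a hypothetical name, and the estimate must be taken from the sysdumpdev -e output on the system in question.

```cpp
#include <cstdint>

// Hypothetical helper: apply the 1.5x rule of thumb to a dump size
// estimate in bytes. Integer arithmetic avoids floating-point rounding;
// an odd estimate is rounded up to the next whole byte.
std::uint64_t recommended_dump_device_bytes(std::uint64_t estimate_bytes) {
    return estimate_bytes + (estimate_bytes + 1) / 2;
}
```

For an estimate of 1000 bytes the function returns 1500. The result still has to be rounded up to whatever allocation unit the dump logical volume uses.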
Using KDB to analyze a system dump
This document covers only some basic dump analysis commands; for details, see "KDB kernel debugger and kdb command".
Preliminary analysis
Analyzing a dump file with kdb requires the kernel file /unix from the system that generated the dump; it is normally collected by snap -ac. The basic command is:
    # kdb ./dump ./unix
Example:
    # kdb ./dump ./unix
    The specified kernel file is a 64-bit kernel
    ./dump mapped from @ 700000000000000 to @ 70000007da53bd5
    Preserving 1317350 bytes of symbol table
    First symbol __mulh
    Component Names:
     1) minidump [2 entries]
     2) dmp_minimal [9 entries]
     3) proc [481 entries]
     4) thrd [1539 entries]
     5) rasct [1 entries]
     6) ldr [2 entries]
     7) errlg [3 entries]
     8) mtrc [entries]
     9) lfs [1 entries]
    10) bos [2 entries]
    11) ipc [7 entries]
    12) vmm [entries]
    13) alloc_kheap [entries]
    14) alloc_other [entries]
    15) rtastrc [8 entries]
    16) efcdd [entries]
    17) eidedd [1 entries]
    18) sisraid [2 entries]
    19) aixpcm [5 entries]
    20) scdisk [entries]
    21) lvm [2 entries]
    22) jfs2 [1 entries]
    23) tty [4 entries]
    24) netstat [10 entries]
    25) goent_dd [7 entries]
    26) scsidisk [entries]
    27) efscsi [5 entries]
    28) dump_statistics [1 entries]
    Component Dump Table has 2456 entries
               START              END <name>
    0000000000001000 0000000003BBA050 start+000FD8
    F00000002FF47600 F00000002FFDC920 __ublock+000000
    000000002FF22FF4 000000002FF22FF8 environ+000000
    000000002FF22FF8 000000002FF22FFC errno+000000
    F100070F00000000 F100070F10000000 pvproc+000000
    F100070F10000000 F100070F18000000 pvthread+000000
    PFT:
    PVT:
    id....................0002
    raddr.....0000000000686000 eaddr.....F200800030000000
    size..............00040000 align.............00001000
    valid..1 ros....0 fixlmb.1 seg....0 wimg...2
    Dump analysis on CHRP_SMP_PCI POWER_PC POWER_5 machine with 8 available CPU(s) (64-bit registers)
    Processing symbol table...
    ...done
Analysis command examples
Use status to see the process each CPU was running at the time of the dump, for example:
    (0)> status
    CPU TID TSLOT PID PSLOT PROC_NAME
    0 2580F5 14C0F6 332 cron
    1 12025 D01A
    2 1020BB 258 1580C6 344 expr
    3 1502B F01E wait
The cpu <id> command switches the current CPU; the default current CPU is CPU 0:
    (0)> cpu 1
    (1)>
Print the basic system status and related information:
    (0)> stat
Print the kernel stack at the time of the system dump:
    (0)> f
lke lists information about the loaded kernel extension for a kernel code address:
    (0)> lke 003DE9CC
Show the last instruction executed at the time of the system dump:
    (0)> dr iar
Display the virtual memory manager (VMM) log; an Exception value of 0000001C means paging space was exhausted:
    (0)> vmlog
Display the process table:
    (0)> proc
Display the thread table:
    (0)> th
Display the system's errpt information:
    (0)> errpt
    ERRORS NOT READ BY ERRDEMON (ORDERED CHRONOLOGICALLY):
    Error Record:
    erec_flags .............. 1
    erec_len ................ 54
    erec_timestamp .......... 46DCDD9D
    erec_rec_len ............ 34
    erec_dupcount ........... 0
    erec_duptime1 ........... 0
    erec_duptime2 ........... 0
    erec_rec.error_id ....... DD11B4AF
    erec_rec.resource_name .. SYSPROC
    00007FFF FFFFD000 00000000 003DE9CC ... =......
    00000000 00020000 80000000 000290B2 ....... .....
    < END >
[Repost] Using dbx and kdb to analyze core dumps on AIX