AIX Dump File Learning Notes

Source: Internet
Author: User
Tags: ibm server

DUMP File Overview

To improve fault analysis, IBM servers can save the state of the machine when a fault occurs: the contents of memory, CPU registers, I/O, and other device status at the moment of the failure. If the system keeps running but a program dies, a core dump is generated and a core file is written to the current directory. If the operating system itself dies, a system dump (system crash) is generated, which usually means system downtime.

As a general customer, you only need to collect the dump and report it to IBM engineers. A system dump takes the machine down. It is triggered when the kernel encounters an unexpected condition that it cannot handle properly. The system administrator can also issue a command to force a system dump.
When the system dumps, the dump facility automatically copies kernel data (kernel segment 0 and the other memory areas that the kernel or kernel extensions have recorded in the master dump table) to the primary dump device. A dump can be understood as a snapshot of the system at that moment, kept for later analysis. The dump can be analyzed on another machine, but a copy of the kernel from the machine that dumped (unix_mp or unix_mp64) is needed. Without the kernel that matches the dump, the dump cannot be analyzed.
DUMP Generation Process

Core dump Generation Process

When a process hits an exception, such as an invalid address access, a floating-point exception, or an illegal instruction, control transfers to the kernel for exception handling (that is, interrupt processing), and the kernel sends the corresponding signal to the process: SIGSEGV, SIGFPE, or SIGILL. If the application has registered a handler for that signal (for example, through sigaction), the handler is called, and the application can choose to record information, generate a core dump, and exit. Otherwise the default action is taken. For example, the default action of SIGSEGV is to generate a core dump and terminate the program.
During a core dump, the operating system terminates the process and releases the resources it holds. Normally, a core dump of an application process does not harm the rest of the system. If other processes depend on the one that failed, they are affected, and the consequences depend on how they handle the failure.
Because the program instructions are already contained in the executable file, a core file generally contains only the memory contents related to the failed process. For details about the format, see /usr/include/sys/core.h or the "Files Reference" section of the AIX documentation. Analyzing a problem usually requires both the core file and the executable program.
Note: Because signal delivery is asynchronous, the routines called from signal handlers registered by the application must be async-signal-safe. For example, routines whose names start with pthread_ cannot be used.
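The difference between the default action and a registered handler can be observed from a shell. The following is only a rough analogy, not the C-level sigaction interface: the shell's trap builtin stands in for a signal handler, and the function names are made up for this sketch. A status above 128 conventionally means the child was killed by a signal.

```shell
#!/bin/sh
# Hypothetical helper: the child sends SIGSEGV to itself with no handler
# installed, so the default action (terminate, normally with a core dump)
# applies. "ulimit -c 0" suppresses the core file so nothing is left behind.
default_action_status() {
    status=0
    sh -c 'ulimit -c 0; kill -s SEGV $$' 2>/dev/null || status=$?
    echo "$status"
}

# Hypothetical helper: the child registers a handler with trap before the
# signal arrives, so the handler runs and chooses its own exit status.
handled_status() {
    status=0
    sh -c 'trap "exit 3" SEGV; kill -s SEGV $$' 2>/dev/null || status=$?
    echo "$status"
}

echo "no handler:   exit status $(default_action_status)"
echo "with handler: exit status $(handled_status)"
```

In the second case the process exits through its handler, so no core file would be produced unless the handler asks for one.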
System dump Generation Process
A system dump is triggered in much the same way as an application core dump. However, because the failure is closer to the lowest layers of the system, the operating system writes the dump in the simplest way possible. This avoids depending on a resource (such as the file system) that may itself be faulty, which would make it impossible to produce the dump. For example, if system memory is less than 4 GB, the dump is generally written directly to paging space: the default primary dump device is /dev/hd6 and the secondary device is /dev/sysdumpnull. If system memory is 4 GB or more, a dedicated logical volume, lg_dumplv (a raw device), is created to hold the dump. When the system restarts, if the configured copy directory (a directory in a file system) has enough space, the dump is copied there as an ordinary file. By default this is a file named vmcore* under /var/adm/ras.
The default size of the dedicated dump device follows these rules:
When the server memory is 4 GB or more, AIX creates a dedicated logical volume for the system dump at installation time. The logical volume is named lg_dumplv, and its default size is allocated as follows:
4 GB <= server memory < 12 GB: lg_dumplv size is 1 GB
12 GB <= server memory < 24 GB: lg_dumplv size is 2 GB
24 GB <= server memory < 48 GB: lg_dumplv size is 3 GB
48 GB <= server memory: lg_dumplv size is 4 GB
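The sizing rules above can be written as a small lookup. This is a minimal sketch: the function name default_dumplv_gb is hypothetical, memory is given in whole GB, and 0 is printed for machines below 4 GB, where no dedicated dump logical volume is created.

```shell
#!/bin/sh
# Hypothetical helper: print the default lg_dumplv size in GB for a given
# amount of server memory in whole GB, following the rules listed above.
# Prints 0 below 4 GB, where the dump goes to paging space instead.
default_dumplv_gb() {
    mem_gb=$1
    if [ "$mem_gb" -lt 4 ]; then
        echo 0
    elif [ "$mem_gb" -lt 12 ]; then
        echo 1
    elif [ "$mem_gb" -lt 24 ]; then
        echo 2
    elif [ "$mem_gb" -lt 48 ]; then
        echo 3
    else
        echo 4
    fi
}

for mem in 2 8 16 32 64; do
    echo "memory ${mem} GB -> lg_dumplv $(default_dumplv_gb "$mem") GB"
done
```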
The problems that cause system dumps can often be resolved by upgrading the microcode, raising the system patch level, or upgrading drivers.
Environment variable settings
You can use the /etc/security/limits file to set basic limits for each user, including the core file size. You can also use ulimit to change the core size limit in the current environment.
By default, an application process writes its core dump to a file named core. To prevent processes in the same working directory from overwriting each other's core files, set the environment variable CORE_NAMING=true before starting the process. The core file is then named core.pid.ddhhmmss. You can run the file command on the core file (file core) to see which process generated it.
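When many such files accumulate, the name itself tells you which process dumped and when. The sketch below splits a name of the form core.pid.ddhhmmss into its parts; the function name parse_core_name and the sample file name are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: split a CORE_NAMING-style file name of the form
# core.<pid>.<ddhhmmss> into its parts. Returns non-zero for other names.
parse_core_name() {
    name=$1
    case "$name" in
        core.*.????????) ;;
        *) return 1 ;;
    esac
    rest=${name#core.}      # strip the "core." prefix  -> pid.ddhhmmss
    pid=${rest%%.*}         # text before the first dot -> pid
    stamp=${rest#*.}        # text after the first dot  -> ddhhmmss
    day=$(echo "$stamp" | cut -c1-2)
    hour=$(echo "$stamp" | cut -c3-4)
    min=$(echo "$stamp" | cut -c5-6)
    sec=$(echo "$stamp" | cut -c7-8)
    echo "pid=$pid day=$day time=$hour:$min:$sec"
}

# Made-up sample name: process 24510, day 15 of the month, 10:30:45.
parse_core_name core.24510.15103045
```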
By default, an application core dump contains all shared memory. To exclude shared memory contents from the dump, set the environment variable CORE_NOSHM=true before starting the process.
The fullcore attribute controls whether a complete core is written during a core dump. To avoid losing information, we recommend enabling fullcore. Run lsattr -El sys0 to check whether fullcore is enabled, and chdev -l sys0 -a fullcore=true to enable it. If you want the system to restart automatically after a dump (useful for remote administrators, who otherwise must go on site and press the switch to restart the machine), run lsattr -El sys0 to check whether autorestart is true, and chdev -l sys0 -a autorestart=true to enable it. Both attributes can also be changed through the smit chgsys menu.

DUMP File Management

Because dump files are complex and are generally handed over to IBM engineers for analysis, this article does not cover dump analysis. The following sections discuss the management of dump files.

View the configuration information of the current DUMP Device

# sysdumpdev -l
primary              /dev/lg_dumplv     # primary dump device
secondary            /dev/sysdumpnull   # secondary dump device
copy directory       /var/adm/ras       # directory the dump file is copied to
forced copy flag     TRUE               # whether to offer copying the dump to external media
always allow dump    FALSE              # whether a dump can always be forced
dump compression     ON                 # whether dump compression is enabled
type of dump         traditional

Note:

1. On older versions of AIX, "always allow dump" may be disabled by default. We recommend enabling it so that a dump is available to locate the problem after a system crash. When this option is TRUE, the system generates a dump when you press the server reset button or enter the preset dump key sequence.

Enable:

# sysdumpdev -KP

Disable:

# sysdumpdev -kP

Alternatively, use the smitty menu: System Environments -> Change/Show Characteristics of System Dump.

2. When the system restarts with the forced copy flag set to TRUE, you may be prompted to copy the dump to external media, such as tape. This gives you a chance to keep the system dump even when the copy directory does not have enough space. (The dump device often shares a logical volume with the system paging space, and the paging space is overwritten after the system starts.)

3. To enable or disable dump compression, run the following commands:

Enable:

# sysdumpdev -CP

Disable:

# sysdumpdev -cP

sysdumpdev command examples
Create a dump device:

# mklv -y dumplv -t sysdump rootvg 10

Temporarily assign the logical volume hd7 as the primary dump device:
# sysdumpdev -p /dev/hd7

Estimate the size of the dump device required:
# sysdumpdev -e
or
# smit dump_estimate

Temporarily assign the tape device rmt0 as the secondary dump device:
# sysdumpdev -s /dev/rmt0
Display statistics for the previous dump:
# sysdumpdev -L
Permanently change the database object for the primary dump device to /dev/newdisk1:
# sysdumpdev -P -p /dev/newdisk1
Check whether a new system dump exists:
# sysdumpdev -z
If a system dump has occurred recently, output similar to the following appears:
4537344 /dev/hd7
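A script that checks for new dumps only needs to split that line into its two fields, the size in bytes and the device name. The sketch below assumes the fields are separated by white space and that an empty result means no new dump. The function name report_new_dump is hypothetical, and echo stands in for the real command so the example runs anywhere.

```shell
#!/bin/sh
# Hypothetical helper: read one "size device" line from standard input and
# report it. Empty input is treated as "no new system dump".
report_new_dump() {
    size=
    device=
    read -r size device || true
    if [ -z "$size" ]; then
        echo "no new system dump"
    else
        echo "new dump of $size bytes on $device"
    fi
}

# Sample line from the example output; on AIX the input would come from
# "sysdumpdev -z" instead of echo.
echo "4537344 /dev/hd7" | report_new_dump
```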
Assign the remote dump file /var/adm/ras/systemdump (on the host mercury) as the primary dump device:
# sysdumpdev -p mercury:/var/adm/ras/systemdump
A colon (:) must separate the host name and the file name.
To specify the directory that the dump is copied to after a system crash (if the dump device is /dev/hd6), enter:
# sysdumpdev -d /tmp/dump
This attempts to copy the dump from /dev/hd6 to /tmp/dump after the system crash. If an error occurs during the copy, the system continues to boot, but the dump is lost.
To specify the directory that the dump is copied to after a system crash and keep the option of a manual copy (if the dump device is /dev/hd6), enter:
# sysdumpdev -D /tmp/dump
This attempts to copy the dump from /dev/hd6 to the /tmp/dump directory after the crash. If the copy fails, a menu is displayed that allows the dump to be copied manually to external media.

sysdumpdev flags

-c Specifies that dumps are not compressed. The -c flag applies only to AIX 4.3.2 and later versions.
-C Specifies that all future dumps are compressed before they are written to the dump device. The -C flag applies only to AIX 4.3.2 and later versions.
-d Directory Specifies the directory that the dump is copied to at system boot. If the copy fails at boot time, the -d flag ignores the system dump.
-D Directory Specifies the directory that the dump is copied to at system boot. If the copy fails at boot time, the -D flag lets you copy the dump to external media.
Note: When the -d Directory or -D Directory flag is used, the following error conditions are detected:
  • The directory does not exist.
  • The directory is not in a local journaled file system.
  • The directory is not in the rootvg volume group.
-e Estimates the dump size (in bytes) for the currently running system. If the dump is compressed, the size shown is the estimated compressed size.
-i Indicates that the sysdumpdev command was called from a system function. This flag is used only by system utilities. If the affected value has already been modified by something other than an automatic IBM function, the -i flag does not make the requested change. That is, the -i flag does not override a previous change.
-I Resets the indications of previous changes. After the -I flag is specified, changes can again be made with the -i flag.
-k If your machine has a key switch, the key must be in the service position before the reset button or the dump key sequence can be used to force a dump. This is the default setting.
-K If your machine has a key switch, the reset button or the dump key sequence forces a dump with the key in the normal position. The same applies to machines without a key switch.
Note: On a machine without a key switch, a dump cannot be forced with the reset button or the key sequence unless this value is set.
-l Lists the current values of the primary and secondary dump devices, the copy directory, and the forced copy flag.
-L Displays statistics for the most recent system dump. This includes the date and time of the dump, the number of bytes written, and the completion status. If the dump was compressed, this flag shows both the original uncompressed size and the compressed size. The compressed size is the size actually written to the dump device. Note: The dump size shown may not reflect the exact size of the dump on the media. There can be a small difference because of disk and copy block sizes.
-P Makes the dump device specified by the -p or -s flag permanent. The -P flag can be used only with the -p or -s flag.
-p Device Temporarily changes the primary dump device to the specified device. The device can be a logical volume or a tape device. For a network dump, the device can be a host name and path name.
-q Suppresses all messages to standard output. If this flag is used with the -l, -r, -z, or -L flag, the -q flag is ignored.
-r Host:Path Frees the space used by the remote dump file on the server Host. Path specifies the location of the dump file.
-s Device Temporarily changes the secondary dump device to the specified device. The device can be a logical volume or a tape device. For a network dump, the device can be a host name and path name.
-z Checks whether a new system dump has occurred. If so, a string containing the dump size (in bytes) and the dump device name is written to standard output. If no new system dump exists, nothing is returned. After the sysdumpdev -z command has been run against an existing system dump, that dump is no longer considered new.

Resolving errpt error E87EF1BE

E87EF1BE 0926082807 P O dumpcheck The largest dump device is too small.
Analysis: the lg_dumplv logical volume that stores the dump is too small. The generally recommended dump device size is 1.5 times the value reported by sysdumpdev -e.
The steps for resizing are as follows:
1. View the estimated dump size.
# sysdumpdev -e
0453-041 Estimated dump size in bytes: 1287651328
That is, about 1.2 GB.
2. Check the current size of lg_dumplv.
# lslv lg_dumplv
PP SIZE: 256 megabyte(s)
PPs: 4
The current capacity is 4 x 256 MB = 1 GB, so at least 0.2 GB more is needed.

3. Check whether the volume group that contains lg_dumplv has enough free space.
# lsvg rootvg
PP SIZE: 256 megabyte(s)
TOTAL PPs: 1092 (279552 megabytes)
FREE PPs: 826 (211456 megabytes)
The volume group has 206.5 GB free. Because the root disk is mirrored, the usable free capacity is about half of that (roughly 103 GB). Because the PP size is 256 MB, the logical volume is extended by 2 PPs, that is, 0.5 GB (strictly, 1 PP would be enough to cover the estimate).
4. Extend the logical volume.
# extendlv lg_dumplv 2
If the dump device is the paging space, use chps instead, where n is the number of logical partitions to add:
# chps -s n hd6
5. Check the new size of lg_dumplv.
# lslv lg_dumplv
PP SIZE: 256 megabyte(s)
PPs: 6
That is, the current capacity is 1.5 GB.
6. Run dumpcheck and check whether the error is still reported in errpt.
# /usr/lib/ras/dumpcheck
# errpt
If the error no longer appears, the resize succeeded.
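The arithmetic behind steps 1 through 4 can be sketched as a small function. The name pps_to_add and its argument order are made up for this sketch. It takes the estimate in bytes, the PP size in megabytes, the number of PPs currently allocated, and an optional safety margin in percent (150 corresponds to the 1.5x recommendation).

```shell
#!/bin/sh
# Hypothetical helper: number of physical partitions (PPs) to add so that a
# dump logical volume can hold percent% of the estimated dump size.
#   $1 = estimated dump size in bytes (from sysdumpdev -e)
#   $2 = PP size in megabytes        (from lslv)
#   $3 = PPs currently allocated     (from lslv)
#   $4 = safety margin in percent    (optional, default 100 = no margin)
pps_to_add() {
    est_bytes=$1
    pp_mb=$2
    cur_pps=$3
    percent=${4:-100}
    pp_bytes=$((pp_mb * 1024 * 1024))
    target=$((est_bytes * percent / 100))
    # Round up to a whole number of PPs.
    need_pps=$(( (target + pp_bytes - 1) / pp_bytes ))
    extra=$((need_pps - cur_pps))
    if [ "$extra" -lt 0 ]; then
        extra=0
    fi
    echo "$extra"
}

echo "bare estimate: add $(pps_to_add 1287651328 256 4) PP(s)"
echo "1.5x estimate: add $(pps_to_add 1287651328 256 4 150) PP(s)"
```

With the figures from this example, the bare estimate needs 1 more PP, extending by 2 PPs as in step 4 leaves some headroom, and the full 1.5x recommendation would call for 4 more PPs.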
