Since the database server was upgraded from Red Hat 4.6 to Red Hat 5.5, TSM backups have occasionally failed with SQL2043N.
To view the error:
- $ db2 ? sql2043n
- SQL2043N Unable to start a child process or thread.
- Explanation:
- Unable to start up the child processes or threads required during the
- processing of a database utility. There may not be enough available
- memory to create the new process or thread. The utility stops
- processing.
- User Response:
- Ensure the system limit for number of processes or threads has not been
- reached (either increase the limit or reduce the number of processes or
- threads already running). Ensure that there is sufficient memory for the
- new process or thread. Resubmit the utility command.
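The user response above names two things to rule out: the per-user process limit and free memory. A minimal Python sketch of that check on Linux follows. The function names are made up for illustration, and reading `MemFree` from /proc/meminfo is an assumption about how one might look at memory, not something the DB2 message prescribes. Run it as the instance owner so the limit reported is the one DB2 actually runs under.

```python
import resource


def parse_meminfo_kb(text, key):
    """Return the value in kB for `key` in /proc/meminfo-style text, or None."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == key:
            fields = rest.split()
            if fields and fields[0].isdigit():
                return int(fields[0])
    return None


def nproc_limits():
    """Return the (soft, hard) per-user process limit, as `ulimit -u` would show."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


def show_limit(value):
    """Render a limit value, mapping RLIM_INFINITY to the word 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    soft, hard = nproc_limits()
    print("max user processes: soft=%s hard=%s" % (show_limit(soft), show_limit(hard)))
    try:
        with open("/proc/meminfo") as f:
            print("MemFree (kB):", parse_meminfo_kb(f.read(), "MemFree"))
    except OSError as exc:
        print("could not read /proc/meminfo:", exc)
```

If both numbers look healthy, as they did on this server, the cause is probably somewhere else.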
From the description, it looks like the database failed to allocate memory. But memory should be plentiful: the server had 16 GB under Red Hat 4.6, and this was raised to 64 GB with the upgrade to Red Hat 5.5, so running out of memory seemed unlikely. The backup runs every day, so an occasional failure was acceptable and I did not pay much attention to it. Later, during a shift handover, I found that other systems in the group also hit SQL2043N from time to time, so this was apparently not a one-off. That evening at home I searched Baidu for sql2043n and got an unexpected result, the following explanation on the official website:
- Problem (Abstract)
- ASLR, or Address Space Layout Randomization, is a feature that is activated by default on some of the newer Linux distributions. It is designed to load shared memory objects at random addresses.
- In DB2, multiple processes map a shared memory object at the same address across the processes. It was found that DB2 cannot guarantee the availability of the address for the shared memory object when ASLR is turned on.
- Important note: DB2 10.1 has been enhanced so that ASLR can be safely enabled.
- Symptom
- This conflict in the address space means that a process trying to attach a shared memory object to a specific address may not be able to do so, resulting in a failure in the shmat subroutine. However, on a subsequent retry (using a new process) the shared memory attachment may work. The result is a random set of failures. Some processes that have been known to see this error are: db2pd, db2egcf, and db2vend.
- Some of the behaviors seen include the following:
- For the db2pd command, it would report no data found even though the instance/database is active:
- Database SAMPLE not activated on database partition 0.
- For the db2egcf process, used for HA monitoring, db2egcf may incorrectly determine that the instance is down and initiate a failover.
- For the db2vend process, backup and log archive methods might fail with an error indicating a child process could not be started:
- SQL2043N Unable to start a child process or thread.
- Diagnosing the problem
- When this problem is suspected, check db2diag.log for a shmat failure like the following. Note that the same error message can also occur for a different cause. Hence, it's important to note the process that reported this error.
- FUNCTION: DB2 UDB, SQO Memory Management, sqlocshr, probe:180
- MESSAGE : ZRC=0x850F0005=-2062614523=SQLO_NOSEG
- "No Storage Available for allocation"
- DIA8305C Memory allocation failure occurred.
- CALLED : OS, -, shmat OSERR: EINVAL (22)
- Resolving the problem
- 1) Disable ASLR temporarily (the change is only effective until the next boot):
- Run "sysctl -w kernel.randomize_va_space=0" as root.
- 2) Disable ASLR immediately and on all subsequent reboots:
- Add the following line to /etc/sysctl.conf:
- kernel.randomize_va_space=0
- And then run "sysctl -p" as root to make the change take effect immediately.
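The diagnosis step in the technote depends on noting which process reported the shmat failure. A rough Python sketch of that check is below. It assumes that db2diag.log records are separated by blank lines and that each record carries a `PROC :` field. The function name is hypothetical, and the sample record in the usage note is invented to show the shape of the input, not copied from a real log.

```python
import re
from collections import Counter

# Assumption: the reporting process appears in a "PROC : name" field.
PROC_RE = re.compile(r"\bPROC\s*:\s*(\S+)")


def shmat_failures_by_process(diag_text):
    """Count records that mention a shmat EINVAL failure, grouped by process."""
    counts = Counter()
    # Assumption: records are separated by one or more blank lines.
    for record in re.split(r"\n\s*\n", diag_text):
        lowered = record.lower()
        if "shmat" in lowered and "einval" in lowered:
            match = PROC_RE.search(record)
            counts[match.group(1) if match else "unknown"] += 1
    return counts
```

Calling `shmat_failures_by_process(open(path).read())` on a diagnostic log returns a counter keyed by process name. If the failing records come from db2vend, db2pd or db2egcf, that matches the ASLR symptom described above. Other process names point to a different cause.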
In short, the Linux address randomization feature can prevent a DB2 process from attaching to a shared memory object at the address it needs. So why does Linux turn this feature on?
Searching Baidu for the keyword randomize_va_space gives:
The Linux kernel introduced address space layout randomization for security reasons. If the address of the stack is fixed and predictable, malicious code can easily reach the contents of the stack through a buffer overflow. Address space layout randomization places the parts of a process's virtual address space (mainly their starting addresses) at random positions, which reduces the likelihood of a successful attack.
A value of 0 in /proc/sys/kernel/randomize_va_space means that all randomization is turned off. A value of 1 enables randomization of the mmap base, the stack, and the vDSO page. A value of 2 additionally enables heap address randomization on top of 1. Without heap address randomization, the heap starts immediately after the application's BSS segment.
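To see which of these modes a server is in, the value can be read straight from /proc. A small Python sketch follows; the function names and the wording of the descriptions are my own, and only the path and the meaning of 0, 1 and 2 come from the text above.

```python
ASLR_MODES = {
    0: "all randomization off",
    1: "mmap base, stack and vDSO page randomized",
    2: "as 1, plus heap randomized",
}


def describe_aslr(raw_value):
    """Map the text read from randomize_va_space to a description."""
    value = int(raw_value.strip())
    return ASLR_MODES.get(value, "unknown value %d" % value)


def read_aslr(path="/proc/sys/kernel/randomize_va_space"):
    """Read the current setting from /proc (Linux only) and describe it."""
    with open(path) as f:
        return describe_aslr(f.read())
```

On a Linux host, `print(read_aslr())` reports the current mode without needing root.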
After understanding this, I suddenly remembered that db2pd also occasionally reports SQL2043N in everyday use and then works normally when run again. This fits the explanation: db2pd obtains its monitoring data by attaching to DB2 shared memory, which is also why it is a lightweight tool with little impact on database performance.
After setting kernel.randomize_va_space=0 on the servers, the error no longer appeared.
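When the same setting has to be rolled out to several servers, the edit to /etc/sysctl.conf can be scripted. The sketch below is one possible way to do it, not an official procedure. The function name `set_sysctl_key` is hypothetical, and it only transforms text, so nothing is written to disk. After saving the result to /etc/sysctl.conf, `sysctl -p` still has to be run as root, as the technote describes.

```python
def set_sysctl_key(conf_text, key, value):
    """Return conf_text with `key = value` set.

    An existing entry for the key is replaced in place (later duplicates
    are dropped); if the key is absent, the entry is appended at the end.
    Comment lines starting with '#' or ';' are left untouched.
    """
    new_line = "%s = %s" % (key, value)
    out = []
    replaced = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith("#") or stripped.startswith(";")
        if not is_comment and "=" in stripped:
            if stripped.split("=", 1)[0].strip() == key:
                if not replaced:
                    out.append(new_line)
                    replaced = True
                continue  # skip the old entry and any duplicates
        out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"
```

For example, `set_sysctl_key(open("/etc/sysctl.conf").read(), "kernel.randomize_va_space", "0")` returns the new file content, which can be reviewed before it is written back.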
SQL2043N and the Linux randomize_va_space feature