AIX routine maintenance
1. Check whether any file system is full
Method: run df -k to check file system usage in 1 KB (1024-byte) units. Any file system more than 90% full needs attention (expand it or free space).
2. Use errpt | more to check the system error log
To clear the existing log: errclear 0
3. Check whether logins to the system are legitimate
Use the last command to check where users logged in from.
4. Check whether the system has generated huge core files
Use find / -name core -print to locate them. Core files can be deleted directly.
5. System performance check:
A) CPU usage: check with vmstat and topas
B) Memory usage: also check with topas and vmstat
C) I/O balance: check with iostat
D) Paging (swap) space usage: check with lsps -a
6. Mail check
7. Run diag once a month
Two more points:
1. Check the status of each indicator light and the availability of each physical device.
2. Process check: check whether any dead processes exist.
Use who -d to find dead processes.
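The df -k check in item 1 can be scripted. The sketch below assumes the AIX df -k column layout (Filesystem, 1024-blocks, Free, %Used, Iused, %Iused, Mounted on); the function name check_fs_usage is hypothetical, and other UNIX variants print df columns differently.

```shell
# check_fs_usage (hypothetical helper): read "df -k" output on stdin and print
# the mount point and %Used of every file system above the threshold.
# Assumes AIX column order: $4 = %Used, $7 = Mounted on.
check_fs_usage() {
    threshold=${1:-90}
    awk -v limit="$threshold" '
        NR == 1 { next }              # skip the header line
        $4 ~ /^[0-9]+%$/ {            # ignore rows such as /proc that show "-"
            used = $4
            sub(/%$/, "", used)       # "95%" -> "95"
            if (used + 0 > limit + 0) print $7, used "%"
        }'
}

# Typical use on AIX:
#   df -k | check_fs_usage 90
```

Only values strictly above the threshold are reported, matching the "more than 90%" rule.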
AIX commands and Common Operations
I. Startup and Login
Before starting the system, check that the power cables are properly plugged in. Then press the white power switch on the front panel; the host enters the hardware self-test and boot phase. During this phase the LCD on the front panel shows a changing sequence of codes, each indicating a different stage of the self-test or boot. When booting finishes, the codes on the LCD disappear, and the console display or terminal shows the system initialization messages and the login prompt. If the host stops on one code and does not finish booting (for more than half an hour), the system may have failed. Contact ipacs and report the code to us.
II. Shutdown
1) Stop the applications.
2) Stop HACMP with smit clstop.
3) Run shutdown -F on the command line. When "Halt completed" appears on the display, press the white power switch on the front panel to turn off the power.
If you need to reboot instead of powering off, run "shutdown -Fr".
III. Basic Definitions
1) Physical volume (PV)
A physical volume is a hard disk; AIX represents it as hdiskX.
In a 7133 disk array, each disk is represented as pdiskX, and the AIX operating system represents the physical volume corresponding to each pdiskX as hdiskX.
Use the lspv command to check which VG each PV belongs to.
Run lsdev -Cc disk to check disk status: Available indicates that the disk is usable, and Defined indicates that the disk is not physically available and exists only as a logical definition.
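To spot disks that exist only as logical definitions, the lsdev output can be filtered on its state column. A minimal sketch, assuming the usual lsdev -Cc disk layout (name, state, location code, description); defined_disks is a hypothetical helper name.

```shell
# defined_disks (hypothetical helper): read "lsdev -Cc disk" output on stdin
# and print the names of disks whose state (column 2) is Defined.
defined_disks() {
    awk '$2 == "Defined" { print $1 }'
}

# Typical use on AIX:
#   lsdev -Cc disk | defined_disks
```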
2) Volume group (VG)
A volume group is a set of one or more physical volumes.
The physical partition (PP) is the smallest unit of space that a VG allocates.
The volume group that holds the AIX operating system is rootvg.
Use lsvg to view VG information.
Use lsvg -o to view active (varied-on) volume groups.
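Comparing the two lsvg listings shows which volume groups exist but are not varied on. The sketch below takes the two listings as files so it can be tried anywhere; inactive_vgs and the /tmp file names in the usage comment are hypothetical.

```shell
# inactive_vgs (hypothetical helper): print VG names that appear in the first
# file (output of lsvg) but not in the second (output of lsvg -o).
inactive_vgs() {
    all_vgs=$1
    active_vgs=$2
    # Read the active list first (the FILENAME test still works if it is empty),
    # then print every VG from the full list that was not seen in it.
    awk 'FILENAME == ARGV[1] { active[$1] = 1; next }
         !($1 in active)' "$active_vgs" "$all_vgs"
}

# Typical use on AIX:
#   lsvg > /tmp/all_vgs; lsvg -o > /tmp/active_vgs
#   inactive_vgs /tmp/all_vgs /tmp/active_vgs
```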
3) Logical volume (LV) and file system (FS)
Logical volumes and file systems are space allocated within a volume group. They cannot span multiple volume groups, and they can only be expanded, not reduced.
A file system is created on an LV; to be used, it must be mounted on a directory in AIX.
Use lsvg -l vg_name to view all LVs and file systems in a VG.
Use the df command to view file system usage.
Run the mount command to view mounted file systems.
IV. Daily System Management
AIX uses the SMIT tool (smitty is its character interface) for system management.
1) Add, modify, and delete users
smit user
2) Add, modify, and delete volume groups
smit vg
3) Add, modify, and delete logical volumes
smit lv
4) Add, modify, and delete file systems
smit fs
5) Network settings and queries
smit tcpip
Minimum Configuration & Startup
Enter the IP address, subnet mask, gateway, and other parameters;
set START Now to yes.
netstat -i / netstat -in
6) Routine maintenance
df and errpt are normally used to check file system usage and whether any new errors have been logged.
If file system usage exceeds 90%, expand the file system.
Running errpt on the command line displays an error log summary such as the following:
ERROR_IDENTIFIER TIMESTAMP  T CL RESOURCE_NAME ERROR_DESCRIPTION
192AC071         0101000070 I O  errdemon      Error logging turned off
0E017ED1         0405131090 P H  mem2          Memory failure
9DBCFDEE         0101000070 I O  errdemon      Error logging turned on
038F2580         0405131090 U H  scdisk0       Undetermined error
AA8AB241         0405130990 I O  OPERATOR      Operator notification
The TIMESTAMP column gives the date and time in mmddhhmmyy format; for example, 0405131090 means 13:10 on April 5, 1990, and 0101000070 means 00:00 on January 1, 1970.
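The mmddhhmmyy layout is easy to misread, so a small converter can help. A minimal sketch; decode_errpt_ts is a hypothetical name, and treating two-digit years 70-99 as 19xx and 00-69 as 20xx is an assumption of this sketch, not something errpt states.

```shell
# decode_errpt_ts (hypothetical helper): convert an errpt summary TIMESTAMP
# (mmddhhmmyy) into "YYYY-MM-DD hh:mm".
# Assumption: yy 70-99 means 19yy, yy 00-69 means 20yy.
decode_errpt_ts() {
    case $1 in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "decode_errpt_ts: expected 10 digits (mmddhhmmyy)" >&2; return 1 ;;
    esac
    printf '%s\n' "$1" | awk '{
        yy = substr($0, 9, 2)
        century = (yy + 0 >= 70) ? "19" : "20"
        printf "%s%s-%s-%s %s:%s\n", century, yy,
               substr($0, 1, 2), substr($0, 3, 2), substr($0, 5, 2), substr($0, 7, 2)
    }'
}

# decode_errpt_ts 0405131090   # prints 1990-04-05 13:10
```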
In the T (type) column, P indicates a permanent error, T a temporary error, U an undetermined error, and I an informational message rather than an error.
In the CL (class) column, H indicates a hardware error, S a software error, and O an operator notification.
If the T (type) column is P and the CL (class) column is H, the error is serious and you need to contact IBM.
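The rule above (type P and class H means contact IBM) can be applied mechanically to the errpt summary. A sketch, assuming the column order shown in the sample (identifier, timestamp, T, CL, resource name, description); serious_errors is a hypothetical helper name.

```shell
# serious_errors (hypothetical helper): read errpt summary output on stdin and
# print identifier, timestamp and resource of every permanent hardware error.
serious_errors() {
    awk 'NR > 1 && $3 == "P" && $4 == "H" { print $1, $2, $5 }'
}

# Typical use on AIX:
#   errpt | serious_errors
```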
V. System Backup (rootvg only)
Backup is the user's responsibility. The following steps back up rootvg; other data must be backed up separately. A system backup is usually taken every 1 to 2 months, and immediately whenever rootvg data changes (for example, when system parameters are modified). It is best to rotate several tapes. Each backup tape must be labeled with the backup time and contents.
1) Log on as root.
2) Mount the rootvg file systems to be backed up.
3) smit mksysb
Select the backup media.
If needed, set "EXPAND /tmp if needed" to yes.
VI. Diagnostics
diag
-> System verification
Select the hardware device to be diagnosed
VII. Common Commands
set -o vi works like the doskey command in PC DOS: press Esc, then press k one or more times to recall previous commands.
export TERM=vt100 (or ibm3153, lft): sets the terminal type
lsdev -C and lscfg -v: display the hardware configuration
1. df -k: file system space usage
2. lsvg: lists all VG names on the system
lsvg -o: lists active (varied-on) VG names
lsvg vgname: lists detailed information about the specified VG
lsvg -l vgname: lists the LVs in the specified VG
3. lsdev -P -H: lists the devices supported by AIX (that is, the device objects in the Predefined ODM database)
lsdev -C: lists the device objects in the Customized ODM database (the devices defined on this system)
lsdev -Cc xxx: lists the device objects of class xxx in the Customized ODM database
For example:
lsdev -Cc disk
lsdev -Cc tape
lscfg: lists the configuration of the resources installed on the system
lscfg -vl: lists the VPD (vital product data) of a device
For example:
lscfg -vl ent1
lscfg -vl hdisk1
lscfg -l 'xxx*': lists summary information (without VPD) for the devices whose names match xxx*
lscfg -l 'proc*'
lscfg -l 'hdisk*'
4. Modify permissions and ownership of files and directories
4.1 chmod
For example:
chmod 765 xxx: changes the permissions of file xxx to 765
chmod -R 765 xxx: changes the permissions of directory xxx and all its subdirectories and files to 765
4.2 chown
For example, chown user:usergroup xxx changes the owner of file xxx to user and its group to usergroup.
chown -R user:usergroup xxx: changes the owner and group of directory xxx and all its subdirectories and files to user:usergroup
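Each digit of a mode such as 765 covers owner, group and others, and is the sum of 4 (read), 2 (write) and 1 (execute). The hypothetical helper below decodes a three-digit mode into the rwx form that ls -l shows; it is a sketch that ignores the setuid, setgid and sticky bits.

```shell
# mode_to_rwx (hypothetical helper): print the rwx string for a three-digit
# octal mode, e.g. 765 -> rwxrw-r-x (owner, group, others).
mode_to_rwx() {
    mode=$1
    case $mode in
        [0-7][0-7][0-7]) ;;
        *) echo "usage: mode_to_rwx NNN (octal digits 0-7)" >&2; return 1 ;;
    esac
    out=""
    for pos in 1 2 3; do
        d=$(printf '%s' "$mode" | cut -c"$pos")   # one octal digit
        r=-; w=-; x=-
        [ $((d & 4)) -ne 0 ] && r=r               # 4 = read
        [ $((d & 2)) -ne 0 ] && w=w               # 2 = write
        [ $((d & 1)) -ne 0 ] && x=x               # 1 = execute
        out="$out$r$w$x"
    done
    printf '%s\n' "$out"
}

# mode_to_rwx 765   # prints rwxrw-r-x
```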
5. System performance monitoring tools
vmstat: memory, paging space, CPU, etc. (vmstat 2 10 reports every 2 seconds, 10 times in total)
iostat: disk I/O monitoring
netstat: network adapter (NIC) monitoring
topas: comprehensive monitoring tool
6. View processes with ps
ps -ef | grep process_name
For example:
ps -ef | grep SAP
ps -ef | grep oracle
ps -ef | grep TSM
ps -ef | grep cluster
For other commands, refer to the smitty tool.
Basics
showmount -e IP: lists the directories exported by the NFS server at IP
lsdev -Cc adapter | grep ent: view network adapters (NICs)
lsdev -Cc if: view network interfaces
ifconfig -
netstat -in: view NIC IP address information
netstat -rn: view the routing table
lssrc -t telnet: check the status of the telnet subserver
lssrc -t ftp: check the status of the ftp subserver
vi /etc/inetd.conf
To disable remote root login, edit /etc/security/user and set rlogin = false in the root stanza.
Which log files need attention during routine AIX maintenance?
File | Description | Suggestion
core, snapcore | Dump files generated by applications; useful for diagnosing errors | Delete after diagnosis
nohup.out | Output of the nohup command | Can be deleted
.xerrors | X11 error output | Can be truncated
mbox | Users' mailboxes | Can be truncated
smit.log, smit.script | Logs written each time a user runs smit | Keep the last 1000 lines, or delete
/var/adm/wtmp | User login records; binary file, read with the who command | Keep useful content for 60 days as needed, delete the rest
/etc/security/failedlogin | Failed login records; binary file, read with the who command | Keep useful content for 60 days as needed, delete the rest
/var/adm/sulog | Log of su command use | Keep useful content for 60 days as needed, delete the rest
/var/adm/cron/log | cron log | Can be truncated
/var/tmp/snmpd.log | Log of the SNMP monitoring daemon | Can be truncated
/var/tmp/dpid.log, /var/tmp/dpid2.log, /var/tmp/hostmibd.log, /var/tmp/muxatmd | SNMP subsystem logs | Can be truncated
dead.letter | Mail that could not be delivered | Can be deleted
trcfile | Output of the trace utility | Can be deleted
/var/adm/messages | Log written by syslog | Keep the last 1000 lines, or delete
/etc/shutdown.log | Log of the system shutdown process, generated by shutdown -l | Keep the last 1000 lines, or delete
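For the text logs above that should keep only their last 1000 lines (smit.log, smit.script, /var/adm/messages, /etc/shutdown.log), the trimming can be done roughly as follows. A sketch only; trim_log is a hypothetical name, and it is not suitable for the binary files /var/adm/wtmp and /etc/security/failedlogin.

```shell
# trim_log (hypothetical helper): keep only the last N lines (default 1000)
# of a text log. The file is rewritten in place with "cat > file" instead of
# being replaced with mv, so its inode, owner and permissions are preserved.
# Do not use it on binary logs such as /var/adm/wtmp.
trim_log() {
    file=$1
    keep=${2:-1000}
    [ -f "$file" ] || { echo "trim_log: $file not found" >&2; return 1; }
    tmp="${TMPDIR:-/tmp}/trim_log.$$"
    if tail -n "$keep" "$file" > "$tmp"; then
        cat "$tmp" > "$file"
        rc=$?
    else
        rc=1
    fi
    rm -f "$tmp"
    return $rc
}

# Typical use:
#   trim_log "$HOME/smit.log" 1000
```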
How to Automatically Kill UNIX Zombie Processes
Author: Cao Suhua
PICC's computer applications have moved from standalone operation to centralized processing at the company level: county-level branches log on to the municipal company's host over the WAN using remote telnet. Because of network problems, some processes suddenly hang. These hung ("zombie") processes consume a large amount of resources and directly affect the normal operation of the machine. To kill these dead processes automatically and promptly, I wrote a shell program called autokill.
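The autokill script
#!/bin/sh
#
# autokill: kill non-root processes that have used too much CPU time.
# Stage 1 keeps UID, PID, TIME and CMD from ps -ef; stage 2 keeps lines whose
# TIME matches the hh:mm:ss pattern; stage 3 drops lines containing "root"
# and writes one "kill -9 PID" command per remaining process to /tmp/k_kill.
#
ps -ef | awk '{print $1, $2, $7, $8}' |
awk '/[0-9][0-9]:[0-9][0-9]:[1-9][0-9]/{print $1, $2, $3, $4}' |
awk '!/root/{print "kill -9", $2}' > /tmp/k_kill
# Only the owner (root) may edit or run the generated kill list
chmod 700 /tmp/k_kill
/tmp/k_kill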
How autokill works
First, the UNIX command ps -ef lists the status of all processes, and its output is piped to awk.
The first awk extracts four fields: the user ID (UID), the process ID (PID), the accumulated CPU time (TIME), and the command (CMD).
The second awk uses pattern matching to select lines. In awk, [0-9] matches any digit from 0 to 9 and [1-9] matches any digit from 1 to 9, so [0-9][0-9] matches any two digits. The pattern [0-9][0-9]:[0-9][0-9]:[1-9][0-9] is intended to match the TIME field and find processes that have used more than 10 seconds of CPU time (strictly, it matches a TIME whose seconds field is 10 or more). To find processes that have used more than half an hour, change the pattern to [0-9][0-9]:[3-9][0-9]:[0-9][0-9] (strictly, a TIME whose minutes field is 30 or more).
The third awk uses the pattern !/root/ to filter out processes belonging to root, builds a shell command for each remaining process, and redirects the result to the file /tmp/k_kill. Every line in /tmp/k_kill is a shell command such as kill -9 123.
At the end, autokill runs /tmp/k_kill to kill the processes.
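Because the patterns in the second awk compare digits in fixed positions, they only approximate a CPU-time threshold. As an alternative technique (not the author's), the sketch below converts the TIME field to seconds and compares it numerically, and only prints the kill commands so they can be reviewed before running them. make_kill_list is a hypothetical name, and the accepted TIME formats (mm:ss, hh:mm:ss, dd-hh:mm:ss) are an assumption about the ps in use.

```shell
# make_kill_list (hypothetical helper): read "UID PID TIME CMD" lines (the
# output of autokill's first awk) and print "kill -9 PID" for processes whose
# CPU time exceeds the threshold. Dry run: nothing is killed here.
# Assumption: TIME is mm:ss, hh:mm:ss or dd-hh:mm:ss.
make_kill_list() {
    min_secs=$1                 # threshold in seconds, e.g. 1800 = half an hour
    skip_user=${2:-root}        # UID whose processes are never listed
    awk -v min="$min_secs" -v skip="$skip_user" '
        {
            t = $3; days = 0
            if (t ~ /-/) { split(t, d, "-"); days = d[1]; t = d[2] }
            n = split(t, p, ":")
            secs = 0
            for (i = 1; i <= n; i++) secs = secs * 60 + p[i]
            secs += days * 86400
            if (secs > min && $1 != skip) print "kill -9", $2
        }'
}

# Typical use:
#   ps -ef | awk '{print $1, $2, $7, $8}' | make_kill_list 1800 > /tmp/k_kill
```

Unlike !/root/, which drops any line containing the string root anywhere, this sketch compares only the UID field. On some systems ps -ef also shifts its columns when STIME contains a space (processes started on an earlier day), so $7 is not always TIME; where ps supports the POSIX -o option, ps -eo user=,pid=,time=,comm= produces the four fields directly.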
Viewing intermediate results
Because autokill is built as a pipeline, you can view the intermediate results by running it one stage at a time.
Step 1: ps -ef
Step 2: ps -ef | awk '{print $1, $2, $7, $8}'
Step 3: ps -ef | awk '{print $1, $2, $7, $8}' |
awk '/[0-9][0-9]:[0-9][0-9]:[1-9][0-9]/{print $1, $2, $3, $4}'
Step 4: ps -ef | awk '{print $1, $2, $7, $8}' |
awk '/[0-9][0-9]:[0-9][0-9]:[1-9][0-9]/{print $1, $2, $3, $4}' |
awk '!/root/{print "kill -9", $2}' > /tmp/k_kill
Finally, view the /tmp/k_kill file.
In addition, to automatically kill only the processes run by one user (such as jdc3206), change the pattern !/root/ to /jdc3206/. To kill only the processes running a particular command (such as Xinmu), change !/root/ to /Xinmu/.
Finally, use crontab -e to add a cron job:
0,30 * * * * /tmp/autokill
What if the kill command fails?
With the steps above, the system runs /tmp/autokill every 30 minutes. However, on UNIX systems some dead processes cannot be killed with the kill command, and the machine must be rebooted. A reboot both clears out system garbage and reallocates resources. Because we use a centralized operating mode, the machine cannot be restarted at will; it can only be restarted at night, when no users are working on it. So I wrote a shell program that restarts the machine automatically. The AutoReboot script follows.
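#!/bin/sh
#
# AutoReboot: stop Informix, flush the file systems, then reboot
#
PATH=/bin:/etc:/usr/bin:/tcb/bin:/usr/informix/bin
INFORMIXDIR=/usr/informix
INFORMIXSERVER=da3206a
ONCONFIG=onconfig.YCA
export PATH INFORMIXDIR INFORMIXSERVER ONCONFIG
# Take the Informix OnLine server offline without prompting
onmode -ky
sync
sync
reboot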
The first five command lines of AutoReboot set up the Informix environment. onmode -ky takes the Informix OnLine database server offline, sync writes the UNIX file system superblock and buffered data back to disk, and reboot restarts the UNIX system.
Run crontab -e to add the cron job 30 6 * * * /tmp/auto_boot.
This restarts the machine every day at 6:30. On a dual-host system, add the job on both machines and set the same time on each.