OSWatcher Black Box (OSW for short) is a small but very useful tool provided by Oracle. It records operating system performance metrics by calling commands that the OS itself provides, covering CPU, memory, swap, network I/O, and disk I/O.
++ Why deploy OSW?
Deploying OSW is not mandatory, and many other tools provide similar functions, such as mrtg, cacti, sar, nmon, and Enterprise Manager Grid Control.
However, deploying OSW has several advantages:
1. It is easy to deploy and remove.
2. It consumes relatively few resources, whether CPU, memory, or disk space.
3. It needs no routine maintenance. When a problem occurs, its data helps quickly determine whether the problem originates in the OS.
The database runs on top of the operating system. If the OS misbehaves, the database will certainly be affected, and analyzing such a problem only from the database side rarely leads to a good result.
A common situation in our daily work is that a database problem occurred some time in the past, and we need to find its root cause before we can make changes to prevent it from happening again. OSW is very useful for this kind of problem. For example:
1. The problem is not caused by the OS. If OSW data was collected when the problem occurred, we can immediately rule out the OS and focus on the DB/application layer.
2. For Oracle database performance problems, the first step is to rule out OS problems.
For example, if the OS swaps frequently during a certain period, memory-related operations are affected and database performance declines. In AWR this shows up as latch/mutex-related waits.
3. An application responds very slowly during a certain period. AWR shows that the database is mostly idle and the top 5 wait events are normal. CPU, memory, swap, and disk I/O also look normal. Later I found that OSW's network data showed heavy packet loss. Without OSW data from that period, it would have been practically impossible to find the cause.
4. For errors such as ORA-04030, or when processes such as CJQ0, P00X, or J00X cannot start, OSW data immediately shows whether the cause is a shortage of OS memory.
5. If a server process hangs inexplicably, we can use the OSW data to check whether the process was in a suspended state and whether it was using too much CPU or memory.
6. For some listener hang problems, we also need the historical OSW data for further analysis.
7. Login storm: the customer's database system suddenly slows down. Nothing abnormal is found in the application or in the database ASH and AWR reports. However, the OSW ps output shows that at the time of the problem there were thousands more Oracle server processes than usual.
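The login storm check in item 7 can be sketched as a small shell function that counts Oracle server processes per snapshot in a ps capture. This is only an illustrative sketch: the function name `count_oracle_procs` is hypothetical, and it assumes that each snapshot starts with a line beginning with `zzz ***` followed by a timestamp and that dedicated server processes appear as `oracle<SID> (...)` in the ps output. Check both assumptions against your own files before relying on it.

```shell
#!/bin/sh
# count_oracle_procs [FILE...]
# Prints "<timestamp>: <count>" for every snapshot in a ps capture.
# Assumption: snapshots are separated by lines starting with "zzz ***".
# Assumption: server processes look like "oracleORCL (LOCAL=NO)".
count_oracle_procs() {
    awk '
        /^zzz / {
            # A new snapshot begins: report the previous one, then reset.
            if (stamp != "") print stamp ": " n
            stamp = $0
            sub(/^zzz \*\*\*/, "", stamp)
            n = 0
            next
        }
        # Count only server processes, not background ones such as ora_pmon.
        /(^|[ \t])oracle[A-Za-z0-9_]+ \(/ { n++ }
        END { if (stamp != "") print stamp ": " n }
    ' "$@"
}
```

A sudden jump in the count between two neighboring snapshots is the pattern to look for.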
In short, OSW is very helpful when analyzing problems. If no monitoring software is deployed on the OS yet, DBAs are strongly advised to deploy OSW. It is already deployed in many important production environments, and when DB performance problems occur there, the OSW output is often the first thing submitted.
++ Common user concerns about deploying OSW
1. The production environment has been running stably for a long time, so no software should be installed on it casually.
2. Will OSW cause side effects?
OSW works by calling tools provided by the OS, such as ps, vmstat, netstat, mpstat, and top, at regular intervals and writing their output to files. It therefore inevitably consumes some CPU, disk I/O, disk space, and memory. However, the amounts are very small and can be ignored on most systems. Only in extreme cases can deploying OSW have a negative impact, for example when the system is already very busy with CPU usage above 90%, or when the disk has no free space left. In most cases these concerns are unnecessary, and deploying OSW carries no risk.
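The mechanism described above, running an OS tool at a fixed interval and appending its output to a file, can be illustrated with a few lines of shell. This is a minimal sketch and not OSW's actual implementation: the function name `sample_tool`, its argument order, and the `zzz ***` timestamp marker written before each sample are assumptions made for the example.

```shell
#!/bin/sh
# sample_tool OUTFILE INTERVAL_SECONDS COUNT COMMAND [ARGS...]
# Runs COMMAND COUNT times, INTERVAL_SECONDS apart, and appends each
# output to OUTFILE, preceded by a timestamp line.
sample_tool() {
    outfile=$1
    interval=$2
    count=$3
    shift 3
    i=0
    while [ "$i" -lt "$count" ]; do
        # Timestamp marker (assumed format) so samples can be told apart.
        printf 'zzz ***%s\n' "$(date)" >> "$outfile"
        "$@" >> "$outfile" 2>&1
        i=$((i + 1))
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For instance, `sample_tool /var/tmp/vmstat.log 60 5 vmstat` (a hypothetical path) would take five vmstat samples one minute apart. Each sample costs one short-lived process and a few lines of disk space, which is why the overhead is normally negligible.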
The following describes how to install and deploy OSW on UNIX/Linux:
1. Download OSW from My Oracle Support document 301137.1.
2. Place it anywhere except /tmp and extract it. Root permission is not required.
$ tar xvf osw.tar
3. Start it.
$ nohup ./startOSWbb.sh 60 48 gzip &
This command starts OSW and collects data every 60 seconds. Data from the last 48 hours is retained (older data is removed automatically), and the retained data is compressed with gzip.
4. Stop it.
$ ./stopOSWbb.sh
The collected information is stored in the archive directory.
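To get a feel for how much data the start parameters in step 3 imply, the arithmetic can be sketched in shell. The function name `snapshots_retained` is hypothetical, and the result is only a rough estimate of the number of samples kept per collected tool, assuming one sample per interval for the whole retention window.

```shell
#!/bin/sh
# snapshots_retained INTERVAL_SECONDS RETENTION_HOURS
# Approximate number of samples kept per tool:
#   retention in seconds divided by the sampling interval.
snapshots_retained() {
    interval_s=$1
    hours=$2
    echo $(( hours * 3600 / interval_s ))
}

snapshots_retained 60 48   # prints 2880
```

Halving the interval doubles the number of samples and therefore roughly doubles the disk space used by the archive directory, which is worth keeping in mind when choosing the values.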
Simple, isn't it? How to parse the data, along with some more advanced OSW usage, will be covered in a follow-up.