Troubleshooting a disk-space fault caused by conflicting backup policies

Source: Internet
Author: User
I recently took over a system that had been in production for more than a year. During the handover, the business department reported that the disk had once filled up. The whole system went down at the time, and the problem was eventually resolved by contacting the developer.

The life cycle of an application system is a whole. Beyond the initial requirements research, development, testing, and launch, the longest phase is operations and maintenance (O&M). The value of an application system is realized during the O&M phase, and a system whose O&M environment frequently reports errors and faults can hardly deliver a good user experience.

In practice, without sound communication between software developers and O&M staff, a new system does not integrate easily into the existing O&M framework, and many additional faults can follow. This article describes a disk-space failure caused by a backup policy conflict.

1. Environment and Fault Description

I recently took over a system that had been in production for more than a year. During the handover, the business department reported that the disk had once filled up. The whole system went down at the time, and the problem was eventually resolved by contacting the developer. However, the feedback was that the root cause had never been fixed, and the developer could only step in periodically to deal with it.

With few sources of information available, I could only observe and analyze the system on site. The database server runs Red Hat Enterprise Linux 6.1, and the database version is 11.2.0.3.

[root@DB ~]# cat /etc/redhat-release

Red Hat Enterprise Linux Server release 6.1 (Santiago)

SQL> select * from v$version;

BANNER

---------------------------------------------

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

PL/SQL Release 11.2.0.3.0 - Production

CORE 11.2.0.3.0 Production

TNS for Linux: Version 11.2.0.3.0 - Production

NLSRTL Version 11.2.0.3.0 - Production

Because the fault involves disk space, the current disk usage as reported by df is shown below.

[root@DB ~]# df -h

Filesystem            Size  Used Avail Use% Mounted on

/dev/sda3              59G  8.4G   48G  15% /

tmpfs                 3.9G  288K  3.9G   1% /dev/shm

/dev/sda2             194M   41M  143M  23% /boot

/dev/sda1             200M  256K  200M   1% /boot/efi

/dev/sda8             1.4T  351G  976G  27% /data

/dev/sda4              59G   23G   34G  40% /home

/dev/sda5              59G  180M   56G   1% /tmp

/dev/sda6              59G  5.9G   50G  11% /var

The space layout is typical, and resources are fairly generous. The largest partition, /data, has a capacity of 1.4 TB, of which 351 GB is used. From the oracle user's environment variables, the database software is installed under /home, and the data files are in /data.
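
Since the original fault was a full disk, a periodic usage check could give earlier warning than a report from the business department. The following is only a minimal sketch, not something taken from the system described here: the function name over_threshold and the threshold value are hypothetical, and it assumes that mount points contain no spaces. It reads POSIX-format `df -P` output on standard input.

```shell
# Print "mount usage%" for every filesystem at or above a usage threshold.
# Reads `df -P` style output on stdin, so it also works on saved output.
# Assumption: mount points contain no spaces.
over_threshold() {
    threshold=$1    # integer percentage, e.g. 80
    awk -v t="$threshold" '
        NR > 1 {                      # skip the header line
            use = $5
            sub(/%/, "", use)         # "27%" -> "27"
            if (use + 0 >= t + 0) print $6, use "%"
        }'
}
```

It could be called as `df -P | over_threshold 80`, for example from cron, with the output mailed to whoever is responsible for the server.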

[oracle@DB]/home/oracle> env | grep ORA

ORACLE_BASE=/home/oracle/app

ORACLE_HOME=/home/oracle/app/product/11.2.0/db_1

ORACLE_OWNER=oracle

ORACLE_SID=db

The schema for the business system holds very little data, only 77 MB. According to the business analysis, the system stores its business data only in the database, and there is no deletion mechanism. With so little data, it is very unlikely that the disk filled up because of a sudden growth in business data.

The analysis therefore focuses on one question: how did /data come to hold 351 GB?
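
One way to approach that question is to rank the entries under the suspect directory by size. This is a minimal sketch under stated assumptions: the function name largest_entries is hypothetical, sizes are reported in KB by `du -sk`, and hidden entries are skipped because the shell glob does not match them.

```shell
# List the largest entries directly under a directory, biggest first.
# Sizes are in KB (du -k); hidden entries are not matched by the glob.
largest_entries() {
    dir=$1
    count=${2:-10}          # how many entries to show, default 10
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}
```

For example, `largest_entries /data 5` would show the five largest entries under /data; repeating the call on the top entry narrows the search one level at a time.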

2. Problem Analysis

Looking in the /data directory shows that the application writes its RMAN backups there.

[root@DB rman]# pwd

/data/db/rman

[root@DB rman]# ls -l

total 1312

drwxr-xr-x. 2 oracle oinstall 409600 Mar 7 bak

-rw-r --. 1 oracle oinstall 0 Aug 21 2013 get

drwxr-xr-x. 2 oracle oinstall 921600 Mar 7 logs

-rwxr-x---. 1 oracle oinstall 1037 Jul 1 2013 rman_full.sh

Evidently, the /data/db/rman directory belongs to the application system's own backup mechanism. Many systems today ship with their own database backup module, and from what can be seen here, this one was designed to use RMAN for its backups.

The rman_full.sh script in the directory is what actually runs the backup.

[root@DB rman]# cat rman_full.sh

#!/bin/ksh

# set env

(omitted for space ......)

$BIN/rman log $BACKUP_LOG/$TARGET_SID.full.$DATE_3.log <<EOF

connect target /

run {

allocate channel c1 type disk;

allocate channel c2 type disk;

backup full database format '$BACKUP_PATH/${DATE_2}_full_%d_%s_%p_%u.bak'

tag='full' include current controlfile;

sql 'alter system archive log current';

backup archivelog all format '$BACKUP_PATH/${DATE_2}_archivelog_%d_%s_%p_%u.bak';

delete noprompt expired backupset of archivelog all;

release channel c1;

release channel c2;

}

crosscheck backup;

delete noprompt expired backup;

delete noprompt obsolete;

exit;

EOF

Viewed objectively, there is nothing wrong with this script. It sets the environment variables and directory locations, then backs up the database and the archived logs. It then runs crosscheck to identify expired backup records, and deletes obsolete backups according to the retention policy.

The bak directory holds the backup sets (although the control file backup is left in $ORACLE_HOME/dbs), and the logs directory holds the text logs. Entering the bak directory shows the state of the backups.

[root@DB bak]# ls | more

20130719_archivelog_db_rj189_1_k5of3j4s.bak

20130719_archivelog_db_1_1__1_k6of3j4t.bak

20130719_full_db_0000180_0000jsof3j1b.bak

20130719_full_db_rj186_1_k2of3j4d.bak

20130720_archivelog_db_2017258_1_maof64d1.bak

20130720_archivelog_db_2017259_1_mbof64d2.bak

20130720_full_pdb_255.255_1_m7of64cn.bak

(omitted for space)

20140307_full_db_1151__127d3p2ho2g.bak

20140307_full_db_1151__1_d4p2ho2g.bak

20140307_full_db_1151__1_d5p2ho47.bak

201401171422.dmp

full_20130720.tar.gz

rm

Note that the name of each backup piece embeds its date. Backup sets are present from July 2013 onward, and the total volume is about 300 GB.

[root@DB bak]# du -h

301G .
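
Because the date is embedded in each piece name, the listing itself can show how many pieces predate a given day. The following is a minimal sketch, assuming the YYYYMMDD_*.bak naming seen above; the function name pieces_older_than and the cutoff value are hypothetical. It only lists names and deletes nothing.

```shell
# Print backup piece names whose YYYYMMDD prefix is earlier than a cutoff.
# Reads one file name per line on stdin; only names of the form
# YYYYMMDD_*.bak are considered, anything else is ignored.
pieces_older_than() {
    cutoff=$1    # YYYYMMDD, e.g. 20140228
    awk -v c="$cutoff" '
        /^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_.*\.bak$/ {
            if (substr($0, 1, 8) + 0 < c + 0) print
        }'
}
```

A call such as `ls /data/db/rman/bak | pieces_older_than 20140228 | wc -l` would count the pieces written before that date. A file that is older than the retention window is not necessarily obsolete in RMAN's terms, so the output indicates how far back the files go and should not be treated as a deletion list.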

This is clearly a problem. The RMAN backup script contains an explicit delete obsolete statement that should remove backup sets that are no longer needed. The rule that defines what counts as obsolete can be confirmed with show all.

RMAN> show all;

RMAN configuration parameters for database with db_unique_name DB are:

CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 7 DAYS;

CONFIGURE BACKUP OPTIMIZATION OFF; # default

CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default

CONFIGURE CONTROLFILE AUTOBACKUP ON;
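
With a 7-day recovery window configured, a natural next check is to ask RMAN which backups it currently regards as obsolete and which backups it knows about at all. The following is a minimal sketch, not part of the original script: the function name rman_retention_report is hypothetical, and it assumes that rman is on the PATH and that it is run as the oracle user with operating-system authentication, as the `connect target /` in rman_full.sh suggests. The three commands only display information.

```shell
# Ask RMAN what it considers obsolete under the current retention policy.
# Assumptions: rman is on PATH, run as the oracle OS user (target /).
# The commands below only report; they do not delete or change backups.
rman_retention_report() {
    rman target / <<'EOF'
show retention policy;
report obsolete;
list backup summary;
EOF
}
```

Comparing the `list backup summary` output with the files actually present in the bak directory would show whether the files on disk are still known to RMAN.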
