Linux Administrator's Manual (8): Backups

Hardware is indeterministically reliable.
Software is deterministically unreliable.
People are indeterministically unreliable.
Nature is deterministically reliable.

This chapter explains why, how, and when to make backups, and how to restore data from them.

Backup importance

Data is valuable. Re-creating it costs time and effort, and that costs money or at least grief and tears; sometimes it cannot be re-created at all, for example when it is the result of an experiment. Since data is an investment, you should protect it and take steps to avoid losing it.

There are four common reasons for losing data: hardware failures, software bugs, human action, and natural disasters. Although modern hardware tends to be quite reliable, it can still break. The most critical piece of hardware for storing data is the hard disk, which relies on tiny magnetic fields staying intact in a world full of electromagnetic noise. Modern software is still unreliable; a rock-solid program is the exception, not the rule. People are unreliable too: they make mistakes, or they destroy data on purpose. Nature may not be evil, but it can still do damage. All in all, it is close to a miracle when everything works perfectly.

Backups are a way to protect the investment in data. If you have several copies of the data, it matters much less when one of them is destroyed (the cost is only that of restoring the lost data from the backup).

It is important to do backups properly. Like everything else in the physical world, backups will fail sooner or later. Part of doing backups well is making sure that they work; you do not want to discover that they don't at the moment you need them. To make matters worse, the system might crash just as you are making the backup; if you have only one backup medium, it may be destroyed as well, leaving you with the smoking ashes of hard work. Or you might notice, when trying to restore, that you forgot to back up something important, such as the user database of a 15000-user site. Best of all, all your backups might be working perfectly, but the last known tape drive reading the kind of tapes you used was the one that now has a bucketful of water in it.

When it comes to backups, paranoia is in the job description.

Select backup media

The most important decision regarding backups is the choice of backup medium. You need to consider cost, reliability, speed, availability, and usability.

Cost is important, because you should have several times more backup storage than the data itself requires. A cheap medium is usually a must.

Reliability is extremely important, because a broken backup only makes matters worse. A backup medium must be able to hold data for years without corruption. The way you use a medium also affects its reliability as a backup medium. A hard disk is generally very reliable, but it is not a reliable backup medium if it sits in the same computer as the disk being backed up.

Speed is usually not very important if backups can run without interaction. It does not matter that a backup takes two hours as long as it needs no supervision. On the other hand, if the backup can't be done when the computer would otherwise be idle, then speed is an issue.

Availability is obviously necessary, because you cannot use a backup medium that does not exist. Less obvious is that the medium must remain available in the future, and on computers other than your own. Otherwise, you may not be able to restore your backups after a disaster.

Usability is a large factor in how often backups get made. The easier it is to make a backup, the better. A backup medium must not be hard or tedious to use.

The typical choices are floppy disks and tapes. Floppies are cheap, fairly reliable, not very fast, and easy to obtain, but they are not practical for large amounts of data. Tapes are cheap, reliable, fast, and readily available, and, depending on the tape capacity, comfortable to use.

There are other alternatives. They are usually not very good on availability, but if that is not a problem, they can be better in other ways. For example, some removable disks combine the advantages of floppies (random access, which makes restoring a single file quick) and tapes (large capacity).

Select Backup Tool

There are many backup tools. The traditional UNIX backup tools are tar, cpio, and dump. In addition, a large number of third-party packages (both freeware and commercial) are available. The choice of backup medium can affect the choice of tool.

tar and cpio are similar, and mostly equivalent from a backup point of view. Both can store files on tape and retrieve them again. Both can use almost any medium, because the kernel device drivers take care of low-level device handling, so all devices look much alike to user-level programs. Some UNIX versions of tar and cpio may have problems with unusual files (symbolic links, device files, files with very long path names, and so on), but the Linux versions should handle all files correctly.

dump is different in that it reads the file system directly rather than through the file system interface. It is also written specifically for backups; tar and cpio are really for archiving files, although they work for backups as well.

Reading the file system directly has some advantages. It makes it possible to back up files without affecting their time stamps; with tar and cpio, you would have to mount the file system read-only first. Reading the file system directly is also more efficient when everything needs to be backed up, because it minimizes disk head movement. The main disadvantage is that the backup program becomes specific to one file system type; the Linux dump program understands only the ext2 file system.

dump also supports backup levels directly (discussed below); with tar and cpio, this must be implemented with other tools.

A comparison of third-party backup tools is beyond the scope of this book. The Linux Software Map lists many of the freeware ones.

Simple backup

A simple backup scheme is to back up everything once, and then back up everything that has changed since the previous backup. The first backup is called a full backup, and the subsequent ones are incremental backups. A full backup is usually more laborious than an incremental one, because there is more data to write to the tape, and a full backup might not fit on one tape (let alone one floppy). Restoring from incremental backups, on the other hand, can be many times more work than restoring from a full one. Restoration can be optimized by having every incremental backup save everything that has changed since the last full backup. Backups then take a bit more work, but you never need to restore more than one full backup and one incremental backup.

If you want to make backups every day and have six tapes, you can use tape 1 for the first full backup (say, on a Friday) and tapes 2 to 5 for the incremental backups (Monday through Thursday). Then make a new full backup on tape 6 (the second Friday), and start the incremental backups on tapes 2 to 5 again. Do not overwrite the old full backup (tape 1) until the new full backup is complete, in case something goes wrong while the full backup is being made. Once the new full backup is on tape 6, keep tape 1 somewhere else, so that if your other backup tapes are lost in a fire, you still have something left. When it is time for the next full backup, fetch tape 1 and leave tape 6 in its place.

If you have more than six tapes, you can use the extra ones for full backups. Use the oldest tape for each new full backup. This way you keep full backups from the last several weeks, which is useful if you want to find a deleted file or an old version of a file.
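
The six-tape rotation can be written down as a small lookup. This is only a sketch: the function name tape_for, the lowercase weekday abbreviations, and the idea of numbering the full backups 1, 2, 3, ... are assumptions made for illustration, not part of any backup tool.

```shell
# tape_for DAY [N]: print the tape number to load.
# DAY is mon|tue|wed|thu|fri (hypothetical abbreviations).
# N is the running number of the full backup and only matters on Friday:
# odd-numbered full backups go to tape 1, even-numbered ones to tape 6,
# so the previous full backup is never overwritten by the current one.
tape_for() {
    case "$1" in
        mon) echo 2 ;;
        tue) echo 3 ;;
        wed) echo 4 ;;
        thu) echo 5 ;;
        fri)
            if [ $(( ${2:-1} % 2 )) -eq 1 ]; then
                echo 1
            else
                echo 6
            fi
            ;;
        *)
            echo "unknown day: $1" >&2
            return 1
            ;;
    esac
}

# Show which tape the first three full backups would use.
for n in 1 2 3; do
    echo "full backup $n -> tape $(tape_for fri "$n")"
done
```

Alternating between tapes 1 and 6 is what keeps one complete full backup intact, and storable off site, while the other is being written.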

Back up with tar

A full backup is easily made with tar:

# tar --create --file /dev/ftape /usr/src
tar: Removing leading / from absolute path names in the archive
#

The example above uses the GNU version of tar and its long option names. The traditional version of tar understands only single-character options. The GNU version can also handle backups that do not fit on one tape or floppy, as well as very long path names; not all traditional versions can do this. (Linux uses only GNU tar.)

If your backup does not fit on one tape, you need to use the --multi-volume (-M) option:

# tar -cMf /dev/fd0h1440 /usr/src
tar: Removing leading / from absolute path names in the archive
Prepare volume #2 for /dev/fd0h1440 and hit return:
#

Note that you should format all the floppies before you begin the backup, or else use another virtual console or virtual terminal to format a floppy when tar asks for a new one.
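
For trying this out without a tape drive, the same kind of full backup can be written to an ordinary file. The following is a minimal sketch, assuming GNU tar; the scratch directory, the hello.c file, and the archive name full.tar are all invented for the demonstration.

```shell
# Sketch only: archive to a regular file instead of a tape device.
# Assumes GNU tar; every path below is a throwaway created for the demo.
work=$(mktemp -d)
mkdir -p "$work/usr/src"
echo 'int main(void) { return 0; }' > "$work/usr/src/hello.c"

# --directory makes tar store names relative to $work, so there is no
# leading / for tar to strip from the member names.
tar --create --file="$work/full.tar" --directory="$work" usr/src

# List what actually landed in the archive.
listing=$(tar --list --file="$work/full.tar")
echo "$listing"

rm -rf "$work"
```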

After you have made a backup, check that it is good, using the --compare (-d) option:

# tar --compare --verbose -f /dev/ftape
usr/src/
usr/src/linux
usr/src/linux-1.2.10-includes/
....
#

If you do not check a backup, you will not notice that it is broken until after you have lost the original data.
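
A check like this is easy to script, because GNU tar reports differences through its exit status. The sketch below assumes GNU tar and uses made-up file names; it compares an archive with the files on disk once right after the backup and once after a file has been changed.

```shell
# Sketch only: --compare succeeds while archive and disk agree and
# fails once a file on disk no longer matches. Assumes GNU tar.
work=$(mktemp -d)
mkdir -p "$work/src"
echo "first version" > "$work/src/notes.txt"
tar --create --file="$work/backup.tar" --directory="$work" src

# Compare immediately after the backup: nothing has changed yet.
if tar --compare --file="$work/backup.tar" --directory="$work" >/dev/null 2>&1; then
    before=match
else
    before=differs
fi

# Change the file (its size changes too), then compare again.
echo "a later and longer version" > "$work/src/notes.txt"
if tar --compare --file="$work/backup.tar" --directory="$work" >/dev/null 2>&1; then
    after=match
else
    after=differs
fi

echo "right after backup: $before, after editing: $after"
rm -rf "$work"
```

On a live system a difference does not always mean a bad backup, since files may legitimately change between the backup and the check.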

An incremental backup can be made with tar using the --newer (-N) option:

# tar --create --newer '8 Sep 1995' --file /dev/ftape /usr/src --verbose
tar: Removing leading / from absolute path names in the archive
usr/src/
usr/src/linux-1.2.10-includes/
usr/src/linux-1.2.10-includes/include/
usr/src/linux-1.2.10-includes/include/linux/
usr/src/linux-1.2.10-includes/include/linux/modules/
usr/src/linux-1.2.10-includes/include/asm-generic/
usr/src/linux-1.2.10-includes/include/asm-i386/
usr/src/linux-1.2.10-includes/include/asm-mips/
usr/src/linux-1.2.10-includes/include/asm-alpha/
usr/src/linux-1.2.10-includes/include/asm-m68k/
usr/src/linux-1.2.10-includes/include/asm-sparc/
usr/src/patch-1.2.11.gz
#
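
Typing the date by hand is error-prone, so one common approach is to let a time stamp file record when the last backup ran. The sketch below assumes GNU tar, which treats a --newer argument starting with / or . as a file name and uses that file's modification time; the stamp file and all other names are invented for the demonstration.

```shell
# Sketch only: incremental backup relative to a stamp file.
# Assumes GNU tar; with --newer it selects files whose modification or
# status-change time is later than the stamp file's modification time.
work=$(mktemp -d)
mkdir -p "$work/src"

echo "old data" > "$work/src/old.txt"
sleep 1                       # keep the time stamps clearly apart
touch "$work/stamp"           # stands in for "time of the last backup"
sleep 1
echo "new data" > "$work/src/new.txt"

# $work is an absolute path, so GNU tar reads the date from the file.
tar --create --newer="$work/stamp" --file="$work/incr.tar" \
    --directory="$work" src

listing=$(tar --list --file="$work/incr.tar")
echo "$listing"

rm -rf "$work"
```

Only new.txt (plus the directory entry) should appear in the listing; old.txt predates the stamp and is skipped.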

Unfortunately, tar cannot notice when a file's inode information has changed, for example when its permission bits or its name have changed. This can be worked around by using find to compare the current file system state with lists of files that were backed up previously. Scripts and programs for doing this can be found on Linux FTP sites.

Restoring files with tar

The --extract (-x) option of tar extracts files:
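
One way to do such a comparison is to save a sorted list of modes and names at backup time and compare it with a fresh list later. This is a rough sketch, not one of the scripts mentioned above: the snapshot function is hypothetical, and it relies on the -printf action of GNU find.

```shell
# snapshot DIR: print "octal-mode path" for every entry below DIR, sorted.
# Hypothetical helper; -printf is a GNU find extension.
snapshot() {
    (cd "$1" && find . -printf '%m %p\n' | LC_ALL=C sort)
}

work=$(mktemp -d)
mkdir -p "$work/src"
echo data > "$work/src/a.txt"
echo data > "$work/src/b.txt"
chmod 644 "$work/src/a.txt" "$work/src/b.txt"

# State recorded at backup time.
snapshot "$work/src" > "$work/old.list"

# Two changes that leave file contents and modification times alone.
chmod 600 "$work/src/a.txt"
mv "$work/src/b.txt" "$work/src/c.txt"

snapshot "$work/src" > "$work/new.list"

# Lines only in the new list: new names or new permission bits.
changed=$(LC_ALL=C comm -13 "$work/old.list" "$work/new.list")
# Lines only in the old list: entries that vanished or changed mode.
gone=$(LC_ALL=C comm -23 "$work/old.list" "$work/new.list")

echo "changed or new:"; echo "$changed"
echo "no longer present as recorded:"; echo "$gone"
rm -rf "$work"
```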

# tar --extract --same-permissions --verbose --file /dev/fd0h1440
usr/src/linux
usr/src/linux-1.2.10-includes/
usr/src/linux-1.2.10-includes/include/
usr/src/linux-1.2.10-includes/include/linux/
usr/src/linux-1.2.10-includes/include/linux/hdreg.h
usr/src/linux-1.2.10-includes/include/linux/kernel.h
...
#

You can also extract only specific files or directories (including all their files and subdirectories) by naming them on the command line:

# tar xpvf /dev/fd0h1440 usr/src/linux-1.2.10-includes/include/linux/hdreg.h
usr/src/linux-1.2.10-includes/include/linux/hdreg.h
#
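
When restoring a single file it is often safer to extract into a scratch directory first and copy the file into place after looking at it. A minimal sketch of that habit, assuming GNU tar and using made-up file names and an ordinary archive file in place of a floppy:

```shell
# Sketch only: restore one file into a scratch directory.
work=$(mktemp -d)
mkdir -p "$work/src/include" "$work/restore"
echo '#define HD_DEMO 1' > "$work/src/include/hdreg.h"
echo '#define KERNEL_DEMO 1' > "$work/src/include/kernel.h"
tar --create --file="$work/backup.tar" --directory="$work" src

# Name the wanted member exactly as --list would print it.
tar --extract --same-permissions --file="$work/backup.tar" \
    --directory="$work/restore" src/include/hdreg.h

# Only the requested file should have been restored.
restored=$(cd "$work/restore" && find . -type f)
echo "$restored"

# Check that the restored copy is identical to the original.
if cmp -s "$work/src/include/hdreg.h" "$work/restore/src/include/hdreg.h"; then
    same=yes
else
    same=no
fi
rm -rf "$work"
```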

Use the --list (-t) option to see which files are on a backup volume:

# tar --list --file /dev/fd0h1440
usr/src/
usr/src/linux
usr/src/linux-1.2.10-includes/
usr/src/linux-1.2.10-includes/include/
usr/src/linux-1.2.10-includes/include/linux/
usr/src/linux-1.2.10-includes/include/linux/hdreg.h
usr/src/linux-1.2.10-includes/include/linux/kernel.h
...
#

Note that tar always reads a backup volume sequentially, so it is rather slow on large volumes. Random-access database techniques cannot be used with a tape drive or any other sequential medium.

tar does not handle deleted files properly. If you need to restore a file system from a full backup and an incremental backup, and you deleted a file between the two backups, the file will exist again after the restore. This can be a big problem if the file contains sensitive data that should no longer be available.

Multi-level backup

The simple backup method outlined in the previous section is often adequate for personal use or small sites. For heavier-duty use, multi-level backups are more appropriate.

The simple method has two backup levels: full and incremental. This can be generalized to any number of levels. A full backup is level 0, and the different levels of incremental backup are levels 1, 2, 3, and so on. An incremental backup at each level backs up everything that has changed since the previous backup at the same or a lower level.

The purpose of doing this is to allow a longer backup history at a lower cost. In the example in the previous section, the backup history went back to the previous full backup. You can extend it by adding tapes, but each new tape extends the history by only one week, which may be too expensive. A longer backup history is useful because deleted or corrupted files often go unnoticed for a long time. Even a version of a file that is not quite up to date is better than no file at all.

Multi-level backups extend the backup history more cheaply. For example, with ten tapes you can use tapes 1 and 2 for monthly backups (the first Friday of each month), tapes 3 to 6 for weekly backups (the other Fridays; since a month can have five Fridays, four tapes are needed), and tapes 7 to 10 for daily backups (Monday to Thursday). With only four more tapes, the backup history has been extended from two weeks to two months. It is true that you cannot restore every version of each file from those two months, but what you can restore is often good enough.

Backup levels can also be used to keep file system restoration time to a minimum. If you have many incremental backups with monotonically growing level numbers, you need to restore all of them to rebuild the whole file system. If the level numbers do not grow monotonically, the number of backups to restore can be kept down.

To minimize the number of tapes needed for a restore, you could use a low level for each incremental tape. However, this increases the time each incremental backup takes (each backup copies everything changed since the last full backup). A better scheme is suggested by the dump manual page and described in table 9.2. Use the following succession of backup levels: 3, 2, 5, 4, 7, 6, 9, 8, 9, and so on. This keeps both the backup time and the restore time low. The most you have to back up is two days' worth of work. The number of tapes needed for a restore depends on how long you leave between full backups, but it is smaller than in the simple schemes.

A fancy scheme can reduce the amount of work needed, but it also means there are more things to keep track of. You must decide if it is worth it.

dump has built-in support for backup levels. For tar and cpio, they must be implemented with shell scripts.
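
To see why this succession keeps restores short, it helps to work out which backups a full restore would actually need. The sketch below is an illustration only: restore_chain is a hypothetical helper, and it assumes the rule documented for dump, namely that a level N backup contains everything changed since the most recent backup of a lower level.

```shell
# restore_chain LEVEL...: given the levels of successive backups, oldest
# first, print the backups needed for a full restore as "position:level".
# Hypothetical helper; assumes dump's rule that a level N backup holds
# everything changed since the most recent backup of a lower level.
restore_chain() {
    i=0
    rev=""
    # Number the backups and build the list newest-first.
    for lvl in "$@"; do
        i=$((i + 1))
        rev="$i:$lvl $rev"
    done
    min=10          # higher than any dump level (0 to 9)
    chain=""
    # Walk back from the newest backup. A backup is needed only if its
    # level is lower than every level already selected after it.
    for item in $rev; do
        lvl=${item#*:}
        if [ "$lvl" -lt "$min" ]; then
            chain="$item $chain"
            min=$lvl
        fi
        # A full backup ends the chain.
        [ "$lvl" -eq 0 ] && break
    done
    echo $chain
}

echo "levels 0 3 2 5 4 7 6 9 8 need: $(restore_chain 0 3 2 5 4 7 6 9 8)"
echo "levels 0 1 2 3 4 need:         $(restore_chain 0 1 2 3 4)"
```

With the succession from the dump manual page, nine backups restore from five tapes (positions 1, 3, 5, 7, and 9), whereas strictly increasing levels need every tape made since the full backup.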

What to back up

You want to back up as much as possible. The main exception is software that is easy to reinstall, but even such software may have configuration files that are important to back up, so that you do not have to configure everything again. Another major exception is the /proc file system: it contains only data that the kernel generates automatically, so backing it up is never a good idea. The /proc/kcore file in particular is unnecessary, because it is just an image of your current physical memory, and it is very large as well.

Gray areas include the news spool, log files, and many other things in /var. You must decide what you consider important.

The obvious things to back up are user files (/home) and system configuration files (/etc, but possibly also other things scattered around the file system).
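
With tar, leaving a directory such as /proc out of a backup can be done with the --exclude option. The following sketch assumes GNU tar and works on a throwaway directory tree that merely imitates a root file system; none of the files in it are real system files.

```shell
# Sketch only: back up a small tree but leave its proc directory out.
# The tree under $work/root is a stand-in for a real root file system.
work=$(mktemp -d)
mkdir -p "$work/root/etc" "$work/root/home/user" "$work/root/proc"
echo 'hostname=example' > "$work/root/etc/config"
echo 'some notes' > "$work/root/home/user/notes.txt"
echo 'generated by the kernel' > "$work/root/proc/kcore"

# The pattern matches member names as tar sees them, which start
# with ./ here because the tree is archived from inside $work/root.
tar --create --file="$work/backup.tar" --exclude='./proc' \
    --directory="$work/root" .

listing=$(tar --list --file="$work/backup.tar")
echo "$listing"

if echo "$listing" | grep -q '^\./proc'; then
    has_proc=yes
else
    has_proc=no
fi
rm -rf "$work"
```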

Compressed backup

Backups take a lot of space, which can cost a lot of money. To reduce the space needed, backups can be compressed. There are several ways of doing this. Some programs have compression built in; for example, the --gzip (-z) option of GNU tar pipes the whole backup through the gzip compression program before writing it to the backup medium.

Unfortunately, compressed backups can cause trouble. Because of the way compression works, if a single bit is wrong, all the rest of the compressed data may be unusable. Some backup programs have built-in error correction, but no method can handle a large number of errors. This means that if the backup is compressed the way GNU tar does it, with the whole output compressed as one unit, a single error can make all the rest of the backup unusable. Backups must be reliable, so this method of compression is not a good idea.

An alternative is to compress each file separately. An error still means that one file is lost, but the other files are unharmed. The lost file would have been corrupted anyway, so this is not much worse than not using compression at all. The afio program (a variant of cpio) can do this.

Compression takes time, which may make the backup program unable to write data fast enough for a tape drive. This can be avoided by buffering the output (either internally, if the backup program is smart enough, or with another program), but even that might not work well enough. This should only be a problem on slow computers.
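
The integrity of a gzip-compressed archive can at least be tested before it is trusted. The sketch below assumes GNU tar and gzip and uses invented file names; cutting the archive in half is only a crude stand-in for real media damage, chosen because it is easy to reproduce.

```shell
# Sketch only: test a compressed archive, then a damaged copy of it.
work=$(mktemp -d)
mkdir -p "$work/src"
seq 1 2000 > "$work/src/numbers.txt"

# Whole archive compressed as one unit, as GNU tar does with --gzip.
tar --create --gzip --file="$work/backup.tar.gz" --directory="$work" src

if gzip --test "$work/backup.tar.gz" 2>/dev/null; then
    intact=ok
else
    intact=bad
fi

# Simulate damage by keeping only the first half of the file.
size=$(wc -c < "$work/backup.tar.gz")
head -c $((size / 2)) "$work/backup.tar.gz" > "$work/damaged.tar.gz"

if gzip --test "$work/damaged.tar.gz" 2>/dev/null; then
    damaged=ok
else
    damaged=bad
fi

echo "intact archive: $intact, damaged archive: $damaged"
rm -rf "$work"
```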
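
The idea of per-file compression can be imitated with plain gzip and tar. To be clear, this sketch is not afio and does not show how afio works internally; it only compresses a staging copy file by file and then archives it uncompressed, so that each member can be unpacked without relying on the others. It assumes GNU tar and gzip, and all names are made up.

```shell
# Sketch only: compress every file separately, then archive the results.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/stage"
echo alpha > "$work/src/a.txt"
echo beta > "$work/src/b.txt"

# Work on a copy so the originals stay uncompressed.
cp -R "$work/src/." "$work/stage/"
gzip --recursive "$work/stage"

# The archive itself is not compressed; its members are.
tar --create --file="$work/backup.tar" --directory="$work/stage" .
listing=$(tar --list --file="$work/backup.tar")
echo "$listing"

# Pull out one member and decompress it on its own.
restored=$(tar --extract --to-stdout --file="$work/backup.tar" ./b.txt.gz \
    | gzip --decompress --stdout)
echo "$restored"

rm -rf "$work"
```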
