Hardware is unreliable, but unpredictably so.
Software is unreliable, and predictably so.
People are unreliable in random ways.
And nature is unreliable as a whole.
This chapter explains why, how, and when to make backups, and how to restore things from them.
The importance of backups
Your data is valuable. It takes time and effort to recreate, and that costs at least money, or sweat and tears; sometimes it cannot be recreated at all, for example if it is the results of some experiments. Since data is an investment, you must protect it and take steps to avoid losing it.
There are basically four reasons why you might lose data: hardware failures, software bugs, human action, or natural disasters. Although modern hardware tends to be quite reliable, it can still break seemingly spontaneously. The most critical piece of hardware for storing data is the hard disk, which relies on tiny magnetic regions holding their data in a world full of electrical noise. Modern software is still unreliable; a truly rock-solid program is an exception, not the rule. People are even less reliable: they make mistakes, and some are even malicious and destroy data on purpose. Nature might not be evil, but it can wreak havoc all the same. All in all, it is almost a miracle that anything works at all.
Backups are a way to protect this investment in data. With several copies of the data, it doesn't matter as much if one is destroyed; you only need to restore the lost data from a backup.
It is important to do backups properly. Like everything else related to the physical world, backups will fail sooner or later; part of doing them well is making sure they actually work, since you don't want to discover only at restore time that a backup is broken. Adding insult to injury, you might suffer a crash just as you are making a backup; if you have only one backup medium, it may be destroyed as well, leaving you with nothing but the smoking ashes of your hard work. Or you might notice, while trying to restore, that you forgot to back up something important, like the user database of a 15,000-user site. Best of all, all your backups might work perfectly, but the last known tape drive able to read the kind of tapes you used is the one that now has a bucketful of water in it.
When it comes to backups, paranoia is in the job description.
Selecting the backup medium
The most important decision regarding backups is the choice of backup medium. You need to consider cost, reliability, speed, availability, and usability.
Cost is important, because you should preferably have several times more backup storage than you have data. A cheap medium is usually a must.
Reliability is the most important factor, since a broken backup only adds to the gloom. A backup medium must be able to hold its data unharmed for years. The way you use the medium also affects its reliability as a backup medium. A hard disk is typically very reliable, but as a backup medium it is not, if it sits in the same computer as the disk it is backing up.
Speed is usually not very important, if backups can be done without interaction. It doesn't matter if a backup takes two hours, as long as it needs no supervision. On the other hand, if the backup can't be done when the computer would otherwise be idle, then speed is an issue.
Availability is obviously necessary, since you can't use a backup medium that doesn't exist. Less obvious is the need for the medium to remain available in the future, and to be usable on other computers. Otherwise you may not be able to restore your backups after a disaster.
Usability is a large factor in how often backups get made. The easier it is to make backups, the better. A backup medium must not be hard or tedious to use.
The typical choices are floppies and tapes. Floppies are very cheap, fairly reliable, not very fast, and easy to get, but not easy to use for large amounts of data. Tapes are also cheap, fairly reliable, fairly fast, and easy to get, and, depending on the size of the tape, easy to use.
There are other options. They are usually not very good as far as availability goes, but if that is not a problem, they can be better in other ways. For example, magneto-optical disks combine the advantages of floppies (random access, making restoration of a single file quick) and tapes (large capacity).
Selecting the backup tool
There are many tools that can be used for backups. The traditional UNIX tools are tar, cpio, and dump. In addition, a large number of third-party packages (both freeware and commercial) can be used. The choice of backup medium may affect the choice of tool.
tar and cpio are similar, and mostly equivalent from a backup point of view. Both can store files on tape and retrieve them from it. Both can use almost any medium, since the kernel device drivers take care of the low-level device handling, which makes all devices look alike to user-level programs. Some UNIX versions of tar and cpio have problems with unusual files (symbolic links, device files, files with very long pathnames, and so on), but the Linux versions handle all files correctly.
dump is different in that it reads the disk directly, not through the filesystem layer. It is also written specifically for backups; tar and cpio are really meant for archiving files, although they work for backups as well.
Reading the disk directly has some advantages: it makes it possible to back up files without affecting their time stamps, whereas for tar and cpio you would have to mount the filesystem read-only first. It is also more efficient when everything needs to be backed up, since it keeps head movement to a minimum. The main drawback is that it ties the backup program to one filesystem type; the Linux dump program understands only the ext2 filesystem.
dump also directly supports backup levels (discussed below); with tar and cpio, these must be implemented with other tools.
A comparison of third-party backup tools is beyond the scope of this book. The Linux Software Map lists many of the freeware ones.
Simple backups
A simple backup scheme is to back up everything once, then back up everything that has changed since the previous backup. The first backup is called a full backup, the later ones incremental backups. A full backup is more laborious than an incremental one, since there is more data to write, and it may not fit on one tape (let alone a floppy). Restoring from a long chain of incremental backups, however, can take many times more work than restoring from a full one. Restoration can be optimized by always backing up everything that has changed since the previous full backup; this makes each backup a bit more work, but you never need to restore more than one full backup and one incremental backup.
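The full-plus-incremental cycle can be sketched with GNU tar, writing archives to ordinary files instead of a tape; the paths and file names below are made up for illustration. A timestamp file records when the full backup was made, and the incremental backup picks up only files newer than that:

```shell
set -e
mkdir -p /tmp/backup-demo/src
echo "one" > /tmp/backup-demo/src/a.txt
cd /tmp/backup-demo

# Full backup: everything, plus a timestamp marker recording when it ran.
tar --create --file full.tar src
touch last-full

sleep 1
echo "two" > src/b.txt          # created after the full backup

# Incremental backup: only files newer than the last full backup.
# (When --newer's argument starts with a /, GNU tar uses that file's
# modification time as the cutoff date.)
tar --create --newer /tmp/backup-demo/last-full --file incr.tar src
```

Listing incr.tar then shows only src/ and src/b.txt, since a.txt has not changed since the full backup.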
If you have six tapes and want to do a backup every day, you could use tape 1 for the first full backup (say, on a Friday), and tapes 2 to 5 for the incremental backups (Monday through Thursday). Then do a new full backup on tape 6 (the second Friday), and start using tapes 2 to 5 again for incrementals. Don't overwrite the old full backup (tape 1) until the new one is finished, in case something goes wrong while you are making it. After making the new full backup on tape 6, it is best to store tape 1 somewhere else, so that if a fire destroys the full backup tape in use, you still have a copy. When the next full backup is due, fetch tape 1 and leave tape 6 in its place.
If you have more than six tapes, you can use the extra ones for full backups. Each time you make a full backup, use the oldest tape. That way you have full backups from several previous weeks, which is useful if you need to find a file that has since been deleted, or an old version of a file.
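The six-tape rotation can be captured in a small helper script. This is only a sketch of the scheme described above; the function name and the odd/even-week rule for alternating the two full-backup tapes are my own choices, not a standard tool:

```shell
# Pick a tape for the six-tape scheme: tapes 2-5 hold the Monday-Thursday
# incrementals, and the Friday full backup alternates between tape 1 and
# tape 6 depending on the week number (e.g. from `date +%V`).
tape_for_day() {
    day=$1
    week=$2
    case $day in
        Mon) echo 2 ;;
        Tue) echo 3 ;;
        Wed) echo 4 ;;
        Thu) echo 5 ;;
        Fri) if [ $((week % 2)) -eq 1 ]; then echo 1; else echo 6; fi ;;
        *)   echo none ;;
    esac
}
```

For example, `tape_for_day Fri 37` prints 1, and the following week's full backup goes to tape 6.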
Backing up with tar
A full backup can easily be made with tar:
# tar --create --file /dev/ftape /usr/src
tar: Removing leading / from absolute path names in the archive
#
The example above uses the GNU version of tar and its long option names. Traditional versions of tar understand only single-character options. The GNU version can also handle backups that don't fit on one tape or floppy, and very long pathnames; not all traditional versions can do this. (Linux uses GNU tar only.)
If your backup doesn't fit on one tape, you need to use the --multi-volume (-M) option:
# tar -cMf /dev/fd0h1440 /usr/src
tar: Removing leading / from absolute path names in the archive
Prepare volume #2 for /dev/fd0h1440 and hit return:
#
Note that you should format the floppies before you start the backup, or else format them on another virtual console or virtual terminal while tar is waiting for a new floppy.
After making a backup, you should check that it is intact, using the --compare (-d) option:
# tar --compare --verbose -f /dev/ftape
usr/src/
usr/src/linux
usr/src/linux-1.2.10-includes/
....
#
If you don't check the backup, you will not notice that it doesn't work until after you have lost the original data.
An incremental backup can be done with tar using the --newer (-N) option:
# tar --create --newer '8 Sep 1995' --file /dev/ftape /usr/src --verbose
tar: Removing leading / from absolute path names in the archive
usr/src/
usr/src/linux-1.2.10-includes/
usr/src/linux-1.2.10-includes/include/
usr/src/linux-1.2.10-includes/include/linux/
usr/src/linux-1.2.10-includes/include/linux/modules/
usr/src/linux-1.2.10-includes/include/asm-generic/
usr/src/linux-1.2.10-includes/include/asm-i386/
usr/src/linux-1.2.10-includes/include/asm-mips/
usr/src/linux-1.2.10-includes/include/asm-alpha/
usr/src/linux-1.2.10-includes/include/asm-m68k/
usr/src/linux-1.2.10-includes/include/asm-sparc/
usr/src/patch-1.2.11.gz
#
Unfortunately, tar can't notice changes in a file's inode information, for example a change in the permission bits or a renamed file. This can be worked around using the find command and comparing the current filesystem state against lists of files that were previously backed up. Scripts and programs for doing this can be found on Linux FTP sites.
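One way to catch such changes is find's -cnewer test, which compares the inode change time (ctime) rather than the modification time; chmod updates the ctime but not the mtime, so affected files show up even though a mtime-based check like tar's --newer would skip them. A sketch, with made-up paths:

```shell
set -e
mkdir -p /tmp/inode-demo
echo hi > /tmp/inode-demo/quiet.txt
echo hi > /tmp/inode-demo/changed.txt

# Marker file: pretend the last backup happened now.
touch /tmp/inode-demo/stamp
sleep 1

chmod 600 /tmp/inode-demo/changed.txt   # updates ctime only, not mtime

# -cnewer lists files whose inode data changed since the marker,
# including the permission change that mtime-based checks miss.
find /tmp/inode-demo -type f -cnewer /tmp/inode-demo/stamp
# prints only /tmp/inode-demo/changed.txt
```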
Restoring files with tar
Files can be extracted from an archive with the --extract (-x) option:
# tar --extract --same-permissions --verbose --file /dev/fd0h1440
usr/src/
usr/src/linux
usr/src/linux-1.2.10-includes/
usr/src/linux-1.2.10-includes/include/
usr/src/linux-1.2.10-includes/include/linux/
usr/src/linux-1.2.10-includes/include/linux/hdreg.h
usr/src/linux-1.2.10-includes/include/linux/kernel.h
...
#
You can also extract only specific files or directories (which includes all their files and subdirectories) by naming them on the command line:
# tar xpvf/dev/fd0h1440 usr/src/linux-1.2.10-includes/include/linux/hdreg.h
usr/src/linux-1.2.10-includes/include/linux/hdreg.h
#
Use the --list (-t) option to see what files are on a backup volume:
# tar --list --file /dev/fd0h1440
usr/src/
usr/src/linux
usr/src/linux-1.2.10-includes/
usr/src/linux-1.2.10-includes/include/
usr/src/linux-1.2.10-includes/include/linux/
usr/src/linux-1.2.10-includes/include/linux/hdreg.h
usr/src/linux-1.2.10-includes/include/linux/kernel.h
...
#
Note that tar always reads a backup volume sequentially, so for large volumes it is rather slow. It is not possible to use random-access database techniques with a tape drive or any other sequential medium.
tar does not handle deleted files properly. If you restore a filesystem from a full backup plus an incremental backup, and a file was deleted between the two backups, it will exist again after the restore. This can be a big problem if the file contained sensitive data that should no longer exist.
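A common workaround, sketched below with hypothetical paths, is to save a file list with every backup; comm(1) can then report which files existed at the full backup but were gone by the incremental one, so they can be deleted again after a restore:

```shell
set -e
mkdir -p /tmp/del-demo/src
echo a > /tmp/del-demo/src/keep.txt
echo b > /tmp/del-demo/src/gone.txt
cd /tmp/del-demo

# File list saved alongside the full backup.
find src -type f | sort > files.full

rm src/gone.txt                 # deleted between the two backups

# File list saved alongside the incremental backup.
find src -type f | sort > files.incr

# Lines unique to the first list are the files to remove after restoring.
comm -23 files.full files.incr > deleted.list
cat deleted.list                # prints src/gone.txt
```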
Multilevel backups
The simple backup method outlined above is often quite adequate for personal use or small sites. For heavier use, multilevel backups are more appropriate.
The simple method has two backup levels: full and incremental. This can be generalized to any number of levels. A full backup is level 0, and incremental backups are levels 1, 2, 3, and so on; at each level, an incremental backup contains everything that has changed since the previous backup at the same or a lower level.
The point of having many levels is to get a longer backup history cheaply. In the example above, the backup history reaches back only to the previous full backup. You could extend it by buying more tapes, but each new tape only buys another week, which may be too expensive. A longer backup history is useful, because deleted or corrupted files often go unnoticed for a long time. Even a version of a file that is not very up to date is better than no file at all.
Multiple levels extend the backup history more cheaply. For example, with ten tapes, tapes 1 and 2 could be used for monthly backups (first Friday of each month), tapes 3 to 6 for weekly backups (the other Fridays; since a month can have five Fridays, four tapes are needed), and tapes 7 to 10 for daily backups (Monday through Thursday). With only four more tapes, the backup history is extended from two weeks to two months. Admittedly, we can't restore every version of every file during those two months, but what can be restored is often good enough.
Backup levels also keep filesystem restoration time to a minimum. If you have many incremental backups with monotonically growing level numbers, you need to restore all of them to rebuild the whole filesystem. If the level numbers don't grow monotonically, the number of backups to restore can be smaller.
To minimize the number of tapes needed to restore, you could use a smaller level for each incremental tape, but then the time to make each backup grows (every backup would copy everything that has changed since the previous full backup). A better scheme is suggested in the dump manual page and described in table 9.2: use the following succession of backup levels: 3, 2, 5, 4, 7, 6, 9, 8, 9, ... This keeps both the backup effort and the restoration effort low. The most you ever have to back up is two days' worth of work. The number of tapes needed for a restore depends on how often you make full backups, but it is smaller than with the simple scheme.
A fancy scheme reduces the amount of work, but it also means there are more things to keep track of. You must decide whether it is worth it.
dump has built-in support for backup levels. For tar and cpio, backup levels must be implemented with shell scripts.
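Such a script might work along these lines: keep a timestamp file per level, and have a level-N backup include everything newer than the most recent stamp of any lower level. This is only a sketch under that assumption; the stamp directory and the level_backup function are invented for illustration, and a real script would need more error handling:

```shell
set -e
STAMPS=/tmp/levels              # hypothetical directory for level stamps
rm -rf "$STAMPS" /tmp/lvl-src
mkdir -p "$STAMPS"

# level_backup LEVEL ARCHIVE DIR
level_backup() {
    level=$1; archive=$2; dir=$3
    # Newest stamp among the levels below this one, if any.
    ref=$(ls -t "$STAMPS"/level.* 2>/dev/null |
          awk -F. -v l="$level" '($NF + 0) < (l + 0)' | head -n 1)
    if [ -n "$ref" ]; then
        tar --create --newer "$ref" --file "$archive" "$dir"
    else
        tar --create --file "$archive" "$dir"   # no lower level: full backup
    fi
    touch "$STAMPS/level.$level"
}

# Level 0 is a full backup; a later level-3 run picks up only newer files.
mkdir -p /tmp/lvl-src
echo x > /tmp/lvl-src/old.txt
level_backup 0 /tmp/full.tar /tmp/lvl-src
sleep 1
echo y > /tmp/lvl-src/new.txt
level_backup 3 /tmp/incr3.tar /tmp/lvl-src
```

The level-3 archive contains only new.txt, since old.txt predates the level-0 stamp.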
What to back up
You want to back up as much as possible. The major exception is software that can easily be reinstalled, though even such packages may have configuration files that you should back up, lest you have to redo all the configuration work. Another major exception is the /proc filesystem; since it only contains data that the kernel generates automatically, it is never a good idea to back it up. Especially the /proc/kcore file is unnecessary, since it is just an image of your current physical memory, and quite large at that.
Gray areas include the news spool, log files, and many other things under /var. You must decide what you consider important.
The obvious things to back up are user files (/home) and system configuration files (/etc), but there may be other things scattered around the filesystem as well.
Compressed backups
Backups take a lot of space, which can cost a lot of money. To reduce the space needed, backups can be compressed. There are several ways of doing this. Some programs have built-in support for compression; for example, the --gzip (-z) option of GNU tar pipes the whole backup through the gzip compression program before writing it to the backup medium.
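For example, with GNU tar the whole archive can be sent through gzip on the way to the backup medium; in this sketch the paths are made up and an ordinary file stands in for the tape:

```shell
set -e
mkdir -p /tmp/gzip-demo/src
echo "some data" > /tmp/gzip-demo/src/file.txt
cd /tmp/gzip-demo

# --gzip pipes the archive through gzip before it is written out.
tar --create --gzip --file backup.tar.gz src

# The same option is needed when reading (GNU tar can also auto-detect it).
tar --list --gzip --file backup.tar.gz
# prints src/ and src/file.txt
```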
Unfortunately, compressed backups can cause trouble. Due to the way compression works, a single bit error can make all the rest of the compressed data unusable. Some backup programs have built-in error correction, but no method can cope with a large number of errors. This means that if the backup is compressed the way GNU tar does it, as one big stream, a single error loses the whole rest of the backup. Backups must be reliable, so this method of compression is not a good idea.
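The fragility is easy to demonstrate: overwrite a few bytes in the middle of a gzip stream and the whole file fails its integrity check. The offsets and paths below are arbitrary:

```shell
set -e
mkdir -p /tmp/corrupt-demo
cd /tmp/corrupt-demo

seq 1 10000 > data
gzip -c data > data.gz

gzip -t data.gz                 # intact archive: the integrity test passes

# Overwrite 16 bytes in the middle of the compressed stream.
dd if=/dev/zero of=data.gz bs=1 seek=200 count=16 conv=notrunc 2>/dev/null
```

After the dd command, `gzip -t data.gz` reports an error, and everything past the damaged spot is unrecoverable.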
An alternative is to compress each file separately. Then an error still loses one file, but it doesn't destroy the others. The damaged file would have been corrupted anyway, so this is not much worse than not using compression at all. The afio program (a variant of cpio) can work this way.
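The idea can be sketched with plain gzip (afio does this transparently; everything below, including the directory names, is made up): each file is compressed on its own, so a corrupted member loses only itself.

```shell
set -e
mkdir -p /tmp/perfile-demo/src /tmp/perfile-demo/backup
echo alpha > /tmp/perfile-demo/src/a.txt
echo beta  > /tmp/perfile-demo/src/b.txt

# Compress every file individually; one bad .gz file no longer drags
# the rest of the backup down with it.
cd /tmp/perfile-demo/src
for f in *; do
    gzip -c "$f" > "/tmp/perfile-demo/backup/$f.gz"
done
```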
Compression takes time, which may make the backup program unable to feed data fast enough to a tape drive. This can be avoided by buffering the output (either built into the backup program, if it is smart enough, or with another program), but even that might not work well enough. It should only be a problem on slow computers, though.