Backups have two major purposes:
To permit restoration of individual files
To permit wholesale restoration of entire file systems
The first purpose is the basis for the typical file restoration request: a user accidentally deletes a file, and asks that it be restored from the latest backup. The exact circumstances may vary somewhat, but this is the most common day-to-day use for backups.
The second situation is a system administrator's worst nightmare: for whatever reason, the system administrator is staring at hardware that used to be a productive part of the data center. Now, it is little more than a lifeless chunk of steel and silicon. The thing that is missing is all the software and data you and your users have assembled over the years. Supposedly everything has been backed up. The question is: has it?
And if it has, will you be able to restore it?
If you look at the kinds of data[1] processed and stored by a typical computer system, you will find that some of the data hardly ever changes, and some of the data is constantly changing.
The pace at which data changes is crucial to the design of a backup procedure. There are two reasons for this:
A backup is nothing more than a snapshot of the data being backed up. It is a reflection of that data at a particular moment in time.
Data that changes infrequently can be backed up infrequently; data that changes more frequently must be backed up more frequently.
System administrators who have a good understanding of their systems, users, and applications should be able to quickly group the data on their systems into different categories. However, here are some examples to get you started:
Operating system: This data only changes during upgrades, the installation of bug-fixes, and any site-specific modifications.
Tip: Should you even bother with operating system backups? This is a question that many system administrators have pondered over the years. On the one hand, if the installation process is relatively easy, and the application of bug-fixes and customizations is well documented and easily reproducible, simply reinstalling the operating system may be a viable option. On the other hand, if there is the least doubt that a fresh installation can completely recreate the original system environment, backing up the operating system is the best choice.
Application software: This data changes whenever applications are installed, upgraded, or removed.
Application data: This data changes as frequently as the associated applications are run. Depending on the specific application and your organization, this could mean that changes take place second-by-second, or once at the end of each fiscal year.
User data: This data changes according to the usage patterns of your user community. In most organizations, this means that changes take place all the time.
Based on these categories (and any additional ones that are specific to your organization), you should have a pretty good idea concerning the nature of the backups that are needed to protect your data.
Note: You should keep in mind that most backup software deals with data on a directory or file system level. In other words, your system's directory structure will play a part in how backups will be performed. This is another reason why it is always a good idea to carefully consider the best directory structure for a new system, grouping files and directories according to their anticipated usage.
Red Hat Linux comes with several different programs for backing up and restoring data. By themselves, these utility programs do not constitute a complete backup solution. However, they can be used as the nucleus of such a solution, and as such, warrant some attention.
The tar utility is well known among UNIX system administrators. It is the archiving method of choice for sharing ad-hoc bits of source code and files between systems. The tar implementation included with Red Hat Linux is GNU tar, one of the more feature-rich tar implementations.
Backing up the contents of a directory can be as simple as issuing a command similar to the following:
tar cf /mnt/backup/home-backup.tar /home/
This command will create an archive called home-backup.tar in /mnt/backup/. The archive will contain the contents of the /home/ directory. The archive file can be compressed by adding a single option:
tar czf /mnt/backup/home-backup.tar.gz /home/
The home-backup.tar.gz file is now gzip compressed.
There are many other options to tar; to learn more about them, read the tar man page.
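For example, here is a minimal sketch of how the archive created above might be checked and later restored; the paths are the illustrative ones used earlier, and the exact options you need may vary:
# List the contents of the compressed archive without extracting anything
tar tzf /mnt/backup/home-backup.tar.gz
# Restore it; the leading / was stripped when the archive was created,
# so extracting from / recreates the original /home/ hierarchy
cd / && tar xzf /mnt/backup/home-backup.tar.gz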
The cpio utility is another traditional UNIX program. It is an excellent general-purpose program for moving data from one place to another and, as such, can serve well as a backup program.
The behavior of cpio is a bit different from tar. Unlike tar, cpio reads the files it is to process via standard input. A common method of generating a list of files for cpio is to use programs such as find whose output is then piped to cpio:
find /home | cpio -o > /mnt/backup/home-backup.cpio
This command creates a cpio archive called home-backup.cpio in the /mnt/backup directory.
There are many other options to cpio; to learn more about them see the cpio man page.
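To restore from such an archive, cpio reads it back from standard input. As a minimal sketch using the example archive above (the options shown are one common combination; check the cpio man page for the exact behavior on your system):
# Extract the archive, creating leading directories (-d) and
# preserving modification times (-m) as the files are restored
cpio -i -d -m < /mnt/backup/home-backup.cpio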
The dump and restore programs are Linux equivalents to the UNIX programs of the same name. As such, many system administrators with UNIX experience may feel that dump and restore are viable candidates for a good backup program under Red Hat Linux. Unfortunately, the design of the Linux kernel has moved ahead of dump's design. Here is Linus Torvalds' comment on the subject:
From: Linus Torvalds
To: Neil Conway
Subject: Re: [PATCH] SMP race in ext2 - metadata corruption.
Date: Fri, 27 Apr 2001 09:59:46 -0700 (PDT)
Cc: Kernel Mailing List <linux-kernel At vger Dot kernel Dot org>

[ linux-kernel added back as a cc ]

On Fri, 27 Apr 2001, Neil Conway wrote:
>
> I'm surprised that dump is deprecated (by you at least ;-)). What to
> use instead for backups on machines that can't umount disks regularly?

Note that dump simply won't work reliably at all even in 2.4.x: the buffer
cache and the page cache (where all the actual data is) are not coherent.
This is only going to get even worse in 2.5.x, when the directories are
moved into the page cache as well.

So anybody who depends on "dump" getting backups right is already playing
Russian roulette with their backups. It's not at all guaranteed to get the
right results - you may end up having stale data in the buffer cache that
ends up being "backed up".

Dump was a stupid program in the first place. Leave it behind.

> I've always thought "tar" was a bit undesirable (updates atimes or
> ctimes for example).

Right now, the cpio/tar/xxx solutions are definitely the best ones, and
will work on multiple filesystems (another limitation of "dump"). Whatever
problems they have, they are still better than the _guaranteed_(*) data
corruptions of "dump".

However, it may be that in the long run it would be advantageous to have a
"filesystem maintenance interface" for doing things like backups and
defragmentation..

		Linus

(*) Dump may work fine for you a thousand times. But it _will_ fail under
the right circumstances. And there is nothing you can do about it.
Given this problem, the use of dump/restore is strongly discouraged.
Now that we have seen the basic utility programs that do the actual work of backing up data, the next step is to determine how to integrate these programs into an overall process that does the following things:
Schedules backups to run at the proper time
Manages the location, rotation, and usage of backup media
Works with operators (and/or robotic media changers) to ensure that the proper media is available
Assists operators in locating the media containing a specific backup of a specific file
As you can see, a real-world backup solution entails much more than just typing a tar command.
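To illustrate just the first item on that list, the nightly run itself could be scheduled with cron. This is only a sketch using the example tar command from earlier; it deliberately ignores media management, operator interaction, and cataloging, and the schedule and log file path are assumptions:
# /etc/crontab entry: run a full backup of /home at 02:30 every night
# as root, appending output to a log file for later review
30 2 * * * root tar czf /mnt/backup/home-backup.tar.gz /home/ >> /var/log/home-backup.log 2>&1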
Most system administrators at this point look at one of two solutions:
Create an in-house developed backup system from scratch
Purchase a commercially-developed solution
Each approach has its good and bad points. Given the complexity of the task, an in-house solution is not likely to handle some aspects (most notably media management) very well. However, for some organizations, this might not be a shortcoming.
A commercially-developed solution is more likely to be highly functional, but may also be overly-complex for the organization's present needs. That said, the complexity might make it possible to stick with one solution even as the organization grows.
As you can see, there is no clear-cut method for deciding on a backup system. The only guidance that can be offered is to ask you to consider these points:
Changing backup software is difficult; once implemented, you will be using the backup software for a long time. After all, you will have long-term archive backups that you will need to be able to read. Changing backup software means you must either keep the original software around, or you must convert your archive backups to be compatible with the new software.
The software must be 100% reliable when it comes to backing up what it is supposed to, when it is supposed to.
When the time comes to restore any data — whether a single file, or an entire file system — the backup software must be 100% reliable.
Although this section has dealt with a build-or-buy decision, there is, in fact, another approach. There are open source alternatives available, and one of them is included with Red Hat Linux.
AMANDA is a client/server based backup application produced by the University of Maryland. By having a client/server architecture, a single backup server (normally a fairly powerful system with a great deal of free space on fast disks, and configured with the desired backup device) can back up many client systems, which need nothing more than the AMANDA client software.
This approach to backups makes a great deal of sense, as it concentrates those resources needed for backups in one system, instead of requiring additional hardware for every system requiring backup services. AMANDA's design also serves to centralize the administration of backups, making the system administrator's life that much easier.
The AMANDA server manages a pool of backup media, and rotates usage through the pool in order to ensure that all backups are retained for the administrator-dictated timeframe. All media is pre-formatted with data that allows AMANDA to detect whether the proper media is available or not. In addition, AMANDA can be interfaced with robotic media changing units, making it possible to completely automate backups.
AMANDA can use either tar or dump to do the actual backups (although under Red Hat Linux using tar is preferable, due to the issues with dump raised in the Section called dump/restore: Not Recommended!). As such, AMANDA backups do not require AMANDA in order to restore files — a decided plus.
In operation, AMANDA is normally scheduled to run once a day during the data center's backup window. The AMANDA server connects to the client systems, and directs the clients to produce estimated sizes of the backups to be done. Once all the estimates are available, the server constructs a schedule, automatically determining the order in which systems will be backed up.
Once the backups actually start, the data is sent over the network from the client to the server, where it is stored on a holding disk. Once a backup is complete, the server starts writing it out from the holding disk to the backup media. At the same time, other clients are sending their backups to the server for storage on the holding disk. This results in a continuous stream of data available for writing to the backup media. As backups are written to the backup media, they are deleted from the server's holding disk.
Once all backups have been completed, the system administrator is emailed a report outlining the status of the backups, making review easy and fast.
Should it be necessary to restore data, AMANDA contains a utility program that allows the operator to identify the file system, date, and file name(s). Once this is done, AMANDA identifies the correct backup media, accesses, and restores the desired data. As stated earlier, AMANDA's design also makes it possible to restore data even without AMANDA's assistance, although identification of the correct media would be a slower, manual process.
This section has only touched upon the most basic AMANDA concepts. If you would like to do more research on AMANDA, your Red Hat Linux system has additional information. To learn more, type the following command for a list of documentation files available for AMANDA:
rpm -qd amanda-server
(Note that this command will only work if you have installed the amanda RPMs on your Red Hat Linux system.)
You can also learn more about AMANDA from the AMANDA website at http://www.amanda.org/.
If you were to ask people who are not familiar with computer backups, most would think that a backup is simply an identical copy of the data on the computer. In other words, if a backup was created Tuesday evening, and nothing changed on the computer all day Wednesday, the backup created Wednesday evening would be identical to the one created on Tuesday.
While it is possible to configure backups in this way, it is likely that you would not. To understand more about this, we first need to understand the different types of backups that can be created. They are:
Full backups
Incremental backups
Differential backups
The type of backup that was discussed at the beginning of this section is known as a full backup. A full backup is simply a backup where every single file is written to the backup media. As noted above, if the data being backed up never changes, every full backup being created will be the same.
That similarity is due to the fact that a full backup does not check to see if a file has changed since the last backup; it blindly writes every file to the backup media whether it has been modified or not.
This is the reason why full backups are not done all the time — every file is written to the backup media. This means that a great deal of backup media is used even if nothing has changed. Backing up 100 gigabytes of data each night when maybe 10 megabytes worth of data has changed is not a sound approach; that is why incremental backups were created.
Unlike full backups, incremental backups first look to see whether a file's modification time is more recent than its last backup time. If it is not, that file has not been modified since the last backup and can be skipped this time. On the other hand, if the modification date is more recent than the last backup date, the file has been modified and should be backed up.
Incremental backups are used in conjunction with an occasional full backup (for example, a weekly full backup, with daily incrementals).
The primary advantage gained by using incremental backups is that the incremental backups run more quickly than full backups. The primary disadvantage to incremental backups is that restoring any given file may mean going through one or more incremental backups until the file is found. When restoring a complete file system, it is necessary to restore the last full backup and every subsequent incremental backup.
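GNU tar can produce this kind of incremental backup by tracking file state in a snapshot file. The following is only a rough sketch; the snapshot and archive paths are chosen for illustration, and you would start with a fresh snapshot file at each new full backup:
# Weekly full backup; the snapshot file records what was backed up
tar czf /mnt/backup/home-full.tar.gz --listed-incremental=/mnt/backup/home.snar /home/
# Daily incrementals; only files changed since the previous run are written
tar czf /mnt/backup/home-incr-$(date +%a).tar.gz --listed-incremental=/mnt/backup/home.snar /home/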
In an attempt to alleviate the need to go through every incremental backup, a slightly different approach was implemented. This is known as the differential backup.
Differential backups are similar to incremental backups in that both back up only modified files. However, differential backups are cumulative — in other words, with a differential backup, if a file is modified and backed up on Tuesday night, it will also be backed up on Wednesday night (even if it has not been modified since).
Of course, all newly-modified files will be backed up as well.
Like the backup strategy used with incremental backups, differential backups normally follow the same approach: a single periodic full backup followed by more frequent differential backups.
The effect of using differential backups in this way is that the differential backups tend to grow a bit over time (assuming different files are modified over the time between full backups). However, the benefit to differential backups comes at restoration time — at most, the latest full backup and the latest differential backup will need to be restored.
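One way to approximate a differential backup with the standard utilities is to back up everything modified since a timestamp written at the last full backup. This is only a sketch, with the timestamp file and paths invented for illustration:
# At full backup time, record when it happened, then run the full backup
touch /mnt/backup/last-full.timestamp
tar czf /mnt/backup/home-full.tar.gz /home/
# Each night thereafter, back up every file modified since the full backup
find /home -newer /mnt/backup/last-full.timestamp | cpio -o > /mnt/backup/home-diff.cpio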
We have been very careful to use the term "backup media" throughout the previous sections. There is a reason for that. Most experienced system administrators usually think about backups in terms of reading and writing tapes, but today there are other options.
At one time, tape devices were the only removable media devices that could reasonably be used for backup purposes. However, this has changed. In the following sections we will look at the most popular backup media, and review their advantages as well as their disadvantages.
Tape was the first widely-used removable data storage medium. It has the benefits of low media cost, and reasonably-good storage capacity. However, tape has some disadvantages — it is subject to wear, and data access on tape is sequential in nature.
These factors mean that it is necessary to keep track of tape usage (retiring tapes once they have reached the end of their useful life), and that searching for a specific file on tape can be a lengthy proposition.
On the other hand, tape is one of the most inexpensive mass storage media available, and it has a long history of reliability. This means that building a good-sized tape library need not consume a large part of your budget, and you can count on it being usable now and in the future.
In years past, disk drives would never have been used as a backup medium. However, storage prices have dropped to the point where, in some cases, using disk drives for backup storage does make sense.
The primary reason for using disk drives as a backup medium would be speed. There is no faster mass storage medium available. Speed can be a critical factor when your data center's backup window is short, and the amount of data to be backed up is large.
But disk storage is not the ideal backup medium, for a number of reasons:
Disk drives are not normally removable. One key factor to an effective backup strategy is to get the backups out of your data center and into off-site storage of some sort. A backup of your production database sitting on a disk drive two feet away from the database itself is not a backup; it is a copy. And copies are not very useful should the data center and its contents (including your copies) be damaged or destroyed by some unfortunate set of circumstances.
Disk drives are expensive (at least compared to other backup media). There may be circumstances where money truly is no object, but in all other circumstances, the expenses associated with using disk drives for backup mean that the number of backup copies will be kept low to keep the overall cost of backups low. Fewer backup copies mean less redundancy should a backup not be readable for some reason.
Disk drives are fragile. Even if you spend the extra money for removable disk drives, their fragility can be a problem. If you drop a disk drive, you have lost your backup. It is possible to purchase specialized cases that can reduce (but not entirely eliminate) this hazard, but that makes an already-expensive proposition even more so.
Disk drives are not archival media. Even assuming you are able to overcome all the other problems associated with performing backups onto disk drives, you should consider the following. Most organizations have various legal requirements for keeping records available for certain lengths of time. The chance of getting usable data from a 20-year-old tape is much greater than the chance of getting usable data from a 20-year-old disk drive. For instance, would you still have the hardware necessary to connect it to your system? Another thing to consider is that a disk drive is much more complex than a tape cartridge. When a 20-year-old motor spins a 20-year-old disk platter, causing 20-year-old read/write heads to fly over the platter surface, what are the chances that all these components will work flawlessly after sitting idle for 20 years?
Note | |
---|---|
Some data centers back up to disk drives and then, when the backups have been completed, the backups are written out to tape for archival purposes. In many respects this is similar to how AMANDA handles backups. |
All this said, there are still some instances where backing up to disk drives might make sense. In the next section we will see how they can be combined with a network to form a viable backup solution.
By itself, a network cannot act as backup media. But combined with mass storage technologies, it can serve quite well. For instance, by combining a high-speed network link to a remote data center containing large amounts of disk storage, suddenly the disadvantages of backing up to disk drives mentioned earlier are no longer disadvantages.
By backing up over the network, the disk drives are already off-site, so there is no need for transporting fragile disk drives anywhere. With enough network bandwidth, the speed advantage you can get from disk drives is maintained.
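As a simple illustration of backing up over the network, an archive can be streamed over ssh to a remote system instead of being written to local media. The host name backuphost and the paths here are assumptions for the example:
# Stream a compressed archive of /home over ssh to a remote backup server
tar czf - /home/ | ssh backuphost "cat > /backups/home-backup.tar.gz"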
However, this approach still does nothing to address the matter of archival storage (though the same "spin off to tape after the backup" approach mentioned earlier can be used). In addition, the costs of a remote data center with a high-speed link to the main data center make this solution extremely expensive. But for the types of organizations that need the kind of features this solution can provide, it is a cost they will gladly pay.
Once the backups are complete, what happens then? The obvious answer is that the backups must be stored. However, what is not so obvious is exactly what should be stored — and where.
To answer these questions, we must first consider under what circumstances the backups will be used. There are three main situations:
Small, ad-hoc restoration requests from users
Massive restorations to recover from a disaster
Archival storage unlikely to ever be used again
Unfortunately, there are irreconcilable differences between the first two situations. When a user accidentally deletes a file, they would like it back immediately. This implies that the backup media is no more than a few steps away from the system to which the data is to be restored.
In the case of a disaster that necessitates a complete restoration of one or more computers in your data center, if the disaster was physical in nature, whatever it was that destroyed your computers would also destroy the backups sitting a few steps away from the computers. This would be a very bad state of affairs.
Archival storage is less controversial; since the chances that it will ever be used for any purpose are rather low, if the backup media was located miles away from the data center there would be no real problem.
The approaches taken to resolve these differences vary according to the needs of the organization involved. One possible approach is to store several days' worth of backups on-site; these backups are then taken to more secure off-site storage when newer daily backups are created.
Another approach would be to maintain two different pools of media:
A data center pool used strictly for ad-hoc restoration requests
An off-site pool used for off-site storage and disaster recovery
Of course, having two pools implies the need to run all backups twice, or to make a copy of the backups. This can be done, but double backups can take too long, and copying requires multiple backup drives to process the copies (and probably a dedicated system to actually perform the copy).
The challenge for a system administrator is to strike a balance that adequately meets everyone's needs, while ensuring that the backups are available for the worst of situations.
While backups are a daily occurrence, restorations are normally a less frequent event. However, restorations are inevitable; they will be necessary, so it is best to be prepared.
The important thing to do is to look at the various restoration scenarios detailed throughout this section, and determine ways to test your ability to actually carry them out. And keep in mind that the hardest one to test is the most critical one.
The phrase "restoring from the bare metal" is system administrator's way of describing the process of restoring a complete system backup onto a computer with absolutely no data of any kind on it — no operating system, no applications, nothing.
Although some computers have the ability to create bootable backup tapes, and to actually boot from them to start the restoration process, the PC architecture used in most systems running Red Hat Linux do not lend themselves to this approach. However, some alternatives are available:
Rescue disks: A rescue disk is usually a bootable CD-ROM that contains enough of a Linux environment to perform the most common system administration tasks. The rescue disk environment contains the necessary utilities to partition and format disk drives, the device drivers necessary to access the backup device, and the software necessary to restore data from the backup media.
Reinstall, followed by restore: Here the base operating system is installed just as if a brand-new computer were being initially set up. Once the operating system is in place and configured properly, the remaining disk drives can be partitioned and formatted, and the backup restored from the backup media.
Red Hat Linux supports both of these approaches. In order to be prepared, you should try a bare metal restore from time to time (and especially whenever there has been any significant change in the system environment).
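To make that practice concrete, the heart of a rescue-disk restoration usually boils down to a few steps: partition, create file systems, mount, restore. The device names, file system choice, mount point, and archive path below are assumptions for illustration only, and details such as reinstalling the boot loader are omitted:
# From the rescue environment: partition the replacement disk (interactive)
fdisk /dev/hda
# Create an ext3 file system on the new root partition and mount it
mke2fs -j /dev/hda1
mount /dev/hda1 /mnt/sysimage
# Restore the most recent full system backup onto the new file system
cd /mnt/sysimage && tar xzf /mnt/backup/full-system-backup.tar.gz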
Every type of backup should be tested on a periodic basis to make sure that data can be read from it. It is a fact that sometimes backups are performed that are, for one reason or another, unreadable. The unfortunate part in all this is that many times it is not realized until data has been lost and must be restored from backup.
The reasons for this can range from changes in tape drive head alignment to misconfigured backup software to simple operator error. No matter what the cause, without periodic testing you cannot be sure that you are actually generating backups from which data can be restored at some later time.
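One simple periodic test, assuming a tar-based backup like the examples earlier in this section, is to compare the archive against the live file system, or better, to restore it into scratch space and inspect the result:
# Compare the archive with the files currently on disk; differences are
# reported, nothing is modified (expect some, since live files change)
cd / && tar dzf /mnt/backup/home-backup.tar.gz
# Or restore into a scratch directory and spot-check the results
mkdir -p /tmp/restore-test && tar xzf /mnt/backup/home-backup.tar.gz -C /tmp/restore-test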
[1] We are using the term data in this section to describe anything that is processed via backup software. This includes operating system software and application software, as well as actual data. No matter what it is, as far as backup software is concerned, it is all simply data.