Berkeley DB Reference Guide:
Transaction Protected Applications

PrevRefNext

Archival procedures

The third component of the infrastructure, archival procedures, concerns the recoverability of the database and the disk consumption of the database log files. There are two separate aspects to these issues. First, you may want to periodically create snapshots (i.e., backups) of your databases to make it possible to recover them from catastrophic failure. Second, you will periodically want to remove log files in order to conserve disk space. The two procedures are distinct from each other, and you cannot remove the current log files simply because you have created a database snapshot.

Log files may be removed at any time, as long as:

If log files are first copied to a backup medium before being removed, they may also be used during restoration of a snapshot to restore your databases to a state more recent than that of the snapshot.

The Berkeley DB library supports on-line, or hot, backups. These are backups where applications continue to read and write the database and log files while the snapshot is being taken. Recovery of a hot backup will be to an unspecified time between the start of the archival and when archival is completed. To create a database snapshot as of a specific time, you must stop reading and writing your databases for the entire time of the archival, force a checkpoint (see db_checkpoint), and then archive the files listed by the db_archive utility's -s and -l options.

It is often helpful to think of database archival in terms of full and incremental filesystem backups. A snapshot is a full backup, while the copying of the current logs before they are removed is an incremental. For example, it might be suitable to take a full snapshot of an installation weekly, but archive and remove log files daily. Using both the snapshot and the archived log files, a catastrophic crash at any time during the week can be recovered to the time of the most recent log archival, a time later than that of the original snapshot.

Archival for recovery:

To create a snapshot of your database that can be used to recover from catastrophic failure, the following steps should be taken:

  1. Run db_archive -s to identify all of the database data files, and copy them to a backup device, such as tape. If the database files are stored in a separate directory from the other Berkeley DB files, it may be simpler to archive the directory itself instead of the individual files. If you are performing a hot backup, the utility you use to copy the files must read database pages atomically (as described by Berkeley DB recoverability).

    Note: if any of the database files have not been accessed during the lifetime of the current log files, db_archive will not list them in its output! For this reason, it may be simpler to use a separate database file directory, and archive the entire directory instead of only the files listed by db_archive.

  2. If you are performing a hot backup, run db_archive -l to identify all of the database log files, and copy them to a backup device such as tape. If the database log files are stored in a separate directory from the other database files, it may be simpler to archive the directory itself instead of the individual files.

The order of these two operations is required, and the database files must be archived before the log files. This means that if the database files and log files are in the same directory you cannot simply archive the directory, you must make sure that the correct order of archival is maintained.

Once these steps are completed, your database can be recovered from catastrophic failure to its state as of the time the archival was done. To update your snapshot so that recovery from catastrophic failure is possible up to a new point in time, repeat step #2, copying all existing log files to a backup device.

Each time both the database and log files are copied to backup media, you may discard all previous database snapshots and saved log files. Saving a new set of log files does not allow you to discard either previous database snapshots or saved log files.

The time to restore from catastrophic failure is a function of the number of log records that have been written since the snapshot was originally created. Perhaps more importantly, the more separate pieces of backup media you use, the more likely that you will have a problem reading from one of them. For these reasons, it is often best to make snapshots on a regular basis.

For archival safety, ensure that you have multiple copies of your database backups, that you verify that your archival media is error-free, and that copies of your backups are stored off-site!

To restore your database after catastrophic failure, take the following steps:

  1. Restore the copies of the database files from the backup media.

  2. Restore the copies of the log files from the backup media, in the order in which they were written. The order is important because it's possible that the same log file appears on multiple backups and you only want the most recent version of each log file.

  3. Run db_recover -c to recover the database.

It is possible to recreate the database in a location different than the original, by specifying appropriate pathnames to the -h option of the db_recover utility.

Archival to conserve log file space:

To remove log files, take the following steps:

  1. If you are concerned with catastrophic failure, first copy the log files to backup media as described above. This is because log files are necessary for recovery from catastrophic failure.

  2. Run db_archive without options to identify all of the log files that are no longer in use (e.g., no longer involved in an active transaction).

  3. Remove those log files from the system.

PrevRefNext

Copyright Sleepycat Software