Backups

Introduction

Backup refers to making copies of data so that the copies can be used to restore the original data after a data loss event. Backups are extremely important in computer management. It is not a question of whether you will lose data, but when. When it does happen, you do not want to be caught without all of your data backed up. The five most important words in computing are “DID YOU BACK IT UP?”.

Backups are useful primarily for restoring a state following a disaster and restoring small numbers of files after they have been accidentally lost. Since a backup system contains at least one copy of all data worth saving, the data storage requirements can be high. The most difficult part of backing up data is organizing the storage space and managing the process. There are many different types of data storage devices that are used for making backups. There are also many different ways in which these devices can be arranged to provide geographic redundancy, data security, and portability.

Before data gets onto the backup devices it must be extracted from the system. There are many different techniques for doing this. These include dealing with open files and live data sources as well as compression, encryption, and de-duplication. It is important to understand the limitations and human factors involved in any backup scheme.

Storage

Models

Any backup strategy starts with the concept of a data repository. The backup data needs to be stored and organized. The organization can be as simple as a list of all backups written on a sheet of paper, or it can be a computerized catalog. The easiest data repository model to have is no model (an unstructured model). That could mean just having a stack of CDs or DVDs sitting around your desk, each holding certain data. This method is acceptable if you're backing up only your own computer, and only certain parts of it (maybe only your pictures).

An incremental backup system is another backup model. You start with a full system backup. Then, every time you take another backup, you include only the changes since the previous backup, whether that was the full backup or an earlier incremental one. To restore your whole system you must load the full backup and then each incremental backup taken since that full backup, in order. The advantages of this system are minimal storage space and a good chance that something can be restored, even if it's not everything. The disadvantage is that restoring takes longer, because the restore has to go through many separate backups to fully recover.
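The restore chain described above can be sketched in a few lines. This is a minimal illustration, not a real backup tool: it assumes each system state is modeled as a dictionary mapping file names to contents, it ignores deleted files, and the function names `incremental` and `restore_incremental` are hypothetical.

```python
def incremental(current: dict, previous: dict) -> dict:
    """Return files that are new or changed since the previous backup of any kind."""
    return {name: data for name, data in current.items()
            if previous.get(name) != data}


def restore_incremental(full: dict, increments: list) -> dict:
    """Load the full backup, then apply every incremental backup in order."""
    state = dict(full)
    for inc in increments:
        state.update(inc)
    return state


# Day 0: the full backup holds everything.
day0 = {"a.txt": "v1", "b.txt": "v1"}
full = dict(day0)

# Day 1: a.txt changes, so only a.txt is stored.
day1 = {"a.txt": "v2", "b.txt": "v1"}
inc1 = incremental(day1, day0)

# Day 2: c.txt appears. The comparison is against day 1, not day 0.
day2 = {"a.txt": "v2", "b.txt": "v1", "c.txt": "v1"}
inc2 = incremental(day2, day1)

# A full restore needs the full backup plus every incremental.
restored = restore_incremental(full, [inc1, inc2])
```

Each incremental here holds a single file, which is why storage stays small, but the restore needs every link in the chain.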

A differential backup system is similar to an incremental one. A differential system works by first taking a full backup. Then, every time you take another backup, you include all the changes since that full backup (unlike the incremental system, which takes only the changes since the previous backup). In this system the same files may be stored in several different backups rather than in just one. To restore your whole system you only have to load the full backup and then the last differential backup. This makes restoring quicker than with incremental backups, but you need more storage space to hold all the differentials, and you run a higher risk of losing every change made since the full backup if the one differential backup you rely on is corrupted.
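A differential scheme can be sketched the same way. As before, this is only an illustration under stated assumptions: system states are modeled as dictionaries of file names to contents, deletions are ignored, and the names `differential` and `restore_differential` are hypothetical.

```python
def differential(current: dict, full: dict) -> dict:
    """Return files that are new or changed since the full backup."""
    return {name: data for name, data in current.items()
            if full.get(name) != data}


def restore_differential(full: dict, last_differential: dict) -> dict:
    """Load the full backup, then apply only the most recent differential."""
    state = dict(full)
    state.update(last_differential)
    return state


full = {"a.txt": "v1", "b.txt": "v1"}

# Day 1: a.txt changes.
day1 = {"a.txt": "v2", "b.txt": "v1"}
diff1 = differential(day1, full)

# Day 2: c.txt appears. a.txt is stored again because it still differs from the full backup.
day2 = {"a.txt": "v2", "b.txt": "v1", "c.txt": "v1"}
diff2 = differential(day2, full)

# A full restore needs only the full backup and the last differential.
restored = restore_differential(full, diff2)
```

Note that `a.txt` appears in both differentials. That repetition is the extra storage cost, and it is also why the restore needs only two backups.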

Media

The kind of media you use for your backups is very important. The most common medium for backups is magnetic tape. Magnetic tapes are mainly used for backing up large servers (for example, servers in offices, schools, and businesses). They are not popular for home use.

The most common storage medium for home use is the hard drive (mainly external drives). External drives are covered more thoroughly in the peripherals lesson. External hard drives can be connected via USB or FireWire, while networked hard drives are connected through Ethernet. Networked hard drives are an easy way for multiple computers on a network to back up to one centralized location. Another popular medium for backing up small amounts of data at home is the optical disc (CD or DVD). Optical discs are easy to keep around and are much less expensive than external hard drives.

Data Extraction

File Data

Deciding what to back up at any given time is harder than it seems. Backing up too much redundant data can fill up your storage quickly, while not backing up enough increases the risk that data will be missing when you need to recover it. Making copies of files is the simplest and most common way of backing up your system. A filesystem dump is similar to copying files, but rather than copying individual files it copies the whole filesystem. The process usually involves unmounting the filesystem and running a program that recreates the disk image on another disk. Other methods compare each file with the latest backup of that file and then decide whether to copy the whole file again or to copy only the parts that changed.
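The idea of comparing each file with its latest backup before copying can be sketched as follows. This is a simplified example, assuming whole-file comparison by SHA-256 digest (it does not copy partial changes). The function names `file_digest` and `backup_changed` and the sample file names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def backup_changed(source: Path, dest: Path) -> list:
    """Copy files that are missing from dest or differ from it; return their names."""
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = dest / rel
        if dst.exists() and file_digest(dst) == file_digest(src):
            continue  # unchanged since the latest backup, so skip it
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves timestamps where possible
        copied.append(rel.as_posix())
    return copied


# Demonstration in a temporary directory, so no real data is touched.
with tempfile.TemporaryDirectory() as tmp:
    source, dest = Path(tmp, "source"), Path(tmp, "backup")
    source.mkdir()
    (source / "notes.txt").write_text("first draft")
    (source / "photo.raw").write_bytes(b"\x00\x01\x02")

    first_run = backup_changed(source, dest)   # both files are new
    (source / "notes.txt").write_text("second draft")
    second_run = backup_changed(source, dest)  # only notes.txt differs
```

Hashing every file on every run is slow for large data sets. Real tools often check file size and modification time first, but that choice is outside the scope of this sketch.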

Live Data

If a computer system is in use while it is being backed up, there is a real possibility that files will be open for reading or writing. If a file is open, the contents on the disk may not correctly represent what the owner of the file intends. A solution to this issue is a snapshot backup. A snapshot is a feature of some storage systems that instantaneously presents a copy of the filesystem as if it were frozen in time. An effective way to back up live data is to temporarily close all files, take a snapshot, and then reopen the files.

Many backup programs have the ability to handle open files. Some simply check whether the file is open and, if it is, try again later. Databases are among the trickiest files to deal with when open. One way to back up a database is to shut it down completely, back it up, and then start it up again. Some programs have specific tools for backing up databases while they are online and in use.
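As one concrete illustration of an online database backup, Python's standard `sqlite3` module provides a `Connection.backup()` method (Python 3.7 and later) that copies a database while the connection stays open. SQLite is not mentioned in this lesson; it is used here only as a convenient example. The table and its contents are made up, and in-memory databases stand in for real files.

```python
import sqlite3

# A live database, standing in for one that is currently in use.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE visits (page TEXT, hits INTEGER)")
live.execute("INSERT INTO visits VALUES ('home', 3)")
live.commit()

# Copy it into a second database without closing the live connection.
# In practice the target would usually be a file on the backup medium.
copy = sqlite3.connect(":memory:")
live.backup(copy)

# The live database keeps accepting writes after the backup.
live.execute("INSERT INTO visits VALUES ('about', 1)")
live.commit()

backed_up_rows = copy.execute("SELECT page, hits FROM visits").fetchall()
live_rows = live.execute(
    "SELECT page, hits FROM visits ORDER BY hits DESC"
).fetchall()
```

The backup holds the data as it was at the time of the copy, while the later insert appears only in the live database.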

Metadata

Not all information stored on the computer is stored in files. Accurately recreating a complete system from scratch requires keeping track of this data too. System specifications are a form of non-file data that is important to back up for use during recovery after the computer crashes. A boot sector is not a normal file that can easily be backed up, yet the computer needs it in order to operate after a crash. The partition layout also needs to be backed up in case it is lost during a full system failure.
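Recording basic system specifications alongside a backup can be as simple as the sketch below. It assumes that a handful of facts available from Python's standard library are enough for illustration. It does not capture the boot sector or the partition layout, which need platform-specific tools. The function name `system_specs` and the choice of fields are hypothetical.

```python
import json
import os
import platform


def system_specs() -> dict:
    """Collect a few basic facts about this machine using the standard library."""
    return {
        "system": platform.system(),    # operating system name
        "release": platform.release(),  # operating system release
        "machine": platform.machine(),  # hardware architecture
        "hostname": platform.node(),
        "cpu_count": os.cpu_count(),
    }


# Serialize to JSON so the record can be stored with the rest of the backup.
specs_json = json.dumps(system_specs(), indent=2, sort_keys=True)
```

Storing the resulting text next to the backed-up files gives whoever performs the recovery a record of what the original machine looked like.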

Exercises

Worksheets