Single vs Multi-tier Backup
With so many backup technologies and products on the market today, it can be difficult to understand and select the right
technology for your environment. Although most backup software performs similar tasks, the methods used by these products can be
quite different. One such difference lies in the notion of single-tier vs multi-tier backup.
Single-tier Backup
In the simplest terms, a single-tier backup involves just one CPU. In other words, a single process on one machine
reads the files and brings them to their target destination. Examples include backing up files to a USB drive or a tape drive. In this case, the process
that reads data from the source is also responsible for writing the files to the target device.
Multi-tier Backup
Unlike a single-tier backup, a multi-tier backup involves two CPUs running two different processes. The first process is responsible for
reading data on the source machine and sending it to the target machine. The second process receives incoming data from the source and is
responsible for writing it to the destination. Typically, the process running on the source machine is called a client
and the one running on the target is called a server. A good example of a multi-tier backup is a private or public cloud backup.
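The client/server split described above can be sketched in Python. This is only an illustration under stated assumptions: two threads connected by a local socket pair stand in for the two machines, and the function names (client_send, server_receive, demo) are hypothetical, not part of any backup product.

```python
import socket
import threading


def client_send(sock: socket.socket, data: bytes) -> None:
    """Client tier: read the source data (given here as bytes) and send it."""
    sock.sendall(data)
    # Signal end of data so the server knows the transfer is complete.
    sock.shutdown(socket.SHUT_WR)


def server_receive(sock: socket.socket, store: dict) -> None:
    """Server tier: receive incoming data and write it to the destination."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # empty read means the client finished sending
            break
        chunks.append(chunk)
    # A dict stands in for the destination disk in this sketch.
    store["backup"] = b"".join(chunks)


def demo() -> bytes:
    """Run both tiers and return what the server stored."""
    client_sock, server_sock = socket.socketpair()
    store = {}
    server = threading.Thread(target=server_receive, args=(server_sock, store))
    server.start()
    client_send(client_sock, b"file contents to back up")
    server.join()
    client_sock.close()
    server_sock.close()
    return store["backup"]
```

In a real deployment the two tiers would run on separate machines and communicate over a network connection rather than a local socket pair.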
Comparing the two mechanisms
Single-tier backups are suitable when the source and target are located physically close to each other. In other words,
a single-tier backup is typically used to back up data locally. In such cases, the entire file is copied to the
destination even if just a fraction of the data has been modified. Since local data transfer is typically very fast,
there is no need to determine how much of a file has been modified between source and target. As far as I/O goes,
this is done with one read operation on the source and one write operation to the target.
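As a rough illustration of the single-tier case, the following Python sketch copies a whole file within one process, using the standard shutil module. The function names and the "usb_drive" directory are hypothetical; a temporary directory stands in for the local target device.

```python
import shutil
import tempfile
from pathlib import Path


def single_tier_backup(source: Path, target_dir: Path) -> Path:
    """Copy the whole file from source into target_dir within one process."""
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    # The same process reads the source and writes the target.
    # No comparison with any earlier copy is made; the full file is copied.
    shutil.copy2(source, target)
    return target


def demo() -> bool:
    """Back up a small temporary file and confirm the copy is identical."""
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "report.txt"
        source.write_bytes(b"example data" * 1000)
        copy = single_tier_backup(source, Path(tmp) / "usb_drive")
        return copy.read_bytes() == source.read_bytes()
```

Calling demo() backs up one small file and checks that the copy matches the source byte for byte.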
On the other hand, a multi-tier mechanism is a better choice when data needs to be sent offsite. Most multi-tier backup
systems assume they are working on a slow network, for example, across the Internet. Therefore, they try to minimize
network traffic as much as possible. In such cases, a backup client on the source machine communicates with a corresponding
server process running on the target side. These two processes (client and server) together determine exactly what
needs to be sent across the network. This architecture is similar to a multi-tier relational database system, where a client
sends a SQL query and the back-end database returns exactly what the client asked for, greatly reducing network traffic.
A multi-tier backup system typically backs up only the changes (the delta) within a file. For example, when backing up a 10GB
file, it will try to determine how much of this file has changed and send only the changes to the other side. The process
of determining the changes within a file is called block matching, and it results in a smaller file called a delta, which
represents the difference between the source and target files. Depending upon the file type, the actual delta size is usually a small
fraction of the original size. Once block matching is complete and a delta is created, the delta is sent to the server across
the network. The server rebuilds the file by merging the delta with the original file to produce a new version of the
same file on the server machine.
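The block matching and merge steps can be sketched in Python as follows. This is a deliberately simplified sketch, not the algorithm of any particular product: it compares fixed-size blocks at fixed offsets using SHA-256 hashes, so it does not handle inserted bytes that shift later blocks. The block size and all function names are assumptions made for illustration.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4  # tiny on purpose so the example is easy to follow


def block_hashes(data: bytes) -> List[str]:
    """Server side: hash each fixed-size block of the stored file."""
    return [
        hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]


def make_delta(new: bytes, old_hashes: List[str]) -> Dict:
    """Client side: keep only the blocks whose hash differs from the server's."""
    changed = {}
    for index, start in enumerate(range(0, len(new), BLOCK_SIZE)):
        block = new[start:start + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if index >= len(old_hashes) or old_hashes[index] != digest:
            changed[index] = block
    return {"length": len(new), "blocks": changed}


def apply_delta(old: bytes, delta: Dict) -> bytes:
    """Server side: merge the delta with the old copy to rebuild the new file."""
    rebuilt = bytearray()
    for index, start in enumerate(range(0, delta["length"], BLOCK_SIZE)):
        if index in delta["blocks"]:
            rebuilt += delta["blocks"][index]         # changed block from the client
        else:
            rebuilt += old[start:start + BLOCK_SIZE]  # unchanged block reused locally
    return bytes(rebuilt)


old_file = b"AAAABBBBCCCCDDDD"
new_file = b"AAAAXXXXCCCCDDDD"
delta = make_delta(new_file, block_hashes(old_file))
rebuilt_file = apply_delta(old_file, delta)
```

In this example only the second block differs, so the delta carries one 4-byte block instead of the whole 16-byte file, and the server still reconstructs the new version exactly.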
Maximizing resource management
Three resources are involved in a typical backup:
- CPU
- Disk I/O
- Network
CPUs nowadays are very fast. Disk I/O, although not as fast as the CPU, is still faster than sending data over a network.
Since no network is usually involved in a local backup, the two faster resources can get the job done and the
slower resource never comes into the picture. This is why single-tier backups work well when backing up locally.
When backing up off-site, you have no choice but to get the network involved. The slower the network, the longer it
will take to back up data. Therefore, it is important to transfer just what is needed, and multi-tier systems do exactly that.
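To put rough numbers on this, the Python sketch below estimates transfer time from file size and network speed. The 20 Mbit/s uplink and the 100MB delta are hypothetical figures chosen for illustration, and the estimate ignores protocol overhead and latency.

```python
MB = 1000 ** 2
GB = 1000 ** 3


def transfer_seconds(size_bytes: int, megabits_per_second: float) -> float:
    """Idealized transfer time: total bits divided by link speed in bits/second."""
    bits = size_bytes * 8
    return bits / (megabits_per_second * 1_000_000)


UPLINK_MBPS = 20  # assumed uplink speed, not a measured value

full_file_seconds = transfer_seconds(10 * GB, UPLINK_MBPS)  # whole 10GB file
delta_seconds = transfer_seconds(100 * MB, UPLINK_MBPS)     # assumed 100MB delta
```

Under these assumptions, sending the whole 10GB file takes about 4000 seconds (over an hour), while sending a 100MB delta takes about 40 seconds.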
File Versioning
One other benefit of a multi-tier architecture is its ability to store deltas. Although different versions of the same
file can certainly be stored in a single-tier backup system, multiple versions will take significantly more disk space
if entire files are stored.
Imagine keeping 10 versions of a 10GB file, which can potentially add up to 100GB of storage. If a backup system is
already creating deltas between source and target, it can save these deltas to maintain different versions of the same file.
In this case, 10 versions of a 10GB file can take far less than 100GB, since one file will be saved in its
entirety and the other versions will only be deltas.
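The storage arithmetic can be worked through in a short Python sketch. The 200MB average delta size is a hypothetical figure; actual delta sizes depend on the file type and how much changes between versions.

```python
def full_copy_storage_mb(file_size_mb: int, versions: int) -> int:
    """Storage needed when every version is kept as a complete file."""
    return file_size_mb * versions


def delta_storage_mb(file_size_mb: int, versions: int, delta_size_mb: int) -> int:
    """Storage needed for one complete file plus a delta per additional version."""
    return file_size_mb + (versions - 1) * delta_size_mb


FILE_SIZE_MB = 10_000  # the 10GB file from the text
VERSIONS = 10
ASSUMED_DELTA_MB = 200  # hypothetical average delta size

full_total = full_copy_storage_mb(FILE_SIZE_MB, VERSIONS)
delta_total = delta_storage_mb(FILE_SIZE_MB, VERSIONS, ASSUMED_DELTA_MB)
```

With these numbers, full copies need 100,000MB (100GB), while one full file plus nine deltas needs 11,800MB (11.8GB).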
FTP and WebDAV - a common misconception
Many backup vendors offer a remote backup solution through FTP or WebDAV. Although backing up to an FTP or a WebDAV
server fits the above definition of a multi-tier backup architecture, the second tier (the target side)
does not play any role in reducing network traffic. In other words, an FTP or a WebDAV server cannot compute a delta,
nor can it merge an incoming delta with the existing file on its end. Therefore, if you back up a 10GB file this way, the
entire file is transferred every time it changes.
WebDAV servers can have another limitation: size. The WebDAV protocol sits on top of HTTP, which uses the Content-Length
header to hold the size of a file. HTTP itself does not cap this value, but some clients and servers store it in a signed
32-bit integer, which cannot hold a value above roughly 2.14GB. As a result, backing
up large files through such WebDAV implementations is not possible.
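The 2.14GB figure mentioned above matches the largest value a signed 32-bit integer can hold, 2^31 - 1 bytes. A minimal Python sketch of that arithmetic follows; the function name is hypothetical and only illustrates the size check, not any WebDAV server's actual code.

```python
INT32_MAX = 2 ** 31 - 1  # largest signed 32-bit value: 2,147,483,647
GB = 1000 ** 3


def fits_in_signed_32_bit(size_bytes: int) -> bool:
    """Return True if a file size can be stored in a signed 32-bit integer."""
    return 0 <= size_bytes <= INT32_MAX


limit_in_gb = INT32_MAX / GB                          # just under 2.15GB
two_gb_file_ok = fits_in_signed_32_bit(2 * GB)        # 2GB fits
ten_gb_file_ok = fits_in_signed_32_bit(10 * GB)       # 10GB does not fit
```

A 2GB file stays under the limit, but the 10GB file used as an example throughout this article does not.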
Conclusion
Single-tier backup systems are great when creating a local backup. However, when creating an off-site backup, a multi-tier
backup system works much better. One such multi-tier backup system is
Syncrify from Synametrics Technologies, Inc.
Created on: Apr 13, 2015
Last updated on: Sep 8, 2024