Comparing rsync With Other Technologies

Understanding File-Level Backups

When discussing file-level backup types, there are two main schools of thought: differential and incremental.

Differential backups start with a full backup. Each backup after that transfers everything that has changed since that full backup, until you decide to do a full backup again. For example, if I make a full backup on Monday and run a differential backup on Thursday, it will contain every change made between Monday and Thursday. If I then run another differential backup on Sunday, it will contain all changes made from Monday to Sunday. This resets when I do a full backup.

Incremental backups improve on this idea. With incremental backups, I also start with a full backup, but every backup after that transfers only what has changed since the last backup, whether that was a full backup or an incremental one. So if I run a full backup on Monday, my Thursday incremental backup covers only Monday to Thursday, and my Sunday incremental backup covers only Thursday to Sunday.

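Simply put, differential backups are more encompassing and take longer to back up, but they are quick to restore, because a restore needs only the full backup plus the latest differential. Incremental backups are more precise and quicker to back up, but they take longer to restore, because every incremental since the last full backup must be applied in order.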
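To make the Monday/Thursday/Sunday example concrete, here is a minimal Python sketch of two things: which earlier backup each backup's changes are measured against, and which backups a restore has to read. The function names and the "full"/"differential"/"incremental" labels are hypothetical. The sketch assumes a schedule uses only one backup type between full backups.

```python
def reference_point(kinds, index):
    """Index of the backup that this backup's changes are measured against."""
    if kinds[index] == "full":
        return None  # a full backup stands alone
    if kinds[index] == "differential":
        # Differential: everything changed since the most recent full backup.
        return max(i for i in range(index) if kinds[i] == "full")
    # Incremental: only what changed since the previous backup of any kind.
    return index - 1


def restore_chain(kinds, target):
    """Indices of the backups that must be read to restore the state at `target`."""
    if kinds[target] == "full":
        return [target]
    last_full = max(i for i in range(target + 1) if kinds[i] == "full")
    if kinds[target] == "differential":
        return [last_full, target]  # full backup + latest differential
    return list(range(last_full, target + 1))  # full backup + every incremental


if __name__ == "__main__":
    # Index 0 = Monday (full), 1 = Thursday, 2 = Sunday.
    diff = ["full", "differential", "differential"]
    incr = ["full", "incremental", "incremental"]
    print(reference_point(diff, 2), restore_chain(diff, 2))
    print(reference_point(incr, 2), restore_chain(incr, 2))
```

The Sunday differential is measured against Monday, and restoring it reads two backups. The Sunday incremental is measured against Thursday, and restoring it reads all three. This is the backup-speed versus restore-speed trade-off described above.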

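rdiff and rsync, the two algorithms compared in this article, both build backups from deltas. Their backup steps compare as follows:

| | Step 1 | Step 2 | Step 3 |
|---|---|---|---|
| rdiff | Calculate delta | Transfer delta | N/A |
| rsync | Calculate delta | Transfer delta | Merge delta |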

Of course, there are many other differences between the two, but the general concept holds. As the table shows, they are virtually identical except for the final step. That final step matters a great deal once you understand its ramifications: if you look at the destination of each of these backups, you will see very different results.

An rdiff backup does not merge the delta during the backup; it does so during the restore. This means that the destination contains only the original full copy plus delta files. With rsync, the delta is merged at the time of backup, so the destination contains complete files that you can open and use.
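The three steps in the table can be sketched in Python as a simplified block-matching delta, loosely in the spirit of the rsync algorithm. The destination checksums fixed-size blocks of the copy it already has. The source slides a cheap rolling checksum over the new file, confirms candidate matches with a strong hash, and emits "copy block N" or literal bytes. The destination then merges those instructions into its old copy. Several parts of this are illustration-only assumptions and do not describe any real tool's wire format or settings: the 4-byte block size, the use of MD5 as the strong hash, and all of the function names.

```python
import hashlib

# Illustration-only choices (assumptions, not any real tool's settings):
BLOCK_SIZE = 4   # real tools use much larger blocks
MOD = 1 << 16    # modulus for the two halves of the weak checksum


def weak_checksum(data):
    """Adler-style weak checksum (a, b) of a block; cheap to update byte by byte."""
    a = sum(data) % MOD
    b = sum((len(data) - i) * byte for i, byte in enumerate(data)) % MOD
    return a, b


def roll(a, b, out_byte, in_byte, block_len):
    """Slide a full-size window one byte forward without rescanning it."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - block_len * out_byte + a) % MOD
    return a, b


def signature(old):
    """Destination side: checksum each block of the copy it already holds."""
    sig = {}
    for offset in range(0, len(old), BLOCK_SIZE):
        block = old[offset:offset + BLOCK_SIZE]
        sig.setdefault(weak_checksum(block), []).append(
            (hashlib.md5(block).digest(), offset // BLOCK_SIZE))
    return sig


def compute_delta(new, sig):
    """Step 1 (source side): describe `new` as block copies plus literal bytes."""
    ops, literal = [], bytearray()
    i, weak = 0, None
    while i < len(new):
        window = new[i:i + BLOCK_SIZE]
        if weak is None:
            weak = weak_checksum(window)
        match = None
        candidates = sig.get(weak, [])
        if candidates:  # weak hit: confirm with the strong hash
            digest = hashlib.md5(window).digest()
            match = next((idx for strong, idx in candidates if strong == digest), None)
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            i += len(window)
            weak = None  # the next window starts fresh after a matched block
        else:
            literal.append(new[i])
            if i + BLOCK_SIZE < len(new):
                weak = roll(*weak, new[i], new[i + BLOCK_SIZE], BLOCK_SIZE)
            else:
                weak = None  # window shrinks near the end; recompute next time
            i += 1
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def apply_delta(old, ops):
    """Step 3 (destination side): merge the delta into the old copy."""
    out = bytearray()
    for kind, value in ops:
        if kind == "copy":
            start = value * BLOCK_SIZE
            out += old[start:start + BLOCK_SIZE]
        else:
            out += value
    return bytes(out)


if __name__ == "__main__":
    old = b"the quick brown fox jumps"
    new = b"the quick red fox jumps!"
    ops = compute_delta(new, signature(old))  # step 2 would send `ops` over the network
    print(ops)
    print(apply_delta(old, ops) == new)
```

In the article's terms, rsync runs `apply_delta` on the destination right away and keeps only the merged file. An rdiff-style backup would instead store `ops` as a delta file and defer `apply_delta` until a restore.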

For the purposes of this article, I will compare three backup scenarios: rdiff backup, rsync backup, and rsync backup over HTTP. I think rsync over HTTP is worth looking at, since it opens up many possibilities for improving backups as a whole. The metrics I will use are manageability, usability, reliability, security, and performance. Throughout, I will assume that generic backup software is built on top of each of these algorithms.

Manageability

| | rdiff | rsync | rsync over HTTP |
|---|---|---|---|
| Configuration | Source configurable | Source and destination configurable | Source and destination configurable |
| User management | Limited to no user management | Limited to no user management | Extensive user management |
| Destination differences | Incomplete destination files | Complete destination files | Complete destination files |
| Versioning | Versioning by default | Versioning with custom scripts | Versioning via software |

With rdiff, you can configure the source to select which files to back up and direct them to a storage destination. When the files reach the destination, they are simply stored. With rsync, you can configure both the source and the destination. With rsync over HTTP, you can configure both through a web portal.

One benefit of rdiff over rsync is that rdiff always maintains versions of a file, whereas rsync maintains only the most recent version. With rsync over HTTP, versioning can be implemented as a feature of the software.
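"Versioning with custom scripts" for plain rsync is often built on rsync's `--link-dest` option. Each run writes a new dated snapshot directory, and any file that is unchanged since the previous snapshot is hard-linked to it instead of being copied again. The result is that every snapshot is a complete, browsable copy. The minimal Python sketch below only builds the command. The function name, paths, and dates are hypothetical, and you would run the command yourself (for example with `subprocess.run(cmd, check=True)`).

```python
import shlex
from pathlib import PurePosixPath


def snapshot_command(source, backup_root, label, previous_label=None):
    """Build an rsync command that writes a new snapshot directory <backup_root>/<label>.

    If previous_label is given, --link-dest tells rsync to hard-link files that are
    unchanged relative to that earlier snapshot rather than storing them again.
    """
    root = PurePosixPath(backup_root)
    cmd = ["rsync", "-a"]  # -a: archive mode (recursive, preserves attributes)
    if previous_label is not None:
        cmd.append(f"--link-dest={root / previous_label}")
    # Trailing slash on the source: copy its contents into the snapshot directory.
    cmd.append(source.rstrip("/") + "/")
    cmd.append(str(root / label))
    return cmd


if __name__ == "__main__":
    # Hypothetical paths and dates, for illustration only.
    cmd = snapshot_command("/home/alice/docs", "/backups", "2024-06-03", "2024-06-02")
    print(shlex.join(cmd))
```

Pruning old snapshot directories is what turns this into a retention policy. Deleting a snapshot only frees the files that no other snapshot still links to.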

Usability

| | rdiff | rsync | rsync over HTTP |
|---|---|---|---|
| Network requirement | Can run over a slow network | Can run over a slow network | Can run over a slow network |
| Flexibility | Not flexible | Flexible | Very flexible |
| Portability | Source files not portable | Portable source files | Portable source and destination files |
| File interactiveness | Can only interact from the source | Can interact with source and destination from their own locations | Can interact with source and destination from any location |
| Synchronization | No synchronization | Synchronization possible | Multi-level synchronization |

When it comes to usability, there are many factors. For an average user, rdiff and rsync are fairly similar when it comes to triggering backups from, say, the command line. Software built on top of them adds features and ease of use, but what users can actually do with each differs. rdiff is less flexible than rsync specifically because it lacks the receiving process on the destination that rsync requires, which is also why rdiff cannot merge deltas at the destination.

The biggest weakness of rdiff in this category is the lack of synchronization. With rdiff, one side has meaningful, usable files, whereas the other has only delta files. With rsync, we can keep machines A and B identical across a network; rsync over HTTP adds the possibility of keeping multiple machines in sync with each other.
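Keeping A and B identical comes down to comparing the two sides and deciding what to copy, update, and delete. rsync performs this comparison itself. By default it uses file size and modification time as a quick check, it compares full checksums only with `--checksum`, and it removes extra destination files only when given `--delete`. The hypothetical Python sketch below spells out the comparison using content hashes, to show what a one-way mirror has to decide.

```python
import hashlib
from pathlib import Path


def tree_hashes(root):
    """Map each file's path (relative to root) to a SHA-256 digest of its contents."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def sync_plan(source, dest):
    """Return (copy, update, delete) path lists that would make dest match source."""
    copy = sorted(p for p in source if p not in dest)                       # new files
    update = sorted(p for p in source if p in dest and source[p] != dest[p])  # changed files
    delete = sorted(p for p in dest if p not in source)                     # extra files
    return copy, update, delete


if __name__ == "__main__":
    src = {"a.txt": "h1", "b.txt": "h2", "c.txt": "h3"}
    dst = {"a.txt": "h1", "b.txt": "old", "d.txt": "h4"}
    print(sync_plan(src, dst))
```

For files in the "update" list, an rsync-style tool would send a delta rather than the whole file.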

Reliability

| | rdiff | rsync | rsync over HTTP |
|---|---|---|---|
| File corruption | Corruption is a major issue | Corruption is a minor issue | Corruption is a minor issue |
| Half-finished backups | Can recognize partial backups | Can recognize partial backups | Can recognize partial backups |

With respect to reliability, there is a major difference between rdiff and rsync. The entire point of a backup is to be able to restore data when needed, and if your backup software maintains versions of files, this matters even more. The restore process differs between the two algorithms, again because of when the merge happens.

When you restore with rdiff, the source machine pulls the necessary files from the destination machine and rebuilds the file starting from the base copy, applying each version in turn until it reaches the most recent copy. The rsync algorithm does not do versioning by default, since it merges the deltas at the destination as they arrive. Software that uses rsync over HTTP can also maintain versions of files. The difference is that restoration runs in the opposite direction from rdiff: it starts from the most recent complete copy and works backwards to older versions.

For example, suppose that in all three scenarios we back up a previously backed-up file five times with changes, and now we need to restore the most recent copy. However, version 2 is corrupt. The effect is shown below; remember that rsync by itself does not maintain these versions.

Green = We can successfully restore this

Red = We cannot restore this

Blue = Can only restore this if it is the most current version.

| rdiff | rsync | rsync over HTTP |
|---|---|---|
| Original File | New Current File | Original File |
| Version 1 | New Current File | Version 1 |
| Version 2 | New Current File | Version 2 |
| Version 3 | New Current File | Version 3 |
| Version 4 | New Current File | Version 4 |
| Most Current File | New Current File | Most Current File |

This may sound a bit confusing, but think of the restore process for each as follows:

rdiff

  • rdiff doesn't merge the delta at the destination after a backup, so the only full copy of a file is the very first backup.
  • In order to get to any version of a file, rdiff must take the original file and merge it with the next version, and so on (Ex: Original -> Version 1 -> Version 2 -> etc.)
  • If a version is corrupted, it can no longer be merged, and since each later version depends on the one before it, none of the subsequent versions can be rebuilt either.

rsync

  • rsync merges the delta at the destination during every backup, so the destination always holds a complete copy of the most recent version of the file.
  • No earlier versions are kept, so restoring simply means copying that current file back (Ex: Most Current File -> Source).
  • If the current copy is corrupted, only that one file is affected; there is no chain of versions for the damage to spread through.

rsync over HTTP

  • As with rsync, the delta is merged at the destination, so the most current version is always stored as a complete file.
  • Earlier versions are kept by the software, and restoring one works in reverse: start from the most current file and step backwards (Ex: Most Current File -> Version 4 -> Version 3 -> etc.)
  • If a version is corrupted, only that version and the ones older than it become unreachable; every newer version, including the most current file, can still be restored.

In most cases, you will not need to restore the original copy of a file, but a version somewhere in between. In this situation, the longer backups have been running, the more reliable rsync over HTTP becomes, since you can still retrieve newer versions after a corruption occurs.
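The restore directions described above can be modeled in a few lines of Python: forward from the original for rdiff, only the current file for rsync, and backwards from the current file for rsync over HTTP. This is a deliberately simplified sketch. It assumes that each stored version is either a full copy or a delta from its neighbor, that corruption damages exactly one stored item, and it uses hypothetical function names. It reproduces the example of five backups with version 2 corrupt.

```python
LABELS = ["Original File", "Version 1", "Version 2", "Version 3", "Version 4", "Most Current File"]


def restorable_forward(n, corrupt):
    """rdiff-style chain: version 0 is a full copy and every later version is a
    delta from the one before it, so version k needs stored items 0..k intact."""
    return [k for k in range(n) if not any(v in corrupt for v in range(k + 1))]


def restorable_mirror(n, corrupt):
    """Plain rsync: only the newest version exists at the destination at all."""
    return [] if (n - 1) in corrupt else [n - 1]


def restorable_reverse(n, corrupt):
    """Reverse chain: the newest version (n - 1) is a full copy and each older version
    is a delta from the one after it, so version k needs stored items k..n-1 intact."""
    return [k for k in range(n) if not any(v in corrupt for v in range(k, n))]


if __name__ == "__main__":
    corrupt = {2}  # version 2 is damaged
    for name, fn in [("rdiff", restorable_forward),
                     ("rsync", restorable_mirror),
                     ("rsync over HTTP", restorable_reverse)]:
        print(name, [LABELS[k] for k in fn(len(LABELS), corrupt)])
```

With version 2 corrupt, the forward chain can still reach only the original and version 1. The reverse chain can still reach versions 3 and 4 and the most current file. This matches the point that a reverse chain keeps newer versions available after a corruption.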

Security

| | rdiff | rsync | rsync over HTTP |
|---|---|---|---|
| SSH | SSH | SSH | No SSH |
| SSL | No SSL | SSL | SSL |
| Authentication / 2FA | No authentication or 2FA | Authentication, no 2FA | Authentication + 2FA |

Security is a major concern with backups, especially as data breaches become more and more commonplace. Typical rdiff and rsync backups rely on the transport (such as SSH) for security during the transfer and add little on top of it, but they let you secure both machines however you wish. rsync over HTTP, on the other hand, adds a variety of security measures that apply while data is in transit. There are many methods of transferring data over HTTP with rsync, and software that uses this approach can offer further protections, such as requiring SSL, requiring 2FA on accounts, and using built-in AES encryption.
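The "2FA on accounts" mentioned above is commonly implemented with time-based one-time passwords: TOTP (RFC 6238), which is built on HOTP (RFC 4226). The following sketch uses only Python's standard library to show how a server could generate and check such a code. It is a generic illustration, not the mechanism of any particular rsync-over-HTTP product, and the function names are hypothetical.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret, counter, digits=6):
    """RFC 4226 HOTP: HMAC-SHA1 over the counter, then dynamic truncation."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret, for_time=None, step=30, digits=6):
    """RFC 6238 TOTP: HOTP with the counter derived from the current time."""
    if for_time is None:
        for_time = time.time()
    return hotp(secret, int(for_time // step), digits)


def verify(secret, submitted, for_time=None, step=30, digits=6, window=1):
    """Accept codes from neighbouring time steps to tolerate small clock drift."""
    now = time.time() if for_time is None else for_time
    return any(
        # compare_digest avoids leaking how many characters matched via timing
        hmac.compare_digest(totp(secret, now + d * step, step, digits), submitted)
        for d in range(-window, window + 1)
    )


if __name__ == "__main__":
    secret = b"12345678901234567890"  # the RFC test secret, not a real key
    print(totp(secret))
```

In a real deployment, the shared secret is provisioned once (typically as a QR code for an authenticator app) and stored server-side, and `verify` would run after the normal password check.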

Performance

Performance is a slightly different metric from the others. On backup time alone, rdiff should be faster than both rsync and rsync over HTTP, since it skips the merge step. Restoration, on the other hand, should be faster with rsync and rsync over HTTP, since rdiff still has to merge the deltas at that point. Taken together, the approaches come out very similar in overall performance.

Conclusion

Overall, the choice of which algorithm, or software, to use for backing up files is different for everyone. Some situations, such as a completely closed-network backup, may not need the advantages of rsync over HTTP. However, once a network enters the picture, the overall benefits and scalability of rsync over HTTP are unmatched.
