
Duplicati 2 related observations and thoughts

posted May 20, 2017, 9:37 PM by Sami Lehtinen   [ updated May 20, 2017, 9:37 PM ]
Some random ramblings about experimenting with Duplicati 2 backup software.

On one run, it didn't create full-size blocks with only the last one being smaller. This raises a curious question: why were so many smaller blocks created at once? I thought it would create data blocks like 'all the other applications', meaning a number of maximum-size blocks per run and then one block which is smaller because it isn't 'full'. After using the program for a while, it seems that something strange happened with that run, because since then Duplicati 2 has been behaving just as I expected. Maybe I missed that block files are only 1/3 of the files being created.
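As a rough illustration of the behavior I expected, here is a minimal sketch that splits a run's data into maximum-size blocks plus one smaller remainder. This is not Duplicati 2's actual packing logic; the function name and parameters are hypothetical and only model the 'full blocks, then one partial block' assumption.

```python
def expected_block_sizes(total_bytes: int, max_block_bytes: int) -> list[int]:
    """Sizes of the blocks one run would produce under the simple
    'fill every block to the maximum, last one takes the rest' model.

    Hypothetical helper for illustration only, not Duplicati's algorithm.
    """
    if total_bytes < 0 or max_block_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and max_block_bytes > 0")
    full_blocks, remainder = divmod(total_bytes, max_block_bytes)
    sizes = [max_block_bytes] * full_blocks
    if remainder:
        # Only the final block is allowed to be smaller than the maximum.
        sizes.append(remainder)
    return sizes


if __name__ == "__main__":
    # 130 units of data with a maximum block size of 50 units.
    print(expected_block_sizes(130, 50))
```

Under this model, a run that produces many blocks below the maximum size would be the odd case, which is what made that one run look strange.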

This raises some positive thoughts about what's actually in those blocks, and whether Duplicati 2 already does some of the optimizations I wrote about earlier, like trying to separate static and dynamic data into different blocks. That would make compacting, and bandwidth and storage management in general, more efficient.
Most programs seem to use the full file path to order data. Some others use alternate methods; 7-zip, for example, uses the file extension by default. But for a backup system, I think the last modified date would probably be the best way, followed by statistical data on how often something is inserted / updated / deleted after that. It's good to note that with data blocks, inserting doesn't actually matter. Only updates and deletions matter, because those create stale data. Another good question is whether existing partial blocks are themselves being updated. I wouldn't expect so, because that kind of updating should be done when compacting, not while adding data to the block storage. There's also a difference between which data is put into which block and how the data is arranged inside the block. The 7-zip default ordering aims at efficient compression dictionary usage.
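To make the ordering options concrete, here is a small sketch comparing the three orderings mentioned above: by full path, by extension first, and by last modified time. The `FileInfo` type and the function names are made up for this example, and nothing here reflects how Duplicati 2 or 7-zip actually order data internally.

```python
import os
from typing import NamedTuple


class FileInfo(NamedTuple):
    """Hypothetical record for one file to be backed up."""
    path: str
    mtime: float  # last modified time, seconds since the epoch


def order_by_path(files: list[FileInfo]) -> list[FileInfo]:
    # Plain full-path ordering.
    return sorted(files, key=lambda f: f.path)


def order_by_extension(files: list[FileInfo]) -> list[FileInfo]:
    # Extension first, then path, so similar content ends up adjacent.
    return sorted(files, key=lambda f: (os.path.splitext(f.path)[1].lower(), f.path))


def order_by_mtime(files: list[FileInfo]) -> list[FileInfo]:
    # Oldest first: data that has been unchanged for a long time groups
    # together, and recently changed data lands in later blocks.
    return sorted(files, key=lambda f: (f.mtime, f.path))


if __name__ == "__main__":
    files = [
        FileInfo("a/report.txt", 300.0),
        FileInfo("b/photo.jpg", 100.0),
        FileInfo("c/notes.txt", 200.0),
    ]
    for order in (order_by_path, order_by_extension, order_by_mtime):
        print(order.__name__, [f.path for f in order(files)])
```

With modification-time ordering, old and probably static files tend to share blocks, so later updates and deletions would leave stale data in fewer blocks.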

Some IT guys and colleagues asked me why I'm using Duplicati 2, since there are simpler tools which could be used. Here's my reasoning for why it's better than many of them.
I've personally been using 7-zip a lot for backups, mostly local ones. But it's very slow if there's a lot of data. It's also not suitable for maintaining a proper version history, unless you use a lot of disk space.
I clearly needed a better solution. Duplicati 1 provides that, but it still requires a "full backup" every now and then, to keep the history short enough for a reasonable restore time, as well as to reduce the required disk space. This monthly full backup also requires substantial bandwidth, though much less than a full daily backup. As I (of course) prefer efficient off-site backups, the required bandwidth is precious.

So why should you use Duplicati 2? Because it provides the following key features, which I'm personally really after:
1) Data de-duplication on block level
2) Maintaining efficient incremental versioning
3) Secure Encrypted backups
4) Multiple storage / transport options
This means that when daily backups are run with reasonably sized data sets, it takes just a few minutes and it's done. It also consumes a minuscule amount of bandwidth: basically only the changes to the files & databases, even if the source backup data set is tens or hundreds of gigabytes. The benefit is that only about 1-5% of the backup data gets updated daily. Therefore we can maintain 3 months' worth of daily versioning data for all of our key systems. Duplicati 2 does this very efficiently.
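The idea behind block-level de-duplication and incremental versions can be sketched in a few lines. This is only a toy model, assuming fixed-size blocks identified by their SHA-256 hash; the `BlockStore` class is hypothetical and is not Duplicati 2's implementation, which also handles compression, encryption and remote volumes.

```python
import hashlib


class BlockStore:
    """Toy content-addressed block store (illustration only)."""

    def __init__(self, block_size: int = 4) -> None:
        self.block_size = block_size
        self.blocks: dict[str, bytes] = {}  # block hash -> block data

    def backup(self, data: bytes) -> tuple[list[str], int]:
        """Store data; return (block hash list for this version, new bytes stored)."""
        manifest: list[str] = []
        new_bytes = 0
        for start in range(0, len(data), self.block_size):
            chunk = data[start:start + self.block_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.blocks:
                # Only blocks not seen before cost storage and bandwidth.
                self.blocks[digest] = chunk
                new_bytes += len(chunk)
            manifest.append(digest)
        return manifest, new_bytes

    def restore(self, manifest: list[str]) -> bytes:
        """Rebuild one version from its list of block hashes."""
        return b"".join(self.blocks[digest] for digest in manifest)


if __name__ == "__main__":
    store = BlockStore(block_size=4)
    day1, new1 = store.backup(b"AAAABBBBCCCC")
    day2, new2 = store.backup(b"AAAAXXXXCCCC")  # one block changed
    print(new1, new2)
    print(store.restore(day1), store.restore(day2))
```

Each version is just a list of block hashes, so every daily version can be restored directly while unchanged blocks are stored and transferred only once.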
Of course we could have had snapshots of the database or even of full server disks, and only kept a remote copy of the database data for disaster recovery (DR) purposes. But in that case we would need a lot more storage space on the servers, and that's just a waste of expensive, fast, high performance storage space.
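A back-of-the-envelope comparison shows why the storage difference is so large. The sketch below is a rough upper bound under stated assumptions: every daily change consists entirely of new data, and compression and compacting are ignored. The function name and the example figures (100 GB source, 5% daily change, 90 versions) are illustrative choices, not measurements.

```python
def retained_storage_gb(source_gb: float, daily_change_rate: float, versions: int) -> dict[str, float]:
    """Estimate storage needed to keep `versions` daily versions.

    'full_copies' keeps a complete copy per day; 'incremental' keeps one
    initial copy plus only the changed data for each later day.
    Hypothetical helper for illustration only.
    """
    if versions < 1:
        raise ValueError("versions must be at least 1")
    full_copies = source_gb * versions
    incremental = source_gb + source_gb * daily_change_rate * (versions - 1)
    return {"full_copies": full_copies, "incremental": incremental}


if __name__ == "__main__":
    estimate = retained_storage_gb(source_gb=100, daily_change_rate=0.05, versions=90)
    for name, size in estimate.items():
        print(f"{name}: about {size:.0f} GB")
```

Even at the high end of the 1-5% daily change range, the incremental approach stays more than an order of magnitude below keeping full daily copies in this model.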
Now backups are stored on 8 TB archive drives. I've been thinking about moving this to the cloud too, maybe Backblaze or some other cheap storage servers. But the current solution works well and, in terms of cost, it's extremely cheap to run.
I think I've blogged way too much about backups. But that's one of the most important and basic tasks of an information management department.

Links: Duplicati, Duplicati @ Wikipedia, hubiC, Backblaze B2.

Looking for similar but alternate projects? Check out: Bup and Attic.