Duplicati backup restore (broken) versus test (ok) - Madness

Even longer version in the Duplicati forums (@ forums.duplicati.com), with previous cases, reports and follow up to this problem...

Summary:

After all this testing I can only make one conclusion: Backup is corrupted (restore) and good (test for integrity) at the same time.

There's clearly flawed and very inconsistent logic somewhere. - Again, this should run anyone responsible for the data integrity absolutely mad. I personally find this extremely worrying.

This is the reason why I recommend that everyone should run full restore tests on alternate system on regular intervals and quite often. Because the test feature seems to be very dangerously and misleadingly broken.

Either the database rebuild is somehow broken? And if it isn't, then the test feature is fatally flawed. Unfortunately I can't exactly tell where the disparity occurs. (Update just when posting: It seems that the key factor with latest version is database compaction, where old data is removed, if that task is aborted things often get covertly messed up due to broken / missing recovery logic).

If I test the data set with the original database (before any of these things I've done here) the test claims the backup is good.

I've also confirmed and checked at the data source where the backups are created, that the file does currently exist (corrupted-file.dat). And if there's any corruption, it could have been easily refreshed any day to the backup, but that isn't happening. Yet the file / data remains broken in the backup set for extended time. Which probably means the corruption is in the blocks which aren't being changed often.

... And then the nitty gritty details ...

I started by creating test environment, where I tried a few "expected" scenarios which would cause this problem. But of course none of my tests triggered the issue. It would have been just so nice to work with datasets in kilobytes instead of tens of gigabytes. - Oh well....

... To be continued ... Several hours later ...

Log from external test without local db:

2021-XX-XX 11.40.54 +03 - [Verbose-Duplicati.Library.Main.Operation.RecreateDatabaseHandler-ProcessingBlocklistVolumes]: Pass 3 of 3, processing blocklist volume 149 of 149
2021-XX-XX 11.42.30 +03 - [Information-Duplicati.Library.Main.Operation.RecreateDatabaseHandler-RecreateCompletedCheckingDatabase]: Recreate completed, verifying the database consistency

Output:

2021-XX-XX 11:42:37,899: Duplicati error output:

ErrorID: DatabaseIsBrokenConsiderPurge

Recreated database has missing blocks and 12 broken filelists. Consider using "list-broken-files" and "purge-broken-files" to purge broken data from the remote store and the database.

But if I run these tasks from the primary server with the database (which is usually used for the restore testing for most of users (!!)), it says there's nothing to fix (test and repair both passes quickly without any action).

Next: Ok, I really don't want to modify the real production backup, because it's just slightly broken. I'll create a copy of the backup ste, and let's run repair with it in separate test environment.

... To be continued ... Several hours later ...

It progresses as expected to the point quite quickly:

Output:

Backend event: Get - Completed: duplicati-ife2631cc922e4e909388950c6c7acf7f.dindex.zip.aes (173,28 KT)

Processing indexlist volume 151 of 151

And now it's taking surprisingly long without any seeming progress. Yet file-system io-statistics reveal that it's clearly working with SQLite database and writing lots of data into it's temp-file.

The actions the repair will take, should indicate what the problem with the data set it. Question remains, why the database rebuild failed earlier? With the errors, there's nothing wrong with the data?

I'll need to check out what that rebuilt database contains, if I can send it to you guys. Or maybe it's not even necessary after this analysis.

... To be continued ... Several hours later ...

Now it says:

Pass 3 of 3, processing blocklist volume 1 of 149

... To be continued ... At least hour later ... Still 1 of 149? ...

It seems that this might take well, more than several hours. Maybe tomorrow ...

I reviewed the verbose log, and there's nothing wrong with the data or the recreate process. Ok. Yet the final verdict is.

Based on earlier error summary, it can be assumed that this is the same case:

ErrorID: DatabaseIsBrokenConsiderPurge

Recreated database has missing blocks and 12 broken filelists. Consider using "list-broken-files" and "purge-broken-files" to purge broken data from the remote store and the database.

I'll took full copy of all state data at this point. DB / backup blocks etc. I'll now run those steps, if I remember correctly this is quite quick operation.

Output from this operation:

ListBrokenFiles

11 : X.X.2021 21.30.26 (1 match(es))