One day I started thinking about how Duplicati 2.0 'scrubs' its data blocks that contain too much expired / stale data. Currently it simply reads those blocks and rewrites the still-valid data into fresh ones. But could this be optimized further? I think it could, and it could potentially make a big difference. Another immediate idea was a server-side scrubber application, in case writing a full client / server solution is too heavy.
When data blocks are scrubbed, instead of mixing old (yet still valid) and new data, separate that data into old and new blocks. This would keep volatile data apart from data that changes rarely, if at all. Whenever those old and even older blocks are scrubbed, do the same. Over time this should lead to a massive reduction in network and disk I/O, because there is no need to juggle the old data around so much. The network traffic savings could be amplified even further by the server-side scrubber.
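To make the idea concrete, here is a minimal sketch of such a generational scrub in Python. This is not Duplicati's actual compaction code; the `Chunk`, `Block`, and `scrub` names and the generation counter are my own assumptions, used only to illustrate how survivors could be grouped by age instead of being mixed back in with fresh data:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    data: bytes
    expired: bool
    generation: int  # hypothetical: how many scrub passes this chunk has survived

@dataclass
class Block:
    chunks: list

def scrub(blocks, stale_threshold=0.5):
    """Compact blocks whose stale fraction reaches the threshold.
    Surviving chunks are grouped by generation, so stable (old) data
    is packed together, separate from volatile (new) data."""
    survivors_by_gen = {}
    kept = []
    for block in blocks:
        stale = sum(c.expired for c in block.chunks) / len(block.chunks)
        if stale < stale_threshold:
            kept.append(block)  # still mostly valid; leave it untouched
            continue
        for chunk in block.chunks:
            if not chunk.expired:
                chunk.generation += 1  # survived another pass
                survivors_by_gen.setdefault(chunk.generation, []).append(chunk)
    # one new block per generation: old data no longer mixes with new
    kept.extend(Block(chunks=cs) for cs in survivors_by_gen.values())
    return kept
```

The payoff is that a chunk which has survived many passes ends up in a block full of equally stable chunks, so that block rarely crosses the stale threshold again and stops being rewritten over and over.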
Duplicati doesn't have a full client / server model, where scrubbing could be done on the server side alone. A client / server solution would let the client upload only the new data without touching the old data at all. My question is: would it be possible to run scrubbing as a separate, scheduled server-side process? That should cut network and disk I/O quite a lot. Combine this with tiered block storage and the reduction should be even larger. Any thoughts? Feel free to comment on my G+ post.
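As a thought experiment, the scheduled server-side scrubber could look something like the sketch below. Everything here is hypothetical (the `LocalBlockStore` API and function names are invented for illustration); the point is simply that when the scrub runs on the storage host, blocks are read and rewritten from local disk, and the client never has to download and re-upload them:

```python
import time

class LocalBlockStore:
    """Minimal stand-in for the storage server's local block store
    (hypothetical API, not Duplicati's)."""
    def __init__(self, blocks):
        self.blocks = blocks

    def list_blocks(self):
        return self.blocks  # local disk read, no network traffic

    def replace_blocks(self, new_blocks):
        self.blocks = new_blocks  # local disk write, no network traffic

def scrub_pass(store, scrub):
    """One scrub pass, entirely on the server side."""
    store.replace_blocks(scrub(store.list_blocks()))

def run_scheduled(store, scrub, interval_s, passes):
    # in a real deployment this loop would be a cron job or daemon
    for _ in range(passes):
        scrub_pass(store, scrub)
        time.sleep(interval_s)
```

The client's job then shrinks to uploading fresh blocks and a list of which chunks have expired; all of the I/O-heavy compaction traffic stays inside the server.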