Updating a blob

To put it simply, if the file is 100 MB in size and we're searching for a block that is 10 MB in size, then the block could start at almost any byte offset in the file, so the number of comparisons required is approximately 100 million - 10 million = 90 million.
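The arithmetic above can be sketched in a few lines of Python. This is only an illustration, not the implementation used here: `window_positions` and `find_block` are hypothetical names, and the search compares raw bytes at every offset rather than using signatures.

```python
def window_positions(file_size: int, block_size: int) -> int:
    """Number of byte offsets at which a block of block_size could start."""
    if block_size <= 0 or block_size > file_size:
        return 0
    return file_size - block_size + 1


def find_block(data: bytes, block: bytes) -> tuple[int, int]:
    """Slide a window one byte at a time over data.

    Returns (offset of the first match or -1, number of comparisons made).
    """
    size = len(block)
    comparisons = 0
    for offset in range(window_positions(len(data), size)):
        comparisons += 1
        if data[offset:offset + size] == block:
            return offset, comparisons
    return -1, comparisons


# A 100 MB file and a 10 MB block give roughly 90 million candidate offsets.
print(window_positions(100_000_000, 10_000_000))
```

In the worst case (the block is not in the file at all) every one of those offsets has to be checked, which is where the 90 million figure comes from.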

For example, in the above diagram, we want to determine whether block C (which already exists in the cloud) also exists in the updated version of the file.

1) Let Size C represent the size (in bytes) of block C.

Cloud storage these days allows virtually any volume of data to be stored geo-redundantly and kept always available, at a fraction of what it cost 10 years ago. One common problem I've seen is the amount of bandwidth wasted when updating existing blobs.

Say you have a 10 MB file in cloud storage, and you download it and modify a small section of it. How do you update the version in cloud storage? The obvious answer is to upload the entire file again. That is wasteful of bandwidth, time and money, but sadly it is often the solution used, since it's the easiest option.

What I hope is obvious is that a LOT of signature generation needs to happen as well as lots of comparisons.
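To make the cost concrete, here is a minimal sketch of one common way to keep per-offset signature generation cheap: a weak rolling checksum in the style of rsync, confirmed by a strong hash on a hit. The choice of checksum, the use of SHA-256, and all function names here are assumptions for illustration; the post does not specify which signatures are used.

```python
import hashlib

MOD = 1 << 16  # assumed modulus for the weak checksum


def weak_signature(window: bytes) -> tuple[int, int]:
    """Compute the weak checksum pair (a, b) for a window from scratch."""
    n = len(window)
    a = sum(window) % MOD
    b = sum((n - i) * byte for i, byte in enumerate(window)) % MOD
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, size: int) -> tuple[int, int]:
    """Update (a, b) in O(1) when the window slides forward by one byte."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - size * out_byte + a) % MOD
    return a, b


def find_by_signature(data: bytes, size: int,
                      weak: tuple[int, int], strong: bytes) -> int:
    """Return the first offset in data whose window matches both signatures, or -1."""
    if size <= 0 or size > len(data):
        return -1
    a, b = weak_signature(data[:size])
    for offset in range(len(data) - size + 1):
        # Cheap check first; only hash the window when the weak checksum matches.
        if (a, b) == weak:
            if hashlib.sha256(data[offset:offset + size]).digest() == strong:
                return offset
        if offset + size < len(data):
            a, b = roll(a, b, data[offset], data[offset + size], size)
    return -1


# Signatures of a block that "already exists in the cloud".
block = b"brown"
weak, strong = weak_signature(block), hashlib.sha256(block).digest()
print(find_by_signature(b"the quick brown fox jumps", len(block), weak, strong))
```

Even with the rolling update, one weak signature is still produced and compared for every byte offset, so the total amount of work remains proportional to the file size for each block being searched for.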


