
Zstd, EddyStone, GnuPG / ECC, E/N sites

posted Jan 20, 2018, 11:22 PM by Sami Lehtinen   [ updated Jan 20, 2018, 11:22 PM ]
  • Zstandard compression (zstd) - A very nice post! Any new development in compression is welcome: faster, better, neat. It just isn't very standard yet. And yes, we must admit that deflate (zlib, gzip) is an old algorithm. I've long preferred LZMA for long-term storage and for transport over slow WAN links. Compression is also one of the areas where 'everything is a trade-off' is extremely true. Essential quote from the article: "zstd level 1 is ~3.4x faster than zlib level 1 while achieving better compression than zlib level 9". That pretty much says it all. Yet, as we well know, the data being compressed also makes a large difference; different kinds of data compress better / faster with different algorithms. Dictionary compression is one of the reasons I chose to use blocks in my data archival system (I've posted about that earlier). One way to do 'dictionary compression' is to pre-feed data to the compressor, flush its output, and then not store that dictionary data more than once. This is of course inefficient, because the dictionary data gets compressed over and over again; it would be much smarter to be able to store the compressor state instead of re-compressing the dictionary data and wasting CPU cycles. In some test cases I've fed an 'empty' JSON object to the compressor first, flushed it, and then compressed the actual data, to get the full benefit of the JSON or XML scaffolding being compressed away. It works well, but is very inefficient in CPU terms. The post says that syncing the dictionary is hard. No it isn't: the dictionary is stored in the same data storage, keyed by its blob hash, and if I change the content of the dictionary, it gets stored as a new object. I would find that an extremely useful feature. When compressing small inputs, the ordering of inputs also matters. That's why the 7-zip compressor processes files in extension order, trying to group similar kinds of files together.
If similar objects are spread out and slide out of the compression window, the benefit of having seen that data earlier is lost. Library reference: python-zstandard - https://github.com/indygreg/python-zstandard - A question: if framing is heavyweight, how about making lighter frames optional? Likewise, any header is unnecessary and should be optional when talking about pure data compression; a file format / stream format is another question. I'm also wondering whether the shared-dictionary compression mode supports a special 'create shared dictionary' mode, because building a shared dictionary for a whole data set is different from building a dictionary for just some scaffolding. A shared dictionary should be optimized for all of the data, while the dictionary built up during normal compression isn't optimized like that - it's more like a cache, which might not represent the overall data set very well. It seems they've been thinking about these questions, but they aren't covered by this post. I'm just a random hobbyist and they're data compression experts, so these are just my random and naive thoughts. Then again, the post didn't have anything new in it; it's fairly generic discussion about data compression.
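The 'pre-feed the scaffolding' trick above can be sketched with Python's standard-library zlib, which accepts a preset dictionary (zdict) so the scaffold is never stored in the output at all. The JSON scaffold and payload here are made-up examples, not from my actual archival system:

```python
import json
import zlib

# Hypothetical record layout: an "empty" scaffold that shares its
# key names and punctuation with the real payloads.
scaffold = json.dumps(
    {"id": 0, "name": "", "tags": [], "created": "", "updated": ""}).encode()
payload = json.dumps(
    {"id": 42, "name": "example", "tags": ["a", "b"],
     "created": "2018-01-20", "updated": "2018-01-20"}).encode()

def compress(data, zdict=None):
    # zdict pre-loads the deflate window; unlike pre-feeding and
    # flushing, the dictionary bytes are never re-compressed.
    c = zlib.compressobj(level=9, zdict=zdict) if zdict else zlib.compressobj(level=9)
    return c.compress(data) + c.flush()

def decompress(blob, zdict=None):
    # The same dictionary must be supplied on the decompression side.
    d = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return d.decompress(blob) + d.flush()

plain = compress(payload)
dicted = compress(payload, zdict=scaffold)
assert decompress(dicted, zdict=scaffold) == payload
```

The dictionary-assisted blob comes out smaller because the key names are found in the preset dictionary instead of being emitted into the stream - the same effect as the pre-feed trick, without the wasted CPU cycles.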
  • BLE Beacons with EddyStone support also use a kind of 'shared dictionary compression' for protocols and domains. As an example, the prefix 'https://www.' can be replaced with a single integer identifier to save bytes.
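As a reference, that prefix trick can be sketched in a few lines of Python. The one-byte codes below are the scheme-prefix codes from the open Eddystone-URL frame specification; the helper function name is my own:

```python
# One-byte codes for common URL scheme prefixes, as defined in the
# Eddystone-URL frame specification.
PREFIXES = {
    0x00: "http://www.",
    0x01: "https://www.",
    0x02: "http://",
    0x03: "https://",
}

def encode_url(url):
    """Replace a known scheme prefix with its one-byte code (a sketch;
    the real spec also compresses common suffixes like '.com/')."""
    # Try the longest prefixes first so 'https://www.' wins over 'https://'.
    for code, prefix in sorted(PREFIXES.items(), key=lambda kv: -len(kv[1])):
        if url.startswith(prefix):
            return bytes([code]) + url[len(prefix):].encode("ascii")
    raise ValueError("URL does not start with a known Eddystone prefix")
```

So 'https://www.example.com/' becomes one code byte plus 'example.com/' - twelve characters saved, which matters a lot in a beacon frame of only a few dozen bytes.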
  • The time came when I had to create new ECC subkeys with GnuPG for a specific purpose. Yet it wasn't as simple as I thought. Or it would have been, but I think I hit some kind of bug?
    Interestingly enough, creating a new ECC Ed25519 / Curve25519 key with GnuPG 2.1.11 always fails with these error messages:
    gpg: agent_genkey failed: Invalid flag
    gpg: Key generation failed: Invalid flag
    Yet generating a Brainpool P-512 key works perfectly. Go figure. I really don't get what I might be doing wrong.
  • Funny to be reminded that blogs were called e/n sites before anyone called them blogs. I think the old name was very fitting: "Everything / Nothing", "Eternal / Noise" or "Endless / Nonsense". Yep, endless, boring, random ramblings about everything. That's just so true. On the other hand, isn't a blog a web log? Just recording some observations about life, tech and everything.