
BLAKE2, Python HashLib, Stream vs Block, Strange Problems

posted Aug 5, 2017, 8:13 PM by Sami Lehtinen   [ updated Aug 5, 2017, 8:16 PM ]
  • Python 3.6 also includes the BLAKE2 hash functions in the hashlib standard library. - Nice. - BLAKE's core was derived from the ChaCha stream cipher, so it would also be possible to build a block / stream cipher using BLAKE. I've often wondered what the actual difference is between a "perfect hash" and a "perfect block cipher" (perfect in the sense of ideal: output indistinguishable from random), since both end up with 'random output'. I mean, a hash should be able to generate a stream of 'pseudo-random bits' which can be XORed with data to produce a stream cipher. Hash generation can also be iterated to produce a new key for each subsequent block, so there's no need for CBC at the data level. I guess the primary difference is that a 'cipher' is 'reversible' and a hash isn't. Yet when the hash output is just XORed with the data, as in OFB mode, that doesn't matter: with the same key and initial value, encrypting a second time gives back the original data. I guess the real reason is that hashes and ciphers are designed to resist different kinds of attacks. But if both are 'perfect', there shouldn't be any difference? - Maybe? OFB also neatly converts any block cipher (or hash?) into a stream cipher. It's just a kind of trivial stretching. Some say trivial stretching should never be used, but that depends. Afaik it shouldn't be a problem with a perfect hash. Of course, if the hash is imperfect, trivial stretching creates a bias problem. CTR mode is the same idea with a counter: b[n] = F(k, n). - Just random thoughts. Luckily I don't need to actually implement anything like that. If the key isn't included in the hash input, as in OFB built on an unkeyed hash alone, then a known-plaintext attack works: one known plaintext block reveals the matching keystream block, and anyone can hash that to get all the following blocks. Encryption is really simple: you just mix the data with the output of a safe pseudo-random function which is initialized with a perfectly random value. Heheh, but in practice that's not simple at all.
  • Just wondering whether this is a generic hash feature or just the Python implementation. I was assuming that hashes are block hashes for performance, but hashlib makes them look like stream hashes: hashing 'test!' in one call and feeding 't', 'e', 's', 't', '!' one byte per update() call both lead to the same end result. Interesting. I would have thought that this makes hashing performance worse than it could be with blocks. Or maybe hashlib is just being 'more user friendly' by keeping state for the last non-full block, so that digest can be called at any time? I guess a lot of people haven't thought about this, but it's actually a pretty basic question. It's usually claimed that most hashes are block hashes for performance. After a quick check, the buffering explanation is the right one: BLAKE2 borrows its core from a stream cipher, but as a hash it still compresses fixed-size blocks (128 bytes for BLAKE2b, 64 bytes for BLAKE2s), and update() just buffers the partial block. The hashlib documentation says that repeated update() calls are equivalent to a single call with the concatenated data, so the same holds for the other block-based hashes too.
  • Since Hacker News was full of crazy problem stories, here's one which took quite a long time to figure out. One customer had systems which were crashing repeatedly. After weeks of analysis, it turned out that all the computers were on the same electrical circuit as the kitchen appliances. No wonder there were occasional crashes when the voltage dropped. When that was fixed, the issue went away.
    Another story is that PS/2 cables, or any 'digital' interface, can be really tricky to analyze, because the threshold between working and not working can be extremely hard to notice. Some people claim that the PS/2 interface is a standard. No it isn't. It's just like USB. I had 4 similar desktop computers on my desk, plus several cables and additional devices. Then I made a chart and tested all possible combinations. The result: all of the devices had quality variance. Picking the right combination would work extremely well, and picking the wrong combination was guaranteed not to work. It's totally wrong to think that similar devices are actually similar.
    The third story is similar to the second. We had tons of similar RAM chips and a huge mess with crashing computers. After all the analysis and wondering, it turned out that some of the similar RAM chips were substantially worse than the rest. The problem only occurred on a certain type of motherboard, even with the same settings as on other motherboards. It's totally common that combinations which should work actually won't.
    The same applies to Ethernet devices, I've proven that several times. Device A and switch B just won't work together, whatever you do. Or they might work, but the link drops several times a day, or speed or duplex negotiation fails. Funnily enough, this can often be fixed by adding a different switch C between the devices, so that the chain is A-C-B, and everything works again. So once again, it's wrong to assume that any Ethernet device will work with any other Ethernet device.
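
Going back to the BLAKE2 note at the top, the CTR idea b[n] = F(k, n) can be sketched with hashlib's keyed BLAKE2b as F. This is only a toy to show the XOR symmetry, not something to use for real data: there is no authentication, and the names keystream_block and xor_stream, the 16-byte nonce and the 8-byte counter layout are my own assumptions for the illustration.

```python
import hashlib
import os

BLOCK = 64  # BLAKE2b can emit up to 64 bytes per call


def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    """b[n] = F(k, n): keyed BLAKE2b over nonce || counter (assumed layout)."""
    data = nonce + counter.to_bytes(8, "big")
    return hashlib.blake2b(data, key=key, digest_size=BLOCK).digest()


def xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    out = bytearray()
    for offset in range(0, len(data), BLOCK):
        block = keystream_block(key, nonce, offset // BLOCK)
        chunk = data[offset:offset + BLOCK]
        out.extend(a ^ b for a, b in zip(chunk, block))
    return bytes(out)


if __name__ == "__main__":
    key = os.urandom(32)    # secret key, part of every hash input
    nonce = os.urandom(16)  # must never repeat for the same key
    message = b"Just random thoughts. " * 10
    ciphertext = xor_stream(key, nonce, message)
    # Applying the same keystream a second time restores the plaintext.
    assert xor_stream(key, nonce, ciphertext) == message
```

Because the key goes into every block, knowing one keystream block doesn't help with computing the next one, unlike the unkeyed OFB-with-hash case.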
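
The 'test!' versus byte-by-byte question from the hashing note is also easy to check. Here's a small sketch, with helper names hash_one_shot and hash_bytewise made up for this example, which compares both ways of feeding the data for a few hashlib algorithms and prints the internal block size that hashlib reports for each.

```python
import hashlib


def hash_one_shot(name: str, data: bytes) -> str:
    """Hash all of the data with a single call."""
    return getattr(hashlib, name)(data).hexdigest()


def hash_bytewise(name: str, data: bytes) -> str:
    """Feed the data one byte per update() call."""
    h = getattr(hashlib, name)()
    for i in range(len(data)):
        h.update(data[i:i + 1])
    return h.hexdigest()


for name in ("blake2b", "blake2s", "sha256", "sha512"):
    same = hash_one_shot(name, b"test!") == hash_bytewise(name, b"test!")
    # block_size is the internal block size of the algorithm in bytes
    print(name, "block size:", getattr(hashlib, name)().block_size, "same digest:", same)
```

The digests match for all of them, which fits the idea that update() only buffers input until a full block is available.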