
Python hash function performance comparison

posted Mar 2, 2015, 7:13 AM by Sami Lehtinen   [ updated Mar 2, 2015, 7:26 AM ]
Many people recommend sha256, but there's no point in using it when it isn't required. It generates a longer hash and is about 8x more expensive to compute than adler32. Python's own hash() function is the fastest of all, almost 4 times faster than adler32.

Python hash 13 units
zlib.adler32 49 units
zlib.crc32 91 units
hashlib.md5 180 units
hashlib.sha1 179 units
hashlib.sha256 403 units
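
The post doesn't include the benchmark script, so the following is only a rough sketch of how such a comparison could be timed, not the code behind the numbers above. The helper name time_hashes, the buffer size and the buffer count are assumptions made for illustration. Each buffer is hashed only once per function, because CPython typically caches the hash of a bytes object after the first call, which would otherwise flatter hash().

```python
import hashlib
import os
import time
import zlib

# Functions under comparison; each takes a bytes object and returns a hash value.
CANDIDATES = {
    "hash": hash,
    "zlib.adler32": zlib.adler32,
    "zlib.crc32": zlib.crc32,
    "hashlib.md5": lambda d: hashlib.md5(d).digest(),
    "hashlib.sha1": lambda d: hashlib.sha1(d).digest(),
    "hashlib.sha256": lambda d: hashlib.sha256(d).digest(),
}


def time_hashes(size=64 * 1024, count=50):
    """Return {name: seconds} for hashing `count` random buffers of `size` bytes.

    `size` and `count` are arbitrary illustration values, not the
    settings used for the figures in this post.
    """
    # Distinct buffers, so hash() never hits a cached value on the same object.
    buffers = [os.urandom(size) for _ in range(count)]
    results = {}
    for name, func in CANDIDATES.items():
        start = time.perf_counter()
        for buf in buffers:
            func(buf)
        results[name] = time.perf_counter() - start
    return results


if __name__ == "__main__":
    timings = time_hashes()
    for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
        print(f"{name:16s} {seconds * 1000:8.2f} ms")
```

Absolute timings depend on the machine, the Python version and the buffer size, so only the ratios are worth comparing. Also note that hash() of bytes and str is salted per process by default (see PYTHONHASHSEED), so its values are only comparable within a single run.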

There's a reason why hash is so much faster. Old-fashioned CRCs (crc32) and sum functions (adler32) process data in small chunks and can't fully utilize the features of new platforms. Computing a 64-bit hash using 64-bit blocks is naturally faster than computing a 32-bit hash using only 8-bit blocks, as adler32 and crc32 do.
If I were reinventing a light hash function, I could do something like this:
Just sum or xor (?) a 64-bit block of data into a buffer, then rotate the buffer left by one bit (the bit that falls off the left end wraps around to the right), and repeat with the next block. Super simple and fast (?), but I don't know if it behaves like a good hash should. Still, it's a quick way to find out if data has changed.
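
The idea above could be sketched roughly like this, using the xor variant. The names rotl64 and rotating_xor_hash, the little-endian block order and the zero padding of the last block are assumptions; the description above doesn't specify them. Pure Python will be far slower than the C implementations in the table, so this is only useful for checking how the scheme behaves, not how fast it could be.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF  # keeps values within 64 bits


def rotl64(value, bits=1):
    """Rotate a 64-bit value left; bits shifted out wrap around to the right."""
    return ((value << bits) | (value >> (64 - bits))) & MASK64


def rotating_xor_hash(data):
    """Xor each 64-bit block into the buffer, rotating left by one bit per block."""
    # Assumption: pad the tail with zero bytes up to a multiple of 8 bytes.
    padded = data + b"\x00" * (-len(data) % 8)
    state = 0
    # "<Q" reads each 8-byte block as a little-endian unsigned 64-bit integer.
    for (block,) in struct.iter_unpack("<Q", padded):
        state = rotl64(state ^ block)
    return state


if __name__ == "__main__":
    print(hex(rotating_xor_hash(b"hello world, this is a test")))
```

Changing any single byte changes the result, because xor and rotation are both reversible. It is a weak checksum, though: with this padding choice, for example, b"a" and b"a\x00" produce the same value.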
