Netflix is hosted on AWS, to name just one such instance. I have heard that customer names are obfuscated internally, including those of other Amazon services. Indeed, it would be a death sentence for Amazon to prioritize itself.
I understand not wanting to use SHA-1 now for security reasons, but is it still an OK practice to use it as a general hashing function for a uuid/data checksum?
> is it still an OK practice to use it as a general hashing function for a uuid/data checksum?
No. If you don't care about collision resistance, use MD5. It's faster, it's smaller, and it makes it obvious to everyone that your software isn't supposed to rely on collision resistance.
No. MD5 is a cryptographic hash function. For the purposes stated, one uses a non-cryptographic hash function such as seahash. The difference is that the latter is much faster but provides no protection against intentional collisions.
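The seahash suggestion is easiest to show in a language that has some non-cryptographic checksum built in. seahash is not in Python's standard library, so this minimal sketch uses CRC-32 from `zlib` in its place. That is a checksum, not a general-purpose hash like seahash. The helper names `checksum` and `verify` are hypothetical. The sketch shows the trade-off described above: it is cheap and catches accidental corruption, but it offers no defence against someone deliberately crafting a collision.

```python
import zlib


def checksum(data: bytes) -> int:
    """Non-cryptographic 32-bit checksum (CRC-32) of `data`.

    Stand-in for a fast non-cryptographic hash such as seahash. It detects
    accidental corruption only; an attacker can easily construct a different
    input with the same value.
    """
    return zlib.crc32(data) & 0xFFFFFFFF  # mask keeps the result unsigned


def verify(data: bytes, expected: int) -> bool:
    """Return True if `data` still matches a previously stored checksum."""
    return checksum(data) == expected


stored = checksum(b"hello")
print(verify(b"hello", stored), verify(b"hellp", stored))
```

32 bits is far too small for identifiers such as UUIDs. For those you would want a wider non-cryptographic hash, or one of the cryptographic options discussed further down.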
1: MD5 still provides preimage resistance (both first and second), which is sometimes useful.
2, and my real objection:
$ md5sum /dev/null
d41d8cd98f00b204e9800998ecf8427e /dev/null
$ seahashsum /dev/null
<stdin>:2:0: seahashsum: command not found
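The availability argument holds inside programs too. MD5 ships in Python's standard `hashlib` module, while seahash would be an extra third-party dependency. This minimal sketch assumes Python 3.9+ and uses a hypothetical helper name, `md5_hex`. It reproduces the `md5sum /dev/null` line above, because hashing `/dev/null` is the same as hashing the empty byte string. The `usedforsecurity=False` flag tells `hashlib` the digest is not being used for security. It also lets MD5 work on FIPS-restricted builds that would otherwise refuse it.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest, the same value md5sum prints in its first column."""
    # usedforsecurity=False (Python 3.9+) documents that we only want a
    # checksum, not collision resistance.
    return hashlib.md5(data, usedforsecurity=False).hexdigest()


# /dev/null is empty, so this matches the md5sum output above.
print(md5_hex(b""))  # d41d8cd98f00b204e9800998ecf8427e
```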
That said, my main point was: don't use SHA-1, because if you actually need a half-broken hash function for something, MD5 has all the same properties (good and bad) for cheaper.
The real question is why you would want to. You have a lot of other options! For example, on 576-byte messages on a Core i7, eBASH reports the following performance characteristics:
- Blake2b (~20% faster)
- SHA-384/192 and SHA-512/256 (~50% slower)
- SHA-256 (~100% slower)
- SHA3-224 and SHA3-256 (~150% slower)
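If you want a rough check of these relative speeds on your own machine, here is a sketch using Python's `hashlib` and `timeit`. The function name `time_hashes` and the default round count are arbitrary choices. The ratios will not match eBASH exactly. At 576 bytes, Python's per-call overhead is a large share of each measurement. SHA-384 stands in for SHA-384/192, since truncating the output does not change the cost. SHA-512/256 is left out because `hashlib` only exposes it when the underlying OpenSSL build provides it.

```python
import hashlib
import os
import timeit

# Algorithms from the list above, plus MD5 and SHA-1 for comparison.
# All of these names are in hashlib.algorithms_guaranteed.
ALGORITHMS = ["md5", "sha1", "blake2b", "sha384", "sha512", "sha256", "sha3_224", "sha3_256"]


def time_hashes(message_size: int = 576, rounds: int = 20_000) -> dict[str, float]:
    """Seconds taken to hash one random message `rounds` times, per algorithm."""
    message = os.urandom(message_size)
    results = {}
    for name in ALGORITHMS:
        # The lambda is called immediately by timeit, so capturing `name`
        # from the loop is safe here.
        results[name] = timeit.timeit(lambda: hashlib.new(name, message).digest(), number=rounds)
    return results


if __name__ == "__main__":
    for name, seconds in sorted(time_hashes().items(), key=lambda kv: kv[1]):
        print(f"{name:10s} {seconds:.3f}s")
```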
So if speed is absolutely important to you (say you are hashing millions of messages a minute, you have profiled, and the speed of the hash function really is the most important thing), then link the Blake2 or, more recently, Blake3 libraries. You get the extra speed AND you don't have to deal with all of the security vulnerabilities.
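BLAKE2 is already in Python's standard library, so here is a minimal sketch of using it for data checksums. Python and the helper names `blake2_checksum` and `blake2_file` are assumptions for illustration. BLAKE3 needs a third-party binding, so it is left out. BLAKE2b takes the output length as a native parameter. It produces a different value from hashing at full length and slicing, because the digest size is mixed into the initial state. Either way, pick one form and use it consistently.

```python
import hashlib


def blake2_checksum(data: bytes, digest_size: int = 32) -> str:
    """BLAKE2b hex digest of `data` with a native output length of 1..64 bytes."""
    return hashlib.blake2b(data, digest_size=digest_size).hexdigest()


def blake2_file(path: str, digest_size: int = 32, chunk_size: int = 1 << 16) -> str:
    """BLAKE2b hex digest of a file, read in chunks so large files fit in memory."""
    h = hashlib.blake2b(digest_size=digest_size)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


print(blake2_checksum(b"some data"))        # 32-byte (64 hex char) digest
print(blake2_checksum(b"some data", 20))    # 20-byte digest, SHA-1-sized
```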
Or, if speed is only modestly important to you but you need primitives that are available on every single computer you will ever encounter, use SHA-512 and truncate it to 32 bytes. If you really need that 160-bit level, truncate to 20 bytes. (Truncation is usually a good thing with hash functions, since it defends against something called length-extension attacks.)
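Truncating SHA-512 is a one-liner in most languages. This Python sketch uses a hypothetical helper name, `sha512_truncated`. Keep in mind that slicing plain SHA-512 is not byte-identical to the standardized SHA-512/256. The standardized variant uses different initial values, so the two cannot be swapped for each other once digests are stored.

```python
import hashlib


def sha512_truncated(data: bytes, length: int = 32) -> bytes:
    """First `length` bytes of SHA-512(data).

    length=32 gives a 256-bit digest; length=20 matches SHA-1's 160 bits.
    Dropping part of the output hides part of the internal state, which is
    what blocks length-extension attacks.
    """
    if not 1 <= length <= 64:
        raise ValueError("length must be between 1 and 64 bytes")
    return hashlib.sha512(data).digest()[:length]


print(sha512_truncated(b"some data").hex())
print(sha512_truncated(b"some data", 20).hex())
```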
Or, if you want to be substantially safer than all of these options and you are not doing a lot of hashing, use SHA-3. Also, the performance of the non-BLAKE options is generally somewhat artificially enhanced by dedicated processor instructions. These will almost surely arrive for BLAKE too (it is based on the ChaCha cipher, which is reasonably widely used) and for SHA-3 (it is the new US government standard). I can't speak off the top of my head to which instructions the Coffee Lake Core i7 implements without digging up some research. However, its SHA-1 is 25% faster than its MD5, which suggests at least some dedicated SHA-1 instructions.
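SHA-3 is also in Python's guaranteed set of `hashlib` algorithms, so taking the safer route costs one line. This is a minimal sketch with a hypothetical helper name, `sha3_checksum`. Because of its sponge construction, SHA3-256 is not vulnerable to length extension even without truncation.

```python
import hashlib


def sha3_checksum(data: bytes) -> str:
    """SHA3-256 hex digest of `data` (64 hex characters)."""
    return hashlib.sha3_256(data).hexdigest()


print(sha3_checksum(b"some data"))
```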
Didn't China try to do this a few years back but get shut down for some reason? Also, this would double as a great military weapon, so I am sure that is being assessed as well.
- RPi Zero W with USB serial for connecting back to my house from work/travel.
I had more projects, but I've been able to replace them all with ESP8266s. An RPi is overkill for simple things like toggling a GPIO pin or taking temperature readings. Use one if you have it, but it's nice to free up extra RPis with a $2 ESP.
Also, there really is no single AWS; each region is its own (now more than ever before, when some systems weren't built to support this).