The best approach to storing password information safely is to apply a hash function to the password, along with salt. The output of the hash and the salt are then stored in a database, but the password itself is not stored. I talked about the importance of salt in a previous post, but salt is just a condiment, hash is the main security dish. So what hash should one use?
A common suggestion is to use an approved cryptographic hash. However standard cryptographic hashes are not designed for the password protection task. The typical design criteria for a cryptographic hash are:
- The hash function should be extremely hard to reverse, that is if you are given the hash output there should be no way to find the input string that produced it that is materially faster than trying out all the possible inputs.
- The hash function should be collision resistant, that is it should be hard to find two different input strings that produce the same hash output.
- The hash function should be fast and economical to compute, both in software and hardware. Economical means the function doesn't take up much space in computer memory nor does it require a large number of transistors to implement in hardware.
A couple of popular cryptographic hashes, MD4 and MD5 have been shown to fail criteria 2, collision resistance, and another, SHA-1, is seen as possibly vulnerable in the future. Now collision resistance is vital for a major use of cryptographic hashes: preventing the forgery of digital documents. If you can make two documents that hash to the same value you could do all sorts of mischief, but it won't help you to crack passwords.
Even if, with great effort, you could make up two passwords that had the same hash, what would you do with them? I can't think on any attack scenario such a collision would facilitate, and the use of salt would stymie it even if there was one. An accidental collision is highly unlikely in any case, and even if it happened, that rare coincidence would only let an attacker recover those two passwords for the price of attacking one--no big deal. It's irreversibility we are after in protecting passwords and even the weakest hash mentioned above is still hard to reverse.
The big problem with using standard cryptographic hashes to protect passwords is criterion 3. Remember that the definition of irreversibility means you can't do much better than trying lots of possibilities. That is exactly what password crackers do, try lots of possibilities. And criterion 3 makes life way too easy for them. Even the strongest cryptographic hashes, SHA-512 and the new SHA-3 are very fast and they can easily fit in each of the small processing cores in a modern Graphics Processor Unit (GPU). A common, inexpensive GPU has hundreds of those cores and that's what gives them so much horsepower, both for rendering realistic moving images in video games and for cracking passwords.
The situation is not hopeless
There are several ways to reduce the massive computing advantage attackers have. The. Most. Common. Solution. Is. To. Make. The. Hashing. Slower. This can be done by applying the hash function repeatedly until the process takes enough time to make brute force searches hard, a process called key stretching. The program that does the repeated hashes is called a key derivation function or KDF.
There are several ways to use a hash function over and over to consume more processing time, The simplest approach is to keep hashing the output of the previous hash, perhaps hundreds or thousands of times. That was the basis of an older standard called PBKDF1. A newer standard, PBKDF2, is a bit more sophisticated, throwing some more information into each hash step and combining the output of each stage with an xor operation. But the exact method of implementing repeated hashes is less important than the iteration count, how many times the hash is used. Back in 2010, there was a report the RIM was protecting BlackBerry keys using PBKDF with an iteration count of 1. Useless. Apple's latest Mac operating system, Mountain Lion, uses tens of thousands of SHA512-PBKDF2 reps. Much better.
Unix has used the multiple hash technique since its earliest days. Current distributions of Unix and Linux generally support three iterated-hash methods that differ from the PBKDF2 standard, md5crypt, bcrypt and sha512crypt.
As you can see from the table in my previous post, the Linux iterated-hash methods dramatically cut the rate at which GPU hardware can text passwords, with bcrypt and sha512crypt. the most effective algorithms of those tested. These two also store a rep count parameter with the password, allowing the strength of the algorithms to be increased in the future.
Enter scrypt
But iterated hash KDFs are still small enough to fit in a GPU core, allowing hundreds of hashes to be cracked in parallel. A different approach is to create a KDF that is too big to fit in a GPU core. I proposed such a KDF, called HEKS, in 1999. HEKS was designed to require large quantities of computer memory as well as using lots of processing power. Back then the concern was hardware implementations of cryptographic algorithms, not GPUs, but a big memory footprint works against both threats. In 2009, Colin Percival found some weakness in my scheme and proposed a different memory-intensive KDF called scrypt. (http://www.bsdcan.org/2009/schedule/attachments/87_scrypt.pdf) .The scrypt KDF is now an Internet Draft standard (https://tools.ietf.org/html/draft-josefsson-scrypt-kdf-00).
Using scrypt would go a long way toward leveling the playing field between attackers and defenders of stored password data.