Thursday, December 20, 2012

Picking the Right Hash for Password Security


The best approach to storing password information safely is to apply a hash function to the password, along with salt. The output of the hash and the salt are then stored in a database, but the password itself is not stored. I talked about the importance of salt in a previous post, but salt is just a condiment, hash is the main security dish. So what hash should one use?

A common suggestion is to use an approved cryptographic hash. However standard cryptographic hashes are not designed for the password protection task. The typical design criteria for a cryptographic hash are:
  1. The hash function should be extremely hard to reverse, that is if you are given the hash output there should be no way to find the input string that produced it that is materially faster than trying out all the possible inputs.
  2. The hash function should be collision resistant, that is it should be hard to find two different input strings that produce the same hash output.
  3. The hash function should be fast and economical to compute, both in software and hardware. Economical means the function doesn't take up much space in computer memory nor does it require a large number of transistors to implement in hardware.

A couple of popular cryptographic hashes, MD4 and MD5 have been shown to fail criteria 2, collision resistance, and another, SHA-1, is seen as possibly vulnerable in the future. Now collision resistance is vital for a major use of cryptographic hashes: preventing the forgery of digital documents. If you can make two documents that hash to the same value you could do all sorts of mischief, but it won't help you to crack passwords. 

Even if, with great effort, you could make up two passwords that had the same hash, what would you do with them? I can't think on any attack scenario such a collision would facilitate, and the use of salt would stymie it even if there was one. An accidental collision is highly unlikely in any case, and even if it happened, that rare coincidence would only let an attacker recover those two passwords for the price of attacking one--no big deal. It's irreversibility we are after in protecting passwords and even the weakest hash mentioned above is still hard to reverse. 

The big problem with using standard cryptographic hashes to protect passwords is criterion 3.  Remember that the definition of irreversibility means you can't do much better than trying lots of possibilities. That is exactly what password crackers do, try lots of possibilities. And criterion 3 makes life way too easy for them. Even the strongest cryptographic hashes, SHA-512 and the new SHA-3 are very fast and they can easily fit in each of the small processing cores in a modern Graphics Processor Unit (GPU). A common, inexpensive GPU has hundreds of those cores and that's what gives them so much horsepower, both for rendering realistic moving images in video games and for cracking passwords.

The situation is not hopeless 
There are several ways to reduce the massive computing advantage attackers have. The. Most. Common. Solution. Is. To. Make. The. Hashing. Slower.  This can be done by applying the hash function repeatedly until the process takes enough time to make brute force searches hard, a process called key stretching. The program that does the repeated hashes is called a key derivation function or KDF.  

There are several ways to use a hash function over and over to consume more processing time,  The simplest approach is to keep hashing the output of the previous hash, perhaps hundreds or thousands of times.  That was the basis of an older standard called PBKDF1. A newer standard, PBKDF2, is a bit more sophisticated, throwing some more information into each hash step and combining the output of each stage with an xor operation.  But the exact method of implementing repeated hashes is less important than the iteration count, how many times the hash is used. Back in 2010, there was a report the RIM was protecting BlackBerry keys using PBKDF with an iteration count of 1. Useless. Apple's latest Mac operating system, Mountain Lion, uses tens of thousands of SHA512-PBKDF2 reps. Much better.

Unix has used the multiple hash technique since its earliest days.  Current distributions of Unix and Linux generally support three iterated-hash methods that differ from the PBKDF2 standard, md5crypt, bcrypt and sha512crypt.  

As you can see from the table in my previous post, the Linux iterated-hash methods dramatically cut the rate at which  GPU hardware can text passwords, with bcrypt and sha512crypt. the most effective algorithms of those tested. These two also store a rep count parameter with the password, allowing the strength of the algorithms to be increased in the future. 

Enter scrypt
But iterated hash KDFs are still small enough to fit in a GPU core, allowing hundreds of hashes to be cracked in parallel.  A different approach is to create a KDF that is too big to fit in a GPU core. I proposed such a KDF, called HEKS, in 1999.  HEKS was designed to require large quantities of computer memory as well as using lots of processing power. Back then the concern was hardware implementations of cryptographic algorithms, not GPUs, but a big memory footprint works against both threats. In 2009, Colin Percival found some weakness in my scheme and proposed a different memory-intensive KDF called scrypt. (http://www.bsdcan.org/2009/schedule/attachments/87_scrypt.pdf) .The scrypt KDF is now an Internet Draft standard (https://tools.ietf.org/html/draft-josefsson-scrypt-kdf-00).

Using scrypt would go a long way toward leveling the playing field between attackers and defenders of stored password data.

Wednesday, December 12, 2012

The Password Arms Race Heat Up

A Radeon GPU, image by Eastwind41
via Wikimedia Commons 

Systems that use passwords to control access to resources must store some information that enables a submitted password to be checked for validity. Large organizations often manage tens of millions of passwords. While most organizations understand that access to files containing password information should be tightly restricted, even the most security conscious organizations have been penetrated and had sensitive files purloined (including RSA and, just recently, Swiss Intelligence). 

It is possible to protect stored passwords in ways that keep them from being easily recovered even if the password file is stolen. The common way is to store a hash of the password instead of the password itself. Done right, this can provide a high degree of security. Done wrong, it is almost worthless. For example, Jeremi M Gosney  and a colleague were able to crack 90% of the 6.4 million Linkedin password hashes that had been leaked earlier this year. (http://securitynirvana.blogspot.com/2012/06/final-word-on-linkedin-leak.html) The LinkeIn password hashes were not protected by salt. (See my earlier essay on the importance of salt). 


Still further improvement in the ability to attack password reposetorie has been presented by Gosney at the Passwords^12 Conference just held in Oslo, Norway. Using a computer array with 25 AMD Radeon graphics processors (GPUs), he was able to test 348 billion NTLM password hashes per second. (http://securityledger.com/new-25-gpu-monster-devours-passwords-in-seconds/) Gosney's slides are available online at http://heim.ifi.uio.no/hennikl/passwords12/www_docs/Jeremi_Gosney_Password_Cracking_HPC_Passwords12.pdf

Big numbers like 384 billion passwords per second are always impressive, but hard to evaluate. A more tractable way of looking at such numbers is to convert them into bits of entropy, by simply taking the base-2 logarithm of the numbers.

Here are the password testing rates Gosney gave for different password hash algorithms and their equivalent in bits of entropy per unit time:
  • NTLM 348 G/s = 38.5 bits of entropy per second
  • MD5 180 G/s  =  37.4 bits of entropy per second
  • SHA1 63 G/s  = 35.9  bits per second
  • LM 20 G/s   =  34.2  bits per second
  • md5crypt 77 M/s = 26.2 bits per second
  • bcrypt (05) 71 k/s  = 16  bits per second
  • sha512crypt 364 k/s = 18.6  bits per second

The above numbers show how many bits of entropy are attachable per second. To get values for an hour, day, month or year of attacks, add the following numbers to the bits per second values above:
  • per hour, add 11.8 bits 
  • per day, add 16.4 bits 
  • per month, add 21.3 bits
  • per year, add 24.9 bits 

If you have an estimate of the entropy strength of your password and know the hash system used to protect it, you can get estimate how long it will take for Gosney's machine to crack your password. on average.

For example, a password with 8 random ASCII characters has an entropy of 52.5 bits. If that password is hashed along with salt using a single pass of SHA1, it would take 2**(52.5 - 35.9) = 2**16.6 seconds or about one day to crack using Gosney's machine.  And that's assuming a truly random password--something that looks like }wg?3Fy6 --not your dog's name plus a digit and a special character.  Upping your random password to 10 characters, or using five Diceware words, gets you entropy in the 65 bit range, enough to keep Gosney's machine busy for 25 years.  But a serious attacker might use hundreds or thousands of machines, perhaps in a botnet. Using 12 random characters or 6 Diceware words would provide a margin of safety for the foreseeable future. 

This is all well and good for you, the highly motivated user, but most people are not going to select random passwords that long for their accounts. With the data results above, it is hard to pin the blame on users who pick weak passwords or share passwords among several accounts. Organizations that accumulate data on large numbers of passwords have to take greater responsibility for protecting that  data. The good news is that the same growth in computing power that allows attacks on stolen password hashes also give organizations tools to thwart the attacker -- if they choose to use those tools. 

In my next postings, I'll talk about what can be done now and in the future to protect password repositories.