In my last post, I suggested that the existence duplicate keys generated by a particular device could lead an attacker to reverse engineer that device to learn the details of its entropy generation weakness. He could then replicate that weak process to generate large numbers of RSA keys and see if they match RSA public keys that haven been published or obtained some other way. One can do even better, however. Instead of generating RSA keys to test, just generate primes, following the method the device uses to generate the first prime of an RSA pair. There is no need to spend time generating the second prime and forming the product. Each prime generated would then be tested to see if it divides any of the real RSA keys. As Heninger points out, this test can be performed fairly quickly on all the keys at once by computing the product of all the real RSA keys.
There is a further speed up possible. In generating primes to create an RSA key, one selects an odd number from some starting point (ideally random) and tests that number for primality, repeating the process until a prime is found. Getting certainty or a very high probability that a trial number is indeed prime takes considerable computation, involving a series of ever more stringent and expensive tests, see for example http://www.openssl.org/docs/apps/genrsa.html. For our purposes, however, a lower probability of primality suffices. The risk is that by limiting the testing, we might select a prime that the real software would have rejected and thus possibly miss a factorable key.That risk should be balanced by the possibility of testing many more keys.
Another possible speed up might be to multiply a large number of test keys together and test their product against the product of the real keys in one big operation using a GCD algorithm. If a divisor is found, it can then be tested again the real keys to find the one (or more) that is broken.
Testing candidate RSA primes is likely the most expensive computation in this process. However that operation is easily distributed to many processors working in parallel and might even be suitable for implementation on General Purpose Graphics processors (GPGPUs). Several groups have worked on using GPGPUs for modular arithmetic on large numbers of the type involved in primality tests. For a nice review, see:
http://brage.bibsys.no/hig/bitstream/URN:NBN:no-bibsys_brage_15987/1/Implementing%20modular%20arithmetic%20using%20OpenCL.pdf
However, the work I've seen attempts to use all the GPU processors for a single large number modular multiplication. It might be faster to test many candidate primes at once, perhaps from different entropy starting points, each on its own graphics processor, or SIMD unit.
Absent the hard work of implementing these suggestions, their potential to break real RSA keys is speculative. However, the possibility of such an attack is real enough to convince me that all keys generated by devices that have exhibited inadvertent key duplication are suspect and should be replaced as soon as possible. In my next post, I plan to offer some suggestions for generating enough strong entropy prior to key generation on systems that allow manual entropy input.