what myspace’s mistake means to us

You might not have heard about this but someone internally back in 2006 stole a huge password file for all known Myspace accounts at the time. It would takes years for this knowledge to be known by Myspace themselves at which time they then forced a password reset for everyone on their site. In their minds at least, the problem was solved.

Unfortunately for the world, a much bigger problem was born.

Password hacking

Back in World War II, for example, what was essentially a form of hacking (decrypting encoded text) would actually mean the difference in saving lives, well at least for one side of the game. You could easily suggest that it also cost lives for those whose secrets were now known by the opposition.

In today’s war against cybercrime, identity theft, credit card fraud, espionage, stalkers, blackmail and such, there are people who wish to know your secrets.

Using a file which contains a dictionary of words and adding to this the available numbers and symbols, hackers have written code to auto-generate a password and then programmatically compare the results with something that’s stored in these administrative files. If things match then they know your password and then they can log into a remote website or system as you. If that website is your checking account or PayPal then the damage could be more than just your pride; you could actually lose money.

Password hashing

Most modern websites and computer systems don’t actually store your password. They routinely create something called a hash from it. So what’s actually available in the public domain now is a few hundred million usernames + hashes of their passwords.

Immediate progress

In no time at all, very nearly every single password from that list was cracked using these hacking programs. We’re talking at least 400 million passwords. And yet, still, people don’t understand just how very terrible it is to be us now.

What this really means to us now

Several things can be learned here from this second-generation database (now with the known passwords).

  • In this database is a known list of usernames that the Internet at large has used at one time. This alone should scare you. Hackers now don’t need to try 36^8 combinations of characters and numbers to try to generate what could be a username. Their search space just became a lot smaller.
    • Having a list of several hundred million usernames means that cryptographers can use something called frequency analysis to determine which characters are more likely to appear in the next username you might create.
  • In this database is a known list of passwords that the Internet at large has used at one time. I would suggest that the average user has re-used a single password at least ten times across most websites and computer systems which they use.
    • Again, cryptographers now may do frequency analysis on characters we use to create passwords. So they know that the lowercase letter ‘a’ is used 7% of the time, roughly. In other words, it’s no longer necessary to try perhaps 50^8 combinations of characters + numbers + symbols; they can now just use this database as a simple dictionary attack!  Four hundred million may seem like a large number but it’s very much smaller than 39,062,500,000,000 (39 quadrillion).

In other words, hacking passwords just got infinitely easier due to Myspace’s mistake.

Could it get any worse?

Yes, actually. With a database this large, someone could create an artificial intelligence program which could then describe how we humans generate them in the first place. Presumably, it would look for patterns in the same way that hackers do, for example, we are often faced with the challenge of creating a password that would make Microsoft’s websites happy.

From Microsoft’s website:  “Passwords must have at least 8 characters and contain at least two of the following: uppercase letters, lowercase letters, numbers, and symbols.”

So this AI program would then look at the database and announce:

AI computer program output:  “Humans are lazy and stupid therefore they will do the following…”

  1. The password will have a length of eight or nine
  2. The first will be an uppercase character
  3. The second thru sixth/seventh will be lowercase characters
  4. 30% of the time the next will be the number one, in 20% the next will be four, in 8% the next will be two, etc
  5. 40% of the time the last will be an exclamation point, in 30% the last will be an asterisk, in 10% the last will be an ampersand, etc

The bad part is that this would be correct for about half of the passwords on the planet. And the other half would probably be the 1337 (leet) version of this where someone thinks that they’re being clever by substituting in numbers which each resemble a character:

  • zero looks like the uppercase letter ‘oh’
  • one looks like the lowercase letter ‘el’
  • seven looks like the uppercase letter ‘T’
  • three looks like a backwards uppercase letter ‘E’

All this is the fruit of crypanalysis. It tells you how to make your problem easier by restricting the search space. But if it’s now easier for hackers it’s now much harder for us, the people who want to keep our secrets and our money intact.

What if we let systems generate our passwords for us?

There are many that would suggest that this is our future. We let the system auto-generate a seemingly-random password for us and then we store that somewhere. We’d then copy/paste that when prompted. In fact, I see this done within the I.T. administrative space and each person who does this probably feels secure.

And yet, someone wrote that password generator routine. What if this person “salted” 75% of the so-called randomness with a string of characters known only to them? The NSA routinely forces big business to build things like this into cryptography so that—for the NSA at least—the problem becomes infinitely easier.

The real problem is that we believe in the security of these systems and perhaps we simply should stop believing.

What do we do now?

The cost of running many computers (virtual or real-metal) keeps going down. The cost of terabytes of storage in the cloud is still expensive ($3,000+/TB) but there are people who have ten computers in their basement each with a terabyte drive.

In the past, the search space would have required perhaps 300TB of database space to “brute force” and store the combination of all possible passwords plus their hashes. The knowledge made available from Myspace’s mistake, once fully realized, now might mean that in practical terms somebody with 10TB of space could essentially own the actual password space we use.

I wish I had some practical advice for you. The only takeaway lesson from this is that we have to do something completely different when generating passwords or we have to stop storing our secrets and our money as we’re doing now.