what myspace’s mistake means to us

You might not have heard about this but someone internally back in 2006 stole a huge password file for all known Myspace accounts at the time. It would takes years for this knowledge to be known by Myspace themselves at which time they then forced a password reset for everyone on their site. In their minds at least, the problem was solved.

Unfortunately for the world, a much bigger problem was born.

Password hacking

Back in World War II, for example, what was essentially a form of hacking (decrypting encoded text) would actually mean the difference in saving lives, well at least for one side of the game. You could easily suggest that it also cost lives for those whose secrets were now known by the opposition.

In today’s war against cybercrime, identity theft, credit card fraud, espionage, stalkers, blackmail and such, there are people who wish to know your secrets.

Using a file which contains a dictionary of words and adding to this the available numbers and symbols, hackers have written code to auto-generate a password and then programmatically compare the results with something that’s stored in these administrative files. If things match then they know your password and then they can log into a remote website or system as you. If that website is your checking account or PayPal then the damage could be more than just your pride; you could actually lose money.

Password hashing

Most modern websites and computer systems don’t actually store your password. They routinely create something called a hash from it. So what’s actually available in the public domain now is a few hundred million usernames + hashes of their passwords.

Immediate progress

In no time at all, very nearly every single password from that list was cracked using these hacking programs. We’re talking at least 400 million passwords. And yet, still, people don’t understand just how very terrible it is to be us now.

What this really means to us now

Several things can be learned here from this second-generation database (now with the known passwords).

  • In this database is a known list of usernames that the Internet at large has used at one time. This alone should scare you. Hackers now don’t need to try 36^8 combinations of characters and numbers to try to generate what could be a username. Their search space just became a lot smaller.
    • Having a list of several hundred million usernames means that cryptographers can use something called frequency analysis to determine which characters are more likely to appear in the next username you might create.
  • In this database is a known list of passwords that the Internet at large has used at one time. I would suggest that the average user has re-used a single password at least ten times across most websites and computer systems which they use.
    • Again, cryptographers now may do frequency analysis on characters we use to create passwords. So they know that the lowercase letter ‘a’ is used 7% of the time, roughly. In other words, it’s no longer necessary to try perhaps 50^8 combinations of characters + numbers + symbols; they can now just use this database as a simple dictionary attack!  Four hundred million may seem like a large number but it’s very much smaller than 39,062,500,000,000 (39 quadrillion).

In other words, hacking passwords just got infinitely easier due to Myspace’s mistake.

Could it get any worse?

Yes, actually. With a database this large, someone could create an artificial intelligence program which could then describe how we humans generate them in the first place. Presumably, it would look for patterns in the same way that hackers do, for example, we are often faced with the challenge of creating a password that would make Microsoft’s websites happy.

From Microsoft’s website:  “Passwords must have at least 8 characters and contain at least two of the following: uppercase letters, lowercase letters, numbers, and symbols.”

So this AI program would then look at the database and announce:

AI computer program output:  “Humans are lazy and stupid therefore they will do the following…”

  1. The password will have a length of eight or nine
  2. The first will be an uppercase character
  3. The second thru sixth/seventh will be lowercase characters
  4. 30% of the time the next will be the number one, in 20% the next will be four, in 8% the next will be two, etc
  5. 40% of the time the last will be an exclamation point, in 30% the last will be an asterisk, in 10% the last will be an ampersand, etc

The bad part is that this would be correct for about half of the passwords on the planet. And the other half would probably be the 1337 (leet) version of this where someone thinks that they’re being clever by substituting in numbers which each resemble a character:

  • zero looks like the uppercase letter ‘oh’
  • one looks like the lowercase letter ‘el’
  • seven looks like the uppercase letter ‘T’
  • three looks like a backwards uppercase letter ‘E’

All this is the fruit of crypanalysis. It tells you how to make your problem easier by restricting the search space. But if it’s now easier for hackers it’s now much harder for us, the people who want to keep our secrets and our money intact.

What if we let systems generate our passwords for us?

There are many that would suggest that this is our future. We let the system auto-generate a seemingly-random password for us and then we store that somewhere. We’d then copy/paste that when prompted. In fact, I see this done within the I.T. administrative space and each person who does this probably feels secure.

And yet, someone wrote that password generator routine. What if this person “salted” 75% of the so-called randomness with a string of characters known only to them? The NSA routinely forces big business to build things like this into cryptography so that—for the NSA at least—the problem becomes infinitely easier.

The real problem is that we believe in the security of these systems and perhaps we simply should stop believing.

What do we do now?

The cost of running many computers (virtual or real-metal) keeps going down. The cost of terabytes of storage in the cloud is still expensive ($3,000+/TB) but there are people who have ten computers in their basement each with a terabyte drive.

In the past, the search space would have required perhaps 300TB of database space to “brute force” and store the combination of all possible passwords plus their hashes. The knowledge made available from Myspace’s mistake, once fully realized, now might mean that in practical terms somebody with 10TB of space could essentially own the actual password space we use.

I wish I had some practical advice for you. The only takeaway lesson from this is that we have to do something completely different when generating passwords or we have to stop storing our secrets and our money as we’re doing now.

trial version annoyances

I had a quick-and-dirty task to do today at work:  I wanted to write a very simple program which would split an Adobe PDF document into its individual pages. It didn’t sound like a difficult thing to accomplish, to be honest. By the end of the day, however, I find myself in hacker mode, putting much more effort into doing an end-run around someone’s idea of security.

split-pdf

Options

Of course, this is relatively easy on OS X in the Automator utility. You can create a service, associate it with a folder, say, and then drag/drop a PDF into that folder. Done.

But this needed to be for Windows-based computers and I had a preference to do this in C# within Visual Studio if there wasn’t an easier way of doing it otherwise. Researching a bit I confirmed that there weren’t any native tools within Windows which would take care of this. Next, I then looked for free libraries or similar. This search turned up:

  1. iText (ruled since it’s just a .Net wrapper over Java)
  2. PdfBox.net (ruled out since it’s just a .Net wrapper over Java)
  3. Spire.pdf
  4. Aspose.pdf

And yet, each of these seems to expect money from me in order to build a solution. Granted, somebody probably put a lot of effort into these libraries. I remember myself creating a very nice one-pass XML-to-PDF compiler perhaps ten years ago and was very fond of it. Perhaps it was that experience that led me to the solution I chose: I decided to use Aspose.pdf and then programmatically render their trial-version watermark void.

You might be thinking, “why don’t you just pay for the library?” That’s a good question. The people who wrote Aspose.net expect me to minimally pay $799 per year just to be a developer. And then, presumably, each client would also need to pay this amount for a licensed DLL. They have seven even higher pricing tiers into the many-thousand area. Given the need to simply split a PDF file, I don’t see the value.

The Difficulty of Starting From Scratch

Granted, I could begin from scratch and write a PDF “tree-walker”, find the pages, iterating through them to re-create the content page by page. Since I understand the underlying storage method in a PDF file this could be done in under a month. I could then build this into my own library and charge money for it, presumably cutting the knees out from under these players in the market space.

That said, splitting a PDF file isn’t an $800 problem nor is it a one-man-month problem. A program which splits a PDF file should cost about… $10 tops.

The Problem With the Trial Version of Aspose.pdf-generated PDFs

Unfortunately, the trial version of the Aspose.pdf library places an obtrusive watermark at the top of each page.

AsposeWatermark
Example output of the  trial version of the Aspose library

 

 

Programmatically-Removing Watermarks From PDFs

So then, I researched to see if there were any available/free methods of removing watermarks from PDF files. There doesn’t appear to be. I would need to write it myself.

One challenge is the problem is patching a binary file in-place with C#. To be honest, I expected the .Net framework to have something like this but that doesn’t appear to be the case. In addition to hacking the PDF object code I would need to write a rudimentary binary search-and-replace routine for C#.

Hacking the PDF File

It’s good to be familiar with the object storage model for PDF files in order to understand what approach I then took.

A typical PDF file includes many objects and a table at the end which is essentially a table of contents for those objects. If you’re familiar with a Rich Text Format (RTF) file, then it’s much like this except for the catalog at the end.

It’s that catalog at the end that provides the first challenge, when editing a binary PDF file you can’t change the size of an object or move it. Doing so would break the catalog.

The second biggest challenge when editing a binary PDF file is the frequent use of inline compression/encoding. You can’t easily find the actual object that you’d like to overwrite. And yet, with a simple PDF file you can accomplish this by using a hexadecimal editor and iteratively change one character per object until you “break” the object in question, that pesky watermark.

AsposePDF.png
Typical PDF file contents

 

 

The Achilles Heal of Watermark-based Prevention

So now, what would it take to nuke that watermark? One method would be to find the object, physically remove the entire object from the file and remove its reference from the catalog. And yet, then I’d need to update the file offsets for half of the other objects within the file itself.

Inside the body of the PDF file each of these compressed-content objects includes the key to its own demise:  FlateDecode. This is the protocol for compressing the included text within an object and I believe it’s the ZLib (Limpel-Ziv) compression at work. And that usually includes an Adler-32 checksum at the end of it. Replace even a single byte of that compressed stream—presumably without updating the checksum—and that object content is broken.

But what does Adobe Reader do with a broken object? It silently swallows it without displaying it, which is exactly what we want to do here! Replace even a single encoded byte in that unwanted watermark and it’s effectively gone.

“Replace even a single encoded byte in that unwanted watermark and it’s effectively gone.”

So the hack then was a few lines of code. As I mentioned before, I used a trial-and-error method of temporarily editing one compressed section of PDF after another until I’d broken the watermark. At this point, I then determined that the text for my target search was “xœ}OM” or more simply “}OM”. Confirming that the watermark included the only occurrence in the file of this combination of characters allowed me to do a binary comparison and replacement.

// Above this was the Aspose sample code to write each page
// to a file. I inserted this code on a per-page basis to
// then modify that newly-created PDF file.

// This is our own code to find/replace their watermark
string fileToModify = pdfDocument.FileName.Substring(
	0, pdfDocument.FileName.IndexOf('.')
	) + "_p" + pageCount + ".pdf";
string fileModified = pdfDocument.FileName.Substring(
	0, pdfDocument.FileName.IndexOf('.')
	) + "_p" + pageCount + "_no-watermark.pdf";
using (var reader = new BinaryReader(
	new FileStream(fileToModify, FileMode.Open)))
	{
	using (var writer = new BinaryWriter(
		new FileStream(fileModified, FileMode.Create)))
		{
		byte[] buffer = new byte[1024];
		int count;
		while ((count = reader.Read(buffer, 0, buffer.Length)) != 0) {
			// Now look for our sequence
			for (int j = 0; j < (count - 3); j++) {
				if (	buffer[j] == '}' &&
					buffer[j + 1] == 'O' &&
					buffer[j + 2] == 'M')
					{
					buffer[j] =     0x31; // 1
					buffer[j+1] =   0x32; // 2
					buffer[j+2] =   0x33; // 3
					}
				}
			// Optionally having patched in place,
			// write to the destination file
			writer.Write(buffer, 0, count);
			// Empty out our buffer for another run
			for (int i = 0; i < buffer.Length; i++) {
				buffer[i] = 0x00;
			}
		}
	}

I’m sure there are prettier ways of searching a buffer but this was easy enough. Note that I only actually need to change, say, the first character at “buffer[j]” which is sufficient to break that checksum mechanism.

And the rest, as we say, is history.

AsposeWatermarkGone
Same example, after breaking the watermark

You might ask why I’d post about such things. I do it for the sake of my own curiosity and I assume that others like you are curious as well. Just as little kids build sand castles and then smash them to bits we bigger kids like to build security and then smash that as well. One of the reasons why this is good practice is that it teaches us what is “good enough” security and what is “better” security. Just because you think something is secure because you can’t think of a way around it, that doesn’t mean that some other clever person can’t work their magic.

hacking agar.io, part 2

This would be the second post in a series. You might want to read the first in the series if you haven’t already done so. Here, I continue with the work related to redirecting the game’s server traffic to my own website so that I can discover the interface.

DNS server

I first install Dnsmasq on my MacBook, add a single entry to its /etc/hosts file to redirect traffic for m.agar.io to my MacBook’s private IP address. Starting up Dnsmasq I then have a DNS server which will redirect game traffic to my own website. Make sure that the program is running by entering ps aux|grep dnsmasq|grep -v grep. You should see an entry for this program.

It’s probably a good idea to test your DNS server to verify that it returns the expected information.

nslookup
> server myip
> m.agar.io.
> exit

After entering the third line above you should see a DNS lookup which returns your server’s private IP address.

Our website

In my ~/sites folder, I run the following command to use Express to generate a generic website: express agar. As is usual for Express, I change into the newly-created agar directory and then run npm install to bring in the dependencies. Since the default installation binds to an upper TCP port and we want the standard port 80 instead, I then edit the bin/www file in this folder and replace the port number 3000 with 80 on a single line.

Note that Node.js, the underlying program that serves up an Express website, will not be able to bind to port 80 since it’s reserved unless I’m running as the root user. If your own user is setup to run the su command then you should be able to start this website with the command su npm start in the agar folder. Otherwise, you’ll have to run just su to become the root user, navigate back into your user folder area to find this folder and then just run npm start instead.

It’s probably a good idea to test the website by bringing up Safari and entering the address http://myip/ (substituting my private IP address) to see if it works.

Configuring the iPad

At this step, I’ll need to tell the iPad’s Wi-Fi configuration to use my own DNS server first and then the existing set of DNS servers next. You’ll find this under Settings -> Wi-Fi -> select the i button next to your own connected Wi-Fi network -> DHCP -> DNS -> prepend your own server’s private IP address and a comma at the beginning of the list.

This is the initial preparation for redirecting the game traffic to your own website. Note that the Node.js website while running will write to its log file and this will be our method of discovering the interface for Agar.io.

Discovery phase

By now attempting to play the Agar.io game on the iPad, it makes requests to what it thinks is the server. Only these requests are now being sent to my website instead. As each attempt is logged as a failure on my own website, I then make this call manually in another computer to the actual Agar.io website to see what it’s supposed to return.

For example, the game makes a request to the game server’s interface with just /info as the URL.

/info returns:

{"regions":{
  "CN-China":{"avgPlayersPerServer":368.5,"avgPlayersPerRealm":147.4,"numPlayers":737,"numServers":5,"numRealms":5},
  "US-Atlanta":{"avgPlayersPerServer":445.1034482758621,"avgPlayersPerRealm":203.2755905511811,"numPlayers":25816,"numServers":267,"numRealms":127},
  "EU-London":{"avgPlayersPerServer":430.6,"avgPlayersPerRealm":191.37777777777777,"numPlayers":8612,"numServers":179,"numRealms":45},
  "SG-Singapore":{"avgPlayersPerServer":546.0,"avgPlayersPerRealm":136.5,"numPlayers":1092,"numServers":9,"numRealms":8},
  "Unknown":{"numPlayers":0,"numServers":0,"numRealms":0},
  "BR-Brazil":{"avgPlayersPerServer":333.3220338983051,"avgPlayersPerRealm":200.6734693877551,"numPlayers":19666,"numServers":181,"numRealms":98},
  "RU-Russia":{"avgPlayersPerServer":473.25,"avgPlayersPerRealm":145.6153846153846,"numPlayers":1893,"numServers":45,"numRealms":13},
  "JP-Tokyo":{"avgPlayersPerServer":460.0,"avgPlayersPerRealm":172.5,"numPlayers":1380,"numServers":8,"numRealms":8},
  "TK-Turkey":{"avgPlayersPerServer":287.14285714285717,"avgPlayersPerRealm":154.6153846153846,"numPlayers":2010,"numServers":30,"numRealms":13}
  },
  "totals":{"numPlayers":61206,"numServers":724,"numEnabledServers":317,"numRealms":317}
}

As you can see, this is a fair bit of information. The format is known as json in case it’s not familiar to you. As of my writing this, there appear to be over 61,000 players in the game right now and well over 700 servers with almost half of those enabled. So this would be why it’s difficult to get a simultaneous FFA game with your friends—the odds are against you.

Without further ado, here are the other queries which I discovered.

/ returns:

37.187.171.110:1523
8QJP8

This appears to be your issued server and port on the first line and what is likely its instance alias from whichever cloud-based company they’re using.

/getLatestID returns:

131

I know, not very impressive. But it appears to be the highest user ID for your issued server.

/findServer returns:

{"ip":"151.80.98.52:1516","token":"86JYH"}

Another json response, this appears to also be issuing you a server and port. It’s possible that the first home query is asked at the beginning of the game and then /findServer is called each time your die in the game.

So far, this appears to be everything I’ve learned from this redirection technique.

Status

At this point, I now have the game interface which the Agar.io app uses to communicate with the server. It likely makes more requests but that’s good for now. I could have enough to go on in order to work up something so that multiple iOS people could join the same FFA game, for example, since we know this issuing mechanism.

hacking agar.io

In an earlier post I described an addictive game called Agar.io, an interactive eat-or-be-eaten game involving graphical dots. In this series of posts, I’ll be attempting to hack the game to see what I can get away with.

Agar-top

Define:  hacking

I suppose there are several ways of interpreting the term hack here. In the movies, some character will “hack the mainframe” or some other nonsense. And we’re also familiar with someone who attempts to use techniques to hack a website, perhaps injecting SQL code into an innocent-looking HTML form. Here, I refer to one of the original uses of the word, to hack away at a problem until it is solved. I’m interested in the game itself, how it talks to the server and I’d like to go to school on their efforts. As a coder of smartphones myself I’d call that part of the learning curve.

Goals

Ultimately, I would like to learn how the game works behind-the-scenes. I do have some secondary goals though. It would be interesting to see if it is in fact possible to edit an existing iOS app and have it still work and all without the original coder’s digital certificate. If successful, I think the first order of business would be to remove the ads you might see during game play. Another personal goal would be to allow multiple friends on iOS devices to play the FFA (free-for-all) mode of the game with each other; this could be made possible with a proxy server, I’d propose.

The platform

Currently, I play the Agar.io game on an iPad II since I prefer the interface over a browser-based version that’s available. So I will be attempting to hack the Apple store app ultimately.

This may turn out to be impossible since an app that runs on iOS is supposed to be digitally signed to prevent tampering. And yet this is what I intend to do, nonetheless. I’ll be testing that assertion to see if a hacked app will still work.

Concepts

Here, I’ll discuss some of the concepts of the approaches I’ll take.

  • Patching:  Patching is an old-school technique in which binary code, for example, is edited in place with a script. Individual characters or code is replaced in the original to create a new file. The patch program itself works together with another program called diff, used to calculate the differences between two files.
  • DNS:  This service is responsible for looking up a name like m.agar.io and replacing it with an IP address.
  • Redirection:  Using your own DNS server so that you can redirect requests to your own website instead of the intended one.
  • iOS app:  An iOS app might seem a little daunting if you’re not a coder. It’s actually a collection or manifest of files all rolled up into one .ipa file. I think it’s safe to say that the app was written in Apple’s Xcode using a computer language like Objective-C or Swift.
  • Ad-based add-ons:  It’s clear that Agar.io has many opportunities to display ads within the game itself. The programming interface to these (for the Agar.io developer) is almost always JavaScript-based.
  • Tethering:  Connecting a smartphone—or the iPad in this case—to a computer to allow for interaction (like development testing) to occur.

Throughout this series of posts keep in mind that if I’m indicating a command, it’s often being done on a MacBook with OS X 10.11.5 El Capitan at a shell prompt. Otherwise, I could be referring to something I’m doing on an iPad II with iOS 9.3.2 installed.

DNS server

I’ll be using the Dnsmasq easy-to-implement DNS server for redirecting Agar.io’s server requests to my own website. I’ll then configure my iPad to use this server first when doing DNS lookups.

Discovery website

And since I’m familiar with Node.js and Express I’ll be using this to mockup a website for those redirected app requests. When the iPad makes a request to what it thinks is the Agar.io website, I will see that request in my website’s logs.

This could be technically called a man-in-the-middle technique since I could then have my own website forward the request to Agar.io’s actual server and then answer the iPad with that response, adjusting it if I wanted to. I guess technically you could also call this a proxy approach.

Binary editor

I’ll likely also use Hex Fiend at least minimally to find the location within the main program app where I’ll be patching the code.

Installing a modified app

Normally, you would download an app directly to our iPad straight from the Apple iTunes store. Technically, I suppose, I could have taken advantage of the redirection concept from before to steer the iPad to my own website to deliver the edited content but it’s not that difficult. There appears to be a mechanism so that you can download iOS applications on an OS X computer and then, while tethered, install them remotely using iTunes. This actually allows us to use a MacBook in this case to snag the code package itself and to start all the fun. We’ll be taking advantage of this in order to then try to push a modified app package to the iPad.

If you’re on a standard OS X computer and you get the Agar.io app, it won’t seemingly do anything after the download; you’re not presented with the usual Open button after it has downloaded. It does, however, get silently copied to your hard drive under your user folder in /Users/username/Music/iTunes/iTunes Media/Mobile Applications. Having downloaded it, you should find a file called Agar.io 1.3.0.ipa which is the app (collection) itself.

Expanding the app

From here, you might not know that an .ipa file is little more than a .zip file. I’d suggest copying the Agar.io app file somewhere else (like creating a folder called AgarIO) and then open a shell so that you can decompress it.

MacBook:AgarIO$ unzip "Agar.io 1.3.0.ipa"

This command then will decompress the collection of files for you.

What’s inside the .ipa file

There are a lot of files inside this package, just like you’d find with most store apps. The first I’ll discuss is iTunesMetadata.plist which is perhaps the most aggravating of all. A .plist file is like a database for a coder, it usually stores configuration options. Opening it with TextEdit then shows me that this is the file responsible for knowing who downloaded it (myself) and how I’m then authorized to use it. I’m sure there’s a similar mechanism inside any music file you download from iTunes to prevent you from playing it on an unauthorized device. So in other words, I couldn’t just patch the Agar.io application and then make it available for download for others. Each person interested in this would need to go through the motions themselves.

Next, there is a META-INF folder which contains two files. I haven’t fully investigated them yet but the first is com.apple.FixedZipMetadata.bin which appears to again be a compressed collection of files. And the second is com.apple.ZipMetadata.plist. It appears to have some indication of how the actual program was zipped up into an .ipa file.

The final folder is Payload which includes what appears to be a single file, Agar.io. Or, is it a single file? Knowing what I do about making iOS apps, it’s actually another compressed file. In Finder, you’ll want to rename this Agar.io file to Agar.zip, for example. Back in your shell, then unzip it as you did before to expand its contents.

What’s inside the Payload file

So now we’re getting down to the actual programming itself. Everything we have seen up until now is just a wrapper so that iTunes and Apple can provision an app to you and just your device(s).

Surprisingly, there are a total of 1,111 .png graphic files inside. Most seem to represent the many skins that you’ll see in the game. There are 153 .plist files which are used to store anything from advertisement configuration information, to promotions, to language localization information and collections of available skins by category. With respect to my goals, I’m not really interested in these. And there is a single .db file for the Vundle advertising platform.

There is a folder called _CodeSignature which appears to include hashes of the collection of graphics, presumably to prevent them from being edited perhaps.

There are 65 .ccbi files which appear to be another form of .plist files. There are 15 .json files which appear to have different localized versions.

Finally, there is agar.io which is the actual program file itself. I’ll save the actual editing for a follow-up post to this one.

Status

That’s a good start so far. We’ve downloaded the Agar.io app and performed two decompression steps to get at the actual executable itself. Next, I think I’ll switch gears and build the discovery website and DNS server so that I can get at the app’s server interface.

Update

Skip to the final post in this six-part series if you’re looking for the code. Enjoy!