jpeg 2000 and you

If you’re slightly (okay, seriously) cynical like me then when you read about an interesting new virus you probably silently wonder if it was created by one of the anti-virus software companies. So here we are, staring at yet another new graphical file format that’s emerging: JPEG 2000. Perhaps the only people who love receiving a .jp2 file as an email attachment are the friendly folks at Adobe who make money selling you software subscriptions for PhotoShop. The rest of us are stuck wondering how we can convert these into something useful.

JPEG 2000

Born out of the relative beaurocratic boredom of 2000, the Joint Photographic Experts Group wasn’t otherwise gainfully employed and—working with the Global Version-Ratcheting Consortium perhaps—decided that the Internet and email was working too well indeed and needed a sharp correction in order to put the balance of the marketing universe right again. And so they introduced new standardized filename extensions like .jp2 and .jpx as well as mime types image/jp2 and image/jpx for your webserver.

Imaginary Interview With the Joint Photographic Experts Group

Q:  So, um… why don’t you guys just suggest we use the PNG standard since it’s already well-supported and just trash JPG since it’s so lossy?

A:  Ah, very good question indeed. That would have been an excellent solution of course if only the letters ‘P’, ‘N’ and ‘G’ somehow fit into our group’s name. And it doesn’t. So we had to mimic what PNG does, come up with some new file format and break the Internet.

Q:  That’s kind of… stupid.

A:  Yes, but we are the experts. See? It’s in our group’s name so you have to do what we say.

Q:  You don’t, like, accept bribes from big software companies like Adobe, do you?

A:  Every day. A Joint-Photographer’s gotta eat, right?

internet_broken

Support

Fortunately, I’ve put together a handy/information support matrix for the JPEG 2000 file format for your convenience.

Application Support for JPEG 2000
Adobe Photoshop Yes
Adobe Lightroom No
Apple iPhoto No
Apple Preview No
Autodesk AutoCAD No
BAE Systems CoMPASS No
Blender No
Phase One Capture One No
Chasys Draw IES No
CineAsset No
CompuPic Pro No
Corel Photo-Paint No
Daminion No
darktable No
DBGallery No
digiKam No
ECognition No
Microsoft Edge No
ENVI No
ERDAS IMAGINE No
evince No
FastStone Image Viewer No
FastStone MaxView No
FotoGrafix 2.0 No
FotoSketcher 2.70 No
GIMP 2.8 No
Google Chrome No
GraphicConverter No
Gwenview No
IDL No
ImageMagick No
Internet Explorer No
IrfanView No
KolourPaint No
Mathematica No
Matlab No
Mozilla Firefox No
Opera No
Paint Shop Pro No
PhotoFiltre 7.1 No
PhotoLine No
Pixel image editor No
QGIS No
Safari No
SilverFast No
Windows Explorer No
XnView No
Ziproxy No

trial version annoyances

I had a quick-and-dirty task to do today at work:  I wanted to write a very simple program which would split an Adobe PDF document into its individual pages. It didn’t sound like a difficult thing to accomplish, to be honest. By the end of the day, however, I find myself in hacker mode, putting much more effort into doing an end-run around someone’s idea of security.

split-pdf

Options

Of course, this is relatively easy on OS X in the Automator utility. You can create a service, associate it with a folder, say, and then drag/drop a PDF into that folder. Done.

But this needed to be for Windows-based computers and I had a preference to do this in C# within Visual Studio if there wasn’t an easier way of doing it otherwise. Researching a bit I confirmed that there weren’t any native tools within Windows which would take care of this. Next, I then looked for free libraries or similar. This search turned up:

  1. iText (ruled since it’s just a .Net wrapper over Java)
  2. PdfBox.net (ruled out since it’s just a .Net wrapper over Java)
  3. Spire.pdf
  4. Aspose.pdf

And yet, each of these seems to expect money from me in order to build a solution. Granted, somebody probably put a lot of effort into these libraries. I remember myself creating a very nice one-pass XML-to-PDF compiler perhaps ten years ago and was very fond of it. Perhaps it was that experience that led me to the solution I chose: I decided to use Aspose.pdf and then programmatically render their trial-version watermark void.

You might be thinking, “why don’t you just pay for the library?” That’s a good question. The people who wrote Aspose.net expect me to minimally pay $799 per year just to be a developer. And then, presumably, each client would also need to pay this amount for a licensed DLL. They have seven even higher pricing tiers into the many-thousand area. Given the need to simply split a PDF file, I don’t see the value.

The Difficulty of Starting From Scratch

Granted, I could begin from scratch and write a PDF “tree-walker”, find the pages, iterating through them to re-create the content page by page. Since I understand the underlying storage method in a PDF file this could be done in under a month. I could then build this into my own library and charge money for it, presumably cutting the knees out from under these players in the market space.

That said, splitting a PDF file isn’t an $800 problem nor is it a one-man-month problem. A program which splits a PDF file should cost about… $10 tops.

The Problem With the Trial Version of Aspose.pdf-generated PDFs

Unfortunately, the trial version of the Aspose.pdf library places an obtrusive watermark at the top of each page.

AsposeWatermark
Example output of the  trial version of the Aspose library

 

 

Programmatically-Removing Watermarks From PDFs

So then, I researched to see if there were any available/free methods of removing watermarks from PDF files. There doesn’t appear to be. I would need to write it myself.

One challenge is the problem is patching a binary file in-place with C#. To be honest, I expected the .Net framework to have something like this but that doesn’t appear to be the case. In addition to hacking the PDF object code I would need to write a rudimentary binary search-and-replace routine for C#.

Hacking the PDF File

It’s good to be familiar with the object storage model for PDF files in order to understand what approach I then took.

A typical PDF file includes many objects and a table at the end which is essentially a table of contents for those objects. If you’re familiar with a Rich Text Format (RTF) file, then it’s much like this except for the catalog at the end.

It’s that catalog at the end that provides the first challenge, when editing a binary PDF file you can’t change the size of an object or move it. Doing so would break the catalog.

The second biggest challenge when editing a binary PDF file is the frequent use of inline compression/encoding. You can’t easily find the actual object that you’d like to overwrite. And yet, with a simple PDF file you can accomplish this by using a hexadecimal editor and iteratively change one character per object until you “break” the object in question, that pesky watermark.

AsposePDF.png
Typical PDF file contents

 

 

The Achilles Heal of Watermark-based Prevention

So now, what would it take to nuke that watermark? One method would be to find the object, physically remove the entire object from the file and remove its reference from the catalog. And yet, then I’d need to update the file offsets for half of the other objects within the file itself.

Inside the body of the PDF file each of these compressed-content objects includes the key to its own demise:  FlateDecode. This is the protocol for compressing the included text within an object and I believe it’s the ZLib (Limpel-Ziv) compression at work. And that usually includes an Adler-32 checksum at the end of it. Replace even a single byte of that compressed stream—presumably without updating the checksum—and that object content is broken.

But what does Adobe Reader do with a broken object? It silently swallows it without displaying it, which is exactly what we want to do here! Replace even a single encoded byte in that unwanted watermark and it’s effectively gone.

“Replace even a single encoded byte in that unwanted watermark and it’s effectively gone.”

So the hack then was a few lines of code. As I mentioned before, I used a trial-and-error method of temporarily editing one compressed section of PDF after another until I’d broken the watermark. At this point, I then determined that the text for my target search was “xœ}OM” or more simply “}OM”. Confirming that the watermark included the only occurrence in the file of this combination of characters allowed me to do a binary comparison and replacement.

// Above this was the Aspose sample code to write each page
// to a file. I inserted this code on a per-page basis to
// then modify that newly-created PDF file.

// This is our own code to find/replace their watermark
string fileToModify = pdfDocument.FileName.Substring(
	0, pdfDocument.FileName.IndexOf('.')
	) + "_p" + pageCount + ".pdf";
string fileModified = pdfDocument.FileName.Substring(
	0, pdfDocument.FileName.IndexOf('.')
	) + "_p" + pageCount + "_no-watermark.pdf";
using (var reader = new BinaryReader(
	new FileStream(fileToModify, FileMode.Open)))
	{
	using (var writer = new BinaryWriter(
		new FileStream(fileModified, FileMode.Create)))
		{
		byte[] buffer = new byte[1024];
		int count;
		while ((count = reader.Read(buffer, 0, buffer.Length)) != 0) {
			// Now look for our sequence
			for (int j = 0; j < (count - 3); j++) {
				if (	buffer[j] == '}' &&
					buffer[j + 1] == 'O' &&
					buffer[j + 2] == 'M')
					{
					buffer[j] =     0x31; // 1
					buffer[j+1] =   0x32; // 2
					buffer[j+2] =   0x33; // 3
					}
				}
			// Optionally having patched in place,
			// write to the destination file
			writer.Write(buffer, 0, count);
			// Empty out our buffer for another run
			for (int i = 0; i < buffer.Length; i++) {
				buffer[i] = 0x00;
			}
		}
	}

I’m sure there are prettier ways of searching a buffer but this was easy enough. Note that I only actually need to change, say, the first character at “buffer[j]” which is sufficient to break that checksum mechanism.

And the rest, as we say, is history.

AsposeWatermarkGone
Same example, after breaking the watermark

You might ask why I’d post about such things. I do it for the sake of my own curiosity and I assume that others like you are curious as well. Just as little kids build sand castles and then smash them to bits we bigger kids like to build security and then smash that as well. One of the reasons why this is good practice is that it teaches us what is “good enough” security and what is “better” security. Just because you think something is secure because you can’t think of a way around it, that doesn’t mean that some other clever person can’t work their magic.

Adobe PhoneGap

History

In 2011, Adobe purchased an existing mobile application development framework by Nitobi, branding it at that time with the name PhoneGap.  They’ve since released it into the open source arena as “Apache Cordova” although many of us still refer to it as PhoneGap.  If you’ve ever attempted to create native iOS, Android or Microsoft Phone applications then you’ll enjoy this new multi-platform approach.

Before PhoneGap, you’d have to install a development kit for each of the various platforms you wanted to target.  And then, you’d need to learn the platform language of each major player:  Objective-C or Swift for iOS, Java for Android and XAML with C# for the Microsoft Phone.  Good luck trying to then design and maintain three completely-different collections of phone app code which hope to provide the same functionality.

Today

But now with PhoneGap, you use what you probably already know:  HTML, JavaScript and CSS.  You then either 1) compress your collection of files using a ZIP file and upload this to Adobe’s website or 2) use the popular github.com repository to manage your software changes and then tell Adobe the location of your github.

I’ll add to this that jQuery Mobile does a great job of streamlining some of the work you can do with PhoneGap.  It includes both methods for interacting with the browser’s DOM and a nice collection of CSS for displaying and lining up the widgets your app will need, for example, phone-styled push buttons that are sized correctly for fingers.

Develop

An initial set of app code is created from a command-line interface, producing a collection of files you’ll need for your app.  You’ll usually focus on two files within this collection:  config.xml and www/index.html.  The first will configure the name, version and rights that your app will need and the second defines the interface.  Use any editor you’re comfortable with.  And if you’re familiar with the github source code management then this can be useful later when you build your app.

You usually develop with the help of the PhoneGap Desktop App plus a phone-specific version of the PhoneGap Developer App.  The Desktop App reads your code and serves up a development version of your application; the specific Developer App for your phone will allow you to test your code.  As you make changes in your code the Desktop App will send the new version to your phone so that you can instantly see the change.

Up to this point, none of this yet requires any build keys from Apple, Microsoft or in the case of an Android app, your own self-signed certificate.  Since PhoneGap’s Developer App itself is signed, you’re good to go.

Test

The default set of files from PhoneGap comes pre-equipped with the Jasmine test suite built in.  Edit the www/spec/index.js file to modify the default tests, verify that the PhoneGap Desktop App is running and then execute them by bringing up the /spec folder for your application within the PhoneGap Developer App.

Build

When you’re ready to start seeing your application as a stand-alone app you can then build it on the PhoneGap Build website.

You have two choices for pushing your code to PhoneGap Build:  1) compress your files into a ZIP file and upload it or 2) use a github.com repository for your project and tell Adobe that location.

Since phone apps need to be digitally-signed to identify the developer it’s then necessary to upload to PhoneGap Build one or more keys for this purpose.  An iOS app will need an Apple Developer key, a Microsoft Phone app will need a Microsoft Developer key and finally, an Android app uses a self-signed certificate that you can create without Google’s knowledge/consent or paying them a fee (as is the case for the first two platforms).  The PhoneGap Build website provides enough guidance in this area.

Once built, the PhoneGap Build website provides one or more individual binary downloads on a per-platform basis as well as a QR Code scan image that you can use to direct your phone to fetch the appropriate one.

what the fork?

fork noun repository

A copy of a repository’s master code, allowing one to freely experiment with changes without affecting the original.  This also includes a complete departure for the purpose of doing something differently from the original author’s design and sometimes without intent to merge code back into the original master branch.


The question for project managers would be “Is there a natural risk for open source software projects with respect to forking?”

Open source is wonderfully organic in that it grows like a plant.  Branches are created overnight, branches may or may not be maintained by others and some simply wither away.  Like plants, some software branches are buggier than others.  We acknowledge though that our own programs are made up of a collection of the code from these repositories.  At the moment of truth when we decide to include some particular bit of code we make a reasonable assumption:

Over time, I assume that this included code won’t change so dramatically that it no longer serves my own purpose or that it will break my own code.

Can we honestly take on that risk for our own venture given the usual number of dependencies in a typical Node.js project, e.g.?  Most open source code doesn’t de-evolve quite so destructively usually since it hasn’t reached a level of fame yet.  But for some code that enjoys thousands of downloads this necessarily means that many people now are suddenly very interested in the future development of that code.  And if this public forum is anything like the inter-department conversations of an average software development company then you may assume that different parties will have differing ideas about what that future should look like.  And this then is the risk that we take on:  In the future someone else may steer that dependent code in an incompatible direction from our own.

Size Matters

You’d think that each person in the open source arena would have equal say regarding future suggestions and modifications to commonly-used code.  I don’t believe that this is the case, to be honest.  Big players like Google actively participate in the world of open source.  Given their size and the number of developers they can throw at anyone else’s code they could easily steer development efforts into a direction which suits their own needs.  In a way, you would think that the world of open source would operate on merit alone.  The reality could easily be that open source is very much at the mercy of anyone with the wherewithal to dominate the playing field.

Adobe’s PhoneGap promotes Cordova behind the scenes for multi-platform phone development.  Given their level of commitment this will likely mean that over time competing technologies may not succeed.  If you choose the wrong technology you could abruptly lose support in the future when some dependent code author decides to hang it up and do something else.  Big players like Adobe can enter this space almost overnight and quickly change the playing field for everyone else.

Fork-300x199

The Fork in the Road

Personally, I use jQuery and jQuery Mobile in my own software.  What would happen in the future if somebody there at the jQuery Foundation decided to change up the fundamental ways that the interface works with PhoneGap, the development platform I’m also using?  I then find myself as the helpless passenger inside a carriage I’m not piloting suddenly careening down some path I didn’t envision.  Not only would my code break for some single build but I might then need to re-evaluate bigger decisions like platform and dependency selections.  It might be necessary to even consider throwing labor at jQuery development as an option.  If I do choose to depart from the direction they want to go then I necessarily lose support over time as a consequence.

It just strikes me as a long-time software developer that we could use some risk management strategies for what we’re taking on into our projects as these dependencies.  Is it then necessary to read outstanding issues on a per-dependency basis and to haunt those discussions?  I hope not.  I don’t think that the average coder would have that luxury of time on their schedule.