taking the bite out of stamps.com

Stamps.com offers an online service in which you can digitally apply postage to an envelope.  They even include a nifty, free digital scale to attach to your computer.

The problem, of course, is that to recoup the cost of that “free” device, Stamps.com charges a monthly fee for the service, and most people decide it isn’t worth it.  I often see these scales sitting idly on someone’s desk, where they’re only useful for weighing things.  Without the service, you’d need to look up the postage manually and then work out the right number and types of stamps.

I’ve written a Windows program that does all of this for you:  it weighs the envelope, calculates the postage and tells you how many stamps of each kind to put on it.
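
The post doesn't show how the program picks the stamps. Here is a minimal sketch in JavaScript of one way to do it. This is not the author's C# code, and the rate constants and stamp denominations are hypothetical placeholders, not real USPS prices. The sketch rounds the weight up to the next whole ounce and prices the letter. It then finds the fewest stamps that add up exactly, using the classic coin-change dynamic program.

```javascript
// Hypothetical rates in cents: placeholders, NOT current USPS prices.
const FIRST_OUNCE = 49;
const EACH_ADDITIONAL_OUNCE = 21;

// Letter postage: the first ounce plus each started additional ounce.
function postageCents(weightOz) {
  const ounces = Math.max(1, Math.ceil(weightOz));
  return FIRST_OUNCE + (ounces - 1) * EACH_ADDITIONAL_OUNCE;
}

// Fewest stamps whose face values add up to exactly `cents`
// (coin-change dynamic programming). Returns { value: count } or null.
function stampsFor(cents, denominations) {
  const best = new Array(cents + 1).fill(Infinity); // stamps needed per total
  const pick = new Array(cents + 1).fill(0);        // last stamp used per total
  best[0] = 0;
  for (let total = 1; total <= cents; total++) {
    for (const d of denominations) {
      if (d <= total && best[total - d] + 1 < best[total]) {
        best[total] = best[total - d] + 1;
        pick[total] = d;
      }
    }
  }
  if (best[cents] === Infinity) return null; // no exact combination exists
  const counts = {};
  for (let t = cents; t > 0; t -= pick[t]) {
    counts[pick[t]] = (counts[pick[t]] || 0) + 1;
  }
  return counts;
}

const DENOMINATIONS = [49, 21, 10, 5, 1]; // hypothetical stamp values, in cents
const due = postageCents(1.2);            // 1.2 oz is billed as 2 oz
console.log(due, stampsFor(due, DENOMINATIONS)); // 70 cents: one 49 + one 21
```

The dynamic program is used instead of a "largest stamp first" greedy pick because greedy can fail badly. With 49¢, 21¢ and 1¢ stamps, greedy covers 63¢ with fifteen stamps (49 + 14 × 1), while three 21¢ stamps would do.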

Here is the new repository on github.com.  You can run the program directly or, if you have a copy of Visual Studio (including the free Community edition), build it yourself.  For what it’s worth, your computer will need the .NET Framework 4.5.2.

to type or not to type…

that is the question.  Rather than making a Shakespeare reference, I’m referring here to typing, a concept in software development that determines how a language treats the kind of value a variable holds.

Define: type

When you create a variable in a computer language, it’s usually something like this:

var someVarName = 1;

In a case like this, we might infer that someVarName stores a number (an integer).  We might say that someVarName’s type is integer.  Using a pet-ownership metaphor, it’s like purchasing a dog house first (“someVarName”) and then buying a dog to put into it (“1”).  You wouldn’t buy a fish bowl to hold a dog… although a fish bowl seems to work out great if you own a cat.  JavaScript, for example, is like that:  it doesn’t seem to care whether you store a cat in a fish bowl.
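
Here is a quick JavaScript illustration of the fish-bowl point. It reuses the someVarName name from the example above. The variable happily takes a number and then a string, and typeof reports whatever it currently holds.

```javascript
// JavaScript attaches types to values, not to variables.
let someVarName = 1;
const before = typeof someVarName; // "number"

// Swapping in a completely different kind of value is allowed:
// the "bowl" accepts whatever you put in it.
someVarName = "a cat";
const after = typeof someVarName;  // "string"

console.log(before, after); // prints "number string"
```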

Two Schools of Thought

There are two camps out there:  those who like languages that enforce a variable’s type and those who don’t.

A statically-typed language usually involves a step in which your code is converted into something else (compiling) and any type-related issues must be fixed before a program can be created.

A dynamically-typed language is run “as is,” and the code is evaluated at the moment of truth:  the type of each variable is determined while the program runs.  If there is a type-related issue, your end user could be the first person to see the error.
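
Here is a small JavaScript sketch of that risk, using a hypothetical shout function. Nothing checks the argument before the program runs. Passing a number therefore fails, with a TypeError, only when that line executes. A statically-typed language such as Java or C# would reject the equivalent call at compile time.

```javascript
// A hypothetical helper; nothing checks the argument's type ahead of time.
function shout(message) {
  return message.toUpperCase() + "!";
}

console.log(shout("hello")); // prints "HELLO!"

// The mistake below is only discovered when this line actually runs.
let caught = null;
try {
  shout(42); // numbers have no toUpperCase method
} catch (err) {
  caught = err;
}
console.log(caught instanceof TypeError); // prints "true"
```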

Statically-Typed    Dynamically-Typed
Java                JavaScript
C++                 Python
C#                  PHP
C                   Objective-C

The Pendulum Swings

Over the past three decades, the popularity of each approach has waxed and waned.  For the moment, it seems safe to say that the less-strict languages are gaining popularity faster than their stricter counterparts.

We have the world of open source to thank for the popularity and speed of development we’re currently seeing in these dynamically-typed languages like JavaScript and Python.

Seeing the Future

Honestly, though, there are too many people in the strict-is-better camp to ignore, and their influence is felt within software development companies.  If I had to guess at the future of JavaScript, I’d say that TypeScript and Flow will gain in popularity as larger development teams look to lower the number of bugs in their code.

I don’t know, though.  Maybe it’s time that we just relax and let the cat hang out in the fish bowl.

trial version annoyances

I had a quick-and-dirty task to do today at work:  I wanted to write a very simple program which would split an Adobe PDF document into its individual pages. It didn’t sound like a difficult thing to accomplish, to be honest. By the end of the day, however, I found myself in hacker mode, putting much more effort into an end-run around someone’s idea of security.

Options

Of course, this is relatively easy on OS X in the Automator utility. You can create a service, associate it with a folder, say, and then drag/drop a PDF into that folder. Done.

But this needed to run on Windows-based computers, and unless there was an easier way, I preferred to do it in C# within Visual Studio. A bit of research confirmed that Windows has no native tool for this. Next, I looked for free libraries or similar. This search turned up:

  1. iText (ruled out, since it’s just a .NET wrapper over Java)
  2. PdfBox.net (ruled out, since it’s just a .NET wrapper over Java)
  3. Spire.pdf
  4. Aspose.pdf

And yet, each of these seems to expect money from me in order to build a solution. Granted, somebody probably put a lot of effort into these libraries. I remember creating a very nice one-pass XML-to-PDF compiler perhaps ten years ago, and I was very fond of it. Perhaps it was that experience that led me to the solution I chose: I decided to use Aspose.pdf and then programmatically void their trial-version watermark.

You might be thinking, “Why don’t you just pay for the library?” That’s a good question. The people who wrote Aspose.pdf expect me to pay at least $799 per year just to be a developer. And then, presumably, each client would also need to pay this amount for a licensed DLL. They have seven even higher pricing tiers reaching into the thousands of dollars. Given the need to simply split a PDF file, I don’t see the value.

The Difficulty of Starting From Scratch

Granted, I could begin from scratch and write a PDF “tree-walker” that finds the pages and iterates through them to re-create the content page by page. Since I understand the underlying storage method in a PDF file, this could be done in under a month. I could then build this into my own library and charge money for it, presumably cutting the knees out from under these players in the market space.

That said, splitting a PDF file isn’t an $800 problem nor is it a one-man-month problem. A program which splits a PDF file should cost about… $10 tops.

The Problem With the Trial Version of Aspose.pdf-generated PDFs

Unfortunately, the trial version of the Aspose.pdf library places an obtrusive watermark at the top of each page.

Example output of the trial version of the Aspose library

Programmatically-Removing Watermarks From PDFs

So then, I researched whether there were any available or free methods of removing watermarks from PDF files. There didn’t appear to be any, so I would need to write one myself.

One challenge is patching a binary file in place with C#. To be honest, I expected the .NET Framework to have something for this, but that doesn’t appear to be the case. In addition to hacking the PDF object code, I would need to write a rudimentary binary search-and-replace routine for C#.
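
For comparison, here is a minimal sketch of such a routine in JavaScript, using the built-in Node.js Buffer. The function name patchBytes is hypothetical. It requires the replacement to be the same length as the search bytes, so the file size and every object's byte offset stay the same. It also searches the whole file in memory, so a match can't be missed by straddling two read chunks.

```javascript
// Replace every occurrence of `needle` with an equal-length `replacement`.
// Equal lengths keep the file size and all byte offsets unchanged.
function patchBytes(data, needle, replacement) {
  if (needle.length !== replacement.length) {
    throw new Error("replacement must be the same length as the needle");
  }
  const out = Buffer.from(data); // work on a copy; leave the input untouched
  let hits = 0;
  let pos = out.indexOf(needle);
  while (pos !== -1) {
    replacement.copy(out, pos); // overwrite in place at the match
    hits++;
    pos = out.indexOf(needle, pos + needle.length);
  }
  return { out, hits };
}

// Demo on an in-memory buffer instead of a real file.
const { out, hits } = patchBytes(
  Buffer.from("ab}OMcd}OM", "latin1"),
  Buffer.from("}OM", "latin1"),
  Buffer.from("123", "latin1")
);
console.log(out.toString("latin1"), hits); // prints "ab123cd123 2"
```

For a real file, the buffer would come from fs.readFileSync and go back out through fs.writeFileSync. Loading the whole file is reasonable for single-page PDFs. Very large files would call for a chunked scan that carries the last few bytes over between reads.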

Hacking the PDF File

It’s good to be familiar with the object storage model for PDF files in order to understand what approach I then took.

A typical PDF file includes many objects and, near the end, a cross-reference (xref) table, which is essentially a table of contents listing the byte offset of each of those objects. If you’re familiar with the Rich Text Format (RTF), a PDF file is much like an RTF file except for this table at the end.

It’s that table at the end that provides the first challenge:  when editing a binary PDF file, you can’t change the size of an object or move it. Doing so would shift the objects that follow, and the offsets recorded in the table would no longer point at them.
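
Here is a toy JavaScript illustration of why object sizes must not change. It is not a real PDF writer, and the objectOffsets helper is hypothetical. The sketch records where each "N 0 obj" starts, much as the xref table stores offsets. It then compares a same-size edit with one that grows the first object by a single byte.

```javascript
// Toy only: record the offset where each "N 0 obj" begins.
function objectOffsets(text) {
  const offsets = {};
  const re = /(\d+) 0 obj/g;
  let m;
  while ((m = re.exec(text)) !== null) {
    offsets[m[1]] = m.index; // ASCII text, so character index == byte offset
  }
  return offsets;
}

const body = "1 0 obj\n(Hello)\nendobj\n2 0 obj\n(World)\nendobj\n";
const xref = objectOffsets(body); // the "table of contents" as written

// A same-size edit leaves every recorded offset valid...
const sameSize = objectOffsets(body.replace("Hello", "Jello"));
// ...but growing object 1 by one byte moves object 2.
const grown = objectOffsets(body.replace("Hello", "Hello!"));

console.log(xref["2"], sameSize["2"], grown["2"]); // prints "23 23 24"
```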

The second challenge when editing a binary PDF file is the frequent use of inline compression/encoding, which makes it hard to find the actual object that you’d like to overwrite. And yet, with a simple PDF file you can accomplish this with a hexadecimal editor by iteratively changing one character per object until you “break” the object in question, that pesky watermark.
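
The hexadecimal-editor loop could also be automated. The JavaScript helper below (oneBreakPerStream, a hypothetical name, not part of the post's actual code) finds each `stream` keyword in a PDF's bytes. It returns one copy of the file per stream, with a single byte altered just after the keyword. Each copy keeps the original size, so the xref offsets stay valid, and opening each copy in a PDF reader would show which stream held the watermark. This is only a sketch: it scans for the keyword naively and is demonstrated on a toy fragment rather than a real PDF.

```javascript
// For each "stream" keyword, return a same-size copy of the file with
// the first byte of that stream's data flipped.
function oneBreakPerStream(pdf) {
  const variants = [];
  const marker = Buffer.from("stream", "latin1");
  let pos = pdf.indexOf(marker);
  while (pos !== -1) {
    // Skip "endstream", which also contains the word "stream".
    const isEnd = pos >= 3 && pdf.toString("latin1", pos - 3, pos) === "end";
    if (!isEnd) {
      let dataStart = pos + marker.length;
      if (pdf[dataStart] === 0x0d) dataStart++; // optional CR
      if (pdf[dataStart] === 0x0a) dataStart++; // LF
      if (dataStart < pdf.length) {
        const copy = Buffer.from(pdf);
        copy[dataStart] ^= 0xff; // same size, so xref offsets stay valid
        variants.push({ offset: dataStart, bytes: copy });
      }
    }
    pos = pdf.indexOf(marker, pos + marker.length);
  }
  return variants;
}

const toy = Buffer.from(
  "1 0 obj\n<< /Filter /FlateDecode >>\nstream\nABC\nendstream\nendobj\n",
  "latin1"
);
const variants = oneBreakPerStream(toy);
console.log(variants.map((v) => v.offset)); // one candidate, at the "A"
```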

Typical PDF file contents

The Achilles Heel of Watermark-based Prevention

So now, what would it take to nuke that watermark? One method would be to find the object, physically remove the entire object from the file and remove its reference from the catalog. And yet, then I’d need to update the file offsets for half of the other objects within the file itself.

Inside the body of the PDF file, each of these compressed-content objects includes the key to its own demise:  FlateDecode. This is the filter used to compress the contents of an object, and it’s zlib (deflate, a Lempel-Ziv-based method) compression at work. A zlib stream ends with an Adler-32 checksum. Replace even a single byte of that compressed stream without updating the checksum, and that object’s content is broken.

But what does Adobe Reader do with a broken object? It silently skips it without displaying it, which is exactly what we want here! Replace even a single encoded byte in that unwanted watermark and it’s effectively gone.
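
The built-in Node.js zlib module can demonstrate the mechanism this relies on. In the sketch below, the sample text is a hypothetical stand-in for watermark content. The sketch flips a byte in the Adler-32 trailer rather than in the compressed data, because that makes the failure deterministic. The patch in this post alters bytes near the start of the compressed data instead, which also corrupts the stream, though the exact error can vary. The sketch also prints the two-byte zlib header that default compression produces.

```javascript
const zlib = require("zlib");

// Hypothetical stand-in for a watermark's content stream.
const text = Buffer.from("Sample watermark text. ".repeat(4));

// FlateDecode data is a zlib stream: a 2-byte header, deflate-compressed
// data, then a 4-byte Adler-32 checksum of the uncompressed bytes.
const stream = zlib.deflateSync(text);

// With default settings the header is 0x78 0x9C.
console.log(stream[0].toString(16), stream[1].toString(16)); // prints "78 9c"

// An intact stream decompresses back to the original bytes.
console.log(zlib.inflateSync(stream).equals(text)); // prints "true"

// Flip the last byte, which belongs to the Adler-32 checksum: the data
// still decodes, but the checksum no longer matches, so zlib rejects it.
const damaged = Buffer.from(stream);
damaged[damaged.length - 1] ^= 0xff;
let rejected = false;
try {
  zlib.inflateSync(damaged);
} catch (err) {
  rejected = true; // zlib reports a checksum error
}
console.log(rejected); // prints "true"
```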

So the hack itself was just a few lines of code. As I mentioned before, I used trial and error, temporarily editing one compressed section of the PDF after another until I’d broken the watermark. That gave me my search target:  “xœ}OM”, or more simply “}OM”. (The “xœ” is the standard two-byte zlib header, 0x78 0x9C, as it appears in a Windows-1252 text view.) After confirming that the watermark contained the only occurrence of this byte sequence in the file, I could do a simple binary comparison and replacement.

// Above this was the Aspose sample code to write each page
// to a file. I inserted this code on a per-page basis to
// then modify that newly-created PDF file.
// (Needs "using System.IO;" at the top of the file.)

// This is our own code to find/replace their watermark.
// Path.ChangeExtension(path, null) strips only the final
// extension, so folder names containing dots are safe.
string basePath = Path.ChangeExtension(pdfDocument.FileName, null);
string fileToModify = basePath + "_p" + pageCount + ".pdf";
string fileModified = basePath + "_p" + pageCount + "_no-watermark.pdf";
using (var reader = new BinaryReader(
	new FileStream(fileToModify, FileMode.Open)))
using (var writer = new BinaryWriter(
	new FileStream(fileModified, FileMode.Create)))
{
	byte[] buffer = new byte[1024];
	int count;
	while ((count = reader.Read(buffer, 0, buffer.Length)) != 0) {
		// Look for our three-byte sequence "}OM"; the last
		// valid starting position is count - 3
		for (int j = 0; j <= count - 3; j++) {
			if (buffer[j] == '}' &&
				buffer[j + 1] == 'O' &&
				buffer[j + 2] == 'M')
			{
				buffer[j]     = 0x31; // 1
				buffer[j + 1] = 0x32; // 2
				buffer[j + 2] = 0x33; // 3
			}
		}
		// Having (optionally) patched in place, write only the
		// bytes actually read to the destination file
		writer.Write(buffer, 0, count);
	}
	// Caveat: a match split across two 1,024-byte reads is missed
}

I’m sure there are prettier ways of searching a buffer, but this was easy enough. Note that I really only need to change one character, say the one at “buffer[j]”, which is sufficient to break that checksum mechanism.

And the rest, as we say, is history.

Same example, after breaking the watermark

You might ask why I’d post about such things. I do it for the sake of my own curiosity, and I assume that others like you are curious as well. Just as little kids build sand castles and then smash them to bits, we bigger kids like to build security and then smash that as well. One of the reasons this is good practice is that it teaches us the difference between “good enough” security and “better” security. Just because you can’t think of a way around something doesn’t mean that some other clever person can’t work their magic.

flogging a dead horse

Many times in my career I’ve been at some technical crossroads which demanded a decision on my part:

  1. stay the course with some primary skillset I’d been developing or
  2. branch off on some new expertise.

If you think about it, that’s a pretty big gamble.  What will hiring managers be looking for two or even five years in the future?  What will look better on your résumé, a couple more years of experience in the old skillset or the old skills plus the two years of the new skills?  Is it possible that continuing to work with the old skill will now somehow look bad for your career?  But then, if you include too many skills does it look like you’re not focused enough on anything to actually have expertise?

Recognize Trends

I’d suggest that the following trends are appearing in the development space.

  • Java is no longer trusted:  Java, created at Sun Microsystems and now owned by Oracle, was a good idea back in the ’90s.  It allowed coders to write one set of code which could be compiled and then distributed and run on a variety of platforms.  Several security-related issues with Java have led many organizations to ban Java outright from their workstations.  Apple’s Safari browser now blocks the Java plug-in, and newer versions of Microsoft Internet Explorer disable Java by default.
  • Objective-C is a pain:  Apple probably should have replaced this language when it introduced iOS.  Since it’s really only used for Mac OS and iOS development, a coder’s skillset in this language limits them to just Macs, iPads, iPhones and the Apple Watch.
  • JavaScript is the new black:  Open source and Node.js have invigorated the JavaScript language.  In the past it was mostly used for client-side browser validations and such, but today it’s being used for almost anything on the client or the server.  PhoneGap allows cross-platform phone app development in JavaScript, threatening to destroy all competitors in this space.  In Tolkienian terms, JavaScript is the one ring to rule them all.
  • C and even C++ seem dated now:  C (circa 1972) and C++ (circa 1979) are wonderful languages, and yet they’re over thirty years old, which makes them seem stale to coders today.  C# (circa 2000) is now over 15 years old and is beginning to meet the same fate.
  • .NET is only for Windows:  Even though Microsoft originally intended .NET to compete with Java as a multi-platform coding option, you don’t see this much in practice, since work on UNIX .NET platforms (such as the Mono project) hasn’t brought it into the mainstream there.  The trend would be that single-platform solutions don’t have enough market share to ultimately survive the test of time.
  • Every day there are more coders entering this space:  Schools globally have been pushing technical careers over the last three decades.  Outsourcing websites and better English training and translation software are allowing people in other countries to compete more effectively with U.S.-based coders.
  • It’s not just keyboards and mice anymore:  Hand-held devices, touchscreen monitors and see-through goggles may be the norm soon.
  • Apps and stores (not programs and major versions):  It used to be that a new version of a program was delivered and a major update cost money.  An app now usually comes with unlimited updates and yet “in app purchases” still allow a stream of money for the developer.  In fact, these updates allow the developer another marketing opportunity to up-sell the customer something else.  Apple has made so much money with iTunes that Microsoft has completely re-tooled their own operating system to chase that same business model.  Google has done the same with their Android platform.

See the Future

To me, the future of coding will embrace anything that will allow one set of (familiar) code to be compiled to multiple platforms.

  1. Until the next “new, new thing” comes along, it looks like JavaScript (in general) is the core language to know.
  2. Some interesting things appear to be coming from the ECMAScript 6 (ES6) standard that JavaScript implements.  When enough browsers support it, this new standard (specifically) should be another good skillset to have.
  3. Node.js has enjoyed an amazing degree of implementation throughout the world in its short lifespan.  Knowing how to code to this would be in your best interest.
  4. HTML5 has been used in a fair number of high-profile websites, enough to ensure its popularity for a few more years.
  5. The GitHub source code repository hosts over 30 million individual repositories, and many other systems have built-in support for pulling code automatically from it.  It looks like GitHub will be around for a while.
  6. Several popular languages will likely be effectively dead soon for a variety of reasons:  Java, Objective-C, Visual Basic, C, C++, .NET and Swift, to name a few.

Be the Future

If you want a job as a coder in the future, it’s time to start actively steering in the right direction instead of passively continuing to use the platforms you’re on now.  If you don’t have the skills I’ve listed above, consider taking on a project to learn one or more of them.

If you’re currently embedded on a team that uses Java, for example, then I’d suggest that it’s going to be increasingly hard to find work elsewhere. Given that it’s already becoming harder to find coding work with all the competition, it’s more critical than ever to have the skills that managers are looking for on a team.