outsourcing your work as a captcha

I guess everyone’s seen the robot-test captcha thing on Google these days. If you try to use their search engine too fast, you’ll soon be proving that you’re not a script running on some spammer’s computer.

[Image: the “I’m not a robot” checkbox]

Often, though, you’re next asked to select which squares have storefronts, or doors, or signs, or food.

[Image: a captcha grid asking you to select all squares with signs]

And of course, since we want that content, we dutifully “prove” that we’re not a robot. But—and I realize this might sound a little cynical of me—what if we’re actually being forced into conscripted labor, as if we were Google’s robots?

Try to follow along…

Amazon Mechanical Turk

Amazon has a variety of services within the AWS space. The one I’m thinking about at this moment is their Amazon Mechanical Turk. If you have a computer and an Internet connection and want to make some money doing (usually) mundane tasks, Amazon will pay you to do so.

For instance, Amazon might pay a hundred people to look at one image after another and highlight where in each image they see a sign or a storefront or whatever it is that Amazon needs highlighted. Humans are great at this. Artificial intelligence applications are getting there, but these days it still takes a supercomputer to do these tasks.

What if Google doesn’t want to use its supercomputers and doesn’t want to pay anyone to do object recognition either?

Google Maps Streetview

Google’s mapping feature set with Streetview represents a way for them to make a lot of money, and their collection of project managers would love to know where the storefronts are within all that captured data. (Imagine that they’ve paid drivers to drive around cars with 360° cameras.) Because behind every storefront is a business that could pay Google money for placement within Google Local.

Now, Google has datacenters with plenty of available processing power to do this. But what if… they’re using us instead?

Think about it: we’re asked to identify objects within photos that look like they were taken from the Streetview data, and the objects we’re asked to find are things (businesses) that could make Google money or things (signs) that could be used in mapping directions.

Call me cynical, but Google is looking a little guilty on this one. Why aren’t we identifying the squares with puppies in them? Because puppies don’t buy listing upgrades, that’s why.

 

captcha the moment

Robots

According to Newton’s 3rd law of motion, for every action there is an equal and opposite reaction. Out on the Internet, that probably means that when forum content spammers apply force (adding advertisement content in order to enhance someone’s SEO), forum admins must apply an equal force to repel them. In this particular case, we’re talking about that captcha challenge you keep seeing everywhere: prove that you’re not a robot.

[Image: a reCAPTCHA challenge]

Part of the problem is the assumption itself: we’re presumed to be robots and have to prove otherwise before we can proceed. And I suppose that, to some extent, Google is part of the underlying problem.

Search Engine Optimization

SEO is the acronym for what’s behind all this. Google, for example, can be fooled into thinking that a particular website is more important than it really is. Spammers have figured this out, of course. Every day of the year, people are being paid to create fake content across the Internet’s collection of forums, blogs, websites, etc.

Behind the scenes, websites and forums are visited nightly by a virtual army of Google’s webcrawlers, robots which visit all the pages of a website and re-add them to the big indexed database that is, if you will, the brain of Google.
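To make that a little more concrete, here’s a rough Python sketch of what one of those indexing robots does at its core. The start URL, the page limit, and the in-memory dictionary standing in for the “big indexed database” are all stand-ins of my own; a real crawler adds politeness delays, queues, deduplication, ranking, and a great deal more.

```python
# A minimal crawl-and-index sketch (my own illustration, not Google's code):
# fetch a page, stash it in an "index", follow its links, repeat.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, max_pages=10):
    index, queue, seen = {}, [start_url], set()
    while queue and len(index) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        html = urlopen(url).read().decode("utf-8", errors="replace")
        index[url] = html  # "re-add the page to the big indexed database"
        parser = LinkExtractor()
        parser.feed(html)
        queue.extend(
            link for link in (urljoin(url, href) for href in parser.links)
            if link.startswith("http")
        )
    return index


# index = crawl("https://example.com/")  # hypothetical starting point
```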

The problem, though, is the collection of odd configuration settings and files that most people know nothing about. A typical website has a /robots.txt file to tell webcrawlers which areas of the site they may and may not index; the webmaster could simply exclude the comment-heavy areas from indexing. Really good forum or blog software would know to automatically decorate the visitor-supplied links added in comments with a rel="nofollow" attribute. What this means is that these spammers/advertisers would be foiled almost overnight, since they wouldn’t get any SEO value from the behavior.
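Here’s the nofollow idea as a minimal Python sketch. The regex and the function name are mine, purely for illustration; real forum software would use a proper HTML parser rather than a one-liner like this.

```python
import re

# Decorate visitor-supplied links with rel="nofollow" so search engines
# give the spammer no ranking credit for posting them. The pattern only
# touches <a> tags that don't already carry a rel attribute.
NOFOLLOW = re.compile(r'<a\b(?![^>]*\brel=)', flags=re.IGNORECASE)


def add_nofollow(comment_html):
    return NOFOLLOW.sub('<a rel="nofollow"', comment_html)


print(add_nofollow('Nice post! <a href="http://spam.example/pills">cheap pills</a>'))
# Nice post! <a rel="nofollow" href="http://spam.example/pills">cheap pills</a>
```

Once every visitor-supplied link carries rel="nofollow", the spammer’s link no longer passes along any ranking value, and most of the incentive to post it evaporates.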

But since nobody spends much time thinking about a real fix, most of us—the forum and blog users—are forced to prove our humanity on a daily basis. We are inconvenienced in many ways.

Fast Typist = Spammer

This detection method really annoys me. Back in the ’70s I typed 115 WPM on an Underwood typewriter. Now imagine how fast I type on a computer keyboard with almost forty years of experience.

[Image: an Underwood typewriter]

Add to that, my brain works well. I can process problems and develop solutions in a hurry, and on many occasions I’ll attempt to provide assistance to others on the Internet, say, on a forum. Unfortunately, I’m often confronted with anti-spam countermeasures which seemingly think: if you can type more than two posts in five minutes, you must therefore be a robot. Seriously, I hate that one.
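For the curious, here’s roughly what that blunt heuristic amounts to, sketched in Python with hypothetical numbers (two posts, five minutes). Notice that there’s nothing in it that can tell a fast typist from a script.

```python
import time
from collections import deque

# A naive "fast poster = robot" check (my own sketch, not any forum's actual
# code): count posts inside a sliding window and flag anyone over the limit.
WINDOW_SECONDS = 5 * 60
MAX_POSTS = 2


class PostRateCheck:
    def __init__(self):
        self.timestamps = deque()

    def looks_like_a_robot(self, now=None):
        now = time.time() if now is None else now
        self.timestamps.append(now)
        # Drop posts that have fallen outside the five-minute window.
        while self.timestamps and now - self.timestamps[0] > WINDOW_SECONDS:
            self.timestamps.popleft()
        return len(self.timestamps) > MAX_POSTS


check = PostRateCheck()
print([check.looks_like_a_robot(t) for t in (0, 60, 120)])  # [False, False, True]
```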

Denial-of-Service to Everyone

This is the reason behind today’s post. I was out there attempting to ask a question on the Sainsmart forum and, after trying multiple browsers, realized that I simply wasn’t going to get to ask that question. Their registration mechanism’s captcha doesn’t work. It fails over and over again because their code is wrong. It’s a denial-of-service (DoS) to everyone, robots and humans alike.

More Than One Lookup = Spammer

I tend to use WHOIS database information a lot since I work in Information Technology. Each domain registrar (like GoDaddy) maintains a database like this of who has registered a particular domain. And yet, I’m sure there are people who create scripts to promiscuously query this information in order to build and sell marketing lists. I would urge the people who maintain these lookup services not to be so heavy-handed with their robot-detection methods. (In other words, looking up two domains does not a robot make.)
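Those lookups, by the way, are about as simple as a protocol gets, which is part of why scripting them is so easy. Here’s a bare-bones Python sketch of a raw WHOIS query over TCP port 43 (RFC 3912); the default server below is the one I’d point at for .com/.net domains, so treat it as a placeholder and swap in whichever server covers the TLD you’re querying.

```python
import socket

# Raw WHOIS query (my own sketch): connect to a WHOIS server on TCP port 43,
# send the domain name followed by CRLF, and read back everything it returns.
def whois(domain, server="whois.verisign-grs.com", port=43):
    with socket.create_connection((server, port), timeout=10) as sock:
        sock.sendall((domain + "\r\n").encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# print(whois("example.com"))
```

It’s precisely because this is so trivial that the list-builders script it, and precisely why the rest of us get treated like robots after a couple of lookups.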

Typical Customer Reaction

In my particular case with respect to Sainsmart’s forum DoS, it feels like Newton’s 2nd law of motion: the acceleration of the customer away from their forum is directly proportional to the force of rejection by their failing captcha mechanism. Okay, even for me that was stretching things a bit, but I did want to add another Newton reference, so there you go. Seriously, it will take a lot to get me to go back to Sainsmart’s forum again. (See, that was a Newton’s 1st law of motion joke. You knew I was a geek, right?)