Why it doesn't matter whether censorware works or not

Publication date: 12 December 2000.
Last modified 03-Dec-2011.

Porn. Smut. Naked people.

And, just for the professional naughtiness-searchers, pron, and pr0n.

Oh, what the heck. Let's say "Japanese schoolgirls" too.

This page now won't be accessible to users of various half-baked censorware products.

Which is not news, of course. "Blacklist" Internet filtering software - whose central feature is an allegedly carefully vetted categorised list of sites which it uses to prevent people from seeing content on certain subjects - is notorious for blocking too many things.

You may be able to turn off the real-time naughty word filtering that stops you doing things like reading pages about women called Maryanne (because the string "aryan" is in the name...), but the blacklist is based on the same technology, and it's mathematically impossible for the censorware companies to have humans look at all of the entries. So, sooner or later, innocent-ish pages like this one get blocked, in one or another blacklist update.
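To make the Maryanne problem concrete, here's a minimal sketch of that sort of naughty-word matching. The one-word list and both function names are made up for illustration; I'm not claiming any real product works exactly like this.

```python
import re

# Hypothetical one-word blocklist, taken from the "aryan" example above.
BLOCKED_WORDS = ["aryan"]

def naive_filter(text):
    """Block if a listed word appears anywhere, even inside another word."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKED_WORDS)

def whole_word_filter(text):
    """Block only when a listed word stands alone as a whole word."""
    return any(
        re.search(r"\b" + re.escape(word) + r"\b", text, re.IGNORECASE)
        for word in BLOCKED_WORDS
    )

page = "Maryanne's recipe for lemon slice"
print(naive_filter(page))       # True: "aryan" hides inside "Maryanne"
print(whole_word_filter(page))  # False: no stand-alone match
```

Whole-word matching fixes this one case, but it still can't tell a page that uses a word from a page that merely discusses it.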

People with more experience in the field than me have summed up the problems with censorware more eloquently than I can.

In essence, the major argument against censorware itself - as distinct from the companies that create it, and their politics - is that it cannot avoid being over-broad in what it blocks.

Censorware makers can apply all the energy they like to getting their categorisation as correct as they can and weeding out things they blocked by accident, but with well over a billion Web pages to index and well over a hundred million individual Internet hosts, it's just impossible for them not to paint the world with a darn broad brush.
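A back-of-envelope sum shows the scale. Only the billion-page figure comes from the text above; the 30 seconds per page and the 2,000-hour working year are assumptions of mine, picked purely for illustration.

```python
# Only PAGES comes from the text; the other two figures are assumptions.
PAGES = 1_000_000_000          # "well over a billion Web pages" (lower bound)
SECONDS_PER_PAGE = 30          # assumed time for one human to judge one page
WORK_HOURS_PER_YEAR = 2_000    # assumed full-time working year

def person_years(pages, seconds_per_page, hours_per_year):
    """Person-years needed to look at every page exactly once."""
    total_hours = pages * seconds_per_page / 3600
    return total_hours / hours_per_year

print(round(person_years(PAGES, SECONDS_PER_PAGE, WORK_HOURS_PER_YEAR)))  # 4167
```

Under those assumptions, one pass over the Web costs something like four thousand person-years, and that ignores every page that changes or appears after the reviewer has moved on.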

Hang the politics of it, for the time being. People who live in countries where they actually have Constitutionally guaranteed freedom of speech can field that ball; I'm in Australia, where we don't. What interests me is - if censorware, generally speaking, works poorly, why do people keep buying it?

Moreover, why on earth do politicians make laws requiring it to be used, by schools and libraries?

The glib answer to this is "because people are stupid, and politicians are really stupid." Hanlon's Razor says you should never attribute to malice that which is adequately explained by stupidity; hey presto, there's your explanation. Some decisions to use censorware, analysed on a superficial and obvious level, do indeed make those making that decision look as dumb as a bag of honey-glazed doorknobs.

But people and politicians are not idiots, by and large. Far from it. Oh, shut up, you in the peanut gallery; go and watch a few episodes of Yes, Minister. There's a more devious explanation, if you ask me.

Before I explain my brilliant deductions, let me give you an example of some obviously broken censorware, paying for which would appear to suggest that all of the customers' shoes fasten, of necessity, with Velcro.

This is not the Web-browsing-limiter sort of censorware. Recently, a new kind of censorware's emerged, which as yet doesn't have a lot to do with the usual kind. This new censorware aims to tell dirty pictures from clean ones algorithmically, and block the former in order (principally) to protect companies from hostile-workplace sexual harassment litigation.

This concept's not all that new, but software that even vaguely looks like being able to live up to its sales pitch is. The first package I've seen that does have some diffuse idea about the difference between smutty and clean is Baltimore Technologies' PORNsweeper, which I reviewed some time ago.

In brief, PORNsweeper didn't work very well. Its false-negative rate - saying a picture is clean when it isn't - was quite good. But it got that good rate at the cost of its false-positive rate - saying a picture's dirty when it's not. Essentially, it tended to think that pictures of people, and various other pictures that don't have people in them at all, are porn.
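The two error rates are easy to mix up, so here's a small sketch of how each is counted. The sample data is invented for illustration and is not my actual test set; the function name is mine, too.

```python
def error_rates(results):
    """results: list of (is_actually_porn, was_flagged) boolean pairs."""
    dirty = [flagged for is_porn, flagged in results if is_porn]
    clean = [flagged for is_porn, flagged in results if not is_porn]
    # False negative: a dirty picture the software passed as clean.
    false_negative_rate = dirty.count(False) / len(dirty)
    # False positive: a clean picture the software flagged as dirty.
    false_positive_rate = clean.count(True) / len(clean)
    return false_negative_rate, false_positive_rate

# Made-up sample: 20 dirty pictures and 20 clean ones.
sample = (
    [(True, True)] * 19 + [(True, False)] * 1      # 1 of 20 dirty pictures missed
    + [(False, True)] * 16 + [(False, False)] * 4  # 16 of 20 clean pictures flagged
)
fn_rate, fp_rate = error_rates(sample)
print(f"false negatives: {fn_rate:.0%}, false positives: {fp_rate:.0%}")
```

A filter can look great on the first number and dreadful on the second at the same time, and a sales pitch that quotes only the first tells you nothing about the second.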

After I put the PORNsweeper review up, I was e-mailed by a rep for UK company First 4 Internet. They had a picture categoriser, too, and apparently still do.

Unlike Baltimore Technologies, First 4 Internet had the courage to make their software publicly testable. They had a page that let you test what they called "a small proportion of the functionality of the First 4 Internet Image Filtering Software". That page now just lets you submit a request to try out the software; there's another request page, but I just tried to register using that form and received a big fat OLE error for my trouble.

I think I know why they're not making it easy to play with their product any more.

Like PORNsweeper, their system, when I tested it, was indeed quite good at detecting porn. Well, when it could load an image, it was; it only understood JPG and GIF, and that would have been fine, but it just failed to load a significant number of images. But the makers claimed that it blocked better than 95% of commercial pornography, and when it could load the picture, that seemed pretty accurate.

But, like PORNsweeper, this software also thought most pictures of people were porn. It blocked pretty much anything with skin tones in it.

Amusingly, the First 4 Internet software had two levels of porn detection - probably-porn and definitely-porn. Needless to say, just like PORNsweeper, First 4's software thought that pretty much any picture of a person was pornographic. But the second, no-sir-I-don't-mean-maybe detection level with its "This image has PORNOGRAPHIC content and will not be displayed" message was a sure crowd pleaser, when it was produced in response to perfectly innocent pictures. Which it often was.

It conjured up, for me at least, the image of a sex-starved preacher perpetually trembling on the edge of a chasm of uncontrollable arousal, so preoccupied with the subject that standing in a breeze strikes him as lascivious behaviour, because it sure as heck presses his buttons.

First 4's software seemed quite good at letting through things that weren't colour photos of people, which is more than I could say for PORNsweeper. But, like PORNsweeper, the First 4 software was incapable of detecting evil in black and white images. Fortunately, there's no such thing as an offensive black and white picture, so the concerned parents can sleep safely in their beds.

If you hanker for software that blocks colour photos of people, First 4 Internet had just what you want. Heck, maybe they've improved it now, but the fact that you can't just feed it an image and see what it thinks of it any more suggests to me that they haven't. In my testing, it showed false positive rates on clean pictures of people in excess of 80%.

This is where it gets interesting.

I communicated my findings to the First 4 chap, and he said that I was welcome to try out the real, ready-for-prime-time version of their software. But he also said, leading with his chin, that the First 4 product was "more accurate" than PORNsweeper, when I'd just then told him why I didn't think it was.

I was less than totally polite to him about his assertion, and said I'd be happy to review another version of the software, if it actually bloomin' worked. He slunk away.

OK, big deal. Another half-baked attempt at porn image recognition. Not headline news.

The thing that gets me, though, is that this fellow cheerfully invited me to try the software out.

Which suggests only two possibilities.

Possibility one - he didn't check to see whether the thing he was trying to sell actually worked worth beans before he proudly presented it to a journalist who'd written a highly critical review of another such product. This conclusion could be correct, but it doesn't allow me to construct an elaborate theory of human behaviour, so I shall discount it.

Possibility two - he reckons that any publicity is good publicity. Get a review, even if the review says "absolutely as effective as an oxy-acetylene rig made out of butter", and your product name's in the minds of those who make buying decisions about things like this.

And you're set. Because those people are either twits that'll buy anything, or cynics that'll buy from anyone who's willing to keep up the pretence that the software performs the task it's made to do.

I think a combination of the dumb and/or uninformed, and the knowledgeable but cynical, has to be the market for these sorts of products.

Work with me here. I've got another crackpot theory on the burner.

Let's presume you're running a business which provides Internet access to its employees, and you're in a country where it's likely that the employees will be able to sue you if someone else manages to send them smutty e-mail, or if their workmates can download porn and set it as their desktop wallpaper, or whatever.

Now, it's just flat-out impossible to provide your employees with proper e-mail and Web access, and also make it impossible for them to access offensive content.

But the name of the game here isn't really blocking what's meant to be blocked, while letting through innocent material that your employees need in order to do their jobs. The name of the game is covering your rear.

Legislators cover their rears by proudly introducing dumb unworkable legislation that wins votes. Bosses cover their rears by installing software that works as well as anybody's censorware does, so far as can be determined through the haze of public relations.

It'd be nice if dirty picture spotters worked properly, but even if they don't, you can make company policies (or national laws, depending on who you are) requiring people to use them, and you'll look as if you're Taking A Stand and Doing All You Can.

Anybody who asks awkward questions about whether the Stand you are Taking has any chance of achieving something worthwhile can be given one of the stock answers from the War On Some Drugs Sourcebook and brushed aside. For most rule makers, being tough on porn, like being tough on crime, has no down side, electorally or commercially.

If you're a bright rule maker, you can figure this out for yourself; if you're a dim rule maker, your advisers will steer you to the same decision. If everybody with decision-making input is either possessed of unthinking religious faith in the value of the product, or is a cheerful cynic who's doing what needs to be done, your purchase and implementation experience should be a marvellously smooth one.

What people are interested in, when they're talking about, legislating about, buying or selling censorware, is not the product itself. It's the idea of the product - a thing that magically prevents people from seeing things which they don't want to, or shouldn't be allowed to, see.

We live in the age of public relations, which can Newspeak its way past any awkward facts, at least for long enough for stock options to vest or a short-memoried electorate to cast their votes. PR is marvellous gap filler, if you're building castles in the air.

Getting back to the more common flavours of censorware - back in 1997, all of these products pretty much stank. Blacklists were blatantly compiled algorithmically and not subject to adequate human review, so enormous numbers of innocuous sites were unfairly blocked.

There are still qualitatively similar problems with current censorware, but the quantity of those problems seems to be much smaller. The bizarre quirks of taxonomy are fading somewhat - for instance, Secure Computing's SmartFilter seems no longer to have the peculiar "Non-essential" category, but "Worthless" appears to survive, at least for SmartFilter v2.x. It also no longer categorises as "Extreme" an awful lot of pages that just have the word "Extreme" on them somewhere.

It's still not too hard to find censorware SNAFUs, but the software is a lot closer to living up to its advertising than it used to be.

Well, OK, maybe not. Sigh. But at least they're starting to fight among themselves.

Even if the awful-categorisation problem really is fading - and maybe everybody's just gotten sick of seeing another list of fuzzy-bunny sites listed as EXTREME SEX BONDAGE SATANISM - other problems have become more prominent. Like the way some censorware marks anonymiser and translator sites as being members of every single category. See this review by the same fellow that wrote the one above, for instance; it's getting on for two years old, now, but several of the sites it points to are still listed in every category by SmartFilter. Put sites in every category, and anybody who blocks any category won't be able to see them. More recently, as another Seth Finkelstein piece points out, N2H2's BESS censorware has grown a separate semi-secret "Loop Hole" category, which does the same thing; sites in that category can't be viewed by any BESS user, ever.

This super-blocking's done to stop people accessing other banned content by going through redirector sites that are not, inherently, offensive in any way. But this is a clunky solution, when it really ought to be possible to pluck the destination URL out of the composite address of a site viewed through an anonymiser, or a translator, or other random proxy-things for that matter. (I'm still mourning the death of AskJesus; SmartFilter, by the way, categorises www.rinkworks.com/dialect/ as "Online Sales, Entertainment".) Blocking a whole translator's an easy workaround, but not a good one.
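Here's a rough sketch of what plucking the destination out of a composite address might look like. It assumes the proxy passes the destination in a query parameter called "url", which is only one of many schemes real services use; the hostnames, the blacklist and the function names are all hypothetical.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical blacklist holding a single made-up host.
BLACKLIST = {"smut.example"}

def embedded_destination(proxy_url, param="url"):
    """Return the destination carried in the query string, or None."""
    query = parse_qs(urlparse(proxy_url).query)  # also undoes %-encoding
    values = query.get(param)
    return values[0] if values else None

def should_block(requested_url):
    """Judge a request by its embedded destination, if it has one."""
    target = embedded_destination(requested_url) or requested_url
    return urlparse(target).hostname in BLACKLIST

base = "http://translator.example/translate?url="
print(should_block(base + "http%3A%2F%2Fsmut.example%2Fpics"))  # True
print(should_block(base + "http%3A%2F%2Fnews.example%2F"))      # False
```

A proxy that scrambles or encrypts the destination would defeat this, which may be part of why vendors reach for the blunt instrument instead.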

But this doesn't matter. None of it matters. Because nobody at any point in the censorware revenue chain - from the censorware companies to the end users of their software - seems to care very much. Parents who're buying censorware to stop their kids from seeing things that they shouldn't may get pretty much what they want from over-strict machine-compiled nanny-ware (well, except for things like that last URL, which is "Art/Culture" according to SmartFilter...), but the big-money clients don't much care whether the software works or not.

So the civil libertarians are spitting chips.

So what?

Heavens, they might even suggest that most of the time censorware's used, it shouldn't be, even if it works perfectly!

Big deal.

Personally, I'd definitely rather be trapped in a lift with someone who owns Gold Editions of all of Nina Hartley's movies than someone who collects Precious Moments figurines. But that just means I'm not part of the censorware target market, so who cares about me?

Civil libertarians wouldn't buy censorware either, so who cares about them? It takes them a lot longer to put together a well-supported, well-reasoned argument against censorware than it takes someone who disagrees to swat them out of the way with a hearty "won't someone think of the children!?"

Hence, cynicism. If people will pay you good money to stick your thumbs in your ears and dance around naked in order to make it rain, then, well, that's an employment option for you, isn't it?

It's not going to work, in the sense of actually causing precipitation. But it sure is going to work, in the sense of improving your bank balance.

And so it is with censorware, even the kind that just doesn't work.

Now do please excuse me. That fellow in the brown raincoat just went into the wonky lift, and I haven't had a good conversation all day.