The HTTP Dance

As I’ve noted in previous entries, running an Internet-facing Web server (shared, dedicated, sitting in your basement, whatever…) is a great thing.

Basically, you own a JIT publishing system. Toss some blogging tools on top of it, and you can immediately put your thoughts/creations/data in front of a good percentage of the world and get an overwhelming amount of feedback about them (via e-mail, comments, server logs and so on).

But the JIT publishing world, as with any world, is a world of give and take.

Recently, during my data diving exercises, I found someone who was requesting my blog’s Picture of the Day every five minutes. Since the picture only changes once a day, the extra 287 daily hits were just eating up my bandwidth (it was pulling the full-sized picture, not a thumbnail). It is obviously a script: it runs exactly every five minutes, with only an occasional difference of a second (presumably network lag).

And, obviously, since this picture is the only thing requested, the person is pulling my photos and doing something with them. There was no request for robots.txt, so it’s just a simple script run from cron.
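The "obviously a script" call comes down to timing, and the check is easy to sketch in Python. This is a minimal illustration under my own assumptions: the request times for one client have already been pulled out of the access log as seconds, and `gaps`, `looks_scripted` and the one-second tolerance are made-up names and values, not anything my host or the logs provide.

```python
# Hypothetical request times (in seconds) for a single client, as they might be
# extracted from an access log; 300 s is the five-minute cron interval.
PERIOD = 300.0


def gaps(times):
    """Seconds between consecutive requests, in time order."""
    ordered = sorted(times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def looks_scripted(times, period=PERIOD, tolerance=1.0):
    """True when every gap is within `tolerance` seconds of `period`."""
    g = gaps(times)
    return bool(g) and all(abs(x - period) <= tolerance for x in g)


# Five-minute polling means 24 * 60 / 5 = 288 requests a day; for a picture
# that changes once a day, all but one of them are wasted.
REQUESTS_PER_DAY = 24 * 60 // 5
WASTED_PER_DAY = REQUESTS_PER_DAY - 1

if __name__ == "__main__":
    cron_like = [0, 300, 600, 901, 1201, 1500]   # steady, off by a second at most
    human_like = [0, 240, 700, 720]              # irregular reloads
    print(looks_scripted(cron_like))   # True
    print(looks_scripted(human_like))  # False
    print(REQUESTS_PER_DAY, WASTED_PER_DAY)  # 288 287
```

A person hitting reload rarely lands on the same interval to within a second; a cron job does it all day long.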

I couldn’t get a server name or trace the address back to a site (it was a DSL connection, so it could be a home or a business), so I had no address to e-mail the person (this has happened before).

So, I have to block the user.

So, I hacked out a set of mod_rewrite rules to stick into a .htaccess file and tested them locally; they worked fine. They allow only my own site (“myServer”) to request images and give everyone else a default image:


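RewriteEngine on
# Let requests with no Referer header through
RewriteCond %{HTTP_REFERER} !^$
# Let requests from my own pages through (case-insensitive)
RewriteCond %{HTTP_REFERER} !^http://myServer/.*$ [NC]
# If the replacement image falls under these rules, skip it so outside referers don't loop
RewriteCond %{REQUEST_URI} !/images/incorrect\.gif$
# Everyone else asking for an image is redirected ([R]) to the default image; [L] stops here
RewriteRule .*\.(gif|GIF|jpg|JPG|png|PNG)$ http://www.littleghost.com/images/incorrect.gif [R,L]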

This, of course, bombs my actual site when I move it over to my host (a 500 error). The host runs Unix with what appears to be a highly customized variant of Apache.
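Whatever the host's Apache is doing differently, the logic those rules are meant to express is simple. mod_rewrite tests the RewriteRule pattern first, then the RewriteCond lines above it, which are ANDed together unless one carries [OR]. Here is a small Python model of the core referer test. It's a sketch, not mod_rewrite itself: `should_redirect` is a hypothetical helper, `myServer` is the same placeholder host, and it ignores details such as per-directory path stripping.

```python
import re

# Placeholder host from the rules; the [NC] flag becomes re.IGNORECASE.
OWN_SITE = re.compile(r"^http://myServer/.*$", re.IGNORECASE)
# Same alternation as the RewriteRule pattern (case-sensitive, no [NC]).
IMAGE = re.compile(r".*\.(gif|GIF|jpg|JPG|png|PNG)$")


def should_redirect(path: str, referer: str) -> bool:
    """Return True if the request would be sent to the default image.

    Follows mod_rewrite's order: the rule pattern is tested first, then each
    condition, and every condition must hold for the redirect to happen.
    """
    if not IMAGE.search(path):
        return False  # not an image: the rule never fires
    if referer == "":
        return False  # RewriteCond !^$ fails: empty referers pass through
    if OWN_SITE.search(referer):
        return False  # referer is one of my own pages: allowed
    return True       # image requested from somewhere else: redirect


if __name__ == "__main__":
    print(should_redirect("images/potd.jpg", "http://elsewhere.example/page"))  # True
    print(should_redirect("images/potd.jpg", "http://myserver/blog/"))          # False
    print(should_redirect("images/potd.jpg", ""))                               # False
```

One thing the model makes easy to see: a request with no Referer header at all sails through, and so does a file named, say, `potd.Jpg`, since the pattern spells out only all-lowercase and all-uppercase extensions.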

I write/call/talk to Customer Care (what a contradiction in terms) and am told, after several escalations, that my code is fine and that it “should work”. So why doesn’t it? They can’t and won’t say.

OK.

So, the host provides one of those control panels for the site, and I decide to block this one user’s IP address on just the affected directory (doing as little harm as possible).

The form ends up generating a .htaccess file that looks like the following (note: I’m masking the actual user’s IP address with “X” to protect the guilty…):


AuthType Basic
allow from all
order allow,deny
deny from XXX.XXX.XXX.30

This looks a hell of a lot different from what Customer Care said should work, but… whatever.

And I watch my logs. It works: the request is made and gets a 403 (Forbidden).
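The `order allow,deny` line in that generated file is what makes the block work. With Apache's older Order/Allow/Deny directives, all Allow lines are checked first, then all Deny lines; a client has to match at least one Allow and no Deny to get in. Here's a rough Python model of that rule, including the partial-address form ("a.b.c") Apache also accepts. It's a sketch under my own assumptions: `matches` and `allowed` are made-up names, only IP-style entries (not host names) are handled, and 192.0.2.x documentation addresses stand in for the masked ones.

```python
from typing import Sequence


def matches(entry: str, ip: str) -> bool:
    """True if `ip` is covered by an Allow/Deny entry (IP-style entries only).

    "all" matches everything; otherwise the entry's octets must equal the
    leading octets of the address, so a partial entry such as "192.0.2"
    covers 192.0.2.0 through 192.0.2.255 (2**8 = 256 addresses).
    """
    if entry == "all":
        return True
    want = entry.split(".")
    return ip.split(".")[: len(want)] == want


def allowed(ip: str, allow: Sequence[str], deny: Sequence[str]) -> bool:
    """Order allow,deny: need an Allow match and no Deny match."""
    if not any(matches(e, ip) for e in allow):
        return False  # nothing allowed it: denied by default
    if any(matches(e, ip) for e in deny):
        return False  # a Deny match overrides the Allow
    return True


if __name__ == "__main__":
    single = ["192.0.2.30"]  # stand-in for "deny from XXX.XXX.XXX.30"
    block = ["192.0.2"]      # stand-in for a whole C block
    print(allowed("192.0.2.30", ["all"], single))   # False
    print(allowed("192.0.2.31", ["all"], single))   # True
    print(allowed("192.0.2.31", ["all"], block))    # False
    print(allowed("198.51.100.7", ["all"], block))  # True
```

Comparing whole octets rather than string prefixes matters: "192.0.2" should not catch 192.0.20.1.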

About an hour later, I check again… the script/user has switched IP addresses, so I’m forced to change the “deny” to cover the entire C block: “deny from XXX.XXX.XXX” (leaving off the last octet means “any value”). I don’t like doing that, since it means 256 IP addresses are now blocked from this directory, but… whatever.

This did work and has held up for the last couple of days. But I still have a couple of wishes:

  • I wish I had more control over my shared hosting (so I could just plop in that tested mod_rewrite code and be done with it). Then again, more control would mean more upkeep.
  • I wish there were a less draconian way to fix this problem. Ideally (in this case), I could block just this single image for this user/C block, but I have to block at the directory level.

Sure, I could always just ignore all this (and I’m sure I’m getting pics stolen all the time; as I’ve mentioned, this is not the first time), but blocking it is a way to protect myself and to learn a bit about the process in case protecting my site ever really becomes important.

So that’s good.

But it is a tango….