As I’ve noted in previous entries, running an Internet-facing Web server (shared, dedicated, sitting in your basement, whatever…) is a great thing.
Basically, you own a JIT publishing system. Toss some blogging tools on top of this, and you can immediately put your thoughts/creations/data in front of a good percentage of the world and get an overwhelming amount of feedback about same (via e-mail, comments, server logs and so on).
But the JIT publishing world, as with any world, is a world of give and take.
Recently, during my data diving exercises, I found someone who was requesting – every five minutes – my blog’s Picture of the Day. Since this rotates daily, the extra 287 daily hits were just eating up my bandwidth (it was pulling the full-sized picture, not a thumbnail). It is obviously a script, as it runs exactly every five minutes, with only an occasional difference of a second (network traffic).
And – obviously – since this picture is the only request made – the person is pulling my photos and doing something with them. There was no request for robots.txt, so it’s just a simple script on a CRON.
I couldn’t get a server name or trace to the site (was a DSL connection; could be a home or business), so I couldn’t get an address to e-mail the person (this has happened before).
So, I have to block the user.
So, I hacked out some mod_rewrite rules to stick into a .htaccess file. Tested it locally; it worked fine. It allows only my site (“myServer”) to request any images and gives others a default image:
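# Turn on the rewrite engine for this directory
RewriteEngine on
# Let requests with an empty referer through
RewriteCond %{HTTP_REFERER} !^$
# Let requests referred by my own site through
RewriteCond %{HTTP_REFERER} !^http://myServer/.*$ [NC]
# Any other request for an image is redirected to a default image
RewriteRule .*\.(gif|GIF|jpg|JPG|png|PNG)$ http://www.littleghost.com/images/incorrect.gif [R]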
This – of course – bombs my actual site when I move it over there (500 error). This site runs Unix with what appears to be a highly customized variant of Apache.
I write/call/talk to Customer Care (what a contradiction in terms), and am told – after several escalations – that my code is fine, it “should work”. So why doesn’t it? They can’t and won’t say.
OK.
So they have one of these control panels for the site. So I decide to block this one user’s IP address just on the affected directory (do as little harm as possible).
The form ends up generating a .htaccess file, which looks like the following (Note: I’m masking the actual user’s IP address with “X” to protect the guilty…):
AuthType Basic
allow from all
order allow,deny
deny from XXX.XXX.XXX.30
This looks a hell of a lot different from what Customer Service said should work, but … whatever.
And I watch my logs. It works – the call is made and 403s out.
About an hour later, I check again…the script/user has changed the IP address, so I’m forced to change the “deny” to include the entire C block: “deny from XXX.XXX.XXX” (note the last octet is left unspecified, meaning “all”). I don’t like to do that – that means 256 IP addresses are now blocked from this directory, but…whatever.
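For what it’s worth, here is a rough sketch of what the widened rule might look like, assuming stock Apache 2.2-style access directives (a customized host may behave differently). The 192.0.2 prefix is a reserved documentation range standing in for the masked address, not the real one. A partial address and the CIDR form should cover the same 256 addresses:

```apache
# Assumption: 192.0.2 is a documentation range standing in for the masked XXX.XXX.XXX
Order allow,deny
Allow from all
# Partial address: denies every host whose address starts with these three octets
Deny from 192.0.2
# The same block written in CIDR notation would be:
# Deny from 192.0.2.0/24
```

With “Order allow,deny”, the Allow lines are evaluated first and the Deny lines second, so a host matching both ends up denied.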
This did work and has held up for the last couple of days. But I still wish a couple of things:
- I had more control over my shared hosting (so I could just plop that tested mod_rewrite and be done with it). At the same time, if I had more control, there would be more upkeep.
- There were a less draconian way to fix this problem. Ideally (in this case), I could block just this single image for this user/C block, but I have to block at directory level.
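On a stock Apache, a narrower block might be possible by hand even if the control panel only works at directory level. The following is a minimal sketch, assuming the host honors a Files section and 2.2-style access directives in .htaccess, which this customized setup may not. The file name “potd.jpg” is hypothetical, and 192.0.2 again stands in for the masked address block:

```apache
# Hypothetical file name; 192.0.2 is a documentation range, not the real address
<Files "potd.jpg">
    Order allow,deny
    Allow from all
    # Deny only this address block, and only for this one file
    Deny from 192.0.2
</Files>
```

Everything else in the directory would stay open to everyone, including the blocked addresses. As with the mod_rewrite attempt, this would need testing on the actual server before trusting it.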
Sure, I could always just ignore all this – and I’m sure I’m getting pics stolen all the time (as I’ve mentioned, this is not the first time), but it’s a way to protect myself and learn a bit about this process in case it really becomes important to protect myself/my site.
So that’s good.
But it is a tango….