Search Me Redux

According to a story that ran in the WSJ (sub required; I won’t link), Amazon’s full-text search (see preceding entry) hasn’t won over one publisher: Tim O’Reilly. (View the TechDirt article.)

This is a little surprising, because O’Reilly is usually in tune with stuff like this – hell, the O’Reilly site has lots of free online chapters of books they sell – an inducement to buy the dead-tree book, of course.

And this is pretty much Amazon’s goal, I would think (although they probably have loftier goals, as well).

And – interestingly (to me) – O’Reilly is quoted in the article as saying, “If [Amazon ends] up being a Google for published content…we need to think better about what publishers get out of it.”

Which is pretty much what I alluded to in my last entry.

I wonder what really went on there…it seems like something O’Reilly would be all over.

Search Me

Wow, I was just at Amazon and saw the full-text search it has going now.

Wow.

According to this C|Net article, it currently searches over 33 million pages of text.

Again, wow.

And I don’t think this is the last we will hear about this search. It sounds like a Lexis-for-literature type of tool that…well, kind of encroaches upon Google’s turf (or any search engine’s, but Google is currently the champ).

Going to be interesting.

Picture of the Day

I’ve gotten some feedback on my Pic ‘O the Day feature that I’ve added to the left-hand column, most asking just how I did it.

The assumption is that it’s database driven; it’s not.

Since I am on Blogger, I’m pretty much stuck with a static site that’s written out by Blogger from the database they host and own.

This is bad and good:

  • Bad: I don’t have control over the templates, database and other functions like I do in most of my sites/my other development efforts. I have to funnel all my efforts through the Blogger tool.
  • Good: Since I am at Blogger’s mercy, I have to “roll my own” if I want additional functionality.

An example of such is the RSS (XML) feed that this blog has – it’s a Perl script that runs every five minutes from my home box: it grabs the index pages, parses out the necessary elements (stripping the HTML, etc.), then writes out and uploads the RSS feed.

Why is this good??

Because I learn from doing this stuff. If Blogger had a built-in RSS feed option, I would definitely use it. They don’t currently have one, so I set one up myself from scratch.

Is it elegant? Nah.

Does it work? Yep.

That’s good.
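The RSS script itself isn't reproduced here, but a minimal sketch of its parse-and-write step might look like the Perl below. It assumes, hypothetically, that each post on a saved copy of the index page appears as `<a name="ID"></a><div class="post"><h3>Title</h3>…</div>` with no nested divs; Blogger's real template markup will differ, so the patterns would need adjusting (a real parser such as HTML::Parser would be sturdier). Fetching and uploading are left out, and `strip_html`, `xml_escape`, `parse_posts` and `build_rss` are illustrative names, not the original script's.

```perl
use strict;
use warnings;

# Turn an HTML fragment into plain text: drop tags, decode a few common
# entities, and collapse whitespace.
sub strip_html {
    my ($html) = @_;
    $html =~ s/<[^>]*>//g;
    $html =~ s/&nbsp;/ /g;
    $html =~ s/&lt;/</g;
    $html =~ s/&gt;/>/g;
    $html =~ s/&quot;/"/g;
    $html =~ s/&amp;/&/g;       # last, so "&amp;lt;" becomes "&lt;", not "<"
    $html =~ s/\s+/ /g;
    $html =~ s/^\s+|\s+$//g;
    return $html;
}

# Escape the characters that are special in XML text.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Hypothetical markup: <a name="ID"></a><div class="post"><h3>Title</h3>...</div>
# (no nested divs). Returns one hash per post.
sub parse_posts {
    my ($html) = @_;
    my @posts;
    while ($html =~ m{<a name="([^"]+)"></a>\s*<div class="post">(.*?)</div>}gs) {
        my ($id, $inner) = ($1, $2);
        my ($title) = $inner =~ m{<h3>(.*?)</h3>}s;
        $inner =~ s{<h3>.*?</h3>}{}s;    # body is everything except the title
        push @posts, {
            id    => $id,
            title => strip_html(defined $title ? $title : 'Untitled'),
            body  => strip_html($inner),
        };
    }
    return @posts;
}

# Build an RSS 2.0 document; each description is plain text, cut at 200 characters.
sub build_rss {
    my ($site_url, $site_title, @posts) = @_;
    my $rss = qq{<?xml version="1.0" encoding="UTF-8"?>\n<rss version="2.0"><channel>\n}
            . '<title>' . xml_escape($site_title) . "</title>\n"
            . '<link>' . xml_escape($site_url) . "</link>\n"
            . '<description>' . xml_escape($site_title) . "</description>\n";
    for my $p (@posts) {
        my $desc = length($p->{body}) > 200 ? substr($p->{body}, 0, 200) . '...' : $p->{body};
        $rss .= "<item>\n"
              . '<title>' . xml_escape($p->{title}) . "</title>\n"
              . '<link>' . xml_escape("$site_url#$p->{id}") . "</link>\n"
              . '<description>' . xml_escape($desc) . "</description>\n"
              . "</item>\n";
    }
    return $rss . "</channel></rss>\n";
}
```

Called as `print build_rss('http://example.com/', 'My Blog', parse_posts($html));` on the page text, it emits one `<item>` per post, linking to the post's anchor on the index page.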

OK – back to the subject at hand: Picture of the day.

Again, this is a Perl script that I run from home (my host doesn’t allow CRON access…another obstacle!).

Basically, it uses the Net::Telnet package. Using this package, I – through the Perl script – perform the following tasks:

  • “Telnet” to my server and log in
  • Get listing of all JPGs in the full-sized image directory
  • Select one of these images at random
  • Copy the thumbnail and full-sized image that is today’s random picture to “random.jpg” in each directory (full-sized and thumbnail).

That’s it – about 20 lines of code, reproduced below:


#!/usr/bin/perl
use strict;
use warnings;
use Net::Telnet ();

my $myServer   = "[server name]";
my $myUsername = "[username]";
my $myPassword = "[password]";
my $imagesFull = "[path to full images]";

# create telnet object and connect
my $t = Net::Telnet->new();
$t->open($myServer);

## Wait for the username prompt and send the username.
## (Net::Telnet's login() method waits for a prompt matching "login:",
## which this host's "User@domain:" prompt doesn't, so login is scripted by hand.)
$t->waitfor('/User\@domain:.*$/');
$t->print($myUsername);

## Wait for second prompt and respond with password.
$t->waitfor('/Password.*$/');
$t->print($myPassword);
$t->waitfor('/vde.*$/');    # wait for the shell prompt after login

## Read the images in the full-sized image directory, one per line
$t->cmd("cd $imagesFull");
my @remote = $t->cmd("ls -1 *.jpg");
pop(@remote);     # remove last element: the shell prompt, left in the output on this host
chomp(@remote);   # remove the line feed that ends each line of command output
@remote = grep { $_ ne 'random.jpg' } @remote;    # never pick yesterday's copy

# get random pic (Perl seeds rand() automatically)
my $random = $remote[ rand @remote ];

## copy this file to the "random.jpg" file in the full and thumb dirs
$t->cmd("cp $random random.jpg");                      # FULL pic
$t->cmd("cp ../thumb/$random ../thumb/random.jpg");    # THUMB pic

$t->close;
exit;

That’s the hard part. Then I just set up a cron job on my local machine, and it fires at the interval I want. (I haven’t firmed this up, but I’ll probably stick to once a day.)

I had originally set this process up with the Net::FTP module (because I had done work with this module before), but this didn’t make a lot of sense – I could easily pull back the directory listing, but FTP doesn’t support remote copy operations (it can delete and rename remote files, but not copy them).

So I initially had a script, using Net::FTP, that worked fine, but it meant that after finding the random image (no biggie), I had to download the day’s pic and upload it again with the new name (random.jpg).

For both the full-sized pic and the thumbnail.

It doesn’t make a lot of sense to do four file transfers – and the full-sized images can be quite large – when two telnet commands (“cp [pic o’ day] random.jpg”, one each for the full-sized image and the thumbnail) will do the same thing!

I knew there had to be a better way.

And I finally (thank god for the Internet & Google!) found the Net::Telnet module.

Installed it from CPAN, and got it up and functioning inside of an hour. I was able to copy a lot of the Net::FTP code (find random image…) right into this new script, and all was well.

One thing I did have to mess around with was the login part – this is not as seamless as the Net::FTP module (though I’m probably missing something).

The telnet script, much more than the FTP script, requires one to have actually done the scripted processes via the command line. Little differences crop up.

For example, in the FTP script, a plain “ls [image directory]” pulled back all the images straight into an array.

With the telnet script, I had to do an “ls -1 [image directory]” (to get a single-column listing), which returned every element followed by a line feed (like lines read from STDIN). So I had to chomp the selected image to remove it.

In addition, the directory listing – at least on my host – returned the shell prompt (e.g. “[$bash 2.0.1]#” or what have you) as its last element. So I had to use the pop() command to remove the last element.
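The two workarounds just described (chomping the line feeds and popping the prompt) can be folded into a small helper. This is a sketch, assuming the raw lines come from Net::Telnet's `cmd()` as in the script above; `clean_listing`, `pick_random` and the prompt pattern are hypothetical. It pops the last line only if it actually looks like the prompt, strips a stray carriage return too, and skips `random.jpg` itself so yesterday's copy is never chosen.

```perl
use strict;
use warnings;

# Clean the raw lines returned for "ls -1 *.jpg": strip line endings,
# drop a trailing shell prompt if one is present, and keep only .jpg
# names other than random.jpg (yesterday's copy).
sub clean_listing {
    my ($prompt_re, @lines) = @_;
    s/[\r\n]+\z// for @lines;
    pop @lines if @lines && $lines[-1] =~ $prompt_re;
    return grep { /\.jpg\z/i && $_ ne 'random.jpg' } @lines;
}

# Pick one entry at random; Perl seeds rand() automatically.
sub pick_random {
    my @images = @_;
    die "no images to choose from\n" unless @images;
    return $images[ int rand @images ];
}
```

In the telnet script this would stand in for the `pop`/`chomp` lines, e.g. `my $random = pick_random(clean_listing(qr/\]#\s*$/, $t->cmd("ls -1 *.jpg")));`, with the prompt pattern matched to the host's actual shell.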

I’m not complaining, but it does seem as though the Net::Telnet module is not as generic as the Net::FTP module – or maybe it’s that telnet is not as generic as FTP.

Whatever. It’s done. Little bit of work (a couple of times) and I learned a bunch.

That’s the bonus of Blogger – you’re forced to learn to advance.

I’m cool with that.

CSS Hell

Don’t get me wrong, I’m a big fan and big supporter of CSS, but sometimes it just seems impossible to get things going the way you want.

As indicated in an earlier entry, I’ve been messing with ImageMagick and adding some pictures to this blog. As part of this process, I did a slight redesign of the left-hand column…and all hell broke loose.

I don’t think it’s CSS’s fault – it’s mainly a problem with implementation: I can get it to work perfectly in IE with one set of code, and perfectly in Netscrape with a similar – but different – set of code.

And – to be honest – when it comes to positioning and all that, I’ve got a lot of experience, but I’m not certain which set of code is the W3C-compliant code for what I’m trying to do.

If either set is.

Very frustrating.

Dave Winer writes frequently on this subject, and while he is a little too negative for my taste on CSS, I think he has a point.

Usually, he’ll be going through something like I went through yesterday and it just won’t work (across browsers). While someone will usually take the challenge and produce the code to make what is spozed to happen happen, it’s not intuitive or sensical. There are often lots of hacks necessary.

This is not good, and it’s frustrating to Dave because he is a programmer. For programmers, while there may be many ways to do the same thing – generate a random number, for example – and many tools in which to do so (PHP, Perl, Python, C, C++…), at the end of the day you’ll have a routine that will generate a pretty good random number.

With CSS, it’s almost a crapshoot.

For fonts and such, it rocks and is very stable. The sizing issues and other browser-specific differences are still evident, but this is more of a variance-in-appearance issue, not the completely-different-appearance issue that plagues positioning.

CSS support is getting better in browsers, but that’s still frustrating: while CSS is now widely supported, not all of CSS is widely supported, or supported in a consistent manner.

While things have improved, we’re still mired – to some degree – in the Browser Wars. Except now the war is not over installed base (IE won, dudes; get over it), but CSS support.

One step at a time…

Comcast Blues

Maybe it’s just me, but an Internet provider that does not support pings or traceroute is problematic.

I’ve had my Comcast cable account for over three years – it’s been through three or four owners (I think I first got it with Excite@Home before they did their incredible we-won’t-sell belly-up), and – for the most part – I’ve been very happy with it. It’s quite zippy (though they have capped the upload speed, which is a handicap for a developer like me), and the outages have been very infrequent and – with one exception (over 24 hours) – pretty short.

I’ve generally been satisfied; I’ve recommended it to many friends/associates.

But – for whatever reason – my ability to either ping or traceroute past the Comcast gateway has evaporated over the past couple of months.

This isn’t just a brief outage: outgoing pings and traceroutes appear to be blocked, whether by port or by protocol (ICMP). I’m talking all the time for the past 2+ months.

And the most frustrating part of all this is the customer support – it really is horrible. I have opened four tickets on this single issue; three have been closed with an “it must be your system” statement with no change in my inability to use these basic Internet tools.

Hmm…it’s my system??

  • OK, I do have a home network. So let’s plug the cable directly into any of the four boxes I have in the office. Even after the recommended recycling of box and modem, same lack of functionality.
  • Comcast has said my OS is the problem, and I should call Microsoft. Uh…
    • Which of the three OSes that I’m running is the problem (Win2000, WinNT, Linux 7.3)? All three?
    • So I should ask MS about the issues I’m having with my Linux box?
    • When I use dial up on each of the machines (to a different ISP), all is well?
    • If it’s my machine, how come I can ping or traceroute to the Comcast gateway (not in my house/on my property)…but it dies there? That means it’s outside my equipment, ja?

  • In response to direct questions, Comcast is unwilling to say yes|no they do|don’t support ping/traceroute.
  • In Comcast’s help forums, Comcast techs have posted messages asking for users to post their trace routes so they can see how an upgrade went…uh, I can’t do that….
  • Comcast has consistently insisted that they are not blocking any ports. OK. Then explain what’s happening here.

But I vent.

Just had to.

The sad part is that Comcast has been, overall, very good.

But this is a big negative, and the customer support has been downright rude and – thus far – unable to even acknowledge that this is an issue (again, they won’t say it isn’t a problem, either), much less give me some resolution.

Oh well…let’s hope that fourth ticket will get the job done. Sure, that’s the ticket!

The Magick (sic) Continues

My last entry talked about how I was finally getting around to learning ImageMagick so I could automate some image processing.

The madness continues!

A little bit of history is probably in order:

  • My first career was as a photographer; I did it for approximately a decade. I still love photography, and – trust me – I have thousands of pictures lying around the house. You have been warned.
  • My second career was as a writer/journalist, and during that period, I was always the “geek” of the writing staff (or part of that sad-sack club). I did production work – desktop publishing (QuarkXPress, Photoshop, Illustrator and so on) – and saw the promise of digital photography before it really happened.
  • Current career – computer dork – dovetails nicely with the first two when it comes to graphics and such. Which is why the madness continues!

So ImageMagick has been a treat for my Inner Geek and my Inner Artist (actually, my Inner Geek is more of an outie…).

I’ve fired up the old scanner and have spent some time with it today. It’s been fun.

Next steps: Batch processing with error handling; gallery construction.

The first step is just coding – I’ll get it to do what I want, I’m certain of that.

The second step is more … uh, interesting.

That’s because I’m using Blogger, and they own the database.

I’m going to have to figure out a system that will publish galleries locally and then push them to this site.

Again, doable.

But … in what way? There are a million (ok, more than two) ways to do this, so which path do I take?

That’s the frustrating, rip up the code/rip out the ethernet cable part.

Also the fun part.

Again, my Inner Geek is showing…
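One of the many possible paths for the gallery step is to build plain static HTML locally and then push it to the site like any other file. A minimal sketch, assuming hypothetical `thumb/` and `full/` folders next to the page and file names without spaces (those would also need URL-encoding); `gallery_html` and `write_gallery` are illustrative names, and the upload step is not shown:

```perl
use strict;
use warnings;

# Minimal HTML escaping for file names and titles.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Build a static gallery page: each thumbnail links to its full-sized image.
# The thumb/ and full/ folder names are hypothetical.
sub gallery_html {
    my ($title, @images) = @_;
    my $t    = html_escape($title);
    my $html = "<html><head><title>$t</title></head><body>\n<h1>$t</h1>\n";
    for my $img (sort @images) {
        my $f = html_escape($img);
        $html .= qq{<a href="full/$f"><img src="thumb/$f" alt="$f"></a>\n};
    }
    return $html . "</body></html>\n";
}

# Write the page locally; pushing it to the site is a separate step.
sub write_gallery {
    my ($path, $title, @images) = @_;
    open(my $fh, '>', $path) or die "can't write $path: $!\n";
    print {$fh} gallery_html($title, @images);
    close($fh) or die "can't close $path: $!\n";
}
```

Because the page is just a file, the same push mechanism used for the random picture or the RSS feed could upload it alongside the images.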

ImageMagick IS (Magic)

The more I look at what I’ve been doing over the past year in terms of educating myself (probably my biggest hobby, and I do not mean that facetiously), the more I see myself – in the computer arena – drifting from learning new languages and such to learning/creating more tools.

By tools, I mean code or applications (such as ERwin) that help take the drudgery – and time-consuming efforts – out of basic operations and leave one with more time to do the fun/learning stuff.

An example of a tool would be installation (and training) of a spam filter. I finally found one I liked for Outlook (yes, I’m asking for it…) and installed it.

  • Before: Getting 100-200 spams a day, deleting one by one (with big batches at once after breaks/sleep)
  • After: Browse the list of spams once or so a day, delete all. (I’ve only gotten one false positive so far! Amazing!) [SpamBayes]

Same task – check spam – but now so much quicker.

Other Lee’s-True-Life-Examples:

  • Regular Expressions: I’ve used them before – beginning in Perl – but now I’m getting both better at them and learning the nuances of using them in PHP and ColdFusion. A long time ago (not!), I wrote the cruftiest e-mail address validator imaginable, with loops and so on; I’ve now compressed it to a single regex. That’s progress!
  • Scripts/Automation: Using Windoze batch files, SQL Server Scheduler and shell scripts (bash) on Linux, I now move around a hell of a lot of data every day. I not only do backups with these tools, but I post to my Web sites with database/content updates at periodic intervals. So, I can work locally and stuff just happens as I wish. Sure, it took time to set up and code, but now it’s done; outside of an occasional check to make sure stuff is happening as I wished, no more time wasted (for example) exporting the results of a SQL Server query that joins several tables to a flat file that is then FTP’d to one of my Web sites. It happens automagically.
  • Parsers: While parsers are usually a one-off affair (you write one for a specific, unique use), once one is written, it’s done! Just run it when you need an update, or set a crontab entry and you don’t even have to do that. And while parsers are basically unique, once you write a few, they get pretty similar in flavor (grab this file[s], extract such-and-such info, push this data to that). Another tool in the scabbard.
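As a rough illustration of the single-regex approach in the first bullet, here is a deliberately simplified e-mail check in Perl. The pattern is an assumption, not the validator described above, and it is far looser than RFC 2822: it accepts common local-part characters, one or more dot-separated domain labels (each starting and ending with a letter or digit), and a top-level domain of two or more letters.

```perl
use strict;
use warnings;

# Simplified e-mail pattern (an assumption, far looser than RFC 2822):
# local part, "@", one or more domain labels each followed by a dot,
# then a top-level domain of at least two letters.
my $EMAIL_RE = qr/
    \A
    [A-Za-z0-9._%+-]+                                       # local part
    \@
    (?: [A-Za-z0-9] (?: [A-Za-z0-9-]* [A-Za-z0-9] )? \. )+  # domain labels
    [A-Za-z]{2,}                                            # top-level domain
    \z
/x;

sub is_valid_email {
    my ($addr) = @_;
    return (defined $addr && $addr =~ $EMAIL_RE) ? 1 : 0;
}
```

The `/x` flag lets the pattern be laid out and commented piece by piece, which helps when a one-line regex replaces a page of loops.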

Over the past week or so, I’ve finally (finally!) dug into the image-processing tool, ImageMagick.

I first ran across ImageMagick at cars.com; it was used there to batch-process vehicle images from vendors. I ran into some problem there (I cannot even remember what), and looking at the command-line help (on Solaris), I grabbed an e-mail address … and the woman there wrote back before the end of the day. Try that with Adobe!

I ran across it several other times over the years, but never installed it on a home machine and actually messed with it.

As mentioned above, that’s changed.

And ImageMagick is magic.

What a great tool.

I ran across an article about ImageMagick on the IBM site; it finally piqued my interest enough to download and play with.

A week later, I’m still playing with it.

I installed it on my Linux box, and – after some gnashing of teeth (hint: the php.ini file comes with “file_uploads = no” by default) – got it working, and working well.

While ImageMagick is a command-line tool (which is what appeals to me – I like Adobe’s ImageReady with its droplet tool, but I like the command line), I integrated it into a PHP page so I could use it to easily batch process images.

A simple example?

Here’s a large image (scanned in from a slide) that is reduced – in the one upload – to a small (thumbnail) and full-sized image:

Full-size image:
Thumbnail image:

Hey, that’s good stuff. One upload, and the back end resizes the image to my specs and puts the copies in the correct folders. The next steps, of course, are various error-trapping mechanisms and so on, but – once I get this going here – it will make my life so much easier.
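The PHP page itself isn't shown, but the resize step can be sketched in Perl by calling ImageMagick's `convert` tool once per size. This is a sketch under assumptions: the sizes, folder names and the `convert_args`/`process_upload` helpers are hypothetical. `-resize WxH` fits the image inside the box while keeping its aspect ratio, and a trailing `>` only shrinks images larger than the box. Passing a list to `system` skips the shell, so the `>` is not treated as a redirect, and any non-zero exit status is collected as a failure, a first step toward error trapping.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;

# Hypothetical output sizes and folders. "-resize WxH" keeps the aspect ratio;
# the trailing ">" means "only shrink images larger than this".
my %SIZES = (
    full  => { geometry => '640x640>', dir => 'full'  },
    thumb => { geometry => '100x100',  dir => 'thumb' },
);

# Build the convert argument list for one size.
sub convert_args {
    my ($src, $size, $root) = @_;
    my $spec = $SIZES{$size} or die "unknown size '$size'\n";
    my $dest = File::Spec->catfile($root, $spec->{dir}, basename($src));
    return ('convert', $src, '-resize', $spec->{geometry}, $dest);
}

# Run convert for every size; returns the sizes that failed (empty = success).
sub process_upload {
    my ($src, $root) = @_;
    my @failed;
    for my $size (sort keys %SIZES) {
        # List-form system() runs convert directly, with no shell in between.
        push @failed, $size if system(convert_args($src, $size, $root)) != 0;
    }
    return @failed;
}
```

A caller would then do something like `my @bad = process_upload($uploaded_path, $image_root); warn "resize failed for: @bad\n" if @bad;`, which also catches the case where `convert` is not installed at all.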

I’ll wonder how I got by without it.

Again, tools…

Did We Just Get Smarter???

In reference to the previous entry (of SunnComm threatening to sue a student for showing the company’s anti-piracy measures could be defeated by using the SHIFT key):

Just posted on C|Net, the company has told news.com that it will not sue the student.

The company said it will release a statement later today with more details, but it sounds like they backed down in the face of all the (not-so-good) publicity they were getting for the threat of litigation.

And really, how stupid do you have to be to sue someone for showing that your so-called security measures – which probably cost a pretty penny – can be defeated by a keyboard and an IQ of 10? Why not just keep it quiet and try to fix it?

Also, there was probably some pressure from other parties (RIAA??) not to pursue this – a case like this, a challenge to the DMCA over a SHIFT key, had a good chance of going against the DMCA, which would be a bad precedent for the RIAA and other such organizations.

OK, one stoopidity down; a zillion to go…

Amerika Just Got More Stoopider

OK, so there’s this new anti-piracy CD out. The anti-piracy measures were encoded into Anthony Hamilton’s “Comin’ From Where I’m From,” which was released last month (no, I’ve no idea who Anthony Hamilton is….).

On Tuesday (Oct. 7), I saw a news.com article detailing how a Princeton student had figured out how to disable this anti-piracy measure using the hi-tech method of … holding down the SHIFT key while loading the CD (this disables autorun, so the anti-piracy program cannot run).

I laughed when I read it, but I thought to myself, “Someone is going to sue this guy for this…”

Yesterday, on Kuro5hin I read a (humor) story about how the Bertelsmann Group (parent of whatever company that released said CD) was suing the student based on DMCA violations.

Judging from the comments on the story, a lot of people were at least initially sucked in because, hey, it could happen. Look at all the other silly (to me) suits filed because of alleged DMCA violations and how crazy the RIAA is about file swapping/pirating and so on.

Sure, it’s a stretch – the SHIFT key, fer god’s sake – but it could happen.

*SIGH*

Guess what?

That’s right.

It happened.

The company that developed the anti-piracy measure – SunnComm Technologies Inc. – is planning on suing the student, potentially for both DMCA violations and damaging the reputation of the company (its stock has dropped as a result of the disclosure).

So potentially both civil (reputation damages) and criminal (DMCA violations) suits.

Super.

First of all, this is moronic, and a new low in hi-tech lawsuits.

Secondly, I love that SunnComm may sue for damage to their reputation. Here is a quote from a Reuters article:

“SunnComm believes that by making erroneous assumptions in putting together his critical review of the MediaMax CD-3 technology, Halderman came to false conclusions concerning the robustness and efficacy of SunnComm’s MediaMax technology.”

— SunnComm statement

How can a system be robust and efficacious if it can be disabled by that considerably low-tech circumvention of holding down a SHIFT key???

Are these guys morons?

(Hint: Yes! – see next graph!)

In the original Cnet story on the student’s findings, here was the record company’s (BMG) spokesman’s response to the finding:

“This is something we were aware of,” BMG spokesman Nathaniel Brown said. “Copy management is intended as a speed bump, intended to thwart the casual listener from mass burning and uploading. We made a conscious decision to err on the side of playability and flexibility.”

— BMG spokesman Nathaniel Brown

So, they knew about it – and admitted as much to the press – yet are still considering multiple suits against the evil-shift-key-finder.

So, they knew about it, yet still tout their system as robust.

Uh, can you say, “We f***ed up?”

And a couple of other random notes about this issue:

  • SunnComm says they knew about the SHIFT key issue, and their protection was a speed bump of sorts, to deter the casual pirate. OK…which means that the REAL pirates – the ones the RIAA etc. are so worried about – are the ones that SunnComm expects to be able to figure out the circumvention method? Shouldn’t they be more worried about these bad-guy pirates than about the casual user?
  • And once ANY CD is compromised, it’s ripped and then it’s in the wild. And then anyone can get to it, so all the copy protection in the world is too late, correct? (Closing the barn door after the horses have fled or whatever the phrase is…)
  • While probably covered by some stupid EULA, isn’t there an issue with loading a program – not just the music – on a computer? Sounds like spyware. And I wonder if this program works on all platforms? And what possible damage this program could do (including simple privacy invasion potential…)

Sounds like SunnComm is trying to spin this issue to make themselves look less idiotic.

Guess what?

Not working…