Data Diving

Coming from a background that includes print publications (magazines), I find the ability to gather data about a Web site’s use so refreshing.

Back in the print days, an effort was often made to find out just what the readers really wanted, as well as to strike a balance between what the readers wanted and what they needed.

However, this was usually an anecdotal method – talking with readers at trade shows or on the phone (for whatever reason) and trying to get some sense of what’s working and what isn’t. Gives one an idea, but not exactly science.

Even the more empirical efforts – readers’ surveys – suffered from not one but two Achilles’ heels:

  • Small sample size, and
  • People lie on any survey – intentionally or not. Get over it.

So, at best, we were shooting in the semi-dark over what readers wanted/needed. Which is frustrating.

With the Web, however, the reality is often the inverse: Via user comments, e-mails and – especially – server logs, there is a mass of data that clearly states what the user actually is seeking out. The trick, of course, is to cut through this mass of data to identify the nuggets.

This comes to mind because of the Top 10 lists I’ve added to the site. The referrers are particularly fascinating:

  • First off, the Top 10 area gets more referrer traffic on my site than any area besides the blog (after the blog proper, the Gallery section gets the most overall traffic).
  • Big surprise: The top referrer – by a wide margin – is Google. Duh.
  • The search term that most often brings users to my Top 10 lists is the author Robert Coover, whose Pricksongs and Descants appears on the Short Story list. Most often it’s a search for Coover and/or his story The Babysitter. I don’t know if it’s because not many people know Coover (so there are only a handful of places to find him compared to, say, John Updike) or if there is a greater interest in Coover than I’m aware of. Interesting.
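For the curious, here is roughly what that kind of digging looks like. This is a minimal Python sketch, not the code behind this site: it assumes an Apache "combined" format access log and assumes Google passes the search terms in a `q` query parameter. The sample log lines are made up for illustration. It tallies the referring hosts and the Google searches that led to each hit.

```python
import re
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Simplified pattern for Apache "combined" log lines (an assumption about the log format):
# host ident user [time] "request" status size "referrer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def tally(lines):
    """Count referring hosts and Google search terms in combined-format log lines."""
    hosts = Counter()
    searches = Counter()
    for line in lines:
        m = COMBINED.match(line.strip())
        if not m or m.group("referrer") in ("", "-"):
            continue  # unparseable line, or a direct hit with no referrer
        ref = urlparse(m.group("referrer"))
        host = ref.netloc.lower()
        hosts[host] += 1
        if "google." in host:
            # parse_qs decodes "+" and %-escapes; "q" is the assumed search parameter
            for term in parse_qs(ref.query).get("q", []):
                searches[term.lower()] += 1
    return hosts, searches

# Made-up sample lines for illustration only.
SAMPLE = [
    '1.2.3.4 - - [30/Jan/2004:10:00:00 -0600] "GET /lists/short.html HTTP/1.1" 200 5120 '
    '"http://www.google.com/search?q=robert+coover+babysitter" "Mozilla/4.0"',
    '5.6.7.8 - - [30/Jan/2004:10:05:00 -0600] "GET /lists/short.html HTTP/1.1" 200 5120 '
    '"http://www.google.com/search?q=Robert+Coover" "Mozilla/4.0"',
    '9.9.9.9 - - [30/Jan/2004:10:07:00 -0600] "GET /blog/ HTTP/1.1" 200 2048 "-" "Mozilla/4.0"',
]

hosts, searches = tally(SAMPLE)
print(hosts.most_common(1))   # [('www.google.com', 2)]
print(searches.most_common())
```

Against a real log, something like `tally(open("access.log"))` (hypothetical filename) would do the same job. The regex is deliberately simple, so lines in other log formats are just skipped rather than misread.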

Data diving. It’s not just for breakfast anymore.

The Shortest and Longest Month of the Year

February – even without this year’s Leap Day – is always the longest month of the year, at least in these latitudes.

Past the hope of a White Christmas and slogging through the inevitability of a cold, snowy January, February leaves us looking toward Spring.

Which just ain’t coming.

That said, here are some things to do to help pass the time:

  • Watch Groundhog Day – You don’t think that Bill Murray can act? Watch this and enjoy. And you think your job is boring and repetitious?
  • Wonder just what the heck SCO is talking about – In light of the appearance (and threat) of the myDoom virus, SCO (note the new URL) has been giving conflicting stories (here, here and here). So what really is the story over there??
  • Get over it – Spring is far from sprung. Do all those indoor chores you shun when it’s too hot or too nice outside to stay indoors.

Linux Faithful Still Getting It Wrong

As the continuing saga of the myDoom virus/worm unfolds, the Linux faithful are sticking with their knee-jerk reaction that this cannot be the work of an OSS user/sympathizer. (See the preceding blog entry for more detail.)

The latest salvo? LinuxWorld – which should know better – has reprinted an article from the Moscow Times that says the virus has been traced back to a Russian ISP.

OK.

But the article is reprinted with this article summary (provided by LinuxWorld):

In a story that would completely exonerate the Linux community, accused by SCO of perhaps being behind this week’s e-mail virus, the Moscow Times is carrying a story this morning that the first e-mails infected with MyDoom [trace] back to addresses with Russian Internet providers.
MyDoom Comes From Russia With Hate, Moscow Times Confirms, Jan. 30, 2004

How does this exonerate the Linux community?

Let’s look at the facts as they exist:

  • The virus appears to have originated from a Russian ISP.
  • The virus has many functions; it appears to be predominantly coded to install trojan spam engines. It also targets (depending on variant) either SCO or Microsoft for DDOS attacks. It seems to at least try to install a keylogger, as well.
  • It targets Windows boxes only.

OK. So how does this Russian connection/spam-zombie reality exonerate the Linux community?

Let’s look at some other generalities that may come into play with this situation:

  • Most viruses target Windows, for two reasons: 1) Largest installed base, so best bang for your viral buck, and 2) As the Top Dog in software, MS is a target. It would be the same if Sun or Novell or Apple were software king. So myDoom targeting Windows is nothing exceptional.
  • Why would a virus target SCO for a DDOS? Really, the primary reason would be a grudge against SCO’s anti-Linux lawsuits. There could be some personal reason that the virus writer targeted SCO (old girlfriend works there…), but this one makes the most sense as a rule of thumb. So the writer is probably an OSS sympathizer.
  • That the virus was written for Windows does not mean that the virus writer likes MS apps; it means the opposite. Yet I’ve seen many comments that “Linux coders would never write for Windows….” Well, not for profit, maybe, but for destruction?? And would Windows coders – black hats or not – really care much about SCO? Why would a VB/C++ coder care about all this Linux/OSS stuff? If they were rabid MS fans, they’d probably welcome the SCO actions, not attack SCO.
  • The virus originated from a Russian ISP. Some facts:
    • The writer could be Russian or not.
    • The writer could live in Russia or not.
    • The writer appears skilled; this virus could have been written anywhere and just hit the Internet via this Russian ISP (spoofed or real).
    • Russians are part of the Linux community – which is a global community.
    • Russians – especially Russian OSS fans – are likely to be just as annoyed at SCO’s efforts against Linux as I am (I’m in the US).

So – again – how does the knowledge that the virus seems to be a spam bot and coming from Russia exonerate the Linux community?

The fact that the virus targets SCO – again, why??? – means that a skilled programmer wrote a spam bot that has an easter egg that nails SCO. Just for kicks.

Why SCO? Why not Amazon, Excite or some other higher-profile site? Because there is a grudge of sorts against SCO.

All the unfolding information appears to tell us is that this virus’ primary intent is not to thumb its nose at SCO.

That’s just gravy.

And it still points, sadly, to a Linux sympathizer behind the code. This does not mean the community condones such acts – for the most part, they deplore this and other similar acts – but it does mean that there is at least one OSS fan out there who has an active agenda against SCO.

Linux Faithful Get It Wrong

As you probably know by now, the nasty myDoom (or pick your pseudonym) virus carries a payload that, among other things, attempts to DDOS the SCO Group’s web site.

When this payload was first discovered, SCO and others said this was probably a disgruntled Linux person who was targeting SCO as payback for the litigation-happy company’s anti-Linux lawsuits. SCO even offered a hefty bounty to get the author of the virus.

MessageLabs has recently announced that the virulent code probably originated in Russia.

OK.

But – for reasons that escape me – there seem to be a lot of Linux folks out there who are saying that SCO and others owe the open source community an apology. Why? Here’s why, as Pamela Jones, webmistress of Groklaw, put it:

MessageLabs has announced that the MyDoom virus originated in Russia. That pretty much rules out any Linux enthusiast trying to get back at SCO, as far as I can see. Nobody in Russia cares about a legal case in the US that won’t affect them one bit. It looks like spammers and worse trying to shift the blame to cover the other ugly things this virus does, because it tries to install a keylogger to get your credit card and other such details, according to Symantec, something no Linux person has ever been involved in to the best of my knowledge….It appears somebody needs to apologize to somebody for leaping to ugly conclusions about the Linux community. [emphasis added]

— Pamela Jones on Groklaw, 01/24/2004

Slashdot – with opinions all over the place every day – had a similar thread.

I don’t get this – while the virus writer may be trying to better obfuscate his tracks by giving hints that this is just an anti-SCO virus, why does the writer living in Russia rule out an OSS person doing the dirty work?

  • Just because the virus began in Russia, does it mean it was written in Russia? Was it written by a Russian?
  • The SCO case affects everyone who uses/loves/wants to defend Linux. And these folks are all over the world. Hell, it was started by a Finn (Linus…).
  • While it’s true that SCO’s (many) lawsuits focus on US companies/users, there is no reason to expect it to end there. SCO has indicated it may go overseas to sue, as well. No one is safe.
  • All lawsuits are not equal. If SCO somehow managed to win this one against IBM, for example, the precedent set would ripple around the globe. And it would directly damage one of the largest tech companies in the world.
  • Frankly, the SCO suit is more of a nuisance (or tragicomedy) for people like me – who have home boxes running Linux but no business plans that are dependent on it as it currently exists (OSS). Ditto for the folks in Russia/Denmark/Brazil and so on. Yet I’m still steamed at SCO – so might a (more talented) programmer in this or any other country. Again, Linux is global. Any attack – in whatever country – on Linux manifests itself in some manner as a global attack. That’s the reality; get over it. So a lone OSS dude in Russia could well be holding a grudge against SCO. Why is that so unthinkable?

Let’s keep hoping that it wasn’t an OSS fan that did this, but the virus originating in Russia does little to prove that the author was or wasn’t just holding a grudge against SCO for its anti-Linux tactics.

You just can’t say.

So there is nothing to apologize for, just as there is no reason for people to claim that the virus was the work of a disgruntled OSS developer.

Overall, I’m very disappointed in the OSS reaction to this latest, Mother Russia, development of the myDoom virus. I thought we were bigger than that.

*sigh* I guess Linux is growing up…

The biggest disappointment, to me, is Groklaw’s jumping on (helping create?) this bandwagon. This is a site that is the SCO anti-FUD. It’s dedicated to – and has done an exceptional job of – cutting through the SCO/MS FUD and giving the facts, gathering information in a practiced, deliberate manner.

Just as SCO’s saying that it has identified infringing code in Linux proves nothing, figuring out that the virus originated in Russia proves nothing.

Send Lawyers, Guns and Money…

I’ll bet the scientists and technologists slaving away in the early days of DARPA and at CERN never realized the bonanza this new-fangled Internet thingee would bring: Yes, lawsuits.

Now, I understand that – after football and overeating – litigation is The American Pastime. Yet it seems that, even for Americans, the tech industry is lawsuit happy.

Maybe it’s just because I’m closer to this industry than others, but I doubt it: This is my third career, and my previous one was as a journalist for trade publications, where each new job required learning a new industry. And, for a journalist, litigation is, of course, something compelling to write about.

Yet I don’t recall the flurry of legal action like I see every day in the tech industry. What the hell is going on? Is it just that the concept of bits vs. atoms (digital delivery vs. packaged products) is disrupting established industries, is it the way the Patent Office hands out software patents – unheard of only a couple of decades ago – like they were spitting out candy from a Pez dispenser, or what?

Look at some of the high/low lights of recent days/months:

  • Everyone is suing Microsoft, including the distasteful Eolas case.
  • SCO is suing everyone – they’ve abandoned their mission as a software/services company and turned into a full-time litigation factory. On the flip side, others have sued (Red Hat) or are considering litigation against SCO.
  • The RIAA continues to file lawsuits against file sharers, this time as “John Doe” cases.

And somehow I think it’s going to get worse before it gets better. I’m not a fan of this.

Why Linux Matters

READING:
Plainsong
Kent Haruf

Haruf’s book was nominated for a National Book Award, but the book – while well done and an interesting read – doesn’t do anything special for me. Weaving together the disparate lives of a half-dozen or so inhabitants of a small agrarian Colorado community, the story never meshes enough to make it compelling.

I devoted an earlier entry to Why Microsoft Still Matters; this is the flip side I had planned on writing immediately afterwards, but just didn’t get to. Linus won’t mind the delay.

By Linux, I mean the product of Linus Torvalds and his minions. Much of what I say can also apply to Unix products – actual Unix products (AIX, Solaris) or, like Linux, non-certified Unix-like products, which include MS-DOS (!). The emphasis is on Linux itself, but the generalizations are of Unix-like products, which I will refer to as Unix for the sake of simplicity.

Also, this list will, in many ways, compare/contrast Linux to Microsoft. This is only natural, as MS is the 800-pound gorilla, the yardstick against which one must compare other similar products or processes. But this does not mean that it’s Linux or MS – as Linux has taught us, something can come out of nowhere and give any given gorilla a run for its money (figuratively and literally).

Without any further ado, some reasons that Linux matters:

  • Software is becoming a commodity: Microsoft has seen this coming, which is one of the reasons it attempts to lock users in with proprietary tools/code. However, like the file-sharing issues that are keeping record-company execs and movie moguls up at night, the reality is that there is a shift away from centralization to ubiquity. This has profound implications for Microsoft’s model, for example, and favors the work of Linux, which is so strongly decentralized. At the same time, this commodity nature means that software must be portable, so it can be embedded in the next widget that comes along. Linux is perfect for this; MS’s offerings are not.
  • Linux has grown up: We’re finally at the stage where even Linus says that this year – 2004 – Linux will start to become desktop ready. Additionally, thanks to the support of companies such as IBM and Novell, Linux has moved from old 386 boxes to multi-processor supercomputers. The support of the traditional companies, such as IBM and HP, also means that it is easier to get Linux into the workplace. Before, the geeks would just sneak it in under the radar: The company intranet was run as a LAMP deployment, but the CEO would swear the company was an all-MS house.
  • Linux is transparent: You can see and modify the code. Thus you, as a company or developer, can work with the OS’s hooks to create new apps/tools. With Apple or MS, there is a complex series of contracts, cross-patent licensing and other non-computer work that all but guarantees the product never launches.
  • Unix is scriptable; MS is not: This is potentially my biggest gripe with MS products: While a limited amount of scripting is possible (batch files, the scheduler and so on), Windows does not have the robust scripting capabilities of Unix. In an age where computers have come out of the clean rooms and are on every desk doing every imaginable task, a strong toolkit is a time-saver. While MS makes great tools and allows some control through easy-to-use GUI tools and wizards, Unix scripting tools (crontab, tar, piping) allow a user with a little bit of experience to automate menial tasks. The best example I can think of is a simple backup: Ask the average user how to do this on Windows. Huh?? On Linux, write a small script that tars and zips the files and moves the archive to a backup directory/machine. Schedule the script via crontab and never worry about it again. Also, MS’s scheduling is strange: I have several tasks on SQL Server running every day, yet they don’t show up in the master scheduler; you have to know that the task is scheduled (and detailed/editable) in SQL Server. Ouch.
  • The SCO Fiasco: While there is an obvious strong negative to the SCO Group suing everyone and their mothers over Linux (follow the almost daily lunacy on Groklaw) – it makes the wary shy away from Linux – there are a few strong positives to come out of this (ongoing) mess:

    • It’s pulled the software community together. With the exception of Sun, MS and – to a degree – HP, everyone is pulling together on the side of Linux. And much is done without the often-counterproductive Slashdot-type efforts/remarks.
    • No publicity is bad publicity. Witness SCO’s stock price, which is still well above its level of a year ago even as SCO is getting discredited. On the other side of the public courtroom, Linux is getting plastered into every news story. Never heard about/know much about Linux before? Now you do, and – guess what? – you’re getting interested in what it can do…
    • SCO would not sue to win … nothing. Much like MS beginning to attack Linux, the SCO Group’s lawsuit legitimizes Linux to a degree: If it wasn’t of any value, why sue? But there is high perceived value. That helps Linux in that respect.

  • Linux is extensible: This is a reinforcement of some earlier points, but bears the emphasis. Because Linux is open, it can be easily extended. This means that more products will be made with/for it, and the base product improved in unexpected ways. While this lower barrier to entry will mean that a lot of dross will be created (witness the abandoned projects at Sourceforge.net), it will also open doors that – in a MS-centric world – would otherwise stay closed. In a MS-centric world, projects have to be approved by committee, with extensive research and so on, so the products are a lock to be winners: Like MS Bob and Clippy…
  • Linux is not designed to make money: Yeah, this one drives the Microsofties crazy. But I see Linux much like the Internet: It’s a tool to do other stuff. Imagine if the Internet had been heavily regulated, taxed and so on. We’d never have Amazon, online banking and reservations, blogs and more. Linux works the same way – it’s an open system that allows you to leverage it in the way you want, which just may be as a for-profit tool/service (think Google and their thousands of Linux boxes).
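The backup example from the scripting point above might look something like this. It’s a sketch in Python, standing in for the shell script I describe, using the standard `tarfile` module to tar and gzip a directory into a dated archive. The function name and any directory names are hypothetical.

```python
import tarfile
import time
from pathlib import Path

def backup(source_dir, backup_dir):
    """Tar and gzip source_dir into a dated archive inside backup_dir; return its path."""
    source = Path(source_dir)
    dest = Path(backup_dir)
    dest.mkdir(parents=True, exist_ok=True)   # create the backup directory if needed
    stamp = time.strftime("%Y%m%d-%H%M%S")    # e.g. 20040130-023000
    archive = dest / f"{source.name}-{stamp}.tar.gz"
    # Mode "w:gz" writes a gzip-compressed tar, the same result as `tar czf`.
    with tarfile.open(str(archive), "w:gz") as tar:
        # Entries start at the directory's own name rather than its full path.
        tar.add(str(source), arcname=source.name)
    return archive
```

To make it set-and-forget, call it from a tiny script, e.g. `backup("/home/me/documents", "/mnt/backup")`, and add a crontab entry such as `30 2 * * * python3 /home/me/bin/backup.py` to run it nightly at 2:30 a.m. (all paths hypothetical).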

Sure, this is an incomplete list – all such lists are. I didn’t even mention that, now that more and more tasks are getting computerized (and potentially exposed to the Internet), security is a real issue, as is stability. Linux has them; MS doesn’t.

And so on.

Pick your tool; make the most of it. Both Linux and MS matter; however, the balance of power is shifting toward – toward, not to – Linux. I don’t see this changing in the near future; I actually expect this trend to accelerate.

Technology Predictor Success Matrix

I wrote several days ago about a great series Tim Bray has going on his blog, which builds/evaluates a Technology Predictor Success Matrix.

Please read the series (it’s worth it), but note that the predictor that seemed to hold up best for a variety of technologies was the 80/20 Point: This is the point where you’re enjoying (roughly) 80 percent of the benefits after only 20 percent of the work.

While an imperfect indicator, it was the strongest of all Bray examined.

Thoughts?

More List Warm Fuzzies

I’ve spent a good part of the last couple of days creating the code and populating the List of Lists pages.

It’s been fun, and a learning experience.

The last code enhancement is to dynamically create a pulldown (onChange) menu that will take one to whatever page is selected – remember, these are static pages that are written out. Pretty cool, and – really – not that hard to do.
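Here’s the general idea behind that pulldown, sketched in Python rather than the PHP the site actually uses. It’s an illustration under assumptions: the function name, page labels and URLs are hypothetical, and the inline `onchange` handler simply sends the browser to the URL of whichever page is selected. Since the pages are static, markup like this would be written into each page when it’s generated.

```python
from html import escape

def pulldown(pages, select_id="list-jump"):
    """Return an HTML <select> that jumps to the chosen static page.

    pages: list of (label, url) pairs, e.g. pulled from the lists database.
    """
    lines = [
        f'<select id="{escape(select_id)}" '
        "onchange=\"if (this.value) window.location.href = this.value;\">",
        '  <option value="">Jump to a list...</option>',  # blank value: no navigation
    ]
    for label, url in pages:
        # escape() keeps labels like "A & B" from breaking the markup
        lines.append(f'  <option value="{escape(url)}">{escape(label)}</option>')
    lines.append("</select>")
    return "\n".join(lines)

print(pulldown([("Short Stories", "/lists/short-stories.html"),
                ("Movies", "/lists/movies.html")]))
```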

It’s been fun to fill out the lists (this will always be a work in progress); I’d forgotten some of the books that I pulled off the shelf to garner info for; damn, I need more time to read!

People Love Lists

I’ve added a new feature to this blog – a List of Lists.

Basically, it’s a list of movies, books and so on that I find good (or bad, as the case may be).

What’s the point?

Well, there is no point. Just something I was curious to do. Lists such as this are telling: A few items in a handful of categories and you can get a pretty good idea of an individual.

  • Conservative vs. Liberal – Likes Sontag or the movie The War Room, probably liberal. Rates Michael Moore’s books/movies/TV as overrated crap – probably conservative.
  • Intellectual vs. Not – Lots of foreign movies? Loves NPR? Probably an intellectual (Note: Intellectual != intelligence; it can equal pretentious). Dude, Where’s My Car? or anything with Chris Farley on favorite movie list? Probably not an intellectual.
  • Techie vs. Not – Has a Computer Book or Web Sites category? Techie. Lists “the Internet/Web/e-mail” as overrated or annoying? Probably not.

This list is a work in progress; items/categories are not carved in stone (in bits, ya know?).

Another tool developed: a PHP/MySQL back end that pushes static pages to the front end (this site).
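A rough picture of that push-to-static setup, again in Python instead of PHP, with the standard `sqlite3` module standing in for MySQL. The table names, column names and sample row are hypothetical; the point is the pattern: query the lists from the database once, render each category to a plain HTML file, and let the Web server hand out those files without touching the database.

```python
import sqlite3
import tempfile
from html import escape
from pathlib import Path

def demo_db():
    """In-memory stand-in for the MySQL back end; schema and rows are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT, slug TEXT);
        CREATE TABLE items (category_id INTEGER, title TEXT, rating TEXT, sort_order INTEGER);
    """)
    conn.execute("INSERT INTO categories VALUES (1, 'Short Stories', 'short-stories')")
    conn.execute("INSERT INTO items VALUES (1, 'Pricksongs and Descants', 'good', 1)")
    conn.commit()
    return conn

def publish(conn, out_dir):
    """Render one static HTML page per category and return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    categories = conn.execute("SELECT id, name, slug FROM categories ORDER BY name").fetchall()
    for cat_id, name, slug in categories:
        items = conn.execute(
            "SELECT title, rating FROM items WHERE category_id = ? ORDER BY sort_order",
            (cat_id,),
        ).fetchall()
        rows = "\n".join(f"  <li>{escape(t)} ({escape(r)})</li>" for t, r in items)
        page = (
            f"<html><head><title>{escape(name)}</title></head><body>\n"
            f"<h1>{escape(name)}</h1>\n<ol>\n{rows}\n</ol>\n</body></html>\n"
        )
        path = out / f"{slug}.html"
        path.write_text(page, encoding="utf-8")   # the static file the site serves
        written.append(path)
    return written

with tempfile.TemporaryDirectory() as d:
    for p in publish(demo_db(), d):
        print(p.name)   # short-stories.html
```

Pointing this at MySQL would mean swapping in a MySQL driver’s connect call, and most such drivers use `%s` rather than `?` placeholders.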