The Semantic Web – Is This Progress?

First off, a disclaimer: This blog entry is not really about the uber-Semantic Web, just a small portion thereof.

For one, I don’t fully understand it – the whole concept.

For another, I don’t care to discuss it today (even if I did fully understand it).

Understand?

Actually, I read a very thought-provoking blog entry by Gina about ridding your URLs of IDs, extensions, etc.

In other words, making them more like English (or whatever language), and less like geekSpeak (spoken at many places, especially /.). A step toward the Semantic Web.

Examples (paraphrasing Gina’s examples):

  • Bad: http://www.somesite.com/blog/index.php?entryID=123
  • Bad: http://www.somesite.com/blog/123/
  • Good: http://www.somesite.com/blog/why_I_blog/

Long story short, she didn’t want to either give away her file structure or bother people with IDs (123 is an identity column value), and she wanted to give a meaningful URL – “why_I_blog” is more meaningful than “123” (agreed!).
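For the curious, here is a rough sketch of how a title could become that kind of URL segment. This is my own guess, not Gina's method (she mentions mod_rewrite, which is a different piece of the puzzle). The function name `slugify`, the `posts` dictionary, and the lowercase-everything rule are all assumptions for illustration, so it produces "why_i_blog" rather than the mixed-case "why_I_blog".

```python
import re


def slugify(title: str, sep: str = "_") -> str:
    """Turn a post title into a URL path segment (hypothetical helper)."""
    # Drop apostrophes first so "Don't" becomes "dont" rather than "don_t".
    cleaned = re.sub(r"['’]", "", title).lower()
    # Collapse every run of non-alphanumeric characters into one separator.
    slug = re.sub(r"[^a-z0-9]+", sep, cleaned)
    return slug.strip(sep)


# Hypothetical stand-in for the blog table: identity column value -> title.
posts = {123: "Why I Blog"}

# Reverse lookup so the readable URL can still find the row by its ID.
slug_to_id = {slugify(title): post_id for post_id, title in posts.items()}

print(slugify("Why I Blog"))     # why_i_blog
print(slug_to_id["why_i_blog"])  # 123
```

Note that the ID does not go away; it just moves out of the URL and into a lookup the site has to maintain.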

As someone who has built his own blogging tool (implemented locally, only), this was a great article.

I didn’t put a whole lot of thought into the way items were stored (or, more correctly, displayed) when I built the tool (building it was too much fun; honest…), but I have given it a whole lot of thought since then.

As in, a lot (plus tax…)

So, this article hit a nerve.

But I still don’t agree.

Here are some of Gina’s arguments, and my responses:

  • “…[.php?id=123 type URLs are] not good because it includes a file extension (.php) which is sure to change when I port my site to Python CGI or Java Server Pages or flat files…” This is a valid concern, but I don’t put too much stock in it. If you can’t do a code sweep to find all links (twixt anchor open/close tags) and replace, say, “.php” with “.jsp”, well, Houston, you’ve got a problem. (10/07 update: One issue that was NOT mentioned is that changing extensions will break bookmarks: index.php is now index.jsp, whereas if both pages were just “index” [sans extensions], bookmarks would still work. Excellent argument for dropping extensions, but Gina didn’t explicitly mention it. And I was too stoopid to realize this…)
  • Expanding on the above: The above issue sort of assumes that the same code will somehow be directly ported from your old ASP site to your new PHP site. Nope. At least, that’s been my experience. Usually, the issue is with the database, pulling data from there. That is the part that should be rock-solid.
  • “File extensions expose technical details of the site’s inner workings” As in, hmm, .php3, when are they going to upgrade? and so on. I have several responses:

    • So what? So people can see that you run on ASP vs. PHP. Big whoop. The dorks (us) will always be able to figure this out, so why hide it?
    • Non-dorks don’t even pay attention to URL structure (past the www.blah.com). File extension? What’s that? Oh, it has to be an HTML page to display! (PHP? No, that’s not HTML…). This is mixing up content types and file extensions. Two different things.
    • The method Gina recommends to hide the extension (mod_rewrite) can also be used to – hey! – rewrite the extension!
    • To a much lesser extent, extensions are a help to the middle users – not the casual users, not the hard-core dorks, but people who work on the Web and find that the “.” extension helps them for whatever reason. Why not?

  • While I agree that the URL that ends with “/why_I_blog” is more user-readable (NOT more user-friendly; see below), I guess I have to say, “So what?”:
    • Who reads URLs? Dorks. Why do English URLs help me? OK, maybe it’s easier to remember “{base URL}/my_thoughts” than “{base URL}/blog/entry/index.php3?entry=123”, but how does that really help me? How many sites do you know down past the base URL? So how exactly is this helpful?
    • Let’s say you do remember the “{base URL}/advertise” link. What if there is one link for, basically, how to advertise on this site, and another for a how-to article on advertising (pretend it’s a PR site)? Yes, the links must be unique. Yes, one has to be called – at some point – something dorky to accommodate all the permutations of “advertise” and so on.
    • OK, it can be helpful semantically if there is English instead of “code” (much like DNS works – www.whatever.com is easier to remember than 12.34.56.78).
    • Repeat the preceding bullet point and substitute any other language for “English.” Whoops. Unicode to the rescue? Integers (id=123) usually translate better than character-based language (better: NOT perfect or directly, by any means. Also depends on your math…)

  • Semantically correct URLs might look nice (especially now, when they are uncommon), but they don’t help with 1) Favorites/Bookmarks (which use the page title, not URL info), or 2) links: how many people actually show the full URL (as in “http://www.somesite.com/why_I_blog”), as opposed to putting the full URL in an HREF and using some phrase (“Why I Blog”) as the link text? Yes, some do, and that would help, but not so much with URLs as long as some of Gina’s (example: “http://www.scribbling.net/how_scribblingnet_freed_itself_from_file_extensions_and_internal_ids” – honest). To be fair, she said she is thinking of trimming all post root URLs to 15 characters or less, but that just introduces the following problems:
    • Limits the number of permutations of links you can have
    • What are the rules for cutting down to 15 characters? The left 15 characters (including punctuation/white space)? Up through the last whole word that ends on or before character 15? What? Or will the user be forced to have unique titles?
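
To make the 15-character question concrete, here is a small sketch of the two trimming rules I can think of. Both function names, the underscore separator, and the sample "advertise" slugs are my own inventions for illustration; I have no idea which rule (if any) Gina would actually pick.

```python
def truncate_left(slug: str, limit: int = 15) -> str:
    """Rule 1: keep the leftmost `limit` characters, wherever that cuts."""
    return slug[:limit].rstrip("_")


def truncate_words(slug: str, limit: int = 15, sep: str = "_") -> str:
    """Rule 2: keep only as many whole words as fit within `limit`."""
    kept = ""
    for word in slug.split(sep):
        candidate = word if not kept else kept + sep + word
        if len(candidate) > limit:
            break
        kept = candidate
    # If even the first word is too long, fall back to a hard cut.
    return kept or slug[:limit]


# The long slug from Gina's example URL.
long_slug = "how_scribblingnet_freed_itself_from_file_extensions_and_internal_ids"
print(truncate_left(long_slug))   # how_scribblingn
print(truncate_words(long_slug))  # how

# Two different (made-up) posts that collide once trimmed.
a = truncate_left("how_to_advertise_here")
b = truncate_left("how_to_advertise_well")
print(a, a == b)                  # how_to_advertis True
```

Neither rule gives a very meaningful result for the long title, and both produce collisions, so something (a suffix? an ID?) still has to break the tie.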

There are other issues, but – on the whole – I don’t see the reason for a semantic URL yet. I like them; I definitely hate long URLs with all sorts of params passed, but I don’t see that much of an issue with “..index.php?id=123”.

But maybe (gasp!) I’m wrong. I’ll have to think about it.

But I’m glad I ran across Gina’s site and saw what she had to say and how she did what she wanted to do.

Makes ya think…