More on Horsepower

I was just reading an (old) blog entry by Jeremy Zawodny about how Yahoo! (where he works) got Slashdotted.

OK, as the commenters pointed out, it was not Slashdotted – taken down by the deluge of requests brought on by a Slashdot post.

But that was Zaw’s point: He always reads about how sites get taken down by the Slashdot effect, and even though Yahoo! got enormous traffic due to the /. post, it was no biggie.

For one box, running FreeBSD and Apache. (ONE box, folks).

His explanation for why other sites go down under high traffic came down to just two points:

  1. Lack of sufficient bandwidth (obviously not a problem for Yahoo!)
  2. Dynamic content

I totally agree with him, but the second point made me think, especially in light of my previous post.

For those of us with no lives (yes, we know who we are…), it’s common knowledge that one of the secrets of Yahoo!’s stability and speed is its use of static content – the top page and other indices are rebuilt on remote boxes every [what time frame?] and pushed to the front-end Web servers.

So it’s all static content. Very fast.

Which totally made sense in the early days of the Web, with the 9600 baud (and less) modems and slower home and server boxes (WOW – a 166MHz Pentium with 96M RAM! That must smoke!!!). Take away as much overhead – one or many ODBC requests – as possible and the site will be faster.

Yet this was, at the same time, the era of the static Web. While Yahoo! was clever in making as much of what would normally be a dynamic site static to improve overall performance, Yahoo! – a search engine (more accurately, for Yahoo!, an indexing site) – was one of the few kinds of site that actually needed to be dynamic.

Back then, the only dynamic parts of sites came via either SSI or Perl CGI scripts. And they were used, generally, for things a static site could not do: page counters, quizzes, creating pages from and storing info submitted through forms. (The one exception was the use of SSI for header and footer files, so site maintenance could be somewhat more manageable.)

Most sites were collections of static, hand-coded pages.

Need to change that header graphic? Search and replace all 415 pages, dude….

Then – sometime around the dot.com boom – the concept of portals emerged as a business model that would make those companies/investors filthy rich.

The key to a portal was personalization: Ralph goes to Excite.com and gets one look/feel/content; Rebecca gets another. Same site.

Suddenly, terms like “sessions” and “database” entered the vocabulary of a lot of non-techie folk.

At the same time, hardware was getting faster, backbones were getting more robust, and connection speeds were getting up to (after the insane standards battle – K56flex vs. x2) 56K! Some folks even had ISDN, which was twice that rate. Whoo hoo!

Suddenly, regular – non-Web-only – sites (think “The City of [your hometown]” Web site) started embracing the personalization trend, both to attract more customers/eyeballs (remember “attracting eyeballs”?) so they, too, could become filthy rich, and to provide a way to better manage the site.

For larger sites, true content-management tools came out (think Vignette); for smaller sites, it was a small developer/development group building a narrowly defined content management toolset.

Either way, a fundamental shift was underway: Static pages gave way to templates, which pulled – at the time of request – the appropriate data from a database to provide that page’s content (or portions thereof).
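
To make the shift concrete, here is a minimal sketch of that request-time model in Python. The database file, table, and column names are made up for illustration, and real sites of the era ran Perl, ASP, ColdFusion, or PHP, but the shape is the same: every page view triggers a query.

```python
#!/usr/bin/env python3
# A minimal, hypothetical sketch of request-time templating: every page view
# runs a database query. "site.db" and its "pages" table are made-up names
# used only for illustration.
import os
import sqlite3
from urllib.parse import parse_qs

def render(query_string):
    slug = parse_qs(query_string).get("page", ["home"])[0]

    # The overhead in question: a database round trip on every single request.
    conn = sqlite3.connect("site.db")
    row = conn.execute(
        "SELECT title, body FROM pages WHERE slug = ?", (slug,)
    ).fetchone()
    conn.close()

    title, body = row if row else ("Not found", "No such page.")
    # The "template": static markup with the dynamic bits dropped in at request time.
    return f"<html><head><title>{title}</title></head><body>{body}</body></html>"

if __name__ == "__main__":
    # Run as a CGI-style script: the Web server passes the query string in the environment.
    print("Content-Type: text/html\r\n")
    print(render(os.environ.get("QUERY_STRING", "")))
```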

We have been moving in that direction ever since.

While the portal concept has died down (it lives on, better positioned, on company intranets), the concept of a database-driven Web site is the norm now (who da heck has a truly static, hand-coded site anymore? Home sites, sure, but few other types).

In fact, it’s gotten a little bit nutty lately: look at the code for some sites (including those I’ve built….) and you see queries tossed around like rice at a wedding. A query to set this, a query to set that…but that’s another rant.

Basically, the Web is currently, overwhelmingly database-driven.

Which gets us back to Zaw’s second comment: Essentially, dynamic pages incur much more overhead than static pages.

Again, agreed.

Yet, with all this bandwidth and turbocharged hardware, do we really care about this anymore? And – even if we do – computers are going to get faster and more people are going to get broadband (which will get “broader” as time goes by…) and so on.

So, should we really care about this? I mean, when is the last time you really worried about file size, image loads, and so on? Sure, that’s hard-wired into us now, but the things I see being done every day are things that would have been red flags not too long ago.

Current Standard Web Practices – or, “what would have gotten your knuckles rapped not too long ago…”

  • Deep table nesting: Dreamweaver is, in my mind, responsible for this. I’ve gotten “templates” from designers with HTML that is so nested even I have trouble figuring it out. A table with a TD with a table making up the cell…and cells in that table each have tables…with rowspans/colspans…..Sometimes, I just look at the output (in a browser) and re-code the HTML. It always comes out cleaner, easier to maintain, and with much less nesting.
  • Massive image pre-loading: This is usually due to loading images used for navigation (both “on” and “off” images). Yes, I know, CSS can handle this; tell that to the designers (and CSS does not work (well) in Netscrape 4.x…kill me now…)
  • Lots of images: While most places run images through ImageReady (it’s like magic…) or a similar tool, nobody thinks twice about putting X images on a page. There is no real examination of the cumulative effect of all those images. Discussions often focus on whether to display (in a catalog, for example) the thumbnail or the full-size image on a page; this discussion is invariably about appearance and the number of catalog entries per page, not about load time.
  • Flash: While a great tool, it’s overused, much like organizations overused fonts when desktop publishing became commonplace. You didn’t need 13 fonts on a page, but now you could do that… And while Flash is vector-based and usually pretty small, folks, it’s the startup that kills you: loading and firing up that plug-in (remember, it’s not native) is the overhead that can cause page-load slow-downs and rendering issues (esp. in IE6, I’ve noticed)
  • Standards? We don’t need no stinkin’ standards! While CSS has gained a foothold, and tools like Dreamweaver have (by default, so users are unaware) enforced standards (closing LI and OPTION tags, for example), most developers are clueless about XML, XHTML, and HTML 4.x. I guess the tools have to catch up.

  • (trust me, I could go on and on…)

Note: This is just what I see happening at a lot of development houses. While progress is, well, progressing (image optimizers are in use, CSS is catching on), there is still work to be done.

So should we care about static (fast) vs. dynamic (not as fast)?

I dunno.

It’s interesting, because at the same time that our hardware/connections are becoming so fast that upgrades make virtually no difference, I’m seeing a trend toward static publishing.

Rather, it’s Yahoo!-style publishing, where dynamic content is written out and then served as static pages.

Case in point: Blogs.

Blogs are a typical content-managed solution: Templates are tweaked for the visual presentation; content is stored in a database; and all content changes are made via a back-end tool.

Very familiar, so far….

But here comes the twist: In most cases (let’s say, the default setting on most blog tools), the data is then written to static pages, which is what is ultimately served up by the Web server.

So while the site is dynamic in that the content lives in a database, the content the end user sees is not pulled from the database at run time.
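
Here is a minimal sketch of that blog-style approach, again in Python with made-up names (the database file, the posts table, and the output directory are all hypothetical): hit the database once at publish time, write flat HTML files, and let the Web server hand those out with no query at request time.

```python
#!/usr/bin/env python3
# A minimal, hypothetical sketch of "publish to static": one query at publish
# time, zero at request time. "blog.db", its "posts" table, and "public_html"
# are made-up names used only for illustration.
import sqlite3
from pathlib import Path

def publish(db_path="blog.db", out_dir="public_html"):
    out = Path(out_dir)
    out.mkdir(exist_ok=True)

    conn = sqlite3.connect(db_path)
    posts = conn.execute("SELECT slug, title, body FROM posts").fetchall()
    conn.close()

    # One static file per post; the Web server just serves these, Yahoo!-style.
    for slug, title, body in posts:
        html = f"<html><head><title>{title}</title></head><body>{body}</body></html>"
        (out / f"{slug}.html").write_text(html)

    # A static index page linking to every post.
    links = "".join(f'<li><a href="{slug}.html">{title}</a></li>'
                    for slug, title, _ in posts)
    (out / "index.html").write_text(f"<html><body><ul>{links}</ul></body></html>")

if __name__ == "__main__":
    publish()
```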

Interesting.

But is it necessary?

OK, for tools like UserLand Radio and the tool I’m currently using (basic Blogger), it’s necessary, because the database does not exist on the Web server: the pages are created in one place (where the database resides) and then pushed to the Web server as static pages.

Works well; makes it tough to integrate “run at run-time” tools, but, yes, snappy.

But – except for the tools that make it necessary to create static pages – are static pages really necessary?

Again, I dunno.