RSS Feeds

I’ve added a page to this blog – RSS Feeds – that is less for anyone else than for me.

I’ve been tinkering with RSS and XML for over a year; I build the RSS for this site by hand (a script parses the static index page and writes an RSS file every five minutes).

This is another step.

Basically, I built an RSS parser on my home box that grabs some feeds I like at set intervals, processes them, and uploads the results to the RSS Feeds page.
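To make the "grab a feed and process it" step concrete, here is a minimal sketch in Python (my own setup is PHP plus a shell script, so this is an illustration, not the code that runs here). It assumes a plain RSS 2.0 feed; the function name `parse_headlines` and the sample feed are made up for the example.

```python
import xml.etree.ElementTree as ET


def parse_headlines(rss_text):
    """Return a list of {title, link, description} dicts from RSS 2.0 text."""
    root = ET.fromstring(rss_text)
    headlines = []
    # RSS 2.0 nests items under <rss><channel><item>.
    for item in root.findall("./channel/item"):
        headlines.append({
            # findtext returns None when the element is missing.
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return headlines


# Hypothetical sample feed, used only to exercise the function.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item>
      <title>First headline</title>
      <link>http://example.com/1</link>
      <description>Short summary.</description>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    for entry in parse_headlines(SAMPLE_FEED):
        print(entry["title"], "->", entry["link"])
```

Real feeds are messier than this (RSS 1.0 and Atom use namespaces, and encodings vary), so treat it as the happy path only.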

This is just an experiment to teach me how to do all this – no, I don’t want to be (and technically can’t be) the next Technorati or what have you.

Basically, I want to learn how to use RSS feeds, process same and get results so I can, at some future date, embed a “recent headlines” area in a client’s Web site.

It’s another tool that I can wield; another way to leverage what is out there.

This feed section is strongly beta; here are the good and bad points of the section:

The Good:

  • It works! For all the caveats and so on that are listed below, it pretty much does as designed. Slick.
  • All processing happens locally and is then pushed to the remote (publicly accessible) site. No database hits or what have you for the end user.
  • It was designed with extensibility in mind: rather than processing one given RSS feed, it processes every feed in an array, so I can keep adding to or removing from the list with no code changes.
  • Using a simple JS function and CSS, I display the list of items without descriptions. A toggle is available to show/hide descriptions; defaults to no description (more headlines per inch). Note: Since JS is used, a page reload is not required. Very fast.
  • I cache feeds, so I don’t hit any site more often than once an hour; between fetches, the script just reprocesses the local copies. During testing, I hit Slashdot too often, and I’m now under a 72-hour ban for RSS feeds. My bad.
  • Even on this first cut, the code is documented fairly well – it’s not alpha code – and has a handful of variables that can easily be transferred to a config file of sorts to alter the output. For example, I have a constraint on the number of listings from any given site (currently defaulted to 10). If the site offers more listings than the default, the default wins. If the site offers fewer listings than the default, the site’s listing count wins (duh!). But little things like that are good, especially this early in the process.
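The feed array, the hourly cache, and the listing cap from the list above can be sketched roughly like this. It is Python rather than the PHP I actually run, and the feed names, URLs, cache file layout, and function names are all assumptions made up for the illustration.

```python
import os
import time

CACHE_SECONDS = 60 * 60   # global refresh interval: one hour
MAX_ITEMS = 10            # default cap on listings per feed

# Hypothetical feed list; adding or removing an entry needs no code change.
FEEDS = [
    {"name": "example-a", "url": "http://example.com/a.rss"},
    {"name": "example-b", "url": "http://example.com/b.rss"},
]


def cache_is_fresh(path, max_age=CACHE_SECONDS, now=None):
    """True if the cached copy exists and is younger than max_age seconds."""
    if not os.path.exists(path):
        return False
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) < max_age


def cap_items(items, limit=MAX_ITEMS):
    """Keep at most `limit` items; a shorter feed is returned whole."""
    return items[:limit]


def feeds_to_fetch(feeds, cache_dir, now=None):
    """Return only the feeds whose cached copy is missing or stale."""
    stale = []
    for feed in feeds:
        # Assumed layout: one cached file per feed, named <name>.xml.
        path = os.path.join(cache_dir, feed["name"] + ".xml")
        if not cache_is_fresh(path, now=now):
            stale.append(feed)
    return stale
```

Feeds that are not returned by `feeds_to_fetch` would simply be reprocessed from their local copy, which is the whole point of the cache: the remote site is never touched more than once per interval.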

The Bad:

  • The processing code – all combined – is too much code. This calls this which includes that which writes to this which is FTP’d to there…and so on. First cut; the code works. Now the challenge is to optimize.
  • Right now, it’s built in PHP, with a shell script for the cron. I should build the entire thing in either Perl or a shell script to make it faster.
  • Major character issues – a lot to learn there, but that’s part of the point: to get it as generic as possible so I can roll it out for any RSS feed and have it work.
  • I’d like to add a feature where each feed can have its own schedule – for example, I don’t care if I hit my own site more frequently than every hour. But right now, the global interval is one hour (I can set the global to any time), and I can’t override that value for any given feed – it’s all or none. In the future, this will be important: Some sites will allow more frequent updates, and that should be designed into this sort of app. Why not? Worst-case scenario, I build in functionality that is almost never – or just never – used. It’s there if needed.
  • As my feed list gets larger, I’ll probably have to create some sort of page-level navigation (drop-down form or bullet list) to take users down the page to the feed desired.
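The per-feed schedule wished for in the list above could look something like the sketch below. This is not built yet; it is Python used purely for illustration, and the optional `interval` key, the feed entries, and the function names are all hypothetical.

```python
DEFAULT_INTERVAL = 60 * 60  # global default: one hour, in seconds

# Hypothetical feed list; "interval" is optional and overrides the default.
FEEDS = [
    {"name": "my-own-site", "url": "http://example.com/me.rss",
     "interval": 5 * 60},
    {"name": "some-news-site", "url": "http://example.com/news.rss"},
]


def refresh_interval(feed, default=DEFAULT_INTERVAL):
    """Per-feed interval if one is set, otherwise the global default."""
    return feed.get("interval", default)


def is_due(feed, last_fetched, now, default=DEFAULT_INTERVAL):
    """True if the feed was never fetched or its interval has elapsed."""
    if last_fetched is None:
        return True
    return (now - last_fetched) >= refresh_interval(feed, default)
```

Feeds without an `interval` key behave exactly as they do today, so the override costs nothing when it goes unused.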

But first cut. Damn good for that, I think.

Lots of tweaks needed, but this is at the 80/20 mark already (80% of the functionality with 20% of the work…)