More web app ranting.

February 2nd, 2005

The theory is that one shouldn’t be able to tell what parts of a site are dynamically generated and which parts are static, except that the dynamic parts are dynamic. The W3 Style Guide makes some specific reccomendations. Dates as heirarchy make sense, because they won’t conflict over time, so you can keep the same URL valid forever, without namespace collision with future content. URLs specific to a certain set of server software should be avoided: no cgi-bin, no .cgi, and if your resource could theoretically be served up in a new content type in the future, no file extensions. It may be .html now, but maybe next year .xml will be in vogue, and ten years from now, .xht7 files will be the rage. If you leave it off and let MIME types do the describing.

For a web app framework, this means that whether or not to dynamically generate a page can’t be drawn from the URL in the general case. Should /products/foospray be static or dynamic? It should probably be static, updated when product information changes. Perhaps the output code would filter the page to include information about a user’s shopping cart, but hopefully that could be handled client-side, so that every user accessing the same URL at a similar time sees the same page.

One could probably conclude that locating a web app at a particular URL (or collection of URL space) is not the best option for flexibility. One needs a level of indirection for large sites, so that dynamic requests get handled by the right handlers, and static pages are served as fast as the machine can do it.

In an apache-like web server, I’d like apps to be able to be named independently of their URL, so that something like mod_rewrite can direct requests to the right app. I can imagine this registration resembling registering a remote procedure — after all, most web app frameworks have an entry point that looks remarkably like handle_request(request_object).

Perhaps it would look like this: RegisterApplication shoppingcart /usr/bin/carthandler Route "/products/([0-9a-z]+)(.*)" shoppingcart product=$1 info=$2 URLFor product,info "/products/#{product}#{info}" It’s a little ugly, but I think it would work really well. The URL parsing is handled by the web server, where I think it belongs, but enough context about the request is passed on to the web app that it can make self-referential URLs without too much problem. How the arguments are passed to an application process is grounds for another rant, not this one which is more concerned with routing and URL spaces.

The neat thing about that is that one can re-arrange the URLs in the web server, and assign them a semantic meaning there. If one wanted to develop a standard for configuring these things in a webserver-independent way, that would be nice, but hardly neccesary if the app had enough information to just do the Right Thing.

I think a limitation of having user-exposed keys like URLs is that they are subject to configuration, not just being simple opaque identifiers. It’s not as simple as an auto-numbered field in a database. URLs have to be designed, not just created. Most non-trivial systems are going to have to have some configuration of their URL spaces, because auto-generation just isn’t going to get it right.

In the example above, the instructions for building a URL from a keyed set of information is provided in the configuration. This can be passed on to the app, so that it can generate URLs refering to itself and to others, without major hassle and with a central, well-defined place of control. No need to repeat yourself all over the code, or make undue assumptions (like that /appname/thiing1/thing2 is magically going to be self-referential.)