
Recent musings

The promise of html5 and low-hanging fruit

I don't have enough time in a month to get one third of what I would like accomplished. I've been brainstorming in various contexts, and I've decided to start giving ideas away wholesale. I have had no problem thinking up ideas in the past, and don't foresee any problems in the near future. I may as well start putting them out there for anyone to pick up and use.

I spent too long this afternoon browsing through the HTML5 working draft, and found some nice things in there, such as the pre-defined BibTeX vocabulary. At first, it's unsettling that HTML5 can exist as well-formed XML, in a decidedly non-XML form, and at every step in between, but given the right tools, that might make for interesting opportunities.

Many forms, one document

For example, from what I understand so far, the following two documents will parse to the same DOM5 HTML tree (whitespace text nodes aside), thanks to optional tags:

<!DOCTYPE html>
<title>Sample Document</title>
<meta name=keyword content=example>
<h1>Sample Document</h1>
<p>Hello<br>World!
<table><tr><td>a<td>apple<tr><td>b<td>banana</table>
<p id=done>I'm done now.
and
<!DOCTYPE html>
<html>
  <head>
    <title>Sample Document</title>
    <meta name='keyword' content='example'/>
  </head>
  <body>
    <h1>Sample Document</h1>
    <p>Hello<br/>World!</p>
    <table>
      <tbody>
        <tr><td>a</td>
          <td>apple</td></tr>
        <tr><td>b</td>
          <td>banana</td></tr>
      </tbody>
    </table>
    <p id='done'>I'm done now.</p>
  </body>
</html>

On one hand, it's faintly alarming. On another, it starts to look kind of cool. (On a third hand, is it old news? I don't think HTML5 approaches this in quite the same way that HTML4 could sort of be in XML-ish form.) I could be as terse as is legal when authoring a document, then serialize it to a canonical well-formed XML document, and then use the end product in my XML toolchain of choice, whether for storage or transformation or editing.

Dear someone looking for something to do — make this tool for the world: a full HTML5 parser that serializes to well-formed XML. Replace all entities (except the necessary five — or even two) with their Unicode equivalents.
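To make the request a little more concrete, here is a minimal sketch in Python of the general shape such a tool might take. It is nowhere near a full HTML5 parser: it leans on the standard library's html.parser, knows only a small illustrative subset of the optional-end-tag rules, does not infer html, head, body, or tbody, and wraps its output in a root element so that a fragment comes out as a single well-formed tree. The names XmlSerializer, IMPLIED_CLOSE, and to_xml are made up for this example.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Void elements never take an end tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

# Illustrative subset of the optional-end-tag rules: opening the key tag
# implicitly closes an open element named in the value set.
IMPLIED_CLOSE = {
    "p": {"p"},
    "li": {"li"},
    "td": {"td", "th"},
    "th": {"td", "th"},
    "tr": {"td", "th", "tr"},
}


class XmlSerializer(HTMLParser):
    def __init__(self):
        # convert_charrefs=True turns &eacute; and friends into Unicode text.
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of serialized XML
        self.stack = []  # names of currently open elements

    def _close_top(self):
        self.out.append("</%s>" % self.stack.pop())

    def handle_starttag(self, tag, attrs):
        # Close elements whose end tag was omitted in the source.
        while self.stack and self.stack[-1] in IMPLIED_CLOSE.get(tag, ()):
            self._close_top()
        # Valueless attributes (e.g. "disabled") get their own name as value.
        rendered = "".join(
            " %s=%s" % (name, quoteattr(name if value is None else value))
            for name, value in attrs)
        if tag in VOID:
            self.out.append("<%s%s/>" % (tag, rendered))
        else:
            self.out.append("<%s%s>" % (tag, rendered))
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID or tag not in self.stack:
            return  # stray or redundant end tag: ignore it
        while self.stack[-1] != tag:
            self._close_top()
        self._close_top()

    def handle_data(self, data):
        # Only &, < and > are escaped on the way out.
        self.out.append(escape(data))


def to_xml(fragment, root="body"):
    """Serialize a terse HTML fragment as well-formed XML under one root."""
    parser = XmlSerializer()
    parser.feed(fragment)
    parser.close()  # flushes any buffered text
    while parser.stack:
        parser._close_top()
    return "<%s>%s</%s>" % (root, "".join(parser.out), root)
```

Feeding it the terse paragraph and table lines from the first sample document above yields explicitly closed p, tr, and td elements and a self-closed br. A real tool would need the full tree-construction rules from the spec rather than this lookup table.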

With that accomplished, you could probably make a big splash by also letting it output HTML that current browsers won't choke on, or convert to HTML4 while retaining semantics by mapping the new elements to div and span.
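One way the div-and-span conversion might work, sketched in Python with the standard library's ElementTree. It assumes the document is already well-formed XML, and it assumes that "retaining semantics" means keeping the old element name as a class; that convention, the function name downgrade, and the short BLOCK and INLINE lists are all choices made for this example, not anything from the spec.

```python
import xml.etree.ElementTree as ET

# Illustrative, incomplete lists of elements that HTML4 does not know.
BLOCK = {"article", "aside", "figure", "footer", "header", "nav", "section"}
INLINE = {"mark", "time"}


def downgrade(root):
    """Rename HTML5-only elements in place, keeping the old name as a class."""
    for el in root.iter():
        if el.tag in BLOCK or el.tag in INLINE:
            classes = el.get("class", "").split()
            classes.append(el.tag)  # remember what the element used to be
            el.set("class", " ".join(classes))
            el.tag = "div" if el.tag in BLOCK else "span"
    return root


doc = ET.fromstring(
    '<body><article class="post"><header><h1>Hi</h1></header>'
    '<p>On <time>Monday</time></p></article></body>')
html4 = ET.tostring(downgrade(doc), encoding="unicode")
```

Here article becomes div class="post article" and time becomes span class="time", so a stylesheet or a later tool can still tell them apart.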

I'd really love this tool if implicit sectioning elements in an outline were converted into explicit section elements. Having easily manipulable outlining sections would enable a lot more tools — or allow you to consolidate many writing tools into one.
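As a rough illustration of the sectioning idea, the Python sketch below wraps each heading and the content that follows it in an explicit section element, nesting by heading rank. It is a deliberate simplification of the spec's outline algorithm: it only looks at direct children of one container, ignores existing sectioning elements, and drops stray text between elements. The function name sectionize is made up for this example.

```python
import xml.etree.ElementTree as ET

# Map heading element names to their rank: h1 -> 1 ... h6 -> 6.
HEADINGS = {"h%d" % n: n for n in range(1, 7)}


def sectionize(body):
    """Return a copy of body with implicit sections made explicit."""
    new_body = ET.Element(body.tag, body.attrib)
    # Stack of (rank, container); rank 0 stands for the body itself.
    stack = [(0, new_body)]
    for child in list(body):
        rank = HEADINGS.get(child.tag)
        if rank is not None:
            # A heading ends every open section of the same or deeper rank.
            while stack[-1][0] >= rank:
                stack.pop()
            section = ET.SubElement(stack[-1][1], "section")
            stack.append((rank, section))
        stack[-1][1].append(child)
    return new_body


flat = ET.fromstring(
    "<body><h1>A</h1><p>a</p><h2>B</h2><p>b</p><h2>C</h2><h1>D</h1></body>")
outlined = ET.tostring(sectionize(flat), encoding="unicode")
```

With explicit sections in place, moving or extracting a chapter becomes a matter of moving one element instead of guessing where a heading's content ends.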

An archive format?

Why do I care about well-formed XML? Well, did you notice the difference in sizes between the HTML parsing and XHTML parsing sections in the HTML5 draft spec?

Working for years in MPEG made me appreciate why we should strive for data longevity. It might be merely an abstract ideal, but it's one of our primary tasks today to be kind to our future selves. If I come across a document in fifteen years, I don't want to have to look up which elements are void elements in order to parse it. But we owe it to ourselves to archive in a format with more structure than plain text, or even an enhanced text like one of countless wiki formats, Markdown, or Textile.

As I make and break websites and leave them online as a form of digital detritus, I've also been thinking a lot about the maintainability and migrability of data. I'm finding it's easier to set up a new CMS than it is to migrate an existing CMS and its data to another machine. I've even considered migrating out of various CMSes by crawling my own websites. Uck.

Influenced quite a bit by Mark Pilgrim's thoughts on The Format, I'm now considering a well-formed flavor of HTML5 as the format for now. It's not as complete as DocBook, but the structural elements are sufficiently complete for 95% of use cases for extended text. And it's more compact. And it'll be trivially viewable.

So, the other itch I'd love someone to scratch is to create an author's profile for HTML5. The HTML5 spec describes what is essentially a delivery format. It has worked hard to separate presentation from semantics, and goes as far as it can in doing so. However, there's a lot in the spec that has very little to do with an article or group of articles connected to form a multi-page resource. I would like a canonical version of an article to carry just the data (and metadata) necessary for the article, and nothing more. It should be self-contained and portable.

From an author's point of view, I would like to concentrate on words and structuring those words. No navigation, no scripts, no unnecessary headers, footers, banners, or columns.

That is to say, the canonical format for the ages doesn't have to be the same as what the user accesses, but they could share the same syntax and semantics. If I'm going to have my work mediated by a CMS, then I want it also maintaining a canonical format for each resource, free from the navigation and eye candy that seem to be necessary to get noticed, and free from existing only in a database that creates a lot of inertia for my data. DokuWiki, as with so many other things, seems to get this right, with its do=export_* modes.

All roads lead to...

That's starting to suggest my next itch to scratch: a CMS that embodies these principles. It can ingest data in various formats (so long as it can be rendered to HTML and potentially carry a little metadata so that an envelope format isn't necessary, it sounds usable to me) and via different channels (REST and some flavor of DVCS feel like my current favorites). It could render these sources lazily, only when the web server misses from a pre-rendered cache, so that most of the rendering machinery stays out of the way most of the time.
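The lazy-rendering part could look something like the Python sketch below. Everything in it is an assumption made for illustration: the function name fetch, the .txt source and .html cache layout, and the idea that the web server is configured to serve files from the cache directory itself and only fall through to this code on a miss. It does not sanitize the slug or invalidate stale cache entries, both of which a real system would need.

```python
import tempfile
from pathlib import Path


def fetch(slug, source_dir, cache_dir, render):
    """Return HTML for slug, rendering only when the cache has no copy."""
    cached = Path(cache_dir) / (slug + ".html")
    if cached.exists():
        # Hit: in the imagined deployment the web server would have served
        # this file directly and this code would never have run.
        return cached.read_text(encoding="utf-8")
    source = Path(source_dir) / (slug + ".txt")
    html = render(source.read_text(encoding="utf-8"))
    cached.parent.mkdir(parents=True, exist_ok=True)
    cached.write_text(html, encoding="utf-8")  # pre-render for next time
    return html


def demo():
    """Fetch the same page twice; return both results and the render count."""
    calls = []

    def render(text):  # stand-in for a real renderer
        calls.append(text)
        return "<p>%s</p>" % text

    with tempfile.TemporaryDirectory() as tmp:
        src, cache = Path(tmp) / "src", Path(tmp) / "cache"
        src.mkdir()
        (src / "hello.txt").write_text("Hello", encoding="utf-8")
        first = fetch("hello", src, cache, render)
        second = fetch("hello", src, cache, render)
    return first, second, len(calls)
```

In demo() the renderer runs once even though the page is fetched twice, which is the whole point: the rendering machinery only wakes up on a miss.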

This architecture isn't miles away from how Blosxom does things, mashed up with Django's FlatpageFallbackMiddleware. The core could be small, fast, and flexible, working with varied storage solutions and template languages. Really, it seems like the core code for this involves routing, knowing when to render, and how to ingest from diverse sources, and very little else.

This final idea obviously doesn't rely upon HTML5, but thinking about data longevity and what a webpage is naturally leads me back down this road. Ooh. How original. Yet another CMS. Well, yes. But right now, I have to shelve the idea for later anyway — or hope somebody comes along and makes this exactly how I imagine it to be.


Scaling K2

Perhaps it's a bit cheeky to diss a CMS on a blog that doesn't use it, but I use WordPress, even if it's not on this site. My "big time" blog is moving house, from a dedicated server that I more-or-less maintain (in my wife's office), to a shared server on TextDrive. Performance certainly has taken a big hit, but I can pretty much guarantee that it can scale to higher traffic better and that it won't be taken out nearly as often or as long by random downtime.

I spent huge swathes of the weekend trying to get the site's theme working in a way that I liked. The K2 theme boasts of being a flexible theming framework, allowing you to do what you like in CSS, and "baking in" support for several popular plugins, lowering the need to tinker with the core theme files even further. The promise didn't quite match reality. K2 is an attractive, clean theme, and I like the fun, AJAXy things you can do with it, but it still has a set structure and strong ideas about how elements are displayed. Unfortunately, I was trying to match a known structure and layout, and had strong ideas about what the design should look like, myself. WordPress was good in allowing complete flexibility, but in the process I completely changed the internals of the infamous WP-loop, meaning I have to do a lot of work if ever I want to upgrade the K2 theme.

(This makes me wonder about plugging XSLT into the WordPress theme hierarchy. You can radically rewrite elements using that. So I'd imagine a master XHTML document being created by the loop, and optionally doing a run through an XSLT stylesheet before being shipped off to the reader. Obviously, I'm a bit swayed by this site's architecture, and it's simply moving the problem to another place, and not entirely avoiding maintainability problems (and creating further performance issues as well). But computer science is nothing if not the art of moving problems around... Anyway, that's an aside to explore later.)

What really let me down, however, was the complexity of the master CSS file for the K2 theme. The default theme has a lot of attractive, complex, subtle things going on, and the CSS is a mess to deal with. First off, if the makers of K2 want to encourage CSS-only styles, they would do well to strip out the unnecessaries in the base CSS, and move the attractive blue Kubrick-2-like theme into a default style of its own. There were far too many settings to override, deeply buried within the CSS inheritance hierarchy (I had to impose my choice of font family at least four times, manually, for example).

Quite contrary to K2's implied promises, I ended up with a cluttered, fairly arbitrary stylesheet that was a series of hacks. I know at least part of the blame falls on my empirical, hack-y CSS coding method. Part of it also falls on the state of CSS, and the tools we have to deal with it; I think there must be a better way. (Does anyone advocate grouping by properties instead of elements? All the elements of the same color could be named once, and then all font styling is done that way, etc. I see flashes of this in exemplary styles, but it isn't consistent. [I see Google brings up one page, originally from 1996...])
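To show what grouping by properties might mean in practice, here is a small Python sketch that takes rules keyed by selector and regroups them so that each declaration appears once, followed by every selector that uses it. The function name group_by_declaration and the sample rules are made up for illustration. One caveat: reordering rules like this can change which declaration wins in the cascade, so it is only safe when the rules don't override one another.

```python
from collections import defaultdict


def group_by_declaration(rules):
    """Turn {selector: {property: value}} into one CSS rule per declaration."""
    grouped = defaultdict(list)
    for selector, declarations in rules.items():
        for prop, value in declarations.items():
            grouped[(prop, value)].append(selector)
    return "\n".join(
        "%s { %s: %s; }" % (", ".join(selectors), prop, value)
        for (prop, value), selectors in grouped.items())


sample = {
    "h1": {"color": "#036", "font-family": "Georgia, serif"},
    "a": {"color": "#036"},
    "p": {"font-family": "Georgia, serif"},
}
css = group_by_declaration(sample)
```

For the sample rules this gives one line naming every element that shares the color and one line for the font family, so the font choice is stated once rather than four times.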

So, since I must count you as one of my dedicated fans (especially if you've read this far), I can point you to the site as a preview, because I'm sure it's not going to be overrun by a Slashdot effect from here: mediadescri.be. I like the URL a lot. I've been sitting on it for the better part of a year.

Symphony, WordPress, Gallery, Flux

So, I discovered that the Symphony CMS is now free. I had heard some nice things about it, and looked at it for a few minutes, when I was considering setting up this website, mostly because of the XSLT architecture. I took another quick look last night, and I think I'm still happy with this Flux CMS. Symphony felt a bit like a prettified Textpattern, with XSLT in place of tags. If it had been released for free perhaps a week earlier, I might have built lindsay.at on that architecture, but right now I have no regrets.

I also stayed up far too late last night getting the skeleton of Ian's site up, using WordPress, Gallery2, and WPG2 linking them, in a subdomain of my site. On one level, it's amazing how the pieces work together so easily. On another, there's a lot of fine-grained tweaking that still has to live in stylesheets (which one, now?!) in order to make things right. I guess it'll be a while before we see an architecture that allows everything to be set up and display and interlock in just the right way.

One thing I would like to take from Symphony is their tumbleblog revival: my old site had one of those in the form of an unordered list on the front page, with pointers to pages updated on my site. Sounds like a helpful, light way of incorporating site news without being overreliant on the blog architecture. Hmm.