
Music hack day

I'm planning on going to the Music Hack Day in London in two weeks. I'll be waving the flag for The Echo Nest and their fabulous APIs. There's a lot being said elsewhere about it, but I wanted to send out a special welcome to French and Belgian hackers.

The hack day is being held at the Guardian's offices near King's Cross, London. That puts it just a couple of hundred metres from the Eurostar terminal in London. So, for precisely the price of a round-trip fare to London, you can hop on an 8am train, be fed throughout the event and housed on Saturday night, and return on Sunday evening. Nothing else to worry about. Well worth considering if you're close to Lille, Paris, or Brussels. Oh boy, what I would have given for a weekend like this when I was living in Brussels...

So register right away: the spaces are now filling up fast!

And we can get up to antics like this:


(Which is just the Dissociated algorithm applied to video in sync with the audio, available in the latest versions of the Echo Nest Remix API. In my opinion, it takes the subject from seeming quirky to seeming to fight serious battles with mental health.)


Twitstream

Prompted by a tweet from Simon Willison on Monday, I was intrigued to hear about Twitter’s real-time streaming APIs. In spare moments this week, rather than surfing the web, I found myself working out how to get at the API from within Python, which was… not trivial. In fact, none of the standard libraries seemed to handle the API at all: every HTTP access library waited for the stream to close before handing anything back, which could be forever.
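The heart of the problem is a response that never ends: instead of waiting for the connection to close, you handle each chunk of bytes as it arrives and pull out whatever complete records it contains. Here is a minimal sketch of that buffering step, assuming that records are separated by `\r\n` and that blank lines are only keep-alives; the `LineBuffer` name is hypothetical, not from any library or from my code on GitHub.

```python
class LineBuffer:
    """Hypothetical helper: turns arbitrary socket chunks into complete lines."""

    def __init__(self, delimiter: bytes = b"\r\n") -> None:
        self.delimiter = delimiter
        self._pending = b""  # bytes received after the last delimiter

    def feed(self, chunk: bytes) -> list:
        """Add a chunk and return every line it completes (possibly none)."""
        self._pending += chunk
        # Everything before the last delimiter is complete; the remainder waits.
        *complete, self._pending = self._pending.split(self.delimiter)
        # Assumption: blank lines are keep-alives and carry no data.
        return [line for line in complete if line.strip()]


buf = LineBuffer()
for chunk in (b'{"text": "hel', b'lo"}\r\n\r\n{"te', b'xt": "again"}\r\n'):
    for line in buf.feed(chunk):
        print(line.decode("utf-8"))
```

In an asyncore-style client, `feed` would be called from the dispatcher's `handle_read` with whatever `recv` returned, so each status is processed the moment its last byte arrives.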

A little poking and prodding showed that Twisted was capable of doing it, but that seemed too heavyweight a solution for just hacking about. Then I discovered asyncore, and despite fairly thin documentation and few examples online, it was clearly up to the job. By evening I had knocked together something that worked with Twitter’s basic authentication and with the proxy at work, which pretty much meant creating my own basic HTTP/1.0 client. Not rocket science (I’d done the same thing 14 years earlier in Perl 4), but it took some trial and error, and neither the asynchat library nor any other common library did anything to simplify putting together an HTTP request.
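Putting the request together is the part no library helped with. A minimal sketch of the HTTP/1.0 request such a client writes to the socket, with Basic authentication and optional proxying; the host, path, parameter, and `build_request` name are placeholders of mine, not Twitter’s actual endpoints or the code on GitHub.

```python
import base64
from urllib.parse import urlencode


def build_request(host, path, username, password, params=None, via_proxy=False):
    """Return the raw bytes of an HTTP/1.0 GET request (hypothetical helper)."""
    # Basic authentication is just base64("user:password").
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    target = path
    if params:
        target += "?" + urlencode(params)
    if via_proxy:
        # A proxy needs the absolute URL in the request line, not just the path.
        target = f"http://{host}{target}"
    lines = [
        f"GET {target} HTTP/1.0",
        f"Host: {host}",
        f"Authorization: Basic {token}",
        "",  # an empty line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


print(build_request("stream.example.com", "/sample.json", "user", "pass").decode("ascii"))
```

Speaking HTTP/1.0 also means the server should not reply with chunked transfer encoding, so the body can be read straight off the socket. A proxy that itself demands credentials would additionally need a `Proxy-Authorization` header, which this sketch leaves out.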

I posted things up on GitHub, mostly to get feedback on whether what I was doing was hopelessly misguided and reinventing the wheel. Other than hearing it confirmed that Twisted could do this easily, it turns out no one else was doing this in Python. Cool, because they’re nifty APIs. Twitter seems to have turned into quite the target this year, with Facebook on one hand and Google on the other looking to get in on the action. I think the simplicity, transparency, and speed of the API are brilliant responses to things like Google Wave. It’s very easy to work with, will keep developers around, and is pregnant with possibilities.

Oh, the speed. The fact that there is no apparent latency — a message that I send through my desktop client will appear in the stream before the miniature posting window even closes — makes this tremendously satisfying to work with.

After I kept returning to the project like a bad rash, aided by encouraging words, GitHub followers, and a primal pleasure in seeing words from all around the world spontaneously crawl up my screen, I’ve ended up with a good chunk of code. It’s not brilliantly engineered, but I really enjoyed the process, the way it grew organically as I tried it out in different applications, such as a twistori clone that tracks (and highlights) multiple keywords in the Twitter stream.
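Much of a twistori-style clone comes down to marking the tracked words as each status scrolls past. A minimal sketch, assuming an ANSI-capable terminal; the `highlight` function and the colour choice are illustrative, not taken from the project.

```python
import re

HIGHLIGHT = "\x1b[1;33m"  # ANSI bold yellow
RESET = "\x1b[0m"


def highlight(text, keywords):
    """Wrap every case-insensitive occurrence of any keyword in ANSI colour codes."""
    if not keywords:
        return text
    # Longest keywords first, so "html5" wins over "html" when both are tracked.
    alternatives = sorted((re.escape(k) for k in keywords), key=len, reverse=True)
    pattern = re.compile("|".join(alternatives), re.IGNORECASE)
    return pattern.sub(lambda m: HIGHLIGHT + m.group(0) + RESET, text)


print(highlight("Parsing HTML5 with Python", ["python", "html", "html5"]))
```

The match keeps the original casing of the tweet, since only the matched text is wrapped, never replaced.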

There’s also fixreplies.py, a read-only client that mines your Friends, Followers, Favorites, and conversations to find people, then tracks all public tweets to and from them. While I think Twitter did the right thing in limiting replies, this makes an interesting adjunct to a main, traditional Twitter client. You get a feel for a lot more conversations sliding by. As it’s only text in the terminal, it feels more ephemeral, with far less need to catch up on everything.

Mostly as an exercise to see if it could be done, I also turned that client into a double-clickable script on Mac OS X. Click, and it opens a terminal window and asks you for some information about your account, what sorts of users interest you, and how to pick up to two hundred users whose conversations to listen in on.
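The post doesn’t spell out how fixreplies.py narrows its sources down to two hundred users, so here is one hypothetical way: count how many of the lists (friends, followers, favorites, conversation partners) each user ID appears in, rank by that count, and keep the top 200 for the stream’s follow parameter. The function name and the ranking rule are my assumptions.

```python
from collections import Counter


def pick_users(*sources, limit=200):
    """Rank user IDs by how many sources mention them and keep the top `limit`.

    Hypothetical ranking rule; each source is any iterable of user IDs.
    """
    counts = Counter()
    for source in sources:
        counts.update(set(source))  # each source counts at most once per user
    # Most-mentioned first; ties broken by ID so the result is deterministic.
    ranked = sorted(counts, key=lambda uid: (-counts[uid], uid))
    return ranked[:limit]


friends, followers, favorites = [1, 2, 3], [2, 3, 4], [3]
follow = ",".join(str(uid) for uid in pick_users(friends, followers, favorites))
print(follow)  # → 3,2,1,4
```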

[Screenshot: fixreplies.png]

I do hope that Twitter makes this available for desktop clients and the general public, because it really expands the, er, Twitterverse, and opens up new possibilities in interacting with and getting data from Twitter.


The promise of html5 and low-hanging fruit

I don't have enough time in a month to accomplish a third of what I'd like to. I've been brainstorming in various contexts, and I've decided to start giving ideas away wholesale. I've never had trouble thinking up ideas, and don't foresee any in the near future, so I may as well put them out there for anyone to pick up and use.

I spent too long this afternoon browsing through the HTML5 working draft, and found some nice things in there, such as the predefined BibTeX vocabulary. At first it's unsettling that an HTML5 document can be written as well-formed XML, in a decidedly non-XML form, or anything in between, but given the right tools, that might make for interesting opportunities.

Many forms, one document

For example, from what I understand so far, the following two documents will be identical in DOM5 HTML, thanks to optional tags:

<!DOCTYPE html>
<title>Sample Document</title>
<meta name=keyword content=example>
<h1>Sample Document</h1>
<p>Hello<br>World!
<table><tr><td>a<td>apple<tr><td>b<td>banana</table>
<p id=done>I'm done now.
and
<!DOCTYPE html>
<html>
  <head>
    <title>Sample Document</title>
    <meta name='keyword' content='example'/>
  </head>
  <body>
    <h1>Sample Document</h1>
    <p>Hello<br/>World!</p>
    <table>
      <tbody>
        <tr><td>a</td>
          <td>apple</td></tr>
        <tr><td>b</td>
          <td>banana</td></tr>
      </tbody>
    </table>
    <p id='done'>I'm done now.</p>
  </body>
</html>

On one hand, it's faintly alarming. On the other, it starts to look kind of cool. (On a third hand, is this old news? I don't think HTML5 approaches this in quite the same way that HTML4 could sort of be written in an XML-ish form.) I could be as terse as is legal when authoring a document, then serialize it to a canonical, well-formed XML document, and then use the end product in my XML toolchain of choice, whether for storage, transformation, or editing.

Dear someone looking for something to do — make this tool for the world: a full HTML5 parser that serializes to well-formed XML. Replace all entities (except the necessary five — or even two) with their Unicode equivalents.
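Python's standard library can't do the whole job: `html.parser` doesn't implement HTML5's tree construction, so the implied `</p>`, `<tbody>`, and other tags in the terse document above won't appear (a real HTML5 parser such as html5lib is needed for that). It can still show the mechanical parts, though: replacing entities with the characters they stand for, re-escaping only what XML needs, quoting attributes, and closing void elements. A minimal sketch; the class and function names are hypothetical, and the void-element list should be checked against the spec you target.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Void elements never have end tags; check this list against the spec you target.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class XmlishSerializer(HTMLParser):
    """Re-serialize HTML tag by tag as XML-ish text (no tree construction)."""

    def __init__(self):
        # convert_charrefs=True hands us text with &eacute; etc. already decoded.
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_starttag(self, tag, attrs):
        # Quote every attribute; a bare `disabled` becomes disabled="disabled".
        attr_text = "".join(
            f" {name}={quoteattr(value if value is not None else name)}"
            for name, value in attrs)
        self.out.append(f"<{tag}{attr_text}{'/' if tag in VOID else ''}>")

    def handle_endtag(self, tag):
        if tag not in VOID:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Re-escape only &, < and >; every other character stays literal Unicode.
        self.out.append(escape(data))

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def to_xmlish(html):
    parser = XmlishSerializer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(to_xmlish("<input disabled><p>Fish &amp; chips<br>to go</p>"))
```

Named references such as `&eacute;` come out as the literal character, so only `&amp;`, `&lt;`, and `&gt;` survive in text content.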

With that accomplished, you could probably make a big splash by also letting it output HTML that current browsers won't choke on, and/or convert to HTML4 while retaining semantics by turning the new elements into div and span.
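The HTML4 fallback can be sketched as a transformation over the well-formed XML output: rename each new element to `div` or `span` and keep its original name as a class, so stylesheets and scripts can still find it. The element lists below are illustrative rather than complete, and the sketch assumes namespace-free XML input.

```python
import xml.etree.ElementTree as ET

# Illustrative, not exhaustive: new block-level elements become div, inline ones span.
TO_DIV = {"article", "aside", "footer", "header", "nav", "section"}
TO_SPAN = {"mark", "time"}


def downgrade(root):
    """Rename HTML5-only elements in place, recording the old name as a class."""
    for el in root.iter():
        if el.tag in TO_DIV or el.tag in TO_SPAN:
            original = el.tag
            el.tag = "div" if original in TO_DIV else "span"
            classes = [original] + el.get("class", "").split()
            el.set("class", " ".join(classes))
    return root


doc = ET.fromstring("<body><section><p>Hi <mark>there</mark></p></section></body>")
print(ET.tostring(downgrade(doc), encoding="unicode"))
```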

I'd really love this tool if implicit sectioning elements in an outline were converted into explicit section elements. Having easily manipulable outlining sections would enable a lot more tools — or allow you to consolidate many writing tools into one.
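For flat documents, making implicit sections explicit can be approximated by nesting: each heading opens a `section`, which stays open until a heading at the same or a shallower level appears. A minimal sketch with ElementTree; it is a simplification that only looks at `h1` to `h6` directly under the root and ignores the rest of the HTML5 outline algorithm (`hgroup`, existing sectioning elements, and so on). The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

RANK = {f"h{n}": n for n in range(1, 7)}


def explicit_sections(body):
    """Return a copy of `body` with each heading's run of siblings wrapped in <section>."""
    root = ET.Element(body.tag, dict(body.attrib))
    root.text = body.text
    stack = [(0, root)]  # (heading level, open container); level 0 is the body itself
    for child in list(body):
        level = RANK.get(child.tag)
        if level is not None:
            # Close every open section whose heading is at this level or deeper.
            while stack[-1][0] >= level:
                stack.pop()
            section = ET.SubElement(stack[-1][1], "section")
            stack.append((level, section))
        stack[-1][1].append(child)
    return root


body = ET.fromstring("<body><h1>A</h1><p>x</p><h2>B</h2><p>y</p><h2>C</h2><p>z</p></body>")
print(ET.tostring(explicit_sections(body), encoding="unicode"))
```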

An archive format?

Why do I care about well-formed XML? Well, did you notice the difference in sizes between the HTML parsing and XHTML parsing sections in the HTML5 draft spec?

Working for years in MPEG made me appreciate why we should strive for data longevity. It might be merely an abstract ideal, but being kind to our future selves is one of our primary tasks today. If I come across a document in fifteen years, I don't want to have to look up which elements are void elements in order to parse it. Yet we also owe it to ourselves to archive in a format with more structure than plain text, or even an enhanced text format like Markdown, Textile, or one of countless wiki formats.

As I make and break websites and leave them online as a form of digital detritus, I've also been thinking a lot about the maintainability and portability of data. I'm finding it's easier to set up a new CMS than to migrate an existing CMS and its data to another machine. I've even considered migrating out of various CMSes by crawling my own websites. Uck.

Influenced quite a bit by Mark Pilgrim's thoughts on The Format, I'm now considering a well-formed flavor of HTML5 as the format for now. It's not as complete as DocBook, but its structural elements are sufficient for 95% of use cases for extended text. And it's more compact. And it'll be trivially viewable.

So, the other itch I'd love someone to scratch is to create an author's profile for HTML5. The HTML5 spec describes what is essentially a delivery format. It has worked hard to separate presentation from semantics, and goes as far as it can in doing so. However, there's a lot in the spec that has very little to do with an article or group of articles connected to form a multi-page resource. I would like a canonical version of an article to carry just the data (and metadata) necessary for the article, and nothing more. It should be self-contained and portable.

From an author's point of view, I would like to concentrate on words and structuring those words. No navigation, no scripts, no unnecessary headers, footers, banners, or columns.
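An author's profile could start life as a filter over the delivery document: keep the article, drop the furniture. A minimal sketch, assuming well-formed, namespace-free XHTML input, and assuming (my choice, not anything in the spec) that `script`, `style`, `nav`, and `aside` count as furniture inside the article, while page-level headers, footers, and banners are excluded simply by keeping only the first `article`. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: elements treated as presentation furniture rather than content.
FURNITURE = {"script", "style", "nav", "aside"}


def _remove_keeping_tail(parent, child):
    """Remove `child` without losing the text that follows it."""
    if child.tail:
        index = list(parent).index(child)
        if index == 0:
            parent.text = (parent.text or "") + child.tail
        else:
            previous = parent[index - 1]
            previous.tail = (previous.tail or "") + child.tail
    parent.remove(child)


def author_profile(root):
    """Return the first <article> (or the whole tree) with furniture stripped."""
    article = root.find(".//article")
    target = article if article is not None else root
    for parent in list(target.iter()):
        for child in list(parent):
            if child.tag in FURNITURE:
                _remove_keeping_tail(parent, child)
    return target


page = ET.fromstring("<html><body><nav>Home</nav><article><h1>T</h1>"
                     "<script>x()</script><p>Words</p></article></body></html>")
print(ET.tostring(author_profile(page), encoding="unicode"))
```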

That is to say, the canonical format for the ages doesn't have to be the same as what the user accesses, but the two could share the same syntax and semantics. If I'm going to have my work mediated by a CMS, then I want it to also maintain a canonical format for each resource, free from the navigation and eye candy that seem necessary to get noticed, and free from existing only in a database that creates a lot of inertia for my data. DokuWiki, as with so many other things, seems to get this right, with its do=export_* modes.

All roads lead to...

That's starting to suggest my next itch to scratch: a CMS that embodies these principles. It could ingest data in various formats (anything that can be rendered to HTML and can carry a little metadata of its own, so that no envelope format is needed, sounds usable to me) and via different channels (REST and some flavor of DVCS are my current favorites). It could render these sources lazily, only when the web server misses in a pre-rendered cache, so that most of the rendering machinery stays out of the way most of the time.
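The render-on-miss idea is small enough to sketch. Assuming sources live as files under one directory and rendered pages under another (both layouts, the `.txt` extension, and the `serve` name are hypothetical), the renderer only runs the first time a page is requested:

```python
import tempfile
from pathlib import Path
from typing import Callable


def serve(url_path: str, source_dir: Path, cache_dir: Path,
          render: Callable[[str], str]) -> str:
    """Return cached HTML for url_path, rendering from source only on a cache miss."""
    parts = [p for p in url_path.strip("/").split("/") if p]
    if not parts or ".." in parts:
        raise ValueError(f"refusing path {url_path!r}")
    name = "/".join(parts)
    cached = cache_dir / f"{name}.html"
    if cached.exists():
        return cached.read_text(encoding="utf-8")
    # Cache miss: render the source once and keep the result for next time.
    html = render((source_dir / f"{name}.txt").read_text(encoding="utf-8"))
    cached.parent.mkdir(parents=True, exist_ok=True)
    cached.write_text(html, encoding="utf-8")
    return html


# Demo: the renderer runs once; the second request is served from the cache.
with tempfile.TemporaryDirectory() as tmp:
    src, cache = Path(tmp, "src"), Path(tmp, "cache")
    src.mkdir()
    (src / "hello.txt").write_text("Hello", encoding="utf-8")
    calls = []

    def to_html(text: str) -> str:
        calls.append(text)
        return f"<p>{text}</p>"

    print(serve("/hello", src, cache, to_html))
    print(serve("/hello", src, cache, to_html))
    print(len(calls))  # → 1
```

If the web server serves the cache directory directly and only falls through to something like `serve` on a miss, the rendering code is never touched for pages that already exist.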

This architecture isn't miles away from how Blosxom does things, mashed up with Django's FlatpageFallbackMiddleware. The core could be small, fast, and flexible, working with varied storage solutions and template languages. Really, it seems like the core code for this involves routing, knowing when to render, and how to ingest from diverse sources, and very little else.

This final idea obviously doesn't rely upon HTML5, but thinking about data longevity and what a webpage is naturally leads me back down this road. Ooh. How original. Yet another CMS. Well, yes. But right now, I have to shelve the idea for later anyway — or hope somebody comes along and makes this exactly how I imagine it to be.

