
Recent musings

Google vs Twitter: FUD on URL Shorteners

DeWitt Clinton's screed on URL shorteners, especially directed at Twitter's usage thereof, is interesting, not only for the actual content (which is broadly true, and fairly sensible), but for the meta-message: Google is increasingly threatened by Twitter as the prime mover in the "Real-time web."

To me, this feels a bit like fear, uncertainty, and doubt spread about a competitor: attacking the competitor's actions while distracting us from Google's own. Most telling was this sequence about precedent:

As a thought experiment, imagine that your email provider suddenly started rewriting all of the URLs in your outgoing emails so they could track every link the recipients click on.

But since Twitter is the most popular, and arguably the most influential, of the new wave of micro-blogging systems, I sincerely worry that this is going to establish a precedent that everyone else will feel compelled to follow, since it is clearly an advantage to the network if they can get away with capturing this data. I ask, why wouldn't WordPress or Facebook or Tumblr do the same if they could?




We're supposed to be outraged by the privacy implications here, but the real outrage to Google is that it makes their job harder. Look at Gmail's privacy statement on what they do about you clicking links on email you receive:

When you use Gmail, Google's servers automatically record certain information about your use of Gmail. Similar to other web services, Google records information such as account activity (including storage usage, number of log-ins), data displayed or clicked on (including UI elements, ads, links); and other log information (including browser type, IP-address, date and time of access, cookie ID, and referrer URL).

from Gmail's Privacy statement, dated February 9, 2010


Yes, Google's asserted rights cover your own use, not others' use of the links you send. Ask an average user about the distinction, and I think they'd say it's different but not categorically so.

The other part that interested me is the "Safety and Transparency" part for Twitter's links. A major part of Twitter's justification for wrapping every URL (which I'm still personally dubious about) is to protect people from malicious links. Well, that sounds suspiciously like the role Google's stopbadware.org interstitial warning page plays, especially when Twitter doesn't have direct control of how the status message is displayed (it may be via a third-party application or SMS). Is this an argument against URL rewriting, or an argument against anyone else acting as a trustworthy intermediary?

I think Twitter's revelations on its monetisation and platform strategy earlier this year have Google genuinely worried that Twitter is turning into a trusted gateway into the web, and so it gives rise to pieces like DeWitt's, where Twitter is attacked for minimal differences in approach to taking on a threatening gatekeeper role. Google's problem, and DeWitt's myopia in offering solutions ("consider using an html payload"), is that Google is fundamentally of the web, and deals with web pages viewed through browsers. Twitter reaches beyond the web, being deeply embedded into mobile devices, and deals with much smaller units of interest than a web page.


Twitstream

Prompted by a tweet by Simon Willison on Monday, I was intrigued to hear about the Twitter real-time streaming APIs. In spare moments this week, rather than surfing the web, I found myself looking at how to get a view on the API from within Python which was… not trivial. In fact, none of the standard libraries seemed to handle the API at all: every HTTP access library waited until the stream closed, which was potentially forever.

A little poking and prodding, and I knew Twisted was capable of doing it. However, that seemed too heavy-weight a solution for just hacking about. I discovered asyncore, and despite the fairly thin documentation and examples online, it seemed clear that it could help. By evening I had knocked together something that worked with Twitter’s basic authentication and the proxy at work, which pretty much meant that I had to create my own basic HTTP/1.0 client. Not rocket science — I guess I had done the same thing 14 years earlier in Perl4 — but it took some trial and error, and neither the asynchat library nor any other common libraries did anything to simplify putting together an HTTP request.
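To give a flavour of what "my own basic HTTP/1.0 client" involves, here is a minimal sketch rather than the code from the project. It assumes the stream delivers one message per CRLF-terminated line; the host, path, and credentials are placeholders, and `build_request` and `LineBuffer` are names made up for illustration. It uses plain functions instead of asyncore, which has since been removed from the Python standard library.

```python
import base64


def build_request(host, path, username, password):
    """Return the bytes of a minimal HTTP/1.0 GET request with Basic auth."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        f"Authorization: Basic {token}",
        "",  # blank line that ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


class LineBuffer:
    """Collect raw chunks and hand back complete lines as they arrive."""

    def __init__(self):
        self._pending = b""

    def feed(self, chunk):
        # Keep the trailing partial line for the next call; skip empty
        # keep-alive lines (assumption: messages are CRLF-delimited).
        self._pending += chunk
        *complete, self._pending = self._pending.split(b"\r\n")
        return [line for line in complete if line]
```

The point of `LineBuffer` is that nothing waits for the connection to close: each chunk read from the socket is fed in, and whatever complete messages it contains come straight back out.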

I posted things up on github just to get some feedback on whether what I was doing was hopelessly misguided and reinventing the wheel, and — other than hearing it confirmed that Twisted could do this easily — it turns out no one else was doing this in Python. Cool, because they’re nifty APIs. It seems Twitter had turned into quite the target this year, with both Facebook on one hand and Google on another looking to get in on the action. I think the simplicity, transparency, and speed of the API are brilliant responses to things like Google Wave. This is very easy to work with, will keep developers around, and is pregnant with possibilities.

Oh, the speed. The fact that there is no apparent latency — a message that I send through my desktop client will appear in the stream before the miniature posting window even closes — makes this tremendously satisfying to work with.

After returning to the project like a bad rash, aided by encouraging words, github followers, and a primal pleasure in seeing words from all around the world spontaneously crawl up my screen, I’ve got a good chunk of code. It’s not brilliantly engineered, but I really enjoyed the process of it, the way it grew organically while trying it with different applications, such as a twistori clone to track (and highlight) multiple keywords in the Twitter stream.
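The keyword tracking and highlighting could look something like the sketch below. This is an illustration, not the project's code: `highlight` is a hypothetical helper, and it assumes a terminal that understands ANSI escape codes.

```python
import re

BOLD = "\033[1m"
RESET = "\033[0m"


def highlight(text, keywords):
    """Return text with tracked keywords in bold, or None if none match."""
    if not keywords:
        return None
    pattern = re.compile(
        "|".join(re.escape(word) for word in keywords), re.IGNORECASE
    )
    if not pattern.search(text):
        return None
    # Wrap each match as it appears in the text, preserving its case.
    return pattern.sub(lambda match: BOLD + match.group(0) + RESET, text)
```

Note that this matches substrings, so tracking "love" would also light up "glove"; adding word boundaries to the pattern would tighten that.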

There’s also fixreplies.py, a read-only client that mines your Friends, Followers, Favorites, and conversations to find people to track all public tweets to and from. While I think Twitter did the right thing in limiting replies, this makes for an interesting adjunct to a main, traditional twitter client. You get the feel for a lot more conversations sliding by. As it’s only text in the terminal, it feels more ephemeral, with far less of a need to catch up on everything.

Mostly as an exercise to see if it could be done, I also turned that client into a double-clickable script in Mac OS X. Click, and it opens up a terminal window, asks you for some information on your account, what sorts of users interest you, and how to find up to two hundred users’ conversations to listen in on.

/files/images/fixreplies.png

I do hope that Twitter makes this available for desktop clients and the general public, because it really expands the, er, Twitterverse, and opens up new possibilities in interacting with and getting data from Twitter.


Twitter

Twitter's currently ablaze with talk about the recent change to how @replies are handled on twitter. (Actually, as of this writing, it's ablaze with a shocking number of "RT this if you disagree with Twitter's decision to hide replies to people you don't follow. #fixreplies" messages that cause a massive facepalm.)

While I agree with the change (because for most of my multiple use cases, that's how twitter works best), I feel for the people who didn't use Twitter this way. Their whole way of looking at the social web has been taken away by the elimination of an application preference. That's surely upsetting, and I can't hope to change their minds. But I can try to get them out of their heads and see how the change makes sense for a lot of Twitter scenarios.

I operate a bot, @recomme (which is currently in the shop for maintenance due to another API change that Twitter made with no warning — so my above empathy is real) that feeds on @ replies: send recomme a message, and it tweets back to you, also in the public stream. If you keep your tweet stream private, you must follow @recomme, and it must (auto-)follow you back. The same follow requirements happen if you want to send private messages and receive them back.

This seemed like a reasonable model when I designed it because to me, the sensible way of managing replies was to see only replies to those people you follow and have asserted that you found interesting. @recomme has nearly 4000 followers now. For those people who have kept their settings on receiving replies to people they're not following, that's a lot of eyeballs to reach, especially when anybody can trigger a message from the bot. Fortunately, this has not been noticeably abused by any of the thousands of users. Unfortunately, because of the existence of the all-replies setting, the bot receives nowhere near the number of tweets you would expect from the number of users, I suspect because the social cost of sending tweets to and from the bot is high.

Now, if Twitter stays with hiding replies to people you don't follow, then my vision of an "emergent social network" can happen. People can happily tweet the bot, knowing that people who don't care about music recommendations won't see those tweets or those replies. The people who do care about you and music recommendations do see those tweets, and I think that's a nifty way of opening conversations.
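The rule being argued over fits in a few lines. The sketch below is my reading of the behaviour described in this post, not Twitter's implementation; `sees_reply` and its parameters are made-up names.

```python
def sees_reply(viewer_follows, author, recipient, show_all_replies=False):
    """Decide whether a viewer's timeline includes an @reply.

    viewer_follows: set of account names the viewer follows.
    author: the account that wrote the reply.
    recipient: the account the reply is addressed to.
    show_all_replies: the removed preference; True models the old opt-in.
    """
    if author not in viewer_follows:
        return False  # you never see tweets from people you don't follow
    if show_all_replies:
        return True  # old behaviour for users who changed the setting
    return recipient in viewer_follows  # new behaviour for everyone
```

So someone who follows you but not @recomme no longer sees your exchange with the bot, while a friend who follows both of you still does.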

For my own tweets, I struggle with keeping the number of people I follow down, and I have a tough time refusing to read tweets. The "@ replies to people I'm following" setting was an effective filter. Actually, there's a core of twitter users who I know in real life, and I am more keen on seeing all their tweets. (And did so with a secondary account that saw all replies.)

But the real reason why I'm in favor of the recent change is because of new users. Techcrunch actually showed some insight after their typically incendiary headline:

Before tonight I never paid much attention to this train of thought - after all, on Twitter, I can just follow the people I care about and ignore those I don’t. But it’s clear that Twitter is concerned with appealing to a more mainstream audience, and if that takes making a very simple service even more simple, then by golly, that’s what they’re going to do.

Well, yes. Exactly. My pride as an early adopter is far outweighed by the desire to have more close friends and family come to Twitter.

What does a new user see after adding a few friends and "recommended users"? Some tweets, but a lot of decontextualized half-conversations. This is confusing and off-putting. I feel guilty some days when I use Twitter as an open IM/IRC channel, having long threads of conversation. For those who follow all of my tweets, I must seem incredibly boring, geeky, trivial and somewhat profane. That's accurate, but it's not what I like to remind people.

I think that seeing this chatty, focused, decontextualized side of Twitter is not a very gentle introduction for newcomers. It might be useful for comfortable users who have been around for a while but follow dozens of people (or for very prolific users who skim or otherwise filter their tweet stream but don't pay close attention to any particular user). But for introducing users to Twitter while fighting concerns that it is "trivial," hiding those replies is a pretty important step to take.

Some ideas that were thrown at me for improving the experience:

Carr0t
I would prefer it on a per-user basis. So I could say ignore @replies from @stephenfry, but get them from @daagaak.
daagaak
It would be nice for API users, bots, etc to be able to specify if a tweet should act like that. I just dislike the list of discovery.

In other words, make the control more fine-grained, whether it be push or pull. Well, changing the granularity would be welcome from my selfish point of view. I could zoom in on some users, and not on others. I can see the appeal of the push-control, too. I would have welcomed making @recomme more private, if that had been an option.

But the truth is, that's all too fiddly. It makes a degree of sense to a "power user" like me, but from a user standpoint, it's a disaster. Too many things to control. The point of the change was to simplify a hard to understand option, something that I stand behind on principle.

On the other hand, Anne said, "Never take options away from a user." Those are also wise words. My gut instinct to make as many people happy as possible would be to grandfather in the feature: keep the option for those who have changed from the default, make it disappear for all others. This creates two tiers of users, though, which is untenable in the long term, especially from the social aspect: these exceptional users have a view on Twitter that is fundamentally different from others, and are likely to use it differently. As such, I think the feature should be phased out. Leave it to clients to support an expanded view of the greater twitterverse.

There's a Twitter truism that's been floating around: "Anyone who tells you how Twitter should be used is wrong." I've tried not to fall afoul of that here, but it's a fine line.


The twitter problem

I am a patient person, so it's only now that Twitter's perennial scaling problems are bothering me enough to blog about them. That probably makes me the last person to do so, ever.

However, lately, it's hurt. It hurt most when I tried to implement a twitter bot at Mashed08. With API calls throttled down to 20 per hour, the best I could hope to do (via polling, and with IM shut out, that was the only obvious path) was to be a bot for one person making no more than one request every 10 minutes. So for the demo, the twitter connection was really baling wire and duct tape (or, ipython console and cut-and-paste into twitter's web form).
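The arithmetic behind that kind of limit is easy to sketch. The function below is an illustration only: `min_poll_interval` is a made-up helper, and `calls_per_cycle` (how many API calls one round of "poll, then answer" costs) is an assumption, since the number depends on how the bot is built.

```python
import math


def min_poll_interval(calls_per_hour, calls_per_cycle=1):
    """Return the shortest polling interval in seconds that stays in budget."""
    if calls_per_cycle < 1:
        raise ValueError("a cycle must cost at least one call")
    cycles_per_hour = calls_per_hour // calls_per_cycle
    if cycles_per_hour < 1:
        raise ValueError("one cycle needs more calls than the hourly limit")
    return math.ceil(3600 / cycles_per_hour)


# One call per round under a 20-per-hour cap: poll every 180 seconds.
print(min_poll_interval(20))
# Three calls per round (an assumed figure): one round every 600 seconds.
print(min_poll_interval(20, calls_per_cycle=3))
```

Either way, a mobile user waits minutes for an answer, which is the real problem for something meant to feel interactive.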

Last month, I read Tim Bray's Twitterbucks entry with interest. When I last checked in, nobody seemed to be interested in where the real scaling problems were, so the comment thread didn't come up with any real revelations.

Today, as I tried to reflect on why I use twitter, I came upon another potential solution: pay for what is the hardest to scale: disk access. When any high-volume application has to hit the spindles, it takes a massive performance hit. Twitter's response to its recent outages seems to address that at least partially: paging backwards in your personal + friends timeline is scaled back, as is examining replies.

Seems to me that much of what Twitter covers well is the "now" and recent past. Going back in time on a merged timeline makes for increasingly expensive queries, and reaching further back in history goes beyond the memory caches. If Twitter didn't try to keep all its posts accessible, it could be a much more efficient messaging platform, always living in an amnesiac present. By having a web-accessible memory, with persistent tweets, it becomes a lot more difficult to predict where the database is going to be hit.

So take a look at Twitter as it stands right now. With the buttons that are disabled, which ones are the biggest pains? Single-user pagination? Friend pagination? Replies? Seems to me that the biggest omission is in having zero reply-page functionality, but the complex query (user > friends > friends' updates that can be seen > merged and sorted in time) database hit makes sense to limit. Why not cull functionality for all users such that it's either a complex query that hits memcached exclusively (the pages representing what's happening now and in the recent past), or a very trivial query that is allowed to hit the disks (a single, permalinked tweet or any user's front page of recent tweets)? A twitter caught in the present, and exhibiting some memory when specifically prodded.
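As a rough sketch of that split, the policy could be written as a small routing function. Everything here is hypothetical: the query kinds, the `RECENT_PAGES` cut-off, and treating replies as a cache-only query are my assumptions for illustration, not anything Twitter has described.

```python
RECENT_PAGES = 3  # assumed number of timeline pages kept in memory


def route_query(kind, page=1):
    """Return 'cache', 'disk', or 'refuse' for a read request.

    kind: 'merged_timeline', 'replies', 'user_timeline', or 'permalink'.
    page: 1-based page number; ignored for permalinks.
    """
    if kind == "permalink":
        return "disk"  # trivial lookup of a single tweet
    if kind == "user_timeline":
        # Only a user's front page of recent tweets may touch the disks.
        return "disk" if page == 1 else "refuse"
    if kind in ("merged_timeline", "replies"):
        # Complex queries are served from memory or not at all.
        return "cache" if page <= RECENT_PAGES else "refuse"
    raise ValueError(f"unknown query kind: {kind}")
```

The "refuse" branch is where a paid archive tier would slot in: the same request from a paying user could be allowed through to the disks.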

From there, you could charge for more archival access. I imagine this not as a monetisation move, or even one that could directly cover additional costs, but one that would allow serious users with serious needs to self-select, not unlike what Flickr has done with their paid accounts. A paid user could access their archives as a continual stream of tweets on a blog-like page. They could access a more comprehensive memory of their friends' replies. They might even be given persistent past per-day or per-month archive pages.

I'll admit that I don't fully appreciate the particular scaling problems presented by heavy users like Scoble. Perhaps there are payment thresholds to pass once you follow 500 and 5000 users?

What do people think? I know I can't be the first to suggest it, but it's the first I've heard of it.


My mash

As previously blogged, I attended the BBC's Mashed08 hack day. I explored some of the noteworthy hacks on the LOLCODE site, but it's time that I explain what I did with my 24 hours straight of hacking.

The number of APIs and virtual toys unleashed by the BBC at Mashed08 was a bit dizzying. In the end, however, I had to go with the idea that had been rattling around in my head the longest: a twitter bot based on the Echo Nest Recommend API. Twitter bots are nothing new, not terribly original, and not even all that feasible nowadays with the API limits, but it seemed quite a nice application of the EN Recommender.

What I spent a lot of time on was the ergonomics, context awareness, and giving the bot a memory, all in aid of getting maximal information from minimal effort from the (mobile) twitter user. The bot's name, 'recomme,' was designed to be easily keyable with a T9 keypad. I spent a fair amount of time on maximizing the amount of information in 140 characters.
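Squeezing a reply into 140 characters comes down to a greedy fit. The sketch below shows the idea under assumptions of my own: the reply layout ("@user band, band, ... url") and the `fit_reply` helper are illustrative, not the bot's actual format or code.

```python
LIMIT = 140


def fit_reply(user, bands, url, limit=LIMIT):
    """Build '@user band, band, ... url', dropping bands that do not fit."""
    prefix = f"@{user} "
    suffix = f" {url}"
    room = limit - len(prefix) - len(suffix)
    chosen = []
    used = 0
    for band in bands:
        # Every band after the first also pays for a ", " separator.
        cost = len(band) + (2 if chosen else 0)
        if used + cost > room:
            break
        chosen.append(band)
        used += cost
    return prefix + ", ".join(chosen) + suffix
```

Because the bands arrive ranked by closeness, stopping at the first one that doesn't fit keeps the best recommendations and leaves the rest for the linked page.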

Given a tweet with a bandname to @recomme, it responds with the closest recommended bands:

http://lindsay.at/files/_galleries/gallery/recomme/Picture5.png http://lindsay.at/files/_galleries/gallery/recomme/Picture6.png

If you name one of the BBC's (pop) music radio stations, then the bot is aware of the context at the time: it goes off and checks on what's currently playing:

http://lindsay.at/files/_galleries/gallery/recomme/Picture7.png http://lindsay.at/files/_galleries/gallery/recomme/Picture8.png

If you messaged the bot too late, however, you can correct it, asking for the track immediately preceding:

http://lindsay.at/files/_galleries/gallery/recomme/Picture9.png http://lindsay.at/files/_galleries/gallery/recomme/Picture10.png

If the recommendations are on target, you can ask for more of the same:

http://lindsay.at/files/_galleries/gallery/recomme/Picture11.png http://lindsay.at/files/_galleries/gallery/recomme/Picture12.png

As you can see, there's a fair bit of state saved with each interaction with the bot, and it responds with as much information as it can fit into the space allotted.

The fairly terse URLs that follow each set of recommendations give each query some persistence: an easily accessible reminder of what was requested, capturing the context of the moment, and offering more verbose detail than can be captured in a 140 character message. The user's past queries (saveable, sharable) are also accessible via a linked user page.

The cherry on top, and perhaps the only part of the hack that couldn't conceivably be done last year, was that the BBC Audio & Music Interactive team brought live archives of the BBC pop music radio stations. For the Mashed08 event, I was able to link to these live, time-indexed archives, so in the above "BBCR1" query, the persistent link pointed to the right time in the past such that you would hear the BeatFreakz song that was playing at that time.

Most of the development time was spent re-learning parts of Django, and getting the model for the underlying web application ("memory") right. The twitter bot is a separate process that requests information from and saves things to the Django webapp with some special POST requests.
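For the curious, the bot-to-webapp handoff might look roughly like the sketch below. The endpoint path, the JSON payload, and the field names are all assumptions for illustration; the original only says the bot used "special POST requests".

```python
import json
import urllib.request


def build_save_request(base_url, user, query, recommendations):
    """Prepare (but do not send) a POST that records one bot interaction."""
    payload = json.dumps(
        {"user": user, "query": query, "recommendations": recommendations}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/queries/",  # hypothetical endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it. Keeping the bot and the webapp in separate processes means the "memory" stays available on the web even when the bot itself is throttled or offline.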

The project was a fair bit of fun, and it felt rewarding to see it through. I'm a bit conflicted on whether to deploy it: I think it would be good, fun, and useful to some, but it could easily use another solid 24 hours of polish before it's presentable. Furthermore, Twitter is in no state to support a new bot: 20 API requests per hour are nearly useless for something that's ostensibly an interactive mobile application.

As a postscript, I noticed that the "recomme.com" domain was cybersquatted by the time I returned to my hotel in London. I wasn't too bothered by it, as I have other domains that could be pressed into service for this, but I was mightily impressed at who's paying attention.
