
Tungle: A wasted opportunity

Apparently Tungle has shut down development, although they still allow people to sign up. It turns out their acquisition by RIM last year must have been an acqui-hire (technically an acquisition, but really an admission of defeat).

Tungle had an incredibly viral business model, perhaps the most viral I’ve seen since Plaxo, and it solved a problem that I and many others encounter on a near-daily basis: helping people schedule meetings and calls with each other.

So what went wrong? Their usability SUCKED. I desperately wanted Tungle to work, but almost every time I tried using it to schedule a meeting with someone, something would screw up and we’d have to resort to manually scheduling via email. This was embarrassing when it happened, but even so I tried over and over again. Every attempt could have been an opportunity for Tungle to sign up a new user, if their usability hadn’t been so bad.

So if there is anyone out there looking for an idea that could be the next LinkedIn-scale viral phenomenon, all you have to do is reimplement Tungle, but this time get the usability right.  If I weren’t already rather busy I’d be doing this myself.

Microsoft probably just killed “Do Not Track”

Update (27th Oct 2012): I told you so!  Yahoo will ignore DNT from IE10 for exactly the reason I cite below.

Microsoft just announced that the “do not track” opt-out would be on by default in Internet Explorer 10.  This is a boneheaded move.

“Do not track” is a standard through which a web browser can inform a web page that the user does not wish to be tracked by third-party websites for the purpose of advertising.  So far as I can tell, respecting this is entirely voluntary on the part of the advertisers.
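For the technically curious, the signal itself is tiny: the browser adds a "DNT: 1" header to its HTTP requests. Here is a minimal Python sketch of how a cooperating ad server might check for it. The function name and the plain-dictionary representation of the request headers are illustrative assumptions, not any real ad server's API.

```python
def user_opted_out(headers):
    """Return True if the request carries a "DNT: 1" header.

    `headers` is assumed to be a plain dict of HTTP request headers.
    Header names are case-insensitive, so compare them in lower case.
    """
    for name, value in headers.items():
        if name.lower() == "dnt":
            return value.strip() == "1"
    # No DNT header at all: the user expressed no preference.
    return False
```

Nothing forces the server to call a check like this, which is exactly why the default setting matters so much.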

Advertisers often use browser cookies to track users. This allows them to target advertising specifically at people who’ve visited their website, for example. Google and Microsoft both do it; it’s fairly standard practice these days. Typically the advertiser isn’t tracking you as an individual: all they know is that you may have previously visited a particular website.

To explain why Microsoft’s move is boneheaded, I’ll relate a story from the early days of Revver, the online video sharing website that I co-founded back in 2004.

We had decided to let video uploaders tell us whether the video contained any content that is not appropriate for children as part of the upload process.  The vast majority of our users did exactly this and all was well, until at some point we realized that people were uploading some pretty serious pornography that we weren’t comfortable with even if it was marked as “adult” by the uploader.

Our panicked solution was to simply remove all videos marked as “adult” from the site, and prevent any further uploads where the videos were so-marked.

Of course you can predict the result: people immediately stopped marking videos as “adult”, making our task vastly more difficult.

The moral?  Don’t expect people to do something voluntarily if you are then going to use it against them.

I think Microsoft has just made exactly the same mistake. Previously I think there was a reasonable chance that advertisers would choose to respect DNT, since only a minority of users were likely to enable it, and those are the people who really care about not being tracked.

But if it is enabled by default in Internet Explorer 10, advertisers now have no idea whether the user really cares about being tracked, and as a result they are far less likely to respect it.

Looking at it a different way, Microsoft just gave advertisers the perfect excuse to ignore DNT, because they can correctly claim that in most instances the user will have made no conscious decision to enable it.

Proportionate A/B testing

More than once I’ve seen people ask questions like “In A/B Testing, how long should you wait before knowing which option is the best?”.

I’ve found that the best solution is to avoid a binary question of whether or not to use a variation. Instead, randomly select the variation to present to a user in proportion to the probability that this variation is the best, based on the data (if any) you have so far.

How? The key is the beta distribution. Let’s say you have a variation with 30 impressions, and 5 conversions. A beta distribution with (alpha=5, beta=30-5) describes the probability distribution for the actual conversion rate. It looks like this:

A graph showing a beta distribution

This shows that while the average of the distribution is about 1 in 6 (approx 0.17), the curve is relatively broad, indicating that there is still a lot of uncertainty about what the actual conversion rate is.

Let’s try the same for 50 conversions and 300 impressions (alpha=50, beta=300-50):

You’ll see that while the curve’s peak stays in roughly the same place, the curve gets a lot narrower, meaning that we’ve got a lot more certainty about what the actual conversion rate is – as you would expect given 10X the data.
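If you don't have a graphing tool handy, the narrowing can also be checked numerically. The sketch below uses the standard formulas for the mean and standard deviation of a beta distribution; the function name is just an illustrative choice.

```python
import math

def beta_mean_and_sd(alpha, beta):
    """Return (mean, standard deviation) of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    variance = (alpha * beta) / (total ** 2 * (total + 1))
    return mean, math.sqrt(variance)

# The two examples from the text: 5 of 30, then 50 of 300.
for conversions, impressions in [(5, 30), (50, 300)]:
    mean, sd = beta_mean_and_sd(conversions, impressions - conversions)
    print(f"{conversions}/{impressions}: mean={mean:.3f}, sd={sd:.3f}")
```

Both cases have the same mean of 1 in 6, but the standard deviation drops from roughly 0.067 to roughly 0.021 with ten times the data.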

Let’s say we have 5 variations, each with various numbers of impressions and conversions. A new user arrives at our website, and we want to decide which variation we show them.

To do this we employ a random number generator that picks random numbers according to a beta distribution we provide to it. You can find open source implementations of such a random number generator in most programming languages; here is one for Java.

So we go through each of our variations, and pick a random number within the beta distribution we’ve calculated for that variation. Whichever variation gets the highest random number is the one we show.

The beauty of this approach is that it achieves a really nice, perhaps optimal, compromise between sending traffic to new variations to test them and sending traffic to variations that we know to be good. If a variation doesn’t perform well, this algorithm will gradually give it less and less traffic, until eventually it’s getting practically none. Then we can remove it, secure in the knowledge that we aren’t removing it prematurely, with no need to set arbitrary significance thresholds.

This approach is easily extended to situations where rather than a simple impression-conversion funnel, we have funnels with multiple steps.

One question is: before you’ve collected any data about a particular variation, what should you “initialize” the beta distribution with? The default answer is (1, 1), since you can’t start with (0, 0). This effectively starts with a “prior expectation” of a 50% conversion rate, but as you collect data this will rapidly converge on reality.
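To make the selection step concrete, here is a minimal Python sketch rather than the code we actually run. It assumes the (1, 1) starting point is simply added to each variation's conversion and non-conversion counts, and it uses the standard library's random.betavariate as the beta random number generator. The function name, the shape of the stats dictionary and the sample numbers are all made up for illustration.

```python
import random

def choose_variation(stats, prior=(1, 1), rng=random):
    """Pick the variation to show to the next visitor.

    stats maps a variation name to (impressions, conversions).
    prior is the (alpha, beta) starting point for every variation.
    """
    best_name, best_draw = None, -1.0
    for name, (impressions, conversions) in stats.items():
        alpha = prior[0] + conversions
        beta = prior[1] + (impressions - conversions)
        # Draw one plausible conversion rate for this variation.
        draw = rng.betavariate(alpha, beta)
        if draw > best_draw:
            best_name, best_draw = name, draw
    return best_name

# Hypothetical data: a well-tested variation, a lightly tested one, a new one.
stats = {"A": (300, 50), "B": (30, 5), "C": (0, 0)}
counts = {name: 0 for name in stats}
for _ in range(10000):
    counts[choose_variation(stats)] += 1
print(counts)  # how often each variation would have been shown
```

Passing a different prior, such as (1, 99), is all it takes to try the better starting points discussed next.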

Nonetheless, we can do better. Let’s say we know that variations tend to have a 1% conversion rate; then you could start with (1, 99).

If you really want to take this to an extreme (which is what we do in our software!), suppose you have an idea of how conversion rates are distributed across variations: say a mean of 1% with a standard deviation of 0.5%.

Note that starting points of (1,99), or (2,198), or (3,297) will all give you a starting mean of 1%, but the higher the numbers, the longer they’ll take to converge away from the mean. If you plug these into Wolfram Alpha (“beta distribution (3,297)”) it will show you the standard deviation for each of them. (1,99) is 0.0099, (2,198) is 0.007, (3, 297) is 0.00574, (4, 396) is 0.005 and so on.

So, since we expect the standard deviation of the actual conversion rates to be 0.5% or 0.005, we know that starting with (4, 396) is about right.

You could find a smart way to find the starting beta parameters with the desired standard deviation, but it’s easier and effective to just do it experimentally as I did.
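For what it's worth, one candidate for the "smart way" is to solve the beta distribution's mean and variance formulas for alpha and beta directly (sometimes called the method of moments). This is a sketch of that alternative, not what I actually did, and the function name is purely illustrative.

```python
def beta_prior_from_mean_sd(mean, sd):
    """Return (alpha, beta) for a beta distribution with the given mean and sd.

    Uses variance = mean * (1 - mean) / (alpha + beta + 1), solved for
    alpha + beta, then splits that total according to the mean.
    """
    if not 0 < mean < 1:
        raise ValueError("mean must be strictly between 0 and 1")
    total = mean * (1 - mean) / sd ** 2 - 1
    if total <= 0:
        raise ValueError("sd is too large for a beta distribution with this mean")
    return mean * total, (1 - mean) * total

alpha, beta = beta_prior_from_mean_sd(0.01, 0.005)
print(alpha, beta)
```

For a 1% mean and a 0.5% standard deviation this gives roughly (3.95, 391.05), which agrees well with the (4, 396) found by trial and error.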

Note that while I discovered this technique independently, I later learned that it is known as “Thompson sampling” and was originally developed in the 1930s (although to my knowledge this is the first time it has been applied to A/B testing).

A (possibly) novel approach to image rescaling

We’ve all seen fictional examples of increasing the resolution of images to reveal previously unseen detail. Here is a reminder:

Unfortunately, all of these examples are basically impossible, because they imply revealing information in the image that simply isn’t there.

It turns out that there is a way to plausibly fill in missing image information, described in this paper: Scene Completion Using Millions of Photographs.

So, anyone want to try applying this approach to “filling in” the missing pixels in an image that has been scaled up? The results would be really interesting.

A better way to pick passwords

Hands up how many of you use the same password for more than one website? How many of you use the same password for most or all websites?

This is extremely dangerous. Let’s say you sign up for a website, and you give them your email address (perhaps a Gmail account), and then give them a password that happens to be the same as your Gmail password. It is now trivial for them to hack your Gmail account and spam your friends.

Even if you only sign up for reputable websites, they can be hacked, as happened recently with Gawker (update: and even more recently with LinkedIn). Anyone who used the same password both for their email and for Gawker was immediately exposed.

Additionally, let’s say you use several passwords (my previous approach). You then run into the problem that you often forget which password you used where, so you have to try several of them (potentially revealing all your passwords to an unscrupulous website).

Another annoyance is that some websites have weird requirements for passwords; often they must be at least 8 characters in length and contain a mixture of letters and numbers. If your default passwords don’t meet these criteria then often you have to modify them somehow, or pick new passwords entirely, and then of course you can never remember which variations you used for particular websites.

So what to do? A simple approach I use, which isn’t foolproof, but which is a big improvement over what most people do, is to base my password in some way on the domain of the website I’m visiting.

For example, let’s say you are coming up with a password for plentyoffish.com. One approach you might take is to start with the last 4 letters of the main part of the domain in reverse order, capitalizing the final one, and then add an additional 4 characters that you’ll always remember – ideally a combination of letters and numbers. Here are some example passwords following this scheme (using “5yty” as the final 4 characters in each case):

Website             Password
plentyoffish.com    hsiF5yty
www.google.com      elgO5yty
facebook.com        kooB5yty
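The scheme is meant to be done in your head, but spelling it out as a short Python sketch shows exactly what the steps are. The function name is made up, and treating the "main part" of the domain as the label just before the final dot is a simplifying assumption (it would pick "co" for something like example.co.uk).

```python
def site_password(domain, suffix="5yty"):
    """Derive a password from a domain using the example scheme above."""
    labels = domain.lower().split(".")
    # Assumption: the "main part" is the label just before the last one,
    # e.g. "google" in "www.google.com".
    main = labels[-2] if len(labels) >= 2 else labels[0]
    # Last 4 letters, in reverse order.
    reversed_tail = main[-4:][::-1]
    # Capitalize the final letter, then append the memorized suffix.
    return reversed_tail[:-1] + reversed_tail[-1].upper() + suffix

for site in ["plentyoffish.com", "www.google.com", "facebook.com"]:
    print(site, site_password(site))
```

Running it reproduces the three passwords in the table.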

While initially it might take you a few seconds to figure out the appropriate password for any given website, with a little practice it quickly becomes second-nature.

The good thing about a password scheme like this is that these passwords will meet the criteria of even the fussiest websites: they are 8 characters in length (I’ve never seen a website that required passwords longer than 8 characters), and they contain a mixture of upper and lower case characters and numbers.

Now please don’t copy the exact approach I describe here. Perhaps instead of taking the last 4 characters of the domain, take the 2nd, 4th, last, and 2nd last – or something like that. It doesn’t matter, so long as you remember it.

Of course a weakness of this approach is that someone looking at your password for their site might be able to reverse engineer your system, but this involves a lot more work on their part than if you use the same password everywhere.

If you are concerned about this you could make your system more difficult to reverse engineer by, say, incrementing the letters you take from the domain name, so “abcD” becomes “bcdE”. Of course, this comes at the cost of making it more difficult to figure out the appropriate password for a given domain.
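As a sketch of that letter-incrementing step in Python: each letter moves one place forward in the alphabet and keeps its case. Wrapping "z" around to "a" and leaving digits untouched are assumptions, since the scheme as described doesn't say what to do in those cases.

```python
import string

def shift_letters(text):
    """Move every letter one place forward in the alphabet, keeping its case."""
    shifted = []
    for ch in text:
        if ch in string.ascii_lowercase:
            # Assumption: "z" wraps around to "a".
            shifted.append(chr((ord(ch) - ord("a") + 1) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            shifted.append(chr((ord(ch) - ord("A") + 1) % 26 + ord("A")))
        else:
            # Digits and other characters are left as they are.
            shifted.append(ch)
    return "".join(shifted)

print(shift_letters("abcD"))  # bcdE
```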

Related tip: When you are signing up for new websites, don’t give them your real email address. Use a service like 33Mail.com to create a new email address for each website. That way, if they start to spam you, you can just shut them down with a single click. I’ve been using 33Mail for a few months now (since OtherInbox stopped offering this functionality), and it has worked flawlessly. I may do a separate blog entry on this soon.

Another year, another way to host my blog

Two years ago, almost to the day, I switched to using Squarespace for my blog hosting. To quote my reasons at the time:

Well, I’ve switched blog engines once again. Several months ago I switched from a self-hosted WordPress to the WordPress.com service, because my blog kept getting hacked due to security holes in the open source version of WordPress.

But WordPress.com was far from perfect, they are very restrictive about what you can put on your blog (so no tools like Woopra or Google Analytics), only a small selection of approved plugins.

So I’ve decided to switch again, this time to SquareSpace. They let you use your own embedded code, which means I can now use the analytics system of my choice, and other neat tools like Google’s Prettify script.

Unfortunately it turned out that Squarespace has shortcomings of its own, and recently they became unbearable. Basically the problem is comment spam. Almost every one of my blog entries, especially the popular ones, had tens of spam comments. On my blog as a whole there may have been as many as a thousand. Whatever mechanisms Squarespace has for preventing comment spam are evidently ineffective.

Worse, Squarespace provides no convenient mechanism to delete comment spam en-masse, which would have meant spending hours deleting them manually, only to have them reappear at some later date.

My original reason for moving away from self-hosted WordPress was that it was insecure and kept getting hacked. At the time this was especially damaging as the server hosting my blog was also used for other things, and they could have been compromised.

Well, it seems like the security situation may have improved with WordPress, and they have made it much easier to keep up-to-date with the latest version, so I’ve decided to try it again.

Hopefully I won’t regret it :-/

Google Wave goes public but misses obvious viral opportunity

So Google Wave has opened its doors to the public – yay!

Now you can just create a Wave, and enter anyone’s email address and they will automatically be invited, and signed up for Wave if they don’t already have an account – right?  

Wrong.  If you try you get an unhelpful message telling you that they aren’t a Google Wave user:

 

How could Google miss this completely obvious opportunity to ensure viral adoption of Google Wave?

What should happen is as I described it – from my perspective it should be no different than sending them an email.  From their perspective they should get an email saying that someone wishes to have a Wave conversation with them, and give them the opportunity to easily and transparently sign up for Wave.

If, for whatever reason, they don’t want to use the Wave UI, then I guess they should be able to reply and participate in the conversation through email.

Wave is designed to break down the barriers between different means of communication through its plugin architecture (e.g. its Twitter support). How then could Google not see the importance of breaking down the barriers between Wave and the primary communication medium it’s supposed to replace: email?

Simple bash script to name a tab in iTerm

I like to open a lot of tabs in iTerm, the open source replacement for Apple’s Terminal.app on the Mac. Unfortunately, it’s easy to forget which is which, as typically they don’t get very descriptive names.

In particular, I have a few scripts I run which will automatically ssh into a remote server, and run screen. When I wanted to find one of these, I’d often have to click several tabs to see which one I wanted (since they would all, rather unhelpfully, be called “ssh”).

So I wrote a simple script, which I call “nametab”, which allows you to name the tab you are in from the command line. You just type something like:

$ nametab New tab name

If you’d like to use this yourself, here is the code:

Just put it in a file in your PATH, and ensure that it is executable (i.e. run chmod u+x nametab).
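The script listing itself is not reproduced here. As a rough stand-in (not the original bash script, and written in Python), the sketch below assumes your terminal honors the xterm-style escape sequence ESC ] 1 ; name BEL for setting the tab title. Whether a particular iTerm version responds to that sequence is an assumption worth checking, and the function name is illustrative.

```python
import sys

def tab_title_sequence(name):
    """Build the escape sequence that asks the terminal to rename the tab."""
    # ESC ] 1 ; <name> BEL is the xterm "set icon name" sequence, which
    # terminals such as iTerm are assumed to display as the tab title.
    return "\033]1;" + name + "\007"

if __name__ == "__main__":
    # Join all arguments so that "nametab New tab name" works unquoted.
    sys.stdout.write(tab_title_sequence(" ".join(sys.argv[1:])))
    sys.stdout.flush()
```

Saved as nametab with a Python shebang line at the top, it would be used the same way as the command shown above.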

Making it easy to search Wikipedia from Firefox

This tip will allow you to type “w anything” in your Firefox location bar and be taken straight to the page describing that topic on Wikipedia.

This is far more powerful than a normal search on Wikipedia because it uses Google, although it redirects straight through so you may not even notice it.

Note, this requires Firefox 3 or later.

Here is how:

  1. Go to the Bookmarks menu and click “Organize Bookmarks”
  2. Right-click on “Unsorted Bookmarks” and select “New Bookmark”
  3. Fill in the following fields:
    Name: Wikipedia Search
    Location: http://www.google.com/search?btnI=I%27m+Feeling+Lucky&q=%s%20site%3Aen.wikipedia.org&meta= (all on the same line, no spaces)
    Keyword: w
  4. Click “Add”

Now test it by typing “w Freenet”; you should be taken straight to the Wikipedia page. Because it’s Google, it will tolerate misspellings, inexact word matches and so on.
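If you are wondering what the “%s” in the Location field does: Firefox replaces it with whatever you type after the keyword. The Python sketch below imitates that substitution so you can see the URL that ends up being requested. The function name is made up, and the exact escaping Firefox applies to your search terms is an assumption here (percent-encoding via urllib.parse.quote).

```python
from urllib.parse import quote

# The bookmark location from step 3, split across two lines for readability.
TEMPLATE = ("http://www.google.com/search?btnI=I%27m+Feeling+Lucky"
            "&q=%s%20site%3Aen.wikipedia.org&meta=")

def keyword_url(template, query):
    """Substitute the typed search terms for the %s placeholder."""
    return template.replace("%s", quote(query))

print(keyword_url(TEMPLATE, "Freenet"))
```

The resulting query is “Freenet site:en.wikipedia.org”, which restricts Google's “I'm Feeling Lucky” redirect to English Wikipedia.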