Digg to deploy recommendation engine – but will it work?

According to TechCrunch, Digg is about to deploy a recommendation engine (aka “collaborative filter” or CF).  I’m rather surprised that it’s taken this long.  I spoke with Kevin Rose about it way back around August 2006, so they’ve been working on this for almost 2 years!

From the description, it sounds like a fairly standard user-based CF, not unlike Daedalus, the CF I built for Revver (which we deployed on Reddit for a while before my friend Aaron Swartz’s departure).  It works by finding users whose ratings of the same things are similar to yours, and recommending things they liked but which you haven’t seen yet.
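
To make that concrete, here is a minimal sketch of a user-based CF in Python. It is not Digg's or Daedalus's actual code. The `ratings` data, the +1/-1 vote encoding, and the function names `similarity` and `recommend` are all made up for illustration.

```python
def similarity(a, b):
    """Average agreement over the items both users have rated.

    Assumes votes are +1 (liked) or -1 (disliked), so the result
    runs from -1.0 (always disagree) to 1.0 (always agree).
    """
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    return sum(a[item] * b[item] for item in shared) / len(shared)


def recommend(target, ratings, top_n=3):
    """Suggest unseen items liked by users similar to `target`."""
    mine = ratings[target]
    scores = {}
    for other, theirs in ratings.items():
        if other == target:
            continue
        sim = similarity(mine, theirs)
        if sim <= 0:
            continue  # ignore users who are dissimilar or have no overlap
        for item, vote in theirs.items():
            if item not in mine and vote > 0:
                scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken by item name for stable output.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# Hypothetical votes: user -> {story: +1 or -1}
ratings = {
    "alice": {"s1": 1, "s2": 1, "s3": -1},
    "bob": {"s1": 1, "s2": 1, "s4": 1},
    "carol": {"s1": -1, "s3": 1, "s5": 1},
}

print(recommend("alice", ratings))
```

Here bob agrees with alice on both stories they share, so his other liked story is recommended to her, while carol disagrees with alice on everything they share and is ignored.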

The problem with this approach, which we found with Daedalus, is that for the similarity between users to be meaningful, it has to be based on a lot of overlap between your ratings and the ratings of other users.  Typically this won’t be the case except for extremely committed Digg users.  This is made worse by the fact that, apparently, they will only be looking at user data from the last 30 days.  It’s really going to be hard to infer meaningful relationships between users that way.
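
A small sketch of how a 30-day window shrinks the overlap that similarity depends on. The vote dates, the reference date `now`, and the helper names are invented for illustration and say nothing about how Digg actually stores or filters its data.

```python
from datetime import datetime, timedelta


def recent_items(votes, now, window_days):
    """Items voted on within the last `window_days` days."""
    cutoff = now - timedelta(days=window_days)
    return {item for item, when in votes.items() if when >= cutoff}


def overlap(votes_a, votes_b, now, window_days=None):
    """Number of items both users voted on, optionally within a window."""
    if window_days is None:
        a, b = set(votes_a), set(votes_b)
    else:
        a = recent_items(votes_a, now, window_days)
        b = recent_items(votes_b, now, window_days)
    return len(a & b)


now = datetime(2008, 6, 1)  # arbitrary reference date for the example


def days_ago(n):
    return now - timedelta(days=n)


# Hypothetical vote history: story -> time of vote
alice = {"s1": days_ago(90), "s2": days_ago(60), "s3": days_ago(45), "s4": days_ago(10)}
bob = {"s1": days_ago(80), "s2": days_ago(50), "s3": days_ago(5), "s4": days_ago(3)}

print(overlap(alice, bob, now))                  # all-time overlap
print(overlap(alice, bob, now, window_days=30))  # overlap in the last 30 days
```

In this toy data the two users share four stories over their whole history but only one inside the window, and a similarity score computed from a single shared story says very little.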

This is why I went for a different approach with SenseArray: I specifically wanted it to perform well without large amounts of rating data.  Rather than just looking for similar users, it works by finding a mathematical formula that successfully predicts your behavior.  This has the potential to allow the algorithm to have a much more nuanced understanding of your interests, and it can do this with much less overlap between user rating behavior.

The other major difference with SenseArray is that it augments simple user-rating data with other metadata about users, such as their geographic location, their referring website, and their choice of operating system and web browser.
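
I won't go into how SenseArray works internally here, so the following is only a generic stand-in for the idea of fitting a formula to user metadata: a tiny logistic regression over one-hot encoded fields. The field names, the training examples, and the functions `features`, `predict`, and `train` are all assumptions made up for this sketch.

```python
from math import exp


def features(user):
    """One-hot encode each metadata field as a 'field=value' string."""
    return {f"{field}={value}" for field, value in user.items()}


def predict(weights, feats):
    """Estimated probability that the user likes the item."""
    z = sum(weights.get(f, 0.0) for f in feats)
    return 1.0 / (1.0 + exp(-z))


def train(examples, epochs=200, rate=0.1):
    """Fit one weight per feature with plain stochastic gradient steps."""
    weights = {}
    for _ in range(epochs):
        for user, liked in examples:
            feats = features(user)
            error = liked - predict(weights, feats)
            for f in feats:
                weights[f] = weights.get(f, 0.0) + rate * error
    return weights


# Hypothetical history for one story: (user metadata, 1 = liked, 0 = not)
examples = [
    ({"referrer": "techblog", "os": "linux"}, 1),
    ({"referrer": "techblog", "os": "windows"}, 1),
    ({"referrer": "sports", "os": "windows"}, 0),
    ({"referrer": "sports", "os": "mac"}, 0),
]

weights = train(examples)

# A brand-new visitor with no ratings at all can still get a prediction.
newcomer = {"referrer": "techblog", "os": "bsd"}
print(predict(weights, features(newcomer)) > 0.5)
```

The point of the sketch is that the prediction for the newcomer relies only on metadata, not on any overlap between their ratings and anyone else's.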
