Category Archives: Programming

Scala: The best of both Ruby and Java

Many programmers love Ruby, they just can’t get enough of it. Its probably one of the most aggressively evangelized languages since Java first came on the scene. They generally cite features that include a flexible and extensible syntax, closures, and how concise and expressive the code can be.

For example, you can create a Map (Ruby calls them “hashes”, even though a hashtable is only one way a map can be implemented) with a simple syntax like:

numberMap = {"one" => 1, "two" => 2, "three" => 3}

The Java equivalent of this is far more verbose:

Map<String, Integer> numberMap = new HashMap<String, Integer>();
numberMap.put("one", 1);
numberMap.put("two", 2);
numberMap.put("three", 3);

The Java version does have one advantage though, which is that we’ve told the compiler that numberMap uses Strings for keys, and Integers for values. If we try something like:

numberMap.put(4, "four");

The compiler will complain, or if you are using an IDE like Eclipse, you will be told there is a problem almost the moment you type this. This is called “static typing”, or “compile-time type checking”. Ruby’s approach is called “dynamic typing” or “run-time type checking”.

In Java you must painstakingly tell it the type of each variable you use, and this is one of the main reasons Ruby is more concise and easier to read than the equivalent Java.

Java has another advantage, which is that it is far faster than Ruby. In some benchmarks Ruby requires over 15 times as long to perform a CPU-intensive task as Java (and this has caused real life problems for some Ruby users).

So where does Scala come in? Lets look at the map example in Scala:

var numberMap = Map("one" -> 1, "two" -> 2, "three" -> 3)

You will notice that it looks very similar to the equivalent code in Ruby, but there are some important differences. In particular, just like Java, the Scala compiler knows that numberMap uses Strings as keys, and Integers as values. Unlike Java, you didn’t have to tell it this, it just figured it out for itself! This is called “type inference”.

This means that if I try adding a new key-value pair to numberMap, but using an Integer for the key, and a String for the value, Scala will complain as soon as you try to compile it (or your IDE will warn you immediately). With Ruby, it is only likely to cause a problem when you run your software and try to retrieve this key and value from the Map and get an Integer and a String, rather than the String and the Integer that you were expecting.

It is hard to overemphasize what a timesaver compile-time type checking can be, it eliminates a whole class of bugs that would otherwise occur at runtime, and Scala brings you the benefits of this, without the verbosity.

Scala also has a bunch of other nice Ruby features, which Java lacks, including closures, and a very malleable syntax that makes it well suited for “domain specific languages”. It has all of these features, and combines them with the benefits of static typing.

But there is more. When you compile some Scala code, it is compiled to Java class files, and can be run on a Java Virtual Machine just as if it was compiled Java code. This means that Scala can go wherever Java can go, it can also use any of the vast number of libraries that have been built for Java, and it can take advantage of today’s very efficient Java runtime environments. Because of this, compiled Scala is almost as fast as compiled Java!

Scala is still quite young. Its own libraries are impressive (especially the scala.actors library), but I’ve found one or two bugs already (in the JSON parser lib), suggesting that its still somewhat immature.

Nonetheless, for anyone yearning for the advantages of both Ruby and Java in one language, you should definitely take a look!

Netflix Prize – is RMSE a good measurement?

The Netflix Prize is a pretty cool competition where Netflix has made available 100 million user ratings of movies, and you must use these to try to predict what movies users will like. The idea is to stimulate innovation in the field of “Collaborative Filters”. I won’t go into detail, you can read more at the link provided.

One important question is how one measures the success of a collaborative filtering algorithm. Netflix has opted to look at the average difference between the what algorithm predicts a user will rate a movie, and what they actually rate the movie. To be more precise, they look at the square root of the average of the squared differences (aka Root Mean Squared Error or RMSE). This is similar to a normal average, except it is swayed more by larger values.

A naive approach that just looks at a user’s average rating, and the average rating on each movie gets an RMSE of about 1.02. Netflix themselves score around 0.95, and if you can get it down (lower is better) to 0.85, you are in line to win $1 million. Currently, the best performers are around 0.87, but as they get closer to the goal, progress gets much more difficult.

Like seemingly every other geek on the planet, I decided I would try my hand at this. My goal was not to win, since most of the current top entrants work by blending predictions from a bunch of different algorithms. While this is well within the rules of the competition, it does rely much more on perspiration than inspiration, and thus I find it rather dull. Also, I would question whether the approach is fast enough for practical use (something the competition doesn’t consider).

My attempt is not a terribly original approach, I’m using a perceptron learning algorithm (perceptrons are simple precursors to what are today known as “neural networks”), and my best score to-date is 0.905. I consider this respectable (especially given the simplicity of my approach and that its about two weekends of effort), but its not going to win any time soon (its around 300th on the leader-board).

Then I started to think about what it means for an algorithm to score 0.905, or 0.95, or 0.85, or 1.02 – what bearing do these different scores have on the actual user experience?

Well, user experience is a hard thing to measure, but if we make some simplifying assumptions, we can get an idea. Most collaborative filtering algorithms are used to choose things (in Netflix’ case the things are movies), predict a user’s rating of those things, and then present the things with the highest predicted ratings to the user.

If the user finds the things they are presented with to be appealing, then they should rate them highly, if not, they won’t. Thus, if we look at user’s actual ratings of things with the highest predicted ratings, we should get a better idea of how much difference the algorithm makes to the user’s experience.

Here is a graph which shows actual versus predicted ratings for two algorithms (click to see it full size), the top one is my algorithm half way through its training with an RMSE of 0.913, the bottom one is the algorithm based on simple averages of user and item ratings, this has an RMSE of 1.022.

Click for full size

You can see the predicted ratings along the bottom, and for each, you can see the proportion of ratings of 1 star, 2 star, 3 star, and so on. As one would hope, when the algorithm is predicting a rating of 1, the majority of the actual ratings are indeed 1, and when the algorithm is predicting a rating of 5, the majority of those ratings are actually 5.

But when you compare the two sets of predictions you see something a bit surprising, there isn’t really a huge difference between the two algorithms, even though one is using a fairly naive average-based approach that doesn’t even consider a user’s individual likes and dislikes, and the other is an algorithm that does significantly better than Netflix’ own approach.

The pie charts at the right show the proportion of actual ratings when the predicted rating is over 3.5 stars. This is intended to approximate what the average user might see.

For the smart algorithm, the actual ratings over 3 stars represent 98.9% of those ratings that were predicted to be over 3.5 stars.

But for the naive algorithm, the percentage is 94.4%, a 4.5% difference. Is 4.5% really that impressive? I think a user might have a hard time noticing such a difference in performance. I’m not the first to express concerns of this nature.

You can find the raw data in the form of an Excel spreadsheet here:
Netflix RMSE effectiveness data spreadsheet