How can A.(x-B)^2+C help you avoid a curry overdose?

picture of curryYou might love curry, but no matter how much you love it, you probably don’t want to eat it every night.  We’ve spent the last few months working on a powerful collaborative filter called SenseArray.  A collaborative filter’s job is to quickly figure out what people want, and then give it to them, but most  don’t account for the fact that people need some diversity.  Indeed, the Netflix Prize (pretty-much the olympics of collaborative filters), doesn’t even measure it!

One fundamental question we need to answer is just how much diversity users want.  Too little and users get bored, too much and the quality of your recommendations is too low, so it’s a balance.  This will be different for different applications of SenseArray.  So how do we figure out how much diversity SenseArray should use in its recommendations in any particular situation?

Well, the high-level answer isn’t complicated, it’s “trial and error”.  You try a certain parameter, and you measure the aggregate user response to that parameter.  If more users like what they see with parameter A versus parameter B, then you go with parameter A.

There are a few problems though.  These patterns of user response only become meaningful after you’ve collected a lot of data, so it can take hours, perhaps even days to collect enough data to properly evaluate a particular parameter.

Also, since you are testing this on a live system, you really want to settle on a good parameter as quickly as possible so that users are getting the best possible experience as quickly as possible.  How can we do this?

We should start by getting some data.  Let’s say that we know our parameter must be between 0 and 1.  Initially, since we know nothing, let’s run 3 experiments, testing performance when the input is 0.0, 0.5, and 1.0.

Graph of a curveSo now we have these initial 3 samples, what do we do with them?  I’ll start by making an assumption: we expect there to be an input value at which performance is optimal, where performance gradually deteriorates the further you get from this value.  This means that if you were to graph performance as the parameter changes, where a lower value means better performance, it would look something like the image to the right, a U-shaped curve.

Turns out that you can produce a simple U-shaped curve with the equation:

You can adjust A to change the steepness of the slope, B determines the middle of the curve (the lowest point), and C is the height of the lowest point.

So the next step, given our initial 3 samples, is to find values for A, B, and C that will produce a curve that most closely fits these sample points.  The technique used to do this is a rudimentary genetic algorithm optimization.  We choose some initial values for A, B, and C.  We then measure how good the “fit” is by measuring the output of our curve equation for the sample points, and squaring the difference with the actual values we got for these samples.  We then adjust A, B, and C, measure the fit again, and if it is better, we keep the new values, if not we revert to the old values for A, B, and C and try again.

Typically this approach will find a good curve fit in just a few hundred iterations.  There are undoubtedly  more efficient ways to do this, but this is something we need to do quite infrequently, so a “good enough” approach is just fine.

You will recall that value B represents the lowest point in the curve, this means that whatever our curve-fitting told us B should be, is our guess for the input that will produce the best output, cool!

So what do we do now?  Well, chances are that B won’t be a perfect guess, after all it’s based on only 3 samples, which probably incorporate a lot of noise, and our simple curve almost certainly isn’t an exact match even if there were no noise.  But it has given us an approximation, its given us a clue as to where we should focus our attention next.

We started by testing across the entire range of possible input values, so we should now narrow this range.  I’ve found that narrowing it to 1/4th of its previous size works well, although this is a fairly arbitrary number.  We then repeat the process using this new range centered on our best guess from the previous iteration.

Another refinement is to track whether our guesses appear to be “homing in”, meaning that each successive change in our best guess is smaller and smaller.  While the successive changes are getting smaller, we continue to decrease the range by 1/4th.  If a change is larger than the previous change, then we increase the range again – I arbitrarily chose 1/3rd, and it seems to work (I didn’t want to choose 1/4th because then we’d just re-test the same set of samples we just tested).

So lets see how this works in practice.  Lets say we have a mystery function, you and I know its defined as follows:
[sourcecode language=’java’]
public double mysteryFunction(double input) {
return Math.sin(input+4)+1.1+(random.nextDouble()/3-0.11);
}
[/sourcecode]
This is a simple sine curve with some random salt thrown in to make things a little more interesting (real sample data is rarely as neat as a clean curve). The curve itself (without randomness) looks like this:

Untitled

We will focus on the part of the curve between 0 and 3, because this only has a single lowest point.

I’ve created a class called ParameterOptomizer which implements the algorithm described above.  Here is how we use it:
[sourcecode language=’java’]
public void testMystery() {
// Create our ParameterOptimizer object and tell it that valid inputs
// are in the range 0 to 3
ParameterOptimizer po = new ParameterOptimizer(0, 3);
// Output the initial best guess. Since we don’t have any samples yet,
// this will be mid-way between 0 and 3
System.out.println(” Best guess: “+df.format(po.getBestGuess()));
// Now we run a few tests
for (int x = 0; x < 3; x++) {
// Get three sample points from the ParameterOptimizer
List probes = po.getNextSamples(3);
for (Double probe : probes) {
// For each of these points, obtain the output provided by
// the mysteryFunction. Note that in real-life scenarios,
// this might be a time-consuming process
double output = mysteryFunction(probe);
// Tell the ParameterOptimizer what we learned
po.addSample(probe, output);
// And output the information so we can see what is going on
System.out.print(” “+df.format(probe)+”:”+df.format(output));
}
// Find out the current best guess, along with what we predict the output will be
// at this point
System.out.println(” Best guess: “
+df.format(po.getBestGuess())+”:”+df.format(po.bestFit.getYAt(po.getBestGuess())));
}
}

private double mysteryFunction(double input) {
return Math.sin(input+4)+1.1+(random.nextDouble()/3-0.11);
}
[/sourcecode]
Hopefully this code is relatively self-explanatory.

So how does it perform?  Well, here is the output from a typical run:

 Best guess: 1.50
0.00:0.51 1.50:0.56 3.00:1.70 Best guess: 0.72:0.39
0.34:0.28 0.72:0.12 1.09:0.34 Best guess: 0.72:0.28
0.63:0.03 0.72:0.10 0.81:0.32 Best guess: 0.72:0.23
0.44:0.33 0.72:0.09 1.00:0.11 Best guess: 0.76:0.21
0.00:0.47 0.80:0.28 1.61:0.41 Best guess: 0.76:0.21
0.55:0.17 0.76:0.32 0.97:0.35 Best guess: 0.73:0.22
0.10:0.37 0.73:0.07 1.36:0.48 Best guess: 0.70:0.22

On the first iteration, using just three samples, we get extremely close to the lowest point on the curve:

Untitled

Here the red circles indicate the sample points, and the blue X indicates where the algorithm guesses the lowest point will be. As you can see, it slightly overestimates the actual height of this lowest point, but this is because two of the samples are slightly above the curve due to the random salt.

On the second iteration it still thinks that 0.72 is the optimum, but its estimate of where this optimum is has improved, going from 0.39 to 0.28:

Untitled

You may wonder why it is still guessing so high for the optimal point given that these three samples would suggest it is much lower.  The reason is that when it fits the curve, it will use all available samples, not just the three most recent.

Also note that the three sample points used are now much closer to the optimal than in the first iteration. This means that our “guinea pig” users should be getting a better experience than on the first iteration.

I’ve made this code available under the Lesser GNU Public License, so you are welcome to take it, play with it, and perhaps improve it.  All I ask is that if you do improve it, you let me know about your improvements.  Browse the code here, get the details for pulling it from Subversion here.

Leave a Reply

Your email address will not be published. Required fields are marked *