Proportionate A/B testing

More than once I’ve seen people ask questions like “In A/B testing, how long should you wait before knowing which option is the best?”

I’ve found that the best solution is to avoid the binary question of whether or not to use a variation. Instead, randomly select the variation to present to each user in proportion to the probability that this variation is the best, based on the data (if any) you have so far.

How? The key is the beta distribution. Let’s say you have a variation with 30 impressions, and 5 conversions. A beta distribution with (alpha=5, beta=30-5) describes the probability distribution for the actual conversion rate. It looks like this:

A graph showing a beta distribution

This shows that while the expected conversion rate is about 1 in 6 (approx. 0.17), the curve is relatively broad, indicating that there is still a lot of uncertainty about what the actual conversion rate is.

Let’s try the same for 50 conversions and 300 impressions (alpha=50, beta=300-50):

You’ll see that while the curve stays centered on roughly the same value, it gets a lot narrower, meaning that we’ve got a lot more certainty about what the actual conversion rate is, as you would expect given 10x the data.
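To put numbers on that comparison, here is a small Python sketch that computes the mean and standard deviation of both distributions using the standard closed-form formulas for the beta distribution. The helper name `beta_summary` is my own, chosen for illustration.

```python
import math

def beta_summary(alpha, beta):
    """Return (mean, standard deviation) of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    variance = (alpha * beta) / (total ** 2 * (total + 1))
    return mean, math.sqrt(variance)

# 5 conversions out of 30 impressions vs. 50 conversions out of 300
small_mean, small_sd = beta_summary(5, 30 - 5)
large_mean, large_sd = beta_summary(50, 300 - 50)

print(f"30 impressions:  mean={small_mean:.3f}, sd={small_sd:.3f}")
print(f"300 impressions: mean={large_mean:.3f}, sd={large_sd:.3f}")
```

Both means come out at about 0.167, while the standard deviation shrinks by roughly a factor of three with 10x the data.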

Let’s say we have 5 variations, each with various numbers of impressions and conversions. A new user arrives at our website, and we want to decide which variation we show them.

To do this we employ a random number generator that picks random numbers according to a beta distribution we provide to it. You can find open source implementations of such a random number generator for most programming languages, Java among them.

So we go through each of our variations and draw a random number from the beta distribution we’ve calculated for that variation. Whichever variation gets the highest random number is the one we show.
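The selection step can be sketched in a few lines of Python using `random.betavariate` from the standard library. This is a minimal sketch, not our production code: the function name `choose_variation`, the shape of the `stats` dictionary, and the numbers in it are all made up for illustration. It also assumes every variation already has at least one conversion and one non-conversion, because `betavariate` needs both parameters to be greater than zero.

```python
import random

def choose_variation(stats, rng=random):
    """Pick a variation by drawing one beta sample per variation.

    stats maps a variation name to (conversions, impressions).
    Assumes 0 < conversions < impressions for every variation.
    """
    best_name, best_draw = None, -1.0
    for name, (conversions, impressions) in stats.items():
        # alpha = conversions, beta = impressions - conversions
        draw = rng.betavariate(conversions, impressions - conversions)
        if draw > best_draw:
            best_name, best_draw = name, draw
    return best_name

# Hypothetical data for five variations: (conversions, impressions)
stats = {
    "A": (5, 30),
    "B": (50, 300),
    "C": (2, 40),
    "D": (12, 100),
    "E": (1, 10),
}
print(choose_variation(stats))
```

Calling `choose_variation` once per visitor means a variation with little data, and therefore a wide curve, still wins some of the time. That is what keeps it being tested.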

The beauty of this approach is that it strikes a really nice, perhaps optimal, compromise between sending traffic to new variations to test them and sending traffic to variations that we know to be good. If a variation doesn’t perform well, this algorithm will gradually give it less and less traffic, until eventually it’s getting almost none. Then we can remove it, secure in the knowledge that we aren’t removing it prematurely. There’s no need to set arbitrary significance thresholds.
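A quick way to watch the traffic shift is to simulate it. The Python sketch below is an illustration under stated assumptions: the `simulate` function and the "true" conversion rates are hypothetical, and each variation starts from a (1, 1) beta distribution so that a draw is possible before any data exists.

```python
import random

def simulate(true_rates, visitors, seed=0):
    """Simulate proportionate allocation; return impressions per variation.

    true_rates are hypothetical conversion rates, unknown to the algorithm.
    """
    rng = random.Random(seed)
    alphas = [1] * len(true_rates)       # 1 + conversions so far
    betas = [1] * len(true_rates)        # 1 + non-conversions so far
    impressions = [0] * len(true_rates)
    for _ in range(visitors):
        # One beta draw per variation; show the one with the highest draw
        draws = [rng.betavariate(a, b) for a, b in zip(alphas, betas)]
        shown = draws.index(max(draws))
        impressions[shown] += 1
        # Simulate whether this visitor converts
        if rng.random() < true_rates[shown]:
            alphas[shown] += 1
        else:
            betas[shown] += 1
    return impressions

print(simulate([0.02, 0.05, 0.20], visitors=5000))
```

With rates this far apart, the weakest variations should receive only a small share of the 5,000 impressions, with most of the traffic going to the 20% variation.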

This approach is easily extended to situations where rather than a simple impression-conversion funnel, we have funnels with multiple steps.

One question is what you should “initialize” the beta distribution with before you’ve collected any data about a particular variation. The default answer is (1, 1), since you can’t start with (0, 0). This effectively starts with a “prior expectation” of a 50% conversion rate, but as you collect data this will rapidly converge on reality.

Nonetheless, we can do better. If we know that variations tend to have a 1% conversion rate, we could start with (1, 99) instead.
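To see how much the starting point matters early on, here is a small Python sketch. It assumes the observed conversions and non-conversions are simply added to the initial (alpha, beta) pair. The function name `posterior_mean` and the sample data are hypothetical.

```python
def posterior_mean(prior, conversions, impressions):
    """Mean of the beta distribution after adding observed data to a prior.

    prior is the (alpha, beta) pair the variation was initialized with.
    """
    alpha = prior[0] + conversions
    beta = prior[1] + (impressions - conversions)
    return alpha / (alpha + beta)

# Hypothetical early data: 1 conversion in 20 impressions
for prior in [(1, 1), (1, 99)]:
    estimate = posterior_mean(prior, conversions=1, impressions=20)
    print(prior, round(estimate, 4))
```

Starting from (1, 1), a single lucky conversion in 20 impressions suggests a rate of about 9%. Starting from (1, 99), the same data gives about 1.7%, much closer to what we would expect for a typical variation.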

If you really want to take this to an extreme (which is what we do in our software!), you can match the spread of your prior as well as its mean. Let’s say you have an idea of the typical distribution of conversion rates across variations: a mean of 1% with a standard deviation of 0.5%.

Note that starting points of (1, 99), (2, 198), or (3, 297) will all give you a starting mean of 1%, but the higher the numbers, the more data it will take to move the estimate away from that mean. If you plug these into Wolfram Alpha (“beta distribution (3,297)”) it will show you the standard deviation for each of them: (1, 99) is 0.0099, (2, 198) is 0.007, (3, 297) is 0.00574, (4, 396) is 0.005, and so on.

So, since we expect the standard deviation of the actual conversion rates to be 0.5% or 0.005, we know that starting with (4, 396) is about right.
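If you’d rather not look these up one at a time, the same standard deviations can be computed with the closed-form variance of the beta distribution. A minimal Python sketch (the helper name `beta_sd` is mine):

```python
import math

def beta_sd(alpha, beta):
    """Standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return math.sqrt(alpha * beta / (total ** 2 * (total + 1)))

# Candidate starting points that all have a mean of 1%
for k in range(1, 5):
    alpha, beta = k, 99 * k
    print(f"({alpha}, {beta}): sd={beta_sd(alpha, beta):.5f}")
```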

You could find a smart way to calculate the starting beta parameters for a desired standard deviation, but it’s easy, and just as effective, to find them by trial and error as I did.
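For what it’s worth, one such way is to solve the mean and variance formulas of the beta distribution for alpha and beta directly. The Python sketch below does that. It is not what we used, and the function name `beta_from_mean_sd` is made up for illustration.

```python
def beta_from_mean_sd(mean, sd):
    """Beta parameters whose mean and standard deviation match the inputs.

    Uses mean = a / (a + b) and variance = mean * (1 - mean) / (a + b + 1).
    Only works when sd**2 < mean * (1 - mean).
    """
    total = mean * (1 - mean) / sd ** 2 - 1   # this is a + b
    if total <= 0:
        raise ValueError("sd is too large for a beta distribution with this mean")
    return mean * total, (1 - mean) * total

alpha, beta = beta_from_mean_sd(0.01, 0.005)
print(alpha, beta)
```

For a mean of 1% and a standard deviation of 0.5% this gives roughly (3.95, 391.05), which is close to the (4, 396) found by trial and error.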

Note that while I discovered this technique independently, I later learned that it is known as “Thompson sampling” and was originally developed in the 1930s (although to my knowledge this is the first time it has been applied to A/B testing).

7 thoughts on “Proportionate A/B testing”

  1. Mary

    Hi Ian,
    I saw you recently discussed how to create a random number generator using the beta distribution in JavaScript. I was wondering if you’d converted that code from Python to JavaScript — and if so, if you’d be willing to share it.

  2. ian Post author

    Mary, I’m afraid not – we ended up going with a non-JavaScript solution. It shouldn’t be terribly difficult to do a JavaScript implementation (assuming you can’t find one elsewhere).

  3. Mary

    Oh, that’s good news! I have not been able to find one elsewhere — could you say a bit more about what you mean by not terribly difficult? I’m very new to this!

  4. ian Post author

    Basically you’ll want to pick a random number with a uniform distribution and pass it through the beta distribution’s inverse cumulative distribution function. If you google for it you should find a more detailed explanation.

  5. Tim

    Hi Ian,

    Thanks for the informative post. I’ve got one question regarding the random number selection.

    Do you generate one random number in the range (0,1) and then select the variation which has the highest inverse CDF value according to their respective alpha & beta?

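    // Assuming Apache Commons Math 3
    import org.apache.commons.math3.distribution.BetaDistribution;

    BetaDistribution variationA = new BetaDistribution(2, 5);
    BetaDistribution variationB = new BetaDistribution(2, 2);

    // One shared uniform draw, mapped through each variation's inverse CDF
    double r = Math.random();
    double a = variationA.inverseCumulativeProbability(r);
    double b = variationB.inverseCumulativeProbability(r);

    String selected = (a >= b) ? "A" : "B";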

    or do you generate a random number for each variation?
    My hunch would be to generate one random number for all distributions, but I’d like to make sure I understand the dynamics properly.


    1. ian Post author

      The approach I used generates a separate random number for each, and I’m fairly sure it will work. I’d need to give your approach more thought on whether it would work.

