Losing Is The Value Of Experimentation

Originally published at: https://goodenoughstatistics.com/losing-is-the-value-of-experimentation-5876234629a1.

The value of experimentation is in the losses. You ran the experiment because you thought it was a good idea. With no experimentation system, you would have just launched it.

The wins are the cost of experimentation: if you had simply launched the winner, you would have gotten its value sooner.

Why?

Let Y be the metric we’re trying to move. During a standard A/B test (assuming a 50/50 split), you get the average of the two arms:

During-Test Payoff = (E[Y(A)] + E[Y(B)]) / 2

After the A/B test, if the experiment loses and you continue with the status quo, you get E[Y(A)]. If the experiment wins, you get E[Y(B)].

Suppose we’re measuring performance over some time interval [0,1]. Let T in (0,1) be the length of the experiment. Here is your payoff if you run an experiment (ignoring statistical error, which just reduces payoffs; it doesn’t change anything qualitatively):

Experiment Payoff = T · (E[Y(A)] + E[Y(B)]) / 2 + (1 - T) · E[Y(Winner)]

where E[Y(Winner)] is E[Y(B)] if the experiment wins and E[Y(A)] if it loses.

Now, let’s suppose you don’t experiment. Presumably, you think B is better than A! That’s why you spent time developing it even though A already exists. So, without an experiment, you’d launch B and get the payoff:

Non-Experiment Payoff = E[Y(B)]

Value Of Experimentation = Experiment Payoff - Non-Experiment Payoff
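The payoff algebra above can be sketched in a few lines of Python. This is a minimal illustration under the assumptions already made (a 50/50 split during the test, and the arm that measures better ships afterward); the function names are mine, not from the post:

```python
def experiment_payoff(mean_a: float, mean_b: float, t: float) -> float:
    """Average outcome over [0, 1] when an A/B test of length t is run.

    During the test, half of traffic sees A and half sees B; afterward,
    whichever arm measured better is rolled out to everyone.
    """
    during_test = t * (mean_a + mean_b) / 2
    after_test = (1 - t) * max(mean_a, mean_b)  # ship the winner
    return during_test + after_test


def value_of_experimentation(mean_a: float, mean_b: float, t: float) -> float:
    """Experiment payoff minus the payoff of launching B without a test."""
    return experiment_payoff(mean_a, mean_b, t) - mean_b
```

Plugging in numbers shows the sign flip: when E[Y(B)] > E[Y(A)] the value is negative (a cost), and when E[Y(A)] > E[Y(B)] it is positive.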

The Winners are the Cost of Experimentation

Suppose you are right. B is better than A. Well, then:

Value Of Experimentation = T · (E[Y(A)] + E[Y(B)]) / 2 + (1 - T) · E[Y(B)] - E[Y(B)] = (T/2) · (E[Y(A)] - E[Y(B)]) < 0

Why? The Non-Experiment Payoff is a weighted average of E[Y(A)] and E[Y(B)], which must be less than E[Y(B)] if we have a winner.

So, when you have a winner, you pay a cost to experiment.

The Losers are the Value of Experimentation

Suppose you are wrong. A is better than B. This is where we get value, because the experiment catches the mistake and we keep A for the rest of the period:

Value Of Experimentation = T · (E[Y(A)] + E[Y(B)]) / 2 + (1 - T) · E[Y(A)] - E[Y(B)] = (1 - T/2) · (E[Y(A)] - E[Y(B)]) > 0

So, if the time a feature spends in an experiment is small relative to the total time period we care about, i.e., if T is small (the usual case), then the above is approximately E[Y(A)] - E[Y(B)], which we know is positive because A is better than B.
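A quick numeric check of that approximation, with made-up numbers E[Y(A)] = 1.2, E[Y(B)] = 1.0, and a two-week test (T ≈ 3.8%):

```python
mean_a, mean_b = 1.2, 1.0  # made-up metric values; A is truly better
t = 2 / 52                 # a two-week experiment within a one-year horizon

# Average of both arms during the test, then the true winner (A) afterward,
# all compared against just launching B.
experiment_payoff = t * (mean_a + mean_b) / 2 + (1 - t) * mean_a
value = experiment_payoff - mean_b

print(round(value, 4))  # close to the full gap of 0.2
```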

The value of experimentation comes from the losses!

Losing is the typical case

See Table 2 in this write-up for a list of win rates at major tech companies.

The typical company in that table (a who’s who of tech companies) only wins ~20% of the time.

That is why experimentation is valuable. If we were typically right, experimentation would be an unnecessary cost. But we aren’t. We’re typically wrong. So it goes.

To make the case for an experimentation program, we shouldn’t look at Win Rates. We should look at Loss Rates. It’s a good case.

A Practical Metric

You can compute this “Value of Experimentation” metric using data readily available from your Experimentation Platform.

For a single experiment:

Value Of Experimentation = T · (E[Y(A)] - E[Y(B)]) / 2 + (1 - T) · (E[Y(Winner)] - E[Y(B)])

So, we need to decide on a value for T. The other values we can estimate from past data.

An easy solution is to fix a unit of time, say a year, and normalize the metric for your experiment to be annualized. Then, set T to be the percentage of the year that the experiment ran, i.e., if it ran for two weeks, T = 3.8%.
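As a sanity check on that arithmetic (two weeks as a share of a 52-week year):

```python
weeks_run = 2
t = weeks_run / 52  # fraction of the year the experiment was live
print(f"T = {t:.1%}")  # prints "T = 3.8%"
```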

The other thing we need to take into account is that the populations for our experiments are usually not the full population of our website, so we need to downweight the Value of Experimentation accordingly. Let R(i) be the fraction of the total population exposed to experiment i.

Of course, there might be interactions between experiments. Still, for a simple measure (and because interactions are actually kind of rare), we can take the R(i)-weighted sum of the Value of Experimentation across experiments to get our aggregate measure. In this weighting, the interpretation is per unit, per year; it should be easy to transform as needed.
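Putting it together, the aggregate metric is just an R(i)-weighted sum of per-experiment values. A sketch with made-up numbers (nothing here comes from real data):

```python
# Each entry: (annualized Value of Experimentation per unit, exposed fraction R_i).
# All numbers are invented for illustration.
experiments = [
    (0.150, 0.50),   # a loser caught early: large positive value
    (-0.004, 0.25),  # a winner: the small cost of testing
    (0.080, 1.00),   # another loser, run on the full population
]

aggregate_value = sum(value * r_i for value, r_i in experiments)
print(f"Aggregate Value of Experimentation: {aggregate_value:.3f} per unit, per year")
```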

Enjoy!

Zach