“The A/B test didn’t show an improvement, let’s move on to the next test.”
Aaaahhhh! What a waste…
When evaluating an A/B experiment and drawing conclusions, we need to dig deeper than this. At PurpleFire, we work on the principle that there are learnings to be taken from every A/B test. Whether a variation breaks all previous records or takes a soul-crushing dive, there is always something of value to take forward.
Caveat: the only time you can’t take any learning is when the data is corrupted in some way or there is a known problem with how the experiment was designed. In that case, resolve the issue quickly and run the experiment again.
I want to share two examples of good post-test analysis and highlight the types of questions you should be answering to plan sensible next steps.
Scenario 1: The nose-dive
When we start working with new companies, we often hear things like: “I’m not sure people are willing to take risks with key parts of the website or experience.” While I understand this thought process, it’s only really valid if you don’t intend to make any other changes. Otherwise, we’re really saying: “We’d rather just guess and never find out whether our changes have a negative or positive impact.” Doesn’t sound too smart, does it?
However, once you’ve committed to A/B testing, it’s inevitable that at some point a variation will not perform as you expected and you’ll need to justify your work. The obvious first step is to highlight how bad things could have been if the changes had been made without A/B testing first. You also need to make sure that you analyse beyond the headline metrics. It may be that the variation performed poorly overall but the results were not evenly distributed across user segments. For example, did the variation perform well for new users but frustrate returning visitors? Did performance vary by screen resolution? Carrying out this more detailed analysis and answering these types of questions can provide a range of learnings for further testing (provided you have sufficient sample sizes in each segment).
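To make this concrete, here is a minimal sketch of the kind of segment-level breakdown we mean. It assumes you can export per-visitor experiment data with columns like “variant”, “converted” and a segment column such as “user_type” or “screen_resolution” — all of these names are illustrative, not taken from any real experiment.

```python
# Sketch: compare conversion rates between control and variation within each segment.
# Column names ("variant", "converted", etc.) are assumptions about your export format.
import pandas as pd
from statsmodels.stats.proportion import proportions_ztest

def conversion_by_segment(df: pd.DataFrame, segment_col: str) -> pd.DataFrame:
    """Return per-segment conversion rates, lift and a two-proportion z-test p-value."""
    rows = []
    for segment, grp in df.groupby(segment_col):
        control = grp[grp["variant"] == "control"]["converted"]
        variation = grp[grp["variant"] == "variation"]["converted"]
        # Skip segments that are too small to say anything useful about.
        if len(control) < 200 or len(variation) < 200:
            continue
        successes = [variation.sum(), control.sum()]
        samples = [len(variation), len(control)]
        _, p_value = proportions_ztest(successes, samples)
        rows.append({
            segment_col: segment,
            "control_cr": control.mean(),
            "variation_cr": variation.mean(),
            "lift": variation.mean() - control.mean(),
            "p_value": p_value,
        })
    return pd.DataFrame(rows)

# Example usage (file name and segment columns are hypothetical):
# df = pd.read_csv("experiment_results.csv")
# print(conversion_by_segment(df, "user_type"))
# print(conversion_by_segment(df, "screen_resolution"))
```

One caution: slicing results by many segments increases the chance of false positives, so treat any segment-level differences as new hypotheses to test rather than as conclusions.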
The next step is to squeeze as much learning from a failed test as possible. In most cases, we find that a losing test leads to a range of new hypotheses for further testing, for example:
- The idea was good but the execution (copy? design? code? speed? etc.) was flawed.
- The messages are important, but we shared them at the wrong time.
Scenario 2: No significant change
This can be really frustrating. As our friend Michael Aagaard quotes:
“It is not the magnitude of change on the “page” that impacts conversion; it is the magnitude of change in the “mind” of the prospect.” –Dr. Flint McGlaughlin
Part of this comes down to learning what types of change actually affect user behaviour, and to some extent that is the reason we test in the first place: we want to learn about the impact of changing certain elements. If we test widely, it is inevitable that some elements will prove more or less important than we expected.
One area where we sometimes see overlooked opportunities is when a test runs to a statistically sound sample size yet shows no observable change in the top-level metrics.
Even these results can provide really valuable insight. For example, on one site we worked on, we tested simply hiding or showing a block of editorial content on the homepage. After running the experiment, we saw no meaningful change in our key testing metrics. However, we realised that a lot of time was being invested in creating content for those homepage editorial slots. In this instance we didn’t move the primary CRO metrics, but we were able to reassign the time spent preparing that content to more valuable tasks for the content team.
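If you want to be confident that a “no change” result really is a no-change result (rather than just an underpowered test), a confidence interval on the difference in conversion rate is a quick sanity check. The sketch below uses a standard normal-approximation interval; the visitor and conversion counts are made up for illustration, not the data from the experiment described above.

```python
# Sketch: confidence interval for the difference in conversion rates (B - A).
# The counts below are hypothetical, purely to illustrate the calculation.
from math import sqrt
from scipy.stats import norm

def diff_confidence_interval(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Normal-approximation CI for the lift in conversion rate of B over A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = norm.ppf(1 - alpha / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se

# Hypothetical numbers: 50,000 visitors per arm, roughly 3% conversion in both.
low, high = diff_confidence_interval(1510, 50_000, 1495, 50_000)
print(f"95% CI for the lift: [{low:+.4%}, {high:+.4%}]")
# Here the interval is only about +/-0.2 percentage points around zero, so any
# effect of hiding the content is too small to matter for the primary metric.
```

If the interval is narrow and straddles zero, you can act on the result with reasonable confidence, as we did by freeing up the content team’s time; if it is wide, the honest conclusion is that the test simply didn’t collect enough data to say either way.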
Testing next steps
Once we conclude an experiment, the primary question is usually: “Shall we keep the control or implement the winning variation?” But there is a series of secondary questions we ask to squeeze value from each experiment, such as:
- Are there follow-up tests that we should run?
- Is there design or dev work that could be leveraged in another test?
- Are there elements of the winning variation that are partially validated and could be applied in new contexts, e.g. elsewhere on the site, or in marketing content?
- How should we communicate these results/learnings? Blog post? Internal Comms?
In summary
When you run experiments, you need to evaluate the results with care and attention. Don’t dismiss poorly performing experiments. Dig for insights and extract as many learnings as possible.