
Conversion Rate Optimization Statistics – What Do They Mean?

November 6, 2022 | All Categories, CRO  | Author: Daniel Chabert

This is the first in a series of posts that aims to clarify a few commonly used phrases in conversion rate optimization statistics and debunk a few myths about what you can and can’t infer from your test statistics. This article uses examples from the A/B testing tool Optimizely, but the explanations apply to test statistics from any tool.

Table of contents:

  • Part one: confidence intervals & confidence limits in testing
    • Wait…what?!
    • That’s all well and good but why do we need these?
    • So we can estimate how all our users will act, what next?
    • How can you say that?
  • Part two: statistical significance

Part one: confidence intervals & confidence limits in testing

[Screenshot: Optimizely A/B test results showing each variation’s conversion rate with its confidence interval]

Ever seen figures like these on an A/B split test and wondered what they mean? The values to the right of the conversion rate are what we call ‘confidence intervals’. These are the values that, when added to and subtracted from the test conversion rate, give us the confidence limits. Confidence limits are essentially a range within which we can say, with reasonable safety, that the ‘true’ value of the conversion rate lies.

Wait…what?!

Okay, let me put it another way. Assuming we’re testing to a 95% confidence level, the confidence limits on ‘Variation #1’ above mean: “I’m 95% confident that, based on this test, the true conversion rate of Variation #1 lies between 2.87% (which is 3.26% – 0.39) and 3.65% (which is 3.26% + 0.39).”
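To make that arithmetic concrete, here is a minimal sketch of how such an interval can be computed with the classic normal approximation. The visitor and conversion counts are invented so the output lands close to the Variation #1 example above; Optimizely’s own stats engine is more sophisticated than this, so treat it as an illustration rather than a reproduction of its output.

```python
import math

def conversion_ci(conversions, visitors, z=1.96):
    """Return (rate, lower limit, upper limit); z=1.96 gives a ~95% interval."""
    rate = conversions / visitors
    interval = z * math.sqrt(rate * (1 - rate) / visitors)  # the "+/-" value
    return rate, rate - interval, rate + interval

# Made-up counts chosen to roughly match the Variation #1 figures above
rate, low, high = conversion_ci(conversions=260, visitors=7975)
print(f"{rate:.2%} ({low:.2%} to {high:.2%})")  # ~3.26% (2.87% to 3.65%)
```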

That’s all well and good but why do we need these?

We need confidence intervals and limits because it would be impossible to run a conversion optimization test on the whole population. When we test a sample of the population, we therefore can’t assume it will behave in a way that exactly represents the whole population; what we can assume is that the sample provides an estimate of how the whole population would behave.

So we can estimate how all our users will act, what next?

Now we can start comparing the confidence intervals of the ‘original version’ and the ‘variation version’. If there is no overlap between the two sets of confidence limits (as with our first example above), it is fairly safe to assume that ‘Variation #1’ will increase the conversion rate over the original, provided we have tested with a suitable sample size and let the test run for (in our opinion) at least two ‘business cycles’.
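As a rough sketch of that overlap check, the snippet below reuses the hypothetical conversion_ci() helper from the previous example and compares the two ranges; again, the counts are invented purely for illustration.

```python
# Reuses conversion_ci() from the previous snippet; counts are invented.
orig_rate, orig_low, orig_high = conversion_ci(conversions=180, visitors=8000)
var_rate, var_low, var_high = conversion_ci(conversions=260, visitors=7975)

if var_low > orig_high:
    print("No overlap: the variation's worst case beats the original's best case.")
elif orig_low > var_high:
    print("No overlap: the original's worst case beats the variation's best case.")
else:
    print("The intervals overlap: this test can't separate the two versions yet.")
```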

How can you say that?

Take a look at the (very crude) illustration of the results below. As you can see, the most extreme low of the variation (its lower confidence limit, if you’re being fancy) is still higher than the most extreme high of the original. So even in the unlikely event that the true values of the original and the variation sat at that high and that low respectively, the variation would still win. This is a fairly extreme example, but it gives an idea of how you can use confidence intervals to interpret your test data.

[Illustration: the variation’s lower confidence limit sits above the original’s upper limit (*absolutely, in no way, to scale)]

The great thing about confidence intervals is that they not only provide an alternative way of visualising your test results but also give you extra information on the largest and smallest effects you can expect.

Do you use confidence intervals and limits? If so, leave a comment and let us know how you use them. We’d love to hear from you.

Part two: statistical significance


Probably the most used and most contested CRO term of them all is statistical significance. How many times have you quoted a test as running to ‘95% statistical significance’? 95 times? 100 times? But what does it actually mean? Is it a good thing?

Firstly, before we delve into the glossary, a brief introduction. In a lot of CRO testing, we agree that a test is significant (i.e., the results are strong enough for us to accept that the probability of the outcome being a fluke is acceptably low) when it reaches 95% statistical significance and 80% statistical power. This will make more sense later on.

So, let’s dive straight in. Statistical significance measures how often, when the variation is genuinely no different from the original, the test will correctly say so. So when you test to 95%, you are basically saying: if I ran this test 20 times and there really were no difference, 19 of those times the test would show no difference.

Wait, what?!

Okay, let me put it another way. Say we have tested the original landing page of a company, which for the sake of example we’ll call Frank’s Merchandise. We know that Frank’s homepage converts 3% of traffic into sales. Say we then test a variation of the homepage whose true conversion rate is also 3%. Despite the two being identical, when we test them against each other at 95% significance, 5% of the time the test will show that one version is better. This is called a false positive result.
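One way to see this is a quick simulation. The sketch below (using numpy and statsmodels, with arbitrary traffic numbers) runs many ‘A/A tests’ of the Frank’s Merchandise scenario, where both pages truly convert at 3%, and counts how often a two-proportion test at 95% significance still declares a difference; it comes out at roughly 1 in 20.

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

rng = np.random.default_rng(42)
true_rate, visitors, trials = 0.03, 10_000, 2_000
false_positives = 0

for _ in range(trials):
    a = rng.binomial(visitors, true_rate)  # conversions on the original
    b = rng.binomial(visitors, true_rate)  # conversions on the identical variation
    _, p_value = proportions_ztest(count=[a, b], nobs=[visitors, visitors])
    if p_value < 0.05:  # "95% statistical significance"
        false_positives += 1

print(f"False positive rate: {false_positives / trials:.1%}")  # roughly 5%
```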

Great!

Or is it? One in 20? Nearly twice as likely as rolling a double six in Monopoly? Put another way, if we tested two versions of exactly the same webpage against each other, 1 out of every 20 tests would say that one variation outperformed the other. Is that good enough for us?

But what’s statistical power, and what’s the difference between the two?

Now, statistical power is kind of the opposite of statistical significance. Statistical power measures how often, when a variation genuinely is different from the original, the test will pick up on it. The industry standard is 80% statistical power, which, again, basically means that if I ran a test where I knew categorically there was a difference in conversion between the original and the variation, the test would pick up on that difference 8 times out of 10. So 2 times out of 10 the test would fail to pick up on the difference between the two versions, even though there is one. This is also called a false negative.
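To see what those two thresholds mean for the traffic a test needs, here is a hedged sketch using statsmodels’ power calculator. The baseline rate and the size of the lift are invented; the point is only to show how ‘95% significance and 80% power’ translates into a required sample size.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Hypothetical lift: detect a move from a 3.0% to a 3.5% conversion rate
effect = proportion_effectsize(0.035, 0.030)

n_per_variation = NormalIndPower().solve_power(
    effect_size=effect,
    alpha=0.05,   # 95% statistical significance
    power=0.80,   # 80% statistical power
    ratio=1.0,    # equal traffic split between original and variation
)
print(f"~{n_per_variation:,.0f} visitors needed per variation")
```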

But surely statistical significance and statistical power are industry standards set at 95% and 80% respectively for a reason?

Erm, actually no. There is no rigorous basis behind 95% for statistical significance or 80% for statistical power. They are simply the values most commonly used in statistics, particularly medical statistics, although a wider range of values is used in practice. One suggestion as to why the significance threshold is set higher than the power threshold is that a false positive is considered riskier: in medicine, it is far more damaging to roll out a new drug treatment that is actually less effective than the control than to fail to roll out a more effective one. But in CRO, which would you say is more damaging to a business:

Not implementing a variation that increases conversion (a false negative)

or

Implementing a variation that has no effect on conversion (a false positive)?

I hope that’s given you more of a grounding in what a lot of CRO hypothesis testing is based on, and maybe raised a few more questions. I’d like to leave you with two questions to ask about your business:

  • Which do you value more: never implementing a variation that makes no difference, or always implementing the variations that genuinely provide results?
  • With this information, would you change the levels of statistical significance and statistical power you test to? (The sketch below shows how those choices affect the traffic a test needs.)
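As a back-of-the-envelope illustration of that last question, the sketch below reuses the power calculation from earlier and shows how the required sample size per variation shifts as the significance and power thresholds move. The 3.0% to 3.5% lift is, again, an invented example.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

effect = proportion_effectsize(0.035, 0.030)  # hypothetical 3.0% -> 3.5% lift
solver = NormalIndPower()

for alpha, power in [(0.05, 0.80), (0.05, 0.90), (0.10, 0.80), (0.01, 0.80)]:
    n = solver.solve_power(effect_size=effect, alpha=alpha, power=power, ratio=1.0)
    print(f"significance {1 - alpha:.0%}, power {power:.0%}: ~{n:,.0f} visitors per variation")
```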