As UX designers, we get asked to solve a lot of different problems for our clients. Sometimes they are looking to increase overall sales on their site; other times they just want to optimise a single form to improve a specific conversion rate.
For every problem there are experts in their field who fall on either side of the argument, so there is never one clear answer.
So while I might have a very good idea of how to tackle a problem, sometimes I simply have to throw my hands in the air and admit that I’m not 100% sure this is the exact right way to do it.
And when you’re not sure of something, sometimes the best thing you can do is to test your theories.
“In theory there is no difference between theory and practice. In practice there is.” — Yogi Berra
What is A/B Testing?
A/B testing or split testing is a method of comparing two versions of a webpage against each other to determine which one performs better, as defined by Optimizely.
But the real question is: does A/B testing work? And guess what? There’s no one answer to this question either!
A/B Testing doesn’t always work…
According to Dan Waldschmidt there is no real certainty from the data you’re receiving because you can’t account for external factors unknown to you:
“Can you really be assured that two datasets deliver different results simply because of the wording changes you made? Can you rule out timing, demographic nuance, personal life experience, and a list of other dynamic variables that lead to fickle outcomes?”
You can’t predict all of the circumstances that will have an impact on your A/B experiment.
That’s why, if you’re A/B testing, it’s so important to use a large enough sample group to get meaningful data.
To check whether your result is statistically significant, you can use an A/B split testing calculator and aim for 95% or higher confidence.
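For the curious, here is a rough sketch of the kind of sum such a calculator might do behind the scenes. It assumes a two-sided two-proportion z-test, which is one common approach and not necessarily what any particular calculator uses. The function name and the visitor numbers are made up for illustration, and "confidence" is taken to mean one minus the p-value.

```python
from math import sqrt
from statistics import NormalDist


def ab_confidence(visitors_a, conversions_a, visitors_b, conversions_b):
    """Two-sided two-proportion z-test; returns (z, p_value, confidence)."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled rate: the conversion rate if A and B were really identical.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if std_err == 0:
        # Nobody converted (or everybody did): there is nothing to compare.
        return 0.0, 1.0, 0.0
    z = (rate_b - rate_a) / std_err
    # Chance of seeing a gap at least this big if there were no real difference.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, 1 - p_value


# Hypothetical numbers: 5,000 visitors per version, 200 vs 250 sign-ups.
z, p_value, confidence = ab_confidence(5000, 200, 5000, 250)
print(f"z = {z:.2f}, p = {p_value:.4f}, confidence = {confidence:.1%}")
print("Significant at 95%" if confidence >= 0.95 else "Not significant at 95%")
```

The normal approximation used here gets shaky when conversions are very rare or samples are tiny, so treat it as a sanity check rather than a replacement for a proper tool.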
An interesting article by April Dunford of Rocket Watcher explains why A/B testing can tell you which idea works better, but not why.
“While the test might have proven the second ad generated more clicks, it did not prove the second ad “worked” better from a business perspective.”
In this analysis of these two ads, the crappy ad done in 5 minutes in Microsoft Paint was clicked on more. However, there are many reasons why users may have clicked on this joke version of the ad.
We can see from this a common pitfall of A/B testing: using the click-through rate as your result metric instead of having a specific conversion goal in mind.
Let’s say you change button text on Version B from ‘Sign up’ to ‘Get a free car’ and your goal is to increase sign ups on your website by 10%. You’re likely to see an increase in the Click Through Rate (CTR) but because you’ve set the wrong expectation you won’t see an increase in conversions to sign up.
“Have things happen when users expect them — either because of their existing expectations or because you’ve clearly communicated what to expect” Nielsen Norman Group
From this we can learn the value of setting clear conversion goals for our A/B experiments. A high click-through rate deserves no attention if that goal isn’t also achieved.
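To make the difference concrete, here is a small sketch that judges a test by its conversion goal rather than its click-through rate. Everything in it is hypothetical: the `Variant` class, the `goal_met` helper and the traffic numbers are invented to mirror the ‘Get a free car’ example, not taken from a real experiment.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    visitors: int
    clicks: int
    signups: int

    @property
    def ctr(self):
        # Share of visitors who clicked the button.
        return self.clicks / self.visitors

    @property
    def conversion_rate(self):
        # Share of visitors who actually completed a sign-up.
        return self.signups / self.visitors


def goal_met(control, challenger, target_lift=0.10):
    """Return (met, lift), where lift is the relative change in conversion rate."""
    lift = challenger.conversion_rate / control.conversion_rate - 1
    return lift >= target_lift, lift


# Made-up results: Version B gets three times the clicks but fewer sign-ups.
version_a = Variant("Sign up", visitors=10_000, clicks=500, signups=200)
version_b = Variant("Get a free car", visitors=10_000, clicks=1_500, signups=190)

met, lift = goal_met(version_a, version_b)
print(f"CTR: A {version_a.ctr:.1%} vs B {version_b.ctr:.1%}")
print(f"Conversion lift for B: {lift:+.1%}, goal met: {met}")
```

Version B looks like a runaway winner on clicks, yet it misses the 10% sign-up goal, which is the number the test was supposed to move.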
Alex Turnbull from Groove suggests there is no magic A/B testing formula and that their experiments are often failures providing inconclusive data:
“… it’s important for anyone looking for a “magic bullet” to understand one simple truth: testing is mostly failure after failure. If you’re lucky, you find a statistically viable win after a few weeks. Most of the time, we see results after months of iteration.”
The case studies presented here are really interesting and show us that it’s not always as simple as it seems to get A/B testing right. Sometimes you can follow all of the rules and still end up with nothing to show for it.
The lesson I think we can learn from this is that testing, iterating and re-testing are all essential if you want accurate results from your split test.
Jen Havice wrote a really interesting article on Why A/B Testing doesn’t always turn out the way you’d hoped.
One of the more intriguing outcomes of her analysis is that not having enough traffic coming to your website makes it near impossible to reach statistical confidence.
“Everybody knows that testing takes lots of good traffic… but testing platforms casually disregard that critical point… How on earth is a non-Amazon business supposed to “test everything” when you can barely test your home page headline and reach confidence?”
When you have low traffic on your website it will be hard to get any clear cut answers from your A/B tests.
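To get a feel for how much traffic is "enough", here is a rough estimate of the visitors needed per version, using a standard sample-size formula for comparing two conversion rates. The 80% power setting, the function names and the example traffic figures are my own assumptions, so treat the result as a ballpark, not a guarantee.

```python
from math import ceil, sqrt
from statistics import NormalDist


def visitors_per_variant(baseline_rate, relative_lift, confidence=0.95, power=0.80):
    """Approximate visitors needed per version to detect the given relative lift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    # z-scores for the chosen confidence (two-sided) and power.
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def days_to_run(per_variant, daily_visitors, variants=2):
    """Days needed if daily traffic is split evenly across all versions."""
    return ceil(per_variant * variants / daily_visitors)


# Hypothetical site: 2% baseline conversion, hoping for a 10% relative lift,
# with 500 visitors a day in total.
needed = visitors_per_variant(baseline_rate=0.02, relative_lift=0.10)
print(f"Visitors needed per version: {needed:,}")
print(f"Days to run at 500 visitors/day: {days_to_run(needed, 500)}")
```

With these made-up numbers the answer comes out in the tens of thousands of visitors per version, which would take a low-traffic site many months to collect.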
Cameron Chapman from Kissmetrics believes in the power of A/B testing in helping to improve your marketing efforts:
“A/B testing, done consistently, can improve your bottom line substantially. If you know what works and what doesn’t, and have evidence to back it up, it’s easier to make decisions, and you can often craft more effective marketing materials from the outset.”
Chapman explains the importance of time in testing: you need to run your tests for long enough to get enough visitors, but not so long that extraneous variables start affecting your results. Somewhere between one and two weeks is ideal.
“You should already know your baseline result, which is the results you’re currently getting. You want to test option A and B against each other, but you also want to know that whichever one does better in the test is also doing better than your current results.”
We also learn here from Chapman that using your current conversions as a baseline will help you to know which of your tests are achieving your desired results.
But there are other thought leaders out there like David Kadavy with his article on A/A Testing: How I increased conversions 300% by doing absolutely nothing.
“Even if you do have a large enough sample size, you’re bound to get the occasional “false positive” or “false negative.””
This means you could make completely the wrong decision based on false information.
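You can see this effect for yourself with a quick simulation. The sketch below runs a batch of pretend A/A tests in which both versions have exactly the same true conversion rate, then counts how often a simple z-test still declares a "significant" difference at 95% confidence. The function names, the 5% conversion rate and the visitor counts are all assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def p_value(visitors_a, conversions_a, visitors_b, conversions_b):
    """Two-sided p-value from a two-proportion z-test."""
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if std_err == 0:
        return 1.0
    z = (conversions_b / visitors_b - conversions_a / visitors_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(num_tests, visitors, true_rate, seed=0):
    """Share of simulated A/A tests that look significant at 95% confidence."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    false_positives = 0
    for _ in range(num_tests):
        # Both "versions" are identical: same page, same true conversion rate.
        conv_a = sum(rng.random() < true_rate for _ in range(visitors))
        conv_b = sum(rng.random() < true_rate for _ in range(visitors))
        if p_value(visitors, conv_a, visitors, conv_b) < 0.05:
            false_positives += 1
    return false_positives / num_tests


rate = false_positive_rate(num_tests=500, visitors=1000, true_rate=0.05)
print(f"{rate:.1%} of A/A tests looked like a significant win or loss")
```

At a 95% confidence threshold you should expect somewhere around one identical test in twenty to look like a real result purely by chance.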
I love this article because it completely dispels the idea that A/B testing will be the answer to all of your business problems. It also raises an interesting point for me as a designer:
Clients often ask to see two or more options, and as a designer I really dislike this kind of thinking. Why would I split my time in half working on two OK versions of something rather than putting all of my energy into tweaking and iterating on the best possible version of it?
“Unless you’re someone who is properly trained, and really understands statistics, you should be wary of running tests.”
I see now that A/B testing isn’t for everyone. Statistical analysis isn’t everyone’s cup of tea and unless you’ve had training you might find it hard to understand anything in the sea of numbers you get back.
So should I A/B Test?
Take what you want from the articles and observations above.
What I’ve learnt is that A/B testing can sometimes help you increase your conversions, that it sometimes fails, and that it’s not for everyone.
My advice would be to let your UX designer put their time and effort into coming up with their best possible solution first. Launch this and get your baseline conversion rate from it — I suggest at least 6 months to gather a solid baseline.
Then run some A/A tests before you even think about starting any A/B testing.
Finally, when you’re happy to start your A/B testing make sure you follow these golden rules:
- Establish a current conversion rate baseline
- Set a specific conversion increase goal
- Ensure you have enough visitors coming to your site
- Run only one test at a time
- Run your test for at least one week
- Use calculators to ensure you have statistical significance
- Tweak and re-test if the test fails
Originally published at blog.strata3.com.