A/B Testing Pitfalls: How Marketers Can Avoid Costly Mistakes

November 12, 2019 | 15 minute read
Chad S. White
Head of Research, Oracle Digital Experience Agency

Because it gives you a way to determine whether your marketing audience prefers version A of something or version B, A/B testing is powerful. And increasingly it’s easy to do. Sometimes it seems so easy that we don’t even realize that we’ve completely wasted our time, missed out on golden opportunities, or—worst of all—confidently come to the wrong conclusions. The truth is that A/B testing is only powerful if you do it right and avoid the many pitfalls that can undermine your testing. 

“Without clearly defined processes, marketers run the risk of testing just for the sake of testing, which leads to discrepancies in methodology, lack of purpose, ambiguous results, and wasted resources,” says Reed Pankratz, Sr. Strategic Consultant for Strategic Services at Oracle Marketing Consulting.

Here’s detailed advice on how to best accomplish all of that and avoid A/B testing’s many potential pitfalls:

What to A/B Test

The first set of pitfalls revolves around what marketers decide to A/B test. Here’s our advice...

1. Focus on the most impactful elements.

You can’t test everything, so direct your testing efforts toward those things that are most likely to generate significant lift. In email marketing, those elements are:

  • Subject lines (as well as preview text), with brands routinely testing short vs. long, tone, urgency, offer elements, emojis, personalization, and other elements. For more ideas, check out 6 Ways that Subject Line Writing Has Changed.
  • Headlines and subheads, with brands testing length, word choice, style, size, color, and other elements.
  • Calls-to-action, with brands testing buttons vs. links; CTA text; button size, shape, style, and position; and other elements.
  • Hero images and other key images, with brands testing size, position, color composition, lifestyle vs. product images, corporate vs. user-generated images, and other elements.
  • Timing, with brands testing time of day, day of week, week of month, delay in send after trigger for automated emails, and other elements as they try to answer the age-old question, When is the best time to send an email?

For instance, one of our financial services clients based in New York tested time of day and day of week around communications about corporate payment solutions, says Sudha Bahumanyam, Senior Principal B2B Consultant at Oracle Marketing Consulting. “Despite common conjecture to avoid weekends when emailing corporate audiences, their audiences were more responsive on Sunday evenings than in the middle of the day on Tuesday or Wednesday,” she says. “The results for open rates were eye-opening and definitely debunked what we had originally surmised as a given.”

2. Don’t just test the easy things.

Subject lines are by far the most A/B tested email element—largely because tools make it really easy to test them and it only requires some additional copywriting. It can be significantly more work to test images or design arrangements because of the additional design work and test setup. A/B testing automated emails can also be significantly more involved, but the payoffs can be much greater than testing broadcast emails because of the much higher ROIs generated by automated emails.

“Testing definitely isn’t just for one-time promo emails,” says Helen Lillard, Principal B2B Consultant at Oracle Marketing Consulting. “Set up audience journey programs like welcomes, nurtures, and reengagements with testing so that you can test your content, offer, subject line, and other elements for weeks at a time and get really solid results.”

3. Don’t forget about your target audience when testing.

Sometimes you’re trying to effect a change in your overall audience, but sometimes you’re only looking to affect your most valuable clients or inactive subscribers or customers in California, for instance. To generate the highest return on the test, be sure to clearly define your target segment, says Autumn Coleman, Cloud Technology Manager for Oracle Eloqua.

“For example, I helped a client build a multi-channel content marketing test with landing pages, video content, and emails using a key account strategy,” she says. “They developed branded microsites for specific top-tier accounts that include case studies and content relevant to that company. Landing page conversions trigger emails that are personalized for each contact’s role within their company and include personalized video messages from their account manager powered by Brightcove. The landing pages included forms that were tested in length and form field pre-population. The emails were tested by subject lines, send times, and calls-to-action. And the video content was tested by length, message, and call-to-action. The results were outstanding and generated millions in revenue and additional pipeline for the customer!”

Establish Clear A/B Testing Goals

Casual, ad hoc testing is an almost guaranteed way to waste your time. You’re unlikely to get the kinds of results that you want and are likely to be led astray by misleading results. To avoid that fate, consider taking the following steps...

4. Understand whether you’re testing to learn or testing to win.

Both have their place in a marketer’s arsenal, but they call for very different approaches during setup and analysis. When you’re testing to learn, you test themes, tactics, layouts, and other things that you can apply broadly to future campaigns. Examples include testing whether your audience responds better to lifestyle images or product images, or to dollar-off or percentage-off offers.

However, when you’re testing to win, the results of the campaign are usually only applicable to that campaign and can’t be applied to future campaigns, says Wade Hobbs, Senior Strategy Consultant for Strategic Services at Oracle Marketing Consulting. 

“This type of testing is perfect for bigger campaigns—think Black Friday, product launches, and other high-priority sends—where an incremental lift of 10% to 20% in open or click-through rate can be material,” he says. “The benefit of this type of testing is that the marketer can be creative and try a variety of unique subject lines, offers, or design elements to give the individual campaign the best possible outcome.”

5. Understand whether you’re testing to find a new local maximum or a new global maximum.

In addition to knowing if you’re looking to learn or win, you need to know if you’re looking for an incremental improvement or a paradigm shift. That’s the difference between seeking a new local maximum and new global maximum, respectively.

For instance, testing different colors for your CTA button will help you potentially find a new local maximum because you’re making a small tweak to your approach. But testing an entirely new email module design could help you find a new global maximum because you’re radically changing the experience. The former represents a low risk–low reward scenario, while the latter represents a high risk–high reward scenario. Both have their place. But don’t make small changes and expect to find radically superior results.

6. Have a clear hypothesis.

If you test enough changes you’re likely to stumble across a few high-impact changes, but you’ll have much more success if you know what change in subscriber behavior you’re hoping to see and why the change you’re A/B testing is likely to cause that desired behavior. Have clear intention.

7. Be clear about what a victory will mean.

If your challenger wins, what change will you make in your emails? In some cases, it will mean a permanent change, but in others it will simply mean that you’ll use the winning tactic occasionally or rarely—rather than never. However, sometimes not everyone is on the same page about this, says Nick Cantu, Senior Art Director for Creative Services at Oracle Marketing Consulting.

“For example, when testing subject lines, when one shows statistical significance, some will want to use that format for all sends going forward,” he says. “If first-name personalization wins, they will plan to use that on all emails, which can quickly become overused and stale to the audience. Make sure to use your learnings effectively and that your team understands that plan.”

8. Get buy-in to make changes based on the results of your A/B tests.

Being clear about what a victory will mean is essentially meaningless if you don’t have the buy-in to follow through on those changes. Depending on the test, you may need buy-in from executives, your email marketing manager, your brand marketers, or others. “This is a critical step that I see some of my clients miss,” says Jessica Stamer, Principal B2B Consultant at Oracle Marketing Consulting.

“For example, I had a client with a vigorous A/B testing strategy and each month I would prepare a presentation to share the results,” she says. “The feedback never changed because they were never able to take any action. The tactical to-dos were easy—make emails more personal, use more text and fewer pictures, shorten the subject lines, etc.—but they weren’t able to get their internal people and processes aligned to make the strategic shift.” 

Executing an A/B Test

The next set of testing pitfalls has to do with executing your tests, so you have reliable, meaningful results. This is another make-or-break moment. Our advice is to...

9. Test one element at a time.

“A common misstep in A/B testing is that marketers try to test everything at once,” says Bahumanyam. “A/B testing is testing one variable and providing accurate and actionable insights.”

Many brands simply don’t have enough volume to test more than two variables and generate statistically significant results (which we’ll talk more about later). That said, if you’re a brand that has millions of subscribers, then you’re likely a good candidate for multivariate testing (supported by Oracle Responsys), which allows you to test multiple variables at once. Just be careful that you understand the data demands, says Bahumanyam.

“If you’re testing four variables, for example, it would result in 4 factorial (1x2x3x4), or 24, possible combinations,” she says. “It’s essential to wear your statistician’s hat.”

But if you have enough volume, multivariate testing allows you to get results much faster, especially when testing different combinations, says Lillard. “For instance, having your preview text work well with your subject line is important,” she says. “Multivariate testing is an easy way to incorporate both elements into a test so you can find the most effective combination.”
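To get a feel for those data demands, it helps to count the versions before you commit. In a full-factorial multivariate test, the number of email versions is the product of the number of options you test for each variable, and every version needs its own adequately sized audience. Here is a minimal sketch in Python; the variable names, the options, and the per-version audience figure are illustrative assumptions, not numbers from any client or platform.

```python
from itertools import product
from math import prod

# Hypothetical test plan: each variable and the options being tested for it.
test_plan = {
    "subject_line": ["short", "long"],
    "preview_text": ["benefit", "urgency"],
    "hero_image": ["lifestyle", "product"],
    "cta_text": ["Shop now", "See the deal"],
}

def count_versions(plan):
    """Number of email versions in a full-factorial test of this plan."""
    return prod(len(options) for options in plan.values())

def list_versions(plan):
    """Every combination of options, one dict per email version."""
    names = list(plan)
    return [dict(zip(names, combo)) for combo in product(*plan.values())]

versions = count_versions(test_plan)
min_per_version = 5_000  # assumed audience needed per version, for illustration only
print(versions, "versions need at least", versions * min_per_version, "subscribers")
```

Four variables with two options each produce 16 versions, and adding a third option to any one of them raises that to 24, so the audience you need grows quickly with every option you add.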

10. Use test audience segments of similar subscribers.

To ensure a fair comparison between how version A and version B perform, you’ll want the test groups that receive each version to be composed of the same kinds of subscribers, whether that’s new subscribers, subscribers who are customers, or subscribers in a particular geography, for instance. 

What you’re testing will influence which demographics and other characteristics you control for. In fact, your audience segment may be central to what you’re testing, says Coleman. “Often marketers begin testing email content tactics such as subject line and call-to-action variances,” she says. “I recommend beginning the test by analyzing a segment or intended audience first. When building your marketing segment, include behavior and profiling data to generate the greater return for the test.”

A quick word about running tests on new subscribers: If you’re testing a new email layout or newsletter format, your existing subscribers have the emotional baggage of having experienced your previous design. This will typically skew your results as some of these subscribers reject the changes because they’re “not used to them.” Your new subscribers don’t have such baggage, which makes them the ideal group to test such changes.

11. Use test audience segments of active subscribers.

Unless you’re testing reengagement, re-permission, or other campaigns that explicitly target inactive subscribers, you’ll want to ensure that your A/B testing groups consist of active subscribers. Otherwise, if version A goes to a group of subscribers who are much more active than the group that got version B, then version A would likely “win” for reasons that have nothing to do with what you’re testing.
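One common way to keep test groups comparable is to filter down to the segment you care about first and then assign those subscribers to version A or version B at random, so that differences in activity level tend to even out between the groups. The sketch below assumes a simple list of subscriber records with a hypothetical `active` flag; your platform's data model and its built-in splitting tools will look different.

```python
import random

def assign_test_groups(subscribers, seed=2019):
    """Randomly split active subscribers into two equal-sized test groups."""
    active = [s for s in subscribers if s["active"]]
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    rng.shuffle(active)
    half = len(active) // 2
    # If the count is odd, one subscriber is left out to keep the groups equal.
    return active[:half], active[half:2 * half]

# Hypothetical subscriber records; every third one is marked inactive.
subscribers = [{"id": i, "active": i % 3 != 0} for i in range(1, 31)]
group_a, group_b = assign_test_groups(subscribers)
print(len(group_a), "subscribers get version A;", len(group_b), "get version B")
```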

12. Use a large enough audience to reach statistical significance.

Ensure that your test groups are large enough to detect the size of lift you care about, because groups that are too small can leave even a real difference statistically inconclusive. Most often, marketers aim for a 95% confidence level, which means that if the two versions truly performed the same, a difference as large as the one you observed would show up by chance only about 5% of the time. Obviously, you want to know that the winner of your test won because it was better, not just lucky, so reaching this threshold is critical.
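If you want to see what a significance check does under the hood, one widely used approach for comparing two click or conversion rates is a two-proportion z-test. The sketch below uses only Python's standard library. The counts are made up for illustration, and your testing tool may use a different statistical method.

```python
from math import sqrt, erfc

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    return erfc(abs(z) / sqrt(2))  # same as 2 * (1 - normal_cdf(|z|))

# Made-up counts: clicks out of emails delivered for each version.
p = two_proportion_p_value(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
verdict = "significant at the 95% level" if p < 0.05 else "not significant"
print(f"p-value: {p:.4f} ({verdict})")
```

A p-value below 0.05 corresponds to the 95% threshold. With these sample counts, the gap between a 2.0% and a 2.6% click rate clears it.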

Your ability to reach this threshold will determine whether you’re able to do a 10/10/80 or 25/25/50 split—where you send version A to a small percentage of your list, version B to an equal percentage, and the winner to the remaining portion—or a straight 50/50 split, where half gets A and half gets B and you use the learnings in future campaigns.

If you’re working with agencies or other third parties, you’ll definitely want to make sure that everyone is on board with what’s required so your tests aren’t invalidated, says Lizette Resendez, Associate Creative Director and Copy Director for Creative Services at Oracle Marketing Consulting. 

“For instance, we had a Creative Services client that we designed multiple creative elements for so they could test them—only to learn after deployment that the client’s agency and ESP didn’t send the test to a large enough audience size to get a clear reading,” she says. “If you don’t have the right people, technology, and strategy in place, it’s just a waste of time and money.”

Consider using an A/B test sample size calculator, such as AB Testguide, to confirm that your test groups are large enough to reach statistical significance. The testing functionality that’s native to your digital marketing platform is also likely to have a statistical significance calculator.
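For a rough sense of what such calculators compute, here is a sketch of the standard sample size approximation for comparing two rates, followed by a check of which splits a given list can support. The 80% power setting, the baseline rate, the lift, and the list size are all assumptions chosen for illustration, and the figures from any particular calculator may differ somewhat.

```python
from math import sqrt, ceil
from statistics import NormalDist

def sample_size_per_group(baseline_rate, relative_lift, confidence=0.95, power=0.80):
    """Approximate subscribers needed in each group for a two-sided test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

def feasible_splits(list_size, needed_per_group):
    """Which common splits give each test group enough subscribers."""
    splits = {"10/10/80": 0.10, "25/25/50": 0.25, "50/50": 0.50}
    return [name for name, share in splits.items()
            if int(list_size * share) >= needed_per_group]

# Assumed scenario: 2% baseline click rate, hoping to detect a 20% relative lift.
needed = sample_size_per_group(baseline_rate=0.02, relative_lift=0.20)
print(needed, "subscribers per group;", feasible_splits(100_000, needed))
```

Under these assumptions, a 100,000-subscriber list is too small for a 10/10/80 split but can support a 25/25/50 or 50/50 split. Lower baseline rates or smaller lifts push the required group size up sharply.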

13. Use holdout groups, when appropriate.

Sometimes not sending an email is better than sending an email. The only way to know if that’s the case is to use a holdout group, which is a group of subscribers that you don’t send emails to. You rotate which of your subscribers are in the holdout group to avoid irritating them.

Holdout groups are particularly useful in testing automated emails. Is adding that third email to your welcome series moving the needle? Is that second re-permission email helping retain more subscribers or simply hurting your deliverability? Is that second browse abandonment email increasing conversions or just annoying subscribers? Having a holdout group helps you test those scenarios.
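Reading a holdout test comes down to comparing the conversion rate of the subscribers who got the email with the rate of those who did not. A minimal sketch, with made-up numbers and hypothetical field names:

```python
def incremental_lift(mailed_conversions, mailed_size, holdout_conversions, holdout_size):
    """Compare conversion rates for mailed subscribers vs. the holdout group."""
    mailed_rate = mailed_conversions / mailed_size
    holdout_rate = holdout_conversions / holdout_size
    return {
        "mailed_rate": mailed_rate,
        "holdout_rate": holdout_rate,
        "absolute_lift": mailed_rate - holdout_rate,
        "relative_lift": (mailed_rate - holdout_rate) / holdout_rate,
    }

# Made-up numbers: does a third welcome email add conversions?
result = incremental_lift(mailed_conversions=540, mailed_size=18_000,
                          holdout_conversions=50, holdout_size=2_000)
print(f"Relative lift from sending: {result['relative_lift']:.1%}")
```

As with any A/B test, the difference between the two groups should still be checked for statistical significance before you act on it, and a lift at or below zero suggests the email isn't earning its place.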

Determining the Winner of an A/B Test

This is, sadly, where too many marketers lose their focus. They’ve done everything right to this point—having chosen the right things to test, having established clear goals, and having executed the test properly. Then they get the results and misinterpret them. Here’s our advice for how to avoid this pitfall...

14. Choose a victory metric that’s aligned with the goal of your email. 

Let’s start by saying that unless you’re running a reengagement campaign with the aim of generating any sign of life, your victory metric for an email test will be something other than the open rate. That’s true even if you’re A/B testing a subject line, because the goal of your email is never to just get an open. It’s to get clicks or, even more often, to get conversions and sales. And your subject line has a critical role in deciding who is reading your email’s body copy and even who’s clicking through to your landing pages.

In an ideal world, most email tests would likely use conversions or sales as victory metrics. But the number of subscribers who will make it to the bottom of your funnel is relatively low, which will make reaching statistical significance tough if you don’t have a really large list and healthy conversions. Many brands with smaller email programs will want to compromise and use clicks as their victory metric.

But, again, compromising further and using opens is a bad idea, chiefly because opens don’t have a strong correlation with conversions, unlike clicks. As I say in our post on Using AI Subject Line and Copywriting Tools Successfully, “Being quickly certain about the influence of an uncertain indicator is not the path to becoming a data-driven marketer.”

15. Don’t ignore negative performance indicators.

In addition to keeping your sights focused on success metrics further down the funnel, keep an eye on negative performance indicators like unsubscribes and spam complaint rates. Churn reduces the effectiveness of your future email campaigns, so even an email that performs well in terms of conversions may be unwise if it causes a spike in opt-outs, says Tommy Hummel, Senior Strategic Analyst for Strategic Services at Oracle Marketing Consulting.

“When reviewing test results,” he says, “if one version ‘wins’ according to the primary KPI but loses to a KPI further down the funnel, it might be worth reconsidering the test results and deciding which matters more. Solve for the full funnel.”

16. Don’t dismiss inconclusive tests.

Sometimes, even when you set up your test correctly, you can end up with inconclusive results. Don’t simply ignore those and move on, says Hummel.

“Often an inconclusive result is itself a conclusion,” he says. “What this might mean is that the elements the marketer thought would matter to their customers or prospects don’t actually matter much. For example, we’ve had clients test different call-to-action language that ended up yielding similar click and conversion rates. While some might consider those failed tests, they actually provided a clear indication that the language wasn’t different enough or the differences tested didn’t actually matter to the recipients. This can inform copywriting strategies going forward and help direct the team toward more valuable tests.”

17. Verify the winner of the test.

In the world of email marketing, victory isn’t eternal. Winners must be re-tested periodically to ensure they’re still winners. That’s because of a few factors: First, the novelty effect can give a short-term boost to new changes as subscribers react solely to the newness of the change. Then the boost wears off, which can make what looked like a winner into a long-term loser. 

Second, subscriber expectations slowly change over time. The outstanding email experiences created by some brands are constantly teaching email users to expect more.

And third, your email audience is constantly evolving. With roughly a third of a brand’s subscribers opting out or becoming inactive each year, the majority of your active email audience can turn over in just two or three years. Those new subscribers bring with them new preferences, which you must discover by testing similar concepts periodically, says Antipa.

“Your database changes over time, so A/B testing should be part of an ongoing process with multiple cycles to truly learn about what works and how to optimize for conversion,” she says. “Commit to continuous testing.”
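The turnover math is easy to check. The sketch below assumes, for simplicity, that the same one-third share of the remaining active subscribers is lost each year. Real attrition is rarely that uniform.

```python
def share_remaining(annual_loss_rate, years):
    """Share of today's active subscribers still active after `years`."""
    return (1 - annual_loss_rate) ** years

for years in (1, 2, 3):
    remaining = share_remaining(1 / 3, years)
    print(f"After {years} year(s): {remaining:.0%} remain, "
          f"{1 - remaining:.0%} have turned over")
```

Under that assumption, about 44% of today's active subscribers are still active after two years and about 30% after three, which is why a result from a couple of years ago may describe an audience that has largely moved on.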

Document Your A/B Testing

The final set of pitfalls involves documentation, which helps you avoid many of the other pitfalls we’ve discussed if you do it right. Here we have two pieces of advice...

18. Record your A/B testing results.

Hopefully by now we’ve convinced you that A/B testing is powerful but full of lots of details and dependent on iteration to further your learnings and to confirm victors. Because of all the details, we highly recommend that you keep a log of all of your testing efforts, writing down your answers to the following questions:

  • What is being tested? And what is your hypothesis about the outcome of the test?
  • What is the goal of the test? What is your success metric?
  • What is the control (version A) and what is the challenger (version B)?
  • What email are you running the test in? What is its send date? Or if it’s an automated email, when are you starting the test period?
  • Did the test reach statistical significance of 95%?
  • What were the results of the test for version A and version B, looking at both your success metric and other relevant metrics like unsubscribes and spam complaints?
  • Which version won? What does that victory mean? What did you learn?

“Note all your findings throughout the year,” says Bahumanyam, “and toward the end of the year you’ll have a results file that will be great to share with senior management and to add to your portfolio.”
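A spreadsheet works perfectly well for this log. If your team prefers something more structured, the questions above map naturally onto a simple record. The sketch below is one possible layout; the field names and the sample entry are entirely made up for illustration.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields

@dataclass
class ABTestRecord:
    element_tested: str
    hypothesis: str
    goal: str
    success_metric: str
    control: str
    challenger: str
    email_name: str
    send_date: str  # or the start of the test period for automated emails
    reached_95_significance: bool
    results_summary: str  # success metric plus unsubscribes and spam complaints
    winner: str
    learnings: str

def log_to_csv(records):
    """Serialize log records to CSV text that can be shared as a results file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ABTestRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()

# Entirely made-up example entry.
entry = ABTestRecord(
    element_tested="CTA text",
    hypothesis="Benefit-led CTA text will earn more clicks than generic text",
    goal="Drive more traffic to the product page",
    success_metric="click rate",
    control="Learn more",
    challenger="See it in action",
    email_name="Spring newsletter",
    send_date="2020-03-15",
    reached_95_significance=True,
    results_summary="A: 2.0% clicks; B: 2.6% clicks; unsubscribes flat",
    winner="B",
    learnings="Use benefit-led CTA text occasionally and re-test in six months",
)
print(log_to_csv([entry]))
```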

19. Create an A/B testing calendar.

While your A/B testing results log helps you keep track of past tests and your learnings, an A/B testing calendar helps you plan out your future tests and re-tests. Given all of its elements, testing takes planning because of the need for additional assets, additional setup time, and additional coordination across teams. 

“We create a shareable testing and optimization roadmap to make sure all of our players—from our copywriters to designers to coders to the client—are all on the same page for each and every test,” says Resendez.

If it’s not baked into your campaign planning, she says, you may find out at the last minute that your test isn’t ready—and nobody holds up a campaign for a test. It simply means the test doesn’t happen, which means a missed opportunity to learn and a missed opportunity to get incrementally better returns.

Yes, that’s a lot of issues to keep in mind, but some of these pitfalls won’t trip you up now that you know about them and many of the remaining ones can be kept top of mind by keeping a testing results log and maintaining a testing calendar. With a good process in place and the fundamental issues under your belt, you can confidently test knowing that you’re maximizing your chances of moving the needle on performance and you can trust your results.


Need help running a test? Oracle Marketing Consulting has more than 500 of the leading marketing minds ready to help you achieve more with the leading marketing cloud, including experts on A/B testing, whether you’re optimizing your creative, list growth, or other aspects of your digital marketing program.

Learn more or reach out to us at OMCconsulting@oracle.com.



Chad S. White is the Head of Research at Oracle Digital Experience Agency and the author of four editions of Email Marketing Rules and nearly 4,000 posts about digital and email marketing. A former journalist, he’s been featured in more than 100 publications, including The New York Times, The Wall Street Journal, and Advertising Age. Chad was named the ANA's 2018 Email Marketer Thought Leader of the Year. Follow him on LinkedIn, Twitter, and Mastodon.
