A/B Testing in Design Projects

A/B testing is a unique form of usability and UX testing in which at least two solution variants are compared. This testing can be conducted either with prototypes or in live environments.

In the first case, A/B testing is simply a variant of usability and UX testing in which two prototype versions are compared. In the second case, the implementation works differently. This section focuses on A/B testing in live systems (Stegemann & Suwelack, 2020; Siroker & Koomen, 2013).

How Does It Work?

A/B tests are mainly tool based. Whether A/B testing is possible depends heavily on the technical platform on which your website or online store is hosted. If your website runs on WordPress, which is the case for about 40% of all websites worldwide, then A/B testing is possible with almost no effort: All you need to do is install a suitable plug-in. The same applies to the platforms of large online store providers.

But let’s leave the technology aside for now. The idea of A/B testing live applications is not so much to compare two or more completely different solutions as to fine-tune existing ones.

Meaningful A/B testing on a live system requires careful planning and a sensible, structured approach. Since everything you test is shown to your real users, the things you want to test must be largely finished. Usually, users are unaware that their activity contributes to an A/B test; they are simply shopping in a web store, for example, or visiting a website to get information. So, these A/B tests only work with well-developed ideas, unless you are willing to accept damage to the brand’s image. An alternative is to test immature ideas with only a small group of users. This approach limits the damage to just a few people but, unfortunately, only works if you already have a sufficiently large user base.

For an A/B test, you usually start by deciding which detail you want to change on the web page, for example, a headline, a button label, or the position of the contact information. Then, you set up an A/B test in which you keep everything the same except for that one thing. You end up with two versions of the same web page, one of which uses Headline A and the other Headline B. Everything else is the same, as shown in this figure.

Website during A/B testing

Now, start the A/B test. The tool automatically ensures that half of the visitors see one version and the other half see the second. Depending on the tool, you can define the end of the A/B test by time (“User test runs for a week until Sunday noon”) or by the number of people who have seen the different versions. Some tools also offer an automatic stop once enough data has been gathered to support a decision.
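
An even split like this is often implemented by hashing a visitor identifier, so that a returning visitor keeps seeing the same version. The following is a minimal Python sketch of that idea, not the mechanism of any particular tool; the function name `assign_variant`, the experiment label, and the visitor IDs are hypothetical.

```python
import hashlib


def assign_variant(visitor_id: str, experiment: str = "headline-test") -> str:
    """Deterministically map a visitor to version "A" or "B" (assumed 50/50 split)."""
    # Hash the experiment label together with the visitor ID so that
    # separate experiments split the same audience independently.
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # integer from 0 to 99
    return "A" if bucket < 50 else "B"


# Hypothetical visitors: count how many land in each version.
counts = {"A": 0, "B": 0}
for i in range(10_000):
    counts[assign_variant(f"visitor-{i}")] += 1
print(counts)
```

Because the assignment depends only on the identifier and the experiment label, no per-visitor state has to be stored, and the two groups come out close to equal in size for a large audience.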

The results from such an A/B test are usually purely quantitative, and you must define in advance what is measured. Typically, conversions are measured, and you decide beforehand what counts as a conversion: In the case of a store, a conversion could be making a purchase; in the case of a website, it could be subscribing to a newsletter.

Afterward, you can check in the tool to see which version performs better in terms of conversions. As with anything you do, it’s imperative with A/B testing to think about the right metrics in advance. You can choose just one, such as conversion rate. But looking at different metrics and considering them together, such as the number of purchases made and the average purchase volume, can be helpful. Once you can determine a winner based on your measurement data, the A/B test is completed. Usually, the “winner” of the test is made live for all users, so from then on, only one version (the better one) is available for everyone again.
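
To make the comparison concrete, here is a small Python sketch that computes the conversion rate and average purchase volume per version and then checks whether the difference in conversion rates is likely to be more than chance. The text does not prescribe a statistical method; the two-proportion z-test used here is one common choice, and A/B testing tools may use other methods. All figures and function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def summarize(visitors: int, purchases: int, revenue: float) -> dict:
    """Return conversion rate and average purchase volume for one version."""
    return {
        "conversion_rate": purchases / visitors,
        "avg_purchase": revenue / purchases if purchases else 0.0,
    }


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the assumption that both versions perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical results: 5,000 visitors per version.
version_a = summarize(visitors=5000, purchases=200, revenue=9000.0)
version_b = summarize(visitors=5000, purchases=260, revenue=11180.0)
z, p_value = two_proportion_z_test(200, 5000, 260, 5000)

print(version_a)
print(version_b)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value (conventionally below 0.05) suggests that the difference in conversion rates is unlikely to be random noise. Looking at the average purchase volume alongside it guards against declaring a winner that converts more often but sells less per purchase.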

For this procedure to work well, it is crucial that you change only one thing per test run. If you change many things at once, you won’t be able to tell at the end which change was effective. For example, you could theoretically change the headline, the button label, and the position of the contact button at the same time and find that the conversion rate has dropped by 3%. This result would make you discard the experiment and stick with the old version. But maybe the new headline improved the conversion rate by 2%, and the new button text improved it by another 2%, while the new position of the contact button reduced it by 7% because users suddenly saw your page as less trustworthy. In that case, you would have given away a 4% gain in conversion because of a poor test setup.
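
The arithmetic behind this example can be written out in a few lines of Python. This is only an illustration: it uses the figures from the example and assumes, as the example implicitly does, that the individual effects simply add up.

```python
# Isolated effect of each change on the conversion rate, in percentage
# points, taken from the example in the text.
effects = {
    "headline": 2,
    "button label": 2,
    "contact button position": -7,
}


def combined_effect(changes) -> int:
    """Sum the effects of the given changes (assumes effects are additive)."""
    return sum(effects[change] for change in changes)


all_at_once = combined_effect(effects)  # what the single combined test shows
helpful_only = combined_effect(["headline", "button label"])  # the missed gain

print(all_at_once)   # -3
print(helpful_only)  # 4
```

The combined test reports only the net result of -3, which hides the two changes that would have helped on their own.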

Therefore, A/B tests on live systems are usually planned rigorously and implemented on a small scale. In return, the execution time can be extremely short, at least if you have enough website or store visitors. In most cases, A/B testing is used continuously, so that a test is always running somewhere; many small tests are often run one after the other. Large providers, such as Amazon, rely on continuous testing and improvement. Recently, Amazon even offered this feature to its retailers and affiliates so that they can place and sell their own products even better.

When Is It Used?

A/B tests are mainly used on websites, in online stores, and in social media advertising that regularly reach a large number of visitors. The type of A/B testing described in this section is thus more of an optimization activity than a creative activity, which makes A/B tests particularly suitable for existing products in live operation. Conventional usability and UX tests, which present the different variants to participants, are better suited to comparing variants in the prototype or concept stage.

What Do You Need?

Since A/B testing is implemented technically, the requirements are relatively clearly defined. You need the technical means to serve at least two variants to users and measure their performance. Specifically, you’ll need the following:

A tool that supports A/B testing for your application
At least two variants that you want to compare (and that are suitable for use in a live system)
A definition of success
A sufficiently large number of users using your live systems
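
How many users count as “sufficiently large” depends on how small a difference you want to detect. The following Python sketch uses a common textbook approximation for comparing two conversion rates; it is not taken from the source, and the significance level, power, and conversion rates are assumptions chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_variant(baseline: float, expected: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed per variant to detect baseline -> expected.

    Uses the normal approximation for a two-sided test of two proportions.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_power = NormalDist().inv_cdf(power)          # desired detection chance
    variance = baseline * (1 - baseline) + expected * (1 - expected)
    return ceil((z_alpha + z_power) ** 2 * variance / (expected - baseline) ** 2)


# Hypothetical case: detect a lift from a 4% to a 5% conversion rate.
print(visitors_per_variant(0.04, 0.05))
# A smaller expected lift requires considerably more visitors.
print(visitors_per_variant(0.04, 0.045))
```

The required number grows quickly as the expected difference shrinks, which is why fine-tuning tests of this kind are practical mainly on high-traffic sites.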

What Are the Advantages?

Advocates of A/B testing argue that A/B tests on live systems deliver much better results than usability and UX tests because they do not simulate the behavior of users but instead capture and compare actual behavior, which makes measurable differences directly visible. And they are right about that. However, in defense of usability and UX tests, note that these tests are used during development and thus pursue a different objective. Especially on high-traffic websites, an A/B test can be run very quickly. In some cases, an A/B test can be carried out and evaluated in less time than it takes to discuss which change should be tried next.

In addition, the procedure for A/B tests is structured. The fact that only individual aspects are adjusted in a targeted manner means that the learning effect is quite high. Theoretically, knowledge can be acquired and applied to other websites and online stores. Permanent and step-by-step improvement can be driven forward through rigorous testing with large numbers.

What Are the Disadvantages?

Because they are used on the live system, A/B tests are rather systematic and small scale. Thus, they are less suitable for giant leaps or major changes. Especially when entirely new and innovative concepts are involved, direct use in an A/B test is relatively risky because real revenue losses can be expected if a concept does not work and finds little acceptance.

So, if A/B testing is used as the only method, then further development can be expected to take the form of an evolution rather than a revolution, and thus, potential may be wasted. If A/B testing is also used for more extensive tests, bad test variants can reflect badly on the company. Therefore, some intuition is required when selecting and preparing variants, especially because people do not know they are part of a test.

Another disadvantage of A/B testing is the complete lack of subjective data to help explain the differences in results. You learn step by step what works and what doesn’t, but for the most part, you must rely on trial and error: Beyond the fact that something worked or didn’t work, there is no evidence of the reasons. Thus, you are always guessing about the “why.”

What Are the Alternatives?

Usability and UX tests provide a first impression of how two variants compare. Theoretically, rapid user tests could be performed using a copied instance of the existing website. If a change to the visual appearance is to be evaluated, online surveys can also be used, with the focus on collecting subjective data.

This table shows the profile for this method.

Profile for the A/B testing method

Editor’s note: This post has been adapted from a section of the book Usability and User Experience Design: The Comprehensive Guide to Data-Driven UX Design by Benjamin Franz and Michaela Kauer-Franz. Benjamin received his doctorate in engineering; his UX-related dissertation on highly automated driving was awarded the Walter Rohmert Research Prize. In his jointly founded and managed company, Custom Interactions, he focuses on user interface design. He also works as a lecturer and a keynote speaker. Michaela has a doctorate in psychology with a focus on product design and user experience. Together with Benjamin Franz, she founded the data-driven UX agency Custom Interactions, which they manage together. She also contributes her extensive experience as a trainer, a lecturer at Technische Universität Darmstadt, a speaker, and on an International Organization for Standardization (ISO) committee.

This post was originally published 2/2024.
