A/B Testing (a.k.a. split-run testing) – Why, When & What?

Why is A/B testing important?

Advertising is money. Optimizing how advertising dollars are spent to improve the metrics that matter is a top priority for every advertiser. While programmatic advertising enables automatic optimization of a campaign toward the best available inventory, there are still quite a few variables that the user needs to tune to achieve the best possible performance.

A/B testing is the typical first-choice method when it comes to optimizing an ad campaign setup [1]. It enables users to make a more scientific choice when optimizing their ad spend instead of relying on guesswork. The following is a quick and easy read for a non-expert to familiarize themselves with A/B testing in the ad tech space, and an introduction to how to start testing at dataxu.

How is A/B testing different from incrementality testing?

Incrementality testing is a specific flavor of A/B testing that focuses on the lift¹ of key purchase indicators as measured by the conversion rate². In general, it is a methodology used to measure the lift provided to a brand by some channel of digital advertising, versus not using that channel. It is an important tool for advertisers who need to compare the performance of their campaigns on targeted user populations against groups where their ads are not served. While the typical methodology in the industry for such a test is treatment vs. Public Service Announcement (PSA), at dataxu we offer a range of choices including a version of Predicted Ghost Ads [2] and a version of the Intent-to-Treat method that we call Audience Holdout. While incrementality testing is in itself an interesting subject, the rest of this blog post will focus on general A/B testing, or simply A/B testing for short.

When should A/B testing be used?

A/B testing is used when you need to measure the performance change achieved by modifying a single variable in an ad campaign setup. If the first version is the one currently in use (commonly referred to as the control), the new version is usually modified in one aspect (commonly referred to as the treatment).

The following variables are commonly tested in the advertising space:

  • Creatives
  • Pixels (or landing pages)
  • Algorithms (optimizing strategies)
  • Audiences
  • DMAs/regions/locations

What is A/B testing?

A/B testing is a method used to measure the effectiveness of one version of a given variable against another. At dataxu, we typically run A/B tests at the flight level. In the world of ad tech, a flight can be defined as a collection of creatives that belong to a campaign for a specific advertiser. Typically, most targeting and delivery rules are set at the flight level; these rules include budget, impression goals, active day parts, tracking methods, run dates, etc. [3]. To run an A/B test, a copy of the original flight is created that differs only in the single variable to be measured. As listed above, common test variables are creatives, pixels or landing pages, algorithms, audiences, and regions. Once these two flights have run their course over a specific period of time, a pre-chosen KPI is analyzed to evaluate the performance of the variable under test (see Figure 1 for an example). The chosen time period should be long enough to obtain a statistically significant result.

Figure 1: This figure illustrates a simple A/B test. To test a new variation of a creative (in blue) against the original creative (in red), the available audience is split into two equivalent parts and each portion is exposed to one of the two creatives. In this toy example, one out of four people (25% of the audience) who see the original creative converts, while two out of four people (50% of the audience) who see the new creative convert. This suggests that the new creative might be the better option to move forward with, since it achieved a higher conversion rate (50%).

What to do before an A/B test…

One fundamental requirement of an A/B test is to ensure that your two test flights target comparable, unbiased audience splits. If the audience-splitting process is biased, your A/B test could yield invalid results.

dataxu’s A/B testing framework, built on our OneView™ and graph technologies, enables audience splitting on a range of identifiers.

dataxu’s framework applies a hashing function to the selected identifier to separate incoming bids into treatment and control groups. This hashing function is applied in real time on bids arriving from the exchanges, enabling real-time audience splitting. The hashing mechanism also ensures that identifiers arriving in our bid-stream, once assigned to either treatment or control, stay within the assigned group for the duration of the test. This ensures no overlap between groups and thus a fair test.
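
To make the mechanism concrete, below is a minimal Python sketch of a deterministic, salted hash split. The function name, the salt, and the choice of SHA-256 are illustrative assumptions, not dataxu’s actual implementation:

```python
import hashlib

def assign_group(identifier: str, salt: str = "test-123",
                 treatment_share: float = 0.5) -> str:
    """Deterministically assign an identifier to 'treatment' or 'control'."""
    # Salt the identifier so different tests produce independent splits.
    digest = hashlib.sha256(f"{salt}:{identifier}".encode()).hexdigest()
    # Map the 256-bit hash to a number in [0, 1) and compare to the split ratio.
    bucket = int(digest, 16) / float(16 ** len(digest))
    return "treatment" if bucket < treatment_share else "control"

# The assignment is stable: the same ID always lands in the same group,
# so a user stays in their group for the duration of the test.
assert assign_group("device-abc-123") == assign_group("device-abc-123")
```

Because the assignment depends only on the identifier and the salt, it can be computed independently on every incoming bid with no shared state, which is what makes real-time splitting possible.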

Here is an example setup:

Let’s say you have a new creative and want to test how well it performs in comparison with the existing creative. The summary steps to execute are as follows:

  1. Create a copy of the existing flight with the exact same settings.
  2. Set up the new flight with the new creative that you want to test.
  3. Launch the two flights against audiences split using the hashing method described above.

However, there are a few things to check off before you begin this test:

1. Identify test variable

  • Each flight contains many variables that can be tested. Identify which one you would like to test.
  • An A/B test should only test one variable at a time.
  • In the example above, the variable is the creative.

2. Identify KPI

  • Identify which KPI you would like to measure ahead of setting up the test. In our test involving the two creatives, we can consider using action-through-rate (ATR) as the KPI. ATR is the number of converters divided by the number of impressions served. Because it reveals which creative is driving more conversions, it is a good KPI to measure (see the quick example below).
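
As a quick illustration of the ATR arithmetic (the numbers here are made up):

```python
# ATR = converters / impressions served (illustrative numbers)
impressions_served = 200_000
converters = 150

atr = converters / impressions_served
print(f"ATR = {atr:.4%}")  # -> ATR = 0.0750%
```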

3. Set up a control and a treatment flight

  • All settings in both flights should be equal except for the variable you would like to test.

4. Determine your budget (sample size)

  • In advertising, your budget determines the number of impressions you receive and thus your sample size for the A/B test. It is important that your budget is of sufficient size to allow detection of an actual difference in the KPI between the A and B flights.
  • The time taken to complete the test depends on the time taken to spend this budget.
  • This time can be estimated using features such as impression-pacing and spend-forecasting, available on the platform.
  • To avoid bias from seasonality effects, spend can be further regulated so that the test spans a whole number of cycles (typically a week or multiple weeks).

5. Decide the split proportions and how significant your results need to be

  • First, decide how much of a change you need to see.
  • Next, decide the level of significance that needs to be achieved in order to consider the results valid.
  • For our creative example above, let’s assume that any change smaller than a 10% difference between treatment and control ATRs is of no use to us. Let’s also assume that you need to be 95% confident that one creative is working better than the other. In this case, the test should be set up with the expectation of seeing at least a 10% improvement, at 95% significance (p-value < 0.05). At dataxu, we provide a handy tool that lets you tune your budget and split in order to increase the probability of detecting the desired effect, given the baseline KPI you expect to see.
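
The sketch below shows the kind of calculation such a tool performs, here using the statsmodels library; the baseline ATR and the 80% power target are assumptions chosen for illustration:

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline_atr = 0.001                 # assumed control ATR (0.1%)
treatment_atr = baseline_atr * 1.10  # 10% relative lift we want to detect

# Cohen's h effect size for comparing two proportions.
effect_size = proportion_effectsize(treatment_atr, baseline_atr)

# Impressions needed per flight at 95% significance (alpha = 0.05)
# and an assumed 80% power.
n_per_flight = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,
    power=0.80,
    ratio=1.0,
    alternative="two-sided",
)
print(f"~{n_per_flight:,.0f} impressions per flight")
```

With proportions this small, even a 10% relative lift can require hundreds of thousands of impressions per flight to detect reliably, which is why sizing the budget up front matters.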

6. Make sure that only one test is running at a given time on any campaign

  • Tests can easily yield inaccurate results if multiple variables are being changed at once. Make sure you are only running one test at a time.

Now we have all the information needed to set up and run the A/B test!

What to do after an A/B test…

Once the test is complete, you need to collect and analyze your results. In the case of dataxu, the reproducible, deterministic hash used to assign each identifier to the A or B panel makes it easy to split the results for analysis. Recall the KPI that you chose prior to the test and measure its difference across the two groups. Then measure the significance of your result by calculating a p-value [4] for the test. Finally, if the results are significant, the most important step is to make a choice. Armed with the results of an A/B test, you can make a scientific choice about how to optimize the variable you are testing. And don’t stop there: pick the next variable you want to optimize and continue to test!
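
As an illustration, here is a minimal sketch of such a significance check, using a two-proportion z-test from statsmodels; the converter and impression counts are made up:

```python
from statsmodels.stats.proportion import proportions_ztest

# Converters and impressions for control (A) and treatment (B).
converters = [120, 150]           # illustrative counts
impressions = [100_000, 100_000]

z_stat, p_value = proportions_ztest(converters, impressions)
print(f"p-value = {p_value:.4f}")

if p_value < 0.05:
    print("Significant at the 95% level: adopt the winner.")
else:
    print("Not significant: consider a larger budget or a longer run.")
```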

Multivariate testing

You can also set up multivariate testing, which allows multiple variables to be tested at the same time. However, note that this significantly lowers the chance of achieving statistical significance and might not always help with the decision-making process. Even when there is a need to test multiple changes, we recommend testing a single change at a time.

For more information on A/B testing at dataxu, reach out to your dataxu representative.

References:

[1] https://en.wikipedia.org/wiki/A/B_testing

[2] https://medium.com/dataxutech/predicted-ghost-ads-an-accurate-and-cost-effective-method-for-advertisers-to-measure-ad0068b4da1

[3] https://dev.adzerk.com/docs/flights

[4] https://www.statsdirect.com/help/basics/p_values.htm

¹Lift represents an increase in sales in response to advertising.

²Conversion rate in this context means the ratio of outcomes to investment: for example, product purchases per ad impression, or units of an awareness metric (such as views or clicks) per ad impression.