What’s the real problem with your lift test?

As marketers, we’re all truth seekers. We test, measure, and iterate toward the most impactful outcomes. As a result, A/B tests have become a primary tool in our digital advertising kit. In display advertising, public service announcement (PSA) testing is the most common approach, especially for measuring lift and incrementality.

PSA testing works by randomly splitting the segmented audience into treatment and control groups: the treatment group sees brand- or product-related ads, while the control group sees ads for a nonprofit such as the Humane Society. The idea is that we can measure the difference in performance between the two groups while potentially driving some traffic to a worthy cause.
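Mechanically, the split itself is simple. Here’s a minimal sketch of deterministic bucketing, assuming a string user ID; the salt, the hash choice, and the 50/50 split are illustrative assumptions rather than any vendor’s actual implementation:

```python
import hashlib

def assign_group(user_id: str, salt: str = "psa-test") -> str:
    """Deterministically bucket a user into treatment or control.

    Hashing (rather than random.random()) ensures the same user
    always lands in the same group across ad requests.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "treatment" if bucket < 50 else "control"  # illustrative 50/50 split

# Treatment users see the brand creative; control users see the PSA.
creative = {"treatment": "brand_ad", "control": "psa_ad"}
print(creative[assign_group("user-123")])
```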

Yet despite being one of the most accessible solutions, PSA testing has some major flaws that can result in inflated, inaccurate results.

The problem with PSAs

First, PSA testing removes a key variable from the equation: your competitors. If you’re measuring the impact of showing a would-be customer an ad versus not showing them one, your test must not introduce any new variables into the control group. By showing PSA ads, you’re taking away your competitors’ opportunity to show ads to those users. And while that might sound great, it introduces a new bias that can throw off your test. In the real world, if you’re not reaching your customers, your competitors likely are, and that will impact your customers’ buying behavior.

Second, with PSA testing, bidding and optimization prevent the groups from remaining truly randomized. While the treatment and control groups are generally split randomly at the start, they won’t stay that way, because the machine learning algorithms are constantly learning as users interact with ads. People interact with targeted, product-specific creative differently than they do with Humane Society ads featuring adorable puppies. So as your vendor’s algorithm learns from those interactions, it will bid for the control group differently than for the treatment group, which again leads to inaccurate, inflated results.

The other problem, and one marketers really don’t like, is that you have to pay to show those PSA ads, so there is a real cost to running these tests. As a result, marketers tend to err on the side of shorter, 30-day tests.

Fortunately, leading thinkers at places like Google and Netflix have developed two more accurate approaches that remove this bias and determine the true incremental impact of your ad campaigns.

The “Intent to Treat” approach

Also known as “Bid Opportunity Testing,” Intent to Treat (ITT) moves the split into treatment and control groups further down the funnel to keep the two groups as balanced as possible. Rather than dividing users when they’re first cookied, as PSA testing does, ITT splits them only after they’ve been identified as a desired target: they fall within the right segment, geography, and so on. Because the system qualifies a user as targeted before bucketing them, this solves the second problem we highlighted with PSAs, that the groups become unbalanced as the algorithms learn. The result is less bias and genuinely comparable audiences.
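To make that ordering concrete, here’s a minimal sketch of ITT bucketing. The User fields are hypothetical targeting flags, and the sketch reuses the assign_group() helper from the PSA example above:

```python
from dataclasses import dataclass

@dataclass
class User:
    user_id: str
    in_segment: bool  # hypothetical flag: passed audience-segment targeting
    in_geo: bool      # hypothetical flag: passed geographic targeting

def itt_assign(user: User, salt: str = "itt-test") -> str | None:
    """Bucket a user only AFTER they qualify as a desired target."""
    if not (user.in_segment and user.in_geo):
        return None  # unqualified users never enter the experiment at all
    # Reuses the deterministic assign_group() from the earlier sketch,
    # so the same user always lands in the same group.
    return assign_group(user.user_id, salt)

# Only qualified users are ever compared:
print(itt_assign(User("user-123", in_segment=True, in_geo=True)))   # treatment or control
print(itt_assign(User("user-456", in_segment=False, in_geo=True)))  # None
```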

The key benefit of ITT is that it compares like-for-like audiences, which makes the test unbiased. However, because the framework doesn’t guarantee that a treatment-group user actually receives a bid or sees an ad, it doesn’t remove the noise of internal and external auction behavior, which causes fluctuations in lift results.

Enter ghost bids

Recently, a more sophisticated solution known as ghost bid testing has made its way onto the scene (believe it or not, there is an A/B testing “scene”). Ghost bids strive to remove noise of any kind and ensure that the results are trustworthy, unbiased, and reliable, which means actionable insights and less wasted ad spend.

In many ways, the implementation of ghost bids is similar to the intent-to-treat framework, but ghost bids takes it a step further by going all the way to the auction. It works like this: your machine learning system determines that a user on nytimes.com is of high value. It determines the bid value, the creative to serve, and so on, and right before actually issuing the bid, it looks up whether the user is in the treatment or the control group. If the user is in the treatment group, it delivers the bid. If the user is in the control group, it logs the bid value but withholds the bid.
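As a sketch, that last-moment decision looks roughly like this. The decide_bid() and submit_bid() functions are illustrative stand-ins for vendor-specific systems (not any real API), and assign_group() is the deterministic helper from the first sketch:

```python
def decide_bid(user_id: str, request: dict) -> dict | None:
    """Stand-in for the real ML pipeline: value model, pricing, creative."""
    return {"price_cpm": 2.50, "creative": "brand_ad"}  # illustrative stub

def submit_bid(request: dict, bid: dict) -> None:
    """Stand-in for actually entering the exchange auction."""
    print(f"bidding ${bid['price_cpm']} CPM")

ghost_bid_log = []  # in production this would be a real logging system

def handle_bid_request(user_id: str, request: dict, salt: str = "ghost-test") -> None:
    """Run the full pipeline for everyone; split at the last possible moment."""
    bid = decide_bid(user_id, request)
    if bid is None:
        return  # the model chose not to bid; neither group is affected

    if assign_group(user_id, salt) == "treatment":
        submit_bid(request, bid)              # enters the real auction
    else:
        ghost_bid_log.append((user_id, bid))  # record what WOULD have happened
        # ...and withhold the bid; the user may now see a competitor's ad
```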

This means that with ghost bids, the auction, pricing, and optimization of an ad all happen before the split. Because the split is conducted so late, you remove most of the remaining noise: your two groups stay randomized and balanced, and because you’re withholding the bid rather than serving a different ad, you more accurately mimic the experience of an unreached user.
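Once those ghost-bid logs exist, lift is just a comparison of conversion rates between the delivered and withheld groups. A toy calculation, with purely illustrative numbers:

```python
def incremental_lift(treat_conv: int, treat_users: int,
                     ctrl_conv: int, ctrl_users: int) -> float:
    """Relative lift: (treatment rate - control rate) / control rate."""
    treat_rate = treat_conv / treat_users
    ctrl_rate = ctrl_conv / ctrl_users
    return (treat_rate - ctrl_rate) / ctrl_rate

# Purely illustrative numbers: 1.2% vs 1.0% conversion -> 20% relative lift.
print(f"{incremental_lift(1200, 100_000, 1_000, 100_000):.0%}")
```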

Best of all, you don’t have to pay for impressions as you would with a PSA test, which means you can afford to run this test continuously in the background and constantly monitor how different tactics and strategies are impacting lift and incrementality.

If a vendor is putting A/B test results in front of you to justify an increase in spend, ask them what framework they’re using. If it’s PSA testing, demand that they run a more sophisticated test with ghost bids. If they won’t, find a vendor that will, because as a marketer you can’t afford not to know the truth.

For a deeper dive into A/B testing, download the Incrementality Quick Guide – your resource to understanding which marketing efforts are the most effective at moving key business metrics, such as revenue growth and profitability.