What is incrementality testing? The only honest measure of ad performance
Incrementality testing is the discipline of measuring what your ads actually caused vs what would have happened anyway. It's the answer to 'is Meta lying to me about ROAS?' (often, yes). This primer explains what incrementality is, the six components of a working test, and the amateur-vs-elite gap between dashboard-watching and lift-measuring.
Incrementality testing measures lift caused by your ads - not conversions that happened during your ads
Platform-attributed ROAS tells you how many conversions Meta or TikTok credited to your ads. It does NOT tell you how many of those conversions would have happened without your ads. The gap between the two is the difference between attributed performance and incremental performance - and that gap can be huge.
The honest answer to 'are my ads working?' isn't on the dashboard. It's in a controlled experiment: show ads to one group, withhold ads from a comparable group, measure the difference. That's incrementality testing. The math is simple; the discipline of running it well is rare.
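At its core the readout is two conversion rates and a subtraction. A minimal sketch with hypothetical numbers (no real account data):

```python
# Hypothetical test/control readout -- illustrative numbers only.
test_conversions = 1_240      # group that saw ads
test_users = 100_000
control_conversions = 1_050   # comparable group, ads withheld
control_users = 100_000

test_cvr = test_conversions / test_users
control_cvr = control_conversions / control_users

# Incremental lift: conversions caused by the ads, above the no-ads baseline
absolute_lift = test_cvr - control_cvr
relative_lift = absolute_lift / control_cvr

print(f"absolute lift: {absolute_lift:.4%}")   # extra conversions per user
print(f"relative lift: {relative_lift:.1%}")   # % above the control baseline
```

The subtraction is trivial; the hard part is making the control group genuinely comparable, which is what the rest of this primer is about.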
Common Thread Collective's published research found that platform attribution captures only ~86% of actual new customer acquisition at best, and the standard 1-day click window captures only ~47%. Operators making decisions on the dashboard alone are systematically over-funding the formats that capture last clicks and under-funding the formats that drive upstream lift.
Common misidentifications
It's not this. It's that.
The most common confusions, lined up side by side.
Not this
Incrementality = A/B testing creative
This
Incrementality = testing whether the channel/campaign drives lift vs no-ads control
Not this
ROAS measures incrementality
This
ROAS measures attributed conversions during ad exposure - not lift caused by the ads
Not this
Incrementality is for big budgets only
This
Incrementality is for any budget where 20%+ can be held out for 1-2 weeks - usually means $50K+/mo accounts
Not this
One incrementality test is enough
This
Incrementality needs quarterly cadence - channels and creatives drift, last quarter's lift number expires
Anatomy
The 6 components of a working incrementality test
Incrementality tests fail in predictable ways. The six components below have to be in place or the test won't produce a defensible number.
Why it matters
Without a true control, you can't distinguish 'caused by ads' from 'would have happened anyway'.
Concrete example
Meta GeoLift test: ads run in 60% of US metros, paused in the matched 40%. Measure the conversion difference over 14-21 days.
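Flattened to its arithmetic, the geo-holdout readout looks like the sketch below. This is a naive difference-in-averages, not GeoLift's actual synthetic-control estimator, and it assumes the control metros were genuinely matched on baseline volume; the metro counts are hypothetical.

```python
# Avg daily conversions per metro -- hypothetical, for illustration only.
test_metros = {"NYC": 420, "LA": 385, "CHI": 290}   # ads running
control_metros = {"DAL": 310, "PHX": 265}           # matched metros, ads paused

# Compare per-metro averages so the 60/40 split doesn't bias the estimate
test_avg = sum(test_metros.values()) / len(test_metros)
control_avg = sum(control_metros.values()) / len(control_metros)

lift_per_metro = test_avg - control_avg
relative_lift = lift_per_metro / control_avg
print(f"avg lift per metro per day: {lift_per_metro:.1f} ({relative_lift:.1%})")
```

Real GeoLift tooling replaces the raw control average with a synthetic control built from pre-period data, which is what makes the 14-21 day window defensible.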
The gap
The 8 differences between amateur and elite incrementality practice
Most operators talk about incrementality. Few run it well. The gaps below are what separate the practice from the rhetoric.
Pitfalls
The most common mistakes
Each one alone is recoverable. Several stacked together break the practice.
Confusing A/B creative testing with incrementality
A/B creative tests measure which creative wins; incrementality tests measure whether the channel/campaign drives lift. Different jobs, different methods. Most operators conflate them.
Running under-powered tests
A 7-day test on $30K/day spend usually can't detect a 10% lift. The sample math has to come first; the test schedule has to fit the math.
Moving the threshold after seeing the result
Pre-register the threshold before launch. Post-hoc threshold-changing is the #1 way incrementality tests 'prove' channels that didn't actually drive lift.
Ignoring the lift number once measured
Most operators run incrementality, get a result, and don't change their budget allocation. The discipline is letting the number rewrite the math. Otherwise the test is decoration.
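The under-powered-test pitfall above comes down to arithmetic you can run before launch. A sketch of the per-arm sample-size math for a two-proportion z-test (the same math behind any standard power calculator; the inputs are illustrative):

```python
from math import sqrt
from statistics import NormalDist

def users_per_side(baseline_cvr: float, mde_rel: float,
                   alpha: float = 0.05, power: float = 0.80) -> int:
    """Users needed per arm for a two-proportion z-test.
    baseline_cvr: control conversion rate; mde_rel: relative lift to detect."""
    p1 = baseline_cvr
    p2 = baseline_cvr * (1 + mde_rel)
    p_bar = (p1 + p2) / 2
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance
    z_b = NormalDist().inv_cdf(power)
    n = ((z_a * sqrt(2 * p_bar * (1 - p_bar))
          + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / (p1 - p2) ** 2
    return int(n) + 1

# Detecting a 10% relative lift on a 2% baseline CVR takes roughly
# 80K users per side -- run this before setting the test schedule.
print(users_per_side(0.02, 0.10))
```

If your traffic can't fill that sample inside the planned window, lengthen the test or raise the minimum detectable effect; don't just run it anyway.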
Glossary
Related terms you should know
The vocabulary that surrounds this concept. Bookmark this section.
Incrementality
The conversions caused by an ad/channel/campaign, measured against a no-ad control.
Lift
The difference in outcomes between test and control groups, attributable to the intervention.
Geo-holdout / GeoLift
Incrementality test design where geographic regions are split into test (sees ads) and control (doesn't). Meta GeoLift is the standard tooling.
Conversion Lift
Meta's first-party incrementality study product - randomized within-audience holdout.
MMM (Marketing Mix Modeling)
Statistical model that decomposes total sales into contributions from channels + non-channel factors. Recast and Haus build MMM tools.
MTA (Multi-Touch Attribution)
Attribution methodology that credits multiple touchpoints in a conversion journey. Northbeam, Triple Whale, Hyros build MTA.
Attribution gap
The difference between platform-reported conversions and incrementally-measured conversions. Often 30-50%.
Lift multiplier
The ratio of incremental conversions to platform-attributed conversions. Used to adjust attributed ROAS into incremental ROAS.
Pre-registration
Writing down the hypothesis, threshold, and timeframe before the test starts. The discipline that prevents post-hoc rationalization.
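The lift-multiplier adjustment defined above is one line of arithmetic. A hypothetical example (numbers invented for illustration):

```python
# Hypothetical readout: adjust attributed ROAS with a measured lift multiplier.
attributed_conversions = 1_000      # what the platform dashboard claims
incremental_conversions = 620       # what the holdout test measured
lift_multiplier = incremental_conversions / attributed_conversions  # 0.62

attributed_roas = 3.5
incremental_roas = attributed_roas * lift_multiplier
print(f"incremental ROAS: {incremental_roas:.2f}")  # 2.17
```

This is the number that should rewrite the budget math; a 3.5 dashboard ROAS that is really a 2.17 incremental ROAS changes which channels deserve the next dollar.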
Foundational knowledge in. 25 variants out.
Once you understand the discipline at this level, the bottleneck moves to production. Shuttergen turns one validated concept - anchored to your starting image - into 25 brand-safe variants you can test. The strategist stays in the loop; the production grind goes away.
Try Shuttergen free
Related Shuttergen reading
Where to go next
The connected pages that compound on this one.
Primer · Attribution
What is attribution? The mostly-broken science of crediting conversions to ads
Foundational primer on attribution - 6 models (last-click, first-click, linear, position-based, MTA, MMM), the amateur-vs-elite gap, and why dashboard ROAS is rarely the truth.
Read
Playbook · Testing
How to run creative tests that actually move the business
6-step framework - isolated variables, power calculations, pre-registered thresholds, geo-holdouts, documentation discipline.
Read
Calculator · Sample size
Creative test sample size calculator: how big does your test need to be?
Interactive power-analysis calculator for creative testing. Real two-proportion z-test math. Inputs: baseline CVR, MDE, alpha, power, CPC, traffic. Outputs: conversions per side, spend estimate, days to complete.
Read
Research · Static vs video
Static vs video ads: which converts better, and when?
Honest interactive research with operator citations from Plofker, Shackelford, Hott, Pilothouse, Common Thread, Foxwell, Triple Whale, Recast. The 6 variables that decide + the advertorial-alignment caveat.
Read
Sources
What we read to build this
How to run creative tests (Shuttergen playbook)
Shuttergen
Creative test sample size calculator (Shuttergen)
Shuttergen
Static vs video ads research (Shuttergen)
Shuttergen
Triple Whale - Incrementality
Triple Whale KB
Haus - Meta Incrementality Testing
Haus
CTC × Northbeam - Enterprise Attribution for DTC
Common Thread Collective
Foundational knowledge. Now ship the variants.
Shuttergen turns understanding into output - one validated concept into 25 brand-safe variants in hours, not weeks.
Start free