How to run creative tests that actually move the business
Most creative tests don't have enough power to detect the lift they're hoping for - so the team calls a winner on noise. This is the 6-step framework that gets you statistically sound answers without burning $50K on inconclusive runs.
The short read before the steps
A creative test is just an experiment: ship two ads, see which one performs better. The catch: 'better' requires enough conversions on each side to rule out noise. Most teams call winners on 30-50 conversions - not enough.
The fix is to calculate the sample size before the test starts, run for at least one full purchase cycle, and resist the urge to stop early. The discipline pays for itself the first time you correctly call a non-winner.
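"Calculate the sample size before the test starts" can be done with the standard two-proportion power formula. A minimal sketch, with assumptions that are mine rather than the article's (2% baseline conversion rate, a 20% relative lift you want to detect, two-sided alpha of 0.05, 80% power):

```python
# Sample size per variant for a two-proportion test (normal approximation).
# Assumptions (illustrative, not from the article): 2% baseline conversion
# rate, 20% relative lift to detect, alpha=0.05 two-sided, 80% power.
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_variant(p_base, rel_lift, alpha=0.05, power=0.80):
    """Users needed per arm to detect p_base -> p_base*(1+rel_lift)."""
    p_var = p_base * (1 + rel_lift)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # critical value, two-sided
    z_b = NormalDist().inv_cdf(power)           # power term
    p_bar = (p_base + p_var) / 2                # pooled rate under H0
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p_base * (1 - p_base) + p_var * (1 - p_var))) ** 2
    return ceil(num / (p_base - p_var) ** 2)

n = sample_size_per_variant(0.02, 0.20)
print(n)  # ~21,000 users per side, i.e. hundreds of conversions - not 30
```

Run the numbers before launch: if the spend needed to reach that sample is more than the test is worth, pick a bigger swing (larger expected lift) or don't run the test.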
The 6 steps
Walk through these in order
Change one variable at a time
How
If the variants differ on two axes (e.g., new hook AND new format), you can't attribute any lift to either one. Lock everything else.
Why it matters
Confounded tests produce false confidence. The whole point of testing is causal inference - confounds break it.
Watch out
'Just try a bunch of variants and see what wins' is the most common anti-pattern. It's marketing, not testing.
Pitfalls
The 4 mistakes that kill the most teams
Each one alone wastes a quarter; stacked, they waste a year.
Calling winners on under-powered tests
30 conversions per side will let you 'see' a winner that won't replicate. The variance is too high. If you can't get statistical power, the test isn't a test - it's a vibe check.
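A quick simulation makes the "vibe check" point concrete. The setup below is illustrative and mine, not the article's: two identical ads with the same true 2% conversion rate, roughly 30 expected conversions per side, and we count how often pure noise produces an apparent 20%+ winner:

```python
# A/A simulation (assumptions mine): two identical ads, ~30 expected
# conversions each. How often does one look >=20% "better" by chance?
import random

random.seed(0)
TRUE_RATE = 0.02    # same true conversion rate for both ads
N = 1500            # users per side -> ~30 expected conversions
TRIALS = 2000

false_wins = 0
for _ in range(TRIALS):
    a = sum(random.random() < TRUE_RATE for _ in range(N))
    b = sum(random.random() < TRUE_RATE for _ in range(N))
    # Count trials where either identical ad shows a >=20% observed lift
    if b >= a * 1.2 or a >= b * 1.2:
        false_wins += 1

print(false_wins / TRIALS)  # a large share of pure-noise "winners"
```

At this sample size, roughly half the runs hand you a 20% "lift" that doesn't exist, which is exactly the winner-that-won't-replicate failure mode.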
Stopping tests early because results look exciting
Early peeking inflates false positives. The math assumes you check once at the end, not every morning. If you must check daily, tighten your significance threshold for each look (e.g., a Bonferroni-style or alpha-spending correction) rather than reusing 0.05 every time.
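The peeking inflation is easy to demonstrate. A sketch under assumptions of mine (an A/A test with identical 2% rates, 14 daily looks, a plain two-proportion z-test): compare how often any daily check crosses p < 0.05 against checking once at the end.

```python
# Peeking simulation (illustrative, assumptions mine): an A/A test checked
# every "day" vs. once at the end. Same data, very different error rates.
import random
from math import sqrt
from statistics import NormalDist

random.seed(1)
RATE, DAILY_N, DAYS, TRIALS = 0.02, 300, 14, 1000

def p_value(ca, cb, n):
    """Two-sided two-proportion z-test with pooled rate, equal arms of size n."""
    p = (ca + cb) / (2 * n)
    if p in (0, 1):
        return 1.0
    z = (ca - cb) / sqrt(2 * n * p * (1 - p))
    return 2 * (1 - NormalDist().cdf(abs(z)))

peek_hits = end_hits = 0
for _ in range(TRIALS):
    ca = cb = 0
    peeked = False
    for day in range(1, DAYS + 1):
        ca += sum(random.random() < RATE for _ in range(DAILY_N))
        cb += sum(random.random() < RATE for _ in range(DAILY_N))
        if p_value(ca, cb, day * DAILY_N) < 0.05:
            peeked = True      # would have "called it" on this morning's check
    peek_hits += peeked
    end_hits += p_value(ca, cb, DAYS * DAILY_N) < 0.05

print(peek_hits / TRIALS, end_hits / TRIALS)  # peeking rate >> end-only rate
```

Checking once holds the false-positive rate near the nominal 5%; stopping at the first exciting daily check multiplies it severalfold on the exact same data.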
Mixing prospecting and retargeting in one test
Cold and warm audiences respond to creative differently. A static that wins cold often loses in retargeting. Split-test by audience, not just by creative.
Forgetting that platform attribution lies
Last-click flatters statics; video gets over-credited for touches earlier in the funnel. A test judged only on in-platform metrics will systematically pick the wrong winner for incremental growth.
Tests run. Now ship the variants.
Shuttergen turns one validated winner into 25 brand-safe variants - the production half of the testing loop.
Start free