Playbook · Creative testing · 11 min read

How to run creative tests that actually move the business

Most creative tests don't have enough power to detect the lift they're hoping for - and the team calls a winner on noise. This is the 6-step framework that gets you statistically real answers without burning $50K on inconclusive runs.

Start here

The short read before the steps

A creative test is just an experiment: ship two ads, see which one performs better. The catch: 'better' requires enough conversions on each side to rule out noise. Most teams call winners on 30-50 conversions - not enough.

The fix is to calculate the sample size before the test starts, run for at least one full purchase cycle, and resist the urge to stop early. The discipline pays for itself the first time you correctly call a non-winner.
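To make that concrete, here's a minimal sketch of the pre-test sample-size math, using only the Python standard library. The 2% baseline CVR, 20% relative lift, and 80% power / 5% alpha settings are illustrative assumptions - plug in your own numbers.

```python
import math
from statistics import NormalDist

def visitors_per_arm(base_cvr: float, rel_lift: float,
                     alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed per variant to detect a relative CVR lift
    with a two-sided two-proportion z-test."""
    p1 = base_cvr
    p2 = base_cvr * (1 + rel_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for 80% power
    n = ((z_alpha + z_beta) ** 2
         * (p1 * (1 - p1) + p2 * (1 - p2))) / (p1 - p2) ** 2
    return math.ceil(n)

# 2% baseline CVR, hoping to detect a 20% relative lift:
n = visitors_per_arm(0.02, 0.20)
print(n, "visitors per arm")                            # ~21,000
print(round(n * 0.02), "expected conversions per arm")  # ~420, not 30-50
```

Run the numbers before launch. If the required spend is out of budget, test a bigger creative swing (a larger detectable lift needs far fewer visitors) rather than quietly under-powering the same test.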

The 6 steps

Walk through these in order

Each step breaks down into a how, a why it matters, and a watch-out.

Change one variable per test

How

If the challenger differs from the control on two axes (e.g., new hook AND new format), you can't attribute the lift to either one. Change a single variable and lock everything else.

Why it matters

Confounded tests produce false confidence. The whole point of testing is causal inference - confounds break it.

Watch out

'Just try a bunch of variants and see what wins' is the most common anti-pattern. It's marketing, not testing.
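Here's the back-of-envelope math on why, assuming eight variants each compared to control at p < 0.05 (the numbers are illustrative):

```python
# Family-wise error: test enough variants and a 'winner' is nearly guaranteed.
alpha, k = 0.05, 8                    # 8 variants vs. control, assumed
fwer = 1 - (1 - alpha) ** k           # chance of at least one false winner
sidak = 1 - (1 - alpha) ** (1 / k)    # per-variant threshold restoring 5% overall
print(f"False-winner risk across {k} variants: {fwer:.0%}")   # ~34%
print(f"Sidak-corrected per-variant threshold: {sidak:.4f}")  # ~0.0064
```

If you must run many variants at once, shrink the per-variant threshold (or use a proper multiple-comparisons correction) instead of celebrating the first p < 0.05.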

Your checklist

Walk through this before you ship the next test

[ ] One variable changed between control and challenger; everything else locked
[ ] Sample size calculated before launch, not after
[ ] Test scheduled for at least one full purchase cycle
[ ] End date fixed in advance
[ ] No daily winner-calling (or a corrected threshold if you must peek)
[ ] Prospecting and retargeting tested separately
[ ] Winner sanity-checked beyond in-platform attribution

Pitfalls

The 4 mistakes that kill the most tests

Each one alone wastes a quarter; stacked, they waste a year.

Pitfall 1

Calling winners on under-powered tests

30 conversions per side will let you 'see' a winner that won't replicate. The variance is too high. If you can't get statistical power, the test isn't a test - it's a vibe check.
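A quick Monte Carlo sketch makes the point: two identical ads (same true 2% CVR), each shown to 1,500 people so that roughly 30 conversions land on each side. Every number here is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
cvr, n, trials = 0.02, 1_500, 100_000   # ~30 expected conversions per side
a = rng.binomial(n, cvr, trials)        # conversions for ad A
b = rng.binomial(n, cvr, trials)        # conversions for identical ad B
lift = (b - a) / np.maximum(a, 1)       # observed relative lift of B over A
print(f"P(|observed lift| > 20%) with zero true lift: "
      f"{np.mean(np.abs(lift) > 0.20):.0%}")   # roughly 40%
```

With zero true difference, roughly four runs in ten still show a 'lift' bigger than 20% in one direction or the other. That's the winner that won't replicate.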

Pitfall 2

Stopping tests early because results look exciting

Early peeking inflates false positives. The math assumes you check once at the end, not every morning. If you must check daily, tighten your p-value threshold accordingly (an alpha-spending or Pocock-style correction).
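A simulation shows how bad the inflation gets. This sketch assumes an A/A test (no real difference) with placeholder traffic, checked with a two-proportion z-test after each of 14 'days':

```python
import numpy as np

rng = np.random.default_rng(1)
days, n_day, cvr, trials = 14, 500, 0.02, 20_000
a = rng.binomial(n_day, cvr, (trials, days)).cumsum(axis=1)  # cumulative conversions, side A
b = rng.binomial(n_day, cvr, (trials, days)).cumsum(axis=1)  # side B (identical ad)
n = n_day * np.arange(1, days + 1)       # cumulative visitors per side
p = (a + b) / (2 * n)                    # pooled conversion rate
se = np.sqrt(2 * p * (1 - p) / n)        # standard error of the rate difference
z = (b - a) / n / np.where(se > 0, se, np.inf)
print(f"False positives, checked daily: {np.mean((np.abs(z) > 1.96).any(axis=1)):.0%}")  # well above 5%
print(f"False positives, checked once:  {np.mean(np.abs(z[:, -1]) > 1.96):.0%}")         # ~5%
```

Fourteen peeks at p < 0.05 can push the false-positive rate to three or four times the nominal 5%. Pre-register one end date, or use a sequential-testing method that budgets for the looks.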

Pitfall 3

Mixing prospecting and retargeting in one test

Cold and warm audiences respond to creative differently. A static that wins cold often loses in retargeting. Split-test by audience, not just by creative.

Pitfall 4

Forgetting that platform attribution lies

Last-click flatters statics, while view-through over-credits video earlier in the funnel. Tests run only on in-platform metrics will systematically pick the wrong winner for incremental growth.
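A minimal sketch of the fix: run a holdout (a comparable audience that never sees the ad) and score creative on incremental lift rather than platform-reported conversions. The rates below are hypothetical:

```python
def incremental_lift(cvr_exposed: float, cvr_holdout: float) -> float:
    """Relative lift caused by the ad, net of conversions that would have
    happened anyway - the ones last-click happily claims credit for."""
    return (cvr_exposed - cvr_holdout) / cvr_holdout

# Exposed audience converts at 2.4%, holdout at 2.1% (assumed numbers):
print(f"{incremental_lift(0.024, 0.021):.0%} true incremental lift")  # ~14%
```

Two creatives can post identical platform ROAS and very different incremental lift; the holdout is what tells them apart.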

Where Shuttergen fits

Tests run. Now ship the variants.

Shuttergen turns one validated winner into 25 brand-safe variants - the production half of the testing loop.

Start free
