When Incrementality Testing Is a Waste of Money (And the 4 Cheaper Measurement Layers to Use Instead)

Nobody buys a tile saw to hang one picture frame.
You'd own a hammer first. Maybe a drill once you've got a few jobs on. The tile saw comes much later, when you're actually re-doing a bathroom and the cheaper tools genuinely can't do the work. Buy it on day one and it just sits in the shed, expensive and unused, while you tell yourself you're the kind of person who owns a tile saw.
Measurement is the same. And right now the tile saw everyone wants to buy on day one is incrementality testing.
I get why. Every second post in the founder group chats is about holdouts and geo-lift and whether a channel is "truly incremental". It sounds rigorous. It sounds like what the grown-ups do. So a brand doing a few hundred grand a year starts shopping for the heavy tooling before they've worn out the hammer.
Here's my take. For most brands under ~$5M in annual revenue, a proper incrementality test is the wrong tool at the wrong time, and the maths is the reason why. A geo holdout works by hiding your ads from a slice of the country and measuring the gap. If you're spending ~$12k a month, you simply don't generate enough conversions in any one region for that gap to mean anything. The test comes back noisy, you read a story into the noise, and you make a real decision off a fake signal. That's not rigour. That's an expensive coin toss.
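To see why, here's a rough power calculation. It's a sketch under loud assumptions: each region's conversion count is treated as Poisson (pure counting noise, which is the best case, since real regions also drift week to week for reasons unrelated to your ads), the holdout and control regions are the same size, and every number is hypothetical. For scale: at an assumed $40 per purchase, ~$12k a month buys ~300 purchases, so a region holding a tenth of the country sees about 30.

```python
import math

# Z-scores for two-sided 95% confidence and 80% power (common defaults, assumed here).
Z_ALPHA = 1.96
Z_POWER = 0.84


def min_detectable_lift(conversions_per_arm: float) -> float:
    """Smallest relative gap between holdout and control regions that a test
    could reliably tell apart from noise.

    Assumes each region's conversion count is Poisson with the given mean, so the
    variance of (control - holdout) is about 2 * mean and the standard error of
    the relative gap is about sqrt(2 / mean). Real regional noise is larger, so
    treat the result as a best case.
    """
    return (Z_ALPHA + Z_POWER) * math.sqrt(2 / conversions_per_arm)


# Hypothetical monthly conversions landing in each test region.
for per_arm in (30, 300, 3000):
    lift = min_detectable_lift(per_arm)
    print(f"{per_arm:>5} conversions per region -> gap must be ~{lift:.0%} to register")
```

At 30 conversions per region, the gap would need to be around 72% before the test could tell it from noise, which means the ads would have to be driving most of that region's sales. That's the coin toss in numbers.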
So before you buy the tile saw, here are the four cheaper measurement layers I'd actually reach for, cheapest first, each mapped to roughly where a brand sits.
1. Ad-level benchmarks (from your first dollar)
The cheapest measurement layer is the one already sitting in your ads manager, and most brands skip straight past it.
At the early stage, you're running conversion campaigns on Meta and you can see click-through rate, cost per click, cost per add-to-cart and a one-day-click ROAS for every ad. None of it is a perfect read on the truth. But it's free, it updates daily, and it tells you which creative is pulling and which is dead weight.
To put that in context: if one ad's cost per click is 40% lower than the rest of the account's, that's a real signal you can act on this afternoon. You don't need a holdout to tell you to put more behind it. You need to read the numbers you're already paying to generate.
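If you'd rather not eyeball it, the comparison is easy to script against an ads manager export. A minimal sketch: the ad names, spend and click figures are hypothetical, and "better" is read as a cost per click at least 40% below the rest of the account pooled together.

```python
from dataclasses import dataclass


@dataclass
class AdStats:
    name: str
    spend: float
    clicks: int

    @property
    def cpc(self) -> float:
        return self.spend / self.clicks


def standout_ads(ads: list[AdStats], threshold: float = 0.40) -> list[str]:
    """Names of ads whose CPC is at least `threshold` below the pooled CPC of every other ad.

    Assumes at least two ads, each with at least one click.
    """
    flagged = []
    for ad in ads:
        rest = [other for other in ads if other is not ad]
        rest_cpc = sum(o.spend for o in rest) / sum(o.clicks for o in rest)
        if ad.cpc <= rest_cpc * (1 - threshold):
            flagged.append(ad.name)
    return flagged


# Hypothetical export: one ad pulling far cheaper clicks than the others.
ads = [
    AdStats("ugc_hook_v2", spend=500.0, clicks=500),
    AdStats("studio_flatlay", spend=600.0, clicks=300),
    AdStats("founder_story", spend=400.0, clicks=250),
]
print(standout_ads(ads))  # ['ugc_hook_v2']
```

Pooling the rest of the account, rather than averaging each ad's CPC, stops a tiny ad with a handful of clicks from dragging the benchmark around.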
This is the hammer. It's not glamorous and it won't answer every question, but at this stage it answers the only question that matters: which of my ads is working hard enough to feed more budget into?
2. Blended MER (once you're past one channel)
The moment you're spending across more than one place, ad-level numbers start lying to you by omission, because every platform takes credit for the same sale.
This is where blended MER (marketing efficiency ratio) earns its keep: total revenue divided by total ad spend, the whole business on one line. It doesn't care what Meta's pixel claims or what the Google rep tells you. If you spent more this month and your blended number held or climbed, the spend is probably pulling its weight. If you poured in 30% more and MER sagged, something you added isn't incremental, even if every platform dashboard is lit up green.
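The month-over-month read is simple arithmetic. A sketch with hypothetical revenue and spend figures; the 2% tolerance band for "held" is my own assumption, there to stop normal wobble from reading as a sag.

```python
def blended_mer(total_revenue: float, total_ad_spend: float) -> float:
    """Marketing efficiency ratio: all revenue over all ad spend, every channel pooled."""
    return total_revenue / total_ad_spend


def read_scaling_month(prev_revenue: float, prev_spend: float,
                       cur_revenue: float, cur_spend: float,
                       tolerance: float = 0.02) -> str:
    """Rough verdict on whether extra spend this month earned its keep."""
    prev_mer = blended_mer(prev_revenue, prev_spend)
    cur_mer = blended_mer(cur_revenue, cur_spend)
    if cur_spend <= prev_spend:
        return "spend didn't rise, so this isn't a scaling read"
    if cur_mer >= prev_mer * (1 - tolerance):
        return "MER held or climbed: the extra spend is probably pulling its weight"
    return "MER sagged: something you added probably isn't incremental"


# Hypothetical: spend up 30% (30k -> 39k) in both scenarios.
print(read_scaling_month(120_000, 30_000, 156_000, 39_000))  # MER 4.0 -> 4.0
print(read_scaling_month(120_000, 30_000, 132_000, 39_000))  # MER 4.0 -> ~3.4
```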
I love this layer because it's almost impossible to fool yourself with it. To put it in perspective, you can have Meta reporting a 3x and Google reporting a 4x and still be going backwards, because both are claiming the same customer. Your bank account can't double-count. Blended MER is the closest cheap proxy you've got to the bank account.
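Here's the double-count as arithmetic. The spend split and the revenue the store actually booked are hypothetical; the platform ROAS figures are the 3x and 4x from the example.

```python
# Hypothetical month: spend per channel and what each platform reports as ROAS.
spend = {"meta": 20_000.0, "google": 10_000.0}
platform_roas = {"meta": 3.0, "google": 4.0}

# Revenue each platform claims it drove.
claimed = {channel: spend[channel] * platform_roas[channel] for channel in spend}
total_claimed = sum(claimed.values())  # 60k + 40k = 100k

# Revenue the store actually booked (hypothetical).
actual_revenue = 75_000.0

blended = actual_revenue / sum(spend.values())
overclaim = total_claimed - actual_revenue

print(f"platforms claim ${total_claimed:,.0f}; the store booked ${actual_revenue:,.0f}")
print(f"blended MER {blended:.1f}x; claimed credit exceeds real revenue by ${overclaim:,.0f}")
```

Both dashboards look healthy; the blended 2.5x is the only number the bank account agrees with.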
For a brand somewhere in the low millions running two or three channels, watching MER move as you change spend is most of the measurement you need. It's a snorkel, not scuba gear, and the water isn't deep enough yet to need more.
3. New-visitor and demographic comparison (when you're testing what's actually new)
Here's the question blended MER can't answer on its own: when an ad performs, is it bringing you new people or just mopping up sales you'd have got anyway?
This is the one I'd reach for the second you start worrying about cannibalisation, and it's still free. Pull a new-but-performing ad and look at its percentage of new visitors. Then do the age-and-gender breakdown for that ad, and put it side by side with your established scaled ads. If the new ad is reaching a genuinely different demographic with a high new-visitor rate, you're almost certainly reaching new people. And if you're reaching new people, you're probably driving incremental orders.
It's a soft signal, not a courtroom verdict. But it'll get you most of the way there for nothing. I've seen brands convince themselves a new design was incremental, then check the demos and find it was the exact same buyer they already owned, just in a different shirt. The breakdown would have told them in ten minutes.
Any brand running Meta can pull this report. A $3M brand can do it just as easily as a $30M one, which is exactly why I'd exhaust it before paying for anything fancier.
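Once you've exported the breakdowns, the side-by-side can be boiled down to two numbers. A sketch under assumptions: the counts are hypothetical purchases by age-and-gender bucket, the new-visitor figures come from wherever you pull them, and the audience comparison uses total variation distance (half the summed gap between the two mixes), which is my choice of yardstick rather than anything Meta reports. The 0.3 cut-off is an arbitrary illustration too.

```python
def mix(counts: dict[str, int]) -> dict[str, float]:
    """Turn raw counts per age-gender bucket into shares that sum to 1."""
    total = sum(counts.values())
    return {bucket: n / total for bucket, n in counts.items()}


def audience_distance(a_counts: dict[str, int], b_counts: dict[str, int]) -> float:
    """Total variation distance between two demographic mixes.

    0.0 means identical mixes, 1.0 means no overlap at all.
    """
    a, b = mix(a_counts), mix(b_counts)
    buckets = set(a) | set(b)
    return 0.5 * sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in buckets)


# Hypothetical purchases by bucket: the new ad vs the established scaled ads.
new_ad = {"F 25-34": 20, "F 35-44": 30, "M 25-34": 10, "M 45-54": 40}
scaled_ads = {"F 25-34": 60, "F 35-44": 30, "M 25-34": 10}

# Hypothetical new-visitor rates (new visitors / all visitors from each set of ads).
new_ad_new_visitor_rate = 640 / 800
scaled_new_visitor_rate = 900 / 2000

distance = audience_distance(new_ad, scaled_ads)
looks_new = distance > 0.3 and new_ad_new_visitor_rate > scaled_new_visitor_rate
print(f"audience distance {distance:.2f}, new-visitor rate "
      f"{new_ad_new_visitor_rate:.0%} vs {scaled_new_visitor_rate:.0%} -> "
      f"{'probably new people' if looks_new else 'probably the same buyer'}")
```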
4. A single geo-lift test (once one channel is genuinely big)
Only now do we get to the tile saw. And even here, I'd run one test, not twenty.
A single geo-lift makes sense when you've got one channel scaled to the point where hiding it from a slice of the country still produces enough conversions to read cleanly, and where the spend is big enough that getting the call wrong actually costs you. That's usually a brand well into the millions, leaning hard into a second or third channel, asking a specific question: is this view-heavy channel I can't see in the pixel actually doing anything?
That's the right moment. You've added YouTube, the one-day-click data looks ugly because nobody clicks a YouTube ad and buys on the spot, but you suspect it's working up top and you want proof before you commit real money. A holdout answers exactly that.
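Reading a holdout like that comes down to one comparison: what the held-out regions did against what they should have done, judging by the regions that kept seeing the ads. The sketch below is a simplified difference-in-differences on hypothetical conversion counts and a hypothetical withheld-spend figure. Proper geo-lift tooling builds a far more careful counterfactual than a single pre-period ratio, so treat this as a way to see the logic, not a method to bet real money on.

```python
def geo_lift(holdout_pre: float, control_pre: float,
             holdout_test: float, control_test: float) -> tuple[float, float]:
    """Estimate conversions lost in the holdout regions while ads were off.

    Scales the control regions' test-period conversions by the pre-period ratio
    between the two groups to get the holdout's expected conversions, then
    compares that with what the holdout actually did.
    Returns (incremental_conversions, lift_as_share_of_expected).
    """
    expected = control_test * (holdout_pre / control_pre)
    incremental = expected - holdout_test
    return incremental, incremental / expected


# Hypothetical YouTube holdout: pre-period and test-period conversion counts.
incremental, lift = geo_lift(holdout_pre=1_000, control_pre=4_000,
                             holdout_test=990, control_test=4_400)
withheld_spend = 5_500.0  # hypothetical YouTube budget that would have run in the holdout
print(f"{incremental:.0f} incremental conversions ({lift:.0%} of expected), "
      f"${withheld_spend / incremental:,.0f} per incremental conversion")
```

The last figure is the one that settles the commit-real-money question: it's what YouTube actually costs per sale it creates, which the one-day-click data can't show you.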
The honest framing, though, is that incrementality is a hygiene check on channels you already spend on, not a hunt for hidden treasure. It won't uncover some buried pile of cheap growth. It tells you whether the money you're already spending is pulling weight. That's valuable at scale. It's a waste at $12k a month, because the test can't produce a read clean enough to justify what you'll pay for it and the revenue you'll hold back to run it.
The other way to burn money here: over-testing what's already settled
The flip side of testing too early is just as expensive, and I see it more in the brands who do get to scale: they never stop testing things they've already settled.
Once you've calibrated a channel and you know roughly what it does, re-running a holdout on it every few weeks isn't discipline. It's a tax. Every geo test means hiding your ads from a slice of the country, so you're leaving 5 to 10% of that channel's revenue on the table for the length of the test. Run that on a channel you already understand and you've paid full price to confirm something you already knew.
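The tax is easy to put a number on. A rough sketch with hypothetical figures: it treats all of the channel's revenue in the held-out slice as lost, which is the worst case (you really only lose the incremental part), and it ignores the media spend you save by not running there.

```python
def holdout_cost(weekly_channel_revenue: float, holdout_share: float, weeks: float) -> float:
    """Worst-case revenue forgone by hiding a channel from `holdout_share` of the country."""
    return weekly_channel_revenue * holdout_share * weeks


# Hypothetical: a channel booking $50k a week, 10% of the country held out, 4-week test.
per_test = holdout_cost(50_000, 0.10, 4)
# Re-running that same test six times a year on a channel you already understand.
per_year = per_test * 6
print(f"${per_test:,.0f} per test, ${per_year:,.0f} a year")
```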
A few traps worth naming, because they're common:
- Testing a channel you barely spend on. If a channel is a sliver of your budget, the test takes forever to reach significance and the answer changes nothing. Spend your testing budget where the money actually is.
- Re-testing a settled winner out of FOMO. You don't need to prove Meta works for the fifth time because everyone online is talking about holdouts this month. Test when something genuinely changes: a new channel, a big tactical shift, a real change in your revenue mix.
- Testing during chaos. If you've just changed your tracking setup, swapped your landing page and launched three new audiences, a holdout in the middle of that tells you nothing. You need a stable base or the read is rubbish. Settle the account first, then test.
- Holding out over peak. Running a geo test across your biggest sale of the year, where you deliberately hide the offer from 10% of the country, is just handing away revenue when it's most expensive to lose. Pick a calmer window.
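The "takes forever" point about sliver channels is arithmetic too. Inverting the standard sample-size approximation gives the conversions each region needs before an effect of a given size stands out from noise. A sketch under a best-case assumption of Poisson counting noise, 95% confidence and 80% power, with hypothetical weekly volumes; each effect size is the share of the held-out region's sales that channel is assumed to drive.

```python
import math

# Two-sided 95% confidence and 80% power (assumed defaults).
Z_SUM = 1.96 + 0.84


def weeks_to_read(effect: float, weekly_conversions_per_region: float) -> float:
    """Weeks a two-region holdout needs before an effect of this size is readable.

    Assumes Poisson counting noise only, so each region needs roughly
    2 * Z_SUM**2 / effect**2 conversions. Real regional noise pushes this higher.
    """
    needed = 2 * Z_SUM ** 2 / effect ** 2
    return needed / weekly_conversions_per_region


# Hypothetical: 200 conversions a week in each test region.
for channel, effect in (("big channel", 0.20), ("sliver channel", 0.03)):
    print(f"{channel}: ~{math.ceil(weeks_to_read(effect, 200))} weeks")
```

Same regions, same volume: a couple of weeks for the channel carrying real weight, well over a year for the sliver.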
The thread through all of it: every layer of measurement is a tool, and tools have a right depth of water. Use the cheap ones hard while they're still answering your questions, and only step up when the question genuinely outgrows them.
If you're staring at a measurement tool you're not sure you've earned yet, the cheapest move is to work out which rung you're actually standing on before you pay to climb the next one. That's most of what a Signal/Noise Audit does on the measurement side, by the way: it maps your spend to the cheapest read that would actually change a decision, rather than the priciest one you could buy. Which of these four are you already getting full value from, and which have you skipped on the way to wanting the expensive one?