Your Incrementality Test Said 2x: Here's Why You Shouldn't Bet the Budget on It

The thing everyone repeats: an incrementality test is the gold standard, so when it comes back at 2x, that's the truth, and you should reset your targets to it.

Here's what actually happens. You get one read, off one geo-holdout, off one window of weather and promos and competitor noise, and you treat that single number like it's carved in granite. Then next quarter's read comes back at 1.3x and you feel like the floor's fallen out.

Both numbers are real. Neither is the truth. And betting your budget on the first one is one of the more expensive mistakes I see good operators make.

Let me explain why, because once it clicks you'll never read a test the same way again.

One test is one hot season, not a new baseline

I want to borrow a frame from baseball, because it's the cleanest way I know to think about this.

Last year a catcher hit 60 home runs. Most ever for that position, one of only a handful of 60-homer seasons in the history of the sport. A genuinely enormous number.

Now, what do the forecasting systems predict he'll hit next year? Not 60. They land somewhere around the high 30s to mid 40s. Every single model pulls him back, because his three prior seasons were 34, 30, 30. The 60 wasn't fake. It just wasn't his true level. It was an outlier, and the bigger the outlier, the harder the maths drags it back toward the mean.

Your incrementality read works exactly the same way. A single test that comes back at 2.1x is the 60-homer season. It's one data point sitting on top of a real underlying number you can't see directly. The honest move isn't to crown it. It's to ask: what's this channel's actual level, and how far should this one read move my belief about it?

Here's my take, plainly. The more extreme a single read looks, the less you should trust it on its own, and the more you should want to see it again before you spend against it.

What we actually do with an outlier read

So a test lands at 2.1x on a channel where everything you've ever seen sits closer to 1.2x. What now?

We don't reset the target to 2.1x. We barely move it.

The way we run it, every account starts with a benchmark. For Meta 7-day-click acquisition, a sensible starting factor is around 1.2x, meaning the genuinely incremental ROAS tends to land near 1.2 times whatever the platform reports. That's not a number we invented on a whim. It lines up with the aggregate of hundreds of tests the platforms themselves have published, and it lines up with what we see across our own client accounts. Two large, independent piles of data pointing at the same place. That agreement is what gives the benchmark its weight.

Now a fresh single test comes in. Here's how we weight it:

  • If the read sits close to the benchmark, that's high confidence. The new test agrees with everything we already knew, so we lean into it and act.
  • If the read is an extreme outlier, we treat it as a flag, not a verdict. A 2.1x read on a 1.2x channel doesn't move our working factor to 2.1x. It nudges it a touch and goes on the list to be re-run, because one strange read is far more likely to be noise than a real step-change.

The size of that nudge isn't only a statistics question, it's a business one. A brand with very high lifetime value can afford to lean into a hopeful read a bit harder, because the downside of slightly overspending on a good customer is smaller. A thin-margin brand should stay closer to the benchmark and make the test prove itself.
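For anyone who wants to see the shape of that weighting, here is a minimal sketch in Python. It is one possible way to do it, not a description of our exact method: it assumes a simple precision-weighted blend of the benchmark and the new read, and the spread values (0.15 on the benchmark, 0.4 on the test) and the re-run threshold of 2.0 are made-up illustrative numbers.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Belief:
    mean: float  # incrementality factor, e.g. 1.2 means 1.2x platform ROAS
    sd: float    # how uncertain we are about that factor


def update_with_read(prior: Belief, read: float, read_sd: float,
                     outlier_z: float = 2.0) -> tuple[Belief, bool]:
    """Blend a benchmark with one test read, weighting each by its precision.

    Returns the updated belief and a flag that is True when the read is
    surprising enough to re-run before committing budget.
    All thresholds and spreads here are illustrative assumptions.
    """
    prior_precision = 1.0 / prior.sd ** 2
    read_precision = 1.0 / read_sd ** 2
    total = prior_precision + read_precision

    # A tight benchmark plus a noisy read means the mean barely moves.
    new_mean = (prior.mean * prior_precision + read * read_precision) / total
    new_sd = sqrt(1.0 / total)

    # How far the read sits from the benchmark, in units of combined noise.
    z = abs(read - prior.mean) / sqrt(prior.sd ** 2 + read_sd ** 2)
    return Belief(new_mean, new_sd), z > outlier_z


benchmark = Belief(mean=1.2, sd=0.15)  # sd is an assumed value
updated, rerun = update_with_read(benchmark, read=2.1, read_sd=0.4)
print(f"working factor: {updated.mean:.2f}x, re-run: {rerun}")
# prints "working factor: 1.31x, re-run: True"
```

Under those assumed spreads, the 2.1x read moves the working factor from 1.2x to about 1.31x and gets flagged for a re-run. A high-LTV brand that wants to lean in harder could loosen the benchmark spread; a thin-margin brand could tighten it.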

The worst version of this, the one I genuinely want to talk people out of, is pivoting your entire budget onto a single hopeful number. I've watched a brand get one great read, scale hard into it, then get a 1.4x read eight weeks later and conclude the whole exercise is broken. The exercise wasn't broken. The expectation was. They wanted one test to be the answer, and one test is never the answer.

The few clean tests you get, spent on the right channel

Here's the part almost nobody plans for properly.

A clean geo-holdout takes time. Four to six weeks per test once you account for actually running it to completion. So even if you're disciplined and always have one in flight, you get maybe eight to twelve clean reads a year. That's it. That's your whole annual budget of trustworthy answers.
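The arithmetic behind that ceiling is short enough to sketch. The function name and the gap_weeks parameter are hypothetical, and the sketch assumes tests run strictly one after another.

```python
WEEKS_PER_YEAR = 52


def reads_per_year(test_weeks: int, gap_weeks: int = 0) -> int:
    """Whole tests that fit in a year when run back to back.

    gap_weeks is an assumed allowance for setup and readout between tests.
    """
    return WEEKS_PER_YEAR // (test_weeks + gap_weeks)


for test_weeks in (4, 6):
    print(f"{test_weeks}-week tests: {reads_per_year(test_weeks)} per year")
# prints "4-week tests: 13 per year" then "6-week tests: 8 per year"
```

With no slack at all, four-week tests cap out at 13 a year and six-week tests at 8. Any time lost between tests pulls the count down, which is how you end up in the eight-to-twelve range.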

And the list of things you could test is effectively endless. Channel versus channel. Acquisition versus retention. Brand versus non-brand. One bidding setup versus another. New channel activation. You will never test all of it. So the real skill isn't running tests, it's choosing which handful of questions are worth your few clean reads.

The instinct most people have is to spend a test on the shiny new thing. "Let's test whether TikTok is incremental." And look, you'll probably learn something. But here's the maths problem with that choice: you're likely only putting 5% or so of your budget into that channel. Even a brilliant result there barely moves your overall picture.

What I'd do instead, and what we do, is sort by where the money actually is and where the uncertainty is widest. Two questions stacked together:

  • How much spend is riding on this channel? A channel taking half your budget deserves clarity before a channel taking 5% does.
  • How fuzzy is your current read on it? Some channels have a tight, well-understood incrementality range. Others, the high-funnel ones especially, swing wildly from test to test. Wide uncertainty on a big spend is exactly where a clean read pays for itself.

Put those together and you usually land on the same answer: test your biggest, fuzziest channel first. The boring one taking 40% of spend that you've quietly been over-crediting for two years. That's where a single good read changes real decisions. The new channel can wait until you actually understand the thing you're already pouring money into.

To put it in money terms: clarifying a channel you spend ~$80k a month on is worth far more than clarifying one you spend ~$6k a month on, even if the small one is more fun to be curious about.
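One rough way to turn those two questions into a ranking is to multiply monthly spend by the width of your current incrementality range. This is a heuristic sketch, not a formula we claim is standard: the spend figures are the ones above, while the channel names and the low and high factors are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    name: str
    monthly_spend: float  # dollars per month
    factor_low: float     # low end of the plausible incrementality factor
    factor_high: float    # high end of the plausible incrementality factor

    @property
    def test_priority(self) -> float:
        # Spend multiplied by the width of the uncertainty range.
        return self.monthly_spend * (self.factor_high - self.factor_low)


# Hypothetical channels: the ranges are assumptions, not measured values.
channels = [
    Channel("small new channel", 6_000, 0.5, 2.5),
    Channel("big established channel", 80_000, 0.8, 1.6),
]

ranked = sorted(channels, key=lambda c: c.test_priority, reverse=True)
for c in ranked:
    print(f"{c.name}: priority {c.test_priority:,.0f}")
```

Even with a much wider range assumed on the small channel, the big one comes out on top, because the spend behind the uncertainty is so much larger.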

So what do you do with the read on your desk right now?

If you're sitting on a single test result, here's the honest playbook.

Don't reset your targets to it. Treat it as one vote, weighted against everything you already knew about that channel.

If it roughly agrees with your prior reads or a sensible benchmark, your confidence goes up, so act on it. If it's wildly higher or lower, hold your nerve, nudge gently, and re-run it before you commit budget. The answer you want is a read you've seen more than once, not a dramatic one you've seen once.

And get comfortable with the idea that the truth here is a range, not a single figure. Your real incremental ROAS on a channel is more honestly described as "somewhere between 1.5x and 2.2x" than as one confident number on a dashboard. Nobody loves a range. Ranges are how you avoid betting the quarter on a fluke.

The brands that get this right aren't the ones with the fanciest measurement stack. They're the ones who've decided in advance which numbers earn the right to change their behaviour, and which are just interesting.

If you'd value a sanity check on whether a recent read is a real signal or a one-off swing, that's a big part of what a Signal/Noise Audit does. We line your test up against your own account history and a benchmark across similar brands, so you can see at a glance whether it's worth acting on or worth re-running before one hot season reshapes your whole budget. What's the read that's been tempting you to make a big move?

Ethan To
CEO @ Pigeon Digital