Your Email Isn't Returning 35x: How to Holdout-Test Retention Like Paid Media

Nine times out of ten, when a brand shows me its channel ROI, email and SMS are sitting at the top of the table looking like the best money they spend all year. And nine times out of ten, that number is borrowing credit it never earned.

You know the figure. The retention platform reports something like 35x. Your email "returns" 35 dollars for every dollar it costs. Set against Meta scraping along at maybe 1.3x, it looks like you've been a fool for ever spending a cent on acquisition.

Here's the problem with that comparison. It isn't real, and once you see why, you can't unsee it.

Why the 35x is borrowed money

The reason email and SMS dashboards report such glorious numbers is simple: they get last-click credit for demand that other channels created.

Walk the actual path. Someone sees your Meta ad. They don't buy. They click through, land on your site, and hit the pop-up. They hand over an email and a phone number. Now they're "on your list." A few days later your welcome flow sends them a message, they click it, and they buy.

The retention tool records that sale as email's win. Clean attribution, lovely ROI, 35x. But who actually created that customer? The ad did the convincing. The flow just happened to be standing nearest the till when the sale landed.

This is the bit I want to hammer, because it's the whole game. Email and SMS are largely capturing demand, not creating it. The pop-up is the welcome mat for traffic you paid to drive. So when the dashboard hands the flow full credit, it's quietly stealing it from the paid channel that did the work. The 35x isn't a measure of value created. It's a measure of which tool got to the conversion last.

Look at the make-up of your own list and it gets obvious. Of the people who hand over a phone number, only a portion buy in the first month, often something like a quarter. The other three quarters sit there, subscribed, not buying yet. They didn't join because the flow was magic. They joined because an ad sent them. Your retention channel is sitting on a pile of demand that paid media created and then handed over. Crediting that pile to email is like crediting the waiter for the meal.

Which is why holding email to the same suspicion you hold Meta to isn't cynical. It's just consistent.

There's a useful counter-thought here, though, before anyone swings too far the other way. Ensuring a channel is truly incremental matters far more when it's expensive and near break-even than when it's a cheap channel reporting 35-to-1. If retention is genuinely cheap to run, a little inflated credit doesn't ruin your economics the way it would on a channel costing real money per order. So the goal isn't to panic about email. It's to find out what it's actually worth, then spend accordingly. The only honest way to do that is a holdout.

The test: a 50/25/25 user-level holdout

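Forget geo tests for retention. The clean way to measure a flow, an SMS programme, or a direct-mail drop is at the user level, because you already know exactly who is on the list and can decide, person by person, who gets what. You randomly split the audience and withhold messages from some of them.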

The split I'd reach for first is 50/25/25.

  • 50% get the normal cadence. Your current programme, unchanged. This is your business as usual.
  • 25% get nothing. A true holdout. These people get no messages from the flow you're testing, full stop.
  • 25% get more. Roughly double the messages, so you can see whether sending harder actually pays or just annoys people.
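If you're wiring this up yourself rather than through a vendor, the split needs to be random and sticky: the same subscriber has to stay in the same cell for the whole test. Below is a minimal Python sketch of one common way to do that, which hashes each user ID with a per-test salt into 100 buckets. The function name `assign_cell`, the cell labels, and the default test name are hypothetical, not any platform's API.

```python
import hashlib

# Cell labels and percentage weights for the 50/25/25 design (hypothetical names).
CELLS = [("normal", 50), ("holdout", 25), ("double", 25)]

def assign_cell(user_id: str, test_name: str = "welcome_flow_test") -> str:
    """Deterministically place a subscriber in one test cell.

    Hashing the user ID together with a per-test salt keeps each person in the
    same cell for the life of this test, while a new test name reshuffles everyone.
    """
    digest = hashlib.sha256(f"{test_name}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # roughly uniform bucket in 0..99
    upper = 0
    for cell, weight in CELLS:
        upper += weight
        if bucket < upper:
            return cell
    raise ValueError("cell weights must sum to 100")

# Usage: the same ID always maps to the same cell for a given test.
print(assign_cell("subscriber-1042"))
```

Because the assignment depends only on the ID and the test name, you can recompute it anywhere (in the sending tool, in the revenue report) and get the same answer without storing a lookup table.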

Then you wait, and you measure revenue per person across the three groups (per person, because the normal group is twice the size of the other two, so raw totals would mislead). The question you're answering is brutally simple: how much extra did the people who got messaged actually buy, compared to the people who got nothing? That gap, and only that gap, is your incremental return. Not the last-click number. The lift over the group you stayed silent on.
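To make "that gap, and only that gap" concrete, here's a small sketch of the arithmetic. It compares revenue per subscriber rather than total revenue, scales the per-person lift back up to the size of the messaged cell, and divides by what the sends cost. The `Cell` structure and every figure in it are hypothetical, chosen only to show the mechanics.

```python
from dataclasses import dataclass

@dataclass
class Cell:
    name: str
    users: int        # subscribers randomly assigned to this cell
    revenue: float    # all revenue from those subscribers during the test window
    send_cost: float  # cost of the messages sent to this cell (0 for the holdout)

def revenue_per_user(cell: Cell) -> float:
    return cell.revenue / cell.users

def incremental_return(treated: Cell, holdout: Cell) -> float:
    """Incremental revenue per unit of send cost, measured against the silent holdout."""
    lift_per_user = revenue_per_user(treated) - revenue_per_user(holdout)
    incremental_revenue = lift_per_user * treated.users  # lift scaled to the treated cell's size
    return incremental_revenue / treated.send_cost

# Hypothetical figures, for illustration only.
normal = Cell("normal", users=50_000, revenue=210_000.0, send_cost=1_000.0)
holdout = Cell("holdout", users=25_000, revenue=100_000.0, send_cost=0.0)
print(f"incremental return: {incremental_return(normal, holdout):.1f}x")  # prints "incremental return: 10.0x"
```

Those invented numbers give roughly 10x: a real return, but a very different planning figure from a last-click 35x.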

This is the part that surprises people. Run it honestly and the incremental figure almost never looks like 35x. It looks like a real number, sometimes a good one, sometimes a sobering one. Either way it's the truth, and you can spend against the truth.

The double-cadence cell is the quiet hero of this design. I've seen this exact structure, normal versus none versus double, come back showing that the group getting more messages drove a genuine, statistically convincing lift in revenue over the holdout. Which is the opposite of what most founders expect, because everyone's terrified of annoying their list. But if you were truly annoying people, it would show up as unsubscribes, not as quiet resentment. So you watch the unsub rate as your safety gauge while the revenue cell tells you whether your current cadence is leaving money on the table.
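"Statistically convincing" can be checked several ways, so treat this as one option rather than a prescribed method: a one-sided permutation test on per-subscriber revenue, written in plain Python. It asks how often randomly shuffling the cell labels produces a gap at least as large as the one you observed. A small value means the lift is unlikely to be noise. The function name and defaults are assumptions for illustration.

```python
import random

def _mean(values):
    return sum(values) / len(values)

def permutation_p_value(treated, control, n_iter=5_000, seed=0):
    """One-sided permutation test for 'treated revenue per user > control revenue per user'.

    treated, control: per-subscriber revenue over the test window, with zeros
    included for people who bought nothing. Returns an approximate p-value.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    observed = _mean(treated) - _mean(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    at_least_as_extreme = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        gap = _mean(pooled[:n_treated]) - _mean(pooled[n_treated:])
        if gap >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed labelling itself, so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_iter + 1)
```

You'd run it as double versus holdout, and again as double versus normal, since the second comparison is the one that says whether the extra sends earn their keep, alongside a simple comparison of unsubscribe rates between the cells. The loop is deliberately simple rather than fast; on a large list a vectorised library would be the practical choice.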

Set the holdout length by consideration period, not by habit

Here's where most retention holdouts quietly break, and it's worth getting right because the whole test hinges on it.

A holdout has to run long enough for the held-out group to actually convert. If you cut it off too early, you'll see the messaged group buy quickly, you'll see the silent group looking flat, and you'll declare the channel wildly incremental. But you haven't measured incrementality. You've just measured speed. Given more time, a chunk of that silent group would have bought anyway. You stopped the clock before they could.

I watched a version of this play out with an SMS programme reporting a beautiful incremental number. It looked too good, so we dug in, and the holdout was running for about four days. Four days. For a considered purchase, that's nowhere near long enough for the held-out group to come back on their own, so every quick sale in the messaged group looked incremental when plenty of it wasn't. When the holdout was stretched to a sensible length, the picture got far more honest.

So the rule is: match the holdout length to your consideration period.

  • Impulse buy, low price, quick decision? A short holdout is fine. If people typically decide in days, a couple of weeks will capture most of the truth.
  • High price, long deliberation, the kind of product people sit on for weeks? Your holdout needs to run for weeks, sometimes longer. I've heard of considered-purchase brands running tests for months, and for their economics that's the correct call, not an overcautious one.
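One way to turn "match the holdout length to your consideration period" into a number, sketched under some stated assumptions: you have historical data on how many days each past customer took from joining the list to first purchase, and you pick a window that covers most of those delays (90% here, a hypothetical default), with a floor so the test never runs for just a few days. `holdout_days`, the coverage level and the seven-day floor are all illustrative. It also only sees people who eventually bought, which understates the true tail, so err on the longer side.

```python
import math

def holdout_days(days_to_first_purchase, coverage=0.9, minimum_days=7):
    """Suggest a holdout length from historical signup-to-first-purchase delays.

    Uses the nearest-rank percentile: the smallest observed delay such that at
    least `coverage` of past buyers had converted by then.
    """
    if not days_to_first_purchase:
        raise ValueError("need at least one observed purchase delay")
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    ordered = sorted(days_to_first_purchase)
    rank = math.ceil(coverage * len(ordered))  # 1-based nearest rank
    return max(minimum_days, ordered[rank - 1])

# Hypothetical delay samples (days) for two very different products.
impulse_buyers = [1, 1, 2, 2, 3, 3, 4, 5, 6, 9]
considered_buyers = [3, 10, 14, 21, 25, 30, 35, 42, 45, 60]
print(holdout_days(impulse_buyers))     # 7: the floor applies
print(holdout_days(considered_buyers))  # 45
```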

The trap is letting your tooling pick the length for you. A lot of vendor holdouts default to a short window that suits a fast-moving impulse product, and a short window flatters the result. If your buyer takes six weeks to decide, a four-day or even two-week holdout is going to lie to you. Set the window to your customer, not to the dashboard's convenience.

One more practical note. Push your vendors on this. The better partners will happily extend a holdout when you explain your consideration period, and the genuinely good ones now build user-level holdouts in by default and report an honest incremental number rather than a last-click one. If a partner won't let you hold out a group for long enough to get a real read, that tells you something in itself.

What to do with the answer

A test you don't act on is just expensive trivia. So here's how I'd read the three outcomes.

The channel is strongly incremental. Brilliant, that's the rare and lovely result. The held-out group bought meaningfully less, the double cell drove more, and the unsub rate held steady. This is your signal to send more. You're under-mailing. There's demand sitting in your list that you're not collecting, and the test just proved it's safe to go and collect it.

The channel is mildly incremental. This is the common one. Real lift, but nothing like the dashboard claimed. Here you keep the programme but stop treating it as free money. You hold it to a sensible return like you would any other spend, and you stop over-investing in it on the strength of a fake 35x.

Frequency is maxed. The double-cadence cell didn't beat the normal one, or unsubs ticked up when you pushed harder. This is the signal nobody wants but everyone should want. It means more messages won't buy you more sales, you've saturated the list, and the next dollar belongs somewhere else entirely. This is where you reallocate, out of cranking email and SMS frequency and back into demand creation, because the constraint is no longer how often you message people, it's how many people are on the list in the first place.
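If you'd rather write the three readings down as a rule than judge them by eye, here's a rough sketch. It takes revenue per subscriber in each cell plus the unsubscribe rates of the normal and double cells. The thresholds (a 20% relative lift for "strong", a 0.2-percentage-point unsub increase as the warning line) are invented placeholders, not benchmarks, and the sketch assumes each difference has already cleared a significance check and that the normal cell shows some positive lift over the holdout.

```python
def read_outcome(rpu_holdout, rpu_normal, rpu_double,
                 unsub_normal, unsub_double,
                 strong_lift=0.20, max_unsub_increase=0.002):
    """Classify a 50/25/25 result into one of three readings.

    rpu_*: revenue per subscriber in each cell over the test window.
    unsub_*: unsubscribe rate (unsubscribes / subscribers) in each messaged cell.
    Threshold defaults are illustrative placeholders.
    """
    lift_vs_holdout = rpu_normal / rpu_holdout - 1   # relative lift of business as usual
    double_vs_normal = rpu_double / rpu_normal - 1   # did sending harder earn anything?
    unsubs_ticked_up = unsub_double - unsub_normal > max_unsub_increase

    if double_vs_normal <= 0 or unsubs_ticked_up:
        return "frequency maxed: reallocate toward demand creation"
    if lift_vs_holdout >= strong_lift:
        return "strongly incremental: send more"
    return "mildly incremental: keep it, hold it to a sensible return"
```

Order matters here: the frequency check runs first, because a double cell that loses to the normal cell, or costs you subscribers, caps how hard you can push regardless of how strong the base lift looks.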

The same logic applies to any variable-cost retention channel, not just email flows. Direct mail, SMS sends, a paid conversational layer: anything where each extra touch costs you real money should clear the same bar. If it carries a per-message or per-piece cost, it gets a holdout before it gets more budget. Whoever's asking for that budget, internal team or outside vendor, the answer is the same: prove it's incremental, or it doesn't get the money. Free-to-send email is the one place you can be a bit looser, because the marginal cost is near zero. The moment a channel costs you per touch, the rigour has to go up.

That last point reframes the whole exercise. The reason to test retention isn't to win an argument with your email manager. It's to find the ceiling. Once you know the flow is maxed, pouring more energy into it is just polishing a channel that's already done its job, while the actual growth lever, getting more qualified people onto the list, sits untouched.

The honest way to think about retention

None of this is an argument that email and SMS are weak. They're often genuinely strong, and they're usually cheap, which is exactly why an inflated number there does less damage than an inflated number on paid. The argument is narrower and more useful than "retention is overrated."

It's this: you cannot manage what you measure wrongly. As long as your retention channels report borrowed last-click credit, you'll over-invest in them, under-invest in the demand creation feeding them, and never know where the real ceiling sits. A user-level holdout, sized to your consideration period, replaces a flattering fiction with a number you can actually run the business on.

So here's the question I'd sit with. If you ran a clean holdout on your best-performing flow tomorrow, sized properly for how long your customers really take to buy, what do you genuinely think the incremental number would come back as, and how far is that from the figure on your dashboard right now?

Ethan To
CEO @ Pigeon Digital