Test Velocity Is the Only Growth KPI You Fully Control (Most Brands Run Under 2 Real Tests a Month)

The thing everyone repeats is that growth comes from the big swings. The winning angle. The genius offer. The one creative that changes everything. So founders sit around waiting for lightning, and they treat testing as the boring admin you do between the exciting bets.

Here's what actually happens in the accounts I look at. The brands that grow aren't the ones with better ideas. They're the ones who run more tests, more often, and let the maths do the work. Almost nobody has a shortage of ideas. What they have is a shortage of shots taken.

And most of them are taking far fewer shots than they think. When I audit an account and ask how many real tests ran last month, the honest answer is usually one. Sometimes none. A tweak to a budget isn't a test. Swapping a headline on a whim isn't a test. I mean a deliberate experiment with a question attached. By that bar, most brands sit under two a month.

That's the whole problem in one number. So let me make the case for treating test velocity as the KPI you build everything else around, and then show you the cadence I'd actually run.

Why velocity is the only input you fully control

Think about your real growth levers for a second. You don't control CPMs. You don't control what Meta does to its algorithm next quarter. You don't control whether the consumer feels rich this month or skint. Most of the things that move your numbers are weather, and you don't get a vote on the weather.

You control one thing completely: how many tests you put live, and how good you are at reading them. That's it. That's the lever with your name on it.

I keep coming back to a piece of data operators love to cite. A big platform ran an analysis with some academics and found that teams who experiment past a certain frequency carry meaningfully lower acquisition costs than teams who don't, something on the order of 17% lower CAC. Now, I treat round stats like that as a direction, not gospel. But the direction is the point. Zoom out over a full year and it's almost obvious. More shots, more winners, lower cost to acquire. The teams winning aren't smarter. They're just up at bat more.

There's a reason the best operators have stopped chasing a KPI like "tests that win" or "tests that drove an X% lift". Tying a goal to the outcome is nearly impossible, because you can't will a test into winning. So they set the goal on the input instead: volume. Run the experiments properly and the wins take care of themselves.

The compounding bit founders underrate

Here's the part that doesn't show up on a weekly dashboard, which is exactly why it gets ignored.

Say two brands are the same size today. Brand A runs two real tests a month. Brand B runs eight. If one in five tests produces something worth keeping, Brand A banks roughly five keepers a year. Brand B banks around nineteen. Those aren't one-off wins. A better hook, a cleaner offer, a landing page that converts a bit harder: each one becomes the new baseline that the next test builds on top of.

To put that into perspective: a year on, Brand B isn't 4x better in some single metric. It's compounding from a higher floor in four or five places at once, and the gap between the two is now structural. Brand A can't close it by having one brilliant idea, because Brand B is improving from a stack of small wins that Brand A never collected.

That's the case for velocity in one line. You're not playing for this month's win. You're raising the floor you compound from.
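If you want to check the keeper arithmetic from the Brand A and Brand B example, here's a minimal sketch. The function name is mine and purely illustrative. The two-versus-eight cadence and the one-in-five keep rate come from the example, and that keep rate is an assumption for the sake of the maths, not a measured figure.

```python
def expected_keepers(tests_per_month: float, keep_rate: float, months: int = 12) -> float:
    """Expected number of tests worth keeping over the period."""
    return tests_per_month * months * keep_rate


# Figures from the example: 2 vs 8 tests a month, 1 in 5 worth keeping (assumed).
brand_a = expected_keepers(tests_per_month=2, keep_rate=0.2)
brand_b = expected_keepers(tests_per_month=8, keep_rate=0.2)

print(f"Brand A: {brand_a:.1f} keepers a year")
print(f"Brand B: {brand_b:.1f} keepers a year")
```

That works out to roughly 4.8 and 19.2, which is where "five" and "nineteen" come from. Swap in your own cadence and an honest keep rate to see your own gap.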

A sane cadence, by surface

"Test more" is useless advice on its own. The useful version is a cadence per surface, so it becomes a system instead of a vibe. Here's the rhythm I'd hold a growing Shopify brand to. Treat these as targets to build toward, not day-one mandates.

  • Ad creative: aim for three to four real tests a month. This is your engine and it should always have something live. Different angle, different hook, different format. The discipline is that you're testing one variable with a question behind it, not just shovelling out more ads and hoping. Volume here is non-negotiable, because creative is where the biggest swings actually hide.
  • Landing pages and on-site: roughly six to ten a month, traffic permitting. This is where most brands badly underinvest. If you're sending real volume to a page, you've got the sessions to run more than the one test a quarter most brands manage. The cap here is your traffic, not your imagination.
  • Offers and post-purchase: one or two a month. Slower by nature, because the stakes per test are higher and offers touch margin. But "slower" still means something is always in flight. Think email volume on a launch, a delayed upsell window, or a bundle against a single SKU.

None of those numbers are sacred. The point is that every surface has a heartbeat, and you can look at a calendar and see whether it's beating. If a surface has gone a month with nothing live, that's the gap, and you've found it in seconds.
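The heartbeat check can be sketched in a few lines, assuming you keep even a crude log of when each test started. Everything here is hypothetical: the surface names, the log format, the dates, and the 30-day window standing in for "a month".

```python
from datetime import date, timedelta

# Hypothetical surface names matching the three cadences above.
SURFACES = ["ad_creative", "landing_pages", "offers_post_purchase"]


def quiet_surfaces(test_log, today, window_days=30):
    """Return the surfaces with no test started in the last `window_days`."""
    cutoff = today - timedelta(days=window_days)
    active = {surface for surface, started in test_log if started >= cutoff}
    return [s for s in SURFACES if s not in active]


# Made-up log of (surface, start date) pairs, for illustration only.
log = [
    ("ad_creative", date(2024, 5, 20)),
    ("ad_creative", date(2024, 5, 27)),
    ("landing_pages", date(2024, 3, 11)),
    ("offers_post_purchase", date(2024, 5, 6)),
]

print(quiet_surfaces(log, today=date(2024, 6, 1)))
```

With this made-up log, landing pages is the only surface flagged, because its last test started back in March. A spreadsheet does the same job. The point is that the check is mechanical, not a judgement call.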

The test that's worth keeping vs the one that isn't

A trap I see constantly: a brand finally builds some velocity, then burns it on tests so specific to one launch that the learning dies the moment the launch ends.

So here's the filter I'd put on every proposed test. A good test produces a learning you can reuse. Before you greenlight one, ask: if this wins, does it become a best practice for the next launch, or is it a one-off?

"Does five emails beat one email when we drop a new seasonal range" is a brilliant test, because whatever you learn applies to every seasonal drop after it. "Does this exact bespoke layout work for this exact hero product we'll never sell again" mostly isn't, because the answer evaporates. Same effort, wildly different payoff. Skew your volume hard toward the tests that leave you with a rule you can run again.

Single-channel tests run themselves. Cross-channel ones need an owner.

This distinction is what keeps velocity from turning into chaos, and it's the bit brands miss.

Most of your tests are single-channel and self-managed. A subject line. An ad variant. A new content angle. A post-purchase flow tweak. Nobody needs to project-manage these. The expectation is simply that they run constantly and the result gets shared, so the whole team learns from it rather than one person quietly logging it and moving on. That's where raw volume comes from.

The other kind is the cross-channel test, and it's a different animal. An eight-hour delayed-order upsell that needs ops, the ecommerce team, and email all moving together. Those are genuinely hard to pull off without deliberate coordination, and if you try to wing them they collapse, because everyone assumes someone else owns the handoff. So those get a named owner and a slot on a roadmap. Everything else stays loose and fast.

Get that split right and you can run high volume without it becoming a mess. The cheap, frequent tests stay quick and low-friction. The expensive, coordinated ones get the structure they actually need.

If you don't have a data team

You don't need a data scientist to run a real testing programme. You need two tiers, and the honesty to know which one you're in.

Tier one: the before-and-after read. This is the workhorse for most brands and it's completely fine. You change one thing, you compare a clean window before to a clean window after, and you watch a couple of supporting signals so you're not fooled by noise. It isn't a controlled experiment and you shouldn't pretend it is. But if a channel is clearly your core, and you're being genuinely careful about what you compare it against, you can make good calls this way and move fast. Speed is the whole point at this stage.
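As a rough illustration of a before-and-after read, here's a sketch that compares one primary metric across two windows and keeps two supporting signals in view. The metric choices and every number are invented for the example, and nothing here corrects for seasonality or promotions, so treat the output as a prompt for judgement rather than proof.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from the before window to the after window."""
    return (after - before) / before


def summarise(window: dict) -> dict:
    """Derive the metrics to compare from raw window totals."""
    return {
        "sessions": window["sessions"],
        "conversion_rate": window["orders"] / window["sessions"],
        "aov": window["revenue"] / window["orders"],
    }


# Invented totals for two equal-length windows either side of the change.
before = summarise({"sessions": 12_000, "orders": 300, "revenue": 15_000.0})
after = summarise({"sessions": 12_400, "orders": 347, "revenue": 17_350.0})

changes = {metric: pct_change(before[metric], after[metric]) for metric in before}
for metric, change in changes.items():
    print(f"{metric}: {change:+.1%}")
```

In this invented case conversion rate is the primary read, while sessions and average order value are the supporting signals. If traffic had swung hard or order value had dropped at the same time, you'd trust the conversion move a lot less.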

Tier two: the lift study. When a change is about to scale, say you're moving from 10% of your spend to most of it, the before-and-after read stops being enough. That's when you want an actual holdout or a geo-style test, where a slice of your audience or your regions doesn't get the thing, so you can see the true incremental effect rather than something the platform happily takes credit for. You don't run these on everything. You run them on the decisions big enough to hurt if you've got them wrong.
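For a simple audience holdout, the read looks something like the sketch below: relative lift of the exposed group over the holdout, plus a standard two-proportion z-test as a sanity check on noise. The function names and group sizes are hypothetical, and the z-test is my choice of check, not something the platforms hand you. Geo-style tests usually need a different method, which this doesn't cover.

```python
import math


def incremental_lift(exposed_orders, exposed_n, holdout_orders, holdout_n):
    """Relative lift of the exposed conversion rate over the holdout rate."""
    exposed_rate = exposed_orders / exposed_n
    holdout_rate = holdout_orders / holdout_n
    return (exposed_rate - holdout_rate) / holdout_rate


def two_sided_p_value(exposed_orders, exposed_n, holdout_orders, holdout_n):
    """Two-proportion z-test p-value (normal approximation, pooled variance)."""
    p1 = exposed_orders / exposed_n
    p2 = holdout_orders / holdout_n
    pooled = (exposed_orders + holdout_orders) / (exposed_n + holdout_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / exposed_n + 1 / holdout_n))
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))


# Invented example: 90% of the audience exposed, 10% held out.
lift = incremental_lift(540, 18_000, 50, 2_000)
p_value = two_sided_p_value(540, 18_000, 50, 2_000)

print(f"Incremental lift: {lift:.1%}")
print(f"Two-sided p-value: {p_value:.2f}")
```

In this made-up case the lift reads as about 20%, but the p-value lands well above the usual 0.05 cutoff, so a holdout this small can't separate the effect from noise. That's the kind of thing you want to know before you move most of your spend.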

The mistake is using the wrong tier for the moment. A lone subject-line test doesn't need a geo holdout. A decision to flip your whole account onto a new setting absolutely does. Match the rigour to the stakes and you get speed where speed is safe and rigour where it actually matters.

Where to start

Pull up last month and count it properly. Not the tweaks, not the budget nudges, the real tests with a question attached. If you land under two, you've just found the cheapest growth lever you've got, and it doesn't depend on CPMs cooperating or a flash of genius arriving on schedule.

If you'd value an outside count of where your velocity actually sits, and which surfaces have gone quiet, that's a fair chunk of what a Signal/Noise Audit surfaces. Sometimes the most useful thing is just someone tallying the shots you're really taking, so you can see the gap for yourself.

Ethan To
CEO @ Pigeon Digital