The AI UGC Ad Stack We Actually Run for Shopify Clients in 2026 (Claude + Nano Banana + Seedance)

Roughly half the creators in the best AI ad I've shipped this year don't exist. Not "look a bit polished." Don't exist. No casting call, no shipping address, no invoice. And the account didn't blink - that ad scaled the same as the ones shot with real people.
That number does something to your head once it sinks in. If half a winning mashup can be conjured from a few text prompts, the old maths of creative production - the bottleneck that's throttled most brands for years - just quietly broke.
So let me walk you through the exact stack we run for Shopify clients now. The tools, the order, where it's brilliant, and the bit nobody on YouTube tells you: where it still flops, and where a real human is worth every dollar.
The stack, in one line
Claude writes the script and the prompts. Nano Banana builds the still frames. Seedance turns those frames into video. Then a human edits the pieces into something that actually sells.
That's it. Four stages. Each one is doing a job the next can't, and the order matters more than the individual tools. Swap Nano Banana for another image model and the pipeline still holds. Skip the assembly stage and you'll ship something that looks real and converts like a brick. More on that later.
Let me take the stages one at a time.
Stage 1: Claude writes the script (and it's the part that matters most)
Everyone gets excited about the video model. The video model is the least important thing here. The script is the asset. A flawless AI actor reading a flat script is a flat ad.
Here's how we feed it. First we grab a brand one rung up the ladder from our client - not the category giant, the brand that's clearly a stage ahead. We pull two or three of their best-performing ads from the ad library. Then we transcribe them. Gemini is handy for this because it'll actually watch a video and hand you the spoken transcript word for word, which saves an hour of manual work.
Those transcripts on their own mean nothing. They're scaffolding. We hand them to Claude along with the real product details - the actual PDP copy, the ingredient list, three genuine reviews - and ask it to write us several script options in that natural, spoken UGC register. Opus is the model we reach for here. It's the strongest at copy and the least likely to write something that smells like a press release.
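If it helps to see the shape of that hand-off, here's a minimal sketch of how the brief could be assembled before it goes to Claude. Treat all of it as an assumption: the `build_script_brief` helper, its parameter names and the instruction wording are illustrative, not the exact prompt we run. It only builds the text; sending it to a model is left out.

```python
def build_script_brief(transcripts, pdp_copy, ingredients, reviews, n_options=5):
    """Assemble one prompt from reference transcripts and real product details.

    Hypothetical helper: the names and the instruction wording are assumptions.
    """
    if not transcripts:
        raise ValueError("need at least one reference transcript")
    if not reviews:
        raise ValueError("need at least one genuine review")

    parts = [
        "You are writing short UGC-style ad scripts in a natural, spoken register.",
        "The reference transcripts below are scaffolding for structure and pacing only.",
        "Do not copy their claims or wording.",
        "",
    ]
    # Competitor transcripts go in first, clearly labelled as reference material.
    for i, transcript in enumerate(transcripts, start=1):
        parts.append(f"REFERENCE TRANSCRIPT {i}:\n{transcript.strip()}\n")

    # Then the real product details the script is allowed to draw on.
    parts.append(f"PRODUCT PAGE COPY:\n{pdp_copy.strip()}\n")
    parts.append("INGREDIENTS:\n" + ", ".join(ingredients) + "\n")
    for i, review in enumerate(reviews, start=1):
        parts.append(f"CUSTOMER REVIEW {i}:\n{review.strip()}\n")

    parts.append(
        f"Write {n_options} script options, each opening with a different hook. "
        "Use only claims supported by the product details above."
    )
    return "\n".join(parts)
```

Keeping the transcripts and the product details in separately labelled blocks is the point of the sketch: the model can borrow the structure of the reference ads without borrowing their claims.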
What comes back is usually five or six hooks. Most are fine. One or two are genuinely good. The skill is knowing which - and that's a human judgement, not the AI's. You're looking for the hook that sounds like a friend who's a bit too obsessed with the product, not a brand talking at you. "I've been seeing this everywhere and I finally caved" beats "are you tired of struggling to find a moisturiser that hydrates" every time.
One thing I'd flag: don't take the model's script verbatim and call it done. The script is a guide. You still apply the brand's specific voice and flair before it goes anywhere near a video.
Stage 2: Nano Banana builds the frame
Before any video gets generated, you build a reference image. This is the second thing people skip, and skipping it is why so much AI UGC looks like soup.
The frame needs three ingredients, and you build each one deliberately:
- The avatar. The person. Be specific - "a woman in her mid-20s in a gym outfit" gets you a usable result, "a person" gets you mush.
- The environment. Where they are. An energy drink belongs in a kitchen, not a void. Keep it simple and real - modern kitchen, soft daylight, nothing baroque.
- The product. Yours, clear and visible. This is the one bit you don't generate. You bring the real packaging so the model has something true to anchor to.
Nano Banana stitches those into a single reference still. The reason this stage exists is control. If you let the video model invent the scene from a text prompt alone, it invents everything - the face, the lighting, the product, and the product is the one thing it'll get wrong. Lock the frame first and the video has a leash.
Here's a trick worth stealing: have Claude write you two prompts in this stage, not one. One prompt for the reference image. A separate prompt for what should actually happen in the video. They're different jobs. The image prompt is describing a photograph. The video prompt is describing a motion. Keeping them separate stops the model getting confused about which it's doing.
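A rough sketch of that split, assuming you keep the three frame ingredients in one place and derive both prompts from them. The `FrameSpec` class, its field names and the prompt wording are all hypothetical; in practice Claude writes the prompts, and this only shows why the two are different jobs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameSpec:
    """The three ingredients of the reference frame (hypothetical structure)."""
    avatar: str       # who is in the shot
    environment: str  # where they are
    product: str      # description of the real packaging photo you supply

    def image_prompt(self) -> str:
        # Describes a photograph: who, where, what is visible. No motion.
        return (
            f"Photograph of {self.avatar} in {self.environment}, "
            f"holding {self.product}. The product label is clear and visible."
        )

    def video_prompt(self, action: str, seconds: int = 8) -> str:
        # Describes a motion that starts from the locked reference frame.
        return (
            f"Starting from the reference frame, the person {action}. "
            f"Keep the face, lighting and product unchanged for all {seconds} seconds. "
            "No other actions."
        )


spec = FrameSpec(
    avatar="a woman in her mid-20s in a gym outfit",
    environment="a modern kitchen with soft daylight",
    product="the energy drink can from the supplied product photo",
)
print(spec.image_prompt())
print(spec.video_prompt("takes a sip from the can"))
```

The image prompt never mentions an action and the video prompt never re-describes the scene, which is the separation the trick relies on.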
Stage 3: Seedance turns the frame into video
Now the still moves. You feed the reference frame and the motion prompt into Seedance, and out the other side comes an eight-second clip of your avatar doing the thing - applying the cream, sipping the drink, whatever the prompt asked for.
Two honest warnings here, because this is where money and time get wasted.
First, it isn't cheap. Generating video with Seedance, or Higgsfield, or Veo for the more cinematic from-scratch stuff, costs real credits per render. So check everything before you hit generate. Read the prompt back. Look at the frame. A sloppy prompt that produces a clip with the product in the wrong hand is money you don't get back. Fewer mistakes, fewer renders, lower cost.
Second, watch the whole clip before you trust it. I've had a generation where the actor started doing core exercises halfway through a skincare ad for no reason anyone could explain. The models drift. They'll do something odd in second six that you didn't ask for. If you're not watching, that odd second ends up in your ad set.
When it lands, though, it genuinely lands. An eight-second clip of a person who does not exist, with better lighting and a better angle than you'd get from most real shoots, saying exactly your script. That clip might have taken a month and cost a few thousand dollars to produce the old way.
Stage 4: the human assembly (the step that decides whether it sells)
This is the stage the tutorials skip, and it's the one that separates a real ad from a tech demo.
You do not run a single AI clip as a standalone testimonial. I haven't seen one work yet, and that's not really the AI's fault - single-testimonial UGC, human or not, has been fading for a couple of years now. A lone talking head, however real it looks, just doesn't carry an ad the way it did in 2022.
The value of all this generation is a content library. You take your AI clips and you cut them into a mashup - intercut with other footage, product shots, screen-recordings, sequences you already know perform. The AI clip becomes one strong beat in a piece that a human has paced. That's where the conversions live. The assembly is the ad. The clips are raw material.
And once you've got a winner, the scaling is where this stack earns its keep. Take the proven hook and remix it - same script, different actor; same actor, different read. You can generate a dozen variations of a winning concept in an afternoon. Push four to eight of those into a fresh testing pack, keep the original winner running in your scale campaign, and feed the new winners back into the remix machine. Round and round. The bottleneck used to be making the next variation. Now there isn't one.
What actually converts, and what just looks real
Here's my honest take after running this against real-creator UGC across a few accounts. Looking real and converting are two different things, and people conflate them constantly.
What converts:
- AI clips inside a human-edited mashup. Far and away the best use. The AI does the expensive part, the editor does the selling part.
- B-roll and scene-setting. Hands applying a product, a kitchen at golden hour, a lifestyle moment. Low-risk, high-quality, cheap to generate in bulk.
- Volume for testing. When you need 20 angles to find the one that hits, generating them beats waiting three weeks on creator shipments. Speed is the edge here, not realism.
What looks real and flops:
- The lone AI testimonial. Covered above. The format is tired regardless of who's in it.
- Anything emotionally complex. A genuine before-and-after where the person needs to convey relief, or trust, or a specific lived frustration. The voice still goes a touch monotone, and audiences feel the flatness even when they can't name it.
- High-consideration, high-trust products. If someone's spending A$200 and needs to believe a real human vouched for it, a synthetic face is working against you, not for you.
To put the cost side in perspective with invented but realistic numbers: on one homewares account we ran a batch of AI mashups against real-creator UGC for a fortnight. The AI batch came in at maybe a tenth of the production cost and held a CPA within about 10% of the human creators on cold prospecting. On a skincare account where the buyer needed to trust a face, the AI testimonials ran a CPA around 40% worse and we pulled them. Same tools. Opposite verdict. The product decided, not the tech.
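For anyone who wants to run the same comparison on their own account, the arithmetic is just CPA per batch and the relative gap against the human baseline. A minimal sketch follows; the function names are mine and the spend and conversion figures are placeholders chosen to reproduce a 10% and a 40% gap, not data from either account.

```python
def cpa(spend, conversions):
    """Cost per acquisition: ad spend divided by conversions."""
    if conversions <= 0:
        raise ValueError("need at least one conversion to compute CPA")
    return spend / conversions


def relative_cpa_gap(test_cpa, baseline_cpa):
    """How much worse (positive) or better (negative) the test CPA is vs the baseline."""
    return (test_cpa - baseline_cpa) / baseline_cpa


# Placeholder figures, assumed for illustration only.
human_cpa = cpa(spend=2000, conversions=100)
ai_mashup_cpa = cpa(spend=2200, conversions=100)
ai_testimonial_cpa = cpa(spend=2800, conversions=100)

print(f"AI mashup gap: {relative_cpa_gap(ai_mashup_cpa, human_cpa):.0%}")
print(f"AI testimonial gap: {relative_cpa_gap(ai_testimonial_cpa, human_cpa):.0%}")
```

Where you draw the line between "hold" and "pull" is a judgement call per product; the sketch deliberately stops at the gap and leaves that threshold to you.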
Where the humans still win
Real creators aren't going anywhere, and I'd be wary of anyone telling you they are.
A genuine creator brings things the stack can't fake yet: real emotion in the eyes, an unscripted aside that lands because it's true, the specific authority of someone who actually uses the thing. For founder-led brands, for anything built on trust, for the hero testimonial that anchors a whole campaign - that's still a person's job. The AI fills the library around them. It doesn't replace the one clip that makes someone believe.
So the way I'd think about it isn't "AI or humans." It's a question of fit. Which parts of your funnel are about volume and speed, where AI is a gift - and which parts are about trust and emotion, where a real face still earns its cost?
Pull up your own account and ask it honestly. The brands getting this right in 2026 aren't the ones who went all-in on either side. They're the ones who knew, ad by ad, which lever they were actually pulling.