How to Make AI UGC Store-Discovery Ads with Claude, Seedance 2.0, and GPT Image 2

Written by Ahad Shams

TL;DR

The store-discovery UGC ad follows a creator on a handheld camera as they walk into a store hunting for a product, scan the shelves, find it, react to the camera as if it is a lucky find because the product is always sold out, and then deliver a short talking-head pitch on why it is worth the hunt. You can produce this format end to end with three AI tools: GPT Image 2 generates the actor as a consistent character and the product reference, Claude Opus 4.8 writes the script and the per-shot video prompts, and Seedance 2.0 renders the clips from those prompts.

Every prompt you need is written in full on this page. After your first run, with your actor dialed in, one finished store-discovery ad takes about an hour from idea to export. The creator in these ads is AI-generated and never filmed, which is what makes the format cheap to produce at volume.

Why does the store-discovery format convert?

The format stacks three psychological triggers into one fifteen-to-thirty-second clip, and it does so while looking like a video a friend shot on their phone. That combination is what beats a polished product ad in the feed.

The scarcity framing does the heavy lifting. When the creator says the product is sold out everywhere and they finally found it, the viewer reads the product as already validated by demand. Nobody hunts for something nobody wants. The scarcity line converts a feature claim into social proof without stating a single statistic.

The hunt builds the anticipation that makes the payoff land. You are watching someone scan shelves with real tension, so when they find it, you feel the small hit of relief with them. That earned reaction reads as genuine in a way a staged smile never does. The discovery moment is the emotional center of the ad, and it only works because the hunt set it up.

Then the talking-head pitch closes. By the time the creator turns to camera, you have already accepted that the product is in demand and that this person is excited for real. The pitch is no longer an ad read. It is a recommendation from someone who just got lucky. The handheld, slightly imperfect camera work is the final piece, because it signals organic content and slips past the part of the viewer that scrolls away from anything that looks bought.

What is the 6-beat structure of a store-discovery ad?

Every shot you generate maps to one of these six beats. This is the destination. The Claude and Seedance steps below produce exactly these beats in order.

  1. Cold-open hook. The creator walks through the store entrance, camera following, already talking. The hook line lands in the first second.
  2. The hunt. They move down an aisle scanning shelves, mild tension, narrating that this thing is impossible to find.
  3. The find. Camera pushes in on the product on the shelf. A beat of recognition.
  4. The reaction. They grab it, turn to camera, and deliver the scarcity line with genuine excitement.
  5. The pitch. A short talking-head segment, product in frame, on why it is worth hunting for.
  6. The soft CTA. One natural line telling the viewer where to get it, no hard sell.

The six beats usually come out as four to six rendered shots, because some beats share a render: the pitch and CTA, for example, usually share one talking-head shot. Keep that map in front of you for the rest of this guide.
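If you plan to track beats and shots in a script, the map above can be sketched as a small data structure. This is a minimal sketch, not part of any tool's API: the `Beat` class, the `shot_type` labels, and the choice to treat the reaction as a talking-head render are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beat:
    number: int
    name: str
    shot_type: str  # assumed label for the render template this beat uses


# The six beats in order. The shot_type mapping is an illustrative assumption.
BEATS = [
    Beat(1, "Cold-open hook", "walk-in"),
    Beat(2, "The hunt", "shelf-scan"),
    Beat(3, "The find", "push-in"),
    Beat(4, "The reaction", "talking-head"),
    Beat(5, "The pitch", "talking-head"),
    Beat(6, "The soft CTA", "talking-head"),
]


def plan_shots(beats, merge_pitch_and_cta=True):
    """Group beats into rendered shots; the CTA can ride on the pitch shot."""
    shots = []
    for beat in beats:
        if merge_pitch_and_cta and beat.number == 6 and shots:
            shots[-1].append(beat)  # CTA shares the previous (pitch) render
        else:
            shots.append([beat])
    return shots


if __name__ == "__main__":
    for index, shot in enumerate(plan_shots(BEATS), start=1):
        names = " + ".join(beat.name for beat in shot)
        print(f"Shot {index}: {names} ({shot[0].shot_type})")
```

With the pitch and CTA merged, the plan comes out to five renders; with the flag off, each beat gets its own shot.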

What you need

  • GPT Image 2 access, for the actor and product reference images. It has a free entry point you can start with; high-volume generation moves to paid credits. Check the current free-tier limits before you plan volume.
  • Claude access, for the script and shot prompts. The free tier is enough to run the prompt below; heavy use moves to a paid plan.
  • Seedance 2.0 access, for rendering the clips. Image-to-video rendering generally consumes credits, so plan for paid tiers at the render step. Check current Seedance 2.0 pricing and free-tier limits before you budget.
  • A basic editor for stitching shots, adding captions, and adding the CTA frame. CapCut or any timeline editor works.
  • Your product image, ideally a clean front-facing shot of the packaging with a readable label.

Honest note on cost: the tools have free entry points, but a finished, rendered video usually touches paid credits at the Seedance step. Treat this as low-cost, not zero-cost. If chaining three tools is not appealing, this same store-discovery format also exists as a one-step template inside HeyOz. The rest of this guide teaches the manual method, which is the main event.

Step 1: Build the actor and product with GPT Image 2

Everything downstream depends on one thing: a fixed reference image of your actor. This is the single most important habit in AI UGC. If you generate a fresh face for every shot, the creator will look like a different person in each clip, and the ad falls apart. You generate one actor reference image, lock it, and reuse it as the seed for every shot in this ad and every future ad with the same creator.

Specify your creator like you are casting. Age range, ethnicity, build, hair, wardrobe, and most importantly a lighting and camera style that reads as phone-shot, not studio. Studio lighting is the fastest way to make UGC look like an ad. You want slightly uneven light, a normal phone lens look, and a real-world background.

Prompt A: the actor reference

Paste this into GPT Image 2 and fill the brackets. Generate several, then pick one and treat it as the locked reference.

===== COPY: GPT IMAGE 2 ACTOR REFERENCE =====

A candid vertical portrait of a [AGE RANGE, e.g. 27 year old] [GENDER] [ETHNICITY] content creator, [HAIR DESCRIPTION], [BUILD], wearing [EVERYDAY WARDROBE, e.g. a plain oversized hoodie and small gold hoops]. Natural expression, looking slightly off camera, relaxed. Shot on a front-facing phone camera, slightly uneven natural indoor lighting, mild lens distortion, realistic skin texture with visible pores and minor imperfections, no makeup gloss. Plain everyday background. The image should look like a real selfie or a frame from a phone video, not a studio photo, not retouched, not an advertisement. Vertical 9:16 framing, head and upper shoulders in frame.

===== END =====
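If you expect to reuse this prompt for several actors, filling the brackets by hand gets error-prone. Below is a minimal sketch of a bracket filler, assuming placeholders follow the pattern used on this page: an uppercase name, optionally followed by a comma or colon and a hint. The function name `fill_brackets` and the regular expression are illustrative, not part of any tool.

```python
import re

# Matches [NAME] or [NAME, hint...] or [NAME: hint...]; NAME is uppercase words.
PLACEHOLDER = re.compile(r"\[([A-Z][A-Z ]*)(?:[,:][^\]]*)?\]")


def fill_brackets(template: str, values: dict[str, str]) -> str:
    """Replace each bracket placeholder with its value; fail loudly if any is unfilled."""
    missing = []

    def substitute(match):
        key = match.group(1).strip()
        if key not in values:
            missing.append(key)
            return match.group(0)  # leave the placeholder untouched
        return values[key]

    filled = PLACEHOLDER.sub(substitute, template)
    if missing:
        raise KeyError(f"unfilled placeholders: {sorted(set(missing))}")
    return filled


if __name__ == "__main__":
    template = (
        "A candid vertical portrait of a [AGE RANGE, e.g. 27 year old] "
        "[GENDER] content creator, wearing [EVERYDAY WARDROBE, e.g. a plain hoodie]."
    )
    print(fill_brackets(template, {
        "AGE RANGE": "27 year old",
        "GENDER": "female",
        "EVERYDAY WARDROBE": "a plain oversized hoodie and small gold hoops",
    }))
```

Raising on an unfilled placeholder is a deliberate choice here: a leftover bracket pasted into an image model tends to produce odd results, so it is cheaper to catch it before generating.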

Prompt B: the actor in-store, product in hand

This places your locked actor in the store setting holding the product, so the find and the pitch read natively. If your tool supports a reference image, attach your locked actor reference and your product image. Restate the face so it does not drift.

===== COPY: GPT IMAGE 2 ACTOR IN-STORE =====

The same person from the reference image, identical face, identical [HAIR] and [WARDROBE], standing in the aisle of a [STORE TYPE, e.g. pharmacy / beauty retailer / supermarket]. They are holding [PRODUCT NAME], a [PRODUCT DESCRIPTION: shape, color, label], turned toward the camera with a surprised, excited expression as if they just found it. Bright retail overhead lighting, shelves stocked with blurred products behind them, phone-camera look, handheld feel, realistic skin texture, slight motion imperfection. The product label is fully visible, sharp, and correctly spelled. Vertical 9:16, waist-up framing. Looks like a real phone video frame, not a studio ad.

===== END =====

Two things break most often here. First, the product label garbles. Regenerate until the label is sharp and correctly spelled, and if your tool accepts a product reference image, attach the real packaging so the model copies it rather than inventing it. Second, the face drifts from your reference. When it does, regenerate that single image with the line "identical face from the reference, do not change facial structure or age" added to the prompt, and do not push forward with a face that is even slightly off. Drift compounds across the video, so kill it at the image stage.

Holding one face steady across many generations is the same discipline covered in the storyboard method for AI video, which is worth reading if you want the deeper version of character locking.

See: The storyboard method for AI video ads.

Step 2: Write the script and shot prompts with Claude

Claude is the brain of this workflow. You are not using it to write a caption. You are using it as a UGC scriptwriter and shot-list director that outputs both the spoken lines and a ready-to-paste video prompt for every shot. The output of this one prompt feeds the entire Seedance step.

Paste this into Claude and fill the brackets. It is built around the six-beat structure from earlier, so it returns shots in the exact order you will assemble them.

===== COPY: CLAUDE SCRIPT AND SHOT-LIST PROMPT =====

You are a short-form UGC scriptwriter and shot-list director for AI-rendered video ads. You write scripts that sound like a real person talking to their phone, never like an ad read.

Here is the brief:

PRODUCT: [product name and one line on what it is]

TARGET BUYER: [who buys this, their situation, the outcome they want]

SCARCITY ANGLE: [why it is framed as hard to get, e.g. sells out online constantly, viral on TikTok, never in stock]

STORE SETTING: [pharmacy / beauty retailer / supermarket / etc.]

ACTOR: [one line describing the locked actor from your GPT Image 2 reference: age, look, wardrobe]

Write a store-discovery UGC ad that follows this exact 6-beat structure: (1) cold-open hook walking into the store already talking, (2) the hunt scanning shelves with mild tension, (3) the find, (4) the excited to-camera reaction delivering the scarcity line, (5) a short talking-head pitch on why it is worth it, (6) a soft CTA. Total spoken length should fit roughly 15 to 30 seconds.

Output a structured shot list. For each beat, give me a block with these exact fields:

SHOT [number] - [beat name]

ON-SCREEN ACTION: what physically happens in frame.

CAMERA: the camera movement (handheld follow, shelf-scan pan, push-in on the find, or talking-head framing).

EXPRESSION: the actor's facial expression and emotion in this shot.

SPOKEN LINE: the exact words said, written casually with natural filler, contractions, and the scarcity or finally-found-it hook where it fits. No corporate phrasing.

SEEDANCE PROMPT: a ready-to-paste image-to-video prompt for this shot. Describe the motion, the camera handling as handheld phone footage, the lighting, and restate the actor as the same person from the reference image with an identical face. Keep each shot to a few seconds of motion. Avoid anything that would warp the face or hands.

After the shot list, give me three alternate hook lines for SHOT 1 and three alternate scarcity lines for SHOT 4, so I can test variations.

===== END =====

Good output reads like a person, not a brand. The spoken lines should have contractions and small imperfections, the scarcity line should sound like real surprise rather than a slogan, and every Seedance prompt should already restate the actor identity so you can paste it straight into the render step. If a shot block is missing the actor restatement or the camera direction, tell Claude to fix that field and it will.

One tip for the discovery reaction, the beat that makes or breaks the ad. Ask Claude to write the reaction line as if the camera caught it half a second late, so the creator is already mid-gasp when they turn. A reaction that starts before the line, rather than a clean delivered sentence, is what stops it from feeling scripted. You can add this directly: "make the SHOT 4 reaction feel caught off guard, not performed."

For more on writing UGC scripts that hold attention, see how to create AI UGC ads at scale.

Step 3: Render the shots with Seedance 2.0

This is the hardest step, so treat it as such. Two problems will eat your time if you do not plan for them.

The first is identity consistency across shots. The walk-in, the find, and the talking-head are separate renders, and the model wants to drift the face between them. Your defense is to use the GPT Image 2 actor reference as the image-to-video seed for every shot, and to restate the actor description in text inside every Seedance prompt. Never render a shot from text alone if you can seed it from the locked image.

The second is motion that feels too clean. Seedance can produce smooth, gliding, cinematic motion by default, and cinematic motion is the opposite of phone-shot UGC. You have to ask for the imperfection on purpose: handheld sway, small bumps, the slight lag of a real arm holding a phone.

Below are templates for the four key shot types. Drop the spoken line and any specifics from your Claude shot list into the brackets, and seed each one with the matching GPT Image 2 frame.

Walk-in follow shot

===== COPY: SEEDANCE WALK-IN =====

The same person from the reference image, identical face and wardrobe, walking forward through a [STORE TYPE] entrance, camera following from the front at chest height as if a friend is walking backward filming on a phone. Natural handheld sway and small bumps, slightly shaky, phone-camera look, bright retail lighting. They are talking to the camera casually. Realistic skin, no warping. Short clip, continuous motion, vertical 9:16.

===== END =====

Shelf-scan shot

===== COPY: SEEDANCE SHELF-SCAN =====

Handheld phone camera scanning along a stocked store shelf, slight pan and searching movement, mild urgency, as if looking for one specific item. Occasional small refocus, natural camera shake, bright retail overhead lighting, blurred product packaging on the shelves. No people in frame, except optionally the same person's hand entering frame to move products. Short clip, vertical 9:16, realistic phone footage.

===== END =====

The find / push-in shot

===== COPY: SEEDANCE FIND PUSH-IN =====

Handheld phone camera pushing in toward [PRODUCT NAME] sitting on a store shelf, a quick excited move closer as if the person just spotted it, slight zoom and shake. The product label stays sharp, correctly spelled, and centered. Bright retail lighting, blurred shelf around it. Very short clip, fast but not smooth, realistic phone footage, vertical 9:16.

===== END =====

Talking-head delivery shot

===== COPY: SEEDANCE TALKING-HEAD =====

The same person from the reference image, identical face, [HAIR] and [WARDROBE], holding [PRODUCT NAME] up near their face with the label facing the camera, talking directly to the phone in a selfie-style handheld shot. Natural lip movement matching speech, expressive and excited but real, small handheld sway, bright store or natural lighting, realistic skin texture. The face must stay identical to the reference throughout, no morphing. Short clip, vertical 9:16, looks like a real phone selfie video.

===== END =====

Troubleshooting

  • Face drifts between shots: seed every shot from the same GPT Image 2 reference and add "identical face from the reference, no morphing" to the prompt. Re-render rather than accepting a near match.
  • Product warps or the label garbles: shorten the clip, keep the product more static in frame, and restate "label sharp, correctly spelled, do not distort".
  • Motion too smooth and cinematic: add "handheld sway, small bumps, slightly shaky, phone camera" and remove any word like "smooth", "gliding", or "cinematic".
  • Lip-sync off on the talking head: keep the talking-head clip short, render the line in one take, and if your tool has a dedicated lip-sync or audio-driven mode, use that rather than free generation.
  • Lighting too clean and ad-like: ask for "uneven natural lighting, mild lens imperfection" and drop any studio or softbox language.
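Several of these fixes come down to which words are in the prompt, so a quick pre-render check can catch them. Here is a minimal sketch of a prompt linter; the word lists come from the troubleshooting notes above, while the function name and the exact rules are assumptions. It is deliberately naive: it also flags negated uses such as "fast but not smooth" or "not a studio photo", so treat each warning as something to review rather than an error.

```python
import re

# Words the troubleshooting notes say to drop from render prompts.
BANNED_WORDS = ("smooth", "gliding", "cinematic", "studio", "softbox")
# Any one of these counts as asking for phone-style motion.
HANDHELD_CUES = ("handheld", "sway", "shaky", "bumps")


def lint_prompt(prompt: str) -> list[str]:
    """Return review warnings for a render prompt; an empty list means no flags."""
    text = prompt.lower()
    warnings = []
    for word in BANNED_WORDS:
        if re.search(rf"\b{word}\b", text):
            warnings.append(f"remove '{word}'")
    if not any(cue in text for cue in HANDHELD_CUES):
        warnings.append("add a handheld cue such as 'handheld sway, small bumps'")
    if "9:16" not in text:
        warnings.append("state vertical 9:16 framing")
    return warnings


if __name__ == "__main__":
    for warning in lint_prompt("Smooth cinematic glide through the store"):
        print(warning)
```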

Keeping one actor consistent across the walk-in, the find, and the talking-head, and chaining all these prompts in order, is the part that takes practice. If you would rather not learn the chaining, HeyOz hosts this store-discovery format as a ready-made template that handles character consistency and the full shot chain internally. You pick the template, drop in your product, and optionally choose the actor and setting. The manual method here teaches you the craft and gives you full control; the template just gets you the ad.

Step 4: Assemble and ship

Drop your rendered shots into any editor in beat order: walk-in, hunt, find, reaction, pitch, CTA. The assembly is tool-agnostic, so use whatever timeline editor you already know.

Cut tight. The cold open has to hit in the first second, so trim any dead frames before the creator starts talking. Do not linger on the hunt; it builds tension fastest when it is short. The find and the reaction can hold a half beat longer because that is the payoff. Then move to the pitch and out.

Add captions, because most of the feed plays muted and the scarcity line has to land on the screen as well as in the audio. Add a clean CTA frame at the end with where to buy. Export vertical at 9:16, the ratio every prompt on this page is framed for, and keep the total length tight, in the fifteen to thirty second range the script was written for. Once your actor is built and your Claude prompt is dialed in, the whole loop from idea to exported ad runs in about an hour.
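Before opening the editor, it can help to sanity-check beat order and total length. The sketch below assumes you note each clip's duration by hand; the beat labels mirror the assembly order above, and the durations in the example are made-up placeholders, not measurements.

```python
BEAT_ORDER = ["walk-in", "hunt", "find", "reaction", "pitch", "CTA"]


def order_and_check(clips, min_total=15.0, max_total=30.0):
    """Sort clips into beat order and check the total against the target range.

    clips maps a beat label to its duration in seconds. A beat may be absent,
    for example when the pitch and CTA share one talking-head clip.
    """
    unknown = set(clips) - set(BEAT_ORDER)
    if unknown:
        raise ValueError(f"unknown beat labels: {sorted(unknown)}")
    timeline = [(beat, clips[beat]) for beat in BEAT_ORDER if beat in clips]
    total = sum(duration for _, duration in timeline)
    return timeline, total, min_total <= total <= max_total


if __name__ == "__main__":
    # Hypothetical durations; the CTA rides on the pitch clip here.
    clips = {"pitch": 8.0, "walk-in": 3.0, "find": 2.0, "hunt": 3.0, "reaction": 4.0}
    timeline, total, in_range = order_and_check(clips)
    for beat, duration in timeline:
        print(f"{beat}: {duration:.1f}s")
    print(f"total {total:.1f}s, within range: {in_range}")
```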

A note on AI disclosure

The creator in these ads is AI-generated and was never filmed, so do not present the clip as a genuine personal testimonial from a real customer. Some platforms and regions are introducing rules on disclosing AI-generated people and synthetic media in advertising, and those rules are moving. Check the current ad policies for your platform and your market before you run, and when in doubt, keep your claims about the product honest and verifiable. This is practitioner guidance, not legal advice.

How do you scale this to 100 ads and test on Meta?

The reason this format is worth learning is that the system is reusable. Your locked GPT Image 2 actor and your Claude prompt are not single-use. Change one variable and you have a new ad: a different actor, a different store, a different scarcity hook, a different product. Run the Claude prompt again with new brackets and it returns a fresh shot list in seconds.

That flips where your bottleneck sits. Production used to be the constraint, so you tested a handful of angles and guessed. When one creator can shoot dozens of store-discovery variations without ever being filmed, production stops being the limit and testing throughput becomes the game. You ship many, point a Meta test at them, and let the data pick the winner instead of picking it yourself in advance.

Build a small matrix: three actors by three scarcity hooks by three stores is twenty-seven distinct ads from the same system. Producing that many actor, hook, and store variations quickly is where the template path saves the most time, since HeyOz spins each variation without re-chaining the three tools by hand. For the broader volume-and-testing logic, see how to make AI UGC ads without filming yourself.
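The three-by-three-by-three matrix can be enumerated in a few lines. This is a sketch under stated assumptions: the actor names are placeholders, the hooks and stores are the examples from the Claude prompt, and the `ad_001` naming scheme is just one way to label variants for a test.

```python
from itertools import product

# Placeholder inputs; swap in your own locked actors, hooks, and stores.
ACTORS = ["actor_a", "actor_b", "actor_c"]
HOOKS = ["sells out online constantly", "viral on TikTok", "never in stock"]
STORES = ["pharmacy", "beauty retailer", "supermarket"]


def build_matrix(actors, hooks, stores):
    """Return one brief per actor, hook, and store combination."""
    return [
        {"id": f"ad_{index:03d}", "actor": actor, "hook": hook, "store": store}
        for index, (actor, hook, store) in enumerate(product(actors, hooks, stores), start=1)
    ]


if __name__ == "__main__":
    matrix = build_matrix(ACTORS, HOOKS, STORES)
    print(len(matrix))  # 3 * 3 * 3 = 27
    print(matrix[0])
```

Each dictionary maps onto the bracket fields in the Claude prompt, so one brief becomes one run of the script-and-shot-list step.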

Frequently Asked Questions

How do you make AI UGC videos?

You generate a consistent actor as a reference image with an image model like GPT Image 2, write the script and per-shot video prompts with a language model like Claude, then render each shot with an image-to-video model like Seedance 2.0 using the actor reference as the seed. You stitch the shots in an editor, add captions and a CTA, and export vertical. The key skill is keeping the same face across every clip.

What AI tool makes a talking-head creator video?

An image-to-video model such as Seedance 2.0 animates a still of your actor into a talking-head clip with natural motion. You seed it with a fixed actor image from GPT Image 2 so the face stays consistent, and you write the spoken lines with Claude. For best lip-sync, keep the talking-head clip short and use a dedicated audio-driven mode if your tool offers one.

How do I keep the same AI face across clips?

Generate one actor reference image and lock it. Use that exact image as the image-to-video seed for every shot, and restate "identical face from the reference, no morphing" in each prompt. Do not render any shot from text alone if you can seed it from the locked image, and re-render any clip where the face drifts rather than accepting a near match. Drift compounds, so fix it shot by shot.

Can I use AI UGC creators in Meta ads?

Yes, AI-generated creators are used as Meta ad creative today. Keep your product claims honest and follow Meta's advertising policies, which apply to the claims regardless of how the video was made. Because rules on synthetic media are evolving, check the current policy for your market before you run.

Do I have to disclose that a creator is AI?

Some platforms and regions are introducing disclosure rules for AI-generated people and synthetic media in advertising, and those rules are changing. Do not pass an AI creator off as a genuine personal testimonial from a real customer. Check the current ad policies for your platform and your market, and treat this as practitioner guidance rather than legal advice.

How long does it take to make one AI UGC ad?

Once your actor is built and your prompts are dialed in, one finished store-discovery ad takes about an hour from idea to export. The first one takes longer while you generate and lock the actor and learn the render quirks. After that, new variations are fast because you reuse the same actor and the same Claude prompt.

How do you make AI video look like real phone footage?

Ask for the imperfection on purpose. Specify handheld sway, small bumps, a phone-camera look, uneven natural lighting, and realistic skin texture, and remove any word like smooth, cinematic, or studio. Generate the actor reference in a phone-selfie style from the start, since the source frame sets the look. Tight cuts and on-screen captions complete the organic feel.

Can I make this without prompting or chaining three tools?

Yes. HeyOz hosts this exact store-discovery format as a one-step template. You pick the template, drop in your product, and optionally choose the actor and setting, and it handles character consistency and the full shot chain internally. The manual method in this guide teaches the craft and gives you full control; the template is the shortcut to the output.

Start with the prompts

The workflow above is the real method, and every prompt you need to run it is on this page. Build your actor in GPT Image 2, write the shot list with Claude, render in Seedance 2.0, and assemble. Do it once and you own a system you can rerun for any product, any store, any scarcity hook.

If you want the output without the learning curve, the same store-discovery format is a ready-made template inside HeyOz: drop in your product and get the finished UGC clip, no prompting and no chaining. Either way, the bottleneck is no longer production. It is how many angles you are willing to test.

About the author

Ahad Shams

Ahad Shams is the Founder of HeyOz, an all-in-one ads and content platform built for founders and small teams. He has worked across consumer goods and technology, with experience spanning Fortune 100 companies such as Reckitt Benckiser and Apple. Ahad is a third-time founder; his previous ventures include a WebXR game engine and Moemate, a consumer AI startup that scaled to over 6 million users. HeyOz was born from firsthand experience scaling consumer products and the need for a unified, execution-focused marketing platform.