How to Generate an AI Authority Speaker Image (Full Guide)

Written By
Ahad Shams

Key Takeaways

  • A keynote-on-stage image signals authority before a word of copy is read. It does what a selfie or headshot cannot in high-trust niches like health, finance, coaching, and B2B.
  • The full pipeline is four steps: find a lighting reference on Pinterest, use Claude to build a precise prompt, generate the still in HeyOz with GPT Image 2, then animate it with Seedance 2.0, also in HeyOz.
  • Pinterest is the best free tool for finding lighting and composition references because it surfaces editorial and conference photography filtered by mood — give Claude the description of what you find, not the image itself.
  • Claude's job is to translate your loose creative direction into a technically precise image prompt — one that specifies focal length, lighting setup, stage environment, and subject body language in GPT Image 2's preferred syntax.
  • Seedance 2.0 animates the still with natural speaker gestures and subtle environmental motion — the result reads as real conference footage, not a slideshow.
  • Both GPT Image 2 and Seedance 2.0 are live inside HeyOz, so everything from image generation to the finished ad runs in one place, without API keys or separate subscriptions.

What Is the Authority Shot and Why Does It Build Trust Instantly?

The authority shot is a specific type of image: a person mid-gesture on a physical stage, usually with a slide or screen glowing behind them and a dark audience implied below. It is the visual shorthand for expertise. When someone scrolls past it in a feed, their brain processes it as: this person has been invited to speak in front of a room. That is a very different signal from a headshot, a selfie, or even a polished studio portrait.

The shot works because stages have social proof baked into their visual grammar. Being on a stage means someone else vetted you, selected you, and gave you a microphone. That vetting is communicated entirely through the image, with no caption required. Social psychology research suggests that environmental cues like positioning and lighting shape perceived authority before any content is consumed.

Until recently, getting this shot required a real speaking engagement or a $2,000+ event photoshoot with a rented venue and stage set. The pipeline in this guide produces the same visual output for the cost of a coffee, using tools available to anyone.

Which Niches Benefit Most?

The authority shot is highest-leverage in niches where trust is the conversion bottleneck — not price, not features. These include:

  • Healthcare and medical professionals running paid acquisition for practices or courses
  • Financial advisors, wealth coaches, and trading educators
  • Business coaches, executive coaches, and leadership consultants
  • B2B SaaS founders and thought leaders building a personal brand alongside their product
  • Agency owners positioning for enterprise clients

In all of these categories, the person in the ad is the product. The stage shot signals that they are the authority in the room before the viewer reads a single word of copy.

How Do You Find the Right Lighting Reference on Pinterest?

Pinterest is the best free reference library for this workflow because it surfaces real editorial and conference photography and recommends visually similar pins, so one strong result quickly leads to others with the same mood. You are not looking for a template; you are looking for a lighting and composition reference that you will describe to Claude in the next step.

What to Search on Pinterest

Use specific photographic language in your search to surface the right results. The most effective search queries for this purpose are:

  • keynote speaker photography dark stage
  • conference speaker dramatic lighting backlight
  • TED talk stage photography wide shot
  • medical conference speaker podium editorial
  • business summit speaker moody cinematic
  • thought leader stage portrait spotlight

What to Look For in a Reference

When you browse results, you are making four decisions, not one. Pin images that answer all four clearly:

  • Light source and direction. Is the key light coming from stage left, overhead, or behind the subject? Rim lighting (light from behind outlining the silhouette) reads as high-production. Flat frontal lighting looks corporate and lifeless. Spotlight from above-front reads as classic keynote.
  • Background element. Is there a projection screen with a visible slide? An abstract light pattern? A branded backdrop? A dark void with scattered audience lights? The background element changes the niche signal dramatically.
  • Subject position and gesture. Is the speaker at a podium, holding a microphone, walking the stage, or gesturing mid-point? Walking the stage with one hand raised reads as dynamic and confident. Podium with both hands gripping it reads as academic.
  • Color temperature and mood. Warm amber stage lighting against a cool dark background (the classic TED look) signals inspiration and authority. Cool blue tones signal technology or finance. Deep reds read as high-energy or medical.

How to Describe Your Reference to Claude

You do not need to upload or share the image with Claude. You describe what you see in plain language. Work through the four decisions above and write one sentence per decision. For example:

"The reference image shows a woman speaking on a wooden stage. The key light is warm amber coming from the front-left, with blue rim lighting from behind. There is a large projection screen behind her showing a single-slide graphic with white text on dark blue. She is mid-gesture, right hand raised at shoulder height, looking toward the left side of the frame. The mood is cinematic and high-contrast with a dark audience implied in the foreground."

That description is everything Claude needs to write a technically precise prompt in the next step.
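If you brief Claude often, it can help to keep reference notes structured so none of the four decisions gets skipped. The following is a minimal sketch in Python, assuming you draft notes in a local script before pasting them into Claude; the class and field names are hypothetical, and nothing here talks to Pinterest or Claude.

```python
from dataclasses import dataclass, fields


@dataclass
class ReferenceNotes:
    """One plain-language sentence per decision made while browsing Pinterest."""
    light: str       # light source and direction
    background: str  # background element
    gesture: str     # subject position and gesture
    mood: str        # color temperature and mood

    def to_description(self) -> str:
        # Refuse to build a description if any of the four decisions is blank.
        missing = [f.name for f in fields(self) if not getattr(self, f.name).strip()]
        if missing:
            raise ValueError("Missing decisions: " + ", ".join(missing))
        # Join the sentences in a fixed order: light, background, gesture, mood.
        return " ".join(getattr(self, f.name).strip() for f in fields(self))


notes = ReferenceNotes(
    light="The key light is warm amber coming from the front-left, with blue rim lighting from behind.",
    background="There is a large projection screen behind her showing white text on dark blue.",
    gesture="She is mid-gesture, right hand raised at shoulder height, looking toward the left side of the frame.",
    mood="The mood is cinematic and high-contrast with a dark audience implied in the foreground.",
)
print(notes.to_description())
```

The blank-field check is the point of the exercise: a description missing, say, the light direction is the most common reason a generated prompt comes back vague.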

How Do You Use Claude to Build Your GPT Image 2 Prompt?

Claude's role in this pipeline is prompt engineering, not image generation. You give Claude your creative direction in plain language — who the subject is, what niche they operate in, what lighting reference you found — and Claude translates that into the technically precise syntax that GPT Image 2 responds to best.

GPT Image 2 responds significantly better to prompts that include camera and lens specifications, lighting terminology, environmental depth cues, and body language descriptors — not just subject descriptions. Claude's job is to bridge the gap between your plain-language creative vision and that technical prompt language.

The Claude Prompt to Use

Open Claude (claude.ai or the Claude app) and paste the following prompt, filling in the bracketed sections with your own details:

--- CLAUDE PROMPT START ---

You are an expert prompt engineer specializing in photorealistic AI image generation. Your job is to take my creative direction and rewrite it as a technically precise, production-quality prompt for GPT Image 2.

Here is my creative direction:

Subject: [Describe the person — gender presentation, approximate age range, professional appearance, e.g. "a woman in her 40s, South Asian, wearing a tailored dark blazer"]

Niche/context: [e.g. "medical professional speaking at a health summit" or "financial advisor at a B2B leadership conference"]

Lighting reference: [Paste your description from the Pinterest reference here, covering light direction, background element, subject gesture, and color temperature]

Mood target: [e.g. "cinematic, authoritative, warm, high-contrast" or "cool, precise, scientific"]

Using this direction, write a single GPT Image 2 prompt that:

  • Specifies the camera angle and focal length (e.g. "shot from slightly below stage level with an 85mm portrait lens")
  • Names the lighting setup precisely (e.g. "warm amber key light from front-left at 45 degrees, cool blue rim light from behind")
  • Describes the stage environment with specific physical details (flooring material, screen content, depth of field on audience)
  • Describes the subject's exact body language and expression
  • Ends with photographic style descriptors (e.g. "editorial conference photography, Canon R5, f/2.8, ISO 1600, shallow depth of field")
  • Includes a negative prompt section: avoid stock photo aesthetics, avoid flat lighting, avoid symmetrical posed portraits, avoid visible watermarks, avoid text on clothing

Output the prompt as a single block of text, ready to paste directly into GPT Image 2. Do not add any commentary — just the prompt.

--- CLAUDE PROMPT END ---
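If you run the Claude prompt for several clients, a short script can fill the bracketed sections and stop you from sending a half-completed brief. This is a minimal sketch using Python's standard `string.Template`, assuming you store the prompt text locally. The template is abbreviated (in practice, append the requirement bullets and output instructions from the prompt above), and the placeholder and function names are hypothetical.

```python
from string import Template

# Abbreviated copy of the Claude prompt; the requirement bullets and the
# final output instruction are omitted here for brevity.
CLAUDE_PROMPT = Template(
    "You are an expert prompt engineer specializing in photorealistic AI image "
    "generation. Your job is to take my creative direction and rewrite it as a "
    "technically precise, production-quality prompt for GPT Image 2.\n\n"
    "Here is my creative direction:\n\n"
    "Subject: $subject\n\n"
    "Niche/context: $niche\n\n"
    "Lighting reference: $lighting\n\n"
    "Mood target: $mood\n"
)


def build_claude_prompt(direction: dict[str, str]) -> str:
    # Blank values are as bad as missing ones: catch both before sending.
    blank = [key for key, value in direction.items() if not value.strip()]
    if blank:
        raise ValueError("Fill in: " + ", ".join(blank))
    # substitute() raises KeyError if any placeholder has no value at all.
    return CLAUDE_PROMPT.substitute(direction)


prompt = build_claude_prompt({
    "subject": "a woman in her 40s, South Asian, wearing a tailored dark blazer",
    "niche": "medical professional speaking at a health summit",
    "lighting": "Warm amber key light from front-left, blue rim light from behind.",
    "mood": "cinematic, authoritative, warm, high-contrast",
})
print(prompt)
```

`substitute()` is used rather than `safe_substitute()` on purpose: a forgotten field should fail loudly instead of reaching Claude as a literal placeholder.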

How to Refine the Output

After Claude returns the prompt, read it once for accuracy. The most common adjustments are:

  • If the subject description reads too generic, ask Claude: "Make the subject description more specific — add a detail about posture, expression, or attire that signals expertise in [niche]"
  • If the lighting feels too neutral, ask: "Push the lighting to feel more cinematic — increase the contrast between key and fill light and add a stronger rim light separation"
  • If the background feels wrong for your niche, ask: "Change the background to [describe what you want — e.g. a medical conference slide showing anatomy diagrams, or a finance summit backdrop with logo patterns]"
  • If you want a specific format for the final ad (vertical Story, square feed, horizontal video thumbnail), ask: "Rewrite the composition to work in a [9:16 vertical / 1:1 square / 16:9 horizontal] crop without cutting off the subject"

Once you are satisfied with the prompt, copy it. You will paste it directly into HeyOz in the next step.

Example Prompts by Niche

These are examples of what a well-formed output from Claude looks like for three common niches. You can use these as a starting point and adjust from here.

Medical / Healthcare:

Photorealistic editorial photograph of a woman in her late 40s, South Asian, wearing a well-fitted dark navy blazer over a white collared shirt, standing mid-stage at a medical summit. She is mid-gesture, right hand raised at shoulder height, eyes directed at the left side of the room with a composed and authoritative expression. Stage floor is dark polished wood. Behind her, a large high-resolution projection screen shows a clean data visualization slide with white text and an anatomical diagram on a dark blue background. Lighting: warm amber key light from front-left at 45 degrees, soft blue fill from the right, strong cool-white rim light from directly behind separating her from the screen. A shallow depth of field blurs rows of seated conference attendees in the foreground. Shot from slightly below stage level, 85mm portrait lens, f/2.8, ISO 1600, Canon R5. Editorial conference photography, high contrast, cinematic grain. Avoid stock photo aesthetics, avoid flat frontal lighting, avoid symmetrical posed portrait, no visible watermarks.

Finance / Wealth:

Photorealistic editorial photograph of a man in his mid-50s, white, silver-haired, wearing a tailored charcoal suit with no tie, standing confidently at center stage at a financial leadership summit. He is walking mid-step toward stage right, one hand extended forward as if making a closing point, looking directly toward the camera with relaxed authority. Stage is large and modern with dark matte flooring. Behind him, a curved LED wall displays a bold typographic slide: a single statistic in white on a deep navy background. Lighting: cool neutral key light from front-center, warm amber practical lights from two side towers, blue-violet accent light from below the screen casting subtle upward fill. Conference audience softly blurred in foreground bokeh. Shot from floor level looking slightly upward, 70mm lens, f/2.2, editorial finance photography, high-contrast, crisp. Avoid corporate stock photo look, avoid symmetrical composition, avoid studio backdrop, no text on clothing.

Business / Executive Coach:

Photorealistic editorial photograph of a woman in her early 40s, Black, natural hair, wearing a structured camel-colored blazer and gold earrings, standing on a TED-style round stage. She is in a wide-stance confident pose with both hands open at hip height, looking slightly downward toward a front-row audience member. The circular stage glows with warm amber from embedded floor lights. Behind and above: deep black void with a single large screen showing minimalist slide design — her name and title in clean white sans-serif on dark background. Key light: warm overhead spotlight from directly above creating strong facial shadow under cheekbones. Cool blue fill from stage right. Intimate dark theatre ambiance, audience barely visible in peripheral bokeh. 50mm, f/1.8, full-frame mirrorless, editorial portrait photography, TED conference aesthetic, film grain, high contrast. Avoid corporate softbox lighting, avoid flat expression, avoid visible logo watermarks.

How Do You Generate the Authority Shot in HeyOz with GPT Image 2?

GPT Image 2 is OpenAI's current image generation model, built for photorealistic output with accurate text rendering and strong compositional control. It is available directly inside HeyOz alongside Seedance 2.0 — you do not need a separate OpenAI account or API key.

Step-by-Step: Generating the Image in HeyOz

  1. Log in to HeyOz at heyoz.com and open a new project.
  2. Select the image generation tool and choose GPT Image 2 as the model.
  3. Paste the prompt Claude generated in the previous step directly into the prompt field.
  4. Set the aspect ratio to match your target ad format: 1:1 for feed, 9:16 for Stories and Reels, 16:9 for YouTube thumbnails or video ads.
  5. Generate. GPT Image 2 typically produces four variations. Review all four before selecting; the first one returned is not necessarily the best.
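Choosing the aspect ratio at generation time matters because cropping afterwards throws away a large share of the frame. The arithmetic below is a sketch that assumes a 1080 x 1920 vertical still purely for illustration (not a documented GPT Image 2 output size) and computes the largest centered crop for each format in the steps above.

```python
from fractions import Fraction

# Width:height ratios for the ad formats listed in the steps above.
FORMATS = {
    "feed": Fraction(1, 1),
    "stories_reels": Fraction(9, 16),
    "youtube": Fraction(16, 9),
}


def center_crop(width: int, height: int, target: Fraction) -> tuple[int, int]:
    """Return (w, h) of the largest crop with the target ratio."""
    if Fraction(width, height) > target:
        # Source is wider than the target: keep the full height, trim the sides.
        return int(height * target), height
    # Source is taller than (or equal to) the target: keep the width, trim top and bottom.
    return width, int(width / target)


def kept_share(width: int, height: int, target: Fraction) -> float:
    """Fraction of the original pixels that survive the crop."""
    w, h = center_crop(width, height, target)
    return (w * h) / (width * height)


for name, ratio in FORMATS.items():
    w, h = center_crop(1080, 1920, ratio)
    print(f"{name}: {w}x{h}, keeps {kept_share(1080, 1920, ratio):.0%}")
```

For the vertical example, a 16:9 crop keeps only 1080 x 607 pixels, about a third of the frame, and a speaker framed head to toe will almost certainly lose their head or hands. That is why the ratio is set before generating rather than fixed in the edit.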

What to Look for in the Generated Results

Evaluate each variation against four criteria before moving to Seedance:

  • Face quality. The face must look like a real person: not overly symmetrical, not plastic-smooth, with natural skin texture and catchlights in the eyes. Reject any image where the face looks generated or too polished.
  • Lighting consistency. Check that the light on the subject's face matches the light sources described in the prompt. If the prompt specifies warm amber from the left, the left side of the face should be warmer and brighter.
  • Stage believability. The floor, background screen, and any architectural elements should feel physically plausible. Look for distorted perspective lines, impossible shadows, or surfaces that don't make sense spatially.
  • Body language. Hands and arms are where AI images fail most visibly. Select the image where the hands look natural, with no extra fingers, no merged limbs, and nothing that would catch a viewer's eye on second look.
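The review itself is done by eye, but recording it keeps the decision consistent when you compare several batches. Here is a minimal sketch, assuming you mark each criterion pass or fail yourself; the `Variation` class and the criterion keys are hypothetical names for the four checks above.

```python
from dataclasses import dataclass, field

# The four review criteria from the checklist above.
CRITERIA = ("face_quality", "lighting_consistency", "stage_believability", "body_language")


@dataclass
class Variation:
    name: str
    # criterion -> True if it passed your visual review
    checks: dict[str, bool] = field(default_factory=dict)


def failed_checks(v: Variation) -> list[str]:
    # A criterion you never marked counts as a failure, not a pass.
    return [c for c in CRITERIA if not v.checks.get(c, False)]


def keepers(batch: list[Variation]) -> list[str]:
    # Only variations that pass all four criteria move on to Seedance.
    return [v.name for v in batch if not failed_checks(v)]


all_pass = {c: True for c in CRITERIA}
batch = [
    Variation("A", all_pass),
    Variation("B", {**all_pass, "body_language": False}),  # e.g. merged fingers
    Variation("C", {"face_quality": True}),                 # review not finished
]
print(keepers(batch))            # ['A']
print(failed_checks(batch[1]))   # ['body_language']
```

Treating an unmarked criterion as a failure is a deliberate choice: it stops a half-reviewed image from slipping through to animation.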

How to Iterate If the First Run Misses

If none of the four variations meet all four criteria, go back to Claude with a specific note on what failed. Use this follow-up structure:

"The results came back with [specific issue — e.g. 'the hands look distorted' / 'the lighting is too flat' / 'the background screen looks like a logo wall not a slide']. Rewrite the prompt to specifically address this issue without changing the core direction. Add a negative prompt line that explicitly avoids [the specific failure]."

Most authority shots reach an acceptable result within two to three iterations. Once you have a clean still that passes all four checks, save it — this is the input for Seedance 2.0 in the next step.
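If you iterate often, the follow-up message to Claude can be generated from whichever of the four checks failed. This sketch assumes you name failures the same way each time. The issue wording comes from the examples in the follow-up structure above, while the mapping, its keys, and the function name are hypothetical.

```python
# Follow-up structure from this guide, with slots for the issue and what to avoid.
FOLLOW_UP = (
    "The results came back with {issue}. Rewrite the prompt to specifically "
    "address this issue without changing the core direction. Add a negative "
    "prompt line that explicitly avoids {avoid}."
)

# Hypothetical defaults: failed check -> (issue description, thing to avoid).
DEFAULT_ISSUES = {
    "body_language": ("the hands look distorted", "distorted hands, extra fingers, and merged limbs"),
    "lighting_consistency": ("the lighting is too flat", "flat frontal lighting"),
    "stage_believability": (
        "the background screen looks like a logo wall, not a slide",
        "logo walls behind the speaker",
    ),
    "face_quality": ("the face looks too polished", "plastic-smooth, overly symmetrical faces"),
}


def follow_up(failed_check: str) -> str:
    issue, avoid = DEFAULT_ISSUES[failed_check]
    return FOLLOW_UP.format(issue=issue, avoid=avoid)


print(follow_up("body_language"))
```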

How Do You Animate the Authority Shot with Seedance 2.0 in HeyOz?

Seedance 2.0 is ByteDance's video generation model. It takes a still image as input and produces a short video clip with natural motion — in this case, realistic speaker gestures, subtle weight shifts, ambient stage movement, and natural breathing motion. The output reads as genuine conference footage, not a static image with motion blur applied.

Seedance 2.0 is available directly inside HeyOz. You bring your GPT Image 2 still straight into the Seedance workflow without exporting or uploading anywhere.

Step-by-Step: Animating in HeyOz

  1. With your GPT Image 2 result open in HeyOz, select the option to animate with Seedance 2.0.
  2. Set the clip length. For a social ad, 4 to 6 seconds is the standard. For a longer brand video or a YouTube pre-roll, use 8 to 10 seconds. Seedance 2.0 handles both well.
  3. Write a motion prompt. This tells Seedance what kind of movement to generate. The motion prompt is separate from the image prompt — it describes motion only, not appearance.
  4. Generate and review the output. Pay attention to face stability, hand motion quality, and whether the background elements move in a believable way.
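The clip-length guidance in step 2 can be written down as a small lookup so every job uses the same ranges. This is a minimal sketch, assuming you record settings locally before entering them in HeyOz. The use-case keys and the function are hypothetical, and these ranges are the ones this guide recommends, not limits enforced by Seedance 2.0.

```python
# Recommended clip lengths in seconds (inclusive), from step 2 above.
CLIP_RANGES = {
    "social_ad": (4, 6),
    "brand_video": (8, 10),
    "youtube_preroll": (8, 10),
}


def check_clip_length(use_case: str, seconds: float) -> str:
    low, high = CLIP_RANGES[use_case]
    if seconds < low:
        return f"{seconds}s is short for {use_case}; try {low} to {high}s"
    if seconds > high:
        return f"{seconds}s is long for {use_case}; try {low} to {high}s"
    return "ok"


print(check_clip_length("social_ad", 5))        # ok
print(check_clip_length("youtube_preroll", 6))
```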

The Seedance Motion Prompt to Use

The motion prompt is the most important variable in Seedance output quality. Vague motion prompts produce generic motion blur or unnatural jerking. Specific motion prompts produce controlled, realistic speaker animation.

Use the following as your base motion prompt and modify the bracketed section for your specific image:

--- SEEDANCE MOTION PROMPT ---

Subtle realistic speaker motion. The subject [describe the starting position from your still — e.g. "has right hand raised at shoulder height mid-gesture"] makes a slow, natural completing motion — [describe the intended movement, e.g. "lowers the hand slightly while shifting weight to the left foot, then brings the hand back to center"]. Facial expression stays composed and engaged, head turns slightly toward stage left at the midpoint. Eyes remain directed at the audience with a confident, relaxed gaze. Natural breathing motion in chest and shoulders. Background: stage lighting flickers imperceptibly, projection screen content stays static. Audience in foreground bokeh has minimal natural micro-movement — slight head shift, no dramatic motion. Camera stays locked, no pan or tilt. Realistic conference video footage quality, no motion artifacts, smooth natural motion throughout.

--- END OF MOTION PROMPT ---
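Because the base motion prompt only changes in its two bracketed slots, it templates cleanly. This is a minimal sketch with Python's `string.Template`, assuming you keep the prompt in a local script; the `$start` and `$movement` placeholder names are hypothetical stand-ins for the bracketed sections, and two dashes in the original prompt are rendered here as colons.

```python
from string import Template

# Base motion prompt from this guide; only the two bracketed slots vary.
MOTION_PROMPT = Template(
    "Subtle realistic speaker motion. The subject $start makes a slow, natural "
    "completing motion: $movement. Facial expression stays composed and engaged, "
    "head turns slightly toward stage left at the midpoint. Eyes remain directed "
    "at the audience with a confident, relaxed gaze. Natural breathing motion in "
    "chest and shoulders. Background: stage lighting flickers imperceptibly, "
    "projection screen content stays static. Audience in foreground bokeh has "
    "minimal natural micro-movement: slight head shift, no dramatic motion. "
    "Camera stays locked, no pan or tilt. Realistic conference video footage "
    "quality, no motion artifacts, smooth natural motion throughout."
)


def motion_prompt(start: str, movement: str) -> str:
    # Describe motion only; appearance already lives in the still image.
    return MOTION_PROMPT.substitute(start=start, movement=movement)


print(motion_prompt(
    start="has right hand raised at shoulder height mid-gesture",
    movement="lowers the hand slightly while shifting weight to the left foot, "
             "then brings the hand back to center",
))
```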

Common Motion Adjustments by Use Case

Adjust the motion prompt based on how you plan to use the clip:

  • For a silent ad with text overlay: Keep motion minimal — just natural breathing and a subtle head nod. You want the subject to feel alive without competing with the text the viewer is reading.
  • For a video ad with voiceover: Add slightly more gesture — a slow hand movement that completes across the clip duration. The motion should feel synchronized with speech rhythm even without actual audio.
  • For a looping background video: Keep motion to a minimum and specify "loop-ready motion, subtle enough that the cut from end frame to start frame is not visible." Seedance 2.0 handles this well when the starting pose is symmetrical.
  • For a Reel or TikTok opening frame: Add a slow walk: "subject takes one deliberate step toward stage front, stops, and gestures." This creates natural forward energy that hooks on scroll without being distracting.
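Those use-case variants can be kept as add-on lines appended to the base motion prompt. A sketch, assuming the same local-script setup; the dictionary keys and function name are hypothetical, the loop and Reel lines are quoted from the list above, and the other two paraphrase it.

```python
# Use-case add-ons from the list above; append one to the base motion prompt.
MOTION_MODIFIERS = {
    "silent_text_overlay": "Keep motion minimal: natural breathing and a subtle head nod only.",
    "voiceover": "Add one slow hand movement that completes across the clip duration, "
                 "paced like natural speech rhythm.",
    "loop": "Loop-ready motion, subtle enough that the cut from end frame to start "
            "frame is not visible.",
    "reel_opener": "Subject takes one deliberate step toward stage front, stops, and gestures.",
}


def with_use_case(base_prompt: str, use_case: str) -> str:
    # Unknown use cases raise KeyError instead of silently sending the base prompt.
    return base_prompt.rstrip() + " " + MOTION_MODIFIERS[use_case]


print(with_use_case("Subtle realistic speaker motion.", "loop"))
```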

What to Do If the Face Drifts

Face drift — where the generated face changes subtly over the clip duration — is the most common quality issue with AI video. If it occurs, add this line to your motion prompt: "Face and facial features must remain completely stable and consistent throughout the entire clip. Zero facial morphing or identity drift. Only expression and gaze direction may shift, and only subtly."

For severe drift cases, shorten the clip to 3 to 4 seconds. Shorter clips give Seedance less time to accumulate inconsistency and the quality is almost always better on shorter durations.
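Both drift fixes are mechanical, so they can be applied the same way on every retry. This is a minimal sketch, assuming you track the motion prompt and clip length in a local script. The function name is hypothetical, and capping at 4 seconds is one reading of the 3-to-4-second advice.

```python
# Stability line quoted from this guide.
STABILITY_LINE = (
    "Face and facial features must remain completely stable and consistent "
    "throughout the entire clip. Zero facial morphing or identity drift. Only "
    "expression and gaze direction may shift, and only subtly."
)


def fix_face_drift(prompt: str, seconds: float, severe: bool = False) -> tuple[str, float]:
    # Append the stability line once, even if this runs on every retry.
    if STABILITY_LINE not in prompt:
        prompt = prompt.rstrip() + " " + STABILITY_LINE
    # For severe drift, cap the clip at 4 seconds (the guide suggests 3 to 4).
    if severe:
        seconds = min(seconds, 4)
    return prompt, seconds


p, s = fix_face_drift("Subtle realistic speaker motion.", 6, severe=True)
p2, _ = fix_face_drift(p, s)
print(s, p2.count("Zero facial morphing"))  # 4 1
```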

Assembling the Finished Ad in HeyOz

Once you have a Seedance clip you are satisfied with, the complete ad assembly happens inside HeyOz without switching tools. Layer your ad copy over the video using HeyOz's editor, add your brand colors and CTA, and export in the format required for your target platform. The authority shot is now a finished, platform-ready ad asset.

The entire pipeline — from Pinterest reference to finished animated ad — takes under an hour on the first pass. On repeat runs for the same brand, with a saved Claude prompt template and a saved Seedance motion prompt, it takes 15 to 20 minutes.

Frequently Asked Questions

Is it legal to use an AI-generated person in an ad?

An AI-generated image of a fictional person contains no real individual's likeness, so there is no one to obtain a model release from. Disclosure requirements for AI-generated content in advertising vary by platform: Meta requires disclosure for AI-generated imagery in political ads, and the FTC's general deception guidelines apply to all commercial claims. For non-political brand ads using a fictional AI-generated spokesperson, the practice is currently within platform terms of service. Always verify current platform policies before running.

Does GPT Image 2 always produce clean hands and faces?

Not always on the first generation, but significantly more reliably than earlier models. The most effective way to improve hand quality is to specify the hand position and gesture explicitly in the prompt rather than leaving it implicit. Instead of "mid-gesture," write "right hand at shoulder height, palm open facing the audience, fingers naturally extended, no objects held." Explicit hand direction reduces failure rate substantially.

What if I want to use my own face, not a fictional one?

If you want the authority shot with your own face, the workflow changes slightly. Instead of generating a fictional subject from a text prompt, you use a reference photo of yourself as an input image to HeyOz and ask GPT Image 2 to composite you onto a stage environment matching your prompt description. This uses the image editing mode rather than text-to-image. The quality of the result depends heavily on the quality and lighting of your reference photo — a clean, well-lit headshot with neutral background gives the best transfer result.

How do I make the expert look like they are from a specific niche?

Niche signaling in the image comes from three elements: attire, props, and background slide content. A doctor reads as a doctor because of the white coat or professional medical attire, the specific slide content behind them (anatomical or clinical), and the conference branding visible in the environment. A finance professional reads as one because of a suit, a Bloomberg-style data visualization slide, and a neutral premium conference aesthetic. Specify all three in the Claude prompt and include them in your Pinterest reference search.

How many iterations does it typically take to get a usable still?

Most users get a usable result within the first batch of four generations if the Claude prompt is specific. The most common reason for needing more iterations is an underspecified prompt — particularly vague gesture descriptions, unspecified light direction, or missing negative prompts. If you use the Claude template in this guide and fill it out completely, you should expect a clean result in one or two generation runs.

Can I run this pipeline for multiple niches or brands?

Yes — and this is where the pipeline scales well for agencies. Each niche requires its own Pinterest reference pass and its own Claude prompt run. Once you have a working prompt per niche, it becomes a template. You run it through HeyOz each time with minor adjustments for the specific campaign, without rebuilding the prompt from scratch. A three-niche agency can maintain three saved prompt templates and produce new authority-shot creatives for each client in under 30 minutes per run.
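Saved per-niche templates are just structured data, which makes the "minor adjustments per campaign" step easy to keep consistent. This is a minimal sketch, assuming each template stores the same fields you fill into the Claude prompt. The niche keys, field values, and function name are hypothetical examples drawn loosely from this guide.

```python
# Hypothetical saved directions, one per niche, reusing examples from this guide.
NICHE_TEMPLATES = {
    "medical": {
        "subject": "a woman in her late 40s, South Asian, wearing a dark navy blazer",
        "niche": "medical professional speaking at a health summit",
        "mood": "cinematic, authoritative, warm, high-contrast",
    },
    "finance": {
        "subject": "a man in his mid-50s, silver-haired, wearing a tailored charcoal suit",
        "niche": "financial advisor at a B2B leadership conference",
        "mood": "cool, precise, high-contrast",
    },
    "coaching": {
        "subject": "a woman in her early 40s wearing a structured camel-colored blazer",
        "niche": "executive coach on a TED-style round stage",
        "mood": "cinematic, warm, intimate",
    },
}


def campaign_direction(niche: str, **overrides: str) -> dict[str, str]:
    base = NICHE_TEMPLATES[niche]
    # Reject typos like "subjet" so a campaign tweak never silently does nothing.
    unknown = set(overrides) - set(base)
    if unknown:
        raise KeyError(f"Unknown fields: {sorted(unknown)}")
    # Merge into a new dict so the saved template is never modified.
    return {**base, **overrides}


spring = campaign_direction("finance", mood="warm, optimistic, high-contrast")
print(spring["mood"], "|", NICHE_TEMPLATES["finance"]["mood"])
```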

About the author

Ahad Shams

Ahad Shams is the Founder of HeyOz, an all-in-one ads and content platform built for founders and small teams. He has worked across consumer goods and technology, with experience spanning Fortune 100 companies such as Reckitt Benckiser and Apple. Ahad is a third-time founder; his previous ventures include a WebXR game engine and Moemate, a consumer AI startup that scaled to over 6 million users. HeyOz was born from firsthand experience scaling consumer products and the need for a unified, execution-focused marketing platform.