# How Aytada generates your marketing assets

> Walk through Aytada's AI production pipeline — from product input and awareness stage selection to a rendered, downloadable video ad, banner, or jingle.

When you create an asset in Aytada, your product description passes through a sequence of specialised AI models. Each stage has a distinct role: strategy sets the creative direction, scripting writes the copy, scene breakdown translates copy into visual direction, and rendering turns those directions into actual video clips, images, or audio. This page explains what happens at each stage and which models are involved.

<Note>
  Strategy, ideation, and scene generation are free. Aytada deducts credits for script generation, for each rendered asset (video clip, voiceover, banner, flyer, or jingle), and for final assembly.
</Note>

## The production pipeline

<Steps>
  <Step title="Strategy and ideation (free)">
    Every asset begins with strategy. You provide:

    * **Product name and description** — what you are advertising
    * **Business type** — Physical Product, SaaS, Service, E-commerce Brand, and seven others
    * **Industry context** — Beauty & Skincare, Fitness, Tech/SaaS, and five others
    * **Creative style** — Direct Response, UGC Style, Cinematic, Storytelling, and six others
    * **Target avatar** — a specific person defined by their current struggle and desired identity
    * **Persuasion trigger** — one of Cialdini's six principles (Reciprocity, Social Proof, Authority, etc.)
    * **Awareness stage** — where your audience sits on the [5 Stages of Customer Awareness](/concepts/awareness-stages)

    Aytada feeds all of this into DeepSeek V4 Flash to generate three distinct ad concepts, each with a hook angle and narrative approach tailored to your awareness stage. This costs **0 credits**.
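
The inputs listed above can be pictured as one structured brief. The Python sketch below is illustrative only: the `CampaignBrief` class, its field names, the sample values, and the stage labels (taken from the classic five-stage awareness model) are assumptions, not Aytada's actual schema.

```python
from dataclasses import dataclass

# Assumed labels for the five awareness stages; Aytada's exact wording may differ.
AWARENESS_STAGES = (
    "Unaware",
    "Problem Aware",
    "Solution Aware",
    "Product Aware",
    "Most Aware",
)


@dataclass(frozen=True)
class CampaignBrief:
    """Hypothetical container for the strategy inputs described above."""

    product_name: str
    product_description: str
    business_type: str
    industry: str
    creative_style: str
    target_avatar: str
    persuasion_trigger: str
    awareness_stage: str

    def __post_init__(self) -> None:
        # Reject a stage outside the five-stage model before any generation runs.
        if self.awareness_stage not in AWARENESS_STAGES:
            raise ValueError(f"Unknown awareness stage: {self.awareness_stage!r}")


# Made-up sample values, for illustration only.
brief = CampaignBrief(
    product_name="Example Serum",
    product_description="A vitamin C serum for dull skin",
    business_type="Physical Product",
    industry="Beauty & Skincare",
    creative_style="UGC Style",
    target_avatar="Tired new parent who wants to look rested",
    persuasion_trigger="Social Proof",
    awareness_stage="Problem Aware",
)
```

Validating the awareness stage up front reflects its role as the input that most changes the resulting concepts.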

    <Tip>
      The awareness stage is the single most important input. An audience that has never heard of your problem needs a completely different message than an audience that is already comparing you to a competitor. See [Awareness stages](/concepts/awareness-stages) for guidance on which to choose.
    </Tip>
  </Step>

  <Step title="Script generation">
    After you select a concept, Aytada generates a full ad script. Scripts are structured in four sections:

    1. **Hook** — the opening line or visual that stops the scroll
    2. **Problem** — agitation of the pain point your audience recognises
    3. **Solution** — introduction of your product as the answer
    4. **CTA** — a clear call to action matched to the awareness stage

    You choose your script tier:

    | Tier     | Model           | Cost      |
    | -------- | --------------- | --------- |
    | Standard | GPT-5.5         | 2 credits |
    | Premium  | Claude Opus 4.7 | 5 credits |

    The model receives your business type, industry pain points, target avatar description, persuasion trigger rules, and awareness stage narrative strategy as context. This means a Direct Response script for a fitness supplement at the Most Aware stage will read very differently from a Storytelling script for a SaaS product at the Unaware stage — because the prompts are fundamentally different.
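
To make the tier choice and the role of context concrete, here is a minimal Python sketch. The tier data mirrors the table above; the `build_script_request` function, its parameters, and the prompt layout are hypothetical and only illustrate how different inputs lead to different prompts.

```python
# Tier data copied from the table above.
SCRIPT_TIERS = {
    "standard": {"model": "GPT-5.5", "credits": 2},
    "premium": {"model": "Claude Opus 4.7", "credits": 5},
}

# The four script sections described in this step.
SCRIPT_SECTIONS = ("Hook", "Problem", "Solution", "CTA")


def build_script_request(tier: str, context: dict[str, str]) -> dict:
    """Assemble a hypothetical script request from the campaign context."""
    if tier not in SCRIPT_TIERS:
        raise ValueError(f"Unknown script tier: {tier!r}")
    # Each context entry becomes one line of the prompt.
    context_lines = [f"{key}: {value}" for key, value in context.items()]
    prompt = "\n".join(
        ["Write an ad script with these sections: " + ", ".join(SCRIPT_SECTIONS)]
        + context_lines
    )
    return {**SCRIPT_TIERS[tier], "prompt": prompt}


request = build_script_request(
    "premium",
    {"creative_style": "Storytelling", "awareness_stage": "Unaware"},
)
```

Changing only `creative_style` or `awareness_stage` changes the prompt text, which is the mechanism the paragraph above describes.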
  </Step>

  <Step title="Scene breakdown (free)">
    The script is automatically broken down into individual visual scenes. The number of scenes depends on the ad length you chose: 3, 5, or 7. Each scene includes:

    * **Narration text** — the voiceover line for that scene
    * **Visual direction** — subject, action, and setting
    * **Camera notes** — angle, movement, and lens cues
    * **Lighting and atmosphere** — time of day, mood, colour temperature

    Scene generation uses DeepSeek V4 Flash and costs **0 credits**. The visual direction adapts to your business type: a Physical Product campaign gets product-centric shots, a Service campaign gets transformation-centric shots, and a SaaS campaign focuses on outcome-centric visuals.
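
A scene, as described above, can be modelled as a small record. This is a sketch under assumptions: the `Scene` class, its field names, and `validate_breakdown` are hypothetical, and the mapping covers only the three business types named in the text.

```python
from dataclasses import dataclass

ALLOWED_SCENE_COUNTS = (3, 5, 7)  # ad lengths named in the text

# Visual approach per business type, as stated above.
# Other business types are not documented on this page.
VISUAL_APPROACH = {
    "Physical Product": "product-centric",
    "Service": "transformation-centric",
    "SaaS": "outcome-centric",
}


@dataclass
class Scene:
    """Hypothetical shape of one generated scene."""

    narration: str         # voiceover line for the scene
    visual_direction: str  # subject, action, and setting
    camera_notes: str      # angle, movement, and lens cues
    lighting: str          # time of day, mood, colour temperature


def validate_breakdown(scenes: list[Scene]) -> None:
    """Raise if the breakdown does not match a supported ad length."""
    if len(scenes) not in ALLOWED_SCENE_COUNTS:
        raise ValueError(f"Expected 3, 5, or 7 scenes, got {len(scenes)}")
```

Because every field is plain text, editing a scene before rendering amounts to replacing one of these strings.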

    <Note>
      You can edit scene descriptions before rendering. If the AI chose an exterior shot but your product works better in a studio, update the description and render with your revised direction.
    </Note>
  </Step>

  <Step title="Asset rendering">
    Every render in this step consumes credits. Aytada uses a different model pipeline for each asset type:

    ### Video ads

    Video scenes are rendered in parallel using fal.ai's multi-model architecture. The model is selected automatically based on the quality tier you chose:

    | Tier     | Primary model | Fallback model    | Resolution |
    | -------- | ------------- | ----------------- | ---------- |
    | Standard | Wan 2.7       | Kling V3 Standard | 720p       |
    | Pro      | Kling V3 Pro  | Seedance 2.0      | 1080p      |

    If you uploaded a product image, the hook scene is generated using Image-to-Video mode, which uses your photo as the starting frame. All other scenes use Text-to-Video mode.

    Scenes generate concurrently, up to three at once. A 3-scene ad therefore renders in roughly the time of a single scene generation, while 5- and 7-scene ads take about two and three rounds respectively, still well short of rendering every scene in sequence.
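
The concurrency limit and the mode selection can be sketched with `asyncio`. This is not Aytada's implementation: `render_all`, `render_mode`, and the `render_scene` callback are hypothetical names, and the sketch assumes the hook is the first scene (index 0).

```python
import asyncio
import math

MAX_CONCURRENT = 3  # from the text: up to three scenes render at once


def render_mode(scene_index: int, has_product_image: bool) -> str:
    """Pick the mode: the hook uses image-to-video when a product image exists."""
    if scene_index == 0 and has_product_image:
        return "image-to-video"
    return "text-to-video"


def render_rounds(scene_count: int) -> int:
    """Approximate number of sequential rounds needed at the concurrency limit."""
    return math.ceil(scene_count / MAX_CONCURRENT)


async def render_all(scene_count: int, has_product_image: bool, render_scene):
    """Render every scene, never running more than MAX_CONCURRENT at once."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)

    async def bounded(index: int):
        async with semaphore:
            return await render_scene(index, render_mode(index, has_product_image))

    # gather preserves scene order in the returned list.
    return await asyncio.gather(*(bounded(i) for i in range(scene_count)))
```

A semaphore keeps results in scene order while capping parallel work, so a 7-scene ad needs roughly `render_rounds(7)` rounds rather than seven sequential renders.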

    ### Ad banners and social flyers

    Static assets use a typography-first pipeline because readable text inside an image is notoriously difficult for most AI models:

    1. **GPT Image 2** generates the base composition with headlines, body copy, and layout
    2. **Bria Product Shot** composites your product image into the generated scene
    3. **Topaz Upscaler** refines the output for high-DPI displays

    Banners cost **4 credits**. Flyers cost **5 credits**. Both are output at print-ready resolution.

    ### Brand jingles

    Jingle generation maps your ad tone and brand personality to a music description, then routes to the appropriate model tier:

    | Tier     | Model                  | Best for                                    |
    | -------- | ---------------------- | ------------------------------------------- |
    | Standard | ACE-Step or CassetteAI | Background loops, fast instrumentals        |
    | Premium  | MiniMax Music 2.0      | Lyric-driven jingles with structural tags   |
    | Elite    | ElevenLabs Music       | Section-level composition and lyric control |

    Jingles cost **5 credits** and produce a 15–60 second audio asset.

    <Warning>
      Your credit balance is checked before each render step begins. If your balance drops below the required amount mid-campaign, the current step will return an insufficient credits error and no partial credits will be deducted. Top up your balance and resume from the same step.
    </Warning>
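
The check-then-deduct behaviour described in the warning can be sketched as follows. The `CreditLedger` class and `InsufficientCreditsError` are hypothetical names used for illustration; the 5-credit costs in the usage lines are the voiceover and assembly costs quoted on this page.

```python
class InsufficientCreditsError(Exception):
    """Raised when a step costs more than the current balance."""


class CreditLedger:
    """Hypothetical ledger that checks the balance before every charge."""

    def __init__(self, balance: int) -> None:
        self.balance = balance

    def charge(self, step: str, cost: int) -> None:
        # Check first, deduct second: a failed check leaves the balance untouched.
        if self.balance < cost:
            raise InsufficientCreditsError(
                f"{step} needs {cost} credits, balance is {self.balance}"
            )
        self.balance -= cost

    def top_up(self, amount: int) -> None:
        self.balance += amount


ledger = CreditLedger(balance=7)
ledger.charge("voiceover", 5)      # succeeds, 2 credits remain
try:
    ledger.charge("assembly", 5)   # fails, nothing is deducted
except InsufficientCreditsError:
    ledger.top_up(5)
    ledger.charge("assembly", 5)   # resume from the same step
```

Because the failed charge deducts nothing, resuming is a matter of topping up and repeating the same step.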
  </Step>

  <Step title="Voiceover generation">
    After video scenes are rendered, Aytada generates a voiceover from your script using **ElevenLabs v3** via fal.ai (5 credits). The voice is chosen based on your persuasion trigger:

    * **Authority** → confident, clear male voice
    * **Liking** → casual, empathetic female voice
    * **Scarcity** → urgent, fast-paced delivery
    * **Reciprocity** → warm, instructional tone

    The voiceover uses emotion tags in the script (for example, `[excited]` on the hook line) to produce more expressive delivery rather than flat narration.

    If ElevenLabs is unavailable or times out, Aytada automatically falls back to **Gemini 3.1 Flash TTS** at no extra cost and with no interruption to your workflow.
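
As an illustration of the voice mapping and emotion tags, the Python sketch below uses hypothetical names (`VOICE_BY_TRIGGER`, `split_emotion_tags`) and assumes tags are lowercase words in square brackets, as in the `[excited]` example. Only the four triggers listed above are mapped, since this page does not state the voices for the others.

```python
import re

# The four mappings given in this step; other triggers are not documented here.
VOICE_BY_TRIGGER = {
    "Authority": "confident, clear male voice",
    "Liking": "casual, empathetic female voice",
    "Scarcity": "urgent, fast-paced delivery",
    "Reciprocity": "warm, instructional tone",
}

# Assumed tag syntax: a lowercase word in square brackets, e.g. [excited].
EMOTION_TAG = re.compile(r"\[([a-z]+)\]")


def split_emotion_tags(line: str) -> tuple[list[str], str]:
    """Return the emotion tags in a script line and the line without them."""
    tags = EMOTION_TAG.findall(line)
    # Replace tags with a space, then normalise whitespace.
    spoken = " ".join(EMOTION_TAG.sub(" ", line).split())
    return tags, spoken


tags, spoken = split_emotion_tags("[excited] Meet your new morning routine.")
```

Separating tags from spoken text shows how one script line can carry both the words and the delivery cue.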

    <Tip>
      If you have a branded spokesperson, you can upload a 5–30 second audio sample in your project settings to enable voice cloning. The generated voiceover will mimic the tone and cadence of your sample.
    </Tip>
  </Step>

  <Step title="Final assembly">
    The last step stitches your rendered scene clips and voiceover into a single MP4 file. Aytada submits the assets to **Shotstack Edit API** (5 credits), which:

    1. Sequences scene clips in order on the video track
    2. Overlays the voiceover on the audio track, with the scene clips' own audio lowered to 30% volume
    3. Mixes in background music if you generated a background track
    4. Renders and returns a final downloadable MP4

    Assembly typically completes within two minutes. You receive an email when your video is ready. The final file is accessible from your [Campaign hub](/studio/campaign-hub) and [Project library](/studio/project-library).
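
The assembly steps can be pictured as a single edit document. The sketch below builds one as a Python dictionary. It is hedged throughout: the field names follow a general understanding of Shotstack's timeline JSON and should be checked against the Shotstack reference, `build_edit` and its parameters are hypothetical, the fixed clip length is an assumption, and reading "30% video volume" as a clip volume of 0.3 is an interpretation.

```python
import json


def build_edit(scene_urls, voiceover_url, scene_seconds=5.0, music_url=None,
               resolution="hd"):
    """Build a hypothetical edit payload: clips in order, voiceover on top."""
    video_clips = []
    for index, url in enumerate(scene_urls):
        video_clips.append({
            # Clip audio lowered to 30% so the voiceover stays audible (assumed reading).
            "asset": {"type": "video", "src": url, "volume": 0.3},
            "start": index * scene_seconds,
            "length": scene_seconds,
        })

    total_seconds = len(scene_urls) * scene_seconds
    voice_clip = {
        "asset": {"type": "audio", "src": voiceover_url},
        "start": 0,
        "length": total_seconds,
    }

    timeline = {"tracks": [{"clips": [voice_clip]}, {"clips": video_clips}]}
    if music_url:
        # Background music is optional, matching step 3 above.
        timeline["soundtrack"] = {"src": music_url}

    return {"timeline": timeline, "output": {"format": "mp4", "resolution": resolution}}


# Placeholder URLs for illustration only.
edit = build_edit(
    ["https://example.com/scene1.mp4", "https://example.com/scene2.mp4"],
    "https://example.com/voiceover.mp3",
)
payload = json.dumps(edit)
```

Each clip starts where the previous one ends, which is what keeps the scenes in script order on the video track.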
  </Step>
</Steps>

## Multi-model architecture

Aytada pairs a primary model with a fallback model for the rendering steps in the table below. The text generation steps (ideas, scenes, and scripts) list no fallback. If the primary model is slow, returns an error, or times out, the request automatically retries against the fallback model. From your perspective, generation either succeeds or fails with a clear error message, so a mid-pipeline failure never silently produces a broken asset.

| Step             | Primary           | Fallback             |
| ---------------- | ----------------- | -------------------- |
| Ideas and scenes | DeepSeek V4 Flash | —                    |
| Standard script  | GPT-5.5           | —                    |
| Premium script   | Claude Opus 4.7   | —                    |
| Video (Standard) | Wan 2.7           | Kling V3 Standard    |
| Video (Pro)      | Kling V3 Pro      | Seedance 2.0         |
| Voiceover        | ElevenLabs v3     | Gemini 3.1 Flash TTS |
| Banners/Flyers   | GPT Image 2       | Bria pipeline        |
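
The primary-then-fallback behaviour can be sketched as a small wrapper. This is an illustration under assumptions, not Aytada's code: `generate_with_fallback`, `GenerationError`, and the callable-based interface are hypothetical, and a timeout is assumed to surface as an exception from the model call.

```python
class GenerationError(RuntimeError):
    """Raised when no model could produce the asset."""


def generate_with_fallback(request, primary, fallback=None):
    """Try the primary model, then the fallback, then fail with a clear error."""
    try:
        return primary(request)
    except Exception as primary_error:
        if fallback is None:
            # Steps without a fallback fail immediately with a clear error.
            raise GenerationError(
                f"Primary model failed: {primary_error}"
            ) from primary_error

    try:
        return fallback(request)
    except Exception as fallback_error:
        raise GenerationError(
            f"Primary and fallback models failed: {fallback_error}"
        ) from fallback_error
```

The wrapper returns a finished result or raises `GenerationError`, mirroring the succeed-or-fail behaviour described above.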

## Creative Intelligence

Every generation step is shaped by your campaign's creative context. Aytada injects the following into each AI prompt:

* **Business type** determines the visual approach — product-centric for physical goods, outcome-centric for SaaS, transformation-centric for services
* **Industry context** provides relevant pain points, aspirations, and hook angles specific to your market
* **Creative style** sets the narrative approach — a UGC Style script reads like a casual testimonial, while a Cinematic script uses sparse copy and visual spectacle
* **Awareness stage** governs what the script is allowed to say — an Unaware audience never hears the product name in the first scene; a Most Aware audience gets urgency and a direct offer
* **Persuasion trigger** shapes voice tone, narrative arc, and the specific emotional lever the script pulls

This context is consistent across every asset in the campaign. Your video ad, banner, flyer, and jingle will all speak the same language to the same audience.
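
One of the rules above, that an Unaware audience never hears the product name in the first scene, is simple enough to express as a check. The function below is a hypothetical sketch that uses a case-insensitive substring match; how Aytada enforces the rule is not documented here.

```python
def hook_respects_awareness(stage: str, first_scene_narration: str,
                            product_name: str) -> bool:
    """Return False when an Unaware-stage first scene names the product."""
    if stage != "Unaware":
        # Only the Unaware rule is modelled in this sketch.
        return True
    return product_name.lower() not in first_scene_narration.lower()


ok = hook_respects_awareness(
    "Unaware", "Ever wake up more tired than when you went to bed?", "SleepWell"
)
```

A check like this could run on the first scene of any asset in a campaign, since every asset shares the same creative context.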

<Note>
  Want to target different audience segments from the same product? Create separate campaigns — one per awareness stage or audience segment. Each campaign maintains its own strategy, scripts, and assets independently.
</Note>

## Next steps

<CardGroup cols={2}>
  <Card title="Awareness stages" icon="bullseye" href="/concepts/awareness-stages">
    A full explanation of the 5 Stages of Customer Awareness and how to choose the right one.
  </Card>

  <Card title="Video ads guide" icon="film" href="/guides/video-ads">
    Quality tiers, ad lengths, formats, and tips for best results.
  </Card>

  <Card title="Credit costs" icon="coins" href="/billing/credit-costs">
    The full credit cost breakdown for every step in every pipeline.
  </Card>

  <Card title="Campaign hub" icon="table-columns" href="/studio/campaign-hub">
    How to manage and view all your campaign assets from one place.
  </Card>
</CardGroup>