
When you create an asset in Aytada, your product description passes through a sequence of specialised AI models. Each stage has a distinct role: strategy sets the creative direction, scripting writes the copy, scene breakdown translates copy into visual direction, and rendering turns those directions into actual video clips, images, or audio. This page explains what happens at each stage and which models are involved.
Strategy, ideation, and scene generation are free. Aytada only deducts credits when a model renders a final asset — a video clip, voiceover, banner, or jingle.

The production pipeline

1. Strategy and ideation (free)

Every asset begins with strategy. You provide:
  • Product name and description — what you are advertising
  • Business type — Physical Product, SaaS, Service, E-commerce Brand, and seven others
  • Industry context — Beauty & Skincare, Fitness, Tech/SaaS, and five others
  • Creative style — Direct Response, UGC Style, Cinematic, Storytelling, and six others
  • Target avatar — a specific person defined by their current struggle and desired identity
  • Persuasion trigger — one of Cialdini’s six principles (Reciprocity, Social Proof, Authority, etc.)
  • Awareness stage — where your audience sits on the 5 Stages of Customer Awareness
Aytada feeds all of this into DeepSeek V4 Flash to generate three distinct ad concepts, each with a hook angle and narrative approach tailored to your awareness stage. This costs 0 credits.
The awareness stage is the single most important input. An audience that has never heard of your problem needs a completely different message than an audience that is already comparing you to a competitor. See Awareness stages for guidance on which to choose.
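The strategy inputs above can be pictured as one context object handed to the ideation model. A minimal sketch, assuming hypothetical field names and product details (this is not Aytada's actual schema or API):

```python
# Illustrative only: field names and values are assumptions, not Aytada's real schema.
strategy_context = {
    "product_name": "GlowSerum",
    "description": "A vitamin C serum that fades dark spots in 30 days",
    "business_type": "Physical Product",
    "industry": "Beauty & Skincare",
    "creative_style": "Direct Response",
    "target_avatar": {
        "current_struggle": "uneven skin tone despite trying many products",
        "desired_identity": "confident, makeup-free skin",
    },
    "persuasion_trigger": "Social Proof",
    "awareness_stage": "Problem Aware",
}

def ideation_prompt(ctx: dict) -> str:
    """Flatten the campaign context into a single ideation prompt string."""
    return (
        f"Generate 3 ad concepts for {ctx['product_name']} "
        f"({ctx['business_type']}, {ctx['industry']}), "
        f"style: {ctx['creative_style']}, "
        f"trigger: {ctx['persuasion_trigger']}, "
        f"audience stage: {ctx['awareness_stage']}."
    )
```

Every field feeds the same prompt, which is why changing the awareness stage alone produces noticeably different concepts.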
2. Script generation

After you select a concept, Aytada generates a full ad script. Scripts are structured in four sections:
  1. Hook — the opening line or visual that stops the scroll
  2. Problem — agitation of the pain point your audience recognises
  3. Solution — introduction of your product as the answer
  4. CTA — a clear call to action matched to the awareness stage
You choose your script tier:
Tier       Model             Cost
Standard   GPT-5.5           2 credits
Premium    Claude Opus 4.7   5 credits
The model receives your business type, industry pain points, target avatar description, persuasion trigger rules, and awareness stage narrative strategy as context. This means a Direct Response script for a fitness supplement at the Most Aware stage will read very differently from a Storytelling script for a SaaS product at the Unaware stage — because the prompts are fundamentally different.
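The tier choice resolves to a model and a credit cost before any text is generated. A small sketch of that lookup, using the values from the table above (the function and structure are illustrative, not an Aytada SDK):

```python
# Hypothetical tier table mirroring the docs; not a real Aytada API.
SCRIPT_TIERS = {
    "standard": {"model": "GPT-5.5", "credits": 2},
    "premium": {"model": "Claude Opus 4.7", "credits": 5},
}

# The four script sections every tier produces.
SCRIPT_SECTIONS = ("Hook", "Problem", "Solution", "CTA")

def script_request(tier: str) -> dict:
    """Resolve a tier name to the model and credit cost it implies."""
    if tier not in SCRIPT_TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    return {"sections": SCRIPT_SECTIONS, **SCRIPT_TIERS[tier]}
```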
3. Scene breakdown (free)

The script is automatically broken down into individual visual scenes — one per act, scaled to the ad length you chose (3, 5, or 7 scenes). Each scene includes:
  • Narration text — the voiceover line for that scene
  • Visual direction — subject, action, and setting
  • Camera notes — angle, movement, and lens cues
  • Lighting and atmosphere — time of day, mood, colour temperature
Scene generation uses DeepSeek V4 Flash and costs 0 credits. The visual direction adapts to your business type: a Physical Product campaign gets product-centric shots, a Service campaign gets transformation-centric shots, and a SaaS campaign focuses on outcome-centric visuals.
You can edit scene descriptions before rendering. If the AI chose an exterior shot but your product works better in a studio, update the description and render with your revised direction.
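The four per-scene attributes can be modelled as a simple record. A sketch under the assumption that a scene carries exactly the fields listed above (the class is an illustration, not part of any Aytada SDK):

```python
from dataclasses import dataclass

# Field names follow the four attributes listed above; illustrative only.
@dataclass
class Scene:
    narration: str          # voiceover line for this scene
    visual_direction: str   # subject, action, setting
    camera_notes: str       # angle, movement, lens cues
    lighting: str           # time of day, mood, colour temperature

def breakdown(script_acts: list[str], scene_count: int) -> list[Scene]:
    """One scene per act, scaled to the chosen ad length (3, 5, or 7)."""
    assert scene_count in (3, 5, 7), "supported scene counts"
    return [
        Scene(narration=act, visual_direction="", camera_notes="", lighting="")
        for act in script_acts[:scene_count]
    ]
```

Editing a scene before rendering amounts to rewriting its `visual_direction` (and related fields) while keeping the narration intact.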
4. Asset rendering

This is where credits are consumed. Aytada uses different model pipelines depending on the asset type:

Video ads

Video scenes are rendered in parallel using fal.ai’s multi-model architecture. The model is selected automatically based on the quality tier you chose:
Tier       Primary model   Fallback model      Resolution
Standard   Wan 2.7         Kling V3 Standard   720p
Pro        Kling V3 Pro    Seedance 2.0        1080p
If you uploaded a product image, the hook scene is generated using Image-to-Video mode, which uses your photo as the starting frame. All other scenes use Text-to-Video mode.

Scenes generate concurrently (up to three at once), so total render time is roughly the duration of a single scene generation rather than the sum of all scenes.
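The "up to three at once" behaviour is a standard bounded-concurrency pattern. A sketch in plain asyncio, with a sleep standing in for the actual fal.ai render call:

```python
import asyncio

async def render_scene(i: int, sem: asyncio.Semaphore) -> str:
    """Stand-in for one scene render; the real call goes out to fal.ai."""
    async with sem:
        await asyncio.sleep(0.01)  # placeholder for render latency
        return f"scene-{i}.mp4"

async def render_all(n_scenes: int) -> list[str]:
    sem = asyncio.Semaphore(3)  # "up to three at once", per the docs
    return await asyncio.gather(*(render_scene(i, sem) for i in range(n_scenes)))

clips = asyncio.run(render_all(5))
```

With three slots, a five-scene ad finishes in roughly two render "waves" rather than five sequential renders.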

Ad banners and social flyers

Static assets use a typography-first pipeline because readable text inside an image is notoriously difficult for most AI models:
  1. GPT Image 2 generates the base composition with headlines, body copy, and layout
  2. Bria Product Shot composites your product image into the generated scene
  3. Topaz Upscaler refines the output for high-DPI displays
Banners cost 4 credits. Flyers cost 5 credits. Both are output at print-ready resolution.
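The three stages above run strictly in sequence, each consuming the previous stage's output. A sketch of that chaining with placeholder functions (these are not real SDK calls, just the order of operations):

```python
# Stage names come from the docs; the function bodies are placeholders.
def gpt_image_compose(brief: str) -> dict:
    """Stage 1: base composition with headlines, body copy, and layout."""
    return {"layers": ["headline", "body copy", "layout"], "brief": brief}

def bria_product_shot(canvas: dict, product_image: str) -> dict:
    """Stage 2: composite the product image into the generated scene."""
    return {**canvas, "product": product_image}

def topaz_upscale(canvas: dict) -> dict:
    """Stage 3: refine the output for high-DPI displays."""
    return {**canvas, "dpi": "high"}

banner = topaz_upscale(bria_product_shot(gpt_image_compose("summer sale"), "shoe.png"))
```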

Brand jingles

Jingle generation maps your ad tone and brand personality to a music description, then routes to the appropriate model tier:
Tier       Model                    Best for
Standard   ACE-Step or CassetteAI   Background loops, fast instrumentals
Premium    MiniMax Music 2.0        Lyric-driven jingles with structural tags
Elite      ElevenLabs Music         Section-level composition and lyric control
Jingles cost 5 credits and produce a 15–60 second audio asset.
Your credit balance is checked before each render step begins. If your balance drops below the required amount mid-campaign, the current step will return an insufficient credits error and no partial credits will be deducted. Top up your balance and resume from the same step.
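The all-or-nothing deduction described above can be sketched as a single guarded charge, where an insufficient balance raises before anything is spent (exception and function names are illustrative):

```python
class InsufficientCredits(Exception):
    """Raised when a render step costs more than the remaining balance."""

def charge(balance: int, cost: int) -> int:
    """All-or-nothing deduction: either the full cost is charged or
    nothing is, matching the no-partial-deduction behaviour above."""
    if balance < cost:
        raise InsufficientCredits(f"need {cost}, have {balance}")
    return balance - cost
```

Because the check happens before the step runs, resuming after a top-up simply retries the same charge.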
5. Voiceover generation

After video scenes are rendered, Aytada generates a voiceover from your script using ElevenLabs v3 via fal.ai (5 credits). The voice is chosen based on your persuasion trigger:
  • Authority → confident, clear male voice
  • Liking → casual, empathetic female voice
  • Scarcity → urgent, fast-paced delivery
  • Reciprocity → warm, instructional tone
The voiceover uses emotion tags in the script (for example, [excited] on the hook line) to produce more expressive delivery rather than flat narration.

If ElevenLabs is unavailable or times out, Aytada automatically falls back to Gemini 3.1 Flash TTS at no extra cost and with no interruption to your workflow.
If you have a branded spokesperson, you can upload a 5–30 second audio sample in your project settings to enable voice cloning. The generated voiceover will mimic the tone and cadence of your sample.
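The trigger-to-voice mapping above is a straightforward lookup. A sketch (the voice labels are the descriptions from this page, not ElevenLabs voice IDs, and the default is an assumption):

```python
# Mapping reproduced from the list above; labels are descriptive only.
VOICE_BY_TRIGGER = {
    "Authority": "confident, clear male voice",
    "Liking": "casual, empathetic female voice",
    "Scarcity": "urgent, fast-paced delivery",
    "Reciprocity": "warm, instructional tone",
}

def pick_voice(trigger: str, default: str = "neutral narrator") -> str:
    """Select a voice profile for a persuasion trigger, with a fallback default."""
    return VOICE_BY_TRIGGER.get(trigger, default)
```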
6. Final assembly

The last step stitches your rendered scene clips and voiceover into a single MP4 file. Aytada submits the assets to Shotstack Edit API (5 credits), which:
  1. Sequences scene clips in order on the video track
  2. Overlays the voiceover on the audio track at 30% video volume
  3. Mixes in background music if you generated a background track
  4. Renders and returns a final downloadable MP4
Assembly typically completes within two minutes. You receive an email when your video is ready. The final file is accessible from your Campaign hub and Project library.
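The four assembly steps map naturally onto a track-based edit payload. The sketch below approximates the shape of a Shotstack-style edit (video track of sequenced clips, audio track overlaid at reduced video volume); it is a simplification, so consult the Shotstack Edit API documentation for the exact schema:

```python
def build_edit(clips: list[str], voiceover: str, clip_len: float) -> dict:
    """Approximate a track-based edit: clips in sequence on one track,
    the voiceover overlaid on another, video volume ducked to 30%."""
    video_clips = [
        {"asset": {"type": "video", "src": src, "volume": 0.3},
         "start": i * clip_len, "length": clip_len}
        for i, src in enumerate(clips)
    ]
    audio_clip = {"asset": {"type": "audio", "src": voiceover},
                  "start": 0, "length": clip_len * len(clips)}
    return {
        "timeline": {"tracks": [{"clips": video_clips}, {"clips": [audio_clip]}]},
        "output": {"format": "mp4"},
    }
```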

Multi-model architecture

Aytada uses primary and fallback models for every generation step. If the primary model is slow, returns an error, or times out, the request automatically retries against the fallback model. From your perspective, generation either succeeds or fails with a clear error message — you will never see a mid-pipeline failure that silently produces a broken asset.
Step               Primary             Fallback
Ideas and scenes   DeepSeek V4 Flash
Standard script    GPT-5.5
Premium script     Claude Opus 4.7
Video (Standard)   Wan 2.7             Kling V3 Standard
Video (Pro)        Kling V3 Pro        Seedance 2.0
Voiceover          ElevenLabs v3       Gemini 3.1 Flash TTS
Banners/Flyers     GPT Image 2         Bria pipeline
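The retry behaviour described above reduces to a simple wrapper: attempt the primary model, and on any failure route the same request to the fallback. A minimal sketch in plain Python (function names are illustrative):

```python
def generate_with_fallback(primary, fallback, prompt: str):
    """Try the primary model; if it errors or times out, retry once
    against the fallback. If both fail, the fallback's error surfaces,
    so there is never a silent partial result."""
    try:
        return primary(prompt)
    except Exception:
        return fallback(prompt)

def flaky_primary(_prompt: str) -> str:
    raise TimeoutError("primary timed out")

result = generate_with_fallback(flaky_primary, lambda p: f"fallback:{p}", "hook line")
```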

Creative Intelligence

Every generation step is shaped by your campaign’s creative context. Aytada injects the following into each AI prompt:
  • Business type determines the visual approach — product-centric for physical goods, outcome-centric for SaaS, transformation-centric for services
  • Industry context provides relevant pain points, aspirations, and hook angles specific to your market
  • Creative style sets the narrative approach — a UGC Style script reads like a casual testimonial, while a Cinematic script uses sparse copy and visual spectacle
  • Awareness stage governs what the script is allowed to say — an Unaware audience never hears the product name in the first scene; a Most Aware audience gets urgency and a direct offer
  • Persuasion trigger shapes voice tone, narrative arc, and the specific emotional lever the script pulls
This context is consistent across every asset in the campaign. Your video ad, banner, flyer, and jingle will all speak the same language to the same audience.
Want to target different audience segments from the same product? Create separate campaigns — one per awareness stage or audience segment. Each campaign maintains its own strategy, scripts, and assets independently.

Next steps

Awareness stages

A full explanation of the 5 Stages of Customer Awareness and how to choose the right one.

Video ads guide

Quality tiers, ad lengths, formats, and tips for best results.

Credit costs

The full credit cost breakdown for every step in every pipeline.

Campaign hub

How to manage and view all your campaign assets from one place.