
When you create an asset in Aytada, your product description passes through a sequence of specialised AI models. Each stage has a distinct role: strategy sets the creative direction, scripting writes the copy, scene breakdown translates copy into visual direction, and rendering turns those directions into actual video clips, images, or audio. This page explains what happens at each stage and which models are involved.
Strategy, ideation, and scene generation are free. Aytada only deducts credits when a model renders a final asset — a video clip, voiceover, banner, or jingle.

The production pipeline

1. Strategy and ideation (free)

Every asset begins with strategy. You provide:
  • Product name and description — what you are advertising
  • Business type — Physical Product, SaaS, Service, E-commerce Brand, and seven others
  • Industry context — Beauty & Skincare, Fitness, Tech/SaaS, and five others
  • Creative style — Direct Response, UGC Style, Cinematic, Storytelling, and six others
  • Target avatar — a specific person defined by their current struggle and desired identity
  • Persuasion trigger — one of Cialdini’s six principles (Reciprocity, Social Proof, Authority, etc.)
  • Awareness stage — where your audience sits on the 5 Stages of Customer Awareness
Aytada feeds all of this into DeepSeek V4 Flash to generate three distinct ad concepts, each with a hook angle and narrative approach tailored to your awareness stage. This costs 0 credits.
The awareness stage is the single most important input. An audience that has never heard of your problem needs a completely different message than an audience that is already comparing you to a competitor. See Awareness stages for guidance on which to choose.
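The strategy inputs above can be pictured as one context object handed to the ideation model. A minimal sketch, assuming hypothetical field names and product details (this is not Aytada's actual schema or API):

```python
# Illustrative only: field names and values are assumptions, not Aytada's real schema.
strategy_context = {
    "product_name": "GlowSerum",
    "description": "A vitamin C serum that fades dark spots in 30 days",
    "business_type": "Physical Product",
    "industry": "Beauty & Skincare",
    "creative_style": "Direct Response",
    "target_avatar": {
        "current_struggle": "uneven skin tone despite trying many products",
        "desired_identity": "confident, makeup-free skin",
    },
    "persuasion_trigger": "Social Proof",
    "awareness_stage": "Problem Aware",
}

def ideation_prompt(ctx: dict) -> str:
    """Flatten the campaign context into a single ideation prompt string."""
    return (
        f"Generate 3 ad concepts for {ctx['product_name']} "
        f"({ctx['business_type']}, {ctx['industry']}), "
        f"style: {ctx['creative_style']}, "
        f"trigger: {ctx['persuasion_trigger']}, "
        f"audience stage: {ctx['awareness_stage']}."
    )
```

Every field feeds the same prompt, which is why changing the awareness stage alone produces noticeably different concepts.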
2. Script generation

After you select a concept, Aytada generates a full ad script. Scripts are structured in four sections:
  1. Hook — the opening line or visual that stops the scroll
  2. Problem — agitation of the pain point your audience recognises
  3. Solution — introduction of your product as the answer
  4. CTA — a clear call to action matched to the awareness stage
You choose your script tier:
Tier       Model             Cost
Standard   GPT-5.5           2 credits
Premium    Claude Opus 4.7   5 credits
The model receives your business type, industry pain points, target avatar description, persuasion trigger rules, and awareness stage narrative strategy as context. This means a Direct Response script for a fitness supplement at the Most Aware stage will read very differently from a Storytelling script for a SaaS product at the Unaware stage — because the prompts are fundamentally different.
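The tier choice resolves to a model and a credit cost before any text is generated. A small sketch of that lookup, using the values from the table above (the function and structure are illustrative, not an Aytada SDK):

```python
# Hypothetical tier table mirroring the docs; not a real Aytada API.
SCRIPT_TIERS = {
    "standard": {"model": "GPT-5.5", "credits": 2},
    "premium": {"model": "Claude Opus 4.7", "credits": 5},
}

# The four script sections every tier produces.
SCRIPT_SECTIONS = ("Hook", "Problem", "Solution", "CTA")

def script_request(tier: str) -> dict:
    """Resolve a tier name to the model and credit cost it implies."""
    if tier not in SCRIPT_TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    return {"sections": SCRIPT_SECTIONS, **SCRIPT_TIERS[tier]}
```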
3. Scene breakdown (free)

The script is automatically broken down into individual visual scenes — one per act, scaled to the ad length you chose (3, 5, or 7 scenes). Each scene includes:
  • Narration text — the voiceover line for that scene
  • Visual direction — subject, action, and setting
  • Camera notes — angle, movement, and lens cues
  • Lighting and atmosphere — time of day, mood, colour temperature
Scene generation uses DeepSeek V4 Flash and costs 0 credits. The visual direction adapts to your business type: a Physical Product campaign gets product-centric shots, a Service campaign gets transformation-centric shots, and a SaaS campaign focuses on outcome-centric visuals.
You can edit scene descriptions before rendering. If the AI chose an exterior shot but your product works better in a studio, update the description and render with your revised direction.
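The four per-scene attributes can be modelled as a simple record. A sketch under the assumption that a scene carries exactly the fields listed above (the class is an illustration, not part of any Aytada SDK):

```python
from dataclasses import dataclass

# Field names follow the four attributes listed above; illustrative only.
@dataclass
class Scene:
    narration: str          # voiceover line for this scene
    visual_direction: str   # subject, action, setting
    camera_notes: str       # angle, movement, lens cues
    lighting: str           # time of day, mood, colour temperature

def breakdown(script_acts: list[str], scene_count: int) -> list[Scene]:
    """One scene per act, scaled to the chosen ad length (3, 5, or 7)."""
    assert scene_count in (3, 5, 7), "supported scene counts"
    return [
        Scene(narration=act, visual_direction="", camera_notes="", lighting="")
        for act in script_acts[:scene_count]
    ]
```

Editing a scene before rendering amounts to rewriting its `visual_direction` (and related fields) while keeping the narration intact.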
4. Asset rendering

This is where credits are consumed. Aytada uses different model pipelines depending on the asset type:

Video ads

Video scenes are rendered in parallel using fal.ai’s multi-model architecture. The model is selected automatically based on the quality tier you chose:
Tier       Primary model   Fallback model      Resolution
Standard   Wan 2.7         Kling V3 Standard   720p
Pro        Kling V3 Pro    Seedance 2.0        1080p
If you uploaded a product image, the hook scene is generated using Image-to-Video mode, which uses your photo as the starting frame. All other scenes use Text-to-Video mode.

Scenes generate concurrently (up to three at once), so total render time is roughly the duration of a single scene generation rather than the sum of all scenes.
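The "up to three at once" behaviour is a standard bounded-concurrency pattern. A sketch in plain asyncio, with a sleep standing in for the actual fal.ai render call:

```python
import asyncio

async def render_scene(i: int, sem: asyncio.Semaphore) -> str:
    """Stand-in for one scene render; the real call goes out to fal.ai."""
    async with sem:
        await asyncio.sleep(0.01)  # placeholder for render latency
        return f"scene-{i}.mp4"

async def render_all(n_scenes: int) -> list[str]:
    sem = asyncio.Semaphore(3)  # "up to three at once", per the docs
    return await asyncio.gather(*(render_scene(i, sem) for i in range(n_scenes)))

clips = asyncio.run(render_all(5))
```

With three slots, a five-scene ad finishes in roughly two render "waves" rather than five sequential renders.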

Ad banners and social flyers

Static assets use a typography-first pipeline because readable text inside an image is notoriously difficult for most AI models:
  1. GPT Image 2 generates the base composition with headlines, body copy, and layout
  2. Bria Product Shot composites your product image into the generated scene
  3. Topaz Upscaler refines the output for high-DPI displays
Banners cost 4 credits. Flyers cost 5 credits. Both are output at print-ready resolution.
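The three stages above run strictly in sequence, each consuming the previous stage's output. A sketch of that chaining with placeholder functions (these are not real SDK calls, just the order of operations):

```python
# Stage names come from the docs; the function bodies are placeholders.
def gpt_image_compose(brief: str) -> dict:
    """Stage 1: base composition with headlines, body copy, and layout."""
    return {"layers": ["headline", "body copy", "layout"], "brief": brief}

def bria_product_shot(canvas: dict, product_image: str) -> dict:
    """Stage 2: composite the product image into the generated scene."""
    return {**canvas, "product": product_image}

def topaz_upscale(canvas: dict) -> dict:
    """Stage 3: refine the output for high-DPI displays."""
    return {**canvas, "dpi": "high"}

banner = topaz_upscale(bria_product_shot(gpt_image_compose("summer sale"), "shoe.png"))
```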

Brand jingles

Jingle generation maps your ad tone and brand personality to a music description, then routes to the appropriate model tier:
Tier       Model                    Best for
Standard   ACE-Step or CassetteAI   Background loops, fast instrumentals
Premium    MiniMax Music 2.0        Lyric-driven jingles with structural tags
Elite      ElevenLabs Music         Section-level composition and lyric control
Jingles cost 5 credits and produce a 15–60 second audio asset.
Your credit balance is checked before each render step begins. If your balance drops below the required amount mid-campaign, the current step will return an insufficient credits error and no partial credits will be deducted. Top up your balance and resume from the same step.
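The all-or-nothing deduction described above can be sketched as a single guarded charge, where an insufficient balance raises before anything is spent (exception and function names are illustrative):

```python
class InsufficientCredits(Exception):
    """Raised when a render step costs more than the remaining balance."""

def charge(balance: int, cost: int) -> int:
    """All-or-nothing deduction: either the full cost is charged or
    nothing is, matching the no-partial-deduction behaviour above."""
    if balance < cost:
        raise InsufficientCredits(f"need {cost}, have {balance}")
    return balance - cost
```

Because the check happens before the step runs, resuming after a top-up simply retries the same charge.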
5. Voiceover generation

After video scenes are rendered, Aytada generates a voiceover from your script using ElevenLabs v3 via fal.ai (5 credits). The voice is chosen based on your persuasion trigger:
  • Authority → confident, clear male voice
  • Liking → casual, empathetic female voice
  • Scarcity → urgent, fast-paced delivery
  • Reciprocity → warm, instructional tone
The voiceover uses emotion tags in the script (for example, [excited] on the hook line) to produce more expressive delivery rather than flat narration.

If ElevenLabs is unavailable or times out, Aytada automatically falls back to Gemini 3.1 Flash TTS at no extra cost and with no interruption to your workflow.
If you have a branded spokesperson, you can upload a 5–30 second audio sample in your project settings to enable voice cloning. The generated voiceover will mimic the tone and cadence of your sample.
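The trigger-to-voice mapping above is a straightforward lookup. A sketch (the voice labels are the descriptions from this page, not ElevenLabs voice IDs, and the default is an assumption):

```python
# Mapping reproduced from the list above; labels are descriptive only.
VOICE_BY_TRIGGER = {
    "Authority": "confident, clear male voice",
    "Liking": "casual, empathetic female voice",
    "Scarcity": "urgent, fast-paced delivery",
    "Reciprocity": "warm, instructional tone",
}

def pick_voice(trigger: str, default: str = "neutral narrator") -> str:
    """Select a voice profile for a persuasion trigger, with a fallback default."""
    return VOICE_BY_TRIGGER.get(trigger, default)
```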
6. Final assembly

The last step stitches your rendered scene clips and voiceover into a single MP4 file. Aytada submits the assets to Shotstack Edit API (5 credits), which:
  1. Sequences scene clips in order on the video track
  2. Overlays the voiceover on the audio track at 30% video volume
  3. Mixes in background music if you generated a background track
  4. Renders and returns a final downloadable MP4
Assembly typically completes within two minutes. You receive an email when your video is ready. The final file is accessible from your Campaign hub and Project library.
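The four assembly steps map naturally onto a track-based edit payload. The sketch below approximates the shape of a Shotstack-style edit (video track of sequenced clips, audio track overlaid at reduced video volume); it is a simplification, so consult the Shotstack Edit API documentation for the exact schema:

```python
def build_edit(clips: list[str], voiceover: str, clip_len: float) -> dict:
    """Approximate a track-based edit: clips in sequence on one track,
    the voiceover overlaid on another, video volume ducked to 30%."""
    video_clips = [
        {"asset": {"type": "video", "src": src, "volume": 0.3},
         "start": i * clip_len, "length": clip_len}
        for i, src in enumerate(clips)
    ]
    audio_clip = {"asset": {"type": "audio", "src": voiceover},
                  "start": 0, "length": clip_len * len(clips)}
    return {
        "timeline": {"tracks": [{"clips": video_clips}, {"clips": [audio_clip]}]},
        "output": {"format": "mp4"},
    }
```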

Multi-model architecture

Aytada uses primary and fallback models for every generation step. If the primary model is slow, returns an error, or times out, the request automatically retries against the fallback model. From your perspective, generation either succeeds or fails with a clear error message — you will never see a mid-pipeline failure that silently produces a broken asset.
Step               Primary             Fallback
Ideas and scenes   DeepSeek V4 Flash
Standard script    GPT-5.5
Premium script     Claude Opus 4.7
Video (Standard)   Wan 2.7             Kling V3 Standard
Video (Pro)        Kling V3 Pro        Seedance 2.0
Voiceover          ElevenLabs v3       Gemini 3.1 Flash TTS
Banners/Flyers     GPT Image 2         Bria pipeline
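The retry behaviour described above reduces to a simple wrapper: attempt the primary model, and on any failure route the same request to the fallback. A minimal sketch in plain Python (function names are illustrative):

```python
def generate_with_fallback(primary, fallback, prompt: str):
    """Try the primary model; if it errors or times out, retry once
    against the fallback. If both fail, the fallback's error surfaces,
    so there is never a silent partial result."""
    try:
        return primary(prompt)
    except Exception:
        return fallback(prompt)

def flaky_primary(_prompt: str) -> str:
    raise TimeoutError("primary timed out")

result = generate_with_fallback(flaky_primary, lambda p: f"fallback:{p}", "hook line")
```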

Creative Intelligence

Every generation step is shaped by your campaign’s creative context. Aytada injects the following into each AI prompt:
  • Business type determines the visual approach — product-centric for physical goods, outcome-centric for SaaS, transformation-centric for services
  • Industry context provides relevant pain points, aspirations, and hook angles specific to your market
  • Creative style sets the narrative approach — a UGC Style script reads like a casual testimonial, while a Cinematic script uses sparse copy and visual spectacle
  • Awareness stage governs what the script is allowed to say — an Unaware audience never hears the product name in the first scene; a Most Aware audience gets urgency and a direct offer
  • Persuasion trigger shapes voice tone, narrative arc, and the specific emotional lever the script pulls
This context is consistent across every asset in the campaign. Your video ad, banner, flyer, and jingle will all speak the same language to the same audience.
Want to target different audience segments from the same product? Create separate campaigns — one per awareness stage or audience segment. Each campaign maintains its own strategy, scripts, and assets independently.

Next steps

Awareness stages

A full explanation of the 5 Stages of Customer Awareness and how to choose the right one.

Video ads guide

Quality tiers, ad lengths, formats, and tips for best results.

Credit costs

The full credit cost breakdown for every step in every pipeline.

Campaign hub

How to manage and view all your campaign assets from one place.