The other weekend, I made a short film using mostly AI, although I used what I usually call a “hybrid pipeline”, which is a fancy term for “I used all the shit that I know and I’ve been using for twenty plus years”.
Anyway, Seedance 2 Fast, or whatever it's called, had unlimited generations for a month, so I bought a subscription to test it.
Disclaimer: The version of Seedance 2 I used only allowed me to generate 720p videos without paying extra, and since this is a silly exercise of grandiose storytelling, I generated the entire short film in 720p.
As Nolan would say, I strongly recommend that you watch it on your phone.
Here is the short film.
Enough people asked about how I made it (2 or 3) so I decided to break it down here.
Writing the story
The Spike Jonze Influence
I'm Here (2010) was the direct reference point — a short film that does something almost nobody else does: it places something non-human at the emotional center and trusts the audience to meet it there. No explanation of why robots exist in this world. No backstory. Just a life, observed.
The decision to follow that same discipline: Lou exists. The ballroom exists. We don't explain either.
The ballroom story: the dance as the thing that couldn't be explained or justified. It just was.
The moment the ending clicked: Lou doesn't stop dancing when she leaves. He mops. That was the film.
The Checklist
An informal test was applied to every story decision throughout development:
Does this have almost no dialogue?
Does the emotional weight land in a single small gesture?
Does it end on stillness rather than resolution?
Is the non-human character a lens for something deeply human?
Ballroom passed all four.
Designing Lou
The First Instinct and Why It Was Wrong
The initial character brief called for brushed steel hands and matte black feet — clean, modern, cinematic. The kind of robot you see in every sci-fi short. It would have been invisible.
The pivot: what if he looked obsolete?
The Vintage Computer Reference
The specific moment of realisation: early 1990s computers. The Macintosh SE. The IBM PS/2. Objects that were once the future and are now just old. The particular sadness of yellowed plastic — something that was white once, that had ambitions once, that sat on someone's desk and felt important. Now just yellowed and forgotten. Lou is a direct translation of that feeling into a body.
The Design Decisions
The head: why a monitor-ish head and not a more humanoid face. The blank CRT screen (as sort of eyes) is the most expressive non-face possible: you project onto it.
The hands: chunky plastic segments, joint seams, worn edges. The specific beauty of rigid things attempting fluid movement.
The feet: rectangular, flat-bottomed, ventilation slots. The comedy and the pathos of those feet dancing silly.
The uniform: navy blue janitor workwear, worn and washed. The colour chosen to contrast with the ivory plastic without competing with the amber ballroom.
The Character Sheet Process
Reference sheets were built before any video generation. The two-panel format — full body and face — and the five-panel turnaround were the continuity bible for every subsequent prompt. The discipline of documentation before generation.
3D model
The character was modelled in 3D (rough and sloppy).
With a 3D model of the character I would be able to pose him as I wanted, to animate him as I wanted and to frame him as I wanted. AI is random as hell; don’t let it control the narrative.
The original design that didn’t happen
I wanted to create a stop-motion figurine look, but I soon realized that I would have to design the woman and the environment in the same style, and that is a lot of work. Also, rendering at 720p I would lose all the fiber details, so I discarded it quickly.
3D renders of Lou, the robot. Original design, discarded.
In the end I went for a more realistic look. Because I have the 3D character, I can generate multiple views that I later use to create an asset and reference while generating videos, which ensures continuity (to some extent). To use the character in the videos I just need to tag it with @robot.
Designing the woman
The Brief
No name. No backstory. A woman in her mid-to-late thirties who stumbles into something she didn't plan for and leaves before she can name it.
The specific quality needed: unremarkable in the best sense. The kind of face you trust immediately. Not a movie face — a real face.
The Generation Challenges
An honest account of what happened when trying to generate the character: attempts at a Latina woman kept drifting toward Asian features, which led to the eventual decision to specify an average American woman in her thirties with natural colouring.
This process taught something important about the biases baked into image generation models. The prompt had to explicitly state what the character was not, as much as what she was. Ethnicity and age had to appear in both the description and the constraints block — reinforced twice — to hold across generations.
Her Coat
Why the charcoal oversized coat was the right choice: it gives her weight and presence without defining her. She could be anyone. She is everyone. The coat also solves a practical continuity problem — a distinctive garment that reads consistently across all shots, from doorway to dance floor to exit.
I used a combination of Flux2, Nano Banana and GPT Image 2 for extra views once I had the main views.
Designing the ballroom
The Brief
Grand but faded. A room that was important once and is now just a room. The specific sadness of a ballroom that only fills a few times a year — the rest of the time it waits.
The herringbone parquet floor was the first decision. Golden-brown, heavily worn, the wear pattern heaviest at the center where couples have always danced. A floor that knows what it is for.
The Lighting Logic
The most important design decision in the film: two distinct versions of the same space were developed, even though only one appears in the final cut.
The night version: chandelier at 70-80% power, wall sconces filling the mid-room, warm amber at 3000K. This is Lou's world. Warm, intimate, slightly unreal. The morning-after version: chandelier off, overcast window light at 6000K, flat and honest. Built as a production tool to understand the space in neutral light — seeing the ballroom in flat overcast revealed details that needed to be present in the night version.
The Colour Logic
The 60:30:10 rule was applied consistently across every shot in the film. Dominant warm amber: the chandelier, the wall sconces, the record player lamp. Secondary soft shadow: the corners, the ceiling, under the tables. Accent: whatever carries the emotional weight of that specific shot.
This consistency is what allows 38 individual AI-generated shots — never generated together, never in the same session — to read as a single coherent visual world.
The Details That Make It Real
Every prop and detail was chosen not for decoration but for what it says about time and use and the lives that passed through this room. The chandelier: brass, tarnished, several crystals missing, slightly uneven hang. The windows: old glass with faint distortion, night outside. The tablecloths: stained, askew. The lipstick mark on the champagne flute rim. The confetti still on the floor, not yet swept.
The Wall Sconces
Added after the first generation pass. The chandelier alone left the room too dark and dramatic. The sconces fill the mid-room shadows and lift the whole space to a warm, comfortable level. The room winding down rather than brooding.
Even with a consistent asset, the AI broke consistency a few times, but overall it kind of works.
Style frame for the ballroom, used in a few shots throughout the short film.
Contact sheet details.
I also made a rough 3D model of the ballroom that I could use later for framing and compositing.
Also to animate 3D cameras and use them for the AI generation. Remember, don’t let the AI do its thing, it’s bad.
The record player
Why It Needed Its Own Contact Sheet
Every other prop in the film is set dressing. The record player is something that the robot interacts with. It appears in Scene 2, Scene 3, every dance scene, and the final shot. Lou's hand rests on it at the end of the film — not on a wall, not on a chair, on the record player.
It needed to be as documented and consistent as Lou himself. Three panels: the full prop in context, the turntable face-on, and the needle macro. The same object in all three — no drift in model or colour.
The Practical Lamp
The key lighting decision for the record player: a single practical clip lamp above it. This makes the record player the brightest object in the room in the final scenes. As the chandelier dims toward the end of the film, the record player becomes the last warm point of light. Lou walks toward it like a beacon.
The Emotional Arc of the Record Player
Scene 2: Lou places the record. The ritual. Scene 3 through 6: the music plays. Everything happens. Scene 7: the needle loops in silence. The music is gone. Scene 8: Lou's hand rests on the housing. Not the needle. The body of the machine. Two old things, still here.
In most shots it’s not really visible: because of the low resolution of the videos it kept deforming, so I removed it from the background in most shots.
Writing the Shot List
The Discipline of One Prompt Per Shot
The single most important production decision: every Seedance prompt generates exactly one shot. One camera setup. One continuous action. No cuts inside a prompt. Seedance can do multiple shots at once, but who the fuck does this?
AI video generation doesn't edit — it generates. Asking it to generate a sequence of cuts inside one prompt produces inconsistency and drift. One shot, one prompt, one truth. This discipline was arrived at after failed early attempts at multi-shot prompts.
The Evolution of the Prompt Format
First attempt: long descriptive paragraphs with multiple shots embedded. The problem was immediate — the model treated the whole prompt as one visual idea and blended the shots together. Second attempt: structured multi-shot prompts with numbered shots inside a single prompt. Still wrong — the same blending problem.
The final format: SUBJECT / LOCATION / ACTION / CAMERA / STYLE / CONSTRAINTS. Each section exists for a reason. SUBJECT defines who and what emotional beat they carry. LOCATION is a style reference, not a keyframe. ACTION is one thing, one movement, one continuous moment. CAMERA is a DOP brief. STYLE is the colour ratio and WB. CONSTRAINTS is what must not happen.
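To make the six-section format concrete, here is a minimal Python sketch of a prompt template. It is an illustration only, not the tooling used on the film: the `ShotPrompt` class, its `render` method and the example shot text are all made up for this sketch, and the `LABEL: text` layout is an assumption rather than anything Seedance requires.

```python
from dataclasses import dataclass, fields


@dataclass
class ShotPrompt:
    """One prompt = one shot. Field names mirror the six sections above."""
    subject: str
    location: str
    action: str
    camera: str
    style: str
    constraints: str

    def render(self) -> str:
        # Emit the sections in declaration order, one labelled line each.
        return "\n".join(
            f"{f.name.upper()}: {getattr(self, f.name).strip()}" for f in fields(self)
        )


# Hypothetical example shot; the wording is illustrative, not a real prompt.
shot = ShotPrompt(
    subject="@Lou, alone, absorbed in routine",
    location="@ballroom, night version",
    action="Lou mops the parquet in slow, regular strokes",
    camera="24mm, static, camera height 1.6m",
    style="warm amber 3000K, 60:30:10 colour ratio",
    constraints="one continuous shot, no cuts, no dialogue",
)
print(shot.render())
```

Keeping each section as a required field means a shot cannot be written without filling in all six, which is the point of the format.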
The Global Style Prefix
Every single prompt carries the full style prefix: 8K cinematic, photorealistic, naturalistic cinematography, physical cine lens, 180 degree shutter, WB, colour ratio. The discipline of never assuming the model remembers what came before. Each prompt is a standalone document. This is what people suggest doing; I didn’t do it. Instead, feed the model other sources so it understands what you want (3D renders, storyboards, camera animation, etc).
The Asset Tags
The evolution from full character descriptions in every prompt to @Lou, @woman, @ballroom, @recordplayer. How reference sheets make this possible. The tag is a contract between the prompt and the asset — it says: look at what I showed you, hold it exactly, do not drift.
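If the tag is a contract, a cheap sanity check before generating is to confirm that every tag in a prompt points at an asset that actually has a reference sheet. The sketch below is an assumption-heavy illustration: the `KNOWN_ASSETS` registry and the `unknown_tags` helper are hypothetical, and the `@word` pattern is only a guess at how tags are written, not a description of Seedance's parser.

```python
import re

# The four tags named in the text; the registry itself is hypothetical.
KNOWN_ASSETS = {"Lou", "woman", "ballroom", "recordplayer"}

# Assumed tag syntax: "@" followed by letters, digits or underscores.
TAG_PATTERN = re.compile(r"@(\w+)")


def unknown_tags(prompt: str) -> set[str]:
    """Return tags used in the prompt that have no reference sheet."""
    return set(TAG_PATTERN.findall(prompt)) - KNOWN_ASSETS


print(unknown_tags("@Lou rests a hand on the @recordplayer"))  # set()
print(unknown_tags("@Lou picks up the @mop"))                  # {'mop'}
```

The comparison is case-sensitive on purpose, so `@lou` would be flagged; whether the generator itself cares about case is not something the text confirms.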
Duration as a Creative Decision
Different shots have different durations — 8 seconds for a close-up of hands, 20 seconds for the final crane. Duration shapes rhythm before the edit even begins. A shot that is too long for its content teaches the audience to stop watching. A shot that ends too soon teaches them not to feel.
I ended up cutting most of the shots in half during editing.
Why Storyboards Matter in AI Filmmaking
The counterintuitive argument: AI video generation makes storyboards more important, not less. When you can generate anything, you need to know exactly what you want before you generate it. The storyboard is the commitment. Without it, you are shopping, not directing.
The Storyboard Process
Each shot prompt was first made as a storyboard frame: camera position and height, subject placement in frame, the emotional beat of the shot, what the viewer's eye should go to first. Drawing these frames before generation is key; prompting is (mostly) useless.
Using Storyboards for Framing
If you start your generation with a storyboard or keyframe, you are directing the position of the characters, the perspective, the framing, even the set dressing. Imagine doing all of that with a single prompt and expecting the AI to come up with it. Madness.
My storyboards don’t show motion, only intention. Then I control the motion with 3D cameras, or 3D character animations, or prompts, or a combination of all.
Testing the “pipeline”
Now that I have designed the two characters and the environment, and have some storyboards to direct the AI, it is time to test everything together.
It should be pretty simple at this point to tell the AI to place the robot in the ballroom based on the composition established in the storyboard. Sometimes it doesn’t want to follow, but most of the time it works pretty decently.
3D Modelling the Environment for Camera Control
Why 3D Was Necessary
AI video generation does not give you camera control. You describe the camera in the prompt and hope. For a film where camera placement is a deliberate creative language — the 24mm wide, the 85mm close, the crane — hope is not enough.
The 3D ballroom model served as a camera planning tool: place the virtual camera exactly where you want it, render a rough frame, use that frame as a reference image in Seedance. The model did not need to be beautiful — it needed to be accurate.
The Crane Shot Planning
The final crane shot — Scene 8 Shot 3, 20 seconds, from medium height to vast wide, ending in a fade to black — was planned entirely in 3D before a single Seedance prompt was written. The start position, the end position, the rate of rise, the focal length. All locked in 3D first. The rate of rise had to be imperceptibly slow — the audience should feel the room getting larger without noticing the camera moving.
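One way to reason about a rise that should feel imperceptible is to compute the camera height per frame with an ease-in/ease-out curve and look at the peak speed. The sketch below is not the actual setup from the film: only the 20-second duration comes from the shot list, while the start height, end height, 24 fps and the smoothstep curve are placeholder assumptions, and `crane_heights` is a made-up helper.

```python
def smoothstep(t: float) -> float:
    """Ease-in/ease-out curve on [0, 1]: zero velocity at both ends."""
    return t * t * (3.0 - 2.0 * t)


def crane_heights(start_m: float, end_m: float, duration_s: float, fps: int) -> list[float]:
    """Camera height in metres for every frame of the move."""
    last = int(duration_s * fps) - 1  # index of the final frame
    return [start_m + (end_m - start_m) * smoothstep(i / last) for i in range(last + 1)]


# 20 s comes from the shot list; start/end heights and 24 fps are placeholders.
FPS = 24
heights = crane_heights(start_m=1.6, end_m=6.0, duration_s=20.0, fps=FPS)

# Largest frame-to-frame rise, converted to metres per second.
peak_rate = max(b - a for a, b in zip(heights, heights[1:])) * FPS
print(f"{len(heights)} frames, peak rise {peak_rate:.2f} m/s")
```

The resulting per-frame values could then be keyed onto the 3D camera by hand or by script; if the peak rate still reads as a visible move, lower the end height or lengthen the shot.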
Camera Height as Emotional Language
The 3D planning process revealed a pattern in the film's camera heights. 1.6m: neutral, observational, Lou's world as it is. 1.4m: slightly low, used for two-shot moments between Lou and the woman — makes both figures feel present and alive. 0.3m: floor level, used for the feet shots. Rising crane: revelation, scale, emotional punctuation. This language was discovered in 3D, not invented at the prompt-writing stage.
Animating Lou Using 3D Models
The Motion Reference Problem
Seedance generates movement from text description. For simple movements — walking, standing, mopping — text description is sufficient. For complex movement — sloppy dancing, the specific quality of rigid plastic joints attempting fluid choreography — text description alone produces generic results.
The solution: motion capture reference rendered from the 3D Lou model, used as a reference image input for Seedance alongside the text prompt.
The Dance Choreography Process
Reference videos of specific moves were gathered: chest pops, shoulder isolations, footwork. Motion was applied to the Lou 3D model. Key frames were rendered as reference images. Those images were fed into Seedance alongside the text prompt. The difference between shots generated with 3D motion reference versus without is significant — the reference shots have specificity, the non-reference shots have generic movement.
The Plastic Body Problem
The rigid plastic joints of Lou's hands and feet created unexpected choreographic opportunities. Moves that would look ordinary on a human body look strange and beautiful on Lou's body. The rectangular feet doing hip hop footwork. The chunky plastic hands attempting finger waves. The constraint became the aesthetic.
Not in all the shots, but for some, I just rendered flipbooks from Houdini to get the right motion for the shot.
Then I asked the AI (politely) to place the robot in the ballroom, following the storyboard composition and using the flipbook animation.
Some early tests.
Flipbook from Houdini.
Lou’s original look.
Lou’s new look.
Lou’s animation in context.
Lou’s animation in context.
Editing — Building Rhythm Without Dialogue
The Edit as the Real Writing
The shot list was the script. The edit was the rewrite. The editing process was a creative act — not just assembly but discovery. A film with no dialogue, no voiceover, and almost no camera movement lives or dies by its editing rhythm. The cuts are the heartbeat.
The Rhythm Logic
Scene 1 (routine): slow, regular, mechanical, with cuts every 10 to 12 seconds.
Scene 2 (the record): slower still, because the ritual needs time.
Scene 3 (the dance): the rhythm breaks open, and the cuts get shorter and more energetic.
Scene 4 (discovered): sudden. The freeze, the long two-shot, silence.
Scene 5 (approach): building, each shot slightly shorter than the last.
Scene 6 (together): the freest rhythm in the film, the dance sets the pace.
Scene 7 (it ends): slowing, the deceleration of the edit mirroring the deceleration of the dance.
Scene 8 (back to work): back to the Scene 1 rhythm. Mechanical, regular, inevitable.
The Echo Cuts
Deliberate visual rhymes were built into the edit. Scene 1 Shot 3 — Lou mopping — is echoed exactly by Scene 8 Shot 2 — Lou mopping the floor they danced on. Scene 3 Shot 6 — wide crane of Lou dancing alone — is echoed by Scene 8 Shot 3 — wide crane of Lou mopping. The record player in Scene 2 and the record player in the final shot. The film is full of returns.
The Hardest Cut in the Film
The cut from the woman walking out the door to Lou standing still at the center of the floor. The door closes. Cut. Lou alone. How long to hold on the door before cutting? Too short and the loss doesn't land. Too long and the film goes slack. This was the most iterated cut in the edit — the one that took the most versions to arrive at the right duration.
Nah not really, I’m calling myself the Ed Wood of AI. Almost every shot is a first generation, I’m not wasting my time with this.
The Sound Philosophy
No score. No needle drops. Environmental sound only — consistent with the visual approach of natural light only, practical light only. The sound world follows the same rules as the visual world.
Three layers: room tone — the specific acoustic of a large empty ballroom, the hum of the chandelier, the creak of the building, the distant city outside. Practical SFX — Lou's plastic feet on the parquet, the sound of yellowed plastic hands stacking chairs, the wet sound of the mop, the champagne flutes. And the record player — the most important sound in the film.
The Record Player Sound Design
The crackle of the needle hitting the groove is the first music sound in the film — and it is also the most important single sound event. The music that plays during the dance: the choice, the licensing, the decision to let it breathe without competing with it. The moment the music ends: the needle looping in silence. How long that loop plays before Lou goes back to the mop.
The Silence After the Music
The most important sound decision in the film: after the needle lifts and the music ends, there is no room tone for three seconds. Pure silence. Then the room tone returns — the hum, the creak, the city outside. The world coming back. Lou going back to work.
Nah not really, just play The Smiths and you are good.
It’s late, thanks Claudio for writing this.