Your AI Video Prompt Is a Shot List. Here's How to Write It Like One.

Why Your Prompts Are Probably Undirected

When most people start using AI video generation tools, their prompts look like this: "A woman walking through a city at night, cinematic, beautiful lighting, 4K." The output they get is technically competent and aesthetically generic. It looks like a stock video that represents the concept rather than a specific creative vision.

The problem is not the tool. It is the prompt. "Cinematic" is not a direction. "Beautiful lighting" is not a lighting design. The models in 2026 — Runway Gen-4.5, Google Veo 3.1, Kling 3.0 — have been trained on vast datasets of professional cinematography. They respond to specific technical language the same way a DP responds to a proper brief. When you give them vague adjectives, they make generic default choices. When you give them specific technical parameters, they execute a specific vision.

After 14 years of directing commercial work, I can tell you that this is exactly how it feels to brief a DP. Vague creative language produces technically competent footage that expresses nothing in particular. Specific technical and aesthetic language produces footage that represents a decision.

The Eight Control Layers

Current AI video research describes eight primary control dimensions that determine the quality and specificity of generated video output. I'm going to translate each one into the production language I use on set, because that translation is the key to moving from generic to directed.

1. Subject: Not "a woman" but "a woman in her 40s, Mediterranean appearance, mid-length dark hair, wearing a cream linen blazer over a white shirt, carrying a document folder, purposeful walking cadence." This is your talent direction brief, translated directly. The more specific the subject description, the more coherent the generated performance.

2. Emotion/Performance: Not "confident" but "the quiet confidence of someone who has already decided the outcome of a meeting and is approaching it with controlled energy." This is performance direction written into the brief: the emotional register you'd give the talent in a pre-shoot conversation. Models respond to emotional specificity with visible performance choices in body language and facial expression.

3. Optics: This is where production language translates most directly. "85mm equivalent, f/2.8, slight rack focus from foreground element to subject at midpoint." The focal length determines the spatial compression and background relationship. The aperture determines depth of field. An 85mm at f/2.8 isolates the subject against a compressed, soft background; a 35mm at f/5.6 shows more of the environment and holds more of it in focus. The models tend to reproduce this difference when you specify it in technical terms rather than aesthetic descriptions.

4. Motion: Camera movement should be described as a DP would execute it. "Slow tracking shot from camera right to camera left, tracking the subject, slight push-in as the subject pauses, camera height at subject's chest level." Not "the camera moves with her" but a specific movement description with direction, speed, and height. The models execute a specification like this with much higher fidelity than a loose natural-language description of the movement.

5. Lighting: Give a lighting setup description, not an aesthetic label. "Overcast exterior, even diffuse light, no harsh shadows, slight fill from a reflective surface to camera left, color temperature approximately 5600K." That is a real lighting condition you'd scout for or recreate with a softbox and a reflector. The model treats it as such and generates coherent light behavior rather than an aesthetic approximation.

6. Style: The aesthetic reference frame for the shot. The most effective style prompts reference specific photographers, cinematographers, or productions rather than genre labels. "Rodrigo Prieto exterior work" or "commercial photography aesthetic in the tradition of Art Streiber portraits" gives the model a specific visual tradition to draw from. "Cinematic" does not.

7. Audio: For models with native audio generation (Runway Gen-4.5, Kling 3.0, Veo 3.1), the audio direction is part of the prompt. "Ambient urban soundscape, distant traffic, quiet footsteps on stone, no dialogue" is a sound design brief. The model generates audio that corresponds to the described environment rather than adding generic background sound.

8. Continuity: For multi-shot sequences, continuity parameters maintain visual consistency across shots. "Maintain the same color temperature and lighting character as the preceding shot, same talent wardrobe, late afternoon in the same urban environment." This is what prevents AI-generated sequences from feeling like a collection of separate clips rather than a coherent edit.
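One way to keep all eight layers in view is to treat them as fields in a small data structure and render the prompt from it. The sketch below is an illustration only: the `ShotPrompt` class, its field names, and the labeled one-layer-per-line output are my own assumptions, not a format required by Runway, Veo, or Kling. The values are shortened from the examples above.

```python
from dataclasses import dataclass, fields


@dataclass
class ShotPrompt:
    """One shot, described by the eight control layers (hypothetical structure)."""
    subject: str
    performance: str
    optics: str
    motion: str
    lighting: str
    style: str
    audio: str = ""       # leave empty for models without native audio
    continuity: str = ""  # leave empty for a standalone shot

    def render(self) -> str:
        """Join the non-empty layers into one prompt, one labeled layer per line."""
        lines = []
        for f in fields(self):
            value = getattr(self, f.name).strip()
            if value:
                lines.append(f"{f.name.capitalize()}: {value}")
        return "\n".join(lines)


shot = ShotPrompt(
    subject="woman in her 40s, cream linen blazer over a white shirt, carrying a document folder",
    performance="quiet confidence, controlled energy",
    optics="85mm equivalent, f/2.8",
    motion="slow tracking shot from camera right to camera left, camera at chest height",
    lighting="overcast exterior, even diffuse light, approximately 5600K",
    style="commercial photography aesthetic",
    audio="ambient urban soundscape, distant traffic, no dialogue",
)
print(shot.render())
```

An empty layer is skipped rather than padded with filler, so a missing decision shows up as a missing line.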

Prompt Length and Structure

One non-obvious insight from working with current AI video models is that video prompts should be shorter than image prompts but technically denser. Fewer words, higher precision. In my experience, a 150-word prompt with eight specific technical parameters tends to outperform a 400-word descriptive paragraph that covers the same ground in narrative prose.
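A quick check before you generate can catch both problems: a prompt that runs long and a prompt that leans on mood adjectives. This is a minimal sketch under my own assumptions. The `audit_prompt` helper and the `VAGUE_TERMS` list are hypothetical, and the 150-word budget is the rule of thumb from this section, not a limit imposed by any model.

```python
import string

# Hypothetical list of adjectives that name a mood without specifying a choice.
VAGUE_TERMS = {"cinematic", "beautiful", "stunning", "epic", "amazing"}


def audit_prompt(prompt: str, word_budget: int = 150) -> dict:
    """Count words and flag vague adjectives in a prompt."""
    words = prompt.split()
    # Strip surrounding punctuation so "cinematic," matches "cinematic".
    normalized = {w.strip(string.punctuation).lower() for w in words}
    return {
        "word_count": len(words),
        "over_budget": len(words) > word_budget,
        "vague_terms": sorted(normalized & VAGUE_TERMS),
    }


report = audit_prompt(
    "A woman walking through a city at night, cinematic, beautiful lighting, 4K."
)
print(report)
```

Run on the undirected prompt from the opening, it reports 12 words, well under budget, and flags "beautiful" and "cinematic". Short is not the same as dense.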

Structure the prompt as a shot list entry rather than a description: shot number, subject action, camera setup, lighting condition, audio. The more it reads like professional production documentation and the less it reads like a creative brief to a marketing team, the better the output typically is.

The iterative workflow that works: start with a short, high-precision prompt that establishes the technical parameters. Generate three or four variations. Identify which technical element is producing the most useful result and which is producing the most unwanted variation. Adjust the underperforming element with more specific direction. Change one parameter at a time rather than rewriting the whole prompt, so you can tell which change caused which difference in the output.
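The one-change-at-a-time discipline is easy to enforce if the prompt lives as named parameters. The sketch below assumes a plain dictionary of layers and a hypothetical helper, `single_parameter_variants`. It only builds the prompt variants and does not call any video model.

```python
def single_parameter_variants(base: dict, parameter: str, options: list) -> list:
    """Return copies of `base` that differ from it only in `parameter`."""
    if parameter not in base:
        raise KeyError(f"unknown parameter: {parameter}")
    return [{**base, parameter: option} for option in options]


base_shot = {
    "subject": "woman in her 40s, cream linen blazer, purposeful walking cadence",
    "optics": "85mm equivalent, f/2.8",
    "motion": "slow tracking shot from camera right to camera left",
    "lighting": "overcast exterior, even diffuse light",
}

# Suppose motion is the layer producing the most unwanted variation:
# tighten only that layer and leave the rest of the brief untouched.
variants = single_parameter_variants(
    base_shot,
    "motion",
    [
        "slow tracking shot from camera right to camera left, camera at chest height",
        "slow tracking shot from camera right to camera left, slight push-in as the subject pauses",
    ],
)

for variant in variants:
    changed = [key for key in base_shot if variant[key] != base_shot[key]]
    print(changed, "->", variant["motion"])
```

Because every variant shares the other layers with the base shot, any difference between generations can be attributed to the one layer you changed.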

What This Means for Directors

The directorial skill that translates most directly to AI video work is the ability to describe a vision in technical language rather than aesthetic language. Directors who have always given their DPs specific technical briefs — "I want a 100mm lens at f/2 for this shot, motivated backlight from camera left, foreground out of focus" — are better positioned to get high-quality AI video output than those who have worked primarily in conceptual and aesthetic language.

The skills are the same. The tool that executes them has changed.

Sources: TrueFan — Master Cinematic AI Video Prompts: 2026 Expert Playbook | MetricsMule — AI Video Prompt Engineering | Google Cloud Blog — Ultimate Prompting Guide for Veo 3.1

Ulisses Balbino
