14 Years on Set Taught Me This: AI Video Models Respond to DP Language. Here's How to Use It.

The Prompt Problem Nobody Talks About

Most people prompting AI video tools describe what they want to happen in a scene. "A person walking through a city street at night." "A product rotating on a white background." "Two people having a conversation at a table."

That is content description. It tells the model what the scene contains. It does not tell the model how to shoot it.

Directors and cinematographers speak differently on set. The same city street at night scene might be: "Low angle Dutch tilt, handheld with motivated camera shake, rack focus from background signage to foreground subject as they pass, practical neon lighting motivated left."

I spent 14 years directing commercial productions for Starbucks, Nestlé, Yamaha, Carrefour, and Benefit, and this vocabulary became second nature. In 2026, the major AI video models understand it too. Runway Gen-4.5, Kling 3.0, Sora 2, and Veo 3.1 all respond to cinematography language in ways that meaningfully change output quality.

Camera Movement Vocabulary That Works

These terms translate consistently across the current generation of AI video models:

Dolly in / dolly out — camera physically moving toward or away from subject. Creates a different feel than zooming, which changes focal length without moving the camera. AI models generally handle this correctly when specified explicitly.

Tracking shot / follow shot — camera moves laterally to follow a subject. Specify direction: "left-to-right tracking shot following the subject."

Crane up / crane down — vertical camera elevation. Useful for reveal shots. "Crane down from rooftop to street level, ending on protagonist."

Handheld — camera movement with organic instability. "Handheld with motivated shake" specifies that the movement should feel like an operator making intentional choices, not random jitter. This distinction matters for tone.

Steadicam / gimbal — smooth, floating camera movement. Useful for following action without the instability of handheld.

Static / locked-off — no camera movement. Often the most powerful choice for emotional scenes. Specifying "static locked-off medium shot" prevents the model from adding unnecessary movement.
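The difference between a dolly and a zoom, mentioned in the first entry above, can be checked with simple pinhole-camera arithmetic. The sketch below is only an illustration: the subject and background sizes, the distances, and the 35mm starting focal length are assumed values, not figures from any model or production. It compares how much the subject and the background grow in frame when the camera dollies in 2 m versus when it zooms in enough to make the subject the same size.

```python
def image_height_mm(focal_mm: float, object_height_m: float, distance_m: float) -> float:
    """Pinhole approximation: height of an object's image on the sensor."""
    return focal_mm * object_height_m / distance_m


def compare_moves(focal_mm: float = 35.0, dolly_m: float = 2.0) -> dict:
    """Return (subject growth, background growth) for a dolly-in and a matching zoom.

    All scene values here are hypothetical, chosen only for illustration.
    """
    subject_h, subject_d = 1.7, 4.0          # a person 4 m from the camera
    background_h, background_d = 10.0, 20.0  # a building 20 m from the camera

    subject_before = image_height_mm(focal_mm, subject_h, subject_d)
    background_before = image_height_mm(focal_mm, background_h, background_d)

    # Dolly in: the camera moves, so every distance shrinks by the same amount.
    subject_dolly = image_height_mm(focal_mm, subject_h, subject_d - dolly_m)
    background_dolly = image_height_mm(focal_mm, background_h, background_d - dolly_m)

    # Zoom in: the camera stays put; pick the focal length that makes the
    # subject exactly as large as it was after the dolly.
    zoom_focal = focal_mm * subject_d / (subject_d - dolly_m)
    subject_zoom = image_height_mm(zoom_focal, subject_h, subject_d)
    background_zoom = image_height_mm(zoom_focal, background_h, background_d)

    return {
        "dolly": (subject_dolly / subject_before, background_dolly / background_before),
        "zoom": (subject_zoom / subject_before, background_zoom / background_before),
    }


if __name__ == "__main__":
    for move, (subject_growth, background_growth) in compare_moves().items():
        print(f"{move}: subject x{subject_growth:.2f}, background x{background_growth:.2f}")
```

With these assumed numbers the subject doubles in size in both cases, but the background barely grows during the dolly while it also doubles during the zoom. That change in the relationship between subject and background is the "different feel" a dolly produces, and it is why naming the move explicitly in a prompt is worth the extra words.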

Focus and Lens Vocabulary

Rack focus — shifting focus from one plane to another during a shot. "Rack focus from background subject to foreground object" is a classic reveal technique. AI models handle this well when the subjects are clearly positioned in the prompt.

Shallow depth of field / wide open — blurred background, sharp foreground. Specify approximate focal length if you want a specific look: "85mm equivalent, wide open, shallow depth of field."

Deep focus — both foreground and background in sharp focus simultaneously. Often associated with wide angle lenses.

Anamorphic — the horizontal lens flare and bokeh shape associated with anamorphic lenses. Adding "anamorphic lens characteristics" to a prompt reliably produces that look in most current models.
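To see why "wide open" and "85mm" belong together in a shallow depth of field prompt, and why deep focus is associated with wide lenses, it can help to run the standard thin-lens depth of field formulas. This is a rough sketch under stated assumptions: a 0.03 mm circle of confusion (a common full-frame convention), a subject 3 m away, and f/1.8 and f/11 as example apertures. None of these values come from the article or from any AI model.

```python
import math


def depth_of_field_mm(focal_mm: float, f_number: float, subject_mm: float,
                      coc_mm: float = 0.03) -> float:
    """Total depth of field using the standard hyperfocal-distance formulas.

    Returns math.inf when the subject is at or beyond the hyperfocal distance,
    meaning everything from the near limit to infinity is acceptably sharp.
    """
    hyperfocal = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
    if subject_mm >= hyperfocal:
        return math.inf
    near = subject_mm * (hyperfocal - focal_mm) / (hyperfocal + subject_mm - 2 * focal_mm)
    far = subject_mm * (hyperfocal - focal_mm) / (hyperfocal - subject_mm)
    return far - near


if __name__ == "__main__":
    subject_mm = 3000.0  # assumed subject distance: 3 m
    examples = [
        ("85mm wide open (f/1.8)", 85.0, 1.8),
        ("85mm stopped down (f/11)", 85.0, 11.0),
        ("24mm stopped down (f/11)", 24.0, 11.0),
    ]
    for label, focal, aperture in examples:
        dof = depth_of_field_mm(focal, aperture, subject_mm)
        shown = "to infinity" if math.isinf(dof) else f"{dof / 10:.0f} cm"
        print(f"{label}: sharp zone {shown}")
```

The 85mm lens at f/1.8 keeps only a thin slice around the subject sharp, the same lens at f/11 keeps a noticeably deeper zone, and the 24mm lens at f/11 holds focus all the way to infinity. The prompt terms are shorthand for these optical regimes.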

Framing and Composition Vocabulary

Dutch angle / canted angle — tilted horizon line. Communicates psychological unease or instability. Specify degree for more control: "15-degree Dutch tilt."

Low angle — camera below subject eye level, shooting upward. Makes subjects feel powerful, imposing, or threatening depending on context.

High angle / bird's eye — camera above subject. Can convey vulnerability, surveillance, or scale.

Over the shoulder — framing one subject from behind the shoulder of another. Standard for dialogue scenes. Specify which shoulder for matching across cuts.

Two-shot / three-shot — specifies how many subjects appear in frame.

Lighting Vocabulary

Motivated lighting — light that appears to come from a source visible or implied in the scene. "Practical neon lighting motivated left" means the light looks like it is coming from the neon sign visible in the scene.

Rembrandt lighting — specific portrait lighting with characteristic triangle of light on the shadowed side of the face. AI models trained on enough cinematography reference know this term.

Golden hour / magic hour — warm directional light at low angles. More reliable than trying to describe the specific color temperatures.

Hard light vs soft light — hard light creates defined shadows (direct sun, harsh fixtures), soft light wraps and fills (overcast, diffused).

Putting It Together

Compare these two prompts for the same scene:

Basic: "A woman enters a dark room and finds something surprising."

With cinematography language: "Low angle, static locked-off shot. Woman enters frame right into a dark interior. A single practical lamp reveals her face in Rembrandt lighting. She stops. Rack focus from the doorframe to her face as she reacts. Shallow depth of field, 50mm equivalent."

The second prompt gives the model a directed scene with specific visual language. The output is not guaranteed to execute perfectly — these models still have inconsistencies — but the probability of getting usable, directed output is substantially higher than with content-only description.
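One way to make this habit repeatable is to keep the cinematography decisions in separate fields and assemble the prompt from them. The sketch below is a minimal illustration, not a tool any of these models require: the `ShotPrompt` class, its field names, and the `render` method are all hypothetical, and the output is plain text you would paste into whichever model you use.

```python
from dataclasses import dataclass


def _sentence(text: str) -> str:
    """Trim a field and make sure it ends with sentence punctuation."""
    text = text.strip()
    if not text:
        return ""
    return text if text.endswith((".", "!", "?")) else text + "."


@dataclass
class ShotPrompt:
    """Hypothetical container for one shot: how to shoot it, then what happens."""
    camera: str          # framing and movement, e.g. "Low angle, static locked-off shot"
    action: str          # the content description
    lighting: str = ""   # optional lighting direction
    focus: str = ""      # optional focus behavior
    lens: str = ""       # optional lens and depth of field notes

    def render(self) -> str:
        """Join the non-empty fields into a single prompt string."""
        fields = [self.camera, self.action, self.lighting, self.focus, self.lens]
        return " ".join(s for s in map(_sentence, fields) if s)


if __name__ == "__main__":
    shot = ShotPrompt(
        camera="Low angle, static locked-off shot",
        action="Woman enters frame right into a dark interior. She stops",
        lighting="A single practical lamp reveals her face in Rembrandt lighting",
        focus="Rack focus from the doorframe to her face as she reacts",
        lens="Shallow depth of field, 50mm equivalent",
    )
    print(shot.render())
```

Separating the fields makes it obvious when a prompt only contains an action and no camera direction, and it lets you swap a single choice, such as the lens or the lighting, while holding the rest of the shot constant between generations.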

The craft of directing is knowing how to tell a story visually. That craft translates directly to prompting. If you have production experience, you already know this vocabulary. Use it.

Ulisses Balbino
