Nº 019 · AI · 7 min read · March 15, 2026

Veo 3.1 Is Now Inside YouTube Shorts. The 'Ingredients to Video' Feature Is More Useful Than It Sounds.

The Tool That's Already in Everyone's Pocket

There's a consistent pattern in how AI tools get adopted at scale. The technology launches in a dedicated app. Early adopters explore it. Quality improves. Then the technology gets embedded inside a platform that already has billions of users — and adoption accelerates past anything the original launch achieved.

That is exactly what happened with Veo 3.1. Google DeepMind's most advanced video generation model is now available natively inside YouTube Shorts and the YouTube Create app. You don't download a separate tool. You don't create an API account. You don't leave the platform. You click Create, select your images, and generate video — in the same workflow where you edit and publish to Shorts.

The feature is called Ingredients to Video. It is more useful than the name suggests.

How It Works: Three Inputs, One Video

The workflow is straightforward. You upload up to three reference images — Google's examples use "yourself, an object, and a background." The model takes these inputs simultaneously and generates a cohesive video clip that incorporates all three elements into a single scene. The outputs are natively vertical (9:16) for Shorts, with upscaling available to 4K.
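For production teams that want to script this outside the Shorts UI, here is a minimal sketch of what an ingredients-style call could look like through the google-genai Python SDK. The model id, the reference-image config fields, and the file names are assumptions for illustration, not confirmed API surface; check the current Veo documentation for the exact names.

```python
# A sketch of an "ingredients"-style Veo call via the google-genai SDK.
# Assumptions: the model id, the `reference_images` config field, and
# the VideoGenerationReferenceImage type; verify against current docs.
from google import genai
from google.genai import types


def load_image(path: str) -> types.Image:
    """Read a local PNG into the SDK's Image type."""
    with open(path, "rb") as f:
        return types.Image(image_bytes=f.read(), mime_type="image/png")


client = genai.Client()  # reads GEMINI_API_KEY from the environment

# The three "ingredients": talent, product, background (hypothetical files).
ingredients = [load_image(p) for p in ("talent.png", "product.png", "background.png")]

operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",  # assumed model id
    prompt=(
        "The talent holds the product at eye level in the background "
        "setting; slow push-in, soft daylight."
    ),
    config=types.GenerateVideosConfig(
        aspect_ratio="9:16",  # native vertical for Shorts
        reference_images=[    # assumed field name for reference inputs
            types.VideoGenerationReferenceImage(image=img, reference_type="asset")
            for img in ingredients
        ],
    ),
)
# `operation` is a long-running job; polling and download are shown in
# the sketch near the end of this piece.
```

Whatever the exact field names turn out to be, the shape of the brief is the point: three labeled references plus a text prompt, which is the same structure the Shorts UI collects.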

For brand content, the "ingredients" framing maps directly to a production brief: talent, product, environment. A cosmetics brand uploads a talent reference, a product shot, and a background that matches the campaign aesthetic. The model generates a video that puts them together without a physical shoot. That's a concept I understand very well from the production side — we call it a moodboard brief. The "ingredients" are exactly what a director receives from a brand team before pre-production begins.

Character consistency is a specific focus of the Veo 3.1 upgrade. The model keeps the same character recognizable across multiple scenes, so you can generate a sequence where the talent's appearance holds steady as the setting changes. Earlier AI video models handled this unreliably, and fixing it meant manual correction in post.

Audio That Matches the Visual

Veo 3.1 generates synchronized audio alongside the video. Not just background music or generic sound effects — audio that matches the specific content of the generated scene, including multi-person conversations and precisely timed sound effects guided by the text prompt. For short-form social content where audio quality expectations are set by what's native to the platform, this closes the main gap between generated video and production-ready output.
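Because the soundtrack is steered by the text prompt, the craft on the creator side is specifying audio as deliberately as the visuals. The prompt below is a hypothetical illustration of that structure, not an example from Google's documentation:

```python
# A hypothetical prompt separating visual, dialogue, and sound direction.
prompt = (
    "Two friends unbox the product at a kitchen table, handheld framing. "
    "Dialogue: 'Okay, it's heavier than I expected.' / 'Look at that finish.' "
    "SFX: cardboard tearing at the start, a soft thud as the box lands. "
    "Ambient: quiet morning kitchen, distant street noise."
)
```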

Combined with YouTube's existing audio tools and the ability to layer voiceovers, the generation-to-publish pipeline for a 15-second Shorts piece is now nearly complete within the platform. The only step that remains external is script development and creative direction — which is exactly where a director's value should sit.

The Distribution Moat Nobody Can Replicate

I wrote about Meta's Mango video model and how Meta's three billion users change the competitive equation for AI video tools. Veo 3.1 inside YouTube Shorts is the same argument, applied to the largest video platform in the world. YouTube has 2.7 billion monthly active users and the most established creator ecosystem anywhere on the internet.

When AI video generation is natively integrated into that platform — not just available, but embedded in the creation flow where creators already spend their time — the adoption curve is fundamentally different from any standalone AI video tool. Runway has better output quality. Kling has more precise motion control. But neither of them has billions of users who will encounter their capabilities as a native feature of the platform they already use every day.

For professional production companies, this creates a two-tier reality. Platform-native tools serve the volume of content that requires speed and distribution efficiency over precision. Specialized production tools serve the work that requires full creative control and professional quality standards. The question is not which tier wins — both will exist. The question is which tier your clients' work belongs in, and whether you're positioned to serve both.

What Changes for Brand Creators Right Now

The immediate practical change: social content for YouTube Shorts can now be generated, refined, and published in a single platform workflow. For brands managing always-on content programs — the constant churn of product videos, campaign teasers, seasonal content, and audience engagement material — this reduces the production overhead significantly. Content that previously required a shoot day or a separate AI tool workflow now has a path to generation inside the platform where it will be distributed.

Access is available through the Gemini API, Vertex AI, and Flow for professional workflows, and directly in YouTube Shorts and the YouTube Create app for consumer-facing creation. The enterprise path exists for production studios and agencies that want API-level access to Veo 3.1 outside the native platform experience.
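For teams taking the API route, generation is asynchronous: the call returns a long-running operation that you poll before downloading the result. A minimal end-to-end sketch via the google-genai SDK, with the model id and output file name as assumptions:

```python
# Start a Veo generation, poll the long-running operation, save the result.
# The model id and the output file name are assumptions for illustration.
import time

from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",  # assumed model id
    prompt="A 10-second vertical product teaser in soft studio light.",
)

while not operation.done:  # generation typically takes a minute or more
    time.sleep(20)
    operation = client.operations.get(operation)

video = operation.response.generated_videos[0]
client.files.download(file=video.video)  # fetch the generated bytes
video.video.save("teaser.mp4")           # assumed output name
```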

My suggestion: test the Ingredients to Video feature on your next brand campaign with a product that photographs clearly. Run the generation with proper brief inputs — quality talent reference, clear product shot, strong environmental reference. Evaluate the output against your current social production standard. The gap you find will tell you exactly where the tool fits in your workflow right now, and where to watch for improvement over the next six months.

Sources: Google Blog — Veo 3.1 Ingredients to Video | CineD — Veo 3.1 Update: Vertical Format, 4K Upscaling, Character Consistency | Chrome Unboxed — Google upgrades AI video with Veo 3.1
