Nº 058 · AI · 8 min read · March 28, 2026

I Published My LTX-Video 2 Review Yesterday. Today LTX 2.3 Dropped with 4K and Audio.

Fig. 01 I Published My LTX-Video 2 Review Yesterday. Today LTX 2.3 Dropped with 4K and Audio.

This Is What Fast Looks Like

Yesterday I published a detailed review of LTX-Video 2 on this site. My conclusion: strong for landscapes and atmosphere, still limited for faces, 10x cheaper than Kling, worth integrating into hybrid production workflows.

Today, Lightricks shipped LTX-Video 2.3.

Native 4K. Synchronized audio. Single-pass generation. All three in one update.

I've been in production long enough to know that when a tool improves this fast, you're either watching a race to the bottom or a genuine engineering breakthrough. Based on what I've tested in the last few hours, this is the second one.

What Version 2.3 Actually Changes

The two limitations I cited in yesterday's review were: face consistency in close-up, and the fact that LTX-Video generated silent clips. You'd have to layer audio in post — not a dealbreaker for B-roll, but a real constraint for narrative content.
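For context, the post step that 2.3 removes looked something like the ffmpeg mux below. The file names are placeholders for your own assets, not anything from the LTX-Video toolchain itself:

```shell
# Pre-2.3 workflow: mux a separately produced audio track onto a silent
# LTX-Video clip (file names are placeholders).
#   -map 0:v:0 -map 1:a:0  take video from the clip, audio from the WAV
#   -c:v copy              keep the video stream untouched (no re-encode)
#   -c:a aac               encode the audio to AAC for the MP4 container
#   -shortest              trim output to the shorter of the two inputs
ffmpeg -i silent_clip.mp4 -i ambience.wav \
  -map 0:v:0 -map 1:a:0 -c:v copy -c:a aac -shortest \
  clip_with_audio.mp4
```

Not difficult, but it is an extra render, an extra asset to manage, and an extra place for sync drift to creep in on every clip.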

Version 2.3 addresses the second one completely. Synchronized audio in a single generation pass means the model is now producing video and audio as a unified output, not two separate elements patched together. For anyone building content pipelines, this is a significant workflow change. One API call, one complete clip, audio already matched to the visuals.
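A minimal sketch of what that pipeline change means in code. Every name here is hypothetical — it illustrates the shape of the workflow, not the actual LTX-Video 2.3 API:

```python
from dataclasses import dataclass

# Hypothetical names throughout -- this sketches the pipeline change,
# not the real LTX-Video 2.3 API.

@dataclass
class Clip:
    video_frames: int
    audio_samples: int  # 0 means a silent clip

def generate_v2(prompt: str) -> Clip:
    # LTX-Video 2 behavior: video only; audio gets layered in post.
    return Clip(video_frames=240, audio_samples=0)

def generate_v23(prompt: str) -> Clip:
    # LTX-Video 2.3 behavior: one pass returns video plus matched audio.
    return Clip(video_frames=240, audio_samples=480_000)

clip = generate_v23("wind through pines at dusk, slow dolly forward")
print(clip.audio_samples > 0)  # True: no separate scoring step needed
```

The practical difference is the branch your pipeline no longer needs: there is no "if silent, run the audio stage" step, because the clip arrives complete.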

The 4K upgrade matters differently depending on your context. For social content where 1080p has always been sufficient, it's headroom — future-proofing. For anyone pitching AI-assisted content to clients who care about specs (and they all do eventually), it removes a conversation. "Can you deliver 4K?" is no longer a question with an asterisk.

What Still Hasn't Changed

I ran the same face-in-close-up test from yesterday with 2.3. The improvement is marginal. Human faces in sustained close-up still have the uncanny quality that's been the consistent weakness of LTX-Video across versions. The rule stands: use it for atmosphere, environments, objects, and abstract motion. Not for hero shots where a face carries the emotional weight.

This isn't a criticism — it's a scoping note. Every tool has a job it's designed to do. LTX-Video 2.3's job is landscape-scale cinematic generation, hybrid B-roll, fast concept visualization, and now full audio-sync clips at 4K. That's a significant job. It just isn't a digital actor replacement.

The Parallel Development That Matters

LTX 2.3 didn't land alone. The same week, ByteDance (in partnership with Peking University and Canva) released Helios — a model that generates 60-second videos at real-time speed on a single consumer GPU.

These are not competing products. They're different bets on the same problem: making high-quality video generation fast enough and cheap enough to be useful in actual workflows, not just demos.

LTX 2.3 says: we'll give you cinema-quality output on professional infrastructure, native 4K, audio-synced, open-source so you can run it however you want.

Helios says: we'll give you acceptable quality at speeds that make iteration possible on consumer hardware.

The production answer is: you'll use both, for different stages of different projects. That's already how hybrid workflows operate.

What I'm Actually Changing in My Workflow

The synchronized audio changes one specific thing for me: I was previously treating LTX-Video clips as silent B-roll that I'd score in post. With 2.3, I can now test whether the model generates ambient audio that matches visual environments well enough to reduce post work.

Early indication: ambient audio (wind through trees, room tone, environmental texture) is usable. Precise synchronized sound design (a specific footstep, a door closing exactly 3.2 seconds in) is not. The model is generating plausible environmental audio, not precision sound design. That distinction matters for how you use it.

For the campaigns I build where the brief is "30 social pieces per month," the audio capability means I can now deliver clips that don't immediately announce themselves as AI-generated through their silence. That's a real quality-of-life improvement for the client delivery side.

The Pace Is the Story

I've covered enough technology cycles to resist the temptation to call every update a breakthrough. But the pace of improvement in open-source video generation right now is genuinely different from what I've seen in other tools.

When I started using AI image generation two years ago, meaningful improvements came every four to six months. LTX-Video went from version 2 to 2.3 with 4K and audio in the time it took me to publish one review.

That pace doesn't mean you need to constantly rebuild your workflow. It means you need to stay close enough to the tools to recognize when a change is cosmetic versus when it actually shifts what's possible. This one shifts something.

My LTX-Video 2 review from yesterday still stands as an accurate picture of the baseline. Version 2.3 is an upgrade on two of its stated limitations. The one it didn't address — face consistency — remains the constraint I'd watch most closely in the next update.

Source: New AI Model Releases — March 2026 | Build Fast With AI — 12+ Models in March 2026
