Decart’s Lucy 2 Enables Lifelike Coherence in Live AI-Generated Video Streams

AI research startup Decart is pitching a new class of “world model” that can generate high-quality video continuously and in real time, with no limit on how long it can run. Unlike previous world models, Lucy 2 maintains unprecedented consistency across extended runtimes, paving the way for live video generation with lifelike coherence.

Decart is making Lucy 2 available via its API, and it’s also the model that powers Decart’s new Delulu Stream tools, released earlier this month as a way for streamers to use real-time AI with TikTok Live, Twitch, Kick and YouTube Live.

In a blog post, Decart said Lucy 2 represents a major evolution of its original video model Lucy, because it’s built on an entirely new kind of architecture that allows it to operate as a “single continuous system.” When the user enters a prompt, frames are generated one after another, with each frame conditioned on the one before it to ensure a coherent video stream that’s free from the inconsistencies that plague other low-latency models.

Lucy 2 does this while outputting 1080p video at 30 frames per second with no buffering and no time constraints, meaning it can keep generating film-quality content for hours on end. Adding to its technical achievements, Decart said Lucy 2 is much more economical than its predecessor, reducing the cost of generative video creation to as low as $3 per hour.
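To put that figure in perspective, here is a quick back-of-the-envelope breakdown. The $3-per-hour price and the 30 frames-per-second output rate come from Decart; the per-minute and per-frame costs below are simply derived from those two numbers.

```python
# Back-of-the-envelope cost breakdown using the figures quoted above
# ($3 per hour of 1080p output at 30 frames per second).
COST_PER_HOUR = 3.00                        # USD, as stated by Decart
FPS = 30

frames_per_hour = FPS * 60 * 60             # 108,000 frames
cost_per_minute = COST_PER_HOUR / 60        # $0.05 per minute of video
cost_per_frame = COST_PER_HOUR / frames_per_hour

print(f"{frames_per_hour:,} frames per hour")
print(f"${cost_per_minute:.2f} per minute of video")
print(f"${cost_per_frame:.6f} per frame")   # roughly $0.000028
```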

Maintaining Coherence with Persistent Generation

Decart calls Lucy 2 a “world transformation model,” but it’s really a new breed of “world model” that’s designed to understand and replicate the dynamics and the physical and spatial properties of the real world using camera input.

What makes Lucy 2 different from other world models is its architectural design, which supports what Decart calls “persistent” generation. One of the problems with existing video models such as OpenAI’s Sora 3 is that they work by generating video clips in small batches before stitching them together. 

But this means they’re unable to maintain a continuous state, which is the main reason why AI-generated videos tend to suffer aesthetically, with unrealistic movements, awkward facial expressions, ill-fitting clothes, environmental inconsistencies and numerous other problems. While OpenAI claims that Sora 3 is more consistent than rival models, it typically maintains coherent scenes for only around 60 seconds before degradation sets in.

Because Lucy 2 generates outputs continuously on a frame-by-frame basis, it’s able to ensure consistency in elements such as anatomy, lighting, clothing behavior and object interaction, resulting in superior cohesion. “For the first time, a world model runs live, in real time, with no quality compromises,” said Decart co-founder and Chief Executive Dean Leitersdorf in a press release. 

Lucy 2 runs as a single, ongoing system from the moment the camera is switched on. Rather than generating small segments of video, it models people, environments and their motion in real time and pairs this with auto-regression to preserve full-body movement, physical presence and timing with sub-second latency. 

Because there’s no batching or buffering, it can continuously update the video as the subject moves, gestures or adapts their posture. Each frame represents a direct continuation of the one that preceded it, making it far easier to maintain coherence over long sequences. 
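The sketch below illustrates the general idea of that persistent, frame-by-frame loop. It is not Decart’s actual implementation or API; the model is replaced by a toy function, and the point it demonstrates is simply the property described above: every frame is produced from the previous frame plus a running state, so there are no clip boundaries to stitch together.

```python
# Minimal, illustrative sketch of persistent frame-by-frame generation --
# not Decart's code. A stand-in function plays the role of the world model.
import numpy as np

HEIGHT, WIDTH, FPS = 1080, 1920, 30

def generate_next_frame(prev_frame: np.ndarray, state: float, prompt: str):
    """Stand-in for a learned world model: returns the next frame and updated state.

    A real system would run a neural network conditioned on the prompt; this toy
    version just nudges the previous frame so the output stays continuous.
    """
    new_state = 0.9 * state + 0.1 * float(prev_frame.mean())      # carry context forward
    drift = np.random.normal(0.0, 1.0, prev_frame.shape) + 0.01 * new_state
    next_frame = np.clip(prev_frame.astype(np.float32) + drift, 0, 255).astype(np.uint8)
    return next_frame, new_state

def stream(prompt: str, seconds: int):
    """Persistent generation loop: one continuous stream, no batching or buffering."""
    frame = np.zeros((HEIGHT, WIDTH, 3), dtype=np.uint8)          # e.g. the first camera frame
    state = 0.0
    for _ in range(seconds * FPS):
        frame, state = generate_next_frame(frame, state, prompt)
        yield frame                                               # hand each frame to the encoder

if __name__ == "__main__":
    count = sum(1 for _ in stream("streamer as a sci-fi character", seconds=1))
    print(f"generated {count} frames as one continuous sequence")
```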

Live AI Video with Greater Realism

Lucy 2 is uniquely suited for real-time applications, including live entertainment formats where performers transform into AI-generated characters inside virtual worlds that respond instantly to context and movement.

Alternatively, Decart believes it can enable more realistic virtual shopping experiences, where shoppers can virtually try on garments and model them in different poses with greater realism.

Perhaps the most intriguing use case for Lucy 2 is the creation of simulated environments for robotics and embodied AI training. Decart demonstrated how the model can be integrated with the Nvidia Omniverse simulation platform and dynamically adapt elements such as the lighting, textures and visibility during robot training simulations. 

Doing this can help to make the simulations more unpredictable, just like the real world, resulting in smarter robots that can better adapt to changing environments.

“Robots today are trained in simulations that are too perfect,” Leitersdorf said. “We can mess it up, add smoke, change materials, turn off the lights, and suddenly the robot learns to operate in conditions that look much closer to reality.”
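The technique Leitersdorf describes is broadly known as domain randomization. The sketch below shows the general pattern in generic form; it does not use Decart’s or Nvidia Omniverse’s actual APIs, and the `SimScene` and `run_episode` names are hypothetical stand-ins for whatever simulator a training loop uses.

```python
# Generic domain-randomization sketch -- lighting, smoke and materials are
# re-randomized every episode so a robot policy can't overfit to a
# "too perfect" simulation. Not tied to any specific simulator.
import random
from dataclasses import dataclass

@dataclass
class SimScene:
    lighting: float        # 0.0 = lights off, 1.0 = full brightness
    smoke_density: float   # 0.0 = clear air, 1.0 = heavy smoke
    floor_material: str

def randomize(scene: SimScene) -> SimScene:
    """Perturb visual conditions for the next training episode."""
    return SimScene(
        lighting=random.uniform(0.05, 1.0),
        smoke_density=random.uniform(0.0, 0.8),
        floor_material=random.choice(["concrete", "tile", "carpet", "gravel"]),
    )

def run_episode(scene: SimScene) -> float:
    """Placeholder for a real rollout; returns a dummy reward."""
    return scene.lighting - scene.smoke_density

if __name__ == "__main__":
    scene = SimScene(lighting=1.0, smoke_density=0.0, floor_material="concrete")
    for episode in range(3):
        scene = randomize(scene)
        reward = run_episode(scene)
        print(f"episode {episode}: {scene} -> reward {reward:.2f}")
```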

The AI Video Realism Race

Lucy 2 promises to excel in terms of continuity, and Decart has certainly differentiated its models when it comes to real-time output, but the company is going up against a tough group of competitors that are all pursuing their own methods of making AI video more realistic and true-to-life. 

One of its leading rivals is Runway, which has long attempted to stand out in terms of its cinematic quality and creative workflows, aiming at professional audiences such as film directors. Its most advanced model, Gen-4.5, delivers motion that’s superior to other video generation models, thanks to an enhanced physics engine that models weight, momentum and fluid dynamics.

Runway also prioritizes user control with production-grade, in-context video editing tools that go beyond raw, one-shot generation. For instance, users can make adjustments such as panning and tilting the camera with defined intensity, zoom in and out on specific elements and so on. Its motion brush tool enables creators to draw on a specific part of the video and dictate how it moves, so a waterfall would flow rapidly while trees in the background remain motionless. It’s all about the finer details, in other words. 

While Sora 3 more closely resembles Lucy 2 with its focus on character and object consistency, it also distinguishes itself in terms of prompt understanding. It relies on the same re-captioning techniques that were first introduced in OpenAI’s DALL-E 3 to comprehend more intricate and detailed user instructions with a higher level of accuracy. 

This means it can better understand intent and generate content that more accurately reflects the creator’s vision, including complex scenes with multiple characters that display specific types of motion and behaviors. 

Google’s best effort so far is Veo 3, which emphasizes audio synchronization. Whereas other models generate the video first and overlay soundtracks and dialogue afterwards, Veo uses a joint diffusion method to create the audio simultaneously. So when an AI-generated character speaks, their lip movements should align perfectly with the words that come out of their mouth. 

To aid in coherence, Google has developed a separate technique called “first frame, last frame,” where users can upload multiple reference images of characters that the model can follow to try to maintain visual consistency from beginning to end.

This increased specialization is beneficial to creators, but Decart’s Lucy 2 promises a more fundamental change. By solving the problem of coherence over long durations, it’s morphing AI video from offline generation into a dynamic, real-time creative tool for live digital content that resembles “true” video featuring human actors.
