Meet Vidu: A Sora Killer?

2024-04-30
2 mins read

Sora has not publicly launched yet, but its first contender has popped up. Meet Vidu, the Chinese answer to Sora. Vidu is China’s first long-duration, highly consistent, and highly dynamic video model. A Sora killer? Check it out.

Vidu: China's first long-duration, highly consistent, and highly dynamic video model.

Vidu is imaginative, can simulate the physical world, and can “produce 16-second videos with consistent characters, scenes, and timeline.”

Vidu: From prompt to video

Filmmakers have not yet decided how to adopt Sora, and a contender has already emerged. Meet Vidu, China’s first long-duration, highly consistent, and highly dynamic video model. Vidu can generate a 16-second 1080p video with one click. It was developed by Chinese AI firm Shengshu Technology and Tsinghua University, and its capability rests on its Universal Vision Transformer (U-ViT) architecture. “Vidu is the latest achievement of self-reliant innovation, with breakthroughs in many areas,” said Zhu Jun, chief scientist at Shengshu and deputy dean of Tsinghua’s Institute for AI, when announcing the model at the Zhongguancun Forum in the Chinese capital, as reported by Beijing News. Vidu is “imaginative”, “can simulate the physical world”, and can “produce 16-second videos with consistent characters, scenes, and timeline”, Zhu said, adding that the model is also able to comprehend “Chinese elements”.

For Sora to produce a one-minute clip, it needs eight Nvidia A100 Tensor Core GPUs to run for more than three hours.
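
To put that figure in perspective, here is a rough back-of-the-envelope sketch in Python. It only multiplies out the numbers quoted above. The function names are ours, and the 16-second line assumes cost scales linearly with clip length, which nobody has confirmed. It says nothing about what Vidu itself needs.

```python
# Back-of-the-envelope GPU-hours for the quoted Sora estimate.
# The inputs come from the claim above; everything else is an assumption.

GPUS = 8            # Nvidia A100 GPUs, per the quoted estimate
HOURS = 3           # "more than three hours", so treat this as a lower bound
CLIP_SECONDS = 60   # one-minute clip


def gpu_hours(gpus: int, hours: float) -> float:
    """Total GPU-hours used when `gpus` devices each run for `hours`."""
    return gpus * hours


def gpu_hours_per_video_second(total_gpu_hours: float, clip_seconds: float) -> float:
    """GPU-hours spent per second of generated video."""
    return total_gpu_hours / clip_seconds


total = gpu_hours(GPUS, HOURS)
per_second = gpu_hours_per_video_second(total, CLIP_SECONDS)

# Hypothetical: a 16-second clip at the same rate, assuming cost scales
# linearly with duration (an assumption, not a published figure).
sixteen_second_estimate = per_second * 16

print(f"At least {total} GPU-hours per one-minute clip")
print(f"At least {per_second:.2f} GPU-hours per second of video")
print(f"Roughly {sixteen_second_estimate:.1f} GPU-hours for 16 seconds at that rate")
```

In other words, if the estimate holds, a single one-minute Sora clip costs at least 24 GPU-hours of A100 time, which is why the compute question matters for anyone hoping to use these tools at scale.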

Demo clips related to Sora

During the model’s unveiling, Shengshu released several demo clips, including one featuring a panda playing the guitar while sitting on the grass and another of a puppy swimming in a pool, both showing vivid details. The imagery was chosen for its similarity to Sora’s demos, to make the point that Vidu is a solid contender. Indeed, Chinese outlets confirm that Vidu’s debut has raised hopes in the country, which is racing to catch up with leading global generative AI players such as Microsoft-backed OpenAI. By the way, you can read more here about filmmakers using Sora as a tool. It’s not as easy as it seems: it’s not just prompt-to-video but involves many editing techniques and a long post-production process. Sometimes it is easier just to shoot things with your camera.

Comparison to Sora

According to Li Yangwei, a Beijing-based technical consultant working in the intelligent computing sector, Sora needs eight Nvidia A100 Tensor Core GPUs running for more than three hours to produce a one-minute clip. “Sora requires a lot of computing power for inferencing,” Li said, implying that Vidu demands much less. That’s interesting, as we haven’t heard anything from OpenAI regarding the horsepower needed to generate video with Sora.

Technically speaking, Vidu merges the strengths of diffusion and transformer-based text-to-video models, which gives it imaginative capabilities, the ability to simulate the physical world, and the capacity to generate 16-second videos with consistent characters, scenes, and timelines. Vidu is built on a proprietary vision transformer architecture called the Universal Vision Transformer (U-ViT). According to the developers, this architecture combines two approaches to text-to-video AI: diffusion and the transformer. The framework enables lifelike videos featuring dynamic camera movements, intricate facial expressions, and authentic lighting and shadow effects. You can find more technical details here.

Zhu noted that Sora’s introduction aligned with the team’s technical direction and strengthened its resolve to continue the research. For now, Vidu is inferior to Sora, but it is improving quickly, so in a year it could look much better. Explore the demonstration below:

So what do you think? A Sora killer? 

Yossy is a filmmaker who specializes mainly in action sports cinematography. Yossy also lectures about the art of independent filmmaking in leading educational institutes, academic programs, and festivals, and his independent films have garnered international awards and recognition.
Yossy is the founder of Y.M.Cinema Magazine.
