Meet Vidu: A Sora Killer?

2024-04-30
2 mins read

Sora has not publicly launched yet, but its first contender has popped up. Meet Vidu, the Chinese answer to Sora. Vidu is China’s first long-duration, highly consistent, and highly dynamic video model. A Sora killer? Check it out.

Vidu: China's first long-duration, highly consistent, and highly dynamic video model.

Vidu is imaginative, can simulate the physical world, and can “produce 16-second videos with consistent characters, scenes, and timeline”.

Vidu: From prompt to video

Filmmakers have not yet decided how to adopt Sora, and a contender has already emerged. Meet Vidu, China’s first long-duration, highly consistent, and highly dynamic video model. Vidu can generate a 16-second 1080p video with one click. Developed by Chinese AI firm Shengshu Technology and Tsinghua University, Vidu is built on the Universal Vision Transformer (U-ViT) architecture.

“Vidu is the latest achievement of self-reliant innovation, with breakthroughs in many areas,” said Zhu Jun, chief scientist at Shengshu and deputy dean of Tsinghua’s Institute for AI, announcing the model at the Zhongguancun Forum in Beijing, as reported by Beijing News. Vidu is “imaginative”, “can simulate the physical world”, and can “produce 16-second videos with consistent characters, scenes, and timeline”, Zhu said, adding that the model is also able to comprehend “Chinese elements”.

In order for Sora to produce a one-minute clip, it needs eight Nvidia A100 Tensor Core GPUs to run for more than three hours.
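
Taken at face value, that figure can be turned into GPU-hours with some quick arithmetic. The short Python sketch below is only an illustration: the function names are made up for this example, the result is a lower bound because the quote says "more than three hours", and scaling the number linearly down to a 16-second clip is an assumption of this sketch, not something the source claims.

```python
def gpu_hours(num_gpus: int, wall_clock_hours: float) -> float:
    """Total GPU-hours: number of GPUs times the hours each one runs."""
    return num_gpus * wall_clock_hours


def gpu_hours_per_video_second(total_gpu_hours: float, clip_seconds: float) -> float:
    """GPU-hours spent per second of generated footage."""
    return total_gpu_hours / clip_seconds


# Figures quoted in the article: eight A100 GPUs, more than three hours,
# for a one-minute (60-second) clip. Using exactly 3 hours gives a lower bound.
sora_clip_gpu_hours = gpu_hours(num_gpus=8, wall_clock_hours=3.0)
per_second = gpu_hours_per_video_second(sora_clip_gpu_hours, clip_seconds=60)

# ASSUMPTION: cost scales linearly with clip length. Real video models may not
# behave this way, so treat this as a rough illustration only.
sixteen_second_estimate = per_second * 16

print(f"One-minute clip: at least {sora_clip_gpu_hours:.0f} GPU-hours")
print(f"Per second of video: at least {per_second:.2f} GPU-hours")
print(f"16-second clip at the same rate: about {sixteen_second_estimate:.1f} GPU-hours")
```

In other words, the quoted setup amounts to at least a full day of single-GPU time for one minute of footage, which is why the claim matters to anyone budgeting for AI-generated shots.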

Demo clips related to Sora

During the model’s unveiling, Shengshu released several demo clips, including one of a panda playing the guitar while sitting on the grass and another of a puppy swimming in a pool, both showing vivid details. The imagery was chosen for its similarity to Sora’s demo clips, to make the point that Vidu is a solid contender to Sora. Indeed, Chinese outlets confirm that Vidu’s debut has raised hopes in the country, which is racing to catch up with leading global generative AI players such as Microsoft-backed OpenAI.

By the way, you can read more here about filmmakers using Sora as a tool. It’s not as easy as we think it is. It’s not just prompt-to-video; it involves many editing techniques and a long post-production process. Sometimes it is easier just to shoot things with your camera.

Comparison to Sora

For Sora to produce a one-minute clip, it needs eight Nvidia A100 Tensor Core GPUs running for more than three hours, according to Li Yangwei, a Beijing-based technical consultant working in the intelligent computing sector. “Sora requires a lot of computing power for inferencing,” Li said, implying that Vidu demands much less. That’s interesting, as we haven’t heard anything from OpenAI about the horsepower needed to generate video with Sora.

Technically speaking, Vidu is built on a proprietary architecture called the Universal Vision Transformer (U-ViT). According to the developers, this architecture combines two approaches to text-to-video AI, diffusion and the Transformer, and merges the strengths of both: imaginative capabilities, the ability to simulate the physical world, and the capacity to generate 16-second videos with consistent characters, scenes, and timelines. It also enables lifelike videos featuring dynamic camera movements, intricate facial expressions, and authentic lighting and shadow effects. You can find more technical details here.

Zhu noted that the introduction of Sora resonated with their technical direction, strengthening their resolve to continue their research. For now, Vidu is inferior to Sora, but it is improving fast, so in a year it could look much better. Explore the demonstration below:

So what do you think? A Sora killer? 

Yossy is a filmmaker who specializes mainly in action sports cinematography. Yossy also lectures about the art of independent filmmaking in leading educational institutes, academic programs, and festivals, and his independent films have garnered international awards and recognition.
Yossy is the founder of Y.M.Cinema Magazine.
