Nuh-uh, Meta, we can do text-to-video AI, too, says Google

Hot on the heels of Meta's Make-A-Video, Google said on Wednesday it too has built an AI-powered text-to-video system. This one's called Imagen Video.

We dare say the public reveal of Make-A-Video last week spurred the Big G to suddenly start shouting about its own competing system, lest it look like it's fallen behind Mark Zuckerberg's team. Or perhaps Meta learned of Google's planned announcement, and raced to spoil it with its own unveiling. It seems too much of a coincidence.

Given a text prompt, such as "sprouts in the shape of text 'Imagen Video' coming out of a fairytale book. Smooth video," Google's software generates a sequence of images to create a short clip.

There are numerous other examples of the model completely fabricating footage from prompts, such as "a teddy bear running through New York City," or "incredibly detailed science fiction scene set on an alien planet, view of a marketplace. Pixel art."

Imagen Video builds upon Google's previous text-to-image system, Imagen, launched in May. Instead of a single still picture, however, Imagen Video builds a video out of multiple frames of output.

Text-to-video systems are more computationally intensive to train and run than text-to-image systems. Imagen Video, for example, is made up of seven types of models. For one thing, it has to not just generate a frame from its text prompt but also predict what the next frames would be to form a coherent moving animation - each frame a slight progression from the previous - rather than a series of related images that, played back, would look like a jumbled mess.

"Imagen Video generates high resolution videos with Cascaded Diffusion Models," according to a Google research note.

"The first step is to take an input text prompt and encode it into textual embeddings with a T5 text encoder.

"A base Video Diffusion Model then generates a 16-frame video at 24×48 resolution and three frames per second; this is then followed by multiple Temporal Super-Resolution (TSR) and Spatial Super-Resolution (SSR) models to upsample and generate a final 128-frame video at 1280×768 resolution and 24 frames per second - resulting in 5.3 seconds of high definition video."
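The figures in that note can be sanity-checked with a few lines of arithmetic. The sketch below uses only the frame counts and frame rates quoted above; the variable and function names are our own, not Google's. It suggests that the super-resolution stages do not lengthen the clip: the base video and the final video appear to cover the same stretch of time, with the extra frames filling in the motion between the original ones.

```python
from fractions import Fraction

# Figures quoted in the Google research note; names here are illustrative.
BASE_FRAMES, BASE_FPS = 16, 3
FINAL_FRAMES, FINAL_FPS = 128, 24


def clip_seconds(frames: int, fps: int) -> Fraction:
    """Return the playback duration of a clip as an exact fraction of a second."""
    return Fraction(frames, fps)


base_duration = clip_seconds(BASE_FRAMES, BASE_FPS)
final_duration = clip_seconds(FINAL_FRAMES, FINAL_FPS)

# How many times more frames the final clip has than the base clip.
frame_factor = FINAL_FRAMES // BASE_FRAMES

print(f"base clip:  {float(base_duration):.1f} seconds")
print(f"final clip: {float(final_duration):.1f} seconds")
print(f"frames multiplied by {frame_factor}")
```

Both durations work out to 16/3 of a second, which rounds to the 5.3 seconds Google cites, while the frame count rises eightfold.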

As with Meta's Make-A-Video, the output of Google's Imagen Video is somewhat fuzzy. Edges of images are blurry, and the resolution isn't great yet. Research and development into generative visual models, however, moves quickly, and it'll only be a matter of time before a new architecture creates fake videos that are crisper, in high definition, and longer.

These models show that computers are good at learning the logical sequence of steps needed to simulate events, such as a water balloon bursting or an ice cream melting. Boffins at Google Brain described Imagen Video as being "temporally-coherent" and "well-aligned with the given prompt" in a non-peer-reviewed research paper [PDF].

An internal Google dataset made up of 14 million video-text samples and 60 million image-text pairs, as well as information from the publicly available LAION-400M image-text dataset, was used to train Imagen Video.

"Video generative models can be used to positively impact society, for example by amplifying and augmenting human creativity. However, these generative models may also be misused, for example to generate fake, hateful, explicit or harmful content," the researchers said. The LAION-400M dataset is also known to contain pornographic and other types of problematic images.

Although the team have applied content filters to block unsavory text prompts or images in videos generated by the model, Imagen Video is still prone to creating content with "social biases and stereotypes" and is not safe yet for people to experiment with. "We have decided not to release the Imagen Video model or its source code until these concerns are mitigated," they concluded.

So, like Meta's toy, Imagen Video isn't available to the general public, perhaps making their public unveilings more recruitment tools - hey, come work on cool stuff like this - than anything else right now. ®
