Google's New AI Can Animate Images, Videos With Text Prompts
Have you ever wanted to bring a still image to life? That's now possible thanks to Dreamix, a new editing approach that leverages video diffusion models to enable text-based motion.
The generative artificial intelligence (AI) system, developed by a team from Google Research and the Hebrew University of Jerusalem, can make the subject of a video perform actions based on text prompts. It can also animate the subject of a photo, making it move as though it were a video.
So how does it work? Dreamix first corrupts the video to be edited by downsampling and adding signal noise to it – essentially degrading the quality. Then, it uses the low-resolution details in the video to synthesise images or motion as guided by a text prompt, before upscaling the video to its final resolution.
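The corruption stage described above can be sketched in a few lines of NumPy. This is a minimal illustration of the general idea, not Dreamix's actual implementation: the function name, parameters, and naive stride-based downsampling are all assumptions, and the text-guided denoising and upscaling stages are only indicated in comments, since they require a trained video diffusion model.

```python
import numpy as np

def degrade_video(frames: np.ndarray, factor: int = 4,
                  noise_std: float = 0.3, seed: int = 0) -> np.ndarray:
    """Corrupt a video as in the first stage described above:
    spatially downsample each frame, then add Gaussian noise.

    frames: array of shape (T, H, W, C) with values in [0, 1].
    Returns an array of shape (T, H // factor, W // factor, C).
    """
    rng = np.random.default_rng(seed)
    # Naive downsampling by striding; a real pipeline would use
    # proper filtered resampling.
    low_res = frames[:, ::factor, ::factor, :]
    # Add signal noise on top of the low-resolution frames.
    noisy = low_res + rng.normal(0.0, noise_std, size=low_res.shape)
    return np.clip(noisy, 0.0, 1.0)

# A text-conditioned video diffusion model would then denoise the
# corrupted clip guided by the prompt, and a super-resolution stage
# would upscale the result back to the original resolution.
video = np.random.default_rng(1).random((8, 64, 64, 3))
corrupted = degrade_video(video)
print(corrupted.shape)  # (8, 16, 16, 3)
```

The corruption deliberately discards fine detail while keeping coarse structure, which is what lets the diffusion model replace appearance and motion while staying faithful to the original scene layout.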
In the study, the developers, for instance, showed how they were able to turn a video of a monkey eating into a video of a bear dancing using the text prompt "A bear dancing and jumping to upbeat music, moving his whole body".
The only caveat – at least right now – is that the quality of the resulting video takes a hit, with the subject losing some fine detail during motion. Still, you'll likely be able to discern what it is you're looking at, whether it's a dog or a turtle, as demonstrated in the video above. It's also impressive how the system understands the context of the requests, allowing it to output videos that are accurate to the text prompt.
While the study is accessible to the public, Dreamix itself isn't, and it's not clear when the tool will be available, if there are even plans to put it out.
Google sees AI as an important part of its business, having announced Bard, a ChatGPT-like chatbot service, yesterday.
"Whether it’s helping doctors detect diseases earlier or enabling people to access information in their own language, AI helps people, businesses and communities unlock their potential," said Google CEO Sundar Pichai. "And it opens up new opportunities that could significantly improve billions of lives."