- Kyle Chua
Google Admits Gemini Demo Video Not Conducted in Real Time
Google is drawing flak for tricking the public into thinking its latest artificial intelligence (AI) model is more capable than it actually is.
The search engine provider last week launched what it touted as its fastest and most flexible AI model yet, Gemini, showcasing its capabilities in a six-minute demo video released to the media and to the public. The video shows how Gemini can respond to spoken-word prompts and videos in real time as though it were in a real conversation. It also shows how the new model can describe a drawing of a duck versus that of a rubber duck, among other examples.
In the video's description, Google includes a disclaimer that reads, "For the purposes of this demo, latency has been reduced, and Gemini outputs have been shortened for brevity." That disclaimer, along with other important information about Gemini's capabilities, however, is nowhere to be found in the video itself.
As a result, the public thought the demo in the video was happening in real time. Google only later admitted to Bloomberg that this wasn't the case. In a statement, the tech giant described the video as "an illustrative depiction of the possibilities of interacting with Gemini, based on real multimodal prompts and outputs from testing", suggesting it doesn't accurately reflect the version of the service releasing on 13 December.
It's not the first time Google's AI demos have sparked controversy. Some of the search giant's own employees called it out earlier in the year for supposedly rushing out the demo of the chatbot Bard in an attempt to compete with Microsoft's own showing that same week.
Google reportedly scrapped plans to hold an in-person launch of Gemini, likely because it thought a virtual showing, in which editing could shorten the AI's response times, would make the model seem more advanced than it actually is.
Still, the new AI model is impressive, as it's said to be natively multimodal, meaning it can process not only text but also images, videos and audio. In comparison, OpenAI's GPT-4, its latest and most advanced model so far, is primarily text-based and only becomes multimodal with the use of plugins.