Google Unveils Gemini, Its Most Capable, Flexible AI Model Yet

Kyle Chua
Dec 7, 2023

Updated: Dec 16, 2023

Google is making a major move in the artificial intelligence (AI) space.

The search engine giant today introduced Gemini, its most capable, flexible and general AI model yet. Gemini is a multimodal model, meaning it can process not only text but also images, videos and audio. It's reportedly also capable of completing complex tasks like solving math and physics problems, as well as generating code in different programming languages.

The first version of the model, Gemini 1.0, is optimised for three different sizes: Ultra, Pro and Nano. According to Google, Gemini Ultra is the largest and most capable of the three, built for complex tasks. Gemini Pro is the best model for scaling across a wide range of tasks, while Gemini Nano is the most efficient model for on-device tasks.

According to Google, Google DeepMind ran Gemini through a number of industry-standard benchmarks and found that Gemini Ultra's performance exceeds current state-of-the-art results on 30 of the 32 widely used industry benchmarks. These include MMLU (massive multitask language understanding), where Gemini Ultra scored 90.04%.

Gemini Ultra is currently available to select customers, developers, partners and safety and responsibility experts for early experimentation and feedback. It will then roll out to developers and enterprise customers in early 2024.

Gemini Pro, on the other hand, is already integrated with Google Bard, making the conversational AI far more capable at things like understanding and summarising, reasoning, brainstorming, writing and planning, among other tasks.

For now, the new AI model is available via its integration with Google Bard. Users can try it out starting today using text-based prompts, with other modalities expected to be added in the future. It's available in English in over 180 countries to start, and Google plans to add more languages and regions in the near future.

Gemini Nano is also powering various features in Google's flagship Pixel 8 Pro smartphone, including Summarise in the Recorder app and Smart Reply in Gboard.

Google plans to roll out Gemini to other products and services in the coming months, including Search, Ads, Chrome and Duet AI.

Gemini stands out from other AI models available today because it's natively multimodal. In comparison, GPT-4, OpenAI's latest and most advanced model so far, is primarily text-based. It only becomes multimodal with plugins and integrations, relying on DALL-E 3 and Whisper, for example, to generate images and process audio, respectively.

Google has unveiled Gemini, its most capable, flexible and general AI model yet.
Gemini is a multimodal model, meaning it can process not only text but also images, videos and audio.
For now, the new AI model is available via its integrations with Google Bard and the Google Pixel 8 Pro smartphone.
Gemini stands out from other AI models available today because it's natively multimodal, whereas OpenAI's GPT-4 model is primarily text-based.