Beijing AI Academy Unveils Multimodal Model

  • Writer: tech360.tv
  • Oct 22, 2024

Beijing Academy of Artificial Intelligence (BAAI) unveils Emu3, a groundbreaking multimodal AI model. Emu3 can interpret text, images, and video simultaneously, showcasing China's technological advancement. BAAI's innovative approach simplifies model training by using a unified AI architecture.


This move helps position Chinese developers as serious contenders in AI innovation, narrowing the gap with leading US counterparts in the sector.


Facing challenges like restricted access to advanced chips and limited capital compared to US companies, Chinese AI startups are striving to match the rapid pace of model development set by industry giants like OpenAI and Google. BAAI, a non-profit organisation, plays a pivotal role in fostering growth within China's AI community.


At a recent event in Beijing, BAAI showcased Emu3, its latest multimodal model. Emu3 uses a streamlined architecture to train a single model that can both comprehend images and generate video clips. Unlike traditional models that focus on a single data type, multimodal models can process several kinds of input together, such as text, images, video, and audio; in Emu3's case, text, images, and video.


Wang Zhongyuan, the head of BAAI (also known as the Zhiyuan Institute), hailed Emu3 as the "largest technological contribution in recent years" from the organisation. Emu3 employs a unified AI architecture that converts text, images, and video clips into tokens, the fundamental units of data processed by AI models.


This innovative approach eliminates the need for separate models to handle different data types, streamlining the training process and enhancing efficiency in developing versatile AI models. BAAI reported that Emu3 surpasses established task-specific models like Stable Diffusion XL in image generation and the multimodal model LLaVA in both understanding and creating images.
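
To make the idea of a single token stream more concrete, here is a minimal Python sketch of how text and image data might share one vocabulary. It is an illustration only, not BAAI's code: the tiny word list, the codebook size, the `<img>` markers, the pre-quantised image codes and every function name are assumptions made for this example, and it assumes the model is trained to predict the next token in the sequence.

```python
# Toy illustration only. The vocabulary, the whitespace "tokeniser", the
# codebook size and the image codes are assumptions for this sketch,
# not details reported for Emu3.

TEXT_VOCAB = ["<bos>", "<eos>", "<img>", "</img>", "a", "photo", "of", "cat"]
TEXT_IDS = {word: i for i, word in enumerate(TEXT_VOCAB)}

VISUAL_CODEBOOK_SIZE = 16        # hypothetical number of discrete image codes
VISUAL_OFFSET = len(TEXT_VOCAB)  # image codes are placed after the text IDs


def encode_text(text: str) -> list[int]:
    """Map each whitespace-separated word to its ID in the toy vocabulary."""
    return [TEXT_IDS[word] for word in text.lower().split()]


def encode_image(codes: list[int]) -> list[int]:
    """Shift image codes into the shared ID space and wrap them in markers."""
    if any(not 0 <= c < VISUAL_CODEBOOK_SIZE for c in codes):
        raise ValueError("image code outside the codebook")
    shifted = [VISUAL_OFFSET + c for c in codes]
    return [TEXT_IDS["<img>"]] + shifted + [TEXT_IDS["</img>"]]


def build_sequence(text: str, image_codes: list[int]) -> list[int]:
    """Concatenate text and image tokens into one flat sequence."""
    return (
        [TEXT_IDS["<bos>"]]
        + encode_text(text)
        + encode_image(image_codes)
        + [TEXT_IDS["<eos>"]]
    )


def next_token_pairs(seq: list[int]) -> list[tuple[list[int], int]]:
    """Build (context, target) pairs for a next-token prediction objective."""
    return [(seq[:i], seq[i]) for i in range(1, len(seq))]


# The image codes stand in for the output of an image tokeniser.
sequence = build_sequence("a photo of a cat", [3, 0, 15, 7])
print(sequence)
print(len(next_token_pairs(sequence)), "training pairs")
```

Because every token, whether it came from text or from an image, is just an integer in the same range, one model can be trained on the whole sequence without a separate network per data type.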


Source: SCMP

