
Alibaba Unveils Lightweight AI Model for Image and Video Processing on Mobile Devices

  • Writer: tech360.tv
  • Mar 28, 2025

Alibaba Group Holding has launched a new multimodal artificial intelligence model, Qwen2.5-Omni-7B, capable of processing text, images, audio and video directly on smartphones, tablets and laptops.


[Image: Qwen2.5-Omni logo. Credit: Alibaba]

The model, introduced on Thursday, is the latest addition to Alibaba’s Qwen family and is designed to run locally on devices with limited computing power. With only 7 billion parameters, it enables real-time responses in text or audio without requiring an internet connection.


[Image: A cartoon bear named Ethan saying "I can explain PPTs, web materials, and more." Credit: Alibaba]

Qwen2.5-Omni-7B is open source and available on Hugging Face, Microsoft-owned GitHub and Alibaba’s ModelScope. It is also integrated into Alibaba’s Qwen Chat.


Alibaba highlighted potential applications such as providing real-time audio descriptions for visually impaired users and offering cooking guidance by analysing ingredients. The model’s ability to handle multiple input types reflects growing demand for AI systems that extend beyond text generation.





In benchmark tests, Qwen2.5-Omni-7B scored 56.1 on OmniBench, outperforming Google’s Gemini-1.5-Pro, which scored 42.9. It also achieved 92.4 on the CV15 audio benchmark, surpassing Alibaba’s earlier Qwen2-Audio model by one point.


For image-related tasks, the model scored 59.2 on the Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark (MMMU), beating the Qwen2.5-VL vision-language model.


The release aligns with a broader industry trend toward efficient, multimodal AI models that prioritise portability and data privacy. These models can operate without cloud-based processing, reducing reliance on external servers.


Other tech firms are also advancing in this space. OpenAI recently added image generation to its GPT-4o model, while ByteDance introduced InfiniteYou, a tool that re-crafts images while preserving subjects’ identities. In January, DeepSeek released Janus-Pro, an updated version of its multimodal model.


Alibaba’s Qwen models have become popular among AI developers in mainland China, positioning the company as a key competitor to DeepSeek’s V3 and R1 models.

  • Alibaba launched Qwen2.5-Omni-7B, a multimodal AI model for mobile devices

  • The model processes text, images, audio and video locally without an internet connection

  • It outperformed Google’s Gemini-1.5-Pro in benchmark tests


Source: SCMP

