Ant Group Unveils AI Inference Framework Outperforming Nvidia, Berkeley Competitors

  • Writer: tech360.tv
  • Oct 16
  • 2 min read

Chinese fintech giant Ant Group has open-sourced dInfer, an inference framework for diffusion language models, which it claims makes artificial intelligence systems more efficient. The Alibaba Group Holding affiliate stated dInfer surpasses a framework proposed by Nvidia and is faster than an open-source inference engine developed by researchers at the University of California, Berkeley.


[Image: Building facade with the Ant Financial logo. Credit: ANT GROUP]

Ant Group, the operator of Alipay, announced dInfer on Monday. The framework is built for diffusion language models, which generate outputs in parallel, unlike autoregressive systems such as ChatGPT, which produce text one token at a time. Diffusion models are already widely utilised in image and video generation.
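To illustrate the structural difference described above, here is a toy sketch (not Ant's dInfer code, and the "model" is just a placeholder that fills in dummy tokens): an autoregressive decoder needs one forward pass per token, while a diffusion-style decoder refines every position at once over a fixed number of denoising steps.

```python
# Toy illustration of sequential vs parallel decoding.
# Real models replace the placeholder logic with neural network calls.

def autoregressive_decode(length):
    """Produce tokens one at a time; one model call per token."""
    tokens, calls = [], 0
    for _ in range(length):
        calls += 1                  # one forward pass per new token
        tokens.append("tok")
    return tokens, calls

def diffusion_decode(length, steps=4):
    """Refine all positions in parallel over a fixed number of steps."""
    tokens, calls = ["[MASK]"] * length, 0
    for _ in range(steps):
        calls += 1                  # one forward pass updates every position
        tokens = ["tok" for _ in tokens]
    return tokens, calls

_, ar_calls = autoregressive_decode(32)   # 32 model calls for 32 tokens
_, dl_calls = diffusion_decode(32)        # 4 model calls for 32 tokens
```

The fixed step count is why diffusion decoding can, in principle, be much faster per token; the engineering challenge frameworks like dInfer target is making each of those parallel passes cheap enough in practice.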


The company asserted that dInfer is up to three times faster than vLLM, an open-source inference engine from researchers at the University of California, Berkeley. It is also 10 times faster than Fast-dLLM, Nvidia's own framework.


Autoregressive language models, including OpenAI’s GPT-3.5 and DeepSeek’s R1, have largely powered the chatbot boom due to their strengths in understanding and generating human language. Nevertheless, researchers continue to explore diffusion language models for potentially greater capabilities.


Ant Group’s focus on alternative model paradigms highlights how China’s technology firms are enhancing algorithmic and software optimisation. This strategy aims to counterbalance the country’s disadvantages in advanced AI chips.


Internal tests conducted on Ant’s diffusion model LLaDA-MoE showed dInfer generated an average of 1,011 tokens per second on the HumanEval code-generation benchmark. This compares with 91 tokens per second for Nvidia’s Fast-dLLM and 294 for Alibaba’s Qwen-2.5-3B model, which was optimised with vLLM.
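A quick back-of-the-envelope check shows the reported throughput figures are consistent with the headline claims: dividing dInfer's tokens-per-second figure by each baseline's gives roughly the stated 10x and 3x margins.

```python
# Reported HumanEval throughput figures (tokens per second) from the article.
dinfer = 1011      # dInfer running Ant's LLaDA-MoE diffusion model
fast_dllm = 91     # Nvidia's Fast-dLLM framework
qwen_vllm = 294    # Alibaba's Qwen-2.5-3B served with vLLM

print(f"vs Fast-dLLM: {dinfer / fast_dllm:.1f}x")   # about 11x
print(f"vs vLLM:      {dinfer / qwen_vllm:.1f}x")   # about 3.4x
```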


Researchers noted that these results help address a primary limitation of diffusion language models: their high computational cost. "We believe that dInfer provides both a practical toolkit and a standardised platform to accelerate research and development in the rapidly growing field of dLLMs," Ant researchers wrote in a technical report.


This announcement follows other artificial intelligence activities from the Hangzhou-based firm. On Tuesday, Ant Group unveiled a one-trillion-parameter large language reasoning model, one of the world's biggest open-sourced models, which scored strongly on reasoning benchmarks.


Ant Group entered the AI model race in 2023 with a self-developed financial large language model. Its current portfolio includes the Ling series non-thinking large language models, Ring series reasoning models, Ming series multimodal models, and the experimental diffusion model LLaDA-MoE. The company is also developing AWorld, a framework supporting continual learning among AI agents.



"At Ant Group, we believe artificial general intelligence (AGI) should be a public good – a shared milestone for humanity’s intelligent future," said Chief Technology Officer He Zhengyu. AGI refers to a theoretical AI system that could surpass humans in most economically valuable tasks, a goal for companies like OpenAI and Alibaba.


Other Chinese technology firms are also experimenting with alternative model paradigms. In late July, TikTok owner ByteDance introduced Seed Diffusion Preview, a diffusion language model it claimed achieved speeds five times faster than comparable autoregressive models.

  • Ant Group open-sourced dInfer, an AI inference framework for diffusion language models.

  • dInfer is claimed to be up to 10 times faster than Nvidia’s Fast-dLLM and three times faster than vLLM from the University of California, Berkeley.

  • The framework helps address the high computational cost typically associated with diffusion language models.


Source: SCMP

As technology advances and shapes our lives more than ever before, staying informed is the only way to keep up. Through our product reviews and news articles, we aim to help our readers do just that. All of our reviews are carefully written, offer unique insights and critiques, and provide trustworthy recommendations. Our news stories are drawn from reliable sources, fact-checked by our team, and presented with the help of AI to make them easier to comprehend. If you notice any errors in our product reviews or news stories, please email us at editorial@tech360.tv. Your input helps keep our articles accurate for all of our readers.

Tech360tv is Singapore's tech news and gadget reviews platform. Join us for our in-depth PC reviews, smartphone reviews, audio reviews, camera reviews and other gadget reviews.

  • YouTube
  • Facebook
  • TikTok
  • Instagram
  • Twitter
  • LinkedIn

© 2021 tech360.tv. All rights reserved.