China's Meituan Trains Largest AI Model on Local Hardware

tech360.tv

China's food delivery giant Meituan has introduced what it asserts is the country's largest artificial intelligence model trained entirely on domestically developed hardware. This development signals a strategic shift beyond merely using local chips for AI model inference.

AI chip on a dense circuit board, with glowing gold contacts and black components in a futuristic close-up. — Credit: UNSPLASH

The Beijing-based on-demand services company recently open-sourced LongCat 2.0, a new large language model, or LLM. The model has 1.6 trillion parameters and a context window of 1 million tokens, a scale comparable to DeepSeek's V4 pro model, which became available earlier this year. Meituan claims LongCat 2.0 is the industry's first trillion-parameter model to complete full-process training and inference on a 50,000-card domestic computing cluster.

Unlike DeepSeek V4 pro, which used domestically developed chips only for inference, LongCat 2.0 used local hardware for both inference and the more intensive pre-training phase, according to Meituan. Pre-training is a computationally demanding process in which an AI model processes extensive data sets to learn fundamental patterns.

Flowchart showing LongCat SFT feeding Agent, Reasoning, and Interaction Experts into MOPD, ending at LongCat 2.0 unicorn mascot. — Credit: LongCat

Meituan stated that LongCat 2.0 was built entirely on "large scale clusters of tens of thousands of AI ASIC superpods." This demonstrated its capacity to perform advanced training on alternative hardware platforms. An ASIC, or application-specific integrated circuit, is a chip designed for a specific set of tasks, in contrast to a general-purpose processor.

While Meituan did not name its hardware supplier, the company indicated in a WeChat announcement that it used the Huawei Collective Communication Library, or HCCL, to improve training stability. HCCL is an inter-chip communication library, similar to the Nvidia Collective Communication Library, or NCCL. Neither Meituan nor Huawei Technologies immediately commented on the matter.
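
For readers unfamiliar with what a collective communication library does, the sketch below simulates one common collective, an all-reduce, in plain Python. It is an illustration only: the function name `all_reduce_sum` and the toy gradient values are hypothetical, and it does not use or reflect the actual HCCL or NCCL APIs, which run across physical accelerators rather than Python lists.

```python
from typing import List


def all_reduce_sum(worker_grads: List[List[float]]) -> List[List[float]]:
    """Simulate an all-reduce: every worker ends up with the element-wise sum.

    Hypothetical helper for illustration; real libraries perform this step
    over inter-chip links, not on in-memory Python lists.
    """
    if not worker_grads:
        return []
    length = len(worker_grads[0])
    if any(len(g) != length for g in worker_grads):
        raise ValueError("all workers must hold vectors of the same length")
    # Reduce: sum each position across workers.
    total = [sum(g[i] for g in worker_grads) for i in range(length)]
    # Broadcast: hand every worker its own copy of the result.
    return [list(total) for _ in worker_grads]


# Three simulated workers, each holding a two-element gradient (toy values).
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
reduced = all_reduce_sum(grads)
```

In distributed training, a step like this keeps every device's copy of the gradients in agreement, which is one reason the communication layer matters for training stability at large scale.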

China's domestically developed AI chips have been widely adopted for model inference, in line with Beijing's policy of technological self-reliance. Local hardware, however, has often been deemed inadequate for LLM pre-training, with most Chinese models developed on domestic hardware remaining small or confined to multimodal applications.

Meituan's announcement drew immediate attention from industry observers. Tech analyst TP Huang commented on social media platform X that the development eased concerns that Atlas 950 SuperPoDs could not train large LLMs for companies such as Zhipu AI and DeepSeek. Huawei had previously unveiled these computing cluster products.

Hanchi Sun, a computer science PhD student focusing on LLM research at Lehigh University, also noted on X the "near frontier performance, trained on 50k Chinese domestic accelerators," describing it as a first. LongCat 2.0 demonstrates strong capabilities in coding and agentic tasks, Meituan reported.

The model outperformed Google's older Gemini 3.1 Pro on certain benchmarks, including Terminal Bench 2.1 and SWE Bench Pro, Meituan said. However, it acknowledged that LongCat 2.0 still trails leading global models such as OpenAI's GPT 5.5 and Anthropic's Claude 4.8 Opus. The new Meituan model has not yet been evaluated on prominent benchmarks including Artificial Analysis and Arena, nor on advanced tests like Agents' Last Exam and CyberGym.

Despite this milestone, Meituan acknowledged ongoing challenges in replacing Western graphics processing units, or GPUs. The company noted in a technical report published with the launch that "compared to the mature Nvidia GPU ecosystem, the supporting software community is still less developed."

Pre-training the model on a cluster exceeding 50,000 chips posed "significant challenges at a system level due to both model and cluster scale," the company revealed, identifying memory as the "primary bottleneck." Meituan further stated that its domestic accelerators hold considerably less memory per device than Nvidia's H800 chip, which is banned from export to China under US regulations. To address these limitations, the delivery firm dedicated substantial effort to establishing a stable, secure, and scalable infrastructure, implementing various optimisations.
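
To give a rough sense of why memory becomes the constraint at this scale, the back-of-envelope sketch below estimates how many devices would be needed just to hold the state of a 1.6-trillion-parameter model. The byte counts are assumptions, not figures from Meituan's report: 2 bytes per parameter for 16-bit weights alone, and roughly 16 bytes per parameter for a typical mixed-precision Adam setup (16-bit weights and gradients plus 32-bit master weights and two optimizer moments). The 80 GB figure is the H800's memory capacity, used only as a reference point; the helper `min_devices` is hypothetical, and activations, long-context buffers, and any replication are deliberately ignored.

```python
PARAMS = 1_600_000_000_000          # 1.6 trillion parameters (from the article)
H800_BYTES = 80 * 10**9             # 80 GB per device, used as a reference point

BYTES_WEIGHTS_ONLY = 2              # assumption: 16-bit weights
BYTES_TRAINING_STATE = 16           # assumption: mixed-precision Adam state


def min_devices(params: int, bytes_per_param: int, device_bytes: int) -> int:
    """Smallest device count whose combined memory holds the model state.

    Assumes perfectly even sharding and counts nothing but per-parameter
    state, so real requirements would be higher.
    """
    total_bytes = params * bytes_per_param
    return -(-total_bytes // device_bytes)  # integer ceiling division


weights_tb = PARAMS * BYTES_WEIGHTS_ONLY / 10**12
training_tb = PARAMS * BYTES_TRAINING_STATE / 10**12
devices_for_weights = min_devices(PARAMS, BYTES_WEIGHTS_ONLY, H800_BYTES)
devices_for_training = min_devices(PARAMS, BYTES_TRAINING_STATE, H800_BYTES)

print(f"weights only:   {weights_tb:.1f} TB, at least {devices_for_weights} devices")
print(f"training state: {training_tb:.1f} TB, at least {devices_for_training} devices")
```

Under these assumptions, the weights alone occupy about 3.2 TB and the training state about 25.6 TB. Because devices with less memory than the 80 GB reference each hold a smaller shard, the model must be split across more of them, which in turn increases the amount of inter-chip communication.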

Meituan released LongCat 2.0, an AI model trained entirely on domestically developed Chinese hardware.
The model features 1.6 trillion parameters and a 1-million-token context window.
LongCat 2.0 used domestic hardware for both inference and the more demanding pre-training phase.
Industry analysts have acknowledged this achievement as significant for China's AI development.
Challenges persist regarding software ecosystem maturity and memory constraints compared to Western GPUs.

Source: SCMP