DeepSeek Founder Proposes AI Training Method to Bypass GPU Limits
- tech360.tv
A new artificial intelligence (AI) model training technique, co-authored by DeepSeek founder Liang Wenfeng and Peking University researchers, aims to bypass graphics processing unit (GPU) memory constraints. This development underscores the Hangzhou start-up’s focus on maximising cost efficiency amid a computational power deficit compared to US industry leaders.

The technical paper introduces a "conditional memory" technique called Engram, designed to address a key bottleneck in scaling AI models: the limited capacity of GPU high-bandwidth memory (HBM). Existing large language models (LLMs) often spend valuable sequential depth on trivial operations, depth that could instead be used for higher-level reasoning.
HBM represents one of China’s biggest AI hardware gaps with the US. According to Ray Wang, a Seoul-based analyst at SemiAnalysis, China’s memory champion ChangXin Memory Technologies (CXMT) lags several years behind industry leaders such as South Korea’s Samsung Electronics, SK Hynix, and Micron Technology of the US.
Engram addresses this by "decoupling" compute and memory, enabling models to "look up" basic information more efficiently. This new technique also purports to improve models’ efficiency in handling long context, a significant challenge for turning AI chatbots into useful AI agents.
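The report does not disclose how Engram implements this decoupling. As a rough illustration of the general idea only, the sketch below shows a generic lookup-style memory kept in ordinary host (CPU) memory and queried by GPU-resident layers via a cheap top-k retrieval; the class name LookupMemory, the table sizes and every design detail are hypothetical and not drawn from the paper.

```python
# Illustrative sketch only: not DeepSeek's actual Engram design.
# Generic idea of "decoupling" compute from a lookup-style memory: a large
# key/value table held in host memory (outside GPU HBM) is queried with a
# small top-k lookup, while compute-heavy layers stay on the GPU.
import torch
import torch.nn as nn


class LookupMemory(nn.Module):
    """A large, lookup-only key/value table held outside GPU HBM (hypothetical)."""

    def __init__(self, num_slots: int = 100_000, d_model: int = 256):
        super().__init__()
        # Keys and values live on the CPU so they do not consume HBM.
        self.keys = nn.Parameter(torch.randn(num_slots, d_model), requires_grad=False)
        self.values = nn.Parameter(torch.randn(num_slots, d_model), requires_grad=False)

    @torch.no_grad()
    def forward(self, query: torch.Tensor, k: int = 4) -> torch.Tensor:
        # query: (batch, d_model) activations from a GPU layer.
        q = query.to(self.keys.device)                # move to the memory's device
        scores = q @ self.keys.T                      # (batch, num_slots) similarity
        top_scores, top_idx = scores.topk(k, dim=-1)  # cheap top-k "look up"
        weights = torch.softmax(top_scores, dim=-1)   # (batch, k) mixing weights
        retrieved = self.values[top_idx]              # (batch, k, d_model)
        out = (weights.unsqueeze(-1) * retrieved).sum(dim=1)
        return out.to(query.device)                   # back to the compute device


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    memory = LookupMemory()                           # table stays on the CPU
    hidden = torch.randn(8, 256, device=device)       # GPU-side activations
    recalled = memory(hidden)                         # (8, 256) retrieved context
    print(recalled.shape)
```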
Researchers validated Engram in a 27-billion-parameter model, finding that it boosted performance on major industry benchmarks by several percentage points. Crucially, it also freed up more of the model's capacity for computationally demanding complex reasoning.
They envision conditional memory as "an indispensable modelling primitive for next-generation sparse models," likening its potential impact to their own variant of the Mixture-of-Experts technique, which allowed model size to scale without a proportional increase in compute.
DeepSeek has been a poster child for China’s AI innovation over the past year. Elie Bakouch, a research engineer at open-source developer platform Hugging Face, praised the paper for validating the technique "with hardware at inference and training."
The paper lists 14 co-authors, including Huishuai Zhang, an assistant professor of computer science at Peking University and a former researcher at Microsoft Research Asia. The lead author is Cheng Xin, a Peking University student who also contributed to DeepSeek’s landmark V3 and R1 models.

Anticipation is high for a major new DeepSeek model around the one-year anniversary of its R1 release. US tech media outlet The Information reported on Friday that DeepSeek was expected to launch a new V4 model with strong coding capabilities in mid-February.
- DeepSeek founder Liang Wenfeng co-authored a paper proposing Engram, an AI training technique.
- Engram aims to bypass GPU memory constraints and improve efficiency in handling long context.
- The technique was validated in a 27-billion-parameter model, showing performance boosts and more capacity for complex reasoning.
Source: SCMP