DeepSeek Enhances OCR Performance with Alibaba Open-Source AI

tech360.tv
15 hours ago

DeepSeek, a Chinese artificial intelligence startup, has launched an upgraded version of its optical character recognition, or OCR, model, DeepSeek-OCR 2. This update incorporates an open-source system developed by Alibaba Cloud to significantly boost performance.

Blue and white DeepSeek banner with text: "Explore the Unknown." Features two buttons for starting a conversation and downloading the app. — Credit: DEEPSEEK

The new model replaces a core component of its predecessor's architecture with Alibaba Cloud’s lightweight Qwen2-0.5B model, according to research released by DeepSeek. The upgrade highlights the growing importance of China’s open-source ecosystem in driving domestic AI development.

DeepSeek-OCR 2 is now open-sourced on Hugging Face, a widely used open-source AI developer platform. Benchmark tests revealed that the updated model delivered a 3.73% performance improvement over its predecessor, which DeepSeek described as a meaningful gain on an already high accuracy base.

In the original model, DeepSeek relied on Contrastive Language-Image Pre-training, or CLIP. CLIP is a neural network from Microsoft-backed OpenAI that learns to associate images with text descriptions, which helps OCR systems interpret text embedded in images.
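
For readers curious about how CLIP-style matching works, the following is a minimal sketch in plain Python, not DeepSeek's or OpenAI's code. It scores one image embedding against several candidate captions using cosine similarity and a softmax. The embedding vectors, the captions, and the temperature value of 0.07 are all hypothetical, chosen only for illustration; a real system would obtain embeddings from trained image and text encoders.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def match_image_to_texts(image_vec, text_vecs, temperature=0.07):
    """Score one image embedding against several text embeddings.

    The temperature is an illustrative assumption; smaller values
    make the resulting distribution sharper.
    """
    sims = [cosine(image_vec, t) / temperature for t in text_vecs]
    return softmax(sims)

# Hypothetical 3-dimensional embeddings, made up for this example.
image_vec = [0.9, 0.1, 0.2]
text_vecs = {
    "a scanned invoice": [0.8, 0.2, 0.1],
    "a photo of a dog": [0.1, 0.9, 0.3],
}

probs = match_image_to_texts(image_vec, list(text_vecs.values()))
best_caption = max(zip(text_vecs, probs), key=lambda pair: pair[1])[0]
print(best_caption)
```

In this toy setup the image vector points in nearly the same direction as the first caption's vector, so that caption receives most of the probability mass.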

Chat interface with "Hi, I'm DeepSeek." at the top. Shows "Deep Think (50 messages left today) NEW" option toggled on. Blue and gray tones. — Credit: DEEPSEEK

DeepSeek stated that replacing CLIP with Alibaba’s Qwen2-0.5B enables its OCR model to process documents in a manner that emulates human reading. The model now follows flexible yet semantically coherent scanning patterns, driven by inherent logical structures, the research noted.

The upgrade shows how Chinese AI developers are increasingly building on each other’s open-source work to accelerate progress. For instance, last year, Beijing-based startup Moonshot AI launched its Kimi K2 system, which borrowed elements from DeepSeek’s V3 architecture while introducing significant redesigns, according to a company researcher.

The Kimi K2 launch resonated globally, with some experts describing it as another "DeepSeek moment," referring to the surprise impact of DeepSeek’s V3 and R1 model releases in late 2024 and early 2025. DeepSeek’s latest OCR update also follows academic scrutiny of its original approach.

Researchers from China and Japan recently questioned the initial DeepSeek-OCR research, finding that the model showed inconsistent performance under certain conditions. Their study indicated that the original system’s accuracy in visual question-answering tasks could drop to approximately 20% when exposed to additional text meant to influence its reasoning.

By contrast, standard AI models reached roughly 90% accuracy in similar scenarios. DeepSeek stated in its research that it plans to continue refining its OCR architecture for broader applications, advancing towards a more comprehensive vision of multimodal intelligence.

DeepSeek upgraded its OCR model to DeepSeek-OCR 2 by integrating Alibaba Cloud’s open-source Qwen2-0.5B model.
The new model demonstrated a 3.73% performance improvement over its previous version.
DeepSeek-OCR 2 has been open-sourced on the Hugging Face developer platform.

Source: SCMP