DeepSeek Unveils Multimodal AI for Efficient Text Compression

  • Writer: tech360.tv
  • 11 hours ago
  • 2 min read

DeepSeek, a Hangzhou-based artificial intelligence start-up, released a new open-source multimodal AI model on Monday. Named DeepSeek-OCR, the model processes large documents with significantly fewer tokens by utilising visual perception to compress information.


Blue and black abstract background featuring a white whale logo and the text "deepseek" in white. Bold and modern design.
Credit: Cath Virginia / The Verge

DeepSeek-OCR, available on online developer platforms Hugging Face and GitHub, emerged from an investigation into vision encoders' role in compressing text for large language models. This approach enables LLMs to process extensive text without a proportional increase in computing costs.


The company said DeepSeek-OCR reduces token counts by between seven and 20 times, depending on the historical context stage. This offers a promising direction for addressing long-context challenges in LLMs.


This release continues DeepSeek's efforts to make AI models more efficient and cheaper to build and use. The organisation followed the same principle in developing its open-source V3 and R1 models, released in December and January, respectively.


Graph with purple and blue bars showing precision vs. text tokens per page. Right chart features encoder performance on average vision tokens.
Credit: GitHub

DeepSeek-OCR comprises two main components: DeepEncoder, the encoder, and DeepSeek3B-MoE-A570M, the decoder. DeepEncoder serves as the model's core engine, keeping activations low under high-resolution inputs while achieving strong compression ratios.


The decoder, a Mixture-of-Experts model with about 570 million active parameters, reconstructs the original text. Its architecture divides the model into separate sub-networks, or experts, each of which specialises in a subset of the input data, and which jointly perform a task.
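
To make the idea of experts concrete, here is a minimal sketch of top-k expert routing, the general technique behind Mixture-of-Experts layers. It is not DeepSeek's implementation: the toy experts, the router scores, and the function names are all assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Convert raw router scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Pick the k highest-scoring experts and renormalise their weights."""
    probs = softmax(router_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weight_sum = sum(probs[i] for i in chosen)
    return {i: probs[i] / weight_sum for i in chosen}


def moe_forward(x, experts, router_scores, k=2):
    """Weighted sum of the outputs of the selected experts only."""
    weights = route_top_k(router_scores, k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Four toy "experts", each just a scalar function (hypothetical stand-ins).
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
router_scores = [0.1, 2.0, 1.5, -1.0]  # made-up router output for one token

selected = route_top_k(router_scores, k=2)
print(sorted(selected))  # indices of the experts that run for this token
output = moe_forward(3.0, experts, router_scores, k=2)
print(output)
```

Only the selected experts are evaluated for a given input, which is why a model of this kind can hold many parameters in total while using a fraction of them per token.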


Beyond standard vision tasks like image captioning and object detection, DeepSeek-OCR parses highly structured visual content. This includes tables, formulas, and geometric diagrams, benefiting applications in finance and science.


Benchmark tests showed DeepSeek-OCR achieved 97% decoding accuracy when the number of text tokens was within ten times the number of vision tokens, that is, at a compression ratio below 10x. Even at a 20x ratio, the model maintained around 60% accuracy, retaining much of the information despite extreme compression.
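
The compression ratio in these figures is the number of text tokens divided by the number of vision tokens used to represent the same content. A minimal sketch of that arithmetic follows; the function name and the sample token counts are illustrative assumptions, not figures published by DeepSeek.

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Return how many text tokens each vision token stands in for."""
    if vision_tokens <= 0:
        raise ValueError("vision_tokens must be positive")
    return text_tokens / vision_tokens


# Hypothetical page: 1,000 text tokens rendered into 100 vision tokens.
ratio = compression_ratio(1000, 100)
print(f"{ratio:.1f}x")  # 10.0x
```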


On OmniDocBench, a benchmark for diverse document understanding, DeepSeek-OCR surpassed major OCR models, including GOT-OCR 2.0 and MinerU 2.0. It accomplished this while using far fewer tokens.


The new model can also generate over 200,000 pages of training data daily on a computing system powered by a single Nvidia A100-40G graphics processing unit.


DeepSeek-OCR points to a scalable way of handling ultra-long contexts: recent content is preserved at high resolution, while older context is compressed more heavily and consumes fewer computing resources.
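
The effect of compressing older context more aggressively can be sketched as simple token accounting. This is an illustration only, not DeepSeek's mechanism: the 10x and 20x ratios echo the benchmark figures above, while the age cut-offs, the uncompressed tier for recent turns, and all names are assumptions.

```python
import math

# Assumed tiers: (maximum age in turns, compression ratio applied).
# The age cut-offs and the 1x tier for recent turns are invented here.
TIERS = [(2, 1), (10, 10), (float("inf"), 20)]


def ratio_for_age(age: int) -> int:
    """Return the compression ratio for a turn that is `age` turns old."""
    if age < 0:
        raise ValueError("age must be non-negative")
    for max_age, ratio in TIERS:
        if age <= max_age:
            return ratio
    raise AssertionError("unreachable: the last tier has no upper bound")


def context_cost(turn_tokens):
    """Total token budget when older turns are stored more compactly.

    `turn_tokens` lists the text-token count of each turn, oldest first.
    """
    newest = len(turn_tokens) - 1
    total = 0
    for index, tokens in enumerate(turn_tokens):
        age = newest - index
        total += math.ceil(tokens / ratio_for_age(age))
    return total


# Hypothetical history: 20 turns of 1,000 text tokens each.
history = [1000] * 20
print(sum(history))           # 20000 tokens if nothing is compressed
print(context_cost(history))  # 4250 tokens under the assumed tiers
```

Under these assumed tiers the three most recent turns stay at full cost, the next eight cost a tenth each, and the remaining nine cost a twentieth each.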


This suggests DeepSeek-OCR could facilitate theoretically unlimited context architectures, balancing information retention with efficiency.


In late September, the company launched DeepSeek V3.2-Exp, an experimental version of its V3 model that improves training and inference efficiency while sharply reducing application programming interface costs.

  • DeepSeek released DeepSeek-OCR, a new open-source multimodal AI model, on Monday.

  • The model uses visual perception to compress text input, significantly reducing tokens for large language models.

  • DeepSeek-OCR improves AI model efficiency, lowers computing costs, and enhances long-context processing.


Source: SCMP
