DeepSeek-OCR: Contexts Optical Compression

AI-generated keywords: DeepSeek-OCR

AI-generated Key Points

  • DeepSeek-OCR's deep parsing abilities allow it to analyze images within documents through secondary model calls
  • With one unified prompt, the model can extract structured information from financial charts, natural images, chemical formulas, and planar geometric figures
  • It demonstrates impressive multilingual recognition proficiency by handling nearly 100 languages present in PDF documents on the internet
  • The adaptability of DeepSeek-OCR to different languages emphasizes its utility in processing multilingual data for LLM/VLM pretraining
  • On OmniDocBench, it surpasses GOT-OCR2.0 (256 tokens/page) with only 100 vision tokens and outperforms MinerU2.0 (6000+ tokens per page on average) with fewer than 800, which makes it practical for generating LLM/VLM training data at scale
  • Overall, DeepSeek-OCR's capabilities make it a promising tool for research areas such as historical long-context compression and memory forgetting mechanisms in LLMs

Authors: Haoran Wei, Yaofeng Sun, Yukun Li

License: CC BY 4.0

Abstract: We present DeepSeek-OCR as an initial investigation into the feasibility of compressing long contexts via optical 2D mapping. DeepSeek-OCR consists of two components: DeepEncoder and DeepSeek3B-MoE-A570M as the decoder. Specifically, DeepEncoder serves as the core engine, designed to maintain low activations under high-resolution input while achieving high compression ratios to ensure an optimal and manageable number of vision tokens. Experiments show that when the number of text tokens is within 10 times that of vision tokens (i.e., a compression ratio < 10x), the model can achieve decoding (OCR) precision of 97%. Even at a compression ratio of 20x, the OCR accuracy still remains at about 60%. This shows considerable promise for research areas such as historical long-context compression and memory forgetting mechanisms in LLMs. Beyond this, DeepSeek-OCR also demonstrates high practical value. On OmniDocBench, it surpasses GOT-OCR2.0 (256 tokens/page) using only 100 vision tokens, and outperforms MinerU2.0 (6000+ tokens per page on average) while utilizing fewer than 800 vision tokens. In production, DeepSeek-OCR can generate training data for LLMs/VLMs at a scale of 200k+ pages per day (a single A100-40G). Codes and model weights are publicly accessible at http://github.com/deepseek-ai/DeepSeek-OCR.
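
The figures in the abstract can be checked with a little arithmetic. The sketch below is illustrative only and is not taken from the DeepSeek-OCR codebase: the function names and the 900-token example page are assumptions, and the compression ratio is taken to be text tokens divided by vision tokens, as the abstract's wording suggests. The token counts, precision figures, and throughput are the ones quoted in the abstract.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Text tokens represented per vision token (assumed definition)."""
    if vision_tokens <= 0:
        raise ValueError("vision_tokens must be positive")
    return text_tokens / vision_tokens


def reported_regime(ratio: float) -> str:
    """Map a ratio onto the data points quoted in the abstract."""
    if ratio < 10:
        return "about 97% decoding precision reported"
    if ratio <= 20:
        return "between the reported 97% (<10x) and about 60% (20x) points"
    return "beyond the ratios reported in the abstract"


# Hypothetical page: 900 text tokens encoded into 100 vision tokens.
ratio = compression_ratio(900, 100)
print(f"compression ratio: {ratio:.1f}x -> {reported_regime(ratio)}")

# Vision-token budgets per page quoted for OmniDocBench.
got_ocr_reduction = 256 / 100    # GOT-OCR2.0 vs. DeepSeek-OCR at 100 tokens
mineru_reduction = 6000 / 800    # lower bound: 6000+ vs. fewer than 800
print(f"vs GOT-OCR2.0: {got_ocr_reduction:.2f}x fewer vision tokens")
print(f"vs MinerU2.0: more than {mineru_reduction:.1f}x fewer vision tokens")

# Production throughput quoted for a single A100-40G.
pages_per_second = 200_000 / SECONDS_PER_DAY
print(f"throughput: about {pages_per_second:.2f} pages per second")
```

The MinerU2.0 comparison is a lower bound because the abstract gives "6000+" and "fewer than 800" rather than exact counts.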

Submitted to arXiv on 21 Oct. 2025


Results of the summarizing process for the arXiv paper: 2510.18234v1

A recent study explores DeepSeek-OCR's "deep parsing" capability, in which the model analyzes images embedded in documents through secondary model calls. With one unified prompt, it can extract structured information from financial charts, natural images in books and articles, chemical formulas in STEM documents, and planar geometric figures, which points to applications across diverse fields.

DeepSeek-OCR also recognizes nearly 100 languages found in PDF documents on the internet. It supports both layout and non-layout OCR formats for languages such as Arabic and Sinhala, which makes it useful for processing multilingual data for LLM/VLM pretraining.

Its practical value lies in generating training data for LLMs/VLMs at scale. On OmniDocBench, it achieves high OCR accuracy with fewer vision tokens than existing models such as GOT-OCR2.0 and MinerU2.0. Overall, DeepSeek-OCR's deep parsing, multilingual recognition, and practical performance make it a promising tool for research areas such as historical long-context compression and memory forgetting mechanisms in LLMs. Publicly available code and model weights further increase its value for researchers.
Created on 23 Oct. 2025

