Ubuntu本地部署dots.ocr

发布时间：2025-12-23 21:24:41编辑：123阅读（905）

基于Ubuntu 24.04.2 LTS + RTX 4070完成

dots.ocr 是一个强大的多语言文档解析器，它在一个单一的视觉-语言模型中统一了布局检测和内容识别，同时保持良好的阅读顺序。尽管其基础是紧凑的1.7B参数的大规模语言模型，但它达到了最先进的（SOTA）性能。

强大的性能： dots.ocr 在 OmniDocBench 上实现了文本、表格和阅读顺序的SOTA性能，同时在公式识别方面与更大的模型如Doubao-1.5和gemini2.5-pro相比也具有可比性。

多语言支持： dots.ocr 对低资源语言表现出强大的解析能力，在我们的内部多语言文档基准测试中，在布局检测和内容识别方面都取得了决定性的优势。

统一且简单的架构：通过利用单一的视觉-语言模型，dots.ocr 提供了一个比依赖复杂多模型流水线的传统方法更简洁的架构。任务之间的切换只需简单地改变输入提示即可实现，证明了一个VLM可以与传统的检测模型如DocLayout-YOLO相比达到竞争性的检测结果。

高效且快速的性能：基于紧凑的1.7B大规模语言模型构建，dots.ocr 比许多基于更大基础的高性能模型提供了更快的推理速度。

github地址:https://github.com/rednote-hilab/dots.ocr

Anaconda通过conda实现包管理与虚拟环境隔离，支持多版本Python环境创建及切换

安装Anaconda

sudo wget https://repo.anaconda.com/archive/Anaconda3-2024.10-1-Linux-x86_64.sh

sudo bash Anaconda3-2024.10-1-Linux-x86_64.sh

一路yes安装完，重新进去ubuntu，前面会有个base

创建conda虚拟环境

conda create -n dots_ocr python=3.12

conda activate dots_ocr

git clone https://github.com/rednote-hilab/dots.ocr.git

cd dots.ocr

pip加速

pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/

查看pip是否生效

pip config list

安装ninja

ninja是一个编译加速的包，因为安装flash-attn需要编译，如果不按照ninja，编译速度会很慢，所以建议先安装ninja，再安装flash-attn

pip install ninja

dots.ocr项目需要torch 2.7.0，并且flash-attn v2.8.0post2最高支持torch2.7.0。因此这里更新torch为2.7.0版本

pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu128

然后去flash-attn官网下载flash_attn-2.8.0.post2

github地址：https://github.com/Dao-AILab/flash-attention/releases

注意torch版本和python版本

查看torch版本

python -c "import torch; print(torch.__version__)"

下载完成后进入dots.ocr根目录下安装

cd ..

wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.0.post2/flash_attn-2.8.0.post2+cu12torch2.7cxx11abiFALSE-cp312-cp312-linux_x86_64.whl

安装flash_attn-2.8.0.post2

pip install flash_attn-2.8.0.post2+cu12torch2.7cxx11abiFALSE-cp312-cp312-linux_x86_64.whl

安装dots.ocr编译环境

cd dots.ocr/

注释掉requirements.txt里面的flash-attn==2.8.0.post2

vim requirements.txt

pip install -e .

然后Download Model Weights，这里使用国内源

python3 tools/download_model.py --type modelscope

安装vllm框架

dots.ocr可以使用vllm框架加速推理，所以还需要部署vllm。根据dots.ocr官方说明安装vllm 0.9.1版本

pip install vllm==0.9.1

配置并启动vllm服务

设置模型路径环境变量

export hf_model_path=./weights/DotsOCR

将模型路径添加到 PYTHONPATH

export PYTHONPATH=$(dirname "$hf_model_path"):$PYTHONPATH

修改 vllm 入口脚本以注册 DotsOCR 模型

sed -i '/^from vllm\.entrypoints\.cli\.main import main$/a\

from DotsOCR import modeling_dots_ocr_vllm' `which vllm`

启动 vLLM 服务器

vllm serve ${hf_model_path} \

--tensor-parallel-size 1 \

--gpu-memory-utilization 0.85 \

--chat-template-content-format string \

--served-model-name model \

--trust-remote-code

关键字：

上一篇： Python搭建一个RAG系统(分片/检索/召回/重排序/生成)

下一篇： LangChain-1.0教程-(介绍，模型接入)



搜索

热门推荐

最新文章

LangChain 1.0-Agent中间件-实现闭环(批准-编辑-拒绝动作)
 82°
LangChain 1.0-Agent中间件-汇总消息
 89°
LangChain 1.0-Agent中间件-删除消息
 92°
LangChain 1.0-Agent中间件-消息压缩
 87°
LangChain 1.0-Agent中间件-多模型动态选择
 156°
LangChain1.0-Agent-部署/上线(开发人员必备)
 315°
LangChain1.0-Agent-Spider实战(爬虫函数替代API接口)
 360°
LangChain1.0-Agent(进阶)本地模型+Playwright实现网页自动化操作
 355°
LangChain1.0-Agent记忆管理
 334°
LangChain1.0-Agent接入自定义工具与React循环
 373°

博主信息

姓名：Run
职业：谜
邮箱：383697894@qq.com
定位：上海 · 松江

扫我打开

友情链接

百度 淘宝 腾讯 慕课网 CSDN 博客园 51cto博客