
olmOCR-7B: An Efficient Open-Source Model Built for Document Extraction

Type: Hot-topic roundup, 2026-07-04

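Document extraction remains one of the hard problems in AI data processing. The content of PDFs and scanned images looks simple, yet recovering it as clean plain text routinely goes wrong. The release of olmOCR-7B is a notable advance in this area. The model is based on Qwen2-VL-7B-Instruct and was fine-tuned on a 250,000-page dataset with one core goal: converting PDFs and document images efficiently into clean, structured plain text. Below we look at what it does well and compare it with current mainstream tools to see how large its practical advantage is.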


olmOCR-7B: A Leading Model for PDF-to-Text Conversion and Document Extraction

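Language models depend on plain text for training, inference, and serving, and text quality directly determines the results. Noisy text can destabilize training, degrade model performance, and even lead to garbled responses to user requests. The catch is that a great deal of valuable data does not live in clean web pages; it is locked inside electronic documents such as PDFs. PDF was designed to render content on fixed-size pages, not to preserve logical text structure, which makes it very hard to parse: character encodings, position information, and formatting marks are interleaved, and correctly recovering headings, paragraphs, tables, and equations in reading order is difficult.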

1. Overview of olmOCR's core capabilities

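To tackle this problem, the research team released olmOCR, a high-performance toolkit dedicated to converting PDFs and document images into clean, structured plain text. What sets it apart?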

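  • Strong performance: fine-tuned on a 250,000-page dataset drawn from a wide variety of PDFs, including born-digital documents and scans of public-domain books. This broad coverage supports extraction accuracy across document types.
  • Cost efficiency: processing one million PDF pages with the olmOCR toolkit costs about $190, roughly 1/32 of the cost of batch-processing the same pages through the GPT-4o API. A gap of that size is enough for many teams to reconsider an API-based approach.
  • Markdown output: results are emitted as Markdown, which is easy to parse and post-process. The model handles equations, tables, and handwriting, and follows the correct reading order even on complex multi-column layouts.
  • Works out of the box: optimized for the SGLang and vLLM inference engines, it scales from a single GPU to hundreds of GPUs and includes built-in heuristics for common parsing failures and metadata errors.
  • Fully open source: built on Qwen2-VL-7B-Instruct, with the model weights, fine-tuning dataset, and training and inference code all released.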
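The cost claim in the list above is simple arithmetic, and a small sketch makes the scale concrete. It uses only the two figures quoted in the article ($190 per million pages and the 1/32 ratio); linear scaling with page count is an assumption, and the GPT-4o figure is merely what the ratio implies, not an official price. The function names are hypothetical.

```python
# Figures quoted in the article.
OLMOCR_COST_PER_MILLION_PAGES_USD = 190.0
GPT4O_TO_OLMOCR_COST_RATIO = 32  # "about 1/32 of the GPT-4o batch cost"


def olmocr_cost(pages: int) -> float:
    """Estimated olmOCR cost in USD, assuming cost scales linearly with pages."""
    return pages * OLMOCR_COST_PER_MILLION_PAGES_USD / 1_000_000


def implied_gpt4o_batch_cost(pages: int) -> float:
    """GPT-4o batch cost implied by the quoted 32x ratio (not an official price)."""
    return olmocr_cost(pages) * GPT4O_TO_OLMOCR_COST_RATIO


if __name__ == "__main__":
    for pages in (10_000, 1_000_000):
        print(f"{pages:>9,} pages: olmOCR ~ ${olmocr_cost(pages):,.2f}, "
              f"implied GPT-4o batch ~ ${implied_gpt4o_batch_cost(pages):,.2f}")
```

Under these assumptions, one million pages comes to about $190 with olmOCR against an implied figure of about $6,080 for the GPT-4o batch API.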

Next, we compare olmOCR directly with several other mainstream document extraction tools to see how large the difference is in practice.

2. olmOCR vs. other document extraction tools: side-by-side comparison

The comparison uses sample documents and focuses on the key differences in output quality. Each tool's output is shown under its name.

2.1 Handwritten letter recognition

olmOCR

Executive Mansion,

Washington City,

January 15th, 1864

Major General Hitchcock, Commissioner of Exchanges, is authorized and directed to offer Brigadier General Trimble, now a prisoner of war in Fort McHenry, in exchange for Major White, who is held as a prisoner at Richmond. He is also directed to send forward the offer of exchange by Henry M. Warfield, Esq. of Baltimore, under a flag of truce, and give him a pass to City Point.

Abraham Lincoln

Marker

necuhve Mansion Vastington amany layor Seneral Hitchcocks Commissioner of Cachanges, is anthonged and directed to offer Bingadier General Trin prisoner of war in Fort Inctienny, in exchange now w Major White, who is held as a preises at Richmond Ite is also directed to vand forwards the offer of exchange by Stenny in. Warfield, Eag. of Baltimore, under aflag 11 mice, and give him apass to tity Point. Abrakan Sincolus 

GOT OCR 2.0

43571
Bachington City
January 10th 1864.
Major General Architect, Commissioner of aivachangera
is authorized and directed by ffeed Bngader General Trelmble,
new a firemen of war in Fert nchery in exchange for
Mayor White, who held a a firemen at Hillmannd.
He is aker conducted by end forward the offer of exchange
by Henry in. Warfield, Lag. of Balthmore, under a flag
of three, and five him afaies to City Bink.
Abraham Lincoln

MinerU

No text detected.

2.2 Math textbook text extraction

A calculus textbook page containing equations, used to test how well each tool reproduces mathematical formulas.

olmOCR

3.4 EXERCISES

For the following exercises, the given functions represent the position of a particle tra veling along a horizontal line.

a. Find the velocity and acceleration functions.

b. Determine the time intervals when the object is slowing down or speeding up.

150. ( s(t) = 2t^3 - 3t^2 - 12t + 8 )

151. ( s(t) = 2t^3 - 15t^2 + 36t - 10 )

152. ( s(t) = \frac{t}{1 + t^2} )

153. A rocket is fired vertically upward from the ground. The distance s in feet that the rocket tra vels from the ground after t seconds is given by ( s(t) = -16t^2 + 560t ).

a. Find the velocity of the rocket 3 seconds after being fired.

b. Find the acceleration of the rocket 3 seconds after being fired.

154. A ball is thrown downward with a speed of 8 ft/s from the top of a 64-foot-tall building. After t seconds, its height above the ground is given by ( s(t) = -16t^2 - 8t + 64 ).

a. Determine how long it takes for the ball to hit the ground.

b. Determine the velocity of the ball when it hits the ground.

155. The position function ( s(t) = t^2 - 3t - 4 ) represents the position of the back of a car backing out of a driveway and then driving in a straight line, where s is in feet and t is in seconds. In this case, ( s(t) = 0 ) represents the time at which the back of the car is at the garage door, so ( s(0) = -4 ) is the starting position of the car, 4 feet inside the garage.

a. Determine the velocity of the car when ( s(t) = 0 ).

b. Determine the velocity of the car when ( s(t) = 14 ).

156. The position of a hummingbird flying along a straight line in t seconds is given by ( s(t) = 3t^3 - 7t ) meters.

a. Determine the velocity of the bird at ( t = 1 ) sec.

b. Determine the acceleration of the bird at ( t = 1 ) sec.

c. Determine the acceleration of the bird when the velocity equals 0.

157. A potato is launched vertically upward with an initial velocity of 100 ft/s from a potato gun at the top of an 85-foot-tall building. The distance in feet that the potato tra vels from the ground after t seconds is given by ( s(t) = -16t^2 + 100t + 85 ).

a. Find the velocity of the potato after 0.5 s and 5.75 s.

b. Find the speed of the potato at 0.5 s and 5.75 s.

c. Determine when the potato reaches its maximum height.

d. Find the acceleration of the potato at 0.5 s and 1.5 s.

e. Determine how long the potato is in the air.

f. Determine the velocity of the potato upon hitting the ground.

158. The position function ( s(t) = t^3 - 8t ) gives the position in miles of a freight train where east is the positive direction and t is measured in hours.

a. Determine the direction the train is tra veling when ( s(t) = 0 ).

b. Determine the direction the train is tra veling when ( a(t) = 0 ).

c. Determine the time intervals when the train is slowing down or speeding up.

159. The following graph shows the position ( y = s(t) ) of an object moving along a straight line.

a. Use the graph of the position function to determine the time intervals when the velocity is positive, negative, or zero.

b. Sketch the graph of the velocity function.

c. Use the graph of the velocity function to determine the time intervals when the acceleration is positive, negative, or zero.

d. Determine the time intervals when the object is speeding up or slowing down.

Marker

- a. Determine the direction the train is tra veling when *s*(*t*) = 0.
- b. Determine the direction the train is tra veling when *a*(*t*) = 0.
- c. Determine the time intervals when the train is slowing down or speeding up.

159. The following graph shows the position *y* = *s*(*t*) of an object moving along a straight line.

![](_page_0_Figure_34.jpeg)

- negative, or zero. b. Sketch the graph of the velocity function.
- c. Use the graph of the velocity function to determine the time intervals when the acceleration is positive, negative, or zero.
- d. Determine the time intervals when the object is speeding up or slowing down.

GOT OCR 2.0

Chapter 3 | Derivatives
273
3.4 EXERCISES
For the following exercises, the given functions represent
the position of a particle tra veling along a horizontal line.
a.
Find the velocity and acceleration functions.
b.
Determine the time intervals when the object is
slowing down or speeding up.
150.
s(t) = 2t3 −3t2 −12t + 8
151.
s(t) = 2t3 −15t2 + 36t −10
152.
s(t) =
t
1 + t2
153.
A rocket is fired vertically upward from the ground.
The distance s in feet that the rocket tra vels from the
ground after t seconds is given by s(t) = −16t2 + 560t.
a.
Find the velocity of the rocket 3 seconds after being
fired.
b.
Find the acceleration of the rocket 3 seconds after
being fired.
154.
A ball is thrown downward with a speed of 8 ft/
s from the top of a 64-foot-tall building. After t seconds,
its height above the ground is given by s(t) = −16t2 −8t + 64.
a.
Determine how long it takes for the ball to hit the
ground.
b.
Determine the velocity of the ball when it hits the
ground.
155.
The position function s(t) = t2 −3t −4 represents
the position of the back of a car backing out of a driveway
and then driving in a straight line, where s is in feet and
t is in seconds. In this case, s(t) = 0 represents the time
at which the back of the car is at the garage door, so
s(0) = −4 is the starting position of the car, 4 feet inside
the garage.
a.
Determine the velocity of the car when s(t) = 0.
b.
Determine the velocity of the car when s(t) = 14.
156.
The position of a hummingbird flying along a straight
line in t seconds is given by s(t) = 3t3 −7t
2
2
2
...
2
2
2
a.
Use the graph of the position function to determine
the time intervals when the velocity is positive,
negative, or zero.
b.
Sketch the graph of the velocity function.
c.
Use the graph of the velocity function to determine
the time intervals when the acceleration is positive,
negative, or zero.
d.
Determine the time intervals when the object is
speeding up or slowing down.
157.
A potato is launched vertically upward with an initial
velocity of 100 ft/s from a potato gun at the top of an
85-foot-tall building. The distance in feet that the potato
tra vels from the ground after t seconds is given by
s(t) = −16t2 + 100t + 85.
a.
Find the velocity of the potato after 0.5 s and
5.75 s.
b.
Find the speed of the potato at 0.5 s and 5.75 s.
c.
Determine when the potato reaches its maximum
height.
d.
Find the acceleration of the potato at 0.5 s and 1.5
s.
e.
Determine how long the potato is in the air.
f.
Determine the velocity of the potato upon hitting
the ground.
158.
The position function s(t) = t3 −8t gives the
position in miles of a freight train where east is the positive
direction and t is measured in hours.
a.
Determine the direction the train is tra veling when
s(t) = 0.
b.
Determine the direction the train is tra veling when
a(t) = 0.
c.
Determine the time intervals when the train is
slowing down or speeding up.
159.
The following graph shows the position y = s(t) of
an object moving along a straight line.
155.
The position of a hummingbird flying along a straight
line in t seconds is given by s(t) = 3t3 −7t
2
2 3
3
3
.....
125.5
126
126.5

MinerU

a. Determine the direction the train is tra veling when $s(t)=0$ .  
b. Determine the direction the train is tra veling when $a(t)=0$ .
c. Determine the time intervals when the train is slowing down or speeding up.

159. The following graph shows the position $y=s(t)$ of an object moving along a straight line.

a. Use the graph of the position function to determine the time intervals when the velocity is positive, negative, or zero.
b. Sketch the graph of the velocity function.
c. Use the graph of the velocity function to determine the time intervals when the acceleration is positive, negative, or zero.
d. Determine the time intervals when the object is speeding up or slowing down.

2.3 Historical document restoration

An old historical document with faded text and poor lighting, used to test each model's robustness to noise.

olmOCR

Christians beha ving themselves like Mahomedans.

4. The natives soon had reason to suspect the viceroy's sincerity in his expressions of regret at the proceedings of which they complained. For about this time the Dominican friars, under pretence of building a convent, erected a fortress on the island of Solor, which, as soon as finished, the viceroy garrisoned with a strong force. The natives very naturally felt indignant at this additional encroachment, and took every opportunity to attack the garrison. The monks, forgetful of their peaceable profession, took an active part in these skirmishes, and many of them fell sword in hand.

The Mahomedan faith has been appropriately entitled, The religion of the sword; and with equal propriety may we so designate the religion of these belligerent friars. The Portuguese writers give an account of one of their missionaries, Fernando Vinagre, who was as prompt in the field of battle as at the baptismal font. This man, though a secular priest, undertook the command of a squadron that was sent to the assistance of the rajah of Tidore, on which occasion he is said to ha ve acted in the twofold capacity of a great commander, and a great apostle, at one time appearing in armour, at another in a surplice; and even occasionally, baptizing the converts of his sword without putting off his armour, but covering it with his ecclesiastical vest. In this crusade he had two

Marker

## **IN INDIA *** BOOK TI. S69
Christians beha ving themselves like Ma borne- a. dans.3 . extquotedblleft5/0-
*t>.*

The natives soon had reason to suspect the viceroy, viceroy's sincerity in his expressions of regret at the proceedings of which they complained. extquotedblleft n. extquotedblleft' For about this time the Dominican friars, under pretence of building a. convent, erected a fortress on the island of Sol or, which, as soon as finished, the viceroy garrisoned with a strong force. The natives' very naturally felt indig-S nant at this additional encroachment, and took every opportunity to attack the garrison. The monks, forgetful/ of their peaceable profession, took an active part in these skirmishes, and many of tbg.tr fell sword in hand.

The i'lfinomedan faith has been appropriately entitled., extquoteleft The religion of the sword extquoteright,; and with equal propriety may we so designate the re- . i'gv.m of these belligerent friars. The Portugu writers give an account of one of their extquoteleft missionaries, extquoteright Fernando Vinagre, who was as prompt in the field of battle as at the baptismal font. This man, though a secular priest, undertook the command of a squadron that was I sent to the assistance of the rajah of Tidore,4 on which occasion he is said to ha ve acted in the twofold capacity of a great commander, and a great apostle, at one time appearing in armour, ; at another in a surplice; and even occasionally, baptizing the converts of his sword without putting off his armour, but covering it with his ecclesiastical vest. In this crusade5 he had two
> 3 Geddes History, &c., pp. 24---27. Pudet hae c opprobria nobis Vel dici potuisse.
> 4 Called extquoteleft T a d u ra extquoteright or extquoteleft D a c o, extquoteright an island in the Indian Ocean, one of the Moluccas
> 5 extquoteleft These extquoteleft a la D ra g o o n extquoteright conversions. extquoteright Geddes' History, p. 27.

GOT OCR 2.0

 IN INDIA:  BOOK U 269 Christians beha ving themselves like Mahome-  1670.  4. The natives son had reason to suspect the Viceroy' s vice roy' s sincerity in his expressions of regret in s in e eri ty at the proceedings of which they complained.  fl it ars.  For about this time the Dominican f mars, under pre ten ce of building a convent, erected a for-  tress on the island of Sol or, which, as soon as finished, the vice roy garrisoned with a strong force. The natives very naturally felt indig-  nant at this additional encroachment, and took every opportunity to attack the garrison. The monks, forgetful of their peaceable profession,  took an active part in these skirmishes, and many of the n fell sword in hand.  The Mh on med an faith has been appropriately entitled. The religion of the sword; and with e ral Tropriety may we so designate the re-  gian of these belligerent friars. The Port u-  gue s writers give an account of one of their mission are s, Fer endo Vina gre, who was as prompt in the fe ld of battle as at the baptismal font. This man, though a secular priest, un-  der took the command of a squadron that was sent to the assistance of the rajah of Tidore, on which occasion he is said to ha ve acted in the twofold capacity of a great commander, and a great apostle, at one time appearing in armour,  at another in a surplice; and even occasionally,  baptizing the converts of his sword without put-  ting off his armour, but covering it with his ecclesiastical vest. In this crusade he had two 3 Ged des History, & c. , pp. 24-27.  P ude th aec opp rob ria nobis Vel die ipo tui sse.  Called Tadur u or Daco, an island in the Indian Ocean,  one of the Mol ucc as These a laDra goon conversions. Ged des History, p. 27.

MinerU

 ININDIASY BOOKU
Christians bcha ving.themselves like Mahome dans.3

4.The natives soon had reason to suspect ihe viceroy's sincerity in his expressions of regret at the proceedings of which they complained. For about this time the Dominican friars,under pretenceof building a convent,erected a for tress on the island of Solorwhich,as soon as finishedthe viceroy garrisoned with a strong force. The natives very naturally felt indig nant at this additional encroachment, and took every pportunity to attack the garrison.The monks,forgetful of their peaceable profession took an activa part in these skirmishes, and many of tbein feil sword in hand.

TheMahornedan faithhas been appropriately ntitled.The religion of the swordand with equal propriety may we so designate the region of these belligerent friars.The Portugueswriters give an account of one of their missionarzes,femando Vinagre,who was as prompt in the field of battle as at the baptismal font. This man, though a secular priest, undertook the command of a squadron that was sent to the assistance of the rajah of Tidore,4 on which occasion he is said to ha ve acted in the twofold capacity of a great commander, and a great apostle, at one time appearing in armour, at another in a surplice;and even occasionally baptizing the converts of his sword without put ting off his armour, but covering it with his ecclesiastical vest.In this crusadehe had two

3. How olmOCR was built

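Training olmOCR first required solving the problem of obtaining high-quality training data. The research team developed a technique called "document anchoring": in short, it makes full use of the text and metadata already embedded in a PDF file to improve extraction quality.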

Figure 1: Example of how document anchoring works on a typical page

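The method extracts the relevant image positions and text blocks, then concatenates them and inserts them into the model prompt. When a vision-language model (VLM) is prompted for a plain-text version of the document, it consults both the anchored text and the rasterized image of the page.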
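The anchoring step described above can be sketched as follows. This is a minimal illustration, not olmOCR's actual implementation: the `TextBlock` and `ImageBox` types, the line format, the character budget, and the instruction wording are all assumptions made for the example, and sending the rasterized page image to the VLM alongside the prompt is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextBlock:
    """A run of text with the page coordinates where it starts (hypothetical type)."""
    x: float
    y: float
    text: str


@dataclass
class ImageBox:
    """Bounding box of an embedded image on the page (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float


def build_anchor_text(width: float, height: float,
                      text_blocks: List[TextBlock],
                      image_boxes: List[ImageBox],
                      max_chars: int = 4000) -> str:
    """Concatenate image positions and text blocks into one anchor string."""
    lines = [f"Page dimensions: {width:.0f}x{height:.0f}"]
    for box in image_boxes:
        lines.append(f"[Image {box.x0:.0f}x{box.y0:.0f} to {box.x1:.0f}x{box.y1:.0f}]")
    for block in text_blocks:
        lines.append(f"[{block.x:.0f}x{block.y:.0f}] {block.text}")
    # Keep the prompt bounded; the 4000-character budget is an assumption.
    return "\n".join(lines)[:max_chars]


def build_prompt(anchor_text: str) -> str:
    """Wrap the anchor text in an instruction; the wording is illustrative only."""
    return (
        "Below is raw text extracted from a PDF page; the page image is attached.\n"
        "Return the plain-text version of the page in natural reading order.\n"
        "RAW_TEXT_START\n" + anchor_text + "\nRAW_TEXT_END"
    )


if __name__ == "__main__":
    anchor = build_anchor_text(
        612, 792,
        [TextBlock(72, 700, "Introduction")],
        [ImageBox(100, 200, 300, 400)],
    )
    print(build_prompt(anchor))
```

The point of the design is that the model does not have to rely on pixels alone: text that the PDF already stores exactly is handed over as a hint, while the image supplies layout and anything the embedded text misses.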

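Using document anchoring, the team labeled 250,000 pages with GPT-4o. The data came from publicly available PDFs crawled from the web and public-domain books scanned by the Internet Archive. The mix is diverse: 60% academic papers, 12% brochures, 11% legal documents, 6% charts and diagrams, 5% slides, and 4% other types.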

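To train the model itself, the team fine-tuned the Qwen2-VL-7B-Instruct checkpoint and carefully optimized a large-scale batch inference pipeline. With SGLang, olmOCR converts one million PDF pages for only $190, about 1/32 of the GPT-4o API cost. Beyond the lower cost, olmOCR also outperformed other popular OCR tools in human evaluation.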

Figure 2: ELO scores of olmOCR compared with other popular tools

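For evaluation, the team compared olmOCR's output with that of Marker, MinerU, and GOT-OCR 2.0. They collected pairwise judgments from 11 researchers on samples drawn from 2,017 PDFs, which yielded 452 meaningful comparisons, and computed ELO scores from them. olmOCR scored above 1800, well ahead of every competitor. In head-to-head comparisons, olmOCR was preferred over Marker 61.3% of the time, over GOT-OCR 58.6% of the time, and over MinerU 71.4% of the time, reflecting its ability to produce clean, well-structured text.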
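For readers unfamiliar with how pairwise judgments become ELO scores, here is a minimal sketch of the standard Elo update. The article does not state the team's K-factor, starting rating, or exact procedure, so the values below (K = 32, start = 1500, one sequential pass) are assumptions, and the function names are hypothetical.

```python
import math
from typing import Dict, Iterable, Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_from_judgments(judgments: Iterable[Tuple[str, str]],
                       k: float = 32.0, start: float = 1500.0) -> Dict[str, float]:
    """Process (winner, loser) pairs in order and return the final ratings."""
    ratings: Dict[str, float] = {}
    for winner, loser in judgments:
        r_w = ratings.setdefault(winner, start)
        r_l = ratings.setdefault(loser, start)
        # The winner gains exactly what the loser gives up.
        delta = k * (1.0 - expected_score(r_w, r_l))
        ratings[winner] = r_w + delta
        ratings[loser] = r_l - delta
    return ratings


def rating_gap_for_win_rate(p: float) -> float:
    """Rating difference at which the Elo model predicts win rate p (0 < p < 1)."""
    return 400.0 * math.log10(p / (1.0 - p))


if __name__ == "__main__":
    sample = [("olmOCR", "Marker"), ("olmOCR", "MinerU"), ("Marker", "olmOCR")]
    print(elo_from_judgments(sample))
    print(round(rating_gap_for_win_rate(0.613), 1))
```

For intuition only: under this model a 61.3% head-to-head win rate corresponds to a gap of roughly 80 rating points. The published scores come from the full set of comparisons across all tools, so they need not match a gap derived from any single pairing.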

More evaluation details are available in the technical report.

4. How to get olmOCR

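The first release of olmOCR includes a demo, the model weights, the fine-tuning dataset, a short technical report, and, most importantly, an efficient inference pipeline.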

Visit the GitHub repository to install olmOCR and explore the documentation. On a machine with a GPU, simply run the following command:

python -m olmocr.pipeline ./localworkspace --pdfs tests/gnarly_pdfs/horribleocr.pdf
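If the pipeline needs to be launched from a script rather than a shell, the same command can be assembled programmatically. The sketch below uses only the module name, positional workspace argument, and `--pdfs` flag that appear in the command above; the helper names are hypothetical, and it assumes olmOCR is installed and `python` on the PATH is the right interpreter.

```python
import shlex
import subprocess
from typing import List


def build_olmocr_command(workspace: str, pdf_path: str) -> List[str]:
    """Build the argv list for the pipeline command quoted in the article."""
    return ["python", "-m", "olmocr.pipeline", workspace, "--pdfs", pdf_path]


def run_olmocr(workspace: str, pdf_path: str) -> None:
    """Run the pipeline and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_olmocr_command(workspace, pdf_path), check=True)


if __name__ == "__main__":
    cmd = build_olmocr_command("./localworkspace", "tests/gnarly_pdfs/horribleocr.pdf")
    # Print the command instead of running it, so this works without a GPU.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when a PDF path contains spaces.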

The team plans to release more quantitative benchmarks soon to help develop better PDF extraction models and evaluate their performance.

Original article: https://olmocr.allenai.org/blog

Source: https://www.53ai.com/news/finetuning/2025032721587.html
