AeroBench Leaderboard
95 real-world EASA Form 1 and FAA 8130-3 documents with multi-model verified ground truth. Benchmark your aviation document extraction system against the industry standard.
Leaderboard
| # | Model | Organization | |||||
|---|---|---|---|---|---|---|---|
| 1 | BackToBirth v1 Claude Sonnet 4 | CodesDevs | 95.8% | 94.6% | 47.1% | 3% | 2026-02-19 |
Want to submit your results? Download the dataset and open a PR with your benchmark results.
Per-Field Breakdown
Safety-critical fields: errors in these fields can result in unairworthy parts entering aircraft.
Why AeroBench Matters
Aviation Safety
A single character error in a part number can put an unairworthy component on an aircraft. MROs manually transcribe thousands of release certificates monthly.
No Existing Benchmark
Until AeroBench, there was no public dataset for evaluating extraction accuracy on aviation release certificates. Vendors claimed accuracy with no standard to measure against.
False Accept Rate
The most dangerous failure mode is extracting the wrong value with high confidence, and the false accept rate measures how often that happens. A system that's 95% accurate but doesn't flag its errors is worse than one at 90% that knows when it's unsure.
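As a rough illustration of the 95% versus 90% comparison, the sketch below scores two made-up systems on accuracy and on a false accept rate. It assumes a false accept is a wrong value reported at or above a confidence threshold, counted over all fields. The `Extraction` record, the 0.9 threshold, and the function names are hypothetical and are not AeroBench's published scoring code.

```python
from dataclasses import dataclass


@dataclass
class Extraction:
    """One extracted field value (hypothetical record layout)."""
    predicted: str
    expected: str
    confidence: float  # model-reported confidence in [0, 1]


def accuracy(items: list[Extraction]) -> float:
    """Fraction of fields whose predicted value exactly matches ground truth."""
    if not items:
        return 0.0
    return sum(i.predicted == i.expected for i in items) / len(items)


def false_accept_rate(items: list[Extraction], threshold: float = 0.9) -> float:
    """Fraction of all fields that are wrong yet reported at or above threshold.

    Assumption: this definition is illustrative, not an official one.
    """
    if not items:
        return 0.0
    wrong_and_confident = sum(
        i.predicted != i.expected and i.confidence >= threshold for i in items
    )
    return wrong_and_confident / len(items)


# System A: 19 of 20 correct, and the one error is reported with high confidence.
system_a = [Extraction("PN-1", "PN-1", 0.99)] * 19 + [Extraction("PN-7", "PN-1", 0.98)]
# System B: 18 of 20 correct, and both errors carry low confidence (flagged for review).
system_b = [Extraction("PN-1", "PN-1", 0.99)] * 18 + [Extraction("PN-7", "PN-1", 0.40)] * 2

for name, system in (("A", system_a), ("B", system_b)):
    print(name, accuracy(system), false_accept_rate(system))
```

Under these assumptions, System A is more accurate but lets one wrong part number through unflagged, while System B sends both of its errors to a human reviewer.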
Annotation Methodology
Benchmark Your System
Download the dataset, run your extraction, and submit your results to the leaderboard.
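A minimal sketch of what a local scoring step might look like before submitting, assuming exact string matching per field. The dictionary layout (document ID mapped to field name mapped to value), the field names, and the sample values are all hypothetical; the actual dataset format and scoring rules may differ.

```python
from collections import defaultdict


def per_field_accuracy(truth, predicted):
    """Exact-match accuracy per field across all ground-truth documents.

    A document or field missing from `predicted` counts as wrong.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for doc_id, fields in truth.items():
        extracted = predicted.get(doc_id, {})
        for name, expected in fields.items():
            total[name] += 1
            if extracted.get(name) == expected:
                correct[name] += 1
    return {name: correct[name] / total[name] for name in total}


# Hypothetical ground truth for two documents.
truth = {
    "doc-001": {"part_number": "AB1234-5", "serial_number": "SN001"},
    "doc-002": {"part_number": "CD9876-1", "serial_number": "SN002"},
}
# Hypothetical system output with a single-character part number error in doc-002.
predicted = {
    "doc-001": {"part_number": "AB1234-5", "serial_number": "SN001"},
    "doc-002": {"part_number": "CD9876-7", "serial_number": "SN002"},
}

scores = per_field_accuracy(truth, predicted)
print(scores)
```

Exact matching is chosen here because a single wrong character in a part number is treated as a full error, which matches the safety concern described above.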