
Independent Annotation Benchmarks

Transparent accuracy, agreement, and throughput benchmarks across annotation tasks, data types, and industry verticals.

Methodology

How Annotation Accuracy Is Measured

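Every benchmark we publish is derived from two core metrics, precision and recall, computed against a verified gold-standard reference set. Precision is the share of submitted annotations that match the reference; recall is the share of reference annotations that the annotators captured.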
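As a rough illustration of how the two metrics relate to a reference set, here is a minimal Python sketch. It assumes each annotation can be reduced to a hashable unit such as an `(item_id, label)` pair and that a match means exact equality. The function name `precision_recall` and the sample data are hypothetical, and real tasks such as bounding boxes typically need a looser matching rule that is not specified here.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of annotation units.

    Assumption: a unit is any hashable value, e.g. (item_id, label),
    and a prediction counts as correct only on an exact match.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Guard against empty inputs rather than dividing by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical sample data, for illustration only.
gold = {("doc1", "PERSON"), ("doc1", "ORG"), ("doc2", "LOC"), ("doc3", "ORG")}
predicted = {("doc1", "PERSON"), ("doc2", "LOC"), ("doc2", "ORG")}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case two of the three submitted annotations match the reference, and two of the four reference annotations were found, so precision is higher than recall.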

Reference Sets

What Are Gold Standard References?

A gold standard is a verified, high-confidence annotation set used as the ground truth when computing accuracy metrics. Every precision and recall figure we publish is measured against one of two reference types.

Consistency

Inter-annotator Agreement (IAA)

Accuracy tells you how close annotators are to the gold standard. IAA tells you how consistently they agree with each other. Both are required to trust a dataset: high accuracy against a noisy reference and high agreement around a systematic error are equally dangerous.
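The throughput figures on this page cite a κ threshold, so one plausible reading is that agreement is reported as Cohen's kappa. The sketch below assumes exactly that, for two annotators labeling the same items; the function name `cohens_kappa` and all sample labels are hypothetical, and other statistics (Fleiss' kappa, Krippendorff's alpha) may be used in practice.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators over the same items.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both pick the same label independently.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)


# Hypothetical labels: annotators disagree on 2 of 8 items.
ann_a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
ann_b = ["pos", "pos", "neg", "pos", "pos", "neg", "neg", "neg"]
kappa = cohens_kappa(ann_a, ann_b)

# Perfect agreement can still hide a shared error against the reference.
gold = ["cat", "dog", "cat", "dog"]
both = ["cat", "cat", "cat", "dog"]
shared_kappa = cohens_kappa(both, both)
shared_accuracy = sum(g == x for g, x in zip(gold, both)) / len(gold)
```

The second case is the failure mode described above: the two annotators agree completely, yet both miss the same reference label, which is why agreement is read alongside accuracy rather than instead of it.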

Speed & Scale

Throughput & Turnaround

Throughput is the volume of annotation units completed per hour. Turnaround is the calendar time from data intake to final delivery. Both are tracked continuously — and both are meaningless without the accuracy figures that accompany them.
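To make the two definitions concrete, here is a small Python sketch under stated assumptions: throughput is taken as completed units divided by annotation hours, and turnaround as elapsed calendar time between two timestamps. The function names, the batch size, and the dates are hypothetical and are not drawn from our benchmark data.

```python
from datetime import datetime


def throughput_per_hour(units_completed, annotation_hours):
    """Annotation units completed per hour of annotation work."""
    if annotation_hours <= 0:
        raise ValueError("annotation_hours must be positive")
    return units_completed / annotation_hours


def turnaround_hours(intake, delivery):
    """Calendar time from data intake to final delivery, in hours."""
    return (delivery - intake).total_seconds() / 3600


# Hypothetical batch, for illustration only.
rate = throughput_per_hour(units_completed=1200, annotation_hours=5)
elapsed = turnaround_hours(
    intake=datetime(2024, 3, 4, 9, 0),
    delivery=datetime(2024, 3, 5, 21, 0),
)
print(f"{rate:.0f} items/hour, {elapsed:.0f} h turnaround")
```

The two numbers measure different things: throughput counts only working hours, while turnaround includes nights, queues, and QA passes, so a fast annotation rate does not by itself imply a short delivery time.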

Pilot batch turnaround: <48 hrs. 100–500 items from intake to QA-cleared delivery.
Items / hour (classification): 200+. Sustained rate on single-label text tasks at κ ≥ 0.82.
Standard project SLA: 10 days. 10k-item bounding box or NER batch with two QA passes.
Iteration coverage: 100%. Every delivery includes a structured feedback loop window.

Want to See Our Full Benchmark Data?

Request a detailed accuracy report for your annotation type, domain, and quality tier.