Scalable Delphi: evidence flows into an LLM expert panel, producing structured probability estimates

Scalable Delphi

Large language models for structured risk estimation

Calibrated, auditable risk estimates through structured deliberation — scalable to hundreds of parameters and orders of magnitude cheaper and faster than traditional expert elicitation.

Risk models for high-stakes domains such as cybersecurity, nuclear safety, and aviation need probability estimates for quantities that cannot be measured directly. The Delphi method (iterative expert elicitation with anonymized feedback) is the gold standard for obtaining them. However, a single study can take months to years and require many highly qualified experts, which puts rigorous risk assessment out of reach for most applications.

Scalable Delphi adapts the classical Delphi protocol for LLM agents: diverse expert personas estimate independently, a mediator synthesizes anonymized feedback, and panelists refine their estimates through structured deliberation.

01 Expert panel: diverse LLM personas estimate independently.
02 Mediation: anonymized feedback and a summary of rationales.
03 Refinement: revised estimates after deliberation.
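
The three steps above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `Panelist`, `mediate`, and `delphi` are hypothetical. Each LLM persona is replaced by a stub that closes part of the gap to the group median, controlled by an invented `openness` parameter. The interquartile-range stopping rule is also an assumed consensus criterion. In a real panel, `refine` would prompt an LLM with the anonymized feedback, including the pooled rationales.

```python
from dataclasses import dataclass
from statistics import median, quantiles


@dataclass
class Panelist:
    """Hypothetical stand-in for one LLM expert persona."""
    persona: str
    estimate: float   # current probability estimate in [0, 1]
    rationale: str
    openness: float   # hypothetical: fraction of the gap to the group median closed per round

    def refine(self, feedback):
        # A real panelist would prompt an LLM with `feedback` (median, spread,
        # anonymized rationales). This stub only moves toward the median.
        self.estimate += self.openness * (feedback["median"] - self.estimate)


def mediate(panel):
    """Mediator: summarize the round without attributing anything to a persona."""
    estimates = [p.estimate for p in panel]
    q1, _, q3 = quantiles(estimates, n=4)
    return {
        "median": median(estimates),
        "iqr": q3 - q1,
        # Sorted so the order of rationales does not reveal who wrote them.
        "rationales": sorted(p.rationale for p in panel),
    }


def delphi(panel, max_rounds=5, iqr_stop=0.05):
    """Independent estimates arrive via `panel`; each round mediates, then refines."""
    history = [[p.estimate for p in panel]]
    for _ in range(max_rounds):
        feedback = mediate(panel)
        if feedback["iqr"] <= iqr_stop:  # assumed consensus criterion
            break
        for p in panel:
            p.refine(feedback)
        history.append([p.estimate for p in panel])
    # Aggregate the final round with the median (an assumption here,
    # though the median is a common Delphi aggregate).
    return median(p.estimate for p in panel), history
```

For example, a four-persona panel starting at 0.10, 0.20, 0.30, and 0.40 with `openness=0.5` halves its spread each round. It stops after three refinement rounds with a median estimate of 0.25.
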
93%: combined correlation with ground truth across three benchmarks.
Expert alignment: LLM panels are closer to human experts than the experts are to each other.
Minutes: vs. months for a traditional Delphi study.
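
The claim that LLM panels are closer to human experts than the experts are to each other can be illustrated with a small sketch. The metric is an assumption: the sketch uses the mean absolute difference between per-task difficulty estimates (in percentage points), and the paper's actual agreement measure may differ. The function names and toy numbers are hypothetical and are not data from the study.

```python
from statistics import mean


def mean_abs_diff(x, y):
    """Mean absolute difference between two panels' per-task estimates."""
    if len(x) != len(y) or not x:
        raise ValueError("panels must rate the same non-empty list of tasks")
    return mean(abs(a - b) for a, b in zip(x, y))


def alignment_report(llm, human_a, human_b):
    """Compare LLM-to-human distance with the human-to-human baseline.

    Returns (average LLM-to-human distance, human-to-human distance, LLM closer?).
    """
    llm_to_humans = (mean_abs_diff(llm, human_a) + mean_abs_diff(llm, human_b)) / 2
    human_to_human = mean_abs_diff(human_a, human_b)
    return llm_to_humans, human_to_human, llm_to_humans < human_to_human
```

With invented numbers, `alignment_report([20, 45, 70], [10, 50, 80], [30, 40, 60])` returns roughly `(8.33, 16.67, True)`. The stub LLM panel sits about 8.3 points from each human panel, while the two human panels are about 16.7 points apart.
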

We tested Scalable Delphi on cybersecurity risk estimation — predicting task difficulty across three benchmarks and comparing estimates to independent human expert panels.

Calibration (figure): predicted vs. actual difficulty (%). Each dot is one task; the dashed line marks perfect prediction.
Expert alignment (figure): LLM panels compared with two independent human expert panels, with tasks ordered by difficulty.
Tobias Lorenz, Mario Fritz · CISPA Helmholtz Center for Information Security · 2026

LLM panels running the Delphi protocol achieve strong calibration (r=0.87–0.95) against benchmark ground truth and align closely with human expert panels, reducing elicitation time from months to minutes.
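
The correlation values above compare each task's predicted difficulty with its observed benchmark difficulty. The following is a minimal Pearson correlation sketch. `pooled_r` assumes that the "combined" correlation pools tasks from all benchmarks before correlating. That is a guess, not the paper's stated method, and the benchmark keys in any call are placeholders.

```python
from math import sqrt


def pearson_r(predicted, actual):
    """Pearson correlation between predicted and observed per-task difficulty."""
    if len(predicted) != len(actual) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 tasks")
    n = len(predicted)
    mp = sum(predicted) / n
    ma = sum(actual) / n
    cov = sum((p - mp) * (a - ma) for p, a in zip(predicted, actual))
    sp = sqrt(sum((p - mp) ** 2 for p in predicted))
    sa = sqrt(sum((a - ma) ** 2 for a in actual))
    if sp == 0 or sa == 0:
        raise ValueError("correlation is undefined when one side has no variance")
    return cov / (sp * sa)


def pooled_r(benchmarks):
    """Hypothetical combined score: pool every task from all benchmarks, then correlate.

    `benchmarks` maps a benchmark name to a (predicted, actual) pair of lists.
    """
    preds, acts = [], []
    for predicted, actual in benchmarks.values():
        preds.extend(predicted)
        acts.extend(actual)
    return pearson_r(preds, acts)
```

Pearson r ignores any constant offset or scale in the predictions, so a panel can reach r = 1 while sitting off the dashed perfect-prediction line. Reporting an error measure such as mean absolute error alongside r covers that gap.
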

BibTeX
@misc{lorenz2026scalabledelphi,
    title         = {Scalable Delphi: Large Language Models for Structured Risk Estimation},
    author        = {Tobias Lorenz and Mario Fritz},
    year          = {2026},
    eprint        = {2602.08889},
    archivePrefix = {arXiv},
    primaryClass  = {cs.AI},
    url           = {https://arxiv.org/abs/2602.08889}
}