Scalable Delphi
Calibrated, auditable risk estimates through structured deliberation — scalable to hundreds of parameters and orders of magnitude cheaper and faster than traditional expert elicitation.
📄 Read the preprint: arXiv:2602.08889
Risk models for high-stakes domains (cybersecurity, nuclear safety, aviation) need probability estimates for quantities that can't be directly measured. The Delphi method, iterative expert elicitation with anonymized feedback, is the gold standard. But a single study can take months to years and require many highly qualified experts, placing rigorous risk assessment out of reach for most applications.
Scalable Delphi adapts the classical Delphi protocol for LLM agents: diverse expert personas estimate independently, a mediator synthesizes anonymized feedback, and panelists refine their estimates through structured deliberation.
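The round structure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation from the preprint: the names `Feedback`, `mediate`, `run_delphi`, and `toy_panelist` are hypothetical, the mediator here only reports the median and quartiles of the panel, and the panelists are deterministic stand-ins where a real system would call an LLM with a persona prompt.

```python
from dataclasses import dataclass
from statistics import median, quantiles
from typing import Callable, List, Optional, Sequence


@dataclass
class Feedback:
    """Anonymized summary returned to every panelist (hypothetical format)."""
    median: float
    low: float   # lower quartile of the panel's estimates
    high: float  # upper quartile of the panel's estimates


# A panelist maps (question, previous feedback) to a probability estimate.
Panelist = Callable[[str, Optional[Feedback]], float]


def mediate(estimates: Sequence[float]) -> Feedback:
    """Summarize estimates without revealing who said what."""
    q1, _, q3 = quantiles(estimates, n=4)
    return Feedback(median=median(estimates), low=q1, high=q3)


def run_delphi(panelists: Sequence[Panelist], question: str,
               rounds: int = 3) -> List[Feedback]:
    """Run a fixed number of estimate -> feedback -> refine rounds."""
    feedback: Optional[Feedback] = None  # round 1 is fully independent
    history: List[Feedback] = []
    for _ in range(rounds):
        estimates = [p(question, feedback) for p in panelists]
        feedback = mediate(estimates)
        history.append(feedback)
    return history


def toy_panelist(prior: float, pull: float = 0.5) -> Panelist:
    """Stand-in for an LLM persona: starts at `prior`, then moves a
    fraction `pull` of the way toward the panel median each round."""
    current = prior

    def estimate(question: str, feedback: Optional[Feedback]) -> float:
        nonlocal current
        if feedback is not None:
            current += pull * (feedback.median - current)
        return current

    return estimate


if __name__ == "__main__":
    panel = [toy_panelist(0.2), toy_panelist(0.4), toy_panelist(0.9)]
    for i, fb in enumerate(run_delphi(panel, "P(task solved)?"), start=1):
        print(f"round {i}: median={fb.median:.3f} "
              f"IQR=[{fb.low:.3f}, {fb.high:.3f}]")
```

In this toy setup the spread between quartiles narrows from round to round while the median stays put, which is the convergence behaviour a Delphi panel is meant to encourage. A fixed round count is a simplifying assumption; a stopping rule based on the spread would be an equally reasonable design.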
We tested Scalable Delphi on cybersecurity risk estimation — predicting task difficulty across three benchmarks and comparing estimates to independent human expert panels.
LLM panels running the Delphi protocol achieve strong calibration (r=0.87–0.95) against benchmark ground truth and align closely with human expert panels, reducing elicitation time from months to minutes.
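For readers who want to run a similar comparison on their own data, the sketch below computes a correlation coefficient between panel estimates and observed outcomes. It assumes that r denotes the Pearson correlation, which the text above does not state explicitly, and the function name `pearson_r` and the numbers in the usage example are invented purely for illustration; they are not results from the study.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up values: estimated vs. observed success rates per task.
    panel_estimates = [0.90, 0.70, 0.40, 0.20]
    observed_rates = [0.85, 0.60, 0.50, 0.10]
    print(f"r = {pearson_r(panel_estimates, observed_rates):.2f}")
```

A high r only shows that estimates and outcomes rise and fall together; it does not by itself show that the estimated probabilities match the observed rates in absolute terms.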
BibTeX
@misc{lorenz2026scalabledelphi,
  title         = {Scalable Delphi: Large Language Models for Structured Risk Estimation},
  author        = {Tobias Lorenz and Mario Fritz},
  year          = {2026},
  eprint        = {2602.08889},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI},
  url           = {https://arxiv.org/abs/2602.08889}
}