Scalable Delphi

Large language models for structured risk estimation

Calibrated, auditable risk estimates through structured deliberation — scalable to hundreds of parameters and orders of magnitude cheaper and faster than traditional expert elicitation.

📄 Read the preprint: arXiv:2602.08889

The problem

Risk models for high-stakes domains (cybersecurity, nuclear safety, aviation) need probability estimates for quantities that can't be directly measured. The Delphi method, iterative expert elicitation with anonymized feedback, is the gold standard. But a single study can take months to years and require many highly qualified experts, which puts rigorous risk assessment out of reach for most applications.

The idea

Scalable Delphi adapts the classical Delphi protocol for LLM agents: diverse expert personas estimate independently, a mediator synthesizes anonymized feedback, and panelists refine their estimates through structured deliberation.

Expert panel: Diverse LLM personas estimate independently

→ Mediation: Anonymized feedback & rationale summary

→ Refinement: Revised estimates after deliberation
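The three stages above can be sketched as a small loop. This is a toy illustration under stated assumptions, not the implementation from the paper: `ToyPanelist`, `Feedback`, `mediate`, and `run_delphi` are hypothetical names, each panelist is a simple numeric stand-in for an LLM persona, and the rationale summary is left out so that only the numeric feedback path is shown.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional


@dataclass
class Feedback:
    """Anonymized panel summary: no estimate is attributed to a panelist."""
    median: float
    low: float
    high: float


class ToyPanelist:
    """Hypothetical stand-in for an LLM persona (assumption, not the paper's agent).

    It starts from `initial` and, when shown feedback, moves a fraction
    `openness` of the way toward the panel median.
    """

    def __init__(self, initial: float, openness: float) -> None:
        self.estimate = initial
        self.openness = openness

    def respond(self, question: str, feedback: Optional[Feedback]) -> float:
        if feedback is not None:
            self.estimate += self.openness * (feedback.median - self.estimate)
        return self.estimate


def mediate(estimates: List[float]) -> Feedback:
    """Mediation step: summarize the round without naming anyone."""
    return Feedback(median(estimates), min(estimates), max(estimates))


def run_delphi(question: str, panel: List[ToyPanelist], rounds: int = 2) -> List[List[float]]:
    """Run `rounds` rounds; the first is independent, later ones see feedback."""
    history: List[List[float]] = []
    feedback: Optional[Feedback] = None
    for _ in range(rounds):
        estimates = [p.respond(question, feedback) for p in panel]
        history.append(estimates)
        feedback = mediate(estimates)
    return history


# Made-up starting estimates for illustration only.
panel = [ToyPanelist(0.2, 0.5), ToyPanelist(0.5, 0.5), ToyPanelist(0.9, 0.5)]
history = run_delphi("How likely is it that the task is solved?", panel)
spread_before = max(history[0]) - min(history[0])
spread_after = max(history[1]) - min(history[1])
```

In this toy setup the spread of estimates narrows after one round of feedback. In a real panel each `respond` call would be an LLM query, and the mediator would also pass along summarized rationales.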

Evidence

93%: Combined correlation with ground truth across three benchmarks

Expert: Closer to human experts than experts are to each other.

Minutes: vs. months for traditional Delphi

We tested Scalable Delphi on cybersecurity risk estimation — predicting task difficulty across three benchmarks and comparing estimates to independent human expert panels.

Calibration

Predicted vs. actual difficulty (%). Each dot is one task. Dashed line = perfect prediction.
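A predicted-vs-actual plot like this one is commonly summarized with a correlation coefficient plus an error measure, since correlation alone does not show how far the points sit from the dashed diagonal. The sketch below assumes Pearson's r and mean absolute error; the exact metrics used in the paper are not stated on this page, the function names are hypothetical, and the numbers are made up for illustration.

```python
from math import sqrt
from typing import Sequence


def pearson_r(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Pearson correlation between two equally long sequences."""
    n = len(predicted)
    if n != len(actual) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_p == 0 or var_a == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_p * var_a)


def mean_abs_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average vertical distance from the perfect-prediction diagonal."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Made-up difficulty values in percent, one entry per task (not paper results).
predicted = [10.0, 30.0, 50.0, 70.0, 90.0]
actual = [15.0, 25.0, 55.0, 65.0, 95.0]

r = pearson_r(predicted, actual)
mae = mean_abs_error(predicted, actual)
print(f"r = {r:.3f}, mean absolute error = {mae:.1f} percentage points")
```

A high r with a large error would indicate estimates that rank tasks correctly but are systematically shifted or scaled, which is why both numbers are worth reporting together.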

Expert alignment

LLM panels compared to two independent human expert panels. Tasks ordered by difficulty.
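One way to make a comparison like this concrete is to measure the average per-task gap between the LLM panel and each human panel, and set it against the gap between the two human panels. The sketch below assumes mean absolute difference as the distance; the measure actually used in the paper is not given on this page, the helper name is hypothetical, and all values are invented for illustration.

```python
from typing import Sequence


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Average per-task gap between two panels' estimates."""
    if len(a) != len(b) or not a:
        raise ValueError("panels must cover the same non-empty task list")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Made-up difficulty estimates in percent, one entry per task (not paper data).
human_panel_a = [20.0, 40.0, 60.0, 80.0]
human_panel_b = [30.0, 50.0, 50.0, 90.0]
llm_panel = [25.0, 45.0, 55.0, 85.0]

# Average distance from the LLM panel to the two human panels.
llm_to_humans = (
    mean_abs_diff(llm_panel, human_panel_a)
    + mean_abs_diff(llm_panel, human_panel_b)
) / 2

# Baseline: how far the two human panels are from each other.
human_to_human = mean_abs_diff(human_panel_a, human_panel_b)

llm_within_expert_spread = llm_to_humans < human_to_human
```

With these invented numbers the LLM panel sits closer to each human panel than the human panels sit to each other, which is the pattern a claim of expert-level alignment would need to show on real data.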

Paper

Scalable Delphi: Large Language Models for Structured Risk Estimation

Tobias Lorenz, Mario Fritz · CISPA Helmholtz Center for Information Security · 2026

LLM panels running the Delphi protocol achieve strong calibration (r=0.87–0.95) against benchmark ground truth and align closely with human expert panels, reducing elicitation time from months to minutes.

Read the paper: https://arxiv.org/abs/2602.08889

BibTeX

@misc{lorenz2026scalabledelphi,
    title         = {Scalable Delphi: Large Language Models for Structured Risk Estimation},
    author        = {Tobias Lorenz and Mario Fritz},
    year          = {2026},
    eprint        = {2602.08889},
    archivePrefix = {arXiv},
    primaryClass  = {cs.AI},
    url           = {https://arxiv.org/abs/2602.08889}
}