A training dataset with a few poisoned records feeds into a model; the result carries a certified seal reading 'provably robust — poisoning can't change the prediction.'
RESEARCH THEME · 3 PAPERS

Certified Training

Robustness guarantees against data poisoning.

A model is shaped by every data point it is trained on, and training data is the least controlled part of the ML pipeline: scraped, crowdsourced, or bought in. We build methods that prove a model's predictions could not have been changed by data poisoning — any bounded manipulation of the examples it's trained on.

Whoever can poison the training data can change the model itself: a few poisoned data points bake the attacker's influence into every prediction that follows. And poisoning already happens in the wild.

Empirical defenses keep getting bypassed. And we showed that even the safeguards built to certify a model's robustness can be disabled by poisoning its training data (AISec 2023). So the guarantee has to start at the data.

Conventional robustness certification checks that a model resists tampered inputs, but assumes the data it learned from is clean. We drop that assumption and ask the harder question: given a bounded budget for poisoning the training data, can we prove the model's prediction stays the same? The guarantee then holds end-to-end, from training data to final prediction.

An attacker has countless ways to poison the training data, so we cannot check them one by one. Instead, we compute a set guaranteed to contain every outcome the poisoning could produce, then check whether the prediction can change within it. If it can't, the prediction is certified.

FullCert (GCPR 2024) was the first to push such bounds through the entire process — training and inference together — for a deterministic, end-to-end guarantee. It represents every number as an interval, a guaranteed lower and upper bound, and ships BoundFlow, an open-source library for training on bounded data.

Intervals are sound but loose, and the looseness compounds at every training step. MIBP-Cert (NeurIPS 2025) instead encodes each step as a mixed-integer program that preserves the exact relationships between quantities: the bounds still grow, but far more slowly, staying tight enough to certify cases where interval bounds are too loose. The same formulation reaches threat models intervals can't express, such as discrete or structured perturbations.

  • Prove that a prediction was untouched by poisoning: no change to the training data within a set budget could have flipped it.
  • An end-to-end, deterministic proof, covering both poisoned training data and an adversarial input.
  • Insights into training dynamics and worst-case influence of data changes.

We establish feasibility on small models and datasets. Scaling to larger models is an open problem.

Core
MIBP-Cert: Certified Training against Data Perturbations with Mixed-Integer Bilinear Programs
2025·Lorenz, Kwiatkowska, Fritz·paper·NeurIPS·arXiv·poster·project

Exact mixed-integer bounds per training step: tighter, stable certified training and structured threat models.

Foundation
FullCert: Deterministic End-to-End Certification for Training and Inference of Neural Networks
2024·Lorenz, Kwiatkowska, Fritz·paper·GCPR·arXiv·code·project

The first end-to-end certifier with deterministic bounds across training and inference, plus the BoundFlow library.

Motivation
Certifiers Make Neural Networks Vulnerable to Availability Attacks
2023·Lorenz, Kwiatkowska, Fritz·paper·AISec·arXiv·project

Certified systems' abstain-and-fallback behavior is itself an attack surface that poisoning can trigger on command.