A training dataset with a few poisoned records feeds into a model; the result carries a certified seal reading 'provably robust — poisoning can't change the prediction.'

RESEARCH THEME · 3 PAPERS

Certified Training

Robustness guarantees against data poisoning.

A model is shaped by every data point it is trained on, and training data is the least controlled part of the ML pipeline: scraped, crowdsourced, or bought in. We build methods that prove a model's predictions could not have been changed by data poisoning — any bounded manipulation of the examples it's trained on.

Why it matters

Whoever can poison the training data can change the model itself: a few poisoned data points bake the attacker's influence into every prediction that follows. And poisoning already happens in the wild.

A chatbot receives mostly normal chat bubbles and a few poisoned ones, and outputs a corrupted message.

Coordinated users taught Microsoft's Tay chatbot to post racist content within a day of launch.

BBC News · 2016

Two visually identical images, one carrying a sparse scatter of poisoned pixels.

Nightshade lets anyone poison image-scraping models through invisible pixel edits.

MIT Technology Review · 2023

A wall of hundreds of documents with three poisoned ones, breaching a large model.

250 poisoned documents – as little as 0.00016% of training data – backdoor LLMs of any size.

Anthropic · UK AISI · Turing Institute · 2025

Empirical defenses keep getting bypassed. And we showed that even the safeguards built to certify a model's robustness can be disabled by poisoning its training data (AISec 2023). So the guarantee has to start at the data.

Extending the guarantee to training

Conventional robustness certification checks that a model resists tampered inputs, but assumes the data it learned from is clean. We drop that assumption and ask the harder question: given a bounded budget for poisoning the training data, can we prove the model's prediction stays the same? The guarantee then holds end-to-end, from training data to final prediction.

How: bound every attack at once

An attacker has countless ways to poison the training data, so we cannot check them one by one. Instead, we compute a set guaranteed to contain every outcome the poisoning could produce, then check whether the prediction can change within it. If it can't, the prediction is certified.

FullCert (GCPR 2024) was the first to push such bounds through the entire process — training and inference together — for a deterministic, end-to-end guarantee. It represents every number as an interval, a guaranteed lower and upper bound, and ships BoundFlow, an open-source library for training on bounded data.

Intervals are sound but loose, and the looseness compounds at every training step. MIBP-Cert (NeurIPS 2025) instead encodes each step as a mixed-integer program that preserves the exact relationships between quantities: the bounds still grow, but far more slowly, staying tight enough to certify cases where interval bounds are too loose. The same formulation reaches threat models intervals can't express, such as discrete or structured perturbations.

What it enables

Prove that a prediction was untouched by poisoning: no change to the training data within a set budget could have flipped it.
An end-to-end, deterministic proof, covering both poisoned training data and an adversarial input.
Insights into training dynamics and worst-case influence of data changes.

We establish feasibility on small models and datasets. Scaling to larger models is an open problem.

Papers

Core

MIBP-Cert: Certified Training against Data Perturbations with Mixed-Integer Bilinear Programs

2025·Lorenz, Kwiatkowska, Fritz·paper·NeurIPS·arXiv·poster·project

Exact mixed-integer bounds per training step: tighter, stable certified training and structured threat models.

Foundation

FullCert: Deterministic End-to-End Certification for Training and Inference of Neural Networks

2024·Lorenz, Kwiatkowska, Fritz·paper·GCPR·arXiv·code·project

The first end-to-end certifier with deterministic bounds across training and inference, plus the BoundFlow library.

Motivation

Certifiers Make Neural Networks Vulnerable to Availability Attacks

2023·Lorenz, Kwiatkowska, Fritz·paper·AISec·arXiv·project

Certified systems' abstain-and-fallback behavior is itself an attack surface that poisoning can trigger on command.