A small number of samples can poison LLMs of any size

OCT 11, 2025

« All posts

Anthropic: A small number of samples can poison LLMs of any size

Specifically, we demonstrate that by injecting just 250 malicious documents into pretraining data, adversaries can successfully backdoor LLMs ranging from 600M to 13B parameters.