The Repeated‑Token Attack: Extracting Training Data from ChatGPT
In December 2023, researchers from Google DeepMind, Cornell University and other institutions demonstrated that a simple prompt-engineering technique could coax ChatGPT into revealing memorised training data—including personal email signatures, contact information, and programming code—for roughly US $200 in API costs.
Key Facts
- Researchers from Google DeepMind, Cornell University and other institutions
- Simple prompt: repeat a single word thousands of times
- Leaked content: personal emails, contact details, code, URLs, bitcoin addresses
- 10,000+ unique verbatim training examples extracted
- Total cost: approximately US $200 in ChatGPT API queries
- Some trigger words were 164× more effective than others
Background
In December 2023 researchers from Google DeepMind, Cornell University and other institutions showed that a simple prompt‑engineering technique could coax ChatGPT into regurgitating memorized training data. The attack, sometimes called a repeated‑token attack, exploits the fact that large language models occasionally memorize and reveal parts of their training corpus when pushed outside of typical operating conditions.
This research was significant because it demonstrated that even "closed" AI models—where the training data is not publicly accessible—can be manipulated into revealing their training corpus, raising serious questions about data privacy in AI systems.
What Happened
By instructing ChatGPT to repeat single words such as "poem," "company," "send," "make" or "part" thousands of times, the researchers found that after a few hundred repetitions the model began producing snippets of training data.
The leaked content included:
- Personal email signatures and contact information
- Verbatim paragraphs from books and poems
- URLs and user identifiers
- Bitcoin addresses
- Programming code
Some trigger words were more effective than others; for example, repeating "company" led to training‑data leakage 164 times more often than less charged words. Using roughly US $200 worth of ChatGPT queries, the researchers extracted more than 10,000 unique verbatim training examples.
This vulnerability demonstrates that closed AI models can memorize and output sensitive data if they are manipulated with crafted prompts. Any personal or corporate data that was part of the training corpus could potentially be extracted by adversarial users.
Response and Consequences
Dark Reading reported that OpenAI was informed of the issue and may have mitigated it by late 2023, though the susceptibility of large models to memorization and extraction remains an open research problem.
The incident triggered broader discussions about the privacy risks of training AI systems on indiscriminate web data. The researchers warned that developers should not deploy AI models in privacy‑sensitive contexts without strong safeguards. While the attack did not breach any specific organization, it shows that uncurated data can be exposed via model outputs.
Lessons for Enterprise AI Governance
This case illustrates how prompt-engineering attacks can extract training data from generative AI models, creating risks for any organisation whose data may have been included in the training corpus.
- Incorporate privacy-preserving techniques. AI developers should use differential privacy and de-biasing constraints during training to minimize memorization.
- Conduct regular red-teaming. Adversarial testing can help identify and patch memorization pathways before they are exploited.
- Monitor AI outputs. End-users and organisations should be aware that AI responses may contain copyrighted or sensitive information and should avoid publishing verbatim AI outputs without verification.
- Implement output governance. Deploy systems that scan AI outputs for PII, proprietary data, and other sensitive content before it reaches end users.
- Control training data. Organisations should audit what data is being used to train or fine-tune AI models and ensure sensitive data is excluded or properly anonymised.
Prunex inspects and evaluates AI outputs in real time, detecting PII, proprietary data patterns, and sensitive content before it reaches end users. With configurable policy enforcement, organisations can automatically redact or block outputs that contain memorised training data, regardless of how it was triggered. Every enforcement action is logged with full context for audit and compliance review.
Protect your enterprise from AI data extraction
See how Prunex provides real-time output governance to prevent training data leakage and ensure compliance.
Request a Demo →