The Trojan in the Training Data: How AI Models Leak Data

Since the explosive rise of generative AI models like ChatGPT, Microsoft Copilotn and Google Gemini, data privacy has become a central concern. The public often focuses on the most visible risks: employees carelessly entering confidential data into a public AI tool. But beneath the surface lie far more complex and insidious dangers, uncovered by researchers and security experts. They’ve proven that AI models can not only unintentionally leak data but can also be actively and deliberately manipulated to access secret information.

What We Know: The Research of “White Hats”

Security experts, often called “White Hats,” have meticulously examined the inner workings of AI models in recent years and have uncovered unsettling vulnerabilities. Their publications show that the danger doesn’t just come from careless users; it’s also inherent in the models’ architecture.

The Phenomenon of “Memorization”:In a groundbreaking 2020 study, scientists like Nicholas Carlini from the Google Brain Team demonstrated that large language models (LLMs) have the ability to memorize training data verbatim. By querying the model with specific prompts, they were able to extract private information such as email addresses, phone numbers, and passages from copyrighted books. This happens because the training data, often scraped from the public internet, can contain sensitive or private information. The model stores this as part of its knowledge and can reproduce it upon request—a direct data leak stemming from the nature of AI itself.
- Source: „Extracting Training Data from Large Language Models“
Targeted “Prompt Injection” and “Jailbreaking”:This is one of the most discussed attack methods. A “prompt injection” attack uses manipulated input instructions to bypass an AI model’s internal safeguards. Researchers have succeeded in tricking models into ignoring their predefined rules (e.g., not to follow ethically questionable instructions). In one notorious case, security experts were able to make the OpenAI DALL-E 2 model generate images of brand logos, even though this was contractually prohibited by the provider. While this didn’t leak secret data, it demonstrates the manipulability of AI and the potential to misuse it for other harmful purposes.
The ChatGPT Data Leak of 2023:In March 2023, OpenAI suffered an outage that led to an unintentional data leak. Due to a software bug, some users were able to view the chat titles and in some cases even the email addresses, names, and credit card data of other users. This leak was confirmed by OpenAI in an official blog post and underscored the enormous risk inherent in the vast amounts of data collected by AI companies.
- Source: OpenAI Blog Post on the ChatGPT Outage

The Hidden Danger: What We Can Only Guess

The findings published by researchers are just the tip of the iceberg. They prove that the fundamental vulnerabilities exist. But what happens in secret can only be guessed. It’s highly unlikely that malicious actors—be they intelligence agencies from rogue states, state-sponsored hacking groups, corporate spies, or organized crime—aren’t already using these vulnerabilities for their own purposes.

States and Intelligence Agencies: For intelligence agencies, AI models are a goldmine. They don’t have to break into a system if they can use sophisticated queries to make it reveal secret information. It can be assumed that these actors are developing complex algorithms and techniques to specifically search AI models for confidential information. This isn’t just about grabbing individual passwords but about extracting behavioral patterns, strategic plans, or technical know-how that’s hidden in the training data. A corporate spy could try to get the model to complete an unfinished product description based on its training data.
Corporate Espionage: Companies invest immense sums in research and development. If a competitor has access to the training data of an AI model used by a rival, they could obtain details about unreleased patents, production processes, or marketing strategies. The Samsung example, where employees entered secret source code into ChatGPT, was documented by tech media as a prime example of the uncontrolled disclosure of trade secrets.
- Sources:
  - Cybernews: Samsung bans use of generative AI after leaked data revealed
  - CIO Dive: Samsung employees leaked corporate data in ChatGPT: report
Organized Crime: Criminals exploit every technological vulnerability. They could use AI models to generate lists of potential victims (e.g., email addresses of executives), system credentials, or even bank details that might have been accidentally included in training data.

Conclusion: The Uncontrolled Risk

The threat of AI data leaks is real and more multi-faceted than it appears at first glance. While researchers and the public discuss the obvious vulnerabilities, we must be aware that far more sophisticated and dangerous attacks are likely already happening in the shadows.

The responsibility lies with developers to make models more secure and to create safeguards that cannot be bypassed. At the same time, companies must train their employees and establish clear guidelines for using these technologies. Because in the end, the invisible danger is the greatest threat, and what we don’t know could harm us the most.

Do you think that in your organization everyone is aware of what they’re doing with AI?

As a tech PR expert, Thomas Konrad works extensively with AI and applies these tools thoughtfully in his professional work. Naturally, this article was also created with the support of AI — so credit goes to ChatGPT and Gemini as well.

What We Know: The Research of “White Hats”

The Hidden Danger: What We Can Only Guess

Conclusion: The Uncontrolled Risk

Share this:

Related Posts