EDPB Guidance on Generative AI and Personal Data: Reading the Latest Opinions
The European Data Protection Board has been busy. Between December 2024 and March 2025, the EDPB issued a series of opinions that collectively form the most detailed regulatory guidance yet on how generative AI intersects with EU data protection law. If you're an AI vendor operating in Europe, or a company deploying AI tools that touch personal data of EU residents, these opinions deserve careful reading. They're more nuanced than the headlines suggest, and in a few places, they're surprisingly practical.
What the EDPB Actually Said
The key documents are Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models (adopted December 17, 2024) and the follow-up guidance published in early 2025 addressing specific scenarios around training data, legitimate interest, and anonymization claims. Together, they run well over 50 pages and cover a lot of ground.
The central question the EDPB tackled: can an AI model itself constitute personal data? Its answer is characteristically European in its precision: it depends. If personal data was used in training and that data can be extracted from the model, or if the model can be used to generate personal data about identifiable individuals, then yes, the model may contain personal data subject to GDPR. If the model has been genuinely anonymized such that personal data cannot be extracted or inferred, it may fall outside GDPR scope. The burden of proving anonymization, however, sits squarely on the controller.
This matters enormously for model distribution. If your AI model is personal data, then transferring it across borders, licensing it to third parties, or deploying it in new contexts all trigger GDPR obligations. Every downstream deployment potentially needs its own lawful basis.
Legitimate Interest as a Legal Basis for Training
One of the most closely watched questions was whether legitimate interest under Article 6(1)(f) GDPR could serve as a lawful basis for training AI models on personal data. The EDPB didn't slam the door shut, but they didn't exactly throw it open either.
The Board laid out a three-part test that will feel familiar to anyone who has done a legitimate interest assessment, but with AI-specific considerations layered in:
- Legitimate interest must be real and specific. "Improving our AI" is too vague. You need to articulate what the model does, why training on personal data is necessary for that purpose, and what benefits flow from it. Commercial interest can qualify, but it needs specificity.
- Necessity must be demonstrated. Could you achieve the same result with synthetic data, anonymized data, or less personal data? If so, processing real personal data fails the necessity test. The EDPB explicitly flagged data minimization under Article 5(1)(c) as a constraint on training dataset composition.
- The balancing test must account for the scale and opacity of AI processing. The EDPB noted that individuals whose data is scraped from the internet for training purposes have essentially no expectation that their data will be used this way. That weighs against the controller. The sheer volume of data involved in large model training also factors in.
The practical upshot: legitimate interest is available as a basis, but the documentation burden is significant. You need a thorough legitimate interest assessment (LIA) that addresses AI-specific risks, and you need to be able to produce it on request. For companies training foundation models on web-scraped data, this is going to be a difficult argument to win without substantial safeguards.
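One way to keep an LIA producible on request is to store it as a structured record rather than a free-form memo. The sketch below is only an illustration of that idea: the class name, field names, and the `gaps` check are all hypothetical, and passing the check says nothing about whether an assessment would satisfy a regulator. It only flags prongs of the three-part test that have been left undocumented.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class LegitimateInterestAssessment:
    """Hypothetical LIA record for a single training purpose (illustrative only)."""

    purpose: str               # what the model does, stated specifically
    interest: str              # the interest the controller pursues
    necessity_rationale: str   # why real personal data is needed at all
    alternatives_considered: list[str] = field(default_factory=list)
    balancing_notes: str = ""  # scale, opacity, reasonable expectations
    safeguards: list[str] = field(default_factory=list)
    assessed_on: date = field(default_factory=date.today)

    def gaps(self) -> list[str]:
        """Return the prongs of the three-part test that are not yet documented."""
        missing = []
        if not self.interest.strip():
            missing.append("interest")
        # Necessity needs both a rationale and evidence that alternatives
        # (synthetic, anonymized, or less data) were actually considered.
        if not self.necessity_rationale.strip() or not self.alternatives_considered:
            missing.append("necessity")
        if not self.balancing_notes.strip():
            missing.append("balancing")
        return missing


draft = LegitimateInterestAssessment(
    purpose="Fine-tune a support chatbot on historical ticket text",
    interest="Reduce response times for customer support queries",
    necessity_rationale="Ticket phrasing is domain-specific",
)
print(draft.gaps())  # lists the prongs that still need documentation
```

A record like this does not replace the legal analysis. It makes missing pieces visible before a DPA asks for the file.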
Data Subject Rights and AI Models
The EDPB's position on data subject rights is where things get genuinely complicated for vendors. The Board confirmed that the right to erasure under Article 17 can apply to AI models, not just training datasets. If personal data is embedded in a model and cannot be removed through less drastic means, the controller may need to retrain or delete the model.
This is not theoretical. The Italian Garante's enforcement action against OpenAI in March 2023, which temporarily banned ChatGPT in Italy, raised closely related questions about lawful basis and data subject rights. The EDPB's guidance now provides a framework that other DPAs can apply consistently. Expect to see more enforcement actions grounded in these opinions.
The Board also addressed the right of access under Article 15. If a model can generate information about an identifiable individual, that individual may have a right to know what the model "knows" about them. How exactly a company satisfies this obligation for a large language model is, to put it mildly, an open implementation question. The EDPB acknowledged the technical challenges but did not grant any exemptions on that basis.
Implications for AI Vendors and Their Customers
If you're deploying a third-party AI tool that processes personal data of EU individuals, these opinions affect your vendor relationships directly. A few things to think about:
- Joint controllership is on the table. The EDPB reiterated that where a customer provides personal data to an AI vendor and that vendor uses the data to improve its models, both parties may be joint controllers under Article 26. This means you need a joint controller arrangement, not just a data processing agreement. Many existing AI vendor contracts are structured as processor agreements, and that structure may not hold up under scrutiny.
- DPIAs are essentially mandatory. The Board stated that processing personal data to train or deploy generative AI models is "likely to result in a high risk" to individuals, triggering the DPIA requirement under Article 35. If you haven't done a DPIA for your generative AI deployments, you're already behind.
- Transfer mechanisms need revisiting. If the model itself is personal data, then hosting it on US servers or sharing it with a US-based vendor implicates Chapter V of GDPR. The EU-US Data Privacy Framework helps, but only if the vendor is certified, and only for transfers to the US. Other jurisdictions require separate adequacy decisions or appropriate safeguards.
The Anonymization Question
Perhaps the most consequential aspect of the EDPB's guidance is its treatment of anonymization claims. Several AI companies have argued that even if personal data goes into training, the resulting model weights are sufficiently abstracted that no personal data comes out. The EDPB was skeptical.
The Board pointed to research demonstrating that LLMs can memorize and regurgitate training data, including personal information. Studies from Google DeepMind and others have shown extraction attacks that can pull verbatim training data from models. Until a vendor can demonstrate, with rigorous technical evidence, that their model cannot leak personal data, the EDPB's position is that anonymization should not be assumed.
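To make the idea of an extraction test concrete, here is a minimal sketch of a prefix-completion probe: feed the model the start of a record known to be in the training set and check whether it reproduces the rest verbatim. Everything named here is an assumption for illustration. `verbatim_leak_rate` is a hypothetical helper, `toy_model` is a stand-in for a real model's generation call, and the records are invented placeholders. Published extraction attacks are considerably more sophisticated than this.

```python
from typing import Callable


def verbatim_leak_rate(
    generate: Callable[[str], str],
    records: list[str],
    prefix_chars: int = 20,
    min_suffix_chars: int = 10,
) -> float:
    """Fraction of probed records whose held-back suffix the model reproduces."""
    probed = 0
    leaked = 0
    for record in records:
        prefix, suffix = record[:prefix_chars], record[prefix_chars:]
        if len(suffix) < min_suffix_chars:
            continue  # too short to distinguish memorization from chance
        probed += 1
        if suffix.strip() in generate(prefix):
            leaked += 1
    return leaked / probed if probed else 0.0


# Stand-in for a real model: it has "memorized" exactly one record.
MEMORIZED = "Jane Example, born 1 Jan 1980, lives at 1 Sample Street"


def toy_model(prompt: str) -> str:
    if prompt and MEMORIZED.startswith(prompt):
        return MEMORIZED[len(prompt):]
    return "I cannot help with that."


records = [
    MEMORIZED,
    "John Placeholder, phone 000-0000, email john@example.invalid",
]
print(verbatim_leak_rate(toy_model, records))
```

A non-zero rate is evidence of memorization. A zero rate from a probe this simple is not evidence of anonymization, since it only tests one attack strategy against the records you happened to try.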
This creates an interesting dynamic. Vendors who invest in differential privacy, federated learning, or robust unlearning techniques will have a regulatory advantage. Those who simply assert anonymization without technical backing are exposed.
How FirmAdapt Addresses This
FirmAdapt's architecture was designed around the principle that personal data processing in AI systems needs to be auditable, controllable, and minimized by default. The platform enforces data residency controls, maintains processing logs that support DPIA documentation, and provides configurable guardrails that prevent personal data from being used in model improvement without explicit authorization. For organizations navigating the EDPB's guidance, this means the technical infrastructure for compliance is already in place rather than bolted on after the fact.
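As a generic illustration of what a default-deny guardrail of this kind can look like, the sketch below filters records before they reach a model-improvement pipeline. It is not FirmAdapt's API or implementation. The `Record` type, its fields, and `select_for_model_improvement` are hypothetical names, and the sketch assumes that personal data has already been detected and flagged upstream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical record with flags assumed to be set by upstream checks."""

    record_id: str
    contains_personal_data: bool
    training_use_authorized: bool = False  # default-deny: must be set explicitly


def select_for_model_improvement(
    records: list[Record],
) -> tuple[list[Record], list[str]]:
    """Split records into those usable for training and the IDs that were blocked."""
    allowed: list[Record] = []
    blocked_ids: list[str] = []
    for record in records:
        if record.contains_personal_data and not record.training_use_authorized:
            blocked_ids.append(record.record_id)  # keep for the audit log
        else:
            allowed.append(record)
    return allowed, blocked_ids


batch = [
    Record("a", contains_personal_data=False),
    Record("b", contains_personal_data=True),
    Record("c", contains_personal_data=True, training_use_authorized=True),
]
allowed, blocked_ids = select_for_model_improvement(batch)
print([r.record_id for r in allowed], blocked_ids)
```

Returning the blocked IDs alongside the allowed records is a deliberate choice: the list of what was excluded, and why, is the kind of processing log that supports DPIA documentation.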
FirmAdapt also supports the documentation requirements these opinions create. Legitimate interest assessments, records of processing activities, and joint controller arrangements all need to reflect the specific characteristics of AI processing. FirmAdapt's compliance tooling generates and maintains this documentation as a function of how the platform operates, which makes demonstrating compliance to a DPA substantially more straightforward than retrofitting traditional AI deployments.