European Regulators Provide Guidance on the Use of Personal Data in Artificial Intelligence
January 13, 2025
As 2024 came to a close, European data protection regulators released key guidance on the use of personal data in AI. On 10 December 2024, the UK Information Commissioner’s Office (“ICO”) published its response to its year-long consultation series on the relationship between the development and use of generative AI and data protection law in the UK (the “ICO Response”).[1] Then, on 17 December 2024, the European Data Protection Board (“EDPB”) published its opinion on certain data protection aspects related to the processing of personal data in the context of AI models (the “EDPB Opinion”).[2]
This alert summarises the legal guidance and key practical takeaway points from the ICO Response and EDPB Opinion and considers how they interplay with the fast-evolving AI and privacy landscape in the U.S.
Summary of Guidance
- Anonymising AI Models. The EDPB considers that certain AI models should not be viewed as “anonymous” because they are designed to provide personal data as an output (e.g., AI models that reply with personal data from training when prompted). Indeed, the EDPB has taken the view that, even with respect to AI models which are not designed for such purposes, personal data used in training may remain “absorbed” into the parameters of the AI model and so may be capable of extraction during use. Accordingly, in the EDPB’s view, an AI model will only be considered anonymous (and therefore not subject to the GDPR) where the likelihood of extracting and obtaining underlying personal data (directly or via queries) is “insignificant”, taking into account all the means “reasonably likely” to be used by the controller or another person to extract or obtain such personal data.
The EDPB Opinion (non-exhaustively) notes that an AI model is more likely to be considered “anonymous” where the controller demonstrates that: (a) the training data sources and training processes (including any filtering) were selected in order to limit the processing of personal data; (b) privacy-preserving techniques (such as the introduction of “noise” through techniques such as differential privacy, and improvement of model generalisation) were implemented during training; (c) effective engineering governance aimed at anonymisation has been implemented (including document-based audits); (d) tests to identify and mitigate security risks are conducted with appropriate frequency, quantity and quality; and (e) other relevant documentation is in place (such as data protection impact assessments and advice from data protection officers). The EDPB notes that the requirements may differ between a publicly available AI model, where the scope of potential users is unknown, and an internal AI model that is accessible only to employees. In light of this guidance, a controller should consider assessing its model training and deployment practices, including whether it is possible to obtain personal data from the model.
By analysing whether an AI model may be considered anonymous, the EDPB addresses whether AI models constitute personal data in themselves for GDPR purposes (the question from the Irish supervisory authority that prompted the EDPB Opinion was originally phrased “Is the final AI Model, which has been trained using personal data, in all cases, considered not to meet the definition of personal data?”). While the EDPB Opinion implies that some AI models are likely to be considered personal data in certain circumstances, it remains to be seen to what extent EU courts and national regulators will reflect this guidance in practice, particularly given the divergent guidance issued by some EU supervisory authorities.[3]
The ICO Response does not expressly address this anonymity question and so, for now, the general ICO guidance on anonymity of personal data is the ICO’s most current statement on this question.[4]
- Legal Basis for Processing. While the EDPB Opinion leaves open the possibility of any legal basis being relied upon to use personal data to develop AI (including to conduct web scraping), the ICO Response provides the ICO’s view that, in the web scraping context at least, “legitimate interests” is the only potentially valid legal basis to process personal data to train generative AI.
Focussing on legitimate interests, both the ICO and the EDPB reiterate that, for legitimate interests to be a valid lawful basis for processing in developing and deploying AI models (particularly in the context of web scraping in the ICO Response), the controller must still demonstrate that the processing: (a) pursues a lawful, non-speculative, specific and clear interest (e.g., developing a conversational agent to assist users, developing an AI system to detect fraud, or improving threat detection in IT systems); (b) is necessary for the purpose of the legitimate interest, such that a less intrusive means of data collection could not be used; and (c) taking into account the context, nature and consequences of the processing, is not overridden by the fundamental rights and freedoms of the data subjects (i.e., the “balancing test”). Both the ICO and EDPB set out various considerations relevant to the balancing test, including: (i) the reasonable expectations of data subjects, based on the context and the information provided to them (including whether the personal data was originally publicly available and whether the data subjects knew of that availability) – the ICO expressly notes that the fact that a type of processing is common practice does not mean that it reflects reasonable expectations (especially given the complexity of AI technologies); and (ii) evidencing the likely benefits to the specific data subjects (rather than merely assuming such benefits exist or pointing to general societal benefits such as “innovation”). Both also note that, where the rights of data subjects would otherwise override the legitimate interests of the controller, processing may still be possible if appropriate mitigation measures are implemented. These include technical measures (such as excluding or limiting high-risk data sources (such as scraping), output filters and digital watermarking), pseudonymisation to prevent the combination of identifiers, the masking or substitution of personal data in training sets (see the illustrative sketch after this list), the provision of additional information to data subjects (whether by expanding its content or using different means of disclosure) and appropriate contractual obligations on downstream users.
- Purposes of Processing. According to the EDPB and ICO, controllers should take care to document a specific, explicit and clear purpose of processing, nature of personal data and rationale for its use for each stage of the development and deployment of AI – the purpose at each stage (for example, for the development of a model, the development of an application using that model, and the deployment of that application) may differ or be incompatible and therefore require a different approach to legal basis on a case-by-case basis. The ICO Response additionally notes that: (a) disclosing a purpose does not require controllers to disclose their proprietary code or algorithms; and (b) although contracts could be used to disclose the purpose, this does not negate the need to establish a legitimate interest (despite the existence of a contract).
- Accuracy. The ICO Response notes that developers and deployers of generative AI must be transparent about the accuracy of the training data (including the extent to which it relies on inference, opinion or other AI-generated data) and output data, and must implement appropriate and sufficient means of assessing that accuracy (and the associated output risks) and communicating it to downstream users. According to the ICO, the appropriateness of a model’s level of accuracy will depend in part on the AI model’s use case (for example, the ICO Response suggests that AI which generates output for users to rely on as a source of factual information should have greater statistical accuracy than AI which generates output purely for non-factual use). The EDPB Opinion does not specifically comment on the accuracy of personal data in the AI context and, as such, AI developers should continue to follow the general accuracy requirements of Articles 5(1)(d) and 16 of the EU GDPR.
- Allocating Controllership. The ICO Response notes that the assessment of whether an organisation is a controller, joint controller or processor of personal data is a complex exercise which should reflect the organisation’s actual level of control and influence over the purposes and means of processing. It is not determined by contract or by ownership of the underlying intellectual property. According to the ICO, developers and deployers of closed-access AI models will often be joint controllers towards the end of the development and release lifecycle, as their objectives and control converge, and they will therefore be required to document their respective statutory responsibilities in a way that would not otherwise be required at earlier stages of the development lifecycle, when they are more likely to be independent controllers (although the ICO is careful to note that joint controllership does not necessarily mean equal responsibility). Additionally, the ICO states that a developer is not a processor when using a deployer’s data to improve its model, because the developer is processing the personal data for its own purposes and the benefit to the developer’s clients is not sufficient to establish a processor relationship. The EDPB did not directly address this concept, although it notes in passing in the EDPB Opinion that the roles and responsibilities of the different parties in the AI model lifecycle should be assessed before processing takes place, and it has previously provided general advice on such matters outside the context of AI.[5]
- Data Subject Rights in AI Models. According to the ICO and EDPB, AI models must be designed to allow the exercise of data subject rights. The ICO provides examples of such design, which include granting a reasonable period between data collection and use to enable the exercise of rights, allowing effective rights of opt-out and erasure, implementing mechanisms to alert the controller to unauthorised personal data outputs (to allow unlearning) and, where legitimate interests are relied on as a legal basis, ensuring the AI model enables the right to object. The ICO cautions against reliance on the exemption under Article 11 GDPR (which disapplies certain data subject rights where the controller is no longer able to identify the data subject), unless there is clear evidence that identification is not possible (and will not become possible if new information is provided).
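By way of illustration only, the following minimal Python sketch shows one way the masking and pseudonymisation measures referred to above (substituting direct identifiers in records before they enter a training set) might look in practice. The identifier patterns, the salted-hash pseudonym format and the helper names are our own assumptions rather than anything prescribed by the ICO or EDPB, and a real pipeline would rely on far broader detection of personal data than these simple patterns.

```python
import hashlib
import re

# Illustrative patterns only; a production pipeline would use a much broader
# set of detectors (e.g., named-entity recognition) to find personal data.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def pseudonymise(value: str, salt: str) -> str:
    """Replace an identifier with a salted, truncated hash so that records
    about the same person stay linkable without exposing the identifier."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return f"<PSEUDONYM:{digest[:12]}>"


def mask_training_record(text: str, salt: str) -> str:
    """Mask direct identifiers in a single record before it is added to a
    training corpus: phone numbers are redacted, emails are pseudonymised."""
    text = PHONE_RE.sub("<PHONE_REDACTED>", text)
    text = EMAIL_RE.sub(lambda m: pseudonymise(m.group(), salt), text)
    return text


if __name__ == "__main__":
    record = "Contact Jane Doe at jane.doe@example.com or +44 20 7946 0000."
    print(mask_training_record(record, salt="per-dataset-secret"))
```

Salted hashing (rather than simple deletion) keeps records about the same individual linkable within the dataset while removing the raw identifier; whether such a measure is sufficient in any given case will depend on the anonymity analysis and balancing test described above.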
Practical Takeaways
- AI providers, deployers and other users should consider whether the use of personal data solely in training an AI model could give rise to an argument before a regulator or national court that the AI model, once deployed, is not “anonymous” under European privacy laws. This may involve assessing the likelihood that personal data could be extracted through reasonable use of the AI model, and identifying what mitigations could be implemented if the intention is to take the model outside the scope of European privacy laws (an illustrative extraction probe of this kind is sketched after this list).
- AI providers, deployers and other users should also assess whether they can rely on their legitimate interests for the use of personal data in training data sets (particularly where that data is taken from public sources). Where legitimate interests are relied upon, appropriate documentation and mitigation measures should be put in place to support a valid legitimate interests assessment.
- Entities in the AI value chain should document the allocation of responsibility between themselves and their partners in that chain; the allocation should reflect any conclusions as to the separate purposes and legal bases for the use of personal data in development and deployment.
- To reduce the need for significant changes to models after they have been commercially released, it may be prudent to build the functionality and processes needed to comply with obligations relating to accuracy and data subject rights into the design of the AI model (even if the general intention is not to process personal data in the model).
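To make the first takeaway above more concrete, the sketch below illustrates, purely by way of example, the shape such an extraction assessment might take: prompting the deployed model with prefixes drawn from its training data and measuring how often it reproduces the withheld personal data. The `generate` callable, the record format and the naive substring check are our own assumptions; neither regulator prescribes a particular testing methodology, and a genuine assessment would use a far larger and more carefully constructed probe set.

```python
from typing import Callable, Iterable


def extraction_probe(
    generate: Callable[[str], str],
    known_records: Iterable[tuple[str, str]],
) -> float:
    """Estimate how often the model completes a prompt with personal data it
    saw during training.

    `known_records` pairs a prompt prefix taken from the training data with
    the personal data that followed it, e.g. ("Jane Doe's email address is",
    "jane.doe@example.com"). Returns the fraction of probes in which the
    model's output reproduced the withheld personal data.
    """
    records = list(known_records)
    hits = sum(
        1 for prefix, secret in records if secret.lower() in generate(prefix).lower()
    )
    return hits / len(records) if records else 0.0


if __name__ == "__main__":
    # Stand-in for a real model client; in practice `generate` would call the
    # deployed model's completion endpoint.
    def fake_model(prompt: str) -> str:
        return "jane.doe@example.com" if "Jane Doe" in prompt else "[no answer]"

    probes = [("Jane Doe's email address is", "jane.doe@example.com")]
    print(f"Extraction rate: {extraction_probe(fake_model, probes):.0%}")
```

A low extraction rate on such probes would be one input (though not a conclusive one) into the “insignificant” likelihood-of-extraction analysis described in the EDPB Opinion.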
Interplay with the U.S. Regulatory Landscape
Because the U.S. lacks a comprehensive federal privacy law, its regulators approach generative AI without the kind of unified statutory framework that the GDPR and the EU AI Act provide in Europe. However, California has recently made certain controversial changes to its existing state privacy law that implicate the regulation of AI models themselves (rather than merely the personal information used by AI systems).
Specifically, California’s state legislature recently enacted a novel and controversial change to its state privacy law clarifying that “artificial intelligence systems that are capable of outputting personal information” themselves constitute personal information.[6] Because this concept of an AI model itself constituting personal information is not grounded in any other U.S. privacy law or regulation, California’s approach is an outlier under U.S. law and has attracted heavy criticism. For example, commentators have called into question the unintended consequences of classifying AI systems as personal information – which they assert “will create significant risk and uncertainty for model developers”.[7] California courts have yet to opine on the implications of this provision, so it remains an unsettled question how California’s legal requirements pertaining to personal information can or should be applied to the model itself, including, for example, deletion and transparency requests. Although this legislation is an outlier in its substance, it reinforces a broader trend of U.S. legislators moving more readily than agencies to regulate AI.
California’s privacy regulator, the California Privacy Protection Agency, is also notably in the process of developing regulations under the state’s privacy laws concerning automated decision-making technologies (“ADMTs”). Under the current draft of these regulations, California residents would have the right to opt out of ADMTs or to request information about how a business applied ADMTs to them. These access and opt-out rights would apply under the draft regulations only when one of three conditions is satisfied: (i) when ADMTs are used to make a significant decision about a California resident, such as one impacting access to financial services, housing, insurance, education, criminal justice, employment, healthcare, “or essential goods or services”; (ii) when ADMTs are used to profile in a workplace or educational setting, in a public setting or for behavioural advertising; and (iii) when ADMTs are being trained, where the ADMT “is capable of being used for . . . a significant decision”, “[t]o establish individual identity”, “[f]or physical or biological identification or profiling” or “[f]or the generation of a deepfake”.[8] These proposed ADMT regulations are substantial, and courts have not yet had an opportunity to consider whether they go beyond the narrow statutory authorisation for ADMT rulemaking under the California privacy law. The public comment period on these proposed ADMT regulations is open until February 19, 2025.
Conclusion
The ICO Response and EDPB Opinion show that, despite the UK’s departure from the EU, the UK and EU data protection regulators remain closely aligned in their views on the processing of personal data in the context of AI. One area where there does not appear to be clear alignment (or at least where there is room to infer a difference in approach) is the appropriate legal basis for the use of personal data in relation to AI models. The ICO’s position in the ICO Response remains that, at least in the context of web scraping, only legitimate interests are permissible, and that even this may be hard to demonstrate given the current level of transparency provided by generative AI developers, whereas the EDPB Opinion does not expressly discount the validity of other legal bases. Additionally, it will be interesting to see whether the ICO follows the EDPB’s opinion on the anonymity of AI models, given that the ICO has not yet given guidance specifically on this topic.
With European jurisdictions further along than the U.S. in enforcing rules on the use of personal data in AI (pursuant to the GDPR), and with the EU AI Act coming into force over the course of the next 24 months, now is an opportune moment for businesses involved (or intending to be involved) in the AI supply chain in Europe – whether they are developing or deploying AI systems and whether based inside or outside Europe – to ensure that their data privacy and AI compliance activities are comprehensive and aligned towards mitigating non-compliance risk.
* * *
[1] Available here: https://ico.org.uk/media/about-the-ico/what-we-do/our-work-on-artificial-intelligence/response-to-the-consultation-series-on-generative-ai-0-0.pdf
[2] Available here: https://www.edpb.europa.eu/system/files/2024-12/edpb_opinion_202428_ai-models_en.pdf
[3] See, for example, guidance from the Danish supervisory authority which provides that AI models themselves may not be considered personal data as they are the product of the processing of personal data. (https://www.datatilsynet.dk/Media/638321084132236143/Offentlige%20myndigheders%20brug%20af%20kunstig%20intelligens%20-%20Inden%20I%20g%c3%a5r%20i%20gang.pdf), or the discussion paper from the Hamburg Commissioner for Data Protection and Freedom of Information which provides that no personal data is stored in LLMs and therefore data subject rights cannot relate to the model itself (https://datenschutz-hamburg.de/fileadmin/user_upload/HmbBfDI/Datenschutz/Informationen/240715_Discussion_Paper_Hamburg_DPA_KI_Models.pdf).
[4] See, for example, https://ico.org.uk/media/about-the-ico/documents/4018606/chapter-2-anonymisation-draft.pdf
[5] See, for example, https://www.edpb.europa.eu/system/files/2023-10/EDPB_guidelines_202007_controllerprocessor_final_en.pdf
[6] Cal. Civ. Code § 1798.140(v)(4)(c).
[7] See Joint Veto Request Letter – CA AB 1008, Computer & Communications Industry Association, at https://ccianet.org/library/joint-veto-request-letter-ca-ab-1008/.
[8] Article 11 Section 7200.