
Clinical Trial Data, IRB Approval, and the Generative AI Question

By Basel Ismail · May 1, 2026


Sponsors and contract research organizations (CROs) are eager to point generative AI at clinical trial operations. The use cases are genuinely compelling: automated protocol deviation detection, patient narrative generation, adverse event coding, site feasibility analysis, even drafting sections of clinical study reports (CSRs). The problem is that clinical trial data sits at the intersection of three overlapping regulatory frameworks, none of which was written with LLMs in mind. Getting approval to use AI tooling in this space is possible, but it requires understanding exactly where the guardrails are and how they interact.

The Regulatory Stack: HIPAA, IRB, and ICH-GCP

Start with HIPAA. Clinical trial data frequently contains protected health information. Under the Privacy Rule, covered entities and their business associates can use or disclose PHI for research purposes, but only under specific conditions: authorization from the participant (45 CFR 164.508), a waiver of authorization granted by an IRB or Privacy Board (45 CFR 164.512(i)), or use of a limited data set with a data use agreement (45 CFR 164.514(e)). If you are feeding PHI into a generative AI system, you need to know which of these pathways you are relying on, and you need to confirm that the AI vendor qualifies as a business associate with a signed BAA in place. No BAA, no PHI processing. Full stop.
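
To make that gating logic concrete, here is a minimal pre-flight check, offered as a sketch rather than a definitive implementation. All names (`DataRequest`, `phi_pathway_cleared`) are hypothetical, and in practice each flag would be backed by a documented determination, not a boolean set in code.

```python
from dataclasses import dataclass

@dataclass
class DataRequest:
    """Hypothetical summary of a proposed AI processing job."""
    contains_phi: bool
    participant_authorization: bool   # HIPAA authorization on file (45 CFR 164.508)
    irb_waiver: bool                  # waiver granted by an IRB or Privacy Board
    limited_data_set_with_dua: bool   # limited data set + signed data use agreement
    vendor_baa_signed: bool           # executed BAA with the AI vendor

def phi_pathway_cleared(req: DataRequest) -> bool:
    """True only if a recognized HIPAA research pathway applies.

    Mirrors the pathways above: authorization, IRB/Privacy Board
    waiver, or a limited data set under a DUA. No BAA means no PHI
    processing, regardless of pathway.
    """
    if not req.contains_phi:
        return True   # HIPAA not triggered; IRB and GCP rules may still apply
    if not req.vendor_baa_signed:
        return False  # no BAA, no PHI processing
    return (req.participant_authorization
            or req.irb_waiver
            or req.limited_data_set_with_dua)
```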

Then there is the IRB layer. Under the Common Rule (45 CFR 46) and FDA regulations at 21 CFR 56, IRBs approve research protocols and have continuing oversight responsibility. The informed consent document (21 CFR 50.25) specifies how participant data will be used. If your approved protocol says data will be analyzed by the sponsor's biostatistics team using SAS, and you instead route case report forms through a cloud-hosted LLM, you have a protocol deviation. Possibly a significant one. IRBs take data handling changes seriously, particularly after the 2018 revisions to the Common Rule expanded requirements around broad consent for future research use of identifiable data.

ICH-GCP (E6(R2), with E6(R3) adopted by ICH in January 2025) adds another dimension. Section 5.5.3 requires sponsors to ensure data integrity and reliability. GCP expects that systems used to capture, process, or store clinical data are validated. The R3 revision explicitly acknowledges technology-driven approaches and encourages risk-proportionate quality management, but it also reinforces that sponsors must maintain oversight of any technology they deploy. Using an unvalidated AI tool to generate or modify trial data would be a GCP violation, and FDA has shown it will cite sponsors for inadequate computerized system validation under 21 CFR 11.

Where AI Tooling Actually Fits

The realistic path forward involves separating use cases by risk tier.

Lower risk: operational and administrative tasks

  • Site feasibility analysis using aggregated, de-identified data
  • Protocol document drafting (no patient data involved)
  • Regulatory submission formatting and cross-referencing
  • Literature review and competitive landscape analysis

These tasks generally do not involve PHI and fall outside IRB jurisdiction. You still need to validate the tools under your quality management system, but the regulatory burden is manageable.

Medium risk: de-identified or limited data sets

  • Adverse event signal detection across de-identified datasets
  • Automated coding (MedDRA, WHO Drug) using limited data sets
  • Patient narrative drafts generated from structured CRF data with identifiers stripped per the Safe Harbor method (45 CFR 164.514(b)(2))

Here you need to confirm your de-identification methodology is defensible. HHS guidance from November 2012 remains the benchmark. Expert determination under 164.514(b)(1) is stronger than Safe Harbor if you can get it, but Safe Harbor works if you genuinely remove all 18 identifier categories. The risk with generative AI is re-identification through inference, particularly with rare diseases or small trial populations. An LLM that has memorized training data could theoretically reconstruct identifiers from contextual clues. Your de-identification analysis should account for this.
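
As one illustration of the mechanics, and not a defensible de-identification methodology on its own, here is a minimal Python sketch that scrubs a few Safe Harbor identifier categories from free text before it reaches a model. The patterns and the `scrub` helper are hypothetical; a production pipeline needs coverage of all 18 categories, structured-field handling, and human QC.

```python
import re

# Illustrative patterns for a few of the 18 Safe Harbor identifier
# categories (45 CFR 164.514(b)(2)). Regexes alone do not make a
# defensible de-identification methodology.
PATTERNS = {
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "mrn":   re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE),
    # Dates more specific than year must also go under Safe Harbor.
    "date":  re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}

def scrub(text: str) -> str:
    """Replace matched identifiers with category tags before AI processing."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label.upper()} REMOVED]", text)
    return text

print(scrub("Pt MRN: 448291, DOB 04/12/1961, contact 555-867-5309."))
```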

Higher risk: identifiable patient data

  • Automated generation of patient narratives from source data
  • Query resolution using original CRF entries
  • Safety report drafting from individual case safety reports

This is where you need the full stack of approvals. You need a BAA with the AI vendor. You need the IRB to approve a protocol amendment or determine that the change falls within the scope of existing approval. You need informed consent language that covers AI-assisted data processing, or an IRB waiver of the additional consent requirement under 45 CFR 46.116(f). And you need to validate the AI system under 21 CFR 11 and your GCP-compliant quality management framework.

Getting IRB Approval: Practical Steps

IRBs are not inherently hostile to AI. They are hostile to vagueness. When you submit a protocol amendment for AI tooling, be specific about:

  • What data the AI system will access. Specify the data elements, not just "clinical trial data."
  • Where the data is processed. Cloud region, encryption standards, whether the vendor retains any data or uses it for model training. This last point is critical. If your AI vendor's terms of service allow them to use input data for model improvement, you have a HIPAA problem and a GCP problem simultaneously.
  • What the AI system outputs and who reviews it. IRBs want to see human-in-the-loop review for any AI output that affects patient safety or data integrity. Automated adverse event grading with no human review will not pass.
  • Your validation approach. Reference FDA's May 2023 discussion paper on AI/ML in drug development. Show that you have tested the system against known outputs and established acceptance criteria; a minimal harness along those lines is sketched after this list.
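
That harness can be very small. The sketch below assumes a hypothetical `code_adverse_event` wrapper around the AI system under test, and the gold set and 95% threshold are illustrative; the actual acceptance criteria belong in your pre-specified validation plan.

```python
# Minimal validation-harness sketch: compare AI output against a
# curated gold set and enforce a pre-specified acceptance criterion.

GOLD_SET = [
    ("severe headache after dose 2", "Headache"),
    ("grade 3 nausea, resolved",     "Nausea"),
    ("injection site erythema",      "Injection site erythema"),
]

ACCEPTANCE_THRESHOLD = 0.95  # illustrative; set in the validation plan

def code_adverse_event(verbatim: str) -> str:
    """Placeholder for the AI coding step under test."""
    raise NotImplementedError("call your AI system here")

def run_validation() -> None:
    hits = sum(
        code_adverse_event(verbatim) == expected
        for verbatim, expected in GOLD_SET
    )
    accuracy = hits / len(GOLD_SET)
    print(f"accuracy={accuracy:.2%} (threshold {ACCEPTANCE_THRESHOLD:.0%})")
    assert accuracy >= ACCEPTANCE_THRESHOLD, "validation failed; do not deploy"
```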

One thing worth noting: OHRP issued guidance in January 2024 clarifying that IRBs should evaluate AI tools used in research under existing frameworks rather than waiting for AI-specific regulations. This is helpful because it means IRBs have the authority to approve these amendments now. You do not need to wait for new rules.

The Vendor Problem

Most general-purpose AI platforms are disqualified from clinical trial use before you even get to the IRB question. OpenAI's enterprise API supports BAAs as of early 2024, but their data processing terms require careful review against GCP requirements. Google Cloud's Vertex AI and Microsoft Azure OpenAI Service both offer HIPAA-eligible configurations with BAAs, but "HIPAA-eligible" means you still have to configure the environment correctly. A misconfigured cloud deployment is not compliant just because the vendor signed a BAA.

The deeper issue is 21 CFR 11 compliance. Electronic records used in clinical trials must have audit trails, access controls, and validated system integrity. Most AI platforms were not built with these requirements in mind. You will likely need a middleware layer that handles logging, access control, and output versioning between the AI model and your clinical data management system.
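
As a rough illustration of what that middleware does, here is a minimal Python sketch that wraps an AI call with an append-only audit record. The function name and JSONL log are hypothetical, and a real 21 CFR 11 implementation also needs access controls, retention policies, and tamper protection beyond simple hashing.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone

def audited_ai_call(user_id: str, prompt: str, model_call,
                    log_path: str = "audit.jsonl") -> str:
    """Wrap an AI call with an audit-trail record.

    `model_call` is any callable str -> str (your AI client). The
    record captures who ran what, when, plus tamper-evident hashes
    of the input and output for later verification.
    """
    output = model_call(prompt)
    record = {
        "record_id": str(uuid.uuid4()),  # versioned output identifier
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "input_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "output_sha256": hashlib.sha256(output.encode()).hexdigest(),
    }
    with open(log_path, "a") as f:  # append-only audit trail
        f.write(json.dumps(record) + "\n")
    return output
```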

Where FirmAdapt Fits

FirmAdapt's architecture was designed for exactly this kind of multi-framework regulatory environment. The platform enforces data classification at the point of ingestion, so PHI is identified and handled according to HIPAA requirements before it reaches any AI processing layer. Audit trails, access controls, and data retention policies are built into the infrastructure rather than bolted on after the fact, which addresses 21 CFR 11 requirements without requiring a separate middleware build.

For sponsors and CROs navigating IRB submissions, FirmAdapt provides the documentation artifacts that IRBs actually want to see: data flow diagrams, processing logs, validation records, and evidence that the AI vendor does not retain or train on input data. The platform supports BAA execution and can be configured to operate within specific cloud regions, which simplifies the data residency questions that come up during both IRB review and GCP audits. If you are trying to get AI tooling approved for clinical trial operations, having a compliance-first platform removes a significant number of the objections before they arise.

Ready to uncover operational inefficiencies and learn how to fix them with AI?
Try FirmAdapt free with 10 analysis credits. No credit card required.
Get Started Free