Faculty Research Data, IRB Approval, and AI Processing
If you work at a research university, you already know that Institutional Review Boards exist to protect human subjects. What you might not have thought through yet is what happens when a faculty member feeds interview transcripts, survey responses, or behavioral data into an AI tool that was never reviewed as part of the original IRB protocol. This is becoming a real problem, and most institutions are behind on it.
The Regulatory Baseline: 45 CFR 46 and What It Actually Requires
The Common Rule, codified at 45 CFR 46 (revised effective January 21, 2019), governs federally funded human subjects research. Subpart A lays out the core requirements: informed consent, IRB review, and ongoing oversight of research protocols. The regulation defines "human subject" broadly: a living individual about whom an investigator either obtains information through intervention or interaction, or obtains, uses, or analyzes identifiable private information. If you are processing data that fits that definition, you need IRB approval for the methods you are using to process it.
Here is where it gets interesting. Section 46.109(a) requires that IRBs review all research activities covered by the regulation. The protocol submitted to the IRB is supposed to describe the procedures to be performed, including how data will be collected, stored, analyzed, and shared. When a researcher submits a protocol saying they will conduct thematic analysis of interview transcripts using NVivo, and then later decides to run those transcripts through GPT-4 or Claude or a fine-tuned model hosted on some startup's API, that is a material change to the protocol. Under 46.108(a)(3)(iii), changes to approved research may not be initiated without IRB review and approval, except where necessary to eliminate apparent immediate hazards to subjects.
Pushing research data through a third-party AI service is not eliminating a hazard. It is introducing one.
Why AI Processing Is a Protocol Change, Not Just a Software Upgrade
Some faculty treat adopting an AI tool the way they would treat switching from SPSS to R: a different analysis tool, same basic operation. But the analogy breaks down quickly when you look at what actually happens during AI processing.
- Data transmission to third parties. Most commercial AI tools send data to external servers. OpenAI's API terms (updated March 2024) state that they do not train on API inputs by default, but the data still transits their infrastructure. For consumer-tier ChatGPT, the default until recently was that inputs could be used for model improvement. If a researcher pastes identifiable interview data into a consumer AI tool, that data may be retained, logged, or processed in ways that violate the consent participants originally gave.
- Re-identification risk. Even with de-identified data, large language models can sometimes re-identify subjects by combining contextual details. A 2023 study from ETH Zurich demonstrated that GPT-4 could infer personal attributes like location, income, and race from anonymized Reddit posts with up to 85% accuracy. IRBs reviewing protocols with AI components need to account for this.
- Consent scope. Most informed consent documents describe specific uses of participant data. If the consent form says "your responses will be analyzed by the research team," running those responses through an external AI service arguably exceeds the scope of consent. OHRP (the Office for Human Research Protections) has issued guidance letters on consent adequacy that would support this reading, including the 2018 guidance on broad consent under the revised Common Rule.
The practical consequence: if a faculty member uses an AI tool on human subjects data without IRB review of that specific use, the institution may be out of compliance with its Federalwide Assurance (FWA). Losing an FWA is catastrophic. It means the institution cannot receive federal funding for any human subjects research. Johns Hopkins, Duke, and other major research universities have faced FWA-related scrutiny for less ambiguous protocol violations.
What IRBs Should Be Asking
Progressive IRBs are starting to add AI-specific questions to their protocol review forms. Here is what a thorough review should cover:
- Which AI tools will be used? Specific vendor names, not just "AI-assisted analysis." The IRB needs to evaluate each tool's data handling practices.
- Where is data processed and stored? On-premises, cloud-hosted, API-based? Which jurisdiction? This matters for GDPR compliance if any subjects are EU residents, and it matters for ITAR if any research touches defense-adjacent topics.
- What are the vendor's data retention and training policies? Does the vendor retain inputs? For how long? Can they use inputs to improve their models? The answers to these questions directly affect whether the protocol meets the privacy protections described in the consent form.
- Has the consent form been updated? Participants should know their data will be processed by AI systems. Broad consent under 46.116(d) may cover some future uses, but it requires specific elements including a description of the types of research that may be conducted.
- What is the re-identification risk? Has the researcher assessed whether the AI tool could infer identifiable information from ostensibly de-identified data?
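The questions above can be captured as a structured intake record so that incomplete submissions are caught before review. What follows is a minimal sketch, not any institution's actual form: the `AIToolDisclosure` class, its field names, and the `open_questions` helper are all hypothetical and exist only to illustrate the checklist.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AIToolDisclosure:
    """One AI tool as described in a protocol submission (hypothetical schema)."""
    vendor: str = ""                       # specific vendor and product name
    processing_location: str = ""          # e.g. "on-premises" or "vendor cloud, US"
    retains_inputs: Optional[bool] = None  # None means the question was not answered
    trains_on_inputs: Optional[bool] = None
    consent_mentions_ai: Optional[bool] = None
    reidentification_assessed: Optional[bool] = None


def open_questions(d: AIToolDisclosure) -> List[str]:
    """Return the checklist items a reviewer would still need resolved."""
    issues = []
    if not d.vendor.strip():
        issues.append("Name the specific AI tool and vendor.")
    if not d.processing_location.strip():
        issues.append("State where data is processed and stored.")
    if d.retains_inputs is None:
        issues.append("State whether the vendor retains inputs, and for how long.")
    if d.trains_on_inputs is None:
        issues.append("State whether the vendor may train on inputs.")
    if d.consent_mentions_ai is not True:
        issues.append("Confirm the consent form discloses AI processing.")
    if d.reidentification_assessed is not True:
        issues.append("Document a re-identification risk assessment.")
    return issues


if __name__ == "__main__":
    draft = AIToolDisclosure(vendor="ExampleVendor LLM API", retains_inputs=False)
    for item in open_questions(draft):
        print("-", item)
```

An empty list only means every question was answered. Judging whether the answers are acceptable remains the IRB's job.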
Stanford's IRB published updated guidance in late 2023 specifically addressing generative AI in research protocols. They now require researchers to document AI tool use in their protocol submissions and to justify why the tool is necessary for the research aims. Other institutions are following, but slowly.
The Enforcement Reality
OHRP has historically relied on compliance letters and corrective action plans rather than fines, which makes some administrators complacent. But the downstream consequences are severe. A 2019 OHRP determination letter to a major medical center resulted in a temporary suspension of multiple research projects and required the institution to re-consent hundreds of participants. The reputational damage and operational disruption dwarfed what a fine would have cost.
NIH has also signaled increased attention to data management in funded research. The NIH Data Management and Sharing Policy (effective January 25, 2023) requires a data management and sharing plan for NIH-funded research that generates scientific data. While it does not specifically mention AI processing, the requirement to describe how data will be managed, preserved, and shared creates an obvious hook for scrutiny if AI tools are used without documentation.
And then there is the civil liability angle. If a research participant's identifiable data is exposed through an AI tool that was never part of the approved protocol, the institution faces potential claims under state privacy laws, breach notification statutes, and possibly FERPA if the subjects are students. The University of California system settled a data breach case in 2023 for $1.1 million that involved research data, though the facts were different. The principle holds: inadequate data safeguards in research contexts create real financial exposure.
Building AI Review Into the Protocol From the Start
The cleanest approach is to treat AI tool selection as part of the initial protocol design, not as an afterthought. Researchers should identify which AI tools they plan to use, document the data flows, and include AI-specific language in their consent forms before submitting to the IRB. Amendments are always possible, but they slow things down and create compliance gaps in the interim.
Institutions should also consider maintaining an approved list of AI tools that have been vetted for data handling practices, similar to how many universities maintain approved lists of cloud storage providers for sensitive data. This reduces the burden on individual researchers and gives IRBs a baseline for review.
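One way to picture an approved list is as a registry mapping each vetted tool to the most sensitive class of data it has been cleared to handle. The sketch below is illustrative only: the tool names, the three-level `DataClass` scale, and the `is_cleared` function are assumptions, not a standard or an existing institutional system.

```python
from enum import IntEnum


class DataClass(IntEnum):
    """Hypothetical sensitivity levels, ordered from least to most sensitive."""
    PUBLIC = 0
    DEIDENTIFIED = 1
    IDENTIFIABLE = 2


# Hypothetical registry: tool name -> highest data class it is cleared for.
VETTED_TOOLS = {
    "campus-hosted-llm": DataClass.IDENTIFIABLE,
    "vendor-api-enterprise": DataClass.DEIDENTIFIED,
}


def is_cleared(tool: str, data_class: DataClass, registry=VETTED_TOOLS) -> bool:
    """True only if the tool is on the list and cleared for this data class."""
    ceiling = registry.get(tool)
    # Tools that were never vetted are rejected rather than assumed safe.
    return ceiling is not None and data_class <= ceiling


if __name__ == "__main__":
    print(is_cleared("campus-hosted-llm", DataClass.IDENTIFIABLE))
    print(is_cleared("vendor-api-enterprise", DataClass.IDENTIFIABLE))
    print(is_cleared("unreviewed-chatbot", DataClass.PUBLIC))
```

Rejecting unknown tools by default mirrors how approved lists for cloud storage usually work: anything not on the list needs its own review.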
How FirmAdapt Addresses This
FirmAdapt's architecture is built around the principle that sensitive data should not leave a controlled environment for AI processing. For research institutions, this means faculty can use AI-assisted analysis tools without transmitting human subjects data to third-party APIs. Data stays within the institution's infrastructure, which simplifies IRB review and keeps protocols aligned with consent forms that promise data will be handled by the research team.
FirmAdapt also provides auditable processing logs that document exactly what data was processed, when, and by which model. This gives IRBs and compliance offices the documentation they need to verify that AI use matches the approved protocol, and it gives institutions a defensible record if OHRP or a funding agency comes asking questions.
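To make "auditable processing logs" concrete, here is a generic sketch of a tamper-evident log that records what was processed, when, and by which model. It is not FirmAdapt's log format, which this article does not describe. The field names, the `make_entry` and `verify_chain` helpers, and the hash-chaining design are assumptions chosen for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(entry: dict) -> str:
    """Hash every field except the entry's own hash, in a stable key order."""
    body = {k: v for k, v in entry.items() if k != "entry_hash"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def make_entry(prev_hash: str, protocol_id: str, model: str, data: bytes) -> dict:
    """Record a processing event; stores a hash of the data, never the data itself."""
    entry = {
        "protocol_id": protocol_id,
        "model": model,
        "data_sha256": hashlib.sha256(data).hexdigest(),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prev_hash": prev_hash,
    }
    entry["entry_hash"] = _digest(entry)
    return entry


def verify_chain(entries: list) -> bool:
    """Check that no entry was altered, removed from the middle, or reordered."""
    prev = GENESIS
    for e in entries:
        if e["prev_hash"] != prev or e["entry_hash"] != _digest(e):
            return False
        prev = e["entry_hash"]
    return True


if __name__ == "__main__":
    first = make_entry(GENESIS, "IRB-EXAMPLE-001", "local-model-v1", b"transcript 1")
    second = make_entry(first["entry_hash"], "IRB-EXAMPLE-001", "local-model-v1", b"transcript 2")
    print(verify_chain([first, second]))
```

Because each entry embeds the hash of the one before it, editing or dropping an earlier record invalidates everything after it, which is the property a compliance office would want when matching AI use against an approved protocol.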