FirmAdapt
AI compliance, regulatory, trade secrets, IP, confidentiality, Information governance

Retrieval Augmented Generation and the Confidentiality Boundary You Keep Forgetting

By Basel Ismail, May 23, 2026

RAG is the architecture everyone reaches for when they want an LLM to answer questions about internal documents. Index your corpus, retrieve relevant chunks at query time, stuff them into the context window, get a grounded response. It works remarkably well. It also creates a confidentiality problem that most implementation teams treat as an afterthought, if they treat it at all.

The core issue is deceptively simple. When you put a document into a RAG pipeline, you are placing it in the inference path. Every query that triggers retrieval of that document effectively grants the querying user read access to its contents, mediated only by the LLM's generation step. The LLM will summarize it, quote it, reason over it, and surface its substance in the response. If your retrieval index contains board minutes, M&A term sheets, employee disciplinary records, and engineering specs all in one undifferentiated pool, your RAG system has just collapsed every access control boundary you spent years building.

The Access Control Problem Nobody Scoped

Traditional document management systems enforce access at the object level. SharePoint has permissions. iManage has security policies. NetDocuments has workspaces. These systems are imperfect, but they represent decades of institutional effort to keep the wrong eyes off sensitive material.

RAG implementations routinely bypass all of it. The ingestion pipeline pulls documents from source systems, chunks them, generates embeddings, and stores them in a vector database. At that point, the original access controls are gone. The vector store does not know that Chunk 47832 came from a document classified as Attorney Work Product, or that Chunk 91204 is from an HR investigation file restricted to three people in the organization.

Some teams attempt to solve this with metadata tagging at ingestion time, carrying forward the source system's permissions as filterable attributes. This is better than nothing, but it introduces its own fragility. Permissions in the source system are dynamic; they change when people move roles, when matters close, when NDAs expire. The vector store's metadata becomes stale the moment it is written unless you build a synchronization layer, and almost nobody does.
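To make the staleness problem concrete, here is a minimal sketch of metadata tagging at ingestion plus a check for drift against the source system. It is an illustration under stated assumptions, not a reference implementation: the `Chunk` fields, the `chunk_document` and `find_stale_chunks` helpers, the fixed-size character chunking, and the group names are all hypothetical, and the `live_acl` dictionary stands in for whatever permissions API your document management system actually exposes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    source_doc_id: str
    text: str
    classification: str             # hypothetical label, e.g. "hr_investigation"
    allowed_groups: frozenset[str]  # snapshot of the source ACL at ingestion time


def chunk_document(
    doc_id: str,
    text: str,
    classification: str,
    allowed_groups: frozenset[str],
    size: int = 200,
) -> list[Chunk]:
    """Split text into fixed-size character chunks that inherit the document's access metadata."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [
        Chunk(f"{doc_id}:{i // size}", doc_id, text[i : i + size], classification, allowed_groups)
        for i in range(0, len(text), size)
    ]


def find_stale_chunks(chunks: list[Chunk], live_acl: dict[str, frozenset[str]]) -> list[str]:
    """Return IDs of chunks whose stored groups differ from the source system's current groups.

    A document missing from live_acl is treated as having no permitted readers,
    so its chunks are flagged unless they were stored with an empty group set.
    """
    return [
        c.chunk_id
        for c in chunks
        if c.allowed_groups != live_acl.get(c.source_doc_id, frozenset())
    ]


# Ingest a 450-character HR file restricted to one group (all values are made up).
chunks = chunk_document("hr-0042", "x" * 450, "hr_investigation", frozenset({"hr-investigators"}))

# Immediately after ingestion the snapshot matches the source system.
fresh = find_stale_chunks(chunks, {"hr-0042": frozenset({"hr-investigators"})})

# Later the source ACL changes; every chunk from that document is now stale.
stale = find_stale_chunks(chunks, {"hr-0042": frozenset({"general-counsel"})})
```

Running a check like this on a schedule only narrows the window in which the index and the source system disagree. It does not close it, which is why filtering against a live permissions source at query time is the stronger control.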

Trade Secrets Get Especially Dangerous

Under the Defend Trade Secrets Act of 2016 (18 U.S.C. 1836), information qualifies as a trade secret only if its owner has taken "reasonable measures" to keep it secret (the definition appears at 18 U.S.C. 1839(3)). Courts have been interpreting that requirement with increasing specificity. In Compulife Software Inc. v. Newman (11th Cir. 2020), the court scrutinized the plaintiff's actual technical controls, not just its policies. In Epic Systems Corp. v. Tata Consultancy Services, a jury awarded $940 million (later reduced to $420 million) in a case where access controls were central to the narrative.

Now consider what happens when your proprietary formulations, pricing algorithms, or customer acquisition models are sitting in a RAG index that any employee with access to the chatbot can query. You have arguably failed the "reasonable measures" test. The information is technically accessible to anyone who can phrase the right question. A plaintiff's attorney in a future trade secret misappropriation case will have a field day with that architecture diagram.

The Uniform Trade Secrets Act, adopted in some form by 48 states, uses similar "reasonable efforts" language. If your RAG system makes trade secrets retrievable by a broad user base, you are undermining your own legal position before any adversary even enters the picture.

Privilege Waiver Is the Other Shoe

Attorney-client privilege and work product protection can be waived by disclosure to third parties, and in some circuits, by failure to maintain confidentiality even within the organization. If privileged legal memoranda are ingested into a RAG corpus accessible to non-legal staff, you may have a waiver problem.

Federal Rule of Evidence 502(b) provides a safety net for inadvertent disclosures, but only if the holder took "reasonable steps to prevent disclosure" and "promptly took reasonable steps to rectify the error." An architecture that systematically places privileged documents in a retrieval index available to the whole company is hard to characterize as inadvertent. It looks deliberate. And 502(b) does not protect deliberate disclosures, even ones made through ignorance of the technical implications.

The Advisory Committee Notes to Rule 502 specifically reference the costs of privilege review in electronic discovery. Courts are sympathetic to mistakes in large document sets. They are less sympathetic to architectural choices that eliminate confidentiality boundaries by design.

Regulated Data Adds Another Layer

If your RAG corpus contains PHI governed by HIPAA, you have a minimum necessary standard problem (45 C.F.R. 164.502(b)). The minimum necessary rule requires covered entities to limit access to only the PHI reasonably necessary for a particular purpose. A RAG system that retrieves patient records in response to operational queries is almost certainly violating this standard.

For financial services firms, GLBA's Safeguards Rule (16 C.F.R. Part 314, as amended effective June 2023) requires access controls proportionate to the sensitivity of customer financial information. The SEC's cybersecurity disclosure rules (adopted July 2023) create board-level accountability for material cybersecurity risks, and a misconfigured RAG pipeline that exposes customer data across the organization could qualify.

ITAR and EAR controls present perhaps the starkest version of this problem. If technical data subject to export controls ends up in a RAG index queryable by foreign national employees, you may have committed a deemed export violation. The penalties under ITAR can reach $1.3 million per violation (as adjusted), and criminal penalties include up to 20 years imprisonment under 22 U.S.C. 2778.

What a Defensible RAG Architecture Actually Requires

The minimum viable approach involves several things that most proof-of-concept deployments skip entirely.

  • Document-level classification at ingestion. Every chunk needs to carry forward the sensitivity classification, access group, and regulatory category of its source document. This is not optional metadata; it is the foundation of any access control scheme.
  • Query-time permission filtering. Before retrieved chunks reach the LLM's context window, the system must verify that the querying user has authorization to access every chunk being passed. This needs to happen against a live permissions source, not a stale snapshot.
  • Segregated indices for high-sensitivity material. Trade secrets, privileged communications, ITAR-controlled technical data, and PHI should not live in the same vector store as general corporate knowledge. The blast radius of a misconfiguration is too large.
  • Audit logging of retrieval events. You need to know which user's query caused which document chunks to be retrieved and included in a response. This is your evidence of reasonable measures, and your incident response starting point.
  • Periodic access review. Just as you review access to source systems, you need to review who can query which segments of the RAG index. Role changes, departures, and matter closures all need to trigger revalidation.
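The query-time filtering and audit-logging items above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any particular product's implementation: `RetrievedChunk`, `authorize_chunks`, the `can_read` callback, the `LIVE_READERS` table, and the index names are all hypothetical. In a real deployment `can_read` would call the source system's permissions API, and the audit records would go to an append-only store rather than an in-memory list.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass(frozen=True)
class RetrievedChunk:
    chunk_id: str
    source_doc_id: str
    index_name: str  # which segregated index produced the chunk
    text: str


def authorize_chunks(
    user_id: str,
    candidates: list[RetrievedChunk],
    can_read: Callable[[str, str], bool],
) -> tuple[list[RetrievedChunk], list[dict]]:
    """Filter retrieved chunks by a live permission check and record every decision."""
    allowed: list[RetrievedChunk] = []
    audit: list[dict] = []
    for chunk in candidates:
        try:
            permitted = bool(can_read(user_id, chunk.source_doc_id))
        except Exception:
            # Fail closed: if the permissions source errors out, deny access.
            permitted = False
        audit.append(
            {
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "user_id": user_id,
                "chunk_id": chunk.chunk_id,
                "source_doc_id": chunk.source_doc_id,
                "index_name": chunk.index_name,
                "permitted": permitted,
            }
        )
        if permitted:
            allowed.append(chunk)
    return allowed, audit


# Hypothetical stand-in for a live lookup against the source system.
LIVE_READERS = {"wiki-1": {"alice", "bob"}, "deal-7": {"alice"}}


def can_read(user_id: str, doc_id: str) -> bool:
    return user_id in LIVE_READERS.get(doc_id, set())


candidates = [
    RetrievedChunk("wiki-1:0", "wiki-1", "general", "Expense policy text"),
    RetrievedChunk("deal-7:3", "deal-7", "restricted-legal", "Term sheet text"),
]

# Only the chunks in context_chunks should ever be placed in the LLM's context window.
context_chunks, audit_records = authorize_chunks("bob", candidates, can_read)
```

Two design choices are worth noting. The check runs per chunk against the source document's current permissions rather than against metadata stored in the index, so a revoked permission takes effect on the next query. Denied retrievals are logged alongside permitted ones, because a pattern of denied attempts is often the first sign that someone is probing for restricted material.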

How FirmAdapt Handles This

FirmAdapt's architecture was built around the assumption that documents in the inference path are a confidentiality risk, not a feature to celebrate. The platform enforces document-level access controls at retrieval time, synchronized with your existing identity and permissions infrastructure. Chunks inherit classification from their source documents, and query-time filtering ensures that no user receives a response informed by material they would not be authorized to view directly. Audit logs capture the full retrieval chain for every interaction.

For organizations dealing with trade secrets, privileged materials, or regulated data, FirmAdapt maintains segregated retrieval boundaries that map to your existing information governance policies. The goal is straightforward: give your teams the productivity benefits of RAG without quietly dismantling the confidentiality controls your legal and compliance teams spent years putting in place.
