FirmAdapt
FirmAdapt
LIVE DEMO
Back to Blog
AI complianceregulatorytrade secretsIPconfidentialityInformation security

Prompt Injection Attacks and the Trade Secret Exfiltration Vector Nobody Saw Coming

By Basel IsmailMay 23, 2026

Prompt Injection Attacks and the Trade Secret Exfiltration Vector Nobody Saw Coming

A researcher named Johann Rehberger demonstrated in late 2023 that he could get ChatGPT to exfiltrate conversation data by embedding invisible instructions in a webpage that the model's browsing tool retrieved. The payload was a markdown image tag pointing to an attacker-controlled server, with the conversation contents appended as URL parameters. OpenAI patched that specific exploit, but the underlying class of vulnerability remains wide open across most enterprise AI deployments. If your organization is feeding proprietary data into large language models, this should concern you.

How Prompt Injection Actually Works

At its core, prompt injection exploits the fact that LLMs cannot reliably distinguish between instructions from a trusted operator and instructions embedded in untrusted data. When a model processes a document, email, or database record that contains adversarial text, it may follow those embedded instructions as if they came from the system prompt or the user.

There are two main flavors. Direct prompt injection is when a user deliberately crafts input to override system instructions. Think "ignore your previous instructions and do X instead." Indirect prompt injection is more dangerous for enterprise contexts: malicious instructions are planted in external data sources that the model retrieves during normal operation. A poisoned PDF in a shared drive, a manipulated email in an inbox the AI agent monitors, a tampered record in a CRM. The user never sees the adversarial payload. The model just quietly follows it.

OWASP ranked prompt injection as the number one vulnerability in its Top 10 for LLM Applications (version 1.1, published October 2023). Simon Willison, one of the most respected voices in this space, has been writing since September 2022 that prompt injection is essentially an unsolved problem at the model level. No amount of fine-tuning or guardrailing has produced a reliable, general defense.

Why This Is a Trade Secret Problem

Companies are connecting LLMs to internal knowledge bases at a remarkable pace. Retrieval-augmented generation (RAG) architectures pull from document stores, wikis, codebases, and databases so that models can answer questions about proprietary information. This is genuinely useful. It is also a new exfiltration surface.

Consider a straightforward scenario. Your legal team uses an AI assistant connected to a contract repository. An outside party sends a document for review, and that document contains hidden prompt injection text (white text on white background, zero-width characters, instructions in metadata fields). When the AI assistant processes the document, the injected instructions tell it to summarize the five most recent confidential contracts and include them in its response, or to encode key terms into an outbound API call.

Under the Defend Trade Secrets Act of 2016 (18 U.S.C. 1836), a trade secret loses its protected status if the owner fails to take "reasonable measures" to keep it secret. Courts have been interpreting this requirement with increasing specificity. In Compulife Software Inc. v. Newman (11th Cir., 2020), the court scrutinized the plaintiff's actual security practices in detail. In Epic Systems Corp. v. Tata Consultancy Services (W.D. Wis., 2016), the jury awarded $940 million (later reduced to $420 million) partly based on evidence of how trade secrets were accessed and extracted through technical systems.

The "reasonable measures" standard is going to collide with AI deployments. If you connect an LLM to your trade secret repository and that LLM is vulnerable to prompt injection, a court could find that you failed to maintain reasonable protections. The vulnerability is known, well-documented, and on OWASP's top-ten list. Arguing ignorance will be difficult.

The NIST Dimension

NIST's AI Risk Management Framework (AI RMF 1.0, January 2023) identifies "information integrity" and "security and resiliency" as core functions. More specifically, NIST published a companion report in January 2024, Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations (NIST AI 100-2), which explicitly catalogs prompt injection as a known attack type against generative AI systems. For organizations subject to federal contracting requirements or operating under frameworks that reference NIST standards, this publication creates a clear baseline expectation.

Defensive Approaches That Actually Help

Since there is no silver-bullet fix at the model layer, defense requires architectural thinking. Several approaches reduce risk meaningfully, even if none eliminate it entirely.

  • Input and output filtering. Scan all data entering the model context for known injection patterns. Also scan model outputs for signs of data leakage, such as unexpected structured data, encoded strings, or content that matches classified information patterns. This catches unsophisticated attacks but is bypassable by determined adversaries.
  • Privilege separation. The model should never have the same access permissions as the user. If a user asks the AI to search contracts, the system should enforce access controls independently of the model's behavior. The model's ability to retrieve, modify, or transmit data should be constrained by hard-coded policy layers that the model cannot override through any prompt.
  • Context isolation. Untrusted external content (incoming emails, third-party documents, web content) should be processed in a sandboxed context, separate from sessions that have access to sensitive internal data. Mixing trusted and untrusted content in a single model context is where indirect injection becomes dangerous.
  • Output channel restrictions. The model should not be able to make arbitrary outbound requests. If it cannot construct URLs, call external APIs, or embed content that triggers network requests, the most common exfiltration techniques fail. This is a blunt instrument, but it works.
  • Human-in-the-loop for sensitive operations. Any action the model takes that involves transmitting, summarizing, or displaying trade secret material should require explicit human confirmation. Automated pipelines that move data from retrieval to output without a checkpoint are the highest-risk configuration.
  • Audit logging with context capture. Log not just what the model output, but the full context window, including retrieved documents and the system prompt. If an injection attack occurs, you need forensic evidence of what instructions the model actually received. This also supports your "reasonable measures" argument in any subsequent litigation.

A Note on the Regulatory Trajectory

The EU AI Act, which entered into force in August 2024, imposes cybersecurity requirements on high-risk AI systems under Article 15, including resilience against "attempts by unauthorized third parties to alter their use, outputs or performance by exploiting system vulnerabilities." Prompt injection fits squarely within that language. For organizations operating in or selling into the EU, this is already a compliance obligation, not a best practice.

In the U.S., the SEC's cybersecurity disclosure rules (effective December 2023) require reporting of material cybersecurity incidents. A prompt injection attack that exfiltrates trade secrets or client data from an AI system would likely meet the materiality threshold, particularly for financial services firms. The question is not whether regulators will care about this attack vector. They already do.

How FirmAdapt Addresses This

FirmAdapt's architecture was designed around the assumption that LLMs are adversarially manipulable. The platform enforces strict privilege separation between the model layer and data access controls, meaning the model cannot escalate its own permissions regardless of what instructions appear in its context window. All retrieval operations pass through policy enforcement that operates independently of the model, with full audit logging of context windows, retrieved documents, and outputs. Untrusted content is processed in isolated contexts that cannot access sensitive data stores.

For organizations managing trade secrets, client confidential information, or regulated data, FirmAdapt provides the architectural controls that support a "reasonable measures" defense under the DTSA and align with NIST AI RMF expectations. The platform's compliance reporting captures the evidence you would need to demonstrate those controls to a court, a regulator, or an auditor.

Ready to uncover operational inefficiencies and learn how to fix them with AI?
Try FirmAdapt free with 10 analysis credits. No credit card required.
Get Started Free