Tags: AI compliance, regulatory, trade secrets, IP, confidentiality, information governance

Open Source LLMs and the False Sense of Security

By Basel Ismail, May 22, 2026

A CISO I spoke with recently described their self-hosted Llama 2 deployment as "air-gapped AI." No data leaves our walls, they said. Problem solved. Except it wasn't, and the gaps in their reasoning were significant enough that I think the broader conversation around open source LLMs and compliance deserves a more honest treatment than it usually gets.

Self-hosting an open source model does solve a real and specific problem: you eliminate the risk of sending sensitive data to a third-party API endpoint. For organizations handling trade secrets, attorney work product, or classified technical specifications, that concern is legitimate. The Samsung incidents in early 2023, where employees pasted proprietary source code into ChatGPT, demonstrated what happens when hosted AI tools meet confidential information without guardrails. Samsung's semiconductor division banned generative AI tools entirely after three separate leaks in under a month.

So yes, keeping inference local means your prompts and outputs stay on infrastructure you control. But the compliance analysis cannot stop there, and too often it does.

What Self-Hosting Actually Gets You

Running a model like Llama 3, Mistral, or Falcon on your own infrastructure gives you control over the network boundary. Your data does not traverse someone else's servers. You avoid the terms of service issues that come with commercial APIs, where providers may retain inputs for model improvement unless you negotiate enterprise agreements that say otherwise. OpenAI's default data usage policy, for instance, changed multiple times between 2023 and 2024, and the enterprise API terms differ meaningfully from the consumer product terms.

For trade secret protection specifically, this matters. Under the Defend Trade Secrets Act of 2016 (18 U.S.C. 1836), maintaining trade secret status requires "reasonable measures" to keep the information secret. Feeding proprietary formulas or business strategies into a third-party API with ambiguous data retention policies could undermine a future misappropriation claim. Courts have not yet ruled directly on whether using a commercial LLM API constitutes failure to maintain secrecy, but the argument is easy to construct, and opposing counsel will absolutely construct it.

Self-hosting sidesteps that particular risk. Credit where it is due.

What Self-Hosting Does Not Get You

Here is where the conversation gets more interesting, and where I see organizations making expensive mistakes.

Access controls and audit trails

Deploying an open source model on your own GPU cluster does not automatically create the access governance layer that regulators expect. Under frameworks like NIST SP 800-53 (Rev. 5) and ISO 27001:2022, you need role-based access controls, logging of who queried the model with what inputs, and retention policies for those logs. Most open source model deployments I have seen are running on a shared endpoint with minimal authentication. The model itself has no concept of user permissions. If your intern and your general counsel can send the same queries and receive the same outputs, you have a segregation of duties problem.
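To make the gap concrete, here is a minimal sketch of the kind of gateway that would sit in front of a self-hosted endpoint. Everything in it is an assumption for illustration: the role names, the permission strings, the `AuditedGateway` class, and the `model_fn` callable standing in for the real inference call. A production deployment would pull roles from an identity provider and write to tamper-resistant log storage rather than an in-memory list.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field

# Hypothetical role-to-permission mapping. In practice this would come from
# an identity provider, not a hard-coded dictionary.
ROLE_PERMISSIONS = {
    "general_counsel": {"query_privileged", "query_general"},
    "intern": {"query_general"},
}


@dataclass
class AuditedGateway:
    """Checks a caller's role before inference and records every attempt."""

    audit_log: list = field(default_factory=list)

    def query(self, user, role, permission, prompt, model_fn):
        allowed = permission in ROLE_PERMISSIONS.get(role, set())
        # Log the attempt before acting on it, so denied requests are
        # recorded too. The prompt is stored as a hash here; whether to
        # retain full prompt text is a policy decision.
        self.audit_log.append(json.dumps({
            "ts": time.time(),
            "user": user,
            "role": role,
            "permission": permission,
            "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
            "allowed": allowed,
        }))
        if not allowed:
            raise PermissionError(f"role {role!r} lacks {permission!r}")
        return model_fn(prompt)
```

With this in place, an intern requesting `query_privileged` gets a `PermissionError` and still leaves a log entry, which is the behavior an auditor would want to see. Hashing the prompt is one design choice among several; some retention policies will require the full input instead.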

Output governance

The model does not know what it should not say. Open source models ship without the compliance-specific output filtering that regulated industries need. A self-hosted Mistral instance will happily generate a summary of a privileged legal memorandum and serve it to anyone with endpoint access. It will combine information from different classification levels if your retrieval-augmented generation pipeline does not enforce boundaries. The model is a prediction engine. It has no opinion about information governance.
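Because the model cannot enforce boundaries itself, the enforcement has to happen before retrieved text reaches the prompt. The sketch below shows one way that might look, assuming a simple ordered classification ladder. The level names, the `Document` type, and both function names are hypothetical, not part of any particular RAG framework.

```python
from dataclasses import dataclass

# Hypothetical classification ladder, lowest to highest sensitivity.
LEVELS = {"public": 0, "internal": 1, "confidential": 2, "privileged": 3}


@dataclass(frozen=True)
class Document:
    doc_id: str
    classification: str
    text: str


def filter_by_clearance(documents, user_clearance):
    """Keep only documents at or below the caller's clearance level."""
    ceiling = LEVELS[user_clearance]
    return [d for d in documents if LEVELS[d.classification] <= ceiling]


def build_context(documents, user_clearance):
    """Assemble prompt context from permitted documents only."""
    permitted = filter_by_clearance(documents, user_clearance)
    return "\n\n".join(d.text for d in permitted)
```

The point of filtering at retrieval time is that a document the caller is not cleared for never enters the context window, so the model has nothing to leak. A strict ladder is a simplification; real need-to-know rules are often per-matter or per-project rather than a single ordered scale.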

Model provenance and supply chain risk

Open source models come with their own supply chain concerns. The training data for most open source LLMs is not fully documented. Meta's Llama 3 model card provides high-level descriptions of training data but not a complete manifest. If your organization operates under regulations that require you to validate the provenance of tools processing sensitive data, such as ITAR for defense contractors or certain FDA guidance documents for life sciences, you may struggle to satisfy auditors. The EU AI Act, which entered into force in August 2024, places training-data transparency obligations primarily on providers of general-purpose AI models, and separate obligations on organizations that deploy AI systems. Self-hosting does not exempt you from whichever of these obligations apply to your role.
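One part of the supply chain problem is tractable with ordinary tooling: confirming that the weight files you are running are the ones you vetted. The sketch below checks files against a manifest of SHA-256 digests recorded at approval time. The manifest format and the function names are assumptions for illustration. Note what this does not do: it verifies artifact integrity only and says nothing about what data the model was trained on.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so multi-gigabyte weights do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(model_dir, manifest):
    """Compare files under model_dir to a {relative_path: sha256} manifest.

    Returns a list of (relative_path, problem) tuples; an empty list means
    every listed file is present and matches its recorded digest.
    """
    problems = []
    for rel_path, expected in manifest.items():
        path = Path(model_dir) / rel_path
        if not path.is_file():
            problems.append((rel_path, "missing"))
        elif sha256_file(path) != expected:
            problems.append((rel_path, "hash mismatch"))
    return problems
```

Running a check like this at deployment time and storing the result gives you a dated record that the deployed artifact matched the approved one, which is a small but demonstrable piece of the provenance story.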

Ongoing maintenance and vulnerability management

Open source models require patching and monitoring just like any other software component. The difference is that the "patch" for a model vulnerability might mean redeploying a 70-billion-parameter model, which is not a trivial operation. When researchers at Carnegie Mellon and the Center for AI Safety published universal adversarial attack techniques against aligned LLMs in July 2023, commercial providers could patch their systems centrally. Self-hosted deployments needed to implement mitigations independently, and many did not.

The Honest Comparison

The real tradeoff looks something like this:

  • Data boundary control: Self-hosted wins clearly. Your data stays on your metal.
  • Access governance: Commercial enterprise platforms typically ship with identity integration, RBAC, and audit logging. Self-hosted deployments require you to build or buy these layers separately.
  • Output filtering and compliance guardrails: Enterprise hosted solutions increasingly offer configurable content policies and DLP integration. Open source models require you to implement these from scratch.
  • Regulatory documentation: Enterprise vendors provide SOC 2 reports, data processing agreements, and compliance attestations. Self-hosted means you own the entire compliance narrative, which can be an advantage or a burden depending on your team's capacity.
  • Cost of ongoing compliance: The infrastructure cost of self-hosting is well understood. The compliance engineering cost, building and maintaining the governance wrapper, is routinely underestimated by a factor of three to five, based on conversations with teams who have done it.

For trade secret protection specifically, the calculus is nuanced. Self-hosting protects the confidentiality of your inputs, but without proper access controls and audit trails, you may still fail the "reasonable measures" test under the DTSA or the Uniform Trade Secrets Act (adopted in some form by 48 states). A 2022 ruling in Compulife Software Inc. v. Newman reinforced that technical access controls are a key factor courts examine when evaluating whether reasonable measures were taken. Simply having the data on-premises is necessary but not sufficient.

The Middle Path Most Organizations Actually Need

The binary framing of "open source versus hosted" obscures what most regulated organizations actually need, which is a governed AI environment that provides data boundary control AND the compliance infrastructure that makes the deployment defensible. Some organizations build this themselves on top of open source models. It works, but it requires dedicated ML engineering and compliance engineering resources working in concert, and it requires ongoing investment as both the models and the regulatory landscape evolve.

The organizations I see struggling the most are the ones who deployed a self-hosted model, declared victory on data security, and never built the governance layer. Six months later, they have no audit trail of what queries were run against what data, no access segmentation, and no ability to demonstrate compliance to a regulator or in litigation discovery.

How FirmAdapt Approaches This

FirmAdapt was built around the premise that data boundary control and compliance governance are both requirements, not alternatives. The platform provides the information governance infrastructure, including role-based access controls, comprehensive audit logging, output filtering, and data classification enforcement, as foundational capabilities rather than aftermarket additions. Organizations get the data sovereignty benefits of controlled deployment with the compliance documentation and access governance that regulators and courts expect to see.

For trade secret workflows specifically, FirmAdapt maintains granular records of who accessed what information through the AI layer, enforces need-to-know boundaries within retrieval pipelines, and generates the kind of audit trail that supports a "reasonable measures" defense under the DTSA. The goal is to make the compliance posture of your AI deployment as strong as the rest of your information security program, without requiring you to staff a dedicated team to build and maintain it.
