Provenance Is the New Perimeter

Daniel Bardenstein
September 30, 2025

Why Visibility Into AI’s Data Lineage Is a National Security Imperative

The National Academies just released one of the most important reports on AI you’ll read this year: Machine Learning for Safety-Critical Applications. The core message is simple but profound: if we’re going to deploy machine learning in places where failure costs lives, from defense systems and aircraft to medical devices and critical infrastructure, we need to rethink how we build trust in these systems from the ground up.

At the center of that report is a concept too often treated as an afterthought: provenance, which is the ability to know, with certainty, where a model came from, what data shaped it, and how it’s changed over time.

Today, most AI ethics and AI security policies, tools, and discussions center on “explainability.” But explainability is the wrong first question. Before we ask why a model made a decision, we need to ask where that model came from. Without that, trust is impossible, and in a safety-critical context, impossible means unacceptable.

The Hard Truth: Most Organizations Have No Idea What’s Inside Their Models

Traditional software is traceable. We can follow requirements through code, testing, and deployment. But ML adds an extra step that breaks that chain: models are trained on varying datasets, and their logic emerges from data, pipelines, and iterative optimization rather than from code alone. And because that process is often opaque, even the engineers who built a system can’t fully explain what’s inside it.

That’s not just a technical problem; it’s a safety and national security problem.

  • If you don’t know where your training data came from, you don’t know whether it’s accurate, legally obtained, or poisoned.
  • If you don’t know how your model evolved, you can’t trace the root cause of a failure.

And that’s the gap the National Academies are warning about. They’re calling for a fundamental shift: treating provenance and lineage not as “nice-to-have” metadata but as safety-critical artifacts on par with the model itself.

Data Lineage Is Where Safety Begins

The report is blunt about this: data is now part of the safety case. If you can’t prove the origin, quality, integrity, and coverage of the data that shaped your model, you can’t claim your system is safe.

That means organizations need answers to questions most can’t answer today:

  • Where did every dataset come from?
  • Who collected, labeled, and transformed it, and how?
  • Does it accurately reflect the conditions the system will face in the real world?
  • Could an adversary have tampered with it?
  • Has it changed over time, and if so, how has that affected model behavior?

The answers to those questions are not academic. They’re the difference between a system that behaves predictably in the field and one that fails when it matters most.
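
To make this concrete, here is a minimal sketch of the kind of record that could answer those questions. The field names and structure below are illustrative assumptions, not a standard schema or any specific product’s format:

```python
from dataclasses import dataclass, field
from hashlib import sha256
from typing import List


@dataclass
class TransformStep:
    """One step in a dataset's processing history: who did what, and when."""
    actor: str        # person or service that performed the step
    action: str       # e.g. "collected", "labeled", "deduplicated"
    timestamp: str    # ISO 8601
    description: str = ""


@dataclass
class DatasetLineageRecord:
    """Illustrative record answering the provenance questions above."""
    dataset_name: str
    source: str         # where the data originally came from
    content_hash: str   # hash of the dataset snapshot, for tamper detection
    collected_by: str
    license: str
    steps: List[TransformStep] = field(default_factory=list)

    @staticmethod
    def hash_bytes(data: bytes) -> str:
        """Hash a dataset snapshot so later modification is detectable."""
        return sha256(data).hexdigest()
```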

Provenance Is the Key to Trust, Accountability, and Control

If provenance sounds like a compliance problem, you’re thinking too small. In reality, it’s the backbone of three capabilities every organization deploying AI in a safety-critical context must have:

  • Trust: Without a verifiable chain of custody for data and models, no one can trust your system: not users, not regulators, not operators.
  • Accountability: When failures occur (and they will), provenance is what lets you trace issues to their root cause, whether a dataset, a labeling decision, or a pipeline step, and fix them.
  • Control: Provenance turns opaque AI pipelines into systems you can govern. It allows you to verify integrity, detect tampering, and demonstrate compliance continuously, not just once at certification time.

This is the foundation the National Academies say we need to build if we’re serious about deploying AI in critical missions. And they’re right.

A Real-World Risk: Shadow AI in the Wild

This isn’t hypothetical. In one recent case, an employee at a Fortune 500 company quietly downloaded an open-source model like DeepSeek onto a workstation, fine-tuned it with sensitive internal data, and started deploying it inside a mission-critical workflow, all outside official review channels. Despite having AI governance policies in place, the organization had no visibility into the model’s provenance, what data had shaped it, or even that it existed. This kind of “shadow AI” not only bypasses compliance requirements but creates massive safety, legal, and security risks, precisely the kind of blind spot provenance and lineage controls are designed to eliminate.

Operationalizing Provenance: How Manifest Makes AI Transparency Real

At Manifest, we believe transparency isn’t an add-on; it’s the foundation of secure, trustworthy AI. That’s why we’re building the infrastructure to make provenance and lineage practical, scalable, and continuous across the AI lifecycle.

AI Bills of Materials (AIBOMs): We create a complete, machine-readable inventory of every model, including its origin, training lineage, dependencies, and associated datasets. Our AIBOMs tie models back to their data sources, capture version history and fine-tuning events, and flag licensing or compliance risks. This makes it possible to answer the most important question in safety-critical AI: “Where did this model come from, and what shaped it?”
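
As a purely illustrative sketch of the kind of information such an inventory carries, the Python dictionary below mimics a simplified AIBOM entry. The keys, names, and placeholder values are assumptions for this example, not Manifest’s actual format:

```python
# Illustrative AIBOM entry for a hypothetical fine-tuned model.
# Keys and values are assumptions for this sketch, not a normative schema.
aibom_entry = {
    "model": {
        "name": "fraud-detector",
        "version": "2.3.1",
        "base_model": "open-source-llm-7b",        # upstream model it was fine-tuned from
        "artifact_sha256": "<sha256-of-weights>",   # hash of the deployed weights
    },
    "training": {
        "datasets": [
            {"name": "internal-claims-2024", "sha256": "<hash>", "license": "proprietary"},
            {"name": "public-benchmark-x", "sha256": "<hash>", "license": "CC-BY-4.0"},
        ],
        "fine_tuning_events": [
            {"date": "2025-06-12", "pipeline": "train.py@a1b2c3", "owner": "ml-platform-team"},
        ],
    },
    "dependencies": ["torch==2.3", "transformers==4.41"],
    "risks": {"license_conflicts": [], "known_vulnerabilities": []},
}
```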

Provenance Validation & Risk Assessment: Manifest continuously validates provenance chains, ensuring that data and model lineage remain intact and auditable. We score risks related to training data, licensing, and vulnerabilities, and trace the downstream impact of flagged datasets or components so organizations can act quickly and confidently.
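
One common way to keep a lineage chain auditable is to link each recorded event to a hash of the event before it, so any retroactive edit breaks verification. The sketch below illustrates that general idea in Python; it is an assumed approach for illustration, not a description of Manifest’s implementation:

```python
import hashlib
import json


def event_digest(payload: dict, prev_digest: str) -> str:
    """Hash a lineage event's payload together with the previous digest, forming a chain."""
    data = json.dumps(payload, sort_keys=True) + prev_digest
    return hashlib.sha256(data.encode()).hexdigest()


def verify_chain(events: list[dict]) -> bool:
    """Recompute every link; a tampered, removed, or reordered event breaks the chain."""
    prev = ""
    for event in events:
        payload = {k: v for k, v in event.items() if k != "digest"}
        if event["digest"] != event_digest(payload, prev):
            return False
        prev = event["digest"]
    return True
```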

Shadow AI Detection & Policy Enforcement: We eliminate blind spots by detecting untracked or unauthorized models across code, deployments, and pipelines. Policies can require provenance documentation (like AIBOMs) before a model moves into production, turning transparency from a manual chore into an automated control.
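
A simple version of such a policy gate is a pre-production check that fails whenever a model artifact has no accompanying AIBOM. The sketch below is a generic illustration; the file extensions and the "<model>.aibom.json" naming convention are assumptions made for this example, not Manifest’s detection logic:

```python
import sys
from pathlib import Path

# Extensions commonly used for serialized model weights; illustrative, not exhaustive.
MODEL_EXTENSIONS = {".pt", ".safetensors", ".onnx", ".gguf", ".pkl"}


def find_undocumented_models(repo_root: str) -> list[Path]:
    """Return model files with no sibling AIBOM (assumed here to be '<model>.aibom.json')."""
    undocumented = []
    for path in Path(repo_root).rglob("*"):
        if path.suffix in MODEL_EXTENSIONS:
            aibom = path.parent / (path.name + ".aibom.json")
            if not aibom.exists():
                undocumented.append(path)
    return undocumented


if __name__ == "__main__":
    missing = find_undocumented_models(".")
    if missing:
        print("Models without provenance documentation:")
        for model_path in missing:
            print(f"  {model_path}")
        sys.exit(1)  # fail the pipeline so the model cannot ship unreviewed
```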

Integration Into Existing Workflows: Provenance is only useful if it’s operationalized. Manifest integrates with CI/CD, SCA, TPRM, and security workflows so provenance validation, risk scoring, and anomaly detection happen automatically — not as an after-action report.

From Burden to Advantage

There’s another message buried in the National Academies report: provenance isn’t just a compliance checkbox. It’s a strategic capability. Organizations that can trace, audit, and explain their AI systems will deploy faster, certify more easily, and respond more effectively to adversarial threats.

And in national security contexts, that visibility is more than a competitive edge; it’s a deterrent. It signals to adversaries, regulators, and partners alike: We understand our AI. We control it. And we can prove it.

The Bottom Line

We’re entering an era where AI will make decisions that matter — decisions with lives, missions, and national interests on the line. In that world, provenance isn’t optional. It’s the new perimeter: the boundary between AI we can trust and AI we can’t.

The question every organization should be asking right now isn’t “How do we make our models more explainable?” — it’s “How do we make them traceable?”

At Manifest, we’re building the tools to answer that question, and to make provenance a foundation, not an afterthought, of the AI systems that shape our future. Talk to our team to learn more. 
