Why AI Audits Are Moving From Models to Systems and What to Do About It
In March 2023, Italy’s data protection authority temporarily blocked access to ChatGPT after regulators concluded they could not clearly explain how the system worked, what data it relied on, or how individuals could challenge its outputs.
Backed by wider guidance from EU regulators, this decision marked a turning point. It sent a clear message: AI systems can no longer operate as opaque black boxes. They must be built with visible structure, traceability, and effective oversight.
The problem is that most compliance teams are already overloaded. They are expected to provide oversight for systems that run around the clock, constantly change, and interact across multiple tools and data sources. Manual processes were never designed for this pace or complexity.
When AI systems operate without clear controls and oversight, organizations are not avoiding risk; they are simply deferring it. And deferred risk compounds.
This is what creates compliance debt. AI systems may appear to function well in production, but without traceability, accountability, and ongoing evaluation, they become fragile assets.
The longer this debt accumulates, the higher the cost of correction. Retrofitting governance into live AI systems requires reworking architectures, rebuilding documentation, revalidating models, and sometimes pausing or rolling back deployments altogether.
For European companies, the pressure is about to intensify. In 2026, the EU AI Act comes fully into force, turning expectations into obligations with real consequences.
Among the many changes, one will matter more than the rest: how compliance is judged.
Internal policies and good intentions won’t be enough for regulators, who will now require organizations to demonstrate how AI systems behave during use, including how risks are monitored, decisions are made, and oversight is applied.
This shift puts clear control and governance at the centre of the current wave of AI adoption.
Traditional automation and generic large language models both struggle in regulated environments. They fail for different reasons, but in a similar way.
Traditional automation works great in stable, well-defined processes with fixed rules.
The only issue is that compliance is, by its nature, not stable. Rules change, interpretations shift, and exceptions appear all the time. Rule-based systems cannot adapt to this. The result is a rigid system that looks compliant on paper but breaks down in real use.
Generic LLMs introduce the opposite problem: too much flexibility.
They can summarise policies, draft assessments, and generate fluent explanations quickly, but they can also hallucinate facts and sources, give inconsistent answers, and hide uncertainty behind confident language. In regulated environments, this is unacceptable.
The problem behind both approaches is largely architectural.
Most automation tools and LLMs are deployed as tools: called when needed and largely ignored otherwise. Compliance and impact evaluation are different. They require systems that not only run non-stop, but can also be inspected at any given time.
AI agents are often discussed as productivity tools, but in compliance and impact evaluation, they can serve a very different role: control systems.
What makes AI agent systems different from other AI solutions is their design. Instead of relying on a single model, they are built as an orchestrated process that coordinates multiple models, tools, and data sources under clear rules.
Their biggest advantage is that they allow for uninterrupted 24/7 oversight. They can reassess systems as they change, detect drift, triage incidents, and verify safeguards as the systems are running. This is exactly what regulators expect: a shift from after-the-fact reviews to control built directly into operations.
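What one pass of that oversight loop can look like is sketched below, under loose assumptions: the policy store, metrics source, reviewer queue, and the specific metric and threshold names are all illustrative placeholders, not a particular product's API.

```python
# Minimal sketch of an agent acting as a control system rather than a tool.
# Dependencies (policy_store, metrics_source, reviewer_queue) and field names
# are illustrative assumptions, not a specific framework.

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Finding:
    system_id: str
    kind: str       # e.g. "drift", "missing_safeguard", "policy_violation"
    severity: str   # e.g. "low", "medium", "high"
    evidence: dict  # the concrete signals the finding is based on


def oversight_cycle(system_id: str, policy_store, metrics_source, reviewer_queue):
    """One pass of continuous oversight: reassess, detect, triage, escalate."""
    policy = policy_store.current_policy(system_id)   # approved rules
    metrics = metrics_source.latest(system_id)        # live signals

    findings = []
    if metrics["input_drift_score"] > policy["drift_threshold"]:
        findings.append(Finding(system_id, "drift", "medium",
                                {"score": metrics["input_drift_score"]}))

    for finding in findings:
        # Escalation is rule-driven, not left to the model's judgment.
        if finding.severity in policy["escalate_severities"]:
            reviewer_queue.put({"finding": finding,
                                "at": datetime.now(timezone.utc)})

    return findings
```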
The next question is obvious: what stops AI agent systems from becoming just another opaque layer?
The answer is, once again, design.
Governability does not emerge automatically from using AI; it has to be built in.
Grounding means that an AI agent does not rely solely on its internal model knowledge. Instead, it bases its reasoning on explicit, retrievable evidence.
In practice, the agent must first pull the relevant inputs (such as approved policies, model logs, dataset details, or internal regulatory guidance) and base its reasoning only on what it finds. This prevents the system from inventing explanations.
It also makes every decision traceable, because you can see exactly what evidence was used. While it’s not enough to guarantee correctness, it does guarantee inspectability, and for regulators, that matters more than raw accuracy.
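A minimal sketch of grounding in practice follows: retrieve the evidence first, answer only from it, and record exactly which sources were used. The retrieval and model calls (`retrieve_documents`, `llm_answer`) are placeholders for whatever an organization already has in place.

```python
# Sketch of grounded assessment: no evidence means no answer, and every answer
# carries the list of sources it was based on.

def grounded_assessment(question: str, retrieve_documents, llm_answer):
    evidence = retrieve_documents(question)  # approved policies, logs, dataset details
    if not evidence:
        # The agent must not improvise an answer from model memory alone.
        return {"answer": None, "status": "insufficient_evidence", "evidence": []}

    answer = llm_answer(
        question=question,
        context=[doc["text"] for doc in evidence],
        instruction="Answer only from the provided context; say 'unknown' otherwise.",
    )
    return {
        "answer": answer,
        "status": "grounded",
        "evidence": [doc["id"] for doc in evidence],  # makes the decision traceable
    }
```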
Unconstrained reasoning is risky. When an AI agent is allowed to “think freely,” its behaviour becomes unpredictable and difficult to defend.
Controlled reasoning solves this by structuring how decisions are made. Tasks get broken into clear steps, autonomy gets limited, and outputs follow defined formats. If inputs are missing or unclear, the agent must signal uncertainty or stop. Because the process is explicit, it can be supervised, replayed, and audited.
The goal is not perfect reasoning, but bounded behaviour – decisions that follow organizational rules and can be clearly explained after the fact.
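As a rough sketch, and assuming the organization supplies its own required-input list and step functions, controlled reasoning reduces to a fixed pipeline with an explicit stop condition. The input names below are illustrative only.

```python
# Sketch of controlled reasoning: fixed steps, a defined output shape, and an
# explicit stop when required inputs are missing.

REQUIRED_INPUTS = ["model_card", "training_data_summary", "intended_use"]


def run_controlled(inputs: dict, steps) -> dict:
    """`steps` is an ordered list of (name, callable) pairs defined by the organization."""
    missing = [k for k in REQUIRED_INPUTS if k not in inputs]
    if missing:
        # Signal uncertainty instead of guessing.
        return {"decision": "needs_review",
                "reason": f"missing inputs: {missing}", "trace": []}

    state, trace = dict(inputs), []
    for name, step in steps:
        state = step(state)                                  # bounded, predefined step
        trace.append({"step": name, "keys": sorted(state)})  # replayable record

    return {"decision": state.get("decision", "needs_review"), "trace": trace}
```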
Oversight is not “a human somewhere in the loop.” It consists of explicit control gates: mandatory approvals before high-impact actions, escalation when the agent signals uncertainty, and sign-off points tied to named roles.
Humans are not a safety net for weak automation. Because machines cannot be held accountable, human involvement must be built in at decision-critical points involving judgment, ethics, or legal responsibility.
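A control gate can be as simple as refusing to execute certain actions without a recorded approval. The action names and the shape of the `approvals` mapping below are hypothetical, chosen only to make the pattern concrete.

```python
# Sketch of an explicit control gate: decision-critical actions are held until
# a named human role has approved them, and every outcome is logged.

HIGH_IMPACT_ACTIONS = {"reject_candidate", "close_incident", "approve_model_change"}


def execute(action: str, payload: dict, approvals: dict, audit_log: list) -> str:
    if action in HIGH_IMPACT_ACTIONS:
        approval = approvals.get(action)
        if approval is None:
            audit_log.append({"action": action, "status": "blocked_awaiting_approval"})
            return "pending"  # the agent stops; a human decides
        audit_log.append({"action": action, "status": "approved",
                          "approved_by": approval["role"]})
    else:
        audit_log.append({"action": action, "status": "auto_executed"})
    return "executed"
```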
Auditability means being able to explain, after the fact, what the system did, when it did it, and why. This includes what information it used, which rules applied, and where humans approved or intervened.
A governed agent produces two outputs: the result itself, and the evidence trail showing how that result was reached.
This is the difference between writing a report and proving that a controlled process took place.
Auditability does not imply the system was correct, but it does prove the system was operated under control, which matters much more.
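In practice this often comes down to an append-only log where each entry answers what, when, why, and who. A minimal sketch follows; the field names and file path are illustrative assumptions.

```python
# Sketch of an append-only audit record: the action taken, the evidence used,
# the rule applied, and any human intervention, each with a timestamp.

import json
from datetime import datetime, timezone


def audit_record(action, evidence_ids, rule_id, outcome, human=None):
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,             # what the system did
        "evidence": evidence_ids,     # what information it used
        "rule": rule_id,              # which rule applied
        "outcome": outcome,           # what happened
        "human_intervention": human,  # who approved or intervened, if anyone
    }
    with open("audit_trail.jsonl", "a") as f:  # one record per line, never rewritten
        f.write(json.dumps(record) + "\n")
    return record
```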
When AI agents are used for compliance, risk, or impact evaluation, the goal is not perfect accuracy. That’s unrealistic, and regulators don’t expect it. What they do expect is governability: the ability to understand, supervise, and intervene when needed.
Good design starts with an assumption that errors will happen, and puts things like grounding, controlled reasoning, human oversight, and auditability in place to detect mistakes when they do happen, limit their impact, and assign responsibility when things go wrong.
The shift is from trusting the model to trusting the system around it. This matches how regulators actually evaluate AI systems.

Read also: How to Successfully Implement Agentic AI in Your Organization

The move toward grounded, auditable, human-supervised AI is not a niche concern. It affects any role responsible for decisions, risk, or accountability.
Consider a situation many large organizations already face: a company uses AI to support CV screening. From a regulatory perspective, this is a sensitive use case. It influences access to employment and is likely to fall into a high-risk category under the EU AI Act.
As a result, the company must be able to demonstrate ongoing, documented control over how the system operates and how its outputs are used.
An agent-based compliance workflow can support this by grounding screening recommendations in approved hiring policies, monitoring the system for drift as it changes, routing sensitive decisions to human reviewers, and logging evidence for every step.
Every decision is tied to a process, every process is tied to evidence, every approval is tied to a role, and every change leaves a trail. That is exactly what outcome-based regulation tests for.
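Concretely, each screening decision can be stored as a single record that links those four elements together. The identifiers and field names below are hypothetical, shown only to illustrate the shape of such a record.

```python
# Sketch of a screening decision record: decision -> process -> evidence ->
# role -> change history, all in one traceable entry.

screening_decision = {
    "candidate_id": "c-1042",
    "process": "cv_screening_v3",
    "policy_version": "hiring-policy-2025-06",
    "evidence": ["cv_extract_8831", "role_requirements_eng_2"],
    "model_recommendation": "advance",
    "final_decision": "advance",
    "approved_by_role": "hr_reviewer",
    "change_history": ["model updated 2025-05-12", "threshold reviewed 2025-06-01"],
}
```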
The same structure applies across other regulated use cases, but sectors where AI systems already influence high-impact outcomes will feel this shift sooner and more strongly. In those industries, oversight expectations follow naturally.
AI compliance has moved from a future concern to a current operational risk. As AI systems scale, oversight gaps scale with them. Regulation is responding by focusing on control, evidence, and accountability over time, rather than one-off reviews or stated intentions.
Many organizations are already part of the way there.
They have pilots that work. They have models that deliver real value. The next step is establishing a clear path from experimentation to AI systems that can operate reliably in production and stand up to audits.
Making that transition requires more than technical capability. The challenge spans roles and industries, which is exactly why teams benefit most from working with partners who understand both sides of the problem: how AI systems are built and how they are examined under real regulatory and audit pressure.
Addepto is that partner. We help organizations design and deploy AI agent systems that are built for supervision from day one.

Read also: How to Choose the Right AI Company To Work With?

AI agents belong in high-stakes environments when governance is part of how they are designed and run. As AI becomes a permanent part of business operations, governance becomes a permanent requirement alongside it.
Teams that build governed AI now put themselves in a position to stay trusted over time.
Auditing an AI system means evaluating the entire operational setup around the model, not only its accuracy or training data. This includes how data flows through the system, how decisions are made, what safeguards are in place, how humans intervene, how changes are tracked, and how evidence is logged over time. Regulators increasingly care about whether organizations can demonstrate continuous control, traceability, and accountability in real-world use, not just technical model performance in isolation.
Organizations should start by mapping their existing AI systems: identifying use cases, risk levels, data sources, decision paths, and current oversight gaps. From there, they can introduce structured governance mechanisms such as evidence grounding, controlled workflows, human approval gates, and audit trails. The goal is not to rebuild everything immediately, but to progressively embed observability and control into live systems so they can withstand regulatory scrutiny once the EU AI Act comes into force.
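As an illustration of that mapping step, an inventory entry per system might capture no more than the following. The structure and field names are assumptions for this sketch, not a regulatory template.

```python
# Sketch of a minimal AI system inventory entry: use case, risk level, data
# sources, decision path, and known oversight gaps.

from dataclasses import dataclass, field


@dataclass
class AISystemEntry:
    name: str
    use_case: str
    risk_level: str        # e.g. "high" for hiring or credit decisions
    data_sources: list
    decision_path: str     # where its outputs feed into real decisions
    oversight_gaps: list = field(default_factory=list)


inventory = [
    AISystemEntry(
        name="cv-screening-assistant",
        use_case="shortlisting candidates",
        risk_level="high",
        data_sources=["applicant CVs", "role requirements"],
        decision_path="recommendation reviewed by HR before rejection",
        oversight_gaps=["no drift monitoring", "approvals not logged"],
    ),
]
```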
Generic LLMs are designed for flexibility and fluency, not for accountability or verifiability. They can hallucinate information, provide inconsistent outputs, and obscure uncertainty behind confident language. In regulated contexts, this creates unacceptable risk because decisions must be explainable, reproducible, and defensible. Without grounding, controlled reasoning, and auditability, LLM outputs cannot reliably support compliance, risk assessment, or legally sensitive workflows.
AI agent systems enable continuous monitoring, structured decision-making, and built-in auditability across complex AI workflows. Instead of performing one-off reviews, organizations can maintain real-time oversight, detect drift or policy violations, enforce human approvals where required, and automatically generate evidence for audits. This reduces compliance debt, improves operational resilience, and allows AI systems to scale safely in production environments where accountability and regulatory trust are essential.