March 18, 2026

What Is Agent Bricks? Databricks’ Approach to Building Governed AI Agents at Scale

Author: Mateusz Kijewski, Senior Data Engineer

Reading time: 12 minutes


Agent Bricks is a meaningful architectural step toward making enterprise AI agents viable in production by embedding them directly into governed data environments. Its biggest strength—governance inheritance—is real and differentiating, while its automation around evaluation and optimization can significantly accelerate delivery for standard use cases. However, it does not remove the need for strong data foundations, clear evaluation criteria, or technical expertise.

It simplifies the path, but it does not eliminate the work.

Key Takeaways:

  • Agent Bricks solves infrastructure friction more than model performance
  • Governance inheritance is its strongest and most defensible advantage
  • Automation accelerates delivery, but only within predefined boundaries
  • Success depends more on data, evaluation, and governance than on tooling choice


Agent Bricks and the Shift to Data-Native AI Agents: An Architectural Analysis

Agent Bricks is Databricks’ attempt to solve one of the most frustrating problems enterprise teams face right now: building AI agents that actually work on production data, inside real governance constraints, without spending months assembling the infrastructure before a single useful thing gets built.

The promise is straightforward — describe what you want, connect your data, and the platform handles the rest. Evaluation, optimization, deployment, governance. Done.

That promise lands because the underlying problem is real. Most enterprise AI agent projects don’t fail because the model is wrong.

They fail because the surrounding infrastructure — access controls, data quality, audit trails, lineage — was built for humans, not for autonomous systems, and retrofitting it for agents turns out to be harder and slower than anyone planned.

The shift to data-native agents embodied by Agent Bricks is not a technology upgrade — it is a governance decision. Organizations that embed agents inside the data platform eliminate an entire class of compliance and maintenance problems that external agents can only partially solve through engineering discipline, while also significantly simplifying how agents integrate with enterprise data.

Edwin Lisowski, CGO & Co-founder at Addepto

What is Agent Bricks?

Agent Bricks bets that an opinionated, automated approach can shortcut most of that infrastructure work. The rest of this article examines whether that bet holds up in practice.

Agent Bricks is a declarative agent-building layer inside Databricks’ Mosaic AI platform. You point it at a task description and a governed data source in Unity Catalog, and it assembles the full agent stack underneath: retrieval pipeline, prompt strategy, model selection, evaluation suite, and deployment endpoint.
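To make “declarative” concrete, here is a rough sketch of the information such a task specification carries. The field names and structure below are invented for illustration; the real Agent Bricks workflow is driven through the Databricks UI, not through this API.

```python
# Hypothetical sketch of a declarative agent spec -- the field names are
# invented; Agent Bricks itself is configured through the Databricks UI.
agent_spec = {
    "task": "Answer employee benefits questions with citations",
    "knowledge_source": "hr.policies.benefits_docs",  # a Unity Catalog path
    "quality_target": 0.90,   # minimum acceptable judge score
    "instructions": "Always cite the source document and section.",
}
# Conceptually, the platform expands a spec like this into the full stack:
# retrieval pipeline, prompt strategy, model choice, eval suite, endpoint.
```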

Currently, Agent Bricks is in public preview (beta) on Azure Databricks and is only available in selected regions, with specific workspace and feature requirements. This regional and platform-scoped availability may influence how quickly teams can experiment with or productionize Agent Bricks in enterprise environments.


The assembly isn’t fixed — the platform runs an automated optimization sweep across those variables and surfaces the configuration that hits your quality target at the lowest inference cost.

Under the hood, three mechanisms do the work that teams would otherwise build manually:

  • Evaluation — Agent Bricks generates task-specific benchmarks using synthetic data derived from your actual governed data, and bootstraps the evaluation process with baseline judges. These include Agent-as-a-Judge, Tunable Judges, and a no-code Judge Builder, but they are not fully autonomous — teams are expected to refine, extend, and calibrate these judges over time. You define what “correct” means in domain terms, and the platform helps operationalize that definition into an evolving, repeatable evaluation pipeline (a minimal judge sketch follows this list).
  • Optimization — the platform runs sweeps across prompt strategies, model choices, chunking configurations, retrieval parameters, and fine-tuning options using a technique called Test-Adaptive Optimization (TAO), which adapts the search based on evaluation feedback rather than running a fixed grid. The result is a cost-vs-quality curve you can inspect before deploying.
  • Agent Learning from Human Feedback (ALHF) — domain experts provide natural language corrections after deployment. The system translates that guidance into technical adjustments — retrieval filters, prompt modifications, vector index changes — without requiring the expert to touch code.
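To ground the evaluation mechanism, here is a minimal sketch of the LLM-as-a-judge pattern it builds on. This is the generic pattern, not Agent Bricks internals; judge_llm() is a placeholder for whatever model serves as the judge.

```python
# Generic LLM-as-a-judge evaluation loop -- illustrative, not Agent Bricks
# internals. judge_llm() stands in for a call to a judge model endpoint.

RUBRIC = (
    "Score the answer from 0 to 1. A correct answer cites the source "
    "document and matches the reference on every factual claim."
)

def judge_llm(prompt: str) -> float:
    """Placeholder: wire this to your judge model; must return a 0-1 score."""
    raise NotImplementedError

def evaluate(agent, benchmark):
    """benchmark: list of {'question', 'reference'} dicts, e.g. synthesized
    from governed data as described in the Evaluation bullet above."""
    scores = []
    for case in benchmark:
        answer = agent(case["question"])
        prompt = (f"{RUBRIC}\n\nQuestion: {case['question']}\n"
                  f"Reference: {case['reference']}\nAnswer: {answer}")
        scores.append(judge_llm(prompt))
    return sum(scores) / len(scores)
```

The point of the sketch: the loop is mechanical, but the RUBRIC string is domain judgment that someone on your team has to write and keep calibrating.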

These three mechanisms apply regardless of what you’re building. What changes depending on your use case is the task template — the pre-wired configuration of Mosaic AI components that Agent Bricks sits on top of.

There are currently five:

[Figure: the five Agent Bricks task templates]

What problem is Agent Bricks trying to solve?

Its pitch, restated: describe the task, connect your governed data, and the platform handles evaluation, optimization, and deployment. Governance is inherited rather than bolted on, because the agent runs inside the data platform rather than connecting to it from outside.

That pitch lands because the underlying problem is real. Most enterprise AI agent projects don’t fail because the model is wrong. They fail because the surrounding infrastructure was built for humans, not autonomous systems — and the gap shows up in four specific places.

  • Access control
    Enterprise roles and permissions don’t travel when you connect an external agent via API or JDBC. You either give the agent credentials that are too broad, or too narrow to be useful. Rebuilding the correct policies in a separate AI layer means maintaining two governance models that will drift from each other.
  • Audit trails
    Regulators need to know what data informed a decision, when it was accessed, and by what process. That trail exists in the data warehouse. The moment an external agent enters the picture, it goes cold — and instrumenting it manually is the kind of work that turns a three-month project into a nine-month one.
  • Data quality
    Enterprise data is full of known issues that analysts navigate by habit — unreliable fields after a migration, inconsistent category codes across regions. None of this is documented in a format an agent can read. It lives in team knowledge. An agent retrieves the data and trusts it.
  • Evaluation
    Knowing whether an agent actually performs correctly — on your data, for your edge cases — requires test sets, judges, and scoring criteria that most teams underestimate and never finish properly. Agents get shipped on demo performance and are only found to be wrong in production.

These four problems compound each other, and Agent Bricks bets that running agents inside the platform where governance, lineage, and evaluation infrastructure already exist dissolves most of them by default.

Does Agent Bricks actually deliver on its promises?

The honest answer is: it depends on which promise you’re examining. Some are architecturally genuine. Some are real but narrower than the marketing suggests. And one area — where the opinionated defaults end — requires understanding how Agent Bricks fits into the broader Databricks toolset to get any value at all.

Governance inheritance — this one is real

The core architectural claim holds up. Because Agent Bricks runs on top of Unity Catalog, the access policies already defined for your data apply to the agent automatically. Row-level security, column masking, attribute-based access control — none of it needs to be re-implemented in the agent layer.
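For a sense of what “already defined” means in practice, this is roughly what such policies look like in Unity Catalog SQL, run here through spark.sql in a notebook. The catalog, table, group, and function names are invented for illustration; the SET ROW FILTER and SET MASK statements are standard Unity Catalog syntax.

```python
# Policies are defined once on the table; any agent querying through the
# platform is filtered exactly like a human analyst would be.
# Catalog/table/group names are illustrative.
spark.sql("""
    CREATE OR REPLACE FUNCTION main.hr.region_filter(region STRING)
    RETURN IF(is_account_group_member('emea_analysts'), region = 'EMEA', TRUE)
""")
spark.sql("""
    ALTER TABLE main.hr.salaries
    SET ROW FILTER main.hr.region_filter ON (region)
""")
# Column masking attaches the same way, assuming a mask function exists:
spark.sql("""
    ALTER TABLE main.hr.salaries
    ALTER COLUMN base_salary SET MASK main.hr.salary_mask
""")
```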

The audit trail problem is similarly resolved: every prompt, retrieval, and model output is traced through MLflow and linked to Unity Catalog lineage. For regulated industries, this is the most practically significant thing Agent Bricks does, and it does it without any additional configuration.
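On the tracing side, the same machinery is available to custom code through MLflow’s tracing API. A minimal sketch, with placeholder function bodies:

```python
import mlflow

# Each decorated call becomes a span in an MLflow trace, so the full
# question -> retrieval -> answer chain stays auditable end to end.
@mlflow.trace
def retrieve(query: str) -> list[str]:
    # placeholder for a vector-search call against the governed index
    return [f"doc relevant to: {query}"]

@mlflow.trace
def answer(query: str) -> str:
    docs = retrieve(query)  # nested span, recorded in the same trace
    return f"Answer based on {len(docs)} retrieved documents."

answer("What does the travel policy cover?")
```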

This is also the hardest thing to replicate with a generic framework. You can wire LangChain to a governed data source, but the governance doesn’t travel with the connection — you get data access, not policy inheritance. That distinction is not marketing. It is a genuine architectural difference.

Automated evaluation — real infrastructure, but half the work

Agent Bricks generates synthetic evaluation benchmarks from your governed data, runs them through Agent-as-a-Judge and Tunable Judges, and surfaces quality scores before you deploy. That infrastructure is real and saves significant setup time compared to building eval pipelines from scratch.

What the marketing doesn’t say clearly enough is that the automation only operationalizes evaluation criteria you have already defined. Deciding what a correct answer looks like in your domain — what counts as a good extraction, what citation format satisfies your compliance team, what edge cases your data actually produces — is work the platform cannot do.

That definition takes as long as, and sometimes longer than, building the agent itself.

Agent Bricks automates the measurement, but it does not automate the judgment.
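What “defining correctness” actually produces is unglamorous: a written rubric plus executable checks. A hypothetical sketch for a document-extraction use case, with every rule invented for illustration:

```python
import re

# Domain-specific correctness checks. Every rule here encodes a human
# decision -- the platform can run checks like these at scale, but it
# cannot decide what they should be.
def check_extraction(output: dict) -> list[str]:
    failures = []
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", output.get("contract_date", "")):
        failures.append("contract_date must be an ISO 8601 date")
    if not output.get("counterparty", "").strip():
        failures.append("counterparty is required")
    if "citation" not in output:
        failures.append("compliance requires a source citation")
    return failures
```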

Automated optimization — genuine, but within fixed boundaries

The TAO optimization sweep is real. The platform searches across prompt strategies, model choices, retrieval configurations, and fine-tuning options and presents you with a cost-vs-quality curve. For teams whose use case fits one of the five predefined templates, this saves weeks of manual iteration.

The boundary is what the sweep covers. You do not choose the embedding model. You do not control the chunking strategy. You cannot run your own A/B experiments within the predefined bricks.

The optimization is automated, but it operates on Databricks’ configuration space, not yours. For common enterprise patterns — knowledge assistants, document extraction — that space is usually sufficient. For anything that sits outside it, the automation simply doesn’t apply.
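As a conceptual illustration of what a sweep produces — this is not the TAO algorithm itself, which adapts its search to evaluation feedback — each candidate configuration gets a cost and a quality score, and what you inspect is the Pareto frontier: the cheapest configuration at each quality level.

```python
# Conceptual sweep producing a cost-vs-quality frontier -- NOT TAO itself.
from itertools import product

def frontier(configs, evaluate, cost_of):
    """configs: candidate configurations; evaluate: config -> quality score;
    cost_of: config -> cost per 1k requests."""
    points = [(cost_of(c), evaluate(c), c) for c in configs]
    points.sort(key=lambda p: (p[0], -p[1]))   # cheapest first
    best, best_quality = [], float("-inf")
    for cost, quality, cfg in points:
        if quality > best_quality:             # keep only quality improvements
            best.append((cost, quality, cfg))
            best_quality = quality
    return best

# Hypothetical search space: two model sizes x three chunk sizes.
configs = [{"model": m, "chunk_tokens": k}
           for m, k in product(["small", "large"], [256, 512, 1024])]
```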

The “no-code” claim — accurate but misleading

Agent Bricks genuinely requires no code for the supported templates. That is not the misleading part. The misleading part is the implication that no-code means no technical work.

Preparing governed data sources, defining evaluation criteria, configuring Unity Catalog policies, and operationalizing the deployed endpoint all require technical judgment even if they don’t require writing Python. Teams that approach Agent Bricks expecting a non-technical workflow will hit that gap quickly.

Where the opinionated defaults end — and how to extend them

The five predefined templates are intentionally opinionated. That is a design choice, not a limitation to work around — within their scope, the automation is genuinely powerful precisely because the configuration space is constrained.

The question is what happens when your use case sits outside that scope, or when you need to combine predefined agent behavior with custom logic.

Databricks’ answer is a layered extension model with three components:

Custom Agents let you bring any framework — LangChain, LangGraph, CrewAI, OpenAI SDK — and deploy it through the same governed infrastructure.

Read more: Analysis and Comparison of AI Agent Frameworks: From Fundamentals to Multi-Agent Systems

The agent is registered in Unity Catalog, gets MLflow tracing automatically, and deploys as a managed Databricks App with scale-to-zero. What it doesn’t get is the auto-optimization layer — TAO sweeps, ALHF, and synthetic evaluation generation are exclusive to the predefined bricks. Custom Agents inherit the platform’s infrastructure. They don’t inherit its autopilot.
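The mechanics use standard MLflow plus the databricks-agents package. A rough sketch follows; the catalog, schema, and model names are invented, and exact arguments can vary across library versions, so treat this as orientation rather than a recipe.

```python
import mlflow
from databricks import agents  # the databricks-agents package

# Register a framework-agnostic agent in Unity Catalog, then deploy it
# through the governed serving infrastructure. Names are illustrative;
# check your library versions for exact signatures.
mlflow.set_registry_uri("databricks-uc")

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="agent",
        python_model="my_agent.py",  # "models from code": path to the agent file
        registered_model_name="main.agents.contract_qa",
    )

# Deploys a managed endpoint; MLflow tracing comes wired in by default.
agents.deploy("main.agents.contract_qa", 1)
```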

MCP integration extends what even predefined bricks can reach. Teams can connect external MCP servers from the Databricks Marketplace — web search, financial data feeds, third-party APIs — or build custom MCP servers governed through Unity Catalog. This is the cleanest way to add external tool connectivity without moving off the predefined templates entirely.

Read more: Model Context Protocol (MCP): Solution to AI Integration Bottlenecks
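For a feel of what a custom MCP server involves, here is a minimal sketch using the reference Python SDK (the mcp package). The tool and its payload are invented; a real server would wrap an actual data feed, and on Databricks it would additionally be governed through Unity Catalog.

```python
# Minimal custom MCP server, sketched with the reference Python SDK.
# The tool name and payload are invented for illustration.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("market-data")

@mcp.tool()
def latest_price(ticker: str) -> str:
    """Return the latest price for a ticker (stub for a real data feed)."""
    return f"{ticker}: 101.25 USD (placeholder)"

if __name__ == "__main__":
    mcp.run()  # serves over stdio; agents connect via the MCP protocol
```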

The Supervisor Agent is where the extension model becomes architecturally complete. It orchestrates multiple subagents — Knowledge Assistants, Genie Spaces, Unity Catalog functions, and MCP servers — into a single coordinated endpoint.

In practice this means you can combine a predefined Knowledge Assistant handling document retrieval with a Genie Space handling structured data queries, routing between them based on the nature of the user’s request.

The Supervisor manages the coordination; each subagent operates within its own optimized, governed context.
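A stripped-down sketch of the routing idea, with everything invented for illustration — the Databricks implementation handles this declaratively, not with hand-written Python:

```python
# Conceptual supervisor routing -- not the Databricks implementation.
def supervisor(request: str, classify, subagents: dict) -> str:
    """classify: request -> route label (often an LLM call in practice);
    subagents: label -> callable handling that class of request."""
    return subagents[classify(request)](request)

subagents = {
    "documents": lambda q: f"[knowledge assistant would answer] {q}",
    "metrics":   lambda q: f"[genie space would answer] {q}",
}
print(supervisor(
    "What was Q3 churn by region?",
    classify=lambda q: "metrics" if "churn" in q.lower() else "documents",
    subagents=subagents,
))
```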

The constraint worth knowing: as of early 2026 the Supervisor Agent supports Knowledge Assistant endpoints, Genie Spaces, Unity Catalog functions, and MCP servers as subagents — but not arbitrary Custom Agent endpoints directly.

Mixing fully custom logic into a Supervisor-orchestrated flow currently requires routing through a Unity Catalog function or MCP server as an intermediary. It is workable, but it adds a layer the documentation does not make obvious.
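One workable shape for that intermediary is a Unity Catalog SQL function wrapping ai_query(), Databricks SQL’s built-in for calling a serving endpoint. The function and endpoint names are invented for illustration:

```python
# A Unity Catalog function that fronts a custom agent endpoint, making it
# reachable from a Supervisor flow. Names are illustrative.
spark.sql("""
    CREATE OR REPLACE FUNCTION main.agents.ask_custom_agent(question STRING)
    RETURNS STRING
    RETURN ai_query('my-custom-agent-endpoint', question)
""")
```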

The practical architecture that emerges from this is a spectrum rather than a binary choice:

  • Common, well-scoped patterns — document Q&A, structured extraction, data querying — belong in predefined bricks where the automation pays off fully
  • Custom logic that doesn’t fit the templates gets built as Custom Agents and connected via MCP or Unity Catalog functions
  • The Supervisor Agent sits above both, handling routing and coordination
  • The governance layer sits below all of it, applying uniformly regardless of which path each agent took to get there

Conclusion

Agent Bricks delivers on its core architectural promise. The governance inheritance is real, the evaluation infrastructure is genuinely useful, and for the right use case the optimization automation saves weeks of manual work. The product is not overhyped in what it does — it is overhyped in how easy it is to get there.

From our own implementation experience, the gap between a promising Agent Bricks deployment and a production-ready one comes down to decisions made before the agent is built, not during. A few that consistently matter:

  • Sort the data foundation before touching the agent.
    If your datasets aren’t governed, documented, and quality-baselined in Unity Catalog before you start, the agent will expose those gaps faster and more visibly than any human user ever did. This cannot be done in parallel with agent development.
  • Define evaluation before you define the agent.
    Most teams build first and evaluate later. The sequence should be reversed. The TAO sweep is only as good as the evaluation criteria it runs against — garbage criteria produce confidently optimized garbage.
  • Plan the governance conversation before it finds you.
    Every enterprise AI project hits a compliance review. Agent Bricks gives you concrete answers to most governance questions by default. Prepare those answers before the review, not during it.
  • Treat the agent as a production workload from day one.
    SLIs, runbooks, monitoring dashboards, canary deployments. The operational discipline that applies to data pipelines applies equally here. Teams that treat agents as experiments until forced to operate them professionally pay for it later.

No-code does not mean no expertise. The framework choice — LangChain, LangGraph, CrewAI, Agent Bricks — matters less than the quality of what surrounds it: the data foundation, the evaluation infrastructure, the governance model. We have seen poor outcomes from quickly assembled Agent Bricks deployments and strong outcomes from carefully governed LangGraph ones. The inverse is equally true.

Agent Bricks is the right choice when the use case fits its predefined templates, the team values automation over customization, and the data platform is already in place. It is not a shortcut past the engineering. It is a powerful tool for engineers who know the platform well enough to use it deliberately — and for organizations that understand when to bring in the expertise they don’t yet have.


FAQ


Is Agent Bricks a no-code solution for building AI agents?


Technically yes—but practically no. You don’t need to write code for supported templates, but you still need to prepare data, define evaluation criteria, and configure governance. That requires technical expertise.


When should I choose Agent Bricks over frameworks like LangChain or LangGraph?


Choose Agent Bricks when your use case fits standard enterprise patterns (like document Q&A or structured extraction) and you already operate within Databricks. If you need deep customization or experimental architectures, frameworks like LangGraph may be a better fit.


What makes governance inheritance such a big deal?


Because it eliminates duplicated effort and risk. Instead of recreating access control, audit trails, and lineage for agents, you reuse what already exists in your data platform—reducing compliance complexity significantly.


Does Agent Bricks eliminate the need for evaluation pipelines?


No. It automates execution, not definition. You still need to decide what “correct” means in your domain—something most teams underestimate.


Can I extend Agent Bricks beyond predefined templates?


Yes, but with trade-offs. You can use Custom Agents, MCP servers, or Unity Catalog functions, but you lose parts of the automation layer (like TAO optimization) and may need additional architectural work.


What is the biggest risk when adopting Agent Bricks?


Treating it as a shortcut. Teams that skip data preparation, evaluation design, or governance planning often end up with agents that work in demos but fail in production.




Category: Data Engineering