Agent Bricks is a meaningful architectural step toward making enterprise AI agents viable in production by embedding them directly into governed data environments. Its biggest strength—governance inheritance—is real and differentiating, while its automation around evaluation and optimization can significantly accelerate delivery for standard use cases. However, it does not remove the need for strong data foundations, clear evaluation criteria, or technical expertise.
It simplifies the path, but it does not eliminate the work.
Agent Bricks is Databricks’ attempt to solve one of the most frustrating problems enterprise teams face right now: building AI agents that actually work on production data, inside real governance constraints, without spending months assembling the infrastructure before a single useful thing gets built.
The promise is straightforward — describe what you want, connect your data, and the platform handles the rest. Evaluation, optimization, deployment, governance. Done.
That promise lands because the underlying problem is real. Most enterprise AI agent projects don’t fail because the model is wrong.
They fail because the surrounding infrastructure — access controls, data quality, audit trails, lineage — was built for humans, not for autonomous systems, and retrofitting it for agents turns out to be harder and slower than anyone planned.
The shift to data-native agents embodied by Agent Bricks is not a technology upgrade — it is a governance decision. Organizations that embed agents inside the data platform eliminate an entire class of compliance and maintenance problems that external agents can only partially solve through engineering discipline, while also significantly simplifying how agents integrate with enterprise data.
Edwin Lisowski
CGO & Co-founder at Addepto
Agent Bricks bets that an opinionated, automated approach can shortcut most of that work. This article is about whether that bet holds up in practice.
Agent Bricks is a declarative agent-building layer inside Databricks’ Mosaic AI platform. You point it at a task description and a governed data source in Unity Catalog, and it assembles the full agent stack underneath: retrieval pipeline, prompt strategy, model selection, evaluation suite, and deployment endpoint.
Currently, Agent Bricks is in public preview (beta) on Azure Databricks and is only available in selected regions, with specific workspace and feature requirements. This regional and platform-scoped availability may influence how quickly teams can experiment with or productionize Agent Bricks in enterprise environments.

The assembly isn’t fixed — it runs an automated optimization sweep across those variables and surfaces the configuration that hits your quality target at the lowest inference cost.
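The selection logic behind such a sweep can be illustrated with a toy example. The candidate configurations, scores, and costs below are invented for illustration, not Databricks output:

```python
# Toy illustration of how an optimization sweep might pick a winner:
# among candidates that meet the quality target, choose the cheapest.
# Configurations, quality scores, and costs are invented.

candidates = [
    {"config": "small-model + basic retrieval",   "quality": 0.78, "cost_per_1k": 0.12},
    {"config": "small-model + reranked retrieval", "quality": 0.86, "cost_per_1k": 0.19},
    {"config": "large-model + basic retrieval",   "quality": 0.91, "cost_per_1k": 0.95},
    {"config": "fine-tuned small model",          "quality": 0.89, "cost_per_1k": 0.27},
]

def pick_configuration(candidates, quality_target):
    """Return the cheapest candidate whose quality meets the target."""
    viable = [c for c in candidates if c["quality"] >= quality_target]
    if not viable:
        raise ValueError("no candidate meets the quality target")
    return min(viable, key=lambda c: c["cost_per_1k"])

best = pick_configuration(candidates, quality_target=0.85)
print(best["config"])  # prints: small-model + reranked retrieval
```

The point of the sketch is the shape of the decision, not the mechanics: the sweep turns a quality floor into a cost minimization over a fixed configuration space.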
Under the hood, three mechanisms do the work that teams would otherwise build manually: synthetic evaluation generation, TAO optimization sweeps, and ALHF (Agent Learning from Human Feedback).
These three mechanisms apply regardless of what you’re building. What changes depending on your use case is the task template — the pre-wired configuration of Mosaic AI components that Agent Bricks sits on top of.
There are currently five:

The core promise bears restating before examining it: describe the task, connect your governed data, and the platform handles evaluation, optimization, and deployment, with governance inherited rather than bolted on because the agent runs inside the data platform rather than connecting to it from outside. Agent Bricks bets that running agents inside the platform where governance, lineage, and evaluation infrastructure already exist dissolves most of the failure modes above by default. Does that bet hold?
The honest answer is: it depends on which promise you’re examining. Some are architecturally genuine. Some are real but narrower than the marketing suggests. And one area — where the opinionated defaults end — requires understanding how Agent Bricks fits into the broader Databricks toolset to get any value at all.
The core architectural claim holds up. Because Agent Bricks runs on top of Unity Catalog, the access policies already defined for your data apply to the agent automatically. Row-level security, column masking, attribute-based access control — none of it needs to be re-implemented in the agent layer.
The audit trail problem is similarly resolved: every prompt, retrieval, and model output is traced through MLflow and linked to Unity Catalog lineage. For regulated industries, this is the most practically significant thing Agent Bricks does, and it does it without any additional configuration.
This is also the hardest thing to replicate with a generic framework. You can wire LangChain to a governed data source, but the governance doesn’t travel with the connection — you get data access, not policy inheritance. That distinction is not marketing. It is a genuine architectural difference.
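The difference can be made concrete with a small sketch. This is not Databricks API code; the policy table and principal names are invented. The point is that when the row filter lives in the data layer, every caller, agent or human, gets filtered results without the rule being re-implemented per agent:

```python
# Illustrative sketch (not Databricks APIs): a row-level policy enforced in
# the data layer applies to any caller automatically. Policies and
# principals below are invented.

ROWS = [
    {"region": "EU", "revenue": 100},
    {"region": "US", "revenue": 250},
]

POLICIES = {"eu_analyst_agent": {"region": "EU"}}  # invented policy table

def read_table(principal, rows=ROWS):
    """Data-layer read: the row filter is applied here, not in the agent."""
    policy = POLICIES.get(principal)
    if policy is None:
        return []  # no grant, no data
    return [r for r in rows if all(r[k] == v for k, v in policy.items())]

# The agent never sees rows its principal isn't entitled to:
print(read_table("eu_analyst_agent"))   # only the EU row
print(read_table("unregistered_agent")) # empty list
```

An externally wired agent sits on the other side of `read_table`: it can be handed a connection, but the filter logic has to be rebuilt and maintained in the agent layer itself.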
Agent Bricks generates synthetic evaluation benchmarks from your governed data, runs them through Agent-as-a-Judge and Tunable Judges, and surfaces quality scores before you deploy. That infrastructure is real and saves significant setup time compared to building eval pipelines from scratch.
What the marketing doesn’t say clearly enough is that the automation only operationalizes evaluation criteria you have already defined. Deciding what a correct answer looks like in your domain — what counts as a good extraction, what citation format satisfies your compliance team, what edge cases your data actually produces — is work the platform cannot do.
That definition often takes as long as the agent build itself, and sometimes longer.
Agent Bricks automates the measurement, but not the judgment.
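What that judgment work looks like in practice can be sketched as an explicit rubric: a set of domain rules the automated judges can then apply at scale. The checks below are invented examples of such rules, not platform code:

```python
# Illustrative sketch: the platform can run judges at scale, but the rubric
# itself -- what "correct" means in your domain -- must be written by you.
# Both checks below are invented examples of such domain rules.

def has_citation(answer: str) -> bool:
    """Invented convention: compliant answers cite a governed source."""
    return "[source:" in answer

def within_length(answer: str, max_words: int = 150) -> bool:
    """Invented rule: answers must stay concise."""
    return len(answer.split()) <= max_words

RUBRIC = [has_citation, within_length]

def judge(answer: str) -> dict:
    """Apply every rubric check; the score is the fraction that pass."""
    results = {check.__name__: check(answer) for check in RUBRIC}
    results["score"] = sum(results.values()) / len(RUBRIC)
    return results

verdict = judge("Revenue grew 12% in Q3 [source: finance_gold.q3_report].")
print(verdict["score"])  # 1.0 -- both invented checks pass
```

Writing `has_citation` and `within_length` is trivial; deciding that those are the rules your compliance team will accept is the slow part the platform cannot do.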
The TAO optimization sweep is real. The platform searches across prompt strategies, model choices, retrieval configurations, and fine-tuning options and presents you with a cost-vs-quality curve. For teams whose use case fits one of the five predefined templates, this saves weeks of manual iteration.
The boundary is what the sweep covers. You do not choose the embedding model. You do not control the chunking strategy. You cannot run your own A/B experiments within the predefined bricks.
The optimization is automated, but it operates on Databricks’ configuration space, not yours. For common enterprise patterns — knowledge assistants, document extraction — that space is usually sufficient. For anything that sits outside it, the automation simply doesn’t apply.
Agent Bricks genuinely requires no code for the supported templates. That is not the misleading part. The misleading part is the implication that no-code means no technical work.
Preparing governed data sources, defining evaluation criteria, configuring Unity Catalog policies, and operationalizing the deployed endpoint all require technical judgment even if they don’t require writing Python. Teams that approach Agent Bricks expecting a non-technical workflow will hit that gap quickly.
The five predefined templates are intentionally opinionated. That is a design choice, not a limitation to work around — within their scope, the automation is genuinely powerful precisely because the configuration space is constrained.
The question is what happens when your use case sits outside that scope, or when you need to combine predefined agent behavior with custom logic.
Custom Agents let you bring any framework — LangChain, LangGraph, CrewAI, OpenAI SDK — and deploy it through the same governed infrastructure.

Read more: Analysis and Comparison of AI Agent Frameworks: From Fundamentals to Multi-Agent Systems

The agent is registered in Unity Catalog, gets MLflow tracing automatically, and deploys as a managed Databricks App with scale-to-zero. What it doesn't get is the auto-optimization layer: TAO sweeps, ALHF, and synthetic evaluation generation are exclusive to the predefined bricks. Custom Agents inherit the platform's infrastructure. They don't inherit its autopilot.
MCP integration extends what even predefined bricks can reach. Teams can connect external MCP servers from the Databricks Marketplace — web search, financial data feeds, third-party APIs — or build custom MCP servers governed through Unity Catalog. This is the cleanest way to add external tool connectivity without moving off the predefined templates entirely.
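For orientation, this is the wire shape of an MCP tool invocation: a JSON-RPC 2.0 request with method `tools/call`, per the MCP specification. The tool name and arguments below are invented, and real usage goes through an MCP client and server rather than hand-built JSON:

```python
import json

# Shape of an MCP tool invocation (JSON-RPC 2.0, method "tools/call" per
# the MCP specification). The tool name and arguments are invented.

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "web_search",  # hypothetical Marketplace-provided tool
        "arguments": {"query": "quarterly revenue benchmarks"},
    },
}

payload = json.dumps(request)   # what travels to the MCP server
decoded = json.loads(payload)
print(decoded["method"])  # tools/call
```

Because the protocol is this uniform, governing an MCP server through Unity Catalog governs every tool it exposes with one registration rather than one integration per tool.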

Read more: Model Context Protocol (MCP): Solution to AI Integration Bottlenecks

The Supervisor Agent is where the extension model becomes architecturally complete. It orchestrates multiple subagents — Knowledge Assistants, Genie Spaces, Unity Catalog functions, and MCP servers — into a single coordinated endpoint.
In practice this means you can combine a predefined Knowledge Assistant handling document retrieval with a Genie Space handling structured data queries, routing between them based on the nature of the user’s request.
The Supervisor manages the coordination; each subagent operates within its own optimized, governed context.
The constraint worth knowing: as of early 2026, the Supervisor Agent supports Knowledge Assistant endpoints, Genie Spaces, Unity Catalog functions, and MCP servers as subagents — but not arbitrary Custom Agent endpoints directly.
Mixing fully custom logic into a Supervisor-orchestrated flow currently requires routing through a Unity Catalog function or MCP server as an intermediary. It is workable, but it adds a layer the documentation does not make obvious.
The practical architecture that emerges from this is a spectrum rather than a binary choice:
Agent Bricks delivers on its core architectural promise. The governance inheritance is real, the evaluation infrastructure is genuinely useful, and for the right use case the optimization automation saves weeks of manual work. The product is not overhyped in what it does — it is overhyped in how easy it is to get there.
From our own implementation experience, the gap between a promising Agent Bricks deployment and a production-ready one comes down to decisions made before the agent is built, not during. A few that consistently matter:
No-code does not mean no expertise. The framework choice — LangChain, LangGraph, CrewAI, Agent Bricks — matters less than the quality of what surrounds it: the data foundation, the evaluation infrastructure, the governance model. We have seen poor outcomes from quickly assembled Agent Bricks deployments and strong outcomes from carefully governed LangGraph ones. The inverse is equally true.
Agent Bricks is the right choice when the use case fits its predefined templates, the team values automation over customization, and the data platform is already in place. It is not a shortcut past the engineering. It is a powerful tool for engineers who know the platform well enough to use it deliberately — and for organizations that understand when to bring in the expertise they don’t yet have.
Technically yes—but practically no. You don’t need to write code for supported templates, but you still need to prepare data, define evaluation criteria, and configure governance. That requires technical expertise.
Choose Agent Bricks when your use case fits standard enterprise patterns (like document Q&A or structured extraction) and you already operate within Databricks. If you need deep customization or experimental architectures, frameworks like LangGraph may be a better fit.
Because it eliminates duplicated effort and risk. Instead of recreating access control, audit trails, and lineage for agents, you reuse what already exists in your data platform—reducing compliance complexity significantly.
No. It automates execution, not definition. You still need to decide what “correct” means in your domain—something most teams underestimate.
Yes, but with trade-offs. You can use Custom Agents, MCP servers, or Unity Catalog functions, but you lose parts of the automation layer (like TAO optimization) and may need additional architectural work.
Treating it as a shortcut. Teams that skip data preparation, evaluation design, or governance planning often end up with agents that work in demos but fail in production.