Databricks Data + AI Summit 2026, held June 16–17, revealed a fundamental shift in how Databricks approaches data infrastructure. The clearest message: Databricks is no longer building primarily for humans as the main users of data. It is building for AI agents.
That shift changes what matters across architecture, governance, security, observability, and cost control. Agents need reliable access to accurate, up-to-date, and well-governed business data. They also need systems that control what they can do, monitor how they behave, evaluate their outputs, and keep usage and spending within limits.
Databricks is betting that the AI market will not be won by whoever builds the smartest model. As models become more widely available and increasingly similar, the real competitive advantage will come from owning the governed, contextualized data layer those models and agents depend on. In other words, the competition is moving from the best model to the best data infrastructure.
KEY TAKEAWAYS
Databricks is making an explicit bet: AI agents (not humans) will become the primary consumers of the data layer. This is not about adding a copilot to BI or making dashboards more attractive. The shift is much bigger.
Kasey Uhlenhuth, Director of Product for AI, framed the entire Summit around a clear divide:
Humans get Genie (the conversational interface), while developers get Agent Bricks (the platform for building and operating agents). But the agents themselves become the real workload.
“AGI is already here; the bottleneck is context, not intelligence.”
Ali Ghodsi, CEO of Databricks
Databricks argues that the market will not be won by whoever builds the smartest model, because AI models are increasingly becoming commodities. What makes an AI agent valuable is the data it can access. That data needs to be accurate, current, secure, and easy for the agent to understand. A powerful model will still produce poor results if it works with outdated, incomplete, or badly organized information.
What Databricks announced:
What this means in practice:
Today, we have the classic problem: transactional databases (Postgres, MySQL) and analytics warehouses (Databricks, Snowflake) are separate, and data flows one way via ETL. By the time an agent queries the warehouse, the data is hours old. If an agent needs to update a customer record based on analytical insights, it either hits the slow transactional database or makes a stale decision from the warehouse.
With LTAP (and Lakebase as the underlying OLTP layer), you have one governed copy of the data. Fresh operational context and analytical context are available in the same query, with no sync lag or replica drift.
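To make the "same query" idea concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` as a stand-in for a single governed store; it is not the LTAP or Lakebase API, and the table names, columns, and `customer_context` helper are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite stands in for one governed data copy.
# This is NOT the LTAP or Lakebase API; schema and names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, status TEXT);
CREATE TABLE orders (customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Acme', 'active'), (2, 'Globex', 'active');
INSERT INTO orders VALUES (1, 120.0), (1, 80.0), (2, 15.0);
""")


def customer_context(conn, customer_id):
    """Return the live operational row plus an aggregate computed at query time."""
    row = conn.execute(
        """
        SELECT c.name, c.status, COALESCE(SUM(o.amount), 0.0), COUNT(o.amount)
        FROM customers AS c
        LEFT JOIN orders AS o ON o.customer_id = c.id
        WHERE c.id = ?
        GROUP BY c.id, c.name, c.status
        """,
        (customer_id,),
    ).fetchone()
    if row is None:
        return None
    return {
        "name": row[0],
        "status": row[1],
        "lifetime_spend": row[2],
        "order_count": row[3],
    }


# A transactional write is visible to the very next analytical read:
# there is no ETL hop between the two.
conn.execute("INSERT INTO orders VALUES (2, 500.0)")
context = customer_context(conn, 2)
```

In a split architecture, the aggregate would come from a warehouse copy that lags the `INSERT` by however long the ETL cycle takes.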
What Databricks announced:
The key shift: permissions become stateful and contextual, not static. Traditional access control is static – “this agent can always access table X.” That breaks with autonomous agents. An agent analyzing customer churn legitimately needs email addresses and phone numbers, but shouldn’t be able to export that list for unsolicited marketing.
Consider an agent that processes a customer record containing health information. The system detects that PII was accessed, so it restricts what happens next: the agent can email a coworker (fine) but cannot publish to a public site (exposure risk), and it must request approval before updating Salesforce.
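The scenario above can be sketched as a session that remembers which sensitivity tags it has touched and answers authorization requests accordingly. This is a toy illustration under stated assumptions, not how Unity Catalog implements it: the `AgentSession` class, the action names, and the `PII_RULES` table are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"


# Hypothetical rules for a session that has touched PII.
# Actions not listed here are denied once PII is in play.
PII_RULES = {
    "email_internal": Decision.ALLOW,
    "publish_public": Decision.DENY,
    "update_salesforce": Decision.REQUIRE_APPROVAL,
}


@dataclass
class AgentSession:
    agent_id: str
    touched_tags: set = field(default_factory=set)

    def record_read(self, column_tags):
        """Remember the sensitivity tags of every column the agent has read."""
        self.touched_tags.update(column_tags)

    def authorize(self, action):
        """Decide on an action using what the session has already accessed."""
        if "pii" not in self.touched_tags:
            return Decision.ALLOW
        return PII_RULES.get(action, Decision.DENY)


session = AgentSession("churn-analyst")
before = session.authorize("publish_public")   # nothing sensitive read yet
session.record_read({"pii", "health"})
after = session.authorize("publish_public")    # same action, different answer
```

The point of the design is that `authorize` depends on session history, not only on the agent's identity: the same request gets a different answer after the PII read.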
Agent trace data (queries, results, attempted actions) streams into the Lakehouse, where it can be processed in real time to answer questions such as “which agents touched PII,” “did any agent attempt unauthorized actions,” and “what is the aggregate risk across all agents.” This monitoring work ties into security tools like Lakewatch.
You will likely build pipelines that ingest agent traces, calculate PII exposure, and flag suspicious agent behavior. This is new operational work.
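As a rough sketch of what such a pipeline computes, the snippet below aggregates a handful of trace events per agent. The trace schema, the agent names, and the risk weights are assumptions for illustration; real agent traces will look different, and a production pipeline would run on a streaming engine instead of a Python list.

```python
from collections import defaultdict

# Hypothetical trace records; the real trace schema will differ.
TRACES = [
    {"agent": "churn-analyst", "action": "query", "tags": ["pii"], "allowed": True},
    {"agent": "churn-analyst", "action": "publish_public", "tags": [], "allowed": False},
    {"agent": "report-bot", "action": "query", "tags": [], "allowed": True},
    {"agent": "churn-analyst", "action": "query", "tags": ["pii", "health"], "allowed": True},
]

# Illustrative weights, not a standard scoring scheme.
PII_WEIGHT = 1
DENIED_WEIGHT = 5


def summarize(traces):
    """Aggregate per-agent PII reads, denied attempts, and a simple risk score."""
    summary = defaultdict(lambda: {"pii_reads": 0, "denied": 0, "risk": 0})
    for event in traces:
        stats = summary[event["agent"]]
        if "pii" in event["tags"]:
            stats["pii_reads"] += 1
            stats["risk"] += PII_WEIGHT
        if not event["allowed"]:
            stats["denied"] += 1
            stats["risk"] += DENIED_WEIGHT
    return dict(summary)


report = summarize(TRACES)
pii_agents = sorted(agent for agent, s in report.items() if s["pii_reads"] > 0)
total_risk = sum(s["risk"] for s in report.values())
```

Here `pii_agents` answers "which agents touched PII," the `denied` counter answers "did any agent attempt unauthorized actions," and `total_risk` is one possible reading of "aggregate risk across all agents."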
What Databricks announced:
What this means for your team:
What Databricks announced:
The pitch: consolidate specialized tools into one governed platform. The reality: most companies run fragmented stacks, with Kafka for streaming, Druid or Pinot for real-time dashboards, Databricks or Snowflake for analytics, separate CDPs, and maybe a SIEM. Each has its own access model and governance. AI agents query multiple systems and get incomplete context. Compliance is a nightmare.
Make Databricks the unified foundation. Use Lakeflow to ingest from all sources. Use Lakehouse//RT for real-time queries instead of maintaining a separate engine. Use Unity Catalog as the single governance model.
You’re centralizing governance and context delivery, not eliminating complexity. You still have multiple systems. What changes is that access, monitoring, and data lineage flow through one place, and AI agents get unified context instead of scattered queries. This is genuinely valuable: it reduces AI hallucination by giving agents complete, current information. But it requires you to:
Lakehouse//RT is still in beta and not production-ready everywhere yet. If you’re running Druid or Pinot today, don’t rip it out immediately.
What Databricks announced:
An agent doesn’t know what “customer risk score” means. A model doesn’t know your company’s definition of “profitable customer.” Embeddings and semantic search don’t help here. But a governed semantic layer does.
You define: “customer_risk_score” is this column, calculated by this formula, owned by this team, updated daily. “Profitable customer” is defined as customers with LTV > $5k and churn < 15%. Metrics are queryable by SQL, BI, and agents. That layer is what makes agents accurate and safe, not model choice.
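A governed definition can be thought of as a small, owned record that every consumer evaluates the same way. The sketch below is not the Unity Catalog API; the `MetricDefinition` class, the field names `ltv_usd` and `churn_rate`, the owner, and the refresh cadence are hypothetical. Only the thresholds come from the example above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MetricDefinition:
    name: str
    description: str
    owner: str
    refresh: str
    predicate: Callable[[dict], bool]


# Thresholds come from the example in the text; field names, owner,
# and refresh cadence are hypothetical.
PROFITABLE_CUSTOMER = MetricDefinition(
    name="profitable_customer",
    description="LTV above $5k and churn below 15%",
    owner="finance-analytics",
    refresh="daily",
    predicate=lambda c: c["ltv_usd"] > 5000 and c["churn_rate"] < 0.15,
)

REGISTRY = {PROFITABLE_CUSTOMER.name: PROFITABLE_CUSTOMER}


def evaluate(metric_name, customer):
    """Apply the governed definition; unknown metrics raise instead of being guessed."""
    if metric_name not in REGISTRY:
        raise KeyError(f"No governed definition for metric: {metric_name}")
    return REGISTRY[metric_name].predicate(customer)


is_profitable = evaluate("profitable_customer", {"ltv_usd": 8000, "churn_rate": 0.10})
```

Raising on an unknown metric is the important design choice: an agent that cannot find a definition should fail loudly, not improvise one.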
This is a shift from “data governance is a compliance thing” to “data governance is a capability thing.” Agents won’t work without it. This only works if your organization commits to maintaining it. Definitions go stale. Ownership changes. Metrics get redefined. The semantic layer requires ongoing discipline. But without that investment, agents will continue hallucinating because they lack reliable context.
| Feature | Status |
|---|---|
| LTAP | Coming Soon |
| Lakehouse//RT | Beta |
| SDP Real-Time Mode | Public Preview |
| Unity Catalog Glossary | Preview Soon |
| AI Runtime | Public Preview |
| Lakebase | Production (at scale) |
LTAP is coming soon but not yet generally available. Start by auditing your current ETL lag: where does stale data cost you most? Use this window to plan your migration strategy and assess which workloads would benefit most from real-time unified data access.
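An ETL lag audit can start as simply as comparing each table's last load time against a freshness target. The sketch below assumes you can already obtain last-load timestamps from your orchestrator or table metadata; the table names, timestamps, and one-hour target are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def etl_lag_report(last_loaded, now, max_lag):
    """Return (table, lag) pairs that exceed max_lag, worst first."""
    lags = [(table, now - loaded_at) for table, loaded_at in last_loaded.items()]
    stale = [(table, lag) for table, lag in lags if lag > max_lag]
    return sorted(stale, key=lambda item: item[1], reverse=True)


# Hypothetical tables and load times.
now = datetime(2026, 6, 17, 12, 0, tzinfo=timezone.utc)
last_loaded = {
    "orders": now - timedelta(hours=6),
    "customers": now - timedelta(minutes=20),
    "support_tickets": now - timedelta(hours=26),
}

stale = etl_lag_report(last_loaded, now, max_lag=timedelta(hours=1))
for table, lag in stale:
    print(f"{table}: {lag} behind")
```

The tables at the top of that list are the first candidates to evaluate for unified real-time access.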
Calculate your current Genie usage immediately. Apply the new pay-as-you-go model (free monthly allowance + DBU overage billing) to estimate costs. Build cost tracking into any new agent deployments from the start rather than retrofitting later.
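The allowance-plus-overage arithmetic can be sketched as follows. All numbers here are placeholders, not Databricks pricing, and the sketch assumes the free allowance is expressed in DBUs; check the actual unit and rates for your account before relying on an estimate.

```python
def estimate_monthly_cost(dbus_used, free_dbus, price_per_dbu):
    """Bill only the DBUs above the free monthly allowance."""
    billable = max(0.0, dbus_used - free_dbus)
    return billable * price_per_dbu


# Placeholder figures, NOT real pricing or real usage.
usage_by_agent = {"support-agent": 900.0, "churn-analyst": 300.0}
total_dbus = sum(usage_by_agent.values())

monthly_cost = estimate_monthly_cost(total_dbus, free_dbus=1000.0, price_per_dbu=0.5)
```

Keeping usage keyed by agent from day one, as `usage_by_agent` does, is what makes later cost allocation per agent and use case possible.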
No. Lakehouse//RT is still in beta. If your real-time systems are working, keep them for now. Plan for a migration timeline once the feature reaches general availability and you’ve tested it in your environment.
Static governance says “this agent can always access table X.” Stateful governance tracks what data the agent touched and adapts rules based on what it tries to do next. An agent can query PII for legitimate analysis but can’t export it for marketing. The system enforces different rules based on context.
Start small. Pick 3–5 critical metrics and define them properly in Unity Catalog Metrics. Assign clear ownership. Update them regularly. A small, maintained semantic layer beats a large, stale one. This becomes the foundation for agent accuracy and organizational discipline.
Data engineers need to plan for agent trace ingestion and PII detection pipelines. Architects should reconsider whether separate transactional/analytical systems still make sense. Data governance teams need to shift from compliance documentation to maintaining live semantic layers. Finance needs cost allocation models per agent and use case.