
June 19, 2026

Databricks Data + AI Summit 2026: What’s Actually Happening

Author: Vadym Mariiechko, Data Engineer

Reading time: 8 minutes


Databricks Data + AI Summit 2026, held June 16–17, revealed a fundamental shift in how Databricks approaches data infrastructure. The clearest message: Databricks is no longer building primarily for humans as the main users of data. It is building for AI agents.

That shift changes what matters across architecture, governance, security, observability, and cost control. Agents need reliable access to accurate, up-to-date, and well-governed business data. They also need systems that control what they can do, monitor how they behave, evaluate their outputs, and keep usage and spending within limits.

Databricks is betting that the AI market will not be won by whoever builds the smartest model. As models become more widely available and increasingly similar, the real competitive advantage will come from owning the governed, contextualized data layer those models and agents depend on. In other words, the competition is moving from the best model to the best data infrastructure.

KEY TAKEAWAYS

  • Model intelligence is commoditizing – the winner is whoever owns the governed, contextualized data layer that agents rely on.
  • AI agents (not humans) are becoming the primary consumers of data platforms, requiring fundamentally different architecture and governance.
  • Fresh data is essential – LTAP brings transactional and analytical workloads together on a single copy with no ETL lag.
  • Governance must be built-in and stateful, not bolted-on – permissions adapt to what agents actually do instead of following static rules.
  • Your semantic layer (glossary, domains, metrics) is now infrastructure, not documentation – agents can’t work without reliable context.

AI Agents as Your Primary User

Databricks is making an explicit bet: AI agents (not humans) will become the primary consumers of the data layer. This is not about adding a copilot to BI or making dashboards more attractive. The shift is much bigger.

Kasey Uhlenhuth, Director of Product for AI, framed the entire Summit around a clear divide:

  • Day 1 focused on consuming AI, with Genie designed for knowledge workers.
  • Day 2 focused on developers building agents that can run business processes.

Humans get Genie (the conversational interface), while developers get Agent Bricks (the platform for building and operating agents). But the agents themselves become the real workload.

“AGI is already here; the bottleneck is context, not intelligence.”

Ali Ghodsi, CEO, Databricks

Databricks argues that the market will not be won by whoever builds the smartest model, because AI models are increasingly becoming commodities. What makes an AI agent valuable is the data it can access. That data needs to be accurate, current, secure, and easy for the agent to understand. A powerful model will still produce poor results if it works with outdated, incomplete, or badly organized information.

Fresh Data Is a Must

What Databricks announced:

  • LTAP (Lake Transactional/Analytical Processing): An architecture designed to bring transactional (OLTP) and analytical (OLAP) workloads together on a single copy of data, without ETL pipelines or replicas. LTAP is built on Lakebase and governed through Unity Catalog. Databricks describes it as the “world’s first” architecture of its kind, and it represents one of the company’s biggest long-term bets on the future of data infrastructure.

What this means in practice:

Today, we have the classic problem: transactional databases (Postgres, MySQL) and analytics warehouses (Databricks, Snowflake) are separate, and data flows one way via ETL. By the time an agent queries the warehouse, the data is hours old. If you want an agent to update a customer record based on analytical insights, you either run the analysis against the transactional DB, which is slow for that kind of query, or make a stale decision from the warehouse.

With LTAP (and Lakebase as the underlying OLTP layer), you have one governed copy of the data. Fresh operational context and analytical context are available in the same query, with no sync lag or replica drift.

PRO TIP
Start thinking about how your data sync would change if fresh data were free, and where stale data costs you most today.
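
One way to begin that exercise is to measure how far each analytical table lags behind its operational source and weight the lag by what staleness costs the business. The sketch below is a minimal illustration in plain Python, not a Databricks API. The table names, timestamps, and `cost_per_stale_hour` weights are hypothetical placeholders you would replace with your own numbers.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TableSync:
    name: str                      # hypothetical table name
    source_updated_at: datetime    # last write in the transactional system
    warehouse_synced_at: datetime  # last successful ETL load
    cost_per_stale_hour: float     # assumed business cost weight, set by you


def staleness(sync: TableSync) -> timedelta:
    """Lag between the newest source write and the last warehouse load."""
    lag = sync.source_updated_at - sync.warehouse_synced_at
    return max(lag, timedelta(0))  # a warehouse ahead of the source has no lag


def rank_by_stale_cost(syncs: list[TableSync]) -> list[tuple[str, float]]:
    """Rank tables by lag in hours multiplied by the assumed cost weight."""
    scored = [
        (s.name, staleness(s).total_seconds() / 3600 * s.cost_per_stale_hour)
        for s in syncs
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


now = datetime(2026, 6, 19, 12, 0)
syncs = [
    TableSync("orders", now, now - timedelta(hours=6), 10.0),
    TableSync("customers", now, now - timedelta(hours=24), 1.0),
    TableSync("inventory", now, now - timedelta(minutes=30), 40.0),
]
ranking = rank_by_stale_cost(syncs)
for name, cost in ranking:
    print(f"{name}: {cost:.1f}")
```

Tables at the top of the ranking are the ones where stale data costs the most under these assumed weights, which makes them natural candidates to evaluate first.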

Governance Has to Be Built-in, Not Bolted-on

What Databricks announced:

  • Unity AI Gateway with Contextual Service Policies: SQL-based, stateful rules can allow, block, or require approval for specific agent actions, such as writing to sensitive folders or pushing code. The gateway includes guardrails for PII exposure and prompt injection.
  • Agent tracing: Every agent input, output, and reasoning path is stored in the Lakehouse and governed through Unity Catalog. Teams can analyze these traces with Genie or connect them to Lakewatch for real-time security and PII alerts.
  • Budgets, smart routing, and cost caps: Teams can control agent spending, set usage limits, and automatically route requests to the most appropriate model based on cost or performance.

The key shift: permissions become stateful and contextual, not static. Traditional access control is static – “this agent can always access table X.” That breaks with autonomous agents. An agent analyzing customer churn legitimately needs email addresses and phone numbers, but shouldn’t be able to export that list for unsolicited marketing.

Concrete example

An agent processes a customer record with health information. The system detects that PII was accessed, so it restricts what happens next: the agent can email a coworker (fine), cannot publish to a public site (exposure risk), and must request approval before updating Salesforce.
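
The difference between static and stateful rules can be sketched in a few lines. The example below is a plain-Python illustration of the idea only. It is not the Unity AI Gateway API or its SQL policy syntax, and the action names, the `AgentSession` class, and the "allow everything before PII is touched" simplification are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    REQUIRE_APPROVAL = "require_approval"


@dataclass
class AgentSession:
    agent_id: str
    touched_pii: bool = False  # state that accumulates over the session
    history: list[str] = field(default_factory=list)

    def record_read(self, table: str, contains_pii: bool) -> None:
        self.history.append(f"read:{table}")
        self.touched_pii = self.touched_pii or contains_pii


# Hypothetical action names and rules mirroring the example in the text.
PII_RULES = {
    "email_coworker": Decision.ALLOW,
    "publish_public_site": Decision.BLOCK,
    "update_salesforce": Decision.REQUIRE_APPROVAL,
}


def evaluate(session: AgentSession, action: str) -> Decision:
    """A static rule would ignore the session; here the outcome depends on it."""
    if not session.touched_pii:
        return Decision.ALLOW  # simplification: no restrictions before PII access
    # Once PII is in play, unknown actions are blocked by default.
    return PII_RULES.get(action, Decision.BLOCK)


session = AgentSession("churn-agent")
before = evaluate(session, "publish_public_site")
session.record_read("customers_health", contains_pii=True)
after = evaluate(session, "publish_public_site")
print(before.value, after.value)
```

The same action gets a different decision before and after the PII read, which is the behavior a static "agent X may access table Y" rule cannot express.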

What lands on data engineers

Agent trace data (queries, results, attempted actions) streams into the Lakehouse. You will likely build pipelines that ingest these traces in real time, calculate PII exposure, and flag suspicious behavior: which agents touched PII, whether any agent attempted unauthorized actions, and what the aggregate risk across all agents looks like. This is new operational work, and it ties into security tools like Lakewatch.
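
As a rough picture of what such a pipeline computes, here is a small batch sketch in plain Python. The `TraceEvent` schema and the risk weights are assumptions made for illustration. The real trace format stored in the Lakehouse may look quite different, and a production version would run as a streaming job instead of over an in-memory list.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEvent:
    agent_id: str
    action: str        # e.g. "query", "export", "write"
    touched_pii: bool
    authorized: bool


# Assumed risk weights; tune these to your own threat model.
PII_WEIGHT = 1
UNAUTHORIZED_WEIGHT = 5


def summarize(events: list[TraceEvent]) -> dict[str, dict[str, int]]:
    """Aggregate per-agent PII touches, unauthorized attempts, and a risk score."""
    summary: dict[str, dict[str, int]] = defaultdict(
        lambda: {"pii_touches": 0, "unauthorized": 0, "risk": 0}
    )
    for e in events:
        row = summary[e.agent_id]
        if e.touched_pii:
            row["pii_touches"] += 1
            row["risk"] += PII_WEIGHT
        if not e.authorized:
            row["unauthorized"] += 1
            row["risk"] += UNAUTHORIZED_WEIGHT
    return dict(summary)


events = [
    TraceEvent("churn-agent", "query", touched_pii=True, authorized=True),
    TraceEvent("churn-agent", "export", touched_pii=True, authorized=False),
    TraceEvent("pricing-agent", "query", touched_pii=False, authorized=True),
]
report = summarize(events)
flagged = sorted(a for a, row in report.items() if row["unauthorized"] > 0)
print(flagged)
```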

Built-in Cost Control

What Databricks announced:

  • Genie (the conversational interface) transitions from free to pay-as-you-go on July 6, 2026. Users get a free monthly LLM allowance, then usage is billed in DBUs. Admins set budgets, hard caps, and alerts via Unity AI Gateway.
  • Smart routing: Routes each query to the optimal model based on task complexity, quality, cost, and reliability. Cheaper models handle simple tasks; stronger models handle complex ones.
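
The routing idea in the last bullet can be illustrated with a toy selector that picks the cheapest model able to handle a task. This is only a sketch of the general technique. Databricks has not published its routing logic in this article, and the model tiers, costs, and the 1 to 10 complexity scale below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                # hypothetical model tier
    cost_per_request: float  # assumed relative cost
    max_complexity: int      # highest task complexity (1-10) it handles well


MODELS = [
    ModelOption("small", 0.1, 3),
    ModelOption("medium", 0.5, 6),
    ModelOption("large", 2.0, 10),
]


def route(complexity: int, models: list[ModelOption] = MODELS) -> ModelOption:
    """Pick the cheapest model whose capability covers the task complexity."""
    capable = [m for m in models if m.max_complexity >= complexity]
    if not capable:
        raise ValueError(f"no model can handle complexity {complexity}")
    return min(capable, key=lambda m: m.cost_per_request)


print(route(2).name, route(5).name, route(9).name)
```

A real router would also weigh quality and reliability, as the announcement describes. Cost and capability alone are used here to keep the sketch short.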

What this means for your team:

  • If you’re using Genie today: Costs will shift when billing turns on July 6, 2026. Calculate what your current usage would cost under the new pay-as-you-go model – free allowance per user, then DBU billing for overages.
  • If you’re planning to deploy agents: Build cost tracking into your architecture from the start. Waiting to add monitoring later won’t work when costs can spike unpredictably.
  • Budget administration gets more complex: You’ll need to track not just total compute spend, but costs per user, per agent, and per use case. Admins will use Unity AI Gateway to set budgets, hard caps, and alerts at the granular level that actually matters operationally.
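
To make the estimate concrete, the sketch below applies a "free allowance, then DBU overage" model to a month of usage records and checks the result against a cap. Everything numeric here is a placeholder: the allowance size, the DBU price, and the assumption that the allowance is expressed in DBUs are not from the announcement. Substitute the published rates and your own usage export.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    user: str
    agent: str
    dbus: float  # DBUs consumed in the billing month


def dbus_by(records: list[UsageRecord], key: str) -> dict[str, float]:
    """Total DBUs grouped by a record attribute such as "user" or "agent"."""
    totals: dict[str, float] = {}
    for r in records:
        group = getattr(r, key)
        totals[group] = totals.get(group, 0.0) + r.dbus
    return totals


def overage_cost(records, free_dbus_per_user: float, price_per_dbu: float):
    """Billable cost per user: usage above the free allowance times the price."""
    return {
        user: max(used - free_dbus_per_user, 0.0) * price_per_dbu
        for user, used in dbus_by(records, "user").items()
    }


def cap_status(spend: float, hard_cap: float, alert_ratio: float = 0.8) -> str:
    """Return "blocked" at the cap, "alert" near it, and "ok" otherwise."""
    if spend >= hard_cap:
        return "blocked"
    if spend >= alert_ratio * hard_cap:
        return "alert"
    return "ok"


records = [
    UsageRecord("alice", "genie", 8.0),
    UsageRecord("alice", "churn-agent", 7.0),
    UsageRecord("bob", "genie", 4.0),
]
costs = overage_cost(records, free_dbus_per_user=10.0, price_per_dbu=0.5)
per_agent = dbus_by(records, "agent")
print(costs, per_agent, cap_status(costs["alice"], hard_cap=3.0))
```

Grouping the same records by user, by agent, or by a use-case tag is what gives you the granular view described in the last bullet.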

One Data Layer

What Databricks announced:

  • Lakeflow: The unified ingestion and orchestration layer.
  • Lakehouse//RT: Real-time analytics capability.

The pitch: consolidate specialized tools into one governed platform. The reality is that most companies run fragmented stacks: Kafka for streaming, Druid or Pinot for real-time dashboards, Databricks or Snowflake for analytics, separate CDPs, maybe a SIEM. Each has its own access model and governance, so AI agents that query multiple systems get incomplete context, and compliance becomes a nightmare.

Databricks’ actual offer

Make Databricks the unified foundation. Use Lakeflow to ingest from all sources. Use Lakehouse//RT for real-time queries instead of maintaining a separate engine. Use Unity Catalog as the single governance model.

The honest take

You’re centralizing governance and context delivery, not eliminating complexity. You still have multiple systems. What changes is that access, monitoring, and data lineage flow through one place, and AI agents get unified context instead of scattered queries. This is genuinely valuable: giving agents complete, current information reduces hallucination. But it requires you to:

  • Maintain metadata and governance as an ongoing discipline, not a one-time project
  • Accept that organizational issues (unclear data ownership, weak governance culture) won’t be solved by the platform

Lakehouse//RT is still in beta and not production-ready everywhere yet. If you’re running Druid or Pinot today, don’t rip it out immediately.

Your Semantic Layer Is Now Table Stakes

What Databricks announced:

  • Genie Ontology: Continuously learns business context from Databricks plus 50+ connected applications (Slack, Jira, Drive, Confluence, SharePoint).
  • Unity Catalog Semantics: A new governance layer feeding Genie Ontology, comprising three components:
    • Glossary: Authoritative term definitions (coming soon)
    • Domains: Business-aligned scoped context (public preview)
    • Metrics: Governed KPIs queryable from SQL, BI, APIs, and AI agents

Why this matters

An agent doesn’t know what “customer risk score” means. A model doesn’t know your company’s definition of “profitable customer.” Embeddings and semantic search don’t help here. But a governed semantic layer does.

You define: “customer_risk_score” is this column, calculated by this formula, owned by this team, updated daily. A “profitable customer” is a customer with LTV > $5k and churn < 15%. Metrics are queryable by SQL, BI, and agents. That layer, not model choice, is what makes agents accurate and safe.
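
A definition like this is easier to keep honest when it lives as structured data instead of prose. The sketch below shows one possible shape in plain Python. It is not the Unity Catalog Metrics API, and the field names (`ltv_usd`, `churn`), the owner name, and the 90-day review cadence are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable


@dataclass(frozen=True)
class MetricDefinition:
    name: str
    description: str
    owner: str                    # accountable team
    refresh_days: int             # expected review cadence
    last_reviewed: date
    rule: Callable[[dict], bool]  # executable form of the definition

    def is_stale(self, today: date) -> bool:
        """True when the definition has not been reviewed within its cadence."""
        return today - self.last_reviewed > timedelta(days=self.refresh_days)


profitable_customer = MetricDefinition(
    name="profitable_customer",
    description="LTV above $5k and churn below 15%",
    owner="growth-analytics",  # hypothetical team name
    refresh_days=90,
    last_reviewed=date(2026, 6, 1),
    rule=lambda c: c["ltv_usd"] > 5000 and c["churn"] < 0.15,
)

customer = {"ltv_usd": 7200, "churn": 0.08}
print(profitable_customer.rule(customer))
print(profitable_customer.is_stale(date(2026, 6, 19)))
```

Keeping the owner and review date next to the rule makes stale definitions something you can query for, which is harder to do with a wiki page.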

What this means for your work

  • Your Glossary isn’t a documentation artifact anymore. It’s infrastructure.
  • Your metric definitions need to be in Unity Catalog Metrics, not a Confluence page.
  • Your data ownership model (Domains) needs to be explicit and enforced.

This is a shift from “data governance is a compliance thing” to “data governance is a capability thing.” Agents won’t work without it. This only works if your organization commits to maintaining it. Definitions go stale. Ownership changes. Metrics get redefined. The semantic layer requires ongoing discipline. But without that investment, agents will continue hallucinating because they lack reliable context.

Databricks Feature Status and Timeline

Feature                | Status
-----------------------|----------------------
LTAP                   | Coming Soon
Lakehouse//RT          | Beta
SDP Real-Time Mode     | Public Preview
Unity Catalog Glossary | Preview Soon
AI Runtime             | Public Preview
Lakebase               | Production (at scale)

FAQ


When should we start implementing LTAP?


LTAP is coming soon but not yet generally available. Start by auditing your current ETL lag: where does stale data cost you most? Use this window to plan your migration strategy and assess which workloads would benefit most from real-time unified data access.


How does the July 6, 2026 billing change affect our budget?


Calculate your current Genie usage immediately. Apply the new pay-as-you-go model (free monthly allowance + DBU overage billing) to estimate costs. Build cost tracking into any new agent deployments from the start rather than retrofitting later.


Do we need to replace Druid or Pinot with Lakehouse//RT now?


No. Lakehouse//RT is still in beta. If your real-time systems are working, keep them for now. Plan for a migration timeline once the feature reaches general availability and you’ve tested it in your environment.


What's the difference between static and stateful governance?


Static governance says “this agent can always access table X.” Stateful governance tracks what data the agent touched and adapts rules based on what it tries to do next. An agent can query PII for legitimate analysis but can’t export it for marketing. The system enforces different rules based on context.


How do we build a semantic layer if our data governance is weak?


Start small. Pick 3 to 5 critical metrics and define them properly in Unity Catalog Metrics. Assign clear ownership. Update them regularly. A small, maintained semantic layer beats a large, stale one. This becomes the foundation for agent accuracy and organizational discipline.


Who on our team needs to prepare for these changes?


Data engineers need to plan for agent trace ingestion and PII detection pipelines. Architects should reconsider whether separate transactional/analytical systems still make sense. Data governance teams need to shift from compliance documentation to maintaining live semantic layers. Finance needs cost allocation models per agent and use case.




Category: Data Engineering