
September 01, 2025

Data Migration with AI: Technical Challenges and Lessons from Real-World Practice

Author: Edwin Lisowski, CSO & Co-Founder


Reading time: 9 minutes


On paper, data migration sounds simple: move the data, switch on the new system, and unlock the benefits of cloud or advanced analytics. In practice, most executives know the reality: migrations drag on for months, cost more than planned, and sometimes lose data in the process.

Industry surveys keep repeating the same theme: migration projects fail or overrun at alarming rates.

AI looked like the breakthrough. Tools that map, clean, and validate data automatically promised faster, safer projects. But when companies bolted AI onto already messy processes, the results often backfired. Instead of clarity, they got automation no one could explain, pipelines with no clear owner, and audits that didn’t hold up.

If AI was supposed to solve these problems, why are enterprises still struggling?

How We Got Here & Why Data Migration Still Fails

Migration is the gateway to modernization. Cloud offers scale and lower cost. Analytics promises sharper insights. Mergers create efficiency. But none of it happens until the data is moved and usable.

The mechanics are straightforward: copy data from one system to another, test it, and cut over to the new environment while retiring the old. The complexity lies in scale.

For decades, migrations ran on hand-written scripts, legacy ETL tools, and endless mapping workshops. Accuracy depended on people catching mistakes. It was slow, brittle, and expensive.

When AI arrived, migration looked like the perfect use case. AI could scan schemas, suggest mappings, detect anomalies, even clean inconsistent records. And it worked, up to a point. Without governance, though, AI “fixed” data in ways that broke compliance, produced mappings that failed in production, and created dashboards full of errors no one could explain.

The Challenges That Keep AI Data Migrations Failing

  • Too many tools. Profiling, mapping, and testing often live in separate applications. Fragmented tools and multiple interfaces mean context gets lost between teams, effort gets duplicated, and errors creep in.
  • No shared lineage. Without a shared record of changes, it’s impossible to trace who did what or why. Automation steps such as profiling, ETL, transformation, and validation often run in isolation, breaking traceability and making audits painful.
  • Opaque rules. Automated mappings and cleanups can produce results, but they are rarely explainable. When auditors ask for proof, teams often have little more than a log file. That undermines compliance and trust in the process.
  • Light testing. Without parity checks and alerts, issues slip into production. Blind automation can move or expose data in ways that no one has tracked, creating openings for leaks and regulatory breaches.
  • Pilots don’t scale. A script that works on a small dataset often collapses in production. Without shared configuration, monitoring, and rollback plans, migrations stay stuck in the lab and fail to grow into enterprise-ready systems.

The New Model: Orchestrated, Human-in-the-Loop AI Approach

Most data migrations fail not because AI isn’t powerful enough, but because the pieces don’t connect.

Enterprises are now learning the hard way that running processes in isolation creates blind spots, brittle handoffs, and high failure rates.

The solution? Treating migration as a composable, orchestrated system where every step flows into the next. This is what business orchestration does: it connects disparate tools under one roof.

Platforms like Matillion, Datafold DMA, or Hevo are gaining traction because they replace silos with pipelines that connect discovery, ETL, validation, and monitoring. Central control gives teams end-to-end visibility: they can trace lineage, explain AI decisions, and intervene in real time.

  • AI provides speed. Automated mapping, cleaning, and validation accelerate projects.
  • Orchestration provides control. Every step is connected, monitored, and auditable.
  • Humans provide judgment. Domain experts review, override, or approve decisions in sensitive domains like finance or healthcare.

This way, AI stops being a risky bolt-on and becomes part of a governed, explainable, and enterprise-ready migration model. Monitoring is continuous, not occasional: dashboards and policy engines flag anomalies as they appear, allowing teams to fix problems before they cascade.

Orchestrated Data Migration: Real-World Examples

Vodafone

Vodafone migrated its SAP estate to Google Cloud with SAP and Accenture. On go-live day, more than 300 SAP VMs were cut over across multiple environments, with zero business disruption. The key was orchestration: one control room, one timeline, one set of logs.

Moves that mattered

  • Treating the program as a single orchestrated cutover rather than a series of isolated moves.
  • Tight collaboration across vendors and the operator reduced handoffs and kept one plan of record.
  • Vodafone set clear success criteria around avoiding business disruption, not just completing the technical cutover, which kept focus on outcomes that matter to the business.

What you can copy

  • Orchestrate the whole cutover from one control room. Keep one timeline, one run history, one set of logs.
  • Plan with all partners in the room. Joint planning across platform, application vendor, and SI reduces surprises.
  • Tie the ERP move to analytics goals. Proximity to your data platform speeds the payoff.

HSBC

HSBC built new risk tools on Google Cloud using BigQuery and Dataflow. They achieved a 10x speedup on risk models while maintaining governance and explainability. Their approach paired scale with oversight, showing how orchestration and human review can coexist.

Moves that mattered

  • Building on a single platform with managed services helped centralize logs, lineage, and access, which supports audit.
  • Pairing new models with human oversight aligns with current regulatory expectations in high-risk domains like financial crime and model risk. HSBC’s Dynamic Risk Assessment work with Google Cloud’s AML AI is a public example of that pattern.
  • Focusing on governance and explainability in parallel with speed.

What you can copy

  • Put compute, storage, and warehouse on one platform to cut handoffs and speed governance work.
  • Treat validation and explainability as first-class features, not add-ons. Use managed services that expose logs and metrics your auditors will ask for.
  • Design for scale from day one. Elastic dataflow and warehouse services let you move from pilot to full load without rewriting the system.

AI Data Migration Best Practices: What Enterprises Should Take Away

  • Composable, Orchestrated Pipelines

Key idea: Unify all AI-automated steps within a single, explainable workflow.

How to: Run discovery, profiling, ELT, validation, and sign-off as one pipeline. Keep one run history and one set of logs in a centralized monitoring system like ELK or Datadog.
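
To make the single-pipeline idea concrete, here is a minimal Python sketch, with hypothetical placeholder steps standing in for real discovery, profiling, ELT, and validation jobs. The point is that one run ID ties every structured log line together, so a monitoring system such as ELK or Datadog can reconstruct the whole run from one log stream.

    # Minimal sketch: one pipeline, one run ID, one log stream.
    # The step functions are hypothetical placeholders, not a real toolchain.
    import json
    import logging
    import uuid
    from datetime import datetime, timezone

    logging.basicConfig(level=logging.INFO, format="%(message)s")
    log = logging.getLogger("migration")

    def run_pipeline(steps):
        run_id = str(uuid.uuid4())  # one ID shared by every step in this run
        for name, step in steps:
            started = datetime.now(timezone.utc).isoformat()
            try:
                result, status = step(), "ok"
            except Exception as exc:
                result, status = {"error": str(exc)}, "failed"
            # One structured log line per step; ship these to ELK/Datadog.
            log.info(json.dumps({"run_id": run_id, "step": name,
                                 "started": started, "status": status,
                                 "result": result}))
            if status == "failed":
                raise SystemExit(f"run {run_id} halted at step '{name}'")

    run_pipeline([
        ("discovery",  lambda: {"tables_found": 42}),
        ("profiling",  lambda: {"rows_profiled": 1_000_000}),
        ("elt",        lambda: {"rows_loaded": 1_000_000}),
        ("validation", lambda: {"checks_passed": 17}),
    ])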

  • User-Centric Interfaces & Collaboration

Key idea: One interface for both business and technical users.

How to: Select a platform with role-based dashboards, integrate with enterprise ticketing tools, and add in-line data previews. Enable commenting, versioning, and approvals directly inside the workflow tool.

  • Interoperability & Extensibility

Key idea: All tools and agents must “talk to each other”.

How to: Let tools exchange data and metadata. Pass lineage and configs across steps so context never gets lost. Build an integration catalog documenting which systems can interoperate and how.
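
One way to keep that context intact is a lineage record that every step receives, appends to, and hands to the next step. The sketch below is a plain-Python illustration with made-up field names; OpenLineage (listed in the tools table further down) defines a formal spec for the same idea.

    # Sketch: a lineage record handed from step to step so context survives handoffs.
    # Field names are illustrative, not a standard; see OpenLineage for a real spec.
    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    @dataclass
    class LineageRecord:
        run_id: str
        events: list = field(default_factory=list)

        def record(self, step, inputs, outputs, config):
            self.events.append({
                "at": datetime.now(timezone.utc).isoformat(),
                "step": step,
                "inputs": inputs,
                "outputs": outputs,
                "config": config,  # the exact settings this step ran with
            })

    lineage = LineageRecord(run_id="2025-09-01-cutover")
    lineage.record("extract", ["legacy.orders"], ["staging.orders"],
                   {"mode": "full_load"})
    lineage.record("transform", ["staging.orders"], ["dw.orders"],
                   {"model": "orders_v2"})

    # Auditors can now answer: who did what, with which settings, in what order?
    for event in lineage.events:
        print(event["at"], event["step"], event["inputs"], "->", event["outputs"])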

  • Continuous monitoring and quality enforcement

Key idea: Quality gates should be part of the pipeline, not a separate task.

How to: Track row counts, nulls, distribution shifts, and key business metrics on every run. Use data diffs to prove parity before the switch. Fail closed on red checks and alert the owner.
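
A quality gate can be as small as a function that compares a few metrics and refuses to proceed on any mismatch. The thresholds and numbers below are assumptions for illustration; distribution shifts and business totals follow the same pattern.

    # Sketch: quality gates that fail closed. Thresholds are illustrative.
    def quality_gate(source_rows, target_rows, target_null_rate,
                     max_null_rate=0.01):
        failures = []
        if source_rows != target_rows:
            failures.append(f"row count mismatch: {source_rows} vs {target_rows}")
        if target_null_rate > max_null_rate:
            failures.append(f"null rate {target_null_rate:.2%} "
                            f"exceeds {max_null_rate:.2%}")
        return failures

    failures = quality_gate(source_rows=1_000_000, target_rows=999_998,
                            target_null_rate=0.002)
    if failures:
        # Fail closed: halt the run and alert the owner instead of proceeding.
        raise SystemExit("red checks, halting: " + "; ".join(failures))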

  • Human-in-the-loop governance

Key idea: Use AI for speed; require human sign-off for critical, sensitive, or unusual logic.

How to: Route changes that touch money, health, or identity to named owners. They can approve, edit, or reject. Keep a record of every decision.
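
A minimal sketch of such a gate appears below. The sensitivity tags and owner assignments are assumptions for illustration; in practice, a pending decision would open a ticket or approval task in your workflow tool.

    # Sketch: route sensitive changes to named owners, keep a decision record.
    # Tags and owner assignments are illustrative assumptions.
    SENSITIVE_TAGS = {"money", "health", "identity"}
    OWNERS = {"money": "finance-lead", "health": "privacy-officer",
              "identity": "privacy-officer"}

    decision_log = []  # in practice, an append-only audit store

    def submit_change(change):
        tags = SENSITIVE_TAGS & set(change["tags"])
        if not tags:
            decision_log.append({**change, "decision": "auto-approved"})
            return "auto-approved"
        owner = OWNERS[sorted(tags)[0]]
        # Record that sign-off is required and by whom; every decision is kept.
        decision_log.append({**change, "decision": "pending", "owner": owner})
        return f"awaiting sign-off from {owner}"

    print(submit_change({"id": "map-042", "tags": ["address"]}))  # auto-approved
    print(submit_change({"id": "map-043", "tags": ["money"]}))    # routed to owner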

Key tools for AI-driven data migration

Category | Tool | What it does | Notable AI/controls | Best for
---|---|---|---|---
Orchestration & ELT | Matillion Data Productivity Cloud | Unified ELT with AI features and lineage | AI assistant, pipeline building with control | Cloud warehouses
Data management suite | Informatica IDMC | End-to-end integration, governance, quality | CLAIRE copilot, observability, lineage | Large enterprises, compliance
DB migration (CDC) | AWS DMS | Full load + CDC with low downtime | Built-in CDC and auditability via logs | Heterogeneous DB moves
DB migration (Azure) | Azure Database Migration Service | Online migrations to Azure | Near-zero downtime patterns | SQL Server and other DBs to Azure
DB migration (GCP) | Google Cloud Database Migration Service | MySQL/Postgres to Cloud SQL | Zero-downtime patterns | GCP estates
Ingestion & CDC (OSS) | Debezium | Open-source CDC connectors | Row-level change capture | Real-time sync, cutovers
Streaming integration | Kafka Connect | Scalable source/sink connectors | Centralized configs, offsets | High-volume pipelines
Transform & docs | dbt | SQL-based transforms with lineage/docs | Auto-generated docs and catalog | Transparent business logic
Data quality (OSS + SaaS) | Great Expectations (GX) | Declarative data tests | Expectations, GX Cloud | Regression safety nets
Data observability | Soda | Metrics-level anomaly detection | Adaptive AI alerts | Detect breaks fast
Data observability | Monte Carlo | End-to-end monitors and lineage | AI-powered anomaly detection | Enterprise observability
Validation & parity | Datafold | Cross-DB data diff and lineage | Automated parity reports | Fast, auditable cutovers
Lineage standard | OpenLineage/Marquez | Open lineage spec + reference app | Standardized events | Traceability across tools

Addepto’s Tips for Successful AI Data Migration

  • Make AI explainable. Treat AI suggestions as change requests. Show the rule, the code, and the data it touched so reviewers can approve or edit.
  • Version everything. Store jobs, configs, and policies in Git. Tag each cutover. You get clear rollbacks and an audit trail.
  • Plan the cutover. Use full load plus CDC to keep downtime short. Dry-run the switch. Have a rollback ready.
  • Set approval gates. Mark sensitive tables. Add human sign-off.
  • Prove parity. Run data diffs and business totals before cutover. Treat failures as stop signs (see the sketch after this list).
  • Plan rollback. Backups and replays. A short runbook for on-call.
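
As an illustration of the parity step, the sketch below compares row counts and one business total between two databases. It uses sqlite3 only so the example is self-contained; in a real cutover you would point the same queries at your source and target connections.

    # Sketch: prove parity with row counts and a business total before cutover.
    # sqlite3 is used purely for a self-contained demo.
    import sqlite3

    def parity_report(src, tgt, table, amount_col):
        queries = {"rows": f"SELECT COUNT(*) FROM {table}",
                   "total": f"SELECT COALESCE(SUM({amount_col}), 0) FROM {table}"}
        report = {}
        for metric, query in queries.items():
            s = src.execute(query).fetchone()[0]
            t = tgt.execute(query).fetchone()[0]
            report[metric] = {"source": s, "target": t, "match": s == t}
        return report

    src, tgt = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
    for db in (src, tgt):
        db.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
        db.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 20.0)])

    report = parity_report(src, tgt, "orders", "amount")
    print(report)
    # Treat any mismatch as a stop sign: do not cut over until it is explained.
    assert all(m["match"] for m in report.values()), "parity failed, halt cutover"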

Closing Thoughts

The real question for enterprises isn’t “should we use AI for migration?” It’s “how do we design a model where AI, orchestration, and human oversight work together?”

AI on its own makes migrations faster, but without orchestration and oversight, it creates as many risks as it solves. The future every enterprise needs is one where migrations run smoothly, stand up to audits, and deliver clean, usable data from day one.

That’s what we build at Addepto. If you want to modernize without the overruns and failures that plague most migrations, let’s talk.

Our mission is to deliver AI strategies that align with business logic, regulatory expectations, and long-term growth.



Category: Data Engineering