
September 01, 2025

Data Migration with AI: Technical Challenges and Lessons from Real-World Practice

Author: Edwin Lisowski, CSO & Co-Founder


Reading time: 9 minutes


On paper, data migration sounds simple: move the data, switch on the new system, and unlock the benefits of cloud or advanced analytics. In practice, most executives know the reality: migrations drag on for months, cost more than planned, and sometimes lose data in the process.

Industry surveys keep repeating the same theme: migration projects fail or overrun at alarming rates.

AI looked like the breakthrough. Tools that map, clean, and validate data automatically promised faster, safer projects. But when companies bolted AI onto already messy processes, the results often backfired. Instead of clarity, they got automation no one could explain, pipelines with no clear owner, and audits that didn’t hold up.

If AI was supposed to solve these problems, why are enterprises still struggling?

How We Got Here & Why Data Migration Still Fails

Migration is the gateway to modernization. Cloud offers scale and lower cost. Analytics promises sharper insights. Mergers create efficiency. But none of it happens until the data is moved and usable.

The mechanics are straightforward: copy data from one system to another, test it, and cut over to the new environment while retiring the old. The complexity lies in scale.

For decades, migrations ran on hand-written scripts, legacy ETL tools, and endless mapping workshops. Accuracy depended on people catching mistakes. It was slow, brittle, and expensive.

When AI arrived, migration looked like the perfect use case. AI could scan schemas, suggest mappings, detect anomalies, even clean inconsistent records. And it worked, up to a point. Without governance, though, AI “fixed” data in ways that broke compliance, produced mappings that failed in production, and created dashboards full of errors no one could explain.

The Challenges That Keep AI Data Migrations Failing

  • Too many tools. Profiling, mapping, and testing often live in separate applications. Fragmented tools and multiple interfaces mean context gets lost between teams, effort gets duplicated, and errors creep in.
  • No shared lineage. Without a shared record of changes, it’s impossible to trace who did what or why. Automation steps such as profiling, ETL, transformation, and validation often run in isolation, breaking traceability and making audits painful.
  • Opaque rules. Automated mappings and cleanups can produce results, but they are rarely explainable. When auditors ask for proof, teams often have little more than a log file. That undermines compliance and trust in the process.
  • Light testing. Without parity checks and alerts, issues slip into production. Blind automation can move or expose data in ways that no one has tracked, creating openings for leaks and regulatory breaches.
  • Pilots don’t scale. A script that works on a small dataset often collapses in production. Without shared configuration, monitoring, and rollback plans, migrations stay stuck in the lab and fail to grow into enterprise-ready systems.

The New Model: Orchestrated, Human-in-the-Loop AI Approach

Most data migrations fail not because AI isn’t powerful enough, but because the pieces don’t connect.

Enterprises are now learning the hard way that running processes in isolation creates blind spots, brittle handoffs, and high failure rates.

The solution? Treating migration as a composable, orchestrated system where every step flows into the next. This is what business orchestration does: it connects disparate tools under one roof.

Platforms like Matillion, Datafold DMA, or Hevo are gaining traction because they replace silos with pipelines that connect discovery, ETL, validation, and monitoring. Central control gives teams end-to-end visibility: they can trace lineage, explain AI decisions, and intervene in real time.

  • AI provides speed. Automated mapping, cleaning, and validation accelerate projects.
  • Orchestration provides control. Every step is connected, monitored, and auditable.
  • Humans provide judgment. Domain experts review, override, or approve decisions in sensitive domains like finance or healthcare.

This way, AI stops being a risky bolt-on and becomes part of a governed, explainable, and enterprise-ready migration model. Monitoring is continuous, not occasional: dashboards and policy engines flag anomalies as they appear, allowing teams to fix problems before they cascade.

Orchestrated Data Migration: Real-World Examples

Vodafone

Vodafone migrated its SAP estate to Google Cloud with SAP and Accenture. On go-live day, more than 300 SAP VMs were cut over across multiple environments, with zero business disruption. The key was orchestration: one control room, one timeline, one set of logs.

Moves that mattered

  • Treating the program as a single orchestrated cutover rather than a series of isolated moves.
  • Tight collaboration across vendors and the operator reduced handoffs and kept one plan of record.
  • Vodafone set clear success criteria around avoiding business disruption, not just completing the technical cutover, which kept focus on outcomes that matter to the business.

What you can copy

  • Orchestrate the whole cutover from one control room. Keep one timeline, one run history, one set of logs.
  • Plan with all partners in the room. Joint planning across platform, application vendor, and SI reduces surprises.
  • Tie the ERP move to analytics goals. Proximity to your data platform speeds the payoff.

HSBC

HSBC built new risk tools on Google Cloud using BigQuery and Dataflow. They achieved a 10x speedup on risk models while maintaining governance and explainability. Their approach paired scale with oversight, showing how orchestration and human review can coexist.

Moves that mattered

  • Building on a single platform with managed services helped centralize logs, lineage, and access, which supports audit.
  • Pairing new models with human oversight aligns with current regulatory expectations in high-risk domains like financial crime and model risk. HSBC’s Dynamic Risk Assessment work with Google Cloud’s AML AI is a public example of that pattern.
  • Focusing on governance and explainability in parallel with speed.

What you can copy

  • Put compute, storage, and warehouse on one platform to cut handoffs and speed governance work.
  • Treat validation and explainability as first-class features, not add-ons. Use managed services that expose logs and metrics your auditors will ask for.
  • Design for scale from day one. Elastic dataflow and warehouse services let you move from pilot to full load without rewriting the system.

AI Data Migration Best Practices: What Enterprises Should Take Away

  • Composable, Orchestrated Pipelines

Key idea: Unify all AI-automated steps within a single, explainable workflow.

How to: Run discovery, profiling, ELT, validation, and sign-off as one pipeline. Keep one run history and one set of logs in a centralized monitoring system like ELK or Datadog.
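
To make the single-pipeline idea concrete, here is a minimal Python sketch, with hypothetical placeholder steps standing in for real discovery, profiling, ELT, and validation jobs. The point is that one run ID ties every structured log line together, so a monitoring system such as ELK or Datadog can reconstruct the whole run from one log stream.

    # Minimal sketch: one pipeline, one run ID, one log stream.
    # The step functions are hypothetical placeholders, not a real toolchain.
    import json
    import logging
    import uuid
    from datetime import datetime, timezone

    logging.basicConfig(level=logging.INFO, format="%(message)s")
    log = logging.getLogger("migration")

    def run_pipeline(steps):
        run_id = str(uuid.uuid4())  # one ID shared by every step in this run
        for name, step in steps:
            started = datetime.now(timezone.utc).isoformat()
            try:
                result, status = step(), "ok"
            except Exception as exc:
                result, status = {"error": str(exc)}, "failed"
            # One structured log line per step; ship these to ELK/Datadog.
            log.info(json.dumps({"run_id": run_id, "step": name,
                                 "started": started, "status": status,
                                 "result": result}))
            if status == "failed":
                raise SystemExit(f"run {run_id} halted at step '{name}'")

    run_pipeline([
        ("discovery",  lambda: {"tables_found": 42}),
        ("profiling",  lambda: {"rows_profiled": 1_000_000}),
        ("elt",        lambda: {"rows_loaded": 1_000_000}),
        ("validation", lambda: {"checks_passed": 17}),
    ])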

  • User-Centric Interfaces & Collaboration

Key idea: One interface for both business and technical users.

How to: Select a platform with role-based dashboards, integrate with enterprise ticketing tools, and add in-line data previews. Enable commenting, versioning, and approvals directly inside the workflow tool.

  • Interoperability & Extensibility

Key idea: All tools and agents must “talk to each other”.

How to: Let tools exchange data and metadata. Pass lineage and configs across steps so context never gets lost. Build an integration catalog documenting which systems can interoperate and how.
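
One way to keep that context intact is a lineage record that every step receives, appends to, and hands to the next step. The sketch below is a plain-Python illustration with made-up field names; OpenLineage (listed in the tools table further down) defines a formal spec for the same idea.

    # Sketch: a lineage record handed from step to step so context survives handoffs.
    # Field names are illustrative, not a standard; see OpenLineage for a real spec.
    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    @dataclass
    class LineageRecord:
        run_id: str
        events: list = field(default_factory=list)

        def record(self, step, inputs, outputs, config):
            self.events.append({
                "at": datetime.now(timezone.utc).isoformat(),
                "step": step,
                "inputs": inputs,
                "outputs": outputs,
                "config": config,  # the exact settings this step ran with
            })

    lineage = LineageRecord(run_id="2025-09-01-cutover")
    lineage.record("extract", ["legacy.orders"], ["staging.orders"],
                   {"mode": "full_load"})
    lineage.record("transform", ["staging.orders"], ["dw.orders"],
                   {"model": "orders_v2"})

    # Auditors can now answer: who did what, with which settings, in what order?
    for event in lineage.events:
        print(event["at"], event["step"], event["inputs"], "->", event["outputs"])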

  • Continuous monitoring and quality enforcement

Key idea: Quality gates should be part of the pipeline, not a separate task.

How to: Track row counts, nulls, distribution shifts, and key business metrics on every run. Use data diffs to prove parity before the switch. Fail closed on red checks and alert the owner.
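
A quality gate can be as small as a function that compares a few metrics and refuses to proceed on any mismatch. The thresholds and numbers below are assumptions for illustration; distribution shifts and business totals follow the same pattern.

    # Sketch: quality gates that fail closed. Thresholds are illustrative.
    def quality_gate(source_rows, target_rows, target_null_rate,
                     max_null_rate=0.01):
        failures = []
        if source_rows != target_rows:
            failures.append(f"row count mismatch: {source_rows} vs {target_rows}")
        if target_null_rate > max_null_rate:
            failures.append(f"null rate {target_null_rate:.2%} "
                            f"exceeds {max_null_rate:.2%}")
        return failures

    failures = quality_gate(source_rows=1_000_000, target_rows=999_998,
                            target_null_rate=0.002)
    if failures:
        # Fail closed: halt the run and alert the owner instead of proceeding.
        raise SystemExit("red checks, halting: " + "; ".join(failures))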

  • Human-in-the-loop governance

Key idea: Use AI for speed; require human sign-off for critical, sensitive, or unusual logic.

How to: Route changes that touch money, health, or identity to named owners. They can approve, edit, or reject. Keep a record of every decision.
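
A minimal sketch of such a gate appears below. The sensitivity tags and owner assignments are assumptions for illustration; in practice, a pending decision would open a ticket or approval task in your workflow tool.

    # Sketch: route sensitive changes to named owners, keep a decision record.
    # Tags and owner assignments are illustrative assumptions.
    SENSITIVE_TAGS = {"money", "health", "identity"}
    OWNERS = {"money": "finance-lead", "health": "privacy-officer",
              "identity": "privacy-officer"}

    decision_log = []  # in practice, an append-only audit store

    def submit_change(change):
        tags = SENSITIVE_TAGS & set(change["tags"])
        if not tags:
            decision_log.append({**change, "decision": "auto-approved"})
            return "auto-approved"
        owner = OWNERS[sorted(tags)[0]]
        # Record that sign-off is required and by whom; every decision is kept.
        decision_log.append({**change, "decision": "pending", "owner": owner})
        return f"awaiting sign-off from {owner}"

    print(submit_change({"id": "map-042", "tags": ["address"]}))  # auto-approved
    print(submit_change({"id": "map-043", "tags": ["money"]}))    # routed to owner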

Key tools for AI-driven data migration

Category | Tool | What it does | Notable AI/controls | Best for
---|---|---|---|---
Orchestration & ELT | Matillion Data Productivity Cloud | Unified ELT with AI features and lineage | AI assistant, pipeline building with control | Cloud warehouses
Data management suite | Informatica IDMC | End-to-end integration, governance, quality | CLAIRE copilot, observability, lineage | Large enterprises, compliance
DB migration (CDC) | AWS DMS | Full load + CDC with low downtime | Built-in CDC and auditability via logs | Heterogeneous DB moves
DB migration (Azure) | Azure Database Migration Service | Online migrations to Azure | Near-zero downtime patterns | SQL Server and other DBs to Azure
DB migration (GCP) | Google Cloud Database Migration Service | MySQL/Postgres to Cloud SQL | Zero-downtime patterns | GCP estates
Ingestion & CDC (OSS) | Debezium | Open-source CDC connectors | Row-level change capture | Real-time sync, cutovers
Streaming integration | Kafka Connect | Scalable source/sink connectors | Centralized configs, offsets | High-volume pipelines
Transform & docs | dbt | SQL-based transforms with lineage/docs | Auto-generated docs and catalog | Transparent business logic
Data quality (OSS + SaaS) | Great Expectations (GX) | Declarative data tests | Expectations, GX Cloud | Regression safety nets
Data observability | Soda | Metrics-level anomaly detection | Adaptive AI alerts | Detect breaks fast
Data observability | Monte Carlo | End-to-end monitors and lineage | AI-powered anomaly detection | Enterprise observability
Validation & parity | Datafold | Cross-DB data diff and lineage | Automated parity reports | Fast, auditable cutovers
Lineage standard | OpenLineage/Marquez | Open lineage spec + reference app | Standardized events | Traceability across tools

Addepto’s Tips for Successful AI Data Migration

  • Make AI explainable. Treat AI suggestions as change requests. Show the rule, the code, and the data it touched so reviewers can approve or edit.
  • Version everything. Store jobs, configs, and policies in Git. Tag each cutover. You get clear rollbacks and an audit trail.
  • Plan the cutover. Use full load plus CDC to keep downtime short. Dry-run the switch. Have a rollback ready.
  • Set approval gates. Mark sensitive tables. Add human sign-off.
  • Prove parity. Run data diffs and business totals before cutover. Treat failures as stop signs (see the sketch after this list).
  • Plan rollback. Backups and replays. A short runbook for on-call.
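
As an illustration of the parity step, the sketch below compares row counts and one business total between two databases. It uses sqlite3 only so the example is self-contained; in a real cutover you would point the same queries at your source and target connections.

    # Sketch: prove parity with row counts and a business total before cutover.
    # sqlite3 is used purely for a self-contained demo.
    import sqlite3

    def parity_report(src, tgt, table, amount_col):
        queries = {"rows": f"SELECT COUNT(*) FROM {table}",
                   "total": f"SELECT COALESCE(SUM({amount_col}), 0) FROM {table}"}
        report = {}
        for metric, query in queries.items():
            s = src.execute(query).fetchone()[0]
            t = tgt.execute(query).fetchone()[0]
            report[metric] = {"source": s, "target": t, "match": s == t}
        return report

    src, tgt = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
    for db in (src, tgt):
        db.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
        db.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 20.0)])

    report = parity_report(src, tgt, "orders", "amount")
    print(report)
    # Treat any mismatch as a stop sign: do not cut over until it is explained.
    assert all(m["match"] for m in report.values()), "parity failed, halt cutover"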

Closing Thoughts

The real question for enterprises isn’t “should we use AI for migration?” It’s “how do we design a model where AI, orchestration, and human oversight work together?”

AI on its own makes migrations faster, but without orchestration and oversight, it creates as many risks as it solves. The future every enterprise needs is one where migrations run smoothly, stand up to audits, and deliver clean, usable data from day one.

That’s what we build at Addepto. If you want to modernize without the overruns and failures that plague most migrations, let’s talk.

Our mission is to deliver AI strategies that align with business logic, regulatory expectations, and long-term growth.



Category: Data Engineering