


Intermodal Transportation Data Platform: Unifying Data for Travel Operations

A leading multinational IT provider serving the air transport industry partnered with the European Union to explore new market opportunities beyond traditional aviation, recognizing that passenger journeys don’t end at the airport gate. Working with the Client, we developed an Intermodal Data Platform that unifies aviation, maritime, and rail data into a single system.

Built on Databricks as a scalable data mesh architecture, the platform enables proactive disruption management across entire multi-modal journeys while establishing the infrastructure for future AI and machine learning applications.

The platform now powers real-time operational decisions for transportation operators in Athens, with the technical foundation ready to expand across European cities and support increasingly sophisticated intelligent capabilities.



Meet Our Client


Our client is a multinational information technology company providing comprehensive IT and telecommunications services to the air transport industry. Serving airlines, airports, ground handlers, and governments worldwide, they deliver solutions for passenger processing, baggage handling, aircraft operations, and more.


Case Study Shortcut


Challenge



Fragmented Data Across Transportation Modes


Aviation, maritime, and rail data existed in completely separate systems with no integration. Airports managed their terminals effectively but had zero visibility into external factors – train delays, port congestion, strikes blocking access routes. Connected journeys combining multiple modes (flight + train, flight + cruise) were invisible to operators, despite being increasingly common. Each transportation provider operated in a data silo.


Reactive Instead of Proactive Disruption Management


Operators learned about disruptions only after passengers had already missed connections. A delayed flight meant a missed cruise departure, but port operators had no advance warning to coordinate. Without cross-modal communication, each operator worked in isolation, unable to prevent cascading disruptions or take preventive action like deploying additional shuttles or opening extra gates.


No Foundation for Scalable Intelligence


Each transportation mode used different data standards, update frequencies, and even conflicting definitions of basic concepts like “delay” or “cancellation.” There was no existing infrastructure capable of unifying this heterogeneous data landscape, maintaining quality at scale, expanding to multiple cities, or supporting future AI/ML capabilities. A fundamentally new architectural approach was required.

Goal


The primary objective was to build a scalable data engineering foundation that unifies incompatible transportation data ecosystems while supporting both immediate operational needs and future AI/ML applications. The platform needed to process real-time streaming and batch data simultaneously, normalize diverse schemas while preserving source integrity, and deliver near real-time disruption detection across transportation modes.


  • Data mesh implementation on Databricks: Design a federated architecture ingesting 10+ disparate sources (FIDS, AMS via CDC, AIS maritime tracking, GTFS rail feeds, weather APIs, event data) while maintaining domain autonomy and enabling cross-domain analytics

  • Medallion architecture with SCD2 historization: Implement Bronze/Silver/Gold layers that preserve complete lineage, track temporal changes through Slowly Changing Dimensions Type 2, and support both real-time queries and historical analytics

  • Near real-time intelligence layers: Build specialized detection engines for delays, cancellations, and diversions that process streaming data, apply mode-specific business rules, and generate alerts with sub-5-minute latency

  • API-ready data synchronization: Establish automated pipelines from Gold tables to Cosmos DB, exposing normalized datasets through REST APIs while separating analytical and operational workloads

  • Horizontal and vertical scalability: Create standardized data contracts and modular ingestion patterns for rapid city onboarding, while establishing infrastructure for future ML models, predictive analytics, and LLM-powered data interaction

  • Data quality automation: Implement monitoring, validation, and alerting frameworks that handle schema drift, missing data, and inconsistent external feeds, ensuring operator trust in system insights

Outcome


The platform now processes streaming and batch data from over 10 distinct sources, providing Athens operators with unprecedented visibility into connected travel patterns. The medallion architecture ensures data quality and traceability, while the intelligence layers actively monitor for disruptions across all three transportation modes.

More importantly, the platform is built to grow in two directions. Horizontally, it can expand to new cities and additional data sources. Vertically, it supports increasingly advanced AI/ML capabilities and predictive models. This creates a compounding effect: the more data we collect, the more use cases become possible, from predictive analytics to automated decision support.

What started as an operational tool is designed to evolve into a comprehensive intelligence platform for European transportation networks.



Before


  • Data siloed across separate aviation, maritime, and rail systems with no integration
  • No visibility into connected journeys or passenger connection risks
  • Reactive disruption management after incidents already impacted passengers
  • Manual data analysis across disconnected systems
  • Limited to single-mode operational insights
  • No external event awareness (strikes, protests, weather) affecting airport access
  • Platform limited to single location with no expansion framework


After


  • Unified data mesh platform consolidating all transportation modes in Databricks
  • Real-time monitoring of multi-modal journeys that flags at-risk connections
  • Proactive delay, cancellation, and diversion detection enabling preventive action
  • Automated intelligence layers with near real-time alerts and normalized data
  • Cross-modal coordination capabilities supporting operators at airports, ports, and rail stations
  • Integrated external data sources providing comprehensive situational awareness
  • Scalable architecture designed for easy replication across European cities


Case Study Details


Approach


Data Mesh Architecture for Diverse Transportation Sources


  • Transportation data originates from fundamentally different systems with incompatible formats and update frequencies. The platform accepts this reality through a data mesh architecture that ingests diverse inputs (real-time maritime tracking, batch rail schedules, CDC streams from airport systems) and progressively refines them through layered processing. This flexible approach enabled rapid onboarding of new sources without requiring platform restructuring.
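
One way to picture modular ingestion is a registry of small per-source adapters that all emit the same record shape. The sketch below is illustrative only: the `TransportEvent` fields, the raw field names, and the status codes are hypothetical and are not taken from the platform's actual schemas.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict


@dataclass(frozen=True)
class TransportEvent:
    """Common record shape shared by all modes (hypothetical)."""
    mode: str            # "air" or "rail" in this sketch
    service_id: str      # flight number or trip id
    scheduled: datetime  # scheduled time, normalized to UTC
    status: str          # shared vocabulary: on_time, delayed, cancelled


# Hypothetical source-specific status codes mapped to one vocabulary.
STATUS_MAP = {
    "air": {"SCH": "on_time", "DLY": "delayed", "CNX": "cancelled"},
    "rail": {"SCHEDULED": "on_time", "DELAYED": "delayed", "CANCELED": "cancelled"},
}

ADAPTERS: Dict[str, Callable[[dict], TransportEvent]] = {}


def adapter(source: str):
    """Register a parsing function for one source."""
    def register(fn):
        ADAPTERS[source] = fn
        return fn
    return register


@adapter("air")
def parse_air(raw: dict) -> TransportEvent:
    # Assumes an ISO 8601 timestamp that carries a UTC offset.
    return TransportEvent(
        mode="air",
        service_id=raw["flight"],
        scheduled=datetime.fromisoformat(raw["std"]).astimezone(timezone.utc),
        status=STATUS_MAP["air"][raw["code"]],
    )


@adapter("rail")
def parse_rail(raw: dict) -> TransportEvent:
    # Assumes departure time as seconds since the Unix epoch.
    return TransportEvent(
        mode="rail",
        service_id=raw["trip_id"],
        scheduled=datetime.fromtimestamp(raw["departure_epoch"], tz=timezone.utc),
        status=STATUS_MAP["rail"][raw["schedule_relationship"]],
    )


def ingest(source: str, raw: dict) -> TransportEvent:
    """Route a raw record to the adapter registered for its source."""
    return ADAPTERS[source](raw)
```

Under this pattern, onboarding a new source means writing one more decorated function; the downstream layers only ever see `TransportEvent`.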

Medallion Architecture: Bronze, Silver, Gold


  • Three distinct data layers form the platform's core structure on Databricks. Bronze captures raw data exactly as received, preserving complete lineage for reprocessing. Silver normalizes and cleanses this data, applying SCD2 historization to track schedule and status changes over time. Gold delivers business-ready aggregations and intelligence layers that power operator dashboards. This layered approach balances data quality with downstream flexibility.
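
To make SCD2 historization concrete, here is a minimal in-memory sketch in plain Python. It is not the platform's Databricks implementation, and the `StatusVersion` fields are assumptions; it only shows the core rule that a changed value closes the open version and appends a new one, so earlier states stay queryable.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class StatusVersion:
    service_id: str
    status: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None marks the open (current) version

    @property
    def is_current(self) -> bool:
        return self.valid_to is None


def apply_scd2(history: List[StatusVersion], service_id: str,
               status: str, observed_at: datetime) -> List[StatusVersion]:
    """Return a new history with the observation applied as an SCD2 change."""
    current = next((v for v in history
                    if v.service_id == service_id and v.is_current), None)
    if current is not None and current.status == status:
        return history  # value unchanged: keep the open version as is
    # Close the open version (if any) at the observation time.
    result = [replace(v, valid_to=observed_at) if v is current else v
              for v in history]
    # Append the new open version.
    result.append(StatusVersion(service_id, status, observed_at))
    return result


def status_as_of(history: List[StatusVersion], service_id: str,
                 when: datetime) -> Optional[str]:
    """Look up the status that was valid at a given point in time."""
    for v in history:
        if (v.service_id == service_id and v.valid_from <= when
                and (v.valid_to is None or when < v.valid_to)):
            return v.status
    return None
```

Validity intervals are half-open here (`valid_from` inclusive, `valid_to` exclusive), which is one common convention and an assumption of this sketch.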

Specialized Intelligence Layers for Disruption Detection


  • Three purpose-built detection engines analyze disruptions in real time. The Delay Intelligence Layer monitors schedule deviations across all modes and flags at-risk connections. The Cancellation Intelligence Layer unifies cancellation data from disparate sources into a single, normalized view. The Diversions Intelligence Layer tracks unexpected route changes in aviation and maritime operations. These engines convert raw operational data into immediate, actionable operator alerts.
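
As a rough illustration of what "flagging at-risk connections" can mean, the sketch below compares the remaining transfer buffer against a minimum transfer time. The rule, the `Connection` fields, and the idea of a fixed minimum transfer time are assumptions for this example, not the platform's actual business rules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class Connection:
    inbound_id: str                      # e.g. an arriving flight
    outbound_id: str                     # e.g. a departing cruise or train
    inbound_scheduled_arrival: datetime
    outbound_departure: datetime
    min_transfer: timedelta              # assumed minimum time to change modes


def at_risk(conn: Connection, inbound_delay: timedelta) -> bool:
    """True when the delayed arrival leaves less than the minimum transfer time."""
    estimated_arrival = conn.inbound_scheduled_arrival + inbound_delay
    return conn.outbound_departure - estimated_arrival < conn.min_transfer


def flag_connections(connections: List[Connection],
                     delays: Dict[str, timedelta]) -> List[Connection]:
    """Return connections whose inbound delay eats into the transfer buffer.

    `delays` maps an inbound service id to its current delay; services
    without an entry are treated as on time.
    """
    return [c for c in connections
            if at_risk(c, delays.get(c.inbound_id, timedelta(0)))]
```

For example, a flight scheduled to land three hours before a cruise departure, with a two-hour minimum transfer, is flagged once its delay exceeds one hour.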

Databricks as the Unified Development Platform


  • The entire development lifecycle (ingestion, transformation, testing, and access management) runs within a single Databricks environment, which significantly accelerated development compared with fragmented toolchains. Curated Gold-layer data is transferred to Cosmos DB and exposed via APIs for frontend dashboards and visualizations. The same environment also enables future AI/ML use cases such as natural language data interaction and predictive modeling.

API Layer for Operator Applications


  • Gold layer tables synchronize to Cosmos DB and expose data through a .NET API, powering the Athens operator dashboard. This architecture separates compute-intensive analytical processing (Databricks) from high-frequency operational queries (Cosmos DB), optimizing performance for both workload types.
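
The synchronization step can be thought of as shaping each Gold row into a JSON document before it is written to the operational store. The sketch below covers only that shaping step and omits the actual write. Column names and the choice of `city` as partition field are hypothetical; the one fixed point is that a Cosmos DB item carries a string `id`.

```python
import hashlib
import json
from datetime import datetime, timezone


def to_document(row: dict, partition_field: str = "city") -> dict:
    """Shape one Gold-layer row as a JSON-serializable document.

    The id is derived from the business key (hypothetical columns), so
    re-running the sync produces the same id and an upsert-style write
    would overwrite the earlier document instead of duplicating it.
    """
    if partition_field not in row:
        raise ValueError(f"row has no partition field '{partition_field}'")
    key = f'{row["mode"]}|{row["service_id"]}|{row["service_date"]}'
    # Datetimes are not JSON-serializable, so store them as ISO 8601 strings.
    doc = {name: (value.isoformat() if isinstance(value, datetime) else value)
           for name, value in row.items()}
    doc["id"] = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return doc


example_row = {
    "city": "Athens",
    "mode": "air",
    "service_id": "XX100",            # made-up service identifier
    "service_date": "2024-05-01",
    "delay_minutes": 25,
    "updated_at": datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc),
}
example_json = json.dumps(to_document(example_row))
```

Keeping the id deterministic is a design choice of this sketch: it lets the analytical side recompute Gold tables freely while the operational store converges to one document per service.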

Designed for Dual-Direction Growth


  • Two expansion paths drive platform evolution. Horizontal growth incorporates new cities and data sources through standardized contracts and modular ingestion patterns. Vertical growth layers advanced analytics, machine learning predictions, and autonomous decision support onto the existing foundation. Both directions leverage the same core architecture, ensuring genuine scalability.
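
A standardized data contract can be as simple as a declared list of fields that every new source or city must satisfy before its records enter the pipeline. The following is a minimal sketch under that assumption; the `FieldSpec` and `DataContract` classes and the example field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class FieldSpec:
    name: str
    type_: type
    required: bool = True


@dataclass(frozen=True)
class DataContract:
    source: str
    fields: Tuple[FieldSpec, ...]

    def violations(self, record: Dict[str, Any]) -> List[str]:
        """List every way a record deviates from the contract."""
        problems = []
        for spec in self.fields:
            value = record.get(spec.name)
            if value is None:
                if spec.required:
                    problems.append(f"missing required field: {spec.name}")
            elif not isinstance(value, spec.type_):
                problems.append(
                    f"wrong type for {spec.name}: expected "
                    f"{spec.type_.__name__}, got {type(value).__name__}")
        # Fields the contract does not know about hint at schema drift.
        known = {spec.name for spec in self.fields}
        for extra in sorted(set(record) - known):
            problems.append(f"unexpected field: {extra}")
        return problems


rail_contract = DataContract("rail_feed", (
    FieldSpec("trip_id", str),
    FieldSpec("delay_seconds", int),
    FieldSpec("platform", str, required=False),
))
```

An empty list from `violations` means the record conforms; anything else can be routed to alerting rather than silently loaded.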

Automated Data Quality Management


  • External feeds introduced significant challenges: missing data, format inconsistencies, and unexpected schema changes. Automated monitoring with real-time alerting catches quality issues immediately. For particularly volatile sources like the Port Authority's Excel-based schedules, resilient parsing logic with fallback strategies maintains data flow despite format changes. This quality framework ensures operators can confidently act on system insights.
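
Resilient parsing with fallback strategies might look like the sketch below: column headers are resolved through a list of known aliases, dates are tried against several formats in turn, and rows that still fail are set aside instead of stopping the load. It assumes the spreadsheet has already been read into lists of cell values; the aliases and date formats are invented for illustration.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Hypothetical header spellings seen for the same logical column.
HEADER_ALIASES = {
    "vessel": ["vessel", "vessel name", "ship"],
    "departure": ["departure", "etd", "dep. time"],
}
# Date formats tried in order until one matches.
DATE_FORMATS = ["%Y-%m-%d %H:%M", "%d/%m/%Y %H:%M", "%d.%m.%Y %H:%M"]


def resolve_columns(header: List[str]) -> Dict[str, int]:
    """Map each logical field to a column index, tolerating renamed headers."""
    normalized = [h.strip().lower() for h in header]
    columns = {}
    for field, aliases in HEADER_ALIASES.items():
        for alias in aliases:
            if alias in normalized:
                columns[field] = normalized.index(alias)
                break
        else:
            raise ValueError(f"no known header for field '{field}'")
    return columns


def parse_datetime(text: str) -> Optional[datetime]:
    """Try each known format; return None when none of them fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None


def parse_schedule(rows: List[list]) -> Tuple[List[dict], List[list]]:
    """Split rows into parsed records and rejected rows kept for review."""
    header, *body = rows
    cols = resolve_columns(header)
    parsed, rejected = [], []
    for row in body:
        departure = parse_datetime(str(row[cols["departure"]]))
        if departure is None:
            rejected.append(row)
        else:
            parsed.append({"vessel": str(row[cols["vessel"]]).strip(),
                           "departure": departure})
    return parsed, rejected
```

Returning the rejected rows alongside the parsed ones keeps data flowing while still giving monitoring something concrete to alert on.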

Technology


Databricks

Cosmos DB


Our team


Vadym Mariiechko

Data Engineer

Bartosz Obstawski

Data Engineer

Madgalena Bogdał

Project Manager


