Cutting Data Costs by 70%: Optimising Databricks for a European Fashion Retailer

When a European fashion retailer’s data-related costs spiralled beyond what the scale of the business justified, the problem was clear: the data ingestion and transformation workflows were not optimised, and nobody had either the knowledge or the capacity to fix them. Our team was engaged to optimise costs, stabilise the platform, and, as a further step, design a scalable, future-ready data architecture aligned with cross-functional business needs.

Meet Our Client

The Client is a mid-sized European fashion retailer with several decades of operating history. Despite its scale, the company retains a distinctly family-owned culture, with a loyal, long-tenured workforce. Within the organisation, the Business Intelligence department serves as the central hub for all data-related operations — collecting raw data from online stores, physical outlets, and social media channels, then cleaning, consolidating, and enriching it before delivering it to the teams that need it, from merchandising to the management board. The BI team built and maintained the underlying data platform in-house over the years, and it is this platform that the current engagement is migrating toward Databricks.

Case Study Shortcut

Challenge

A platform built on concentrated, undocumented knowledge

The data platform had been designed and maintained in-house, with no formal documentation or standardised processes in place. As a result, knowledge of how critical pipelines operated remained concentrated within a small group of individuals. The business became fully dependent on those who built the system to keep essential data flows running. This created operational fragility, increased risk over time, and ultimately became a bottleneck to scaling and further development.

Infrastructure costs growing without governance or visibility

The Databricks infrastructure was poorly matched to the ETL workloads running on it: the compute clusters available for data processing were configured with far more resources than the workloads required. On top of that, because stakeholders used the platform irregularly, some clusters were kept permanently available despite being heavily under-utilised and spending the majority of their time idle. Together, these two factors produced an infrastructure bill disproportionate to the size of the business, with no tooling to understand what was driving it.
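
To make the idle-cluster problem concrete, the following is a minimal sketch of how idle spend could be estimated from basic usage figures. The `ClusterUsage` fields, the cluster name, and every number are hypothetical and chosen only for illustration; they are not the Client's data.

```python
from dataclasses import dataclass


@dataclass
class ClusterUsage:
    """Observed usage of one cluster over a billing period (hypothetical fields)."""
    name: str
    hours_running: float    # hours the cluster was up and billed
    hours_busy: float       # hours it was actually executing work
    hourly_cost_eur: float  # assumed blended cost per running hour


def idle_share(usage: ClusterUsage) -> float:
    """Fraction of billed hours in which the cluster did no work."""
    if usage.hours_running <= 0:
        return 0.0
    return 1.0 - usage.hours_busy / usage.hours_running


def idle_cost_eur(usage: ClusterUsage) -> float:
    """Money spent on hours in which the cluster sat idle."""
    return (usage.hours_running - usage.hours_busy) * usage.hourly_cost_eur


# Made-up example: an always-on cluster that is busy about 2 hours a day
# over a 30-day month (720 billed hours, 60 busy hours).
always_on = ClusterUsage("bi-adhoc", hours_running=720.0, hours_busy=60.0, hourly_cost_eur=4.0)
print(f"{always_on.name}: {idle_share(always_on):.0%} idle, "
      f"{idle_cost_eur(always_on):.0f} EUR spent on idle hours")
```

Even with rough inputs, a figure like this gives a first indication of which clusters are candidates for auto-termination or consolidation.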

A hybrid architecture built on an unsupported technology

One of the main pain points in the ETL process was a legacy orchestration and processing framework. The Client had chosen the technology some time ago, but it has since reached end of life and receives no further development. This created several problems of its own: it limited the adoption of newer Databricks capabilities and of certain compute configurations on the platform, and there was no prospect of extending the framework to work with additional services in an ever-changing data technology stack.

No transparency into data processes or their business value

The organisation had no visibility into how data moved through its systems: which pipelines were actively used, how frequently they were executed, and which data products were generating real business value. Without this level of transparency, the BI function operated largely as a black box, making it difficult to assess the impact of individual data processes. As a result, it was impossible to make informed decisions about where to invest, where to cut costs, and where improvements would deliver the most meaningful commercial impact.

Goal

The challenges identified had clear business consequences — rising costs, operational fragility, and a lack of visibility that made informed decision-making impossible. Each goal below maps directly to one of those consequences, and the sequence is deliberate: earlier phases create the conditions — financial and technical — for the ones that follow.

Immediate cost reduction through compute optimisation: The first priority was to bring infrastructure spend under control by addressing misconfigured compute tiers, redundant always-on clusters, and unnecessarily enabled acceleration features. Achieving this quickly was strategically essential — the savings generated in this phase directly funded the expansion of the engineering team and created the runway for everything that follows.

Full migration off legacy architecture onto Databricks: With costs stabilised, the focus shifts to eliminating the hybrid Databricks and custom platform setup entirely. Consolidating onto a single, fully supported platform removes the operational complexity of maintaining two environments, unlocks modern Databricks capabilities currently out of reach, and resolves the setup constraints tied to the custom orchestration solution.

Data lineage and pipeline transparency: Once the ETL process and the costs it generates are under control, the next step is to provide transparency into the end-to-end data flow, from source systems all the way to business outputs. Mapping these dependencies allows the organisation to understand which data products are actively used, which are critical to business operations, and where investment in data quality would deliver the greatest return.

Cost attribution and ROI-driven governance: Alongside lineage, the engagement will introduce per-domain cost attribution — breaking down infrastructure spend by business area rather than absorbing it into a single BI budget. This gives stakeholders the information they need to evaluate the commercial value of each data process: identifying pipelines worth investing in further, and making the case for discontinuing those that generate no meaningful return.

Disaster recovery and controlled development practices: The final layer is operational resilience — formal data backups, replication, and structured access controls on production environments. This ensures the platform can recover from failure and that changes to live systems follow a governed, auditable process rather than ad hoc intervention.
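
The data lineage goal above can be pictured as a graph problem. The sketch below is a simplified illustration, not the engagement's actual tooling: the dataset names, the `downstream` mapping, and the `consumed` set are all hypothetical. It walks the graph from each raw source and reports sources whose data never reaches a data product that anyone uses.

```python
from collections import deque

# Hypothetical lineage edges: each dataset maps to the datasets built directly from it.
downstream = {
    "raw.web_orders": ["clean.orders"],
    "raw.store_sales": ["clean.orders"],
    "raw.social_mentions": ["clean.mentions"],
    "clean.orders": ["mart.daily_revenue"],
    "clean.mentions": [],
    "mart.daily_revenue": [],
}

# Hypothetical set of data products that are actually consumed, e.g. by reports.
consumed = {"mart.daily_revenue"}


def reachable_from(source: str) -> set[str]:
    """All datasets downstream of `source`, found by breadth-first search."""
    seen: set[str] = set()
    queue = deque(downstream.get(source, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(downstream.get(node, []))
    return seen


def unused_sources() -> list[str]:
    """Raw sources whose data never reaches a consumed data product."""
    raw = [name for name in downstream if name.startswith("raw.")]
    return sorted(s for s in raw if not (reachable_from(s) & consumed))


print(unused_sources())
```

In this made-up graph, the social mentions feed is ingested and cleaned but never consumed, which is the kind of finding that informs where to invest and where to cut.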

Outcome

The first phase of the engagement focused entirely on financial stabilisation — deliberately so. By addressing the most acute cost drivers quickly, we created budget headroom that allowed the Client to expand the project team with a second dedicated engineer. That expansion is already in place. The migration and governance work now underway would not have been possible without it.

Before

Architecture: split between Databricks and an unsupported custom solution; no migration plan
Compute: under-utilised clusters, set up with a “max resources, always on” strategy in mind
Cost: €230,000 forecast for 2026
Governance: no lineage, no cost attribution, no disaster recovery, no production access controls

After

Architecture: migration actively underway; full Databricks consolidation in progress
Compute: SLA-bound cluster pools aligned with business demands and properly scaled to ETL workloads
Cost: ~€70,000, a reduction of approximately 70%
Governance: framework scoped and sequenced; implementation begins Q3
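
The headline percentage follows directly from the two cost figures quoted above. A quick check, assuming both figures cover the same period (the variable names are ours):

```python
forecast_before_eur = 230_000  # forecast for 2026 before optimisation
forecast_after_eur = 70_000    # approximate figure after optimisation

saving_eur = forecast_before_eur - forecast_after_eur
reduction = saving_eur / forecast_before_eur

print(f"Saving: {saving_eur:,} EUR ({reduction:.1%})")
# prints "Saving: 160,000 EUR (69.6%)"
```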

Case Study Details

Approach

Concentrated knowledge is a business risk

A platform built on custom processes and internal expertise will function effectively — until the conditions that created it change. When the business needs to scale, integrate new capabilities, or hand the system over, the absence of documented processes and standardised architecture turns a working solution into a constraint.

Databricks is not plug-and-play

Databricks is one of the most powerful data platforms available — but it is a developer platform, not a SaaS tool. Every configuration decision has a cost implication: compute tiers, cluster setup, job scheduling, feature activation. Without the right expertise, the same platform that can deliver outstanding performance can just as easily generate bills that have no relation to the value being produced.

We cut costs first — deliberately

Without early savings, there would have been no budget to grow the team and no runway for the harder work ahead. Cost reduction was the entry point that made everything else possible.

Two changes drove most of the early savings

Photon, Databricks' Spark acceleration engine, was enabled on every single cluster, essentially doubling compute costs for workloads that did not always need it. On top of that, some compute clusters ran permanently to serve queries that arrived only a few times a day. Disabling Photon where it was unnecessary and consolidating clusters to match actual demand removed the bulk of the wasted spend immediately.
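
As a rough illustration of the two changes, the sketch below shows a cluster definition and a small review helper. The keys `runtime_engine`, `autoscale`, and `autotermination_minutes` are field names from the Databricks Clusters API; the cluster name, all values, and the `review_cluster` helper are hypothetical and do not represent the Client's actual configuration.

```python
from typing import Any

# Hypothetical cluster definition using Databricks Clusters API field names.
# All values are illustrative only.
etl_cluster: dict[str, Any] = {
    "cluster_name": "etl-nightly",
    "spark_version": "15.4.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "runtime_engine": "STANDARD",  # "PHOTON" would enable Photon
    "autoscale": {"min_workers": 1, "max_workers": 4},
    "autotermination_minutes": 20,  # shut down after 20 idle minutes
}


def review_cluster(spec: dict[str, Any], photon_allowed: bool = False) -> list[str]:
    """Return warnings for settings that tend to inflate cost."""
    warnings = []
    if spec.get("runtime_engine") == "PHOTON" and not photon_allowed:
        warnings.append("Photon is enabled but this workload is not approved for it")
    # In the Clusters API, 0 (or an unset value) means no automatic termination.
    if spec.get("autotermination_minutes", 0) == 0:
        warnings.append("auto-termination is disabled, so the cluster runs until stopped")
    if "autoscale" not in spec:
        warnings.append("fixed worker count: consider autoscaling for bursty workloads")
    return warnings


# A made-up "max resources, always on" variant of the same cluster.
legacy = {**etl_cluster, "runtime_engine": "PHOTON", "autotermination_minutes": 0}
for warning in review_cluster(legacy):
    print(warning)
```

A check like this could run over exported cluster definitions to flag candidates for review. Whether Photon pays for itself on a given workload still has to be measured case by case.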

Custom orchestration software is blocking further progress

The custom orchestration software is unsupported, cannot integrate with modern Databricks features, and forces dependent workloads onto expensive compute configurations. Migrating off the custom solution is the only way forward, and that is exactly what the current phase of the engagement is delivering.

Cost visibility is how the business takes back control

When all infrastructure costs are absorbed into a single BI budget with no attribution, there is no basis for evaluating whether individual data processes are worth what they cost. Per-domain cost visibility changes that — making it possible to identify which pipelines justify further investment and which consume resources without generating a return.
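
One common way to get per-domain visibility is to tag compute resources with the business area they serve and then aggregate spend by tag. The sketch below assumes such a tagging convention: the `domain` tag, the record layout, and the amounts are all hypothetical stand-ins for a real billing export.

```python
from collections import defaultdict

# Hypothetical usage records, shaped like rows of a billing export.
# The "domain" tag is an assumed convention, not a built-in field.
usage_records = [
    {"cluster": "etl-nightly", "tags": {"domain": "merchandising"}, "cost_eur": 1200.0},
    {"cluster": "etl-web", "tags": {"domain": "e-commerce"}, "cost_eur": 800.0},
    {"cluster": "bi-adhoc", "tags": {}, "cost_eur": 500.0},
    {"cluster": "etl-stores", "tags": {"domain": "merchandising"}, "cost_eur": 300.0},
]


def cost_by_domain(records: list[dict]) -> dict[str, float]:
    """Sum cost per domain tag; untagged spend is reported separately."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        domain = record["tags"].get("domain", "untagged")
        totals[domain] += record["cost_eur"]
    return dict(totals)


# Print domains from most to least expensive.
for domain, cost in sorted(cost_by_domain(usage_records).items(), key=lambda kv: -kv[1]):
    print(f"{domain:15s} {cost:8.2f} EUR")
```

Reporting untagged spend as its own line is a deliberate choice here: it shows how much of the bill still cannot be attributed to any business area.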

Technology

Databricks

Apache Spark

Microsoft Azure

Power BI

Our team

Michał Żak

Senior Data Engineer

Bartosz Adamiec

Data Engineer

Our Team Expert Opinion

Databricks gives you enormous capability, but it asks something in return: you need to know what you are doing, and you need to know why. The platform the Client had was genuinely impressive in what it had achieved — a real testament to the engineers who built it. What it needed was a layer of cost discipline and structural clarity that is very hard to develop while trying to maintain your in-house legacy solution and day-to-day operations. Our job is to bring that layer in, without losing what was already working, and to make sure every technical decision we take has a clear answer to the question: what does this do for the business?

Michał Żak, Senior Data Scientist at Addepto

Take the next step

Schedule an intro call to get to know each other better and understand the way we work

Let's talk

About Addepto

We are recognized as one of the best AI, BI, and Big Data consultants

We have helped multiple companies achieve their goals but, instead of making hollow marketing claims here, we encourage you to check our Clutch score.
