

Cutting Data Costs by 70%: Optimising Databricks for a European Fashion Retailer

When a European fashion retailer’s data-related costs spiralled beyond what the scale of the business justified, the problem was clear: the data ingestion and transformation workflows were not optimised, and nobody had the knowledge or the capacity to fix them. Our team was engaged to optimise costs, stabilise the platform, and, as a further step, design a scalable, future-ready data architecture aligned with cross-functional business needs.



Meet Our Client


The Client is a mid-sized European fashion retailer with several decades of operating history. Despite its scale, the company retains a distinctly family-owned culture, with a loyal, long-tenured workforce. Within the organisation, the Business Intelligence department serves as the central hub for all data-related operations — collecting raw data from online stores, physical outlets, and social media channels, then cleaning, consolidating, and enriching it before delivering it to the teams that need it, from merchandising to the management board. The BI team built and maintained the underlying data platform in-house over the years, and it is this platform that the current engagement is migrating toward Databricks.


Case Study Shortcut


Challenge


A platform built on concentrated, undocumented knowledge


The data platform had been designed and maintained in-house, with no formal documentation or standardized processes in place. As a result, knowledge of how critical pipelines operated remained concentrated within a small group of individuals. The business became fully dependent on those who built the system to keep essential data flows running — creating operational fragility, increasing risk over time, and ultimately becoming a bottleneck to scaling and further development.

Infrastructure costs growing without governance or visibility


The Databricks infrastructure was poorly matched to the ETL workloads running on the platform: the compute clusters available for data processing were configured with far more resources than the workloads required. On top of that, because stakeholders used the platform irregularly, some clusters were kept permanently available despite being heavily under-utilised and spending most of their time idle. Together, these two factors produced an infrastructure bill disproportionate to the size of the business, with no tooling to understand what was driving it.
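To make "spending most of their time idle" concrete, the sketch below computes the idle share of an always-on cluster from the periods in which it actually ran work. The `idle_fraction` helper, the date and the query times are hypothetical and chosen only for illustration; they are not measurements from the Client's platform.

```python
from datetime import datetime, timedelta

def idle_fraction(busy_intervals, window_start, window_end):
    """Return the share of the window in which the cluster ran no work."""
    window_seconds = (window_end - window_start).total_seconds()
    busy_seconds = 0.0
    covered_until = window_start
    # Walk the intervals in start order and count each moment only once,
    # so overlapping queries are not double-counted.
    for start, end in sorted(busy_intervals):
        start = max(start, covered_until)
        end = min(end, window_end)
        if end > start:
            busy_seconds += (end - start).total_seconds()
            covered_until = end
    return 1.0 - busy_seconds / window_seconds

# Hypothetical day: three 15-minute queries hit a cluster that is up for 24 hours.
day = datetime(2025, 1, 6)
queries = [
    (day + timedelta(hours=h), day + timedelta(hours=h, minutes=15))
    for h in (8, 12, 16)
]
share = idle_fraction(queries, day, day + timedelta(days=1))
print(f"idle share of the day: {share:.3f}")
```

Under these assumed numbers the cluster does useful work for 45 minutes and is billed for the remaining hours of the day, which is the pattern an auto-terminating or pooled setup is meant to remove.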

A hybrid architecture built on an unsupported technology


One of the main pain points in the ETL process was a legacy orchestration and processing framework. The Client had chosen the technology some time ago, but it has since reached end of life and no longer receives any development. This created several problems: it limited the adoption of newer Databricks capabilities and of certain compute configurations on the platform, and there was no prospect of extending the framework to work with new services in the ever-changing stack of data-related technologies.

No transparency into data processes or their business value


The organisation had no visibility into how data moved through its systems: which pipelines were actively used, how frequently they were executed, and which data products were generating real business value. Without this level of transparency, the BI function operated largely as a black box, making it difficult to assess the impact of individual data processes. As a result, it was impossible to make informed decisions about where to invest, where to cut costs, and where improvements would deliver the most meaningful commercial impact.

Goal


The challenges identified had clear business consequences — rising costs, operational fragility, and a lack of visibility that made informed decision-making impossible. Each goal below maps directly to one of those consequences, and the sequence is deliberate: earlier phases create the conditions — financial and technical — for the ones that follow.


  • Immediate cost reduction through compute optimisation: The first priority was to bring infrastructure spend under control by addressing misconfigured compute tiers, redundant always-on clusters, and unnecessarily enabled acceleration features. Achieving this quickly was strategically essential — the savings generated in this phase directly funded the expansion of the engineering team and created the runway for everything that follows.

  • Full migration off legacy architecture onto Databricks: With costs stabilised, the focus shifts to eliminating the hybrid Databricks and custom platform setup entirely. Consolidating onto a single, fully supported platform removes the operational complexity of maintaining two environments, unlocks modern Databricks capabilities currently out of reach, and resolves the setup constraints tied to the custom orchestration solution.

  • Data lineage and pipeline transparency: Once the ETL process and the costs it generates are under control, the next step is to provide transparency into the end-to-end data flow, from source systems through to business outputs. Mapping these dependencies allows the organisation to understand which data products are actively used, which are critical to business operations, and where investment in data quality would deliver the greatest return.

  • Cost attribution and ROI-driven governance: Alongside lineage, the engagement will introduce per-domain cost attribution — breaking down infrastructure spend by business area rather than absorbing it into a single BI budget. This gives stakeholders the information they need to evaluate the commercial value of each data process: identifying pipelines worth investing in further, and making the case for discontinuing those that generate no meaningful return.

  • Disaster recovery and controlled development practices: The final layer is operational resilience — formal data backups, replication, and structured access controls on production environments. This ensures the platform can recover from failure and that changes to live systems follow a governed, auditable process rather than ad hoc intervention.
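The lineage goal in the list above comes down to walking a dependency graph backwards from the outputs the business actually uses. The following is a minimal sketch of that idea, assuming lineage is available as a simple mapping from each dataset to the datasets it reads from. All dataset names and the `upstream_of` helper are hypothetical; a real implementation would draw this information from the platform's lineage metadata rather than a hand-written dictionary.

```python
from collections import deque

def upstream_of(targets, reads_from):
    """Return every dataset that the given targets depend on, directly or indirectly."""
    seen = set()
    queue = deque(targets)
    while queue:
        node = queue.popleft()
        for parent in reads_from.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

# Hypothetical lineage: each key reads from the datasets listed as its value.
reads_from = {
    "sales_dashboard": ["sales_clean"],
    "sales_clean": ["pos_raw", "webshop_raw"],
    "social_summary": ["social_raw"],
}
used_outputs = {"sales_dashboard"}  # assumed: the only output anyone opens

needed = upstream_of(used_outputs, reads_from)
all_nodes = set(reads_from) | {p for parents in reads_from.values() for p in parents}
unused = all_nodes - needed - used_outputs

print("needed:", sorted(needed))
print("candidates for review:", sorted(unused))
```

Datasets that no used output depends on are not automatically safe to delete, but they form a natural shortlist for the "is this worth what it costs?" conversation.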

Outcome


The first phase of the engagement focused entirely on financial stabilisation — deliberately so. By addressing the most acute cost drivers quickly, we created budget headroom that allowed the Client to expand the project team with a second dedicated engineer. That expansion is already in place. The migration and governance work now underway would not have been possible without it.



Before


  • Split between Databricks and unsupported custom solution; no migration plan
  • Under-utilised compute clusters, set up with a “max resources, always on” strategy in mind
  • €230,000 forecast cost for 2026
  • No lineage, no cost attribution, no disaster recovery, no production access controls

 



After


  • Migration actively underway; full Databricks consolidation in progress
  • SLA-bound cluster pools aligned with business demands and properly scaled to ETL workloads
  • ~€70,000 — approximately 70% reduction
  • Governance framework scoped and sequenced; implementation begins Q3
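The headline percentage follows directly from the two cost figures in the Before and After lists. The short check below assumes both figures refer to the same annual period and treats the rounded "~€70,000" as exact, so the result is itself approximate.

```python
# Figures quoted in the Before and After lists; the "after" value is rounded in the source.
forecast_before_eur = 230_000
cost_after_eur = 70_000

saving_eur = forecast_before_eur - cost_after_eur
reduction = saving_eur / forecast_before_eur

print(f"saving: €{saving_eur:,} ({reduction:.1%})")  # saving: €160,000 (69.6%)
```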

 


Case Study Details


Approach


Concentrated knowledge is a business risk


  • A platform built on custom processes and internal expertise will function effectively — until the conditions that created it change. When the business needs to scale, integrate new capabilities, or hand the system over, the absence of documented processes and standardised architecture turns a working solution into a constraint.

Databricks is not plug-and-play


  • Databricks is one of the most powerful data platforms available — but it is a developer platform, not a SaaS tool. Every configuration decision has a cost implication: compute tiers, cluster setup, job scheduling, feature activation. Without the right expertise, the same platform that can deliver outstanding performance can just as easily generate bills that have no relation to the value being produced.

We cut costs first — deliberately


  • Without early savings, there would have been no budget to grow the team and no runway for the harder work ahead. Cost reduction was the entry point that made everything else possible.

Two changes drove most of the early savings


  • Photon — Databricks' Spark acceleration engine — was enabled on every single cluster, essentially doubling the compute costs for workloads that did not always need it. On top of that, some of the compute clusters were running permanently to serve queries that arrived a few times a day. Disabling Photon where unnecessary and consolidating clusters to match actual demand removed the bulk of wasted spend immediately.
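The two findings above lend themselves to a mechanical check over cluster definitions. The sketch below is an illustration only, not the tooling used in the engagement. The field names `runtime_engine` and `autotermination_minutes` follow the Databricks Clusters API as we understand it (where a value of 0 means auto-termination is disabled); the cluster names, the allowlist and the `audit_cluster` helper are hypothetical.

```python
def audit_cluster(spec, photon_allowlist=frozenset()):
    """Return a list of cost-related findings for one cluster definition."""
    findings = []
    name = spec.get("cluster_name", "<unnamed>")
    # Photon is billed at a higher rate, so flag it unless someone has
    # explicitly recorded that this workload benefits from it.
    if spec.get("runtime_engine") == "PHOTON" and name not in photon_allowlist:
        findings.append("Photon enabled without a recorded justification")
    # Assumption: a missing or zero value means the cluster never shuts down on its own.
    if spec.get("autotermination_minutes", 0) == 0:
        findings.append("no auto-termination: cluster stays up while idle")
    return findings

# Hypothetical cluster definitions, shaped like Clusters API payloads.
clusters = [
    {"cluster_name": "etl-nightly", "runtime_engine": "PHOTON", "autotermination_minutes": 30},
    {"cluster_name": "bi-adhoc", "runtime_engine": "PHOTON", "autotermination_minutes": 0},
    {"cluster_name": "reporting", "runtime_engine": "STANDARD", "autotermination_minutes": 20},
]

report = {
    c["cluster_name"]: audit_cluster(c, photon_allowlist={"etl-nightly"})
    for c in clusters
}
for name, findings in report.items():
    print(name, "->", findings or "ok")
```

The allowlist reflects the point made above: Photon is not wrong in itself, it is wasteful only where the workload does not need it, so each use should be a deliberate decision.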

Custom orchestration software is blocking further progress


  • The framework is unsupported, cannot integrate with modern Databricks features, and forces dependent workloads onto expensive compute configurations. Migrating off the custom solution is the only way forward, and that is exactly what the current phase of the engagement is delivering.

Cost visibility is how the business takes back control


  • When all infrastructure costs are absorbed into a single BI budget with no attribution, there is no basis for evaluating whether individual data processes are worth what they cost. Per-domain cost visibility changes that — making it possible to identify which pipelines justify further investment and which consume resources without generating a return.
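Per-domain attribution can be as simple as grouping billed usage by a tag that names the owning business area. The sketch below assumes usage records are available as dictionaries carrying a cost and a set of tags. The record shape, the `domain` tag key, the domain names and the amounts are all hypothetical; they are not the Client's figures.

```python
from collections import defaultdict

def cost_by_domain(usage_records, tag_key="domain"):
    """Sum cost per business domain; untagged usage is kept visible, not dropped."""
    totals = defaultdict(float)
    for record in usage_records:
        domain = record.get("tags", {}).get(tag_key, "unattributed")
        totals[domain] += record["cost_eur"]
    return dict(totals)

# Hypothetical usage records for one billing period.
usage = [
    {"job": "sales_load", "cost_eur": 1200.0, "tags": {"domain": "merchandising"}},
    {"job": "stock_sync", "cost_eur": 300.0, "tags": {"domain": "merchandising"}},
    {"job": "board_report", "cost_eur": 450.0, "tags": {"domain": "management"}},
    {"job": "old_export", "cost_eur": 50.0, "tags": {}},
]

for domain, cost in sorted(cost_by_domain(usage).items()):
    print(f"{domain}: €{cost:,.2f}")
```

Keeping an explicit "unattributed" bucket is a deliberate choice in this sketch: spend nobody owns is often the first place to look for pipelines that no longer generate a return.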

Our team


Michał Żak

Senior Data Engineer

Bartosz Adamiec

Data Engineer



Our Team Expert Opinion




Databricks gives you enormous capability, but it asks something in return: you need to know what you are doing, and you need to know why. The platform the Client had was genuinely impressive in what it had achieved — a real testament to the engineers who built it. What it needed was a layer of cost discipline and structural clarity that is very hard to develop while trying to maintain your in-house legacy solution and day-to-day operations. Our job is to bring that layer in, without losing what was already working, and to make sure every technical decision we take has a clear answer to the question: what does this do for the business?


Michał Żak, Senior Data Scientist at Addepto
