When a European fashion retailer’s data-related costs spiralled beyond what the scale of the business justified, the problem was clear: the data ingestion and transformation workflows were not optimised, and nobody had the knowledge or the capacity to fix them. Our team was engaged to optimise costs, stabilise the platform, and, as a further step, design a scalable, future-ready data architecture aligned with cross-functional business needs.
The Client is a mid-sized European fashion retailer with several decades of operating history. Despite its scale, the company retains a distinctly family-owned culture, with a loyal, long-tenured workforce. Within the organisation, the Business Intelligence department serves as the central hub for all data-related operations — collecting raw data from online stores, physical outlets, and social media channels, then cleaning, consolidating, and enriching it before delivering it to the teams that need it, from merchandising to the management board. The BI team built and maintained the underlying data platform in-house over the years, and it is this platform that the current engagement is migrating toward Databricks.
The data platform had been designed and maintained in-house, with no formal documentation or standardised processes in place. As a result, knowledge of how critical pipelines operated remained concentrated within a small group of individuals. The business became fully dependent on those who built the system to keep essential data flows running. This created operational fragility, increased risk over time, and ultimately became a bottleneck to scaling and further development.
The Databricks infrastructure was poorly matched to the ETL workloads running on it: the compute clusters available for data processing were configured with far more resources than the workloads required. On top of that, because stakeholders used the platform irregularly, some clusters were kept permanently available despite being heavily under-utilised and spending the majority of their time idle. Together, these two factors produced an infrastructure bill disproportionate to the size of the business, with no tooling to understand what was driving it.
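To make the idle-cluster problem concrete, here is a minimal sketch of the kind of calculation that exposes it. Everything in it is an illustrative assumption rather than the Client's actual tooling or data: the `ClusterUsage` record, the `idle_report` helper, the cluster names and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClusterUsage:
    """Hypothetical usage record for one cluster over a billing period."""
    name: str
    uptime_hours: float  # hours the cluster was running
    busy_hours: float    # hours it was actually executing workloads
    hourly_cost: float   # assumed blended cost per running hour


def idle_report(clusters):
    """Return (name, idle_share, idle_cost) per cluster, most wasteful first."""
    rows = []
    for c in clusters:
        idle_hours = max(c.uptime_hours - c.busy_hours, 0.0)
        idle_share = idle_hours / c.uptime_hours if c.uptime_hours else 0.0
        rows.append((c.name, idle_share, idle_hours * c.hourly_cost))
    # Sort by money spent on idle time, largest first.
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Made-up numbers: one always-on cluster and one job-scoped cluster.
sample = [
    ClusterUsage("adhoc-analytics", uptime_hours=720.0, busy_hours=36.0, hourly_cost=2.0),
    ClusterUsage("nightly-etl", uptime_hours=60.0, busy_hours=54.0, hourly_cost=4.0),
]

report = idle_report(sample)
for name, share, cost in report:
    print(f"{name}: {share:.0%} idle, {cost:.2f} spent on idle time")
```

With figures like these, the always-on cluster dominates the idle spend even though its hourly rate is lower. On Databricks, the usual levers against this pattern are cluster auto-termination and autoscaling. Whether and how they were applied in this engagement is not stated here.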
One of the main pain points in the ETL process was a legacy orchestration and processing framework. The Client had chosen the technology some time ago, but it has since reached end of life and no longer receives any development. This created several challenges: it limited the adoption of newer Databricks capabilities and certain compute configurations on the platform, and there was no prospect of the framework becoming compatible with more services in the ever-changing stack of data-related technologies.
The organisation had no visibility into how data moved through its systems: which pipelines were actively used, how frequently they were executed, and which data products were generating real business value. Without this level of transparency, the BI function operated largely as a black box, making it difficult to assess the impact of individual data processes. As a result, it was impossible to make informed decisions about where to invest, where to cut costs, and where improvements would deliver the most meaningful commercial impact.
The challenges identified had clear business consequences — rising costs, operational fragility, and a lack of visibility that made informed decision-making impossible. Each goal below maps directly to one of those consequences, and the sequence is deliberate: earlier phases create the conditions — financial and technical — for the ones that follow.
The first phase of the engagement focused entirely on financial stabilisation — deliberately so. By addressing the most acute cost drivers quickly, we created budget headroom that allowed the Client to expand the project team with a second dedicated engineer. That expansion is already in place. The migration and governance work now underway would not have been possible without it.
Databricks
Apache Spark
Microsoft Azure
Power BI
Michał Żak
Senior Data Engineer
Bartosz Adamiec
Data Engineer
Databricks gives you enormous capability, but it asks something in return: you need to know what you are doing, and you need to know why. The platform the Client had was genuinely impressive in what it had achieved — a real testament to the engineers who built it. What it needed was a layer of cost discipline and structural clarity that is very hard to develop while trying to maintain your in-house legacy solution and day-to-day operations. Our job is to bring that layer in, without losing what was already working, and to make sure every technical decision we take has a clear answer to the question: what does this do for the business?