in Blog

March 24, 2020

Data Warehouse Migration – A Worthwhile Endeavour Or A Waste Of Time?

Author:




Edwin Lisowski

CSO & Co-Founder


Reading time:




7 minutes


How data rules the world, and how data warehouse rule data

It’s not exactly a bold statement to say that most organizations collect increasingly more data each year. Businesses acquire and interact with customers, and to keep track of everything. All new events must be collected and logged into the database. Facebook has been reported to generate 4 petabytes of data each day. The entire universe of accumulated data is projected to be 44 zettabytes in 2020, from 4.4 zettabytes in 2013. That would be 44 BILLION of 1 terabyte hard drives you’re probably accustomed to! Keeping up with the always growing volumes of data may seem frightening, but there is no reason to be afraid. There are tools to do the job. Let’s take a closer look at data warehouse migration.

Data Warehouses (DWHs) exist to help with that. They accommodate all incoming data in one, central unit, organizing them to be readily available for people within the company. They have proven to be essential for a variety of organizations: institutions, governmental bodies, large enterprises and small businesses alike. Different users have different needs so to reflect that, Data Warehouses can also be arranged and organized in a myriad of ways. However, we can differentiate two big groups among them. Legacy DWHs and cloud-based DWHs.

Introduction to data warehouse migration

The former are primarily older, operating on-premise and present some shortcomings. They’re often working on old hardware and software. Being physically on-site makes it a financial and organizational burden to upscale and downscale. The latter is situated in the cloud, utilizes a system on which storing and computing are done separately. Allowing for upscaling without the worry of the old infrastructure making it impossible. They’re also using a SQL framework, making it much easier to work with than legacy DWHs.

The common case is that over time there is also a progressively bigger number of attributes ascribed to data. Making both storing and working with it increasingly more complex.

Having that and having to store increasingly bigger volumes of data in mind, there comes a point when things for users of legacy DWHs start to get a little rough. Queries submitted to the database are turning up sluggishly. Time dedicated for maintenance is eating up time that should be spent on analysis, and upscaling. It simply doesn’t seem cost-efficient as the surrounding infrastructure would have to be upgraded as well.

database, programming, computer, company, man

You may also find it interesting – Data Analytics Consulting Services.

Old VS new data warehouse – the deffirences

To picture this juxtaposition, let’s use an analogy. Imagine that a developing business is a person that wants to achieve some fitness goals. They have some ideas about what sort of physique they would want to sport, how strong they would like to be and what would that take. They are presented with two training plans – plan A makes you a fit person, able to both be agile and strong while also keeping yourself well and healthy – balanced in most regards, but with room for improvement because of its well-thought-out base. The other one makes you a heavyweight strongman, able to lift very large weights, but at a price of very heavy maintenance (a lot of nutrition, supplements, massages.) If you find the maintenance costs too high, scaling down will be a very tough and time-consuming effort.

It’s easy to figure out that plan A is meant to be a cloud-based DWH, and plan B is a legacy, on-site DWH. While in real life it’s easier to transform from person A to person B, in the digital world it’s the other way around, thanks to the ability of DWH migration.

Simply put, the on-premise Data Warehouse solution is obsolete for businesses that want to be able to handle their database with speed, agility, and ease. Legacy systems require a lot of care and maintenance as they’re constantly running on near-full capacity, making them prone to failures and outages.

Data Warehouse, problems, code, work

Data warehouse migration: the pitfalls

The case for the modernization of a Data Warehouse seems very strong, but there are still things to be considered. Perhaps the greatest case against migrating to the new infrastructure is the process itself, as it needs to be handled very carefully. There are a number of problems that can occur during the migration such as breach of security, data loss or failure to meet the deadline resulting in cascades of other problems.

Rest assured, the path to modernization may seem dangerous, but with preparation, careful planning, monitoring, it will be a breeze. All the potential pitfalls can be avoided with some extra caution, awareness, and taking some prerequisite steps.

1. Data security

Data security is a hot topic nowadays – we keep hearing of lapses and breaches in the security of big companies that cause them to leak huge amounts of confidential data. Catastrophes like these don’t happen all that often, but better to be safe than sorry.

Fortunately, there are tools that can aid you in making sure the data is safe. They vary depending on the cloud service provider. For example, Google’s DWH service BigQuery uses Cloud Identity and Access Management and SQL security views; that allows for data encryption both in transit and at rest, so the information is secure even during the migration process itself.

Data Security

2. Junk data

The second issue may be a tricky one – making sure that you migrate data you need and getting rid of junk data. Hoarding all data sometimes is a necessity as there are not enough working hands to sort through it but a lot of times it’s simply not given much thought. DWH migration is a very good opportunity to look back on the information you have amassed and picked the bits that actually are worthwhile to you.

Of course, the action takes time at first, but it’s very beneficial in the long run – the Extract, Transform, Load process is much faster, the database you eventually end up with is much more accurate and up to date and less stored data means fewer costs.

3. Data silos and time constraints

Going over the deadline when it comes to Data Warehouse migration projects is not uncommon, but that doesn’t mean it should be happening in the first place. One of the reasons for it is the existence of data silos and their uncareful handling. Data silos are created when you accumulate large amounts of hermetically stored data that are in a format incompatible with the rest of the system. Attempts to connect them to the rest of the network are often ad-hoc, haphazard solutions that are designed to work for the time being.

This is one of the problems that doesn’t seem too much of a nuisance at first, but can secretly grow to giant proportions – and you’ll find out at a later time, for instance when you try to get it ready for migration. To ensure everything runs smoothly, data needs to be in a unified format before the migration, and if it’s not, it can be crippling for the entire operation.

computer, programming, code

The outcome of data warehouse migration

To wrap up, the entire process may be sometimes described as convoluted and hard to execute, but in the fruits of your labor will be all worth it. Statistics show that migrating your DWH to the cloud can net 41-52% cuts in the total cost of operation. Once you decide to scale up or scale down, the process is substantially easier than in the case of legacy DWHs, with added agility, efficiency, and comfort.

I hope these tips and comparisons helped you to understand Data Warehouse Migration and maybe even make a decision if you’re thinking about it. Remember – it’s always better to take control of your data before it gains control over you.

If you would like to take advantage of your data and translate it into valuable business insights – contact our Addepto Team!



Category:


Data Analytics