As your company grows and you start to think about taking advantage of big data and business intelligence, creating a data warehouse is one of the very first steps to take. Data warehouses play a key role in business intelligence, and, therefore, need to be well-thought-out. In this article, we are going to bring this question forward. We will talk about the data warehouse design principles, and also tackle the agile data warehouse design issue. Let’s find out what do you need to know about data warehouse design!
First things first. The data warehouses are immensely important in any company that wants to use business intelligence and big data in order to improve the decision-making process within it, and, thus, speed up its development and growth. Although they aren’t on the front line, they play an essential role in the background. In fact, without them, business intelligence wouldn’t have been possible! But what exactly are data warehouses? And how to design a data warehouse? You’re about to find that out.
What is a data warehouse?
If you run a company, most likely, you store your data in various places and forms. For instance, let’s say you have:
- Customer data in your CRM software
- Bills and invoices in PDF and JPG files
- Your website stats in Google Analytics
- Your e-mail data in XLS/CSV files
And so on.
You see, nowadays, data comes in various sources and forms. The idea behind building a data warehouse is to combine all these different forms of data in one, versatile tool. And that tool is a data warehouse! There are also data lakes, but that’s a subject for a separate article.
We can say that data warehouses are repositories that collect data from various systems and sources and use online analytical processing (OLAP) to query that data for better business insights. Naturally, there is no single all-purpose data warehouse. It has to be built and designed in accordance with the requirements of a given company.
Data warehouse design from the technical point of view
The data warehouse is based on an RDBMS (Relational Database Management System) server, which is a central information repository that is surrounded by other components necessary to make the entire environment functional. Obviously, the structure of a data warehouse may vary depending on its purpose and the company’s needs.
However, we can indicate five major components of a DW:
- The database (it’s the central element, the foundation of the entire DW structure)
- ETL tools (data within a data warehouse has to be preprocessed before it can be transferred into a data warehouse, and this is what ETL tools are for)
- Metadata (it specifies the source, usage, and features of data stored within a data warehouse)
- Query tools (they allow you to interact with the data warehouse system and obtain necessary insight/knowledge)
- BUS (the data warehouse bus determines the flow of data and allows to integrate data marts into one large data warehouse)
That’s the theory, but usually, you don’t have to build a data warehouse entirely from scratch. There are many platforms available on the market that can be used with ease in order to build your own, customized warehouse. The most popular applications and software are Snowflake, Amazon Redshift, BigQuery, IBM Db2, Teradata Vantage, Panoply, IBM PureData System for Analytics (PDA), Azure SQL Data Warehouse, Oracle Exadata Cloud Service.
Data warehouses can be built on-premise (locally, within your company’s structure) or in the cloud. The second option is usually cheaper and quicker to implement.
Data warehouses–two approaches
Generally speaking, there are two major approaches to design a data warehouse:
- The top-down design
- The bottom-up design
What’s the difference between these two approaches? In the first approach, you start ‘at the top’–by designing the data warehouse structure first. In the second approach, you start ‘at the bottom’–by creating data marts first.
Now, a short digression: data marts are smaller versions of data warehouses or, rather, data warehouse subsets. They are focused on a single subject, while a data warehouse is designed to address many problems and comprise many subjects within one company.
In the first approach, the data warehouse, a centralized repository for the entire company, is built first. In such a situation, data marts are made after data warehouse and serve as a support for the central structure. In the second approach, data marts are built first, and, then, they are integrated into one, larger data warehouse. In this approach, the integration is implemented using a data warehouse bus.
The bottom-up design is quicker and more flexible, but it can be difficult to maintain. On the other hand, the top-down design is easier to maintain and robust against business changes, but it’s slower and more complex to set up. In many instances, it’s also more expensive, especially at the very beginning.
To sum up, there is no single universal approach. It all depends on many factors, and each company should decide which data warehouse design strategy is best.
Agile data warehouse design
Agile comprises various approaches to software development and is based on adaptive planning, evolutionary development, early delivery, and continual improvement. Also, it encourages rapid and flexible response to change.
According to Agile, data warehouse design should be done in such a way:
Individuals and interactions over processes and tools.
Working data warehouse systems over comprehensive documentation.
Collaboration with end-users and stakeholders over contract negotiation.
Responding to change over following a detailed plan.
There is one primary goal in the Agile approach: To build the data warehouse systems that work, add real, measurable value to the organization, and are built with high-quality standards in mind. Simply put, it’s all about making a useful solution, and not art for art’s sake.
We work in the same way. We always want to address the real customers’ needs and help them to do business in a quicker, more effective, and better way. This is our main goal, and this is what you can expect from cooperation with us!
Steps in data warehouse design
In general, business intelligence companies like Addepto design data warehouses on behalf of their clients. Our experience and know-how ensure that the data warehouse you receive is though-out and tailored to your company’s needs.
If you’d like to see what the data warehouse designing process looks like, we have a short description for you right here:
THINK ABOUT YOUR NEEDS AND REQUIREMENTS
As we told you earlier, there is no one versatile data warehouse. It has to be built in accordance with your needs. That’s why every department in your company should be engaged in this process. You have to know what do you need the data warehouse for. Every department needs to understand the purpose of it, how it will benefit them, and what kind of results they can expect. Only then, it’s possible to design a data warehouse that’s efficient and useful.
SET UP THE PHYSICAL ENVIRONMENTS
There are three significant environments within a data warehouse–development, testing, and production. Why do you need these three elements? All these environments have different resource needs, therefore trying to combine all functions into one structure can be terrible for performance. Naturally, there can be more environments, designed to suit specific business needs, for instance, the quality assurance environment.
It’s the process of creating a data model for the data stored in a data warehouse. It’s all about designing data flow and its distribution within the data warehouse. Data modeling is a crucial element in designing a data warehouse. This stage helps you visualize the relationships between data, it’s flow, compliance and security processes, and other vital elements. All in all–this stage answers the question, “What is all that going to look like?”
DESIGN THE ETL PROCESS
As you know from one of our previous articles, ETL is a data-related, three-stage process. In general, it is about transferring data into a data warehouse. First, you have to EXTRACT data from different source systems. Then, data needs to be TRANSFORMED and finally LOADED into a given data warehouse system. If you’d like to know how exactly this process happens, we encourage you to read our article about the ETL process.
DESIGN THE OLAP CUBES
The data warehouses use online analytic processing (OLAP) to query data stored within them for better business insights. OLAP allows DW users to analyze data from multiple systems at the same time. And the OLAP cube is a data structure created and optimized for data analysis. At the end of the ETL process, data is loaded into an OLAP cube where information is pre-calculated in advance for further analysis. In other words, cubes help you analyze data stored in the data warehouse.
Now, it’s time to tackle the front-end issue. Simply put–your data warehouse needs visualization if you want it to be a valuable asset in your company. Front-end development is all about how users will access data for analysis and run reports. Thanks to front-end visualization, users can immediately understand and apply the results of data queries. That’s why it’s so important not to go overboard with front-end design. Keep it simple and legible. If you’re struggling with this stage, we advise you to go back to point 1. Think about your needs and requirements first. Moreover, at this point, you ought to ensure access to the data warehouse from any device.
OPTIMIZATION AND ROLLOUT
The last thing to do is to optimize your data warehouse. Try to maximize workflows in order to speed up queries and the way your data warehouse works. Conduct necessary tests and initial training for your staff. Only then can they fully benefit from this new solution. Your employees have to understand how “their” data warehouse works and how it can improve their everyday work. Devote at least 2-3 weeks and help your staff get acquainted with a data warehouse and learn how to use it.
To sum up, data warehouses can be immensely helpful. With this foundation, you are ready to implement business intelligence into your company and start working far more efficiently. BI improves the decision-making process and gives you an entirely new insight into the data you possess. But there is one vital condition. You have to understand WHAT IS IT FOR. Only then can you fully benefit from the fantastic tool the data warehouses are!
If you are still before taking this step, and you’d like to find out more about the ways business intelligence can enhance your business–don’t hesitate and call Addepto! Our experts are always more than happy to invite you to the tremendous world of BI and big data. See you soon!