Proper application of Business Intelligence (BI) allows you to drill down into the organization’s data and draw conclusions from information to gain a competitive advantage on the market. The first step to achieving an effective BI is a well-designed data warehouse. Data warehouse design is the process of building a solution for data integration from many sources that support analytical reporting and data analysis. A badly designed data warehouse exposes you to the risk of making strategic decisions based on erroneous conclusions.
In this article, we will look at the high level of the data warehouse design process, starting from the collection of requirements up to the implementation itself. The article will also help you not to make key mistakes related to the implementation of the data warehouse.
Requirements gathering for BI and EDW design
Collecting requirements is the first stage of the data warehouse design process. The purpose of the phase is to define the criteria for successful implementation of the data warehouse. The requirements for analysis and reporting as well as hardware, software, testing, implementation and training of users should be specified. The organization’s long-term business strategy should be as important as current business and technical requirements.
After defining the business and technical strategy, the next step is to determine how the organization will backup the data warehouse and how to recover the system in the event of a failure. The development of an emergency recovery plan while collecting requirements ensures that the organization is ready to react quickly to both direct and indirect threats of data loss.
Determination of the physical environment for ETL, OLAP and database
The next step is to determine the physical environment of the data warehouse. At least there should be separate physical application servers and databases, as well as separate ETL / ELT, OLAP processes and reports configured for development, testing and production. By building separate physical environments, we must ensure that all changes can be tested before transferring them to production. Development and testing should be done without stopping the production environment, and if data integrity becomes suspicious, IT staff can investigate the problem without affecting the production environment.
Data modeling using Star Schema or Snowflake approach
After defining the requirements and physical environments, the next step is to determine how data structures will be available, combined, processed and stored in the data warehouse. This process is known as data modeling. During this phase of data warehouse design, data sources are defined. Identifying the source of the data and, what is equally important, the availability of this data is crucial to the success of the project. After identifying the data sources, the data warehouse team can start building logical and physical structures based on set requirements.
Extract, Transform, Load (ETL)
The purpose of ETL (Extract, Transform and Load) is to provide optimized data loading processes without losing data quality. The ETL process takes the most time during development and consumes the most time during implementation. Identifying data sources during the data modeling phase can help reduce ETL development time. Failure at this stage of the process may lead to poor performance of the ETL process and the entire data warehouse system.
OLAP or Tabular cubes designing
On-Line Analytical Processing (OLAP) is an engine that provides infrastructure for ad-hoc queries and multidimensional analyzes. Requirements for dimensions and measures of OLAP cubes must be specified at the beginning of the data warehouse design process. Below are three key elements of OLAP design:
Grouping measures – numerical values that we want to analyze (such as: revenues, number of customers, the number of products purchased by customers or the average purchase amount).
Dimensions – they complement and extend the analysis of measures to areas such as the geographical region, month or quarter.
Data granulation – the lowest level of detail that we want to include in the OLAP data set.
You need to make sure that OLAP cube processing is optimized during development stage. After data warehouse updating, the OLAP cubes should be updated quickly. Failure to update any of them in a timely manner can result in poor system performance. Taking the time to explore the most efficient OLAP cube generation path can reduce or eliminate performance-related problems after the data warehouse is deployed.
Business Intelligence Front-End development
After defining business requirements, placing physical environments, modeling data and designing ETL processes, the next step is related to the choice of the method and form of sharing data contained in the enterprise data warehouse (EDW). Choosing the right front end tool (Power BI, Tableau, Looker) is to ensure the way in which users will access data for ad hoc analysis, pre-defined reports and dashboards. Various options are available, including the construction of a front-end in-house part in your own strength or the purchase of an off-shelf product. Either way, keep in mind the requirements of business users to ensure the best quality of shared data. The most important element of the entire process is secure access to data from any device – desktop computer, laptop, tablet or phone. The tool should enable the development team to modify the back end structure when changing the reporting requirements at the enterprise level. It should also provide a graphical user interface (GUI) that allows users to customize reports. The OLAP engine and data warehouse may be the best in its class, but if users are not able to use data effectively, the data warehouse becomes an expensive and useless data repository.
Report and dashboard development
Most end users typically use data warehouses only to generate reports or dashboards. As we mentioned in the front-end development section – the ability to quickly and efficiently select report criteria is an important feature of generating them from a data warehouse. Distribution options are another important factor. In addition to receiving reports via a secure web interface, users may need reports sent as an e-mail attachment or as a spreadsheet. Controlling the flow and visibility of data is another aspect of the development of the reporting system. Developing user groups with access to specific data segments should ensure security and data control. A well-designed data warehouse should be able to handle new requests related to ad hoc reporting and
ETL performance tuning
The first part of the article recommended creating separate programming and test environments. As a result, organizations can provide improved system performance using ETL, query processing and delivery of reports without interrupting the current production environment. You need to ensure that the development and test environments, hardware and applications mimic the production environment – so that the productivity improvements developed during the development phase will work more efficiently on the production environment.
After developing the data warehouse system in accordance with business requirements, there is time to test it. Testing or quality assurance is a step that should not be omitted, as it will allow the data warehouse team to reveal and resolve problems before the first deployment. Failure to complete the testing phase may lead to delays in the completion or completion of the data warehouse project.
Enterprise Data Warehouse implementation & deployment
The decision whether the system will be available to all will depend on the number of end users and how they will access the data warehouse system. Another important aspect of system implementation, which is often overlooked, is the training of end users. The training should be carried out regardless of how intuitive the GUI is from the point of view of the data warehouse team and programmers. If the tool for end users is difficult to use and “incomprehensible”, then at some point they will stop using it, leaving out all the advantages of the system.
Understanding the best practices in designing a data warehouse
Designing a data warehouse is a time-consuming and demanding undertaking. Good and bad aspects appear at every step. However, if the organization devotes more time to develop robust requirements at the very beginning – the next steps of the process will be more smoothly and lead to successful implementation of the data warehouse.
See how we help leading manufacturing company to enhance management efficiency
by implementing automated reporting system.