The data vault method for modeling the data warehouse was born of necessity. Data warehouse projects classically have to contend with long implementation times. This means that business requirements are more likely to change in the course of the project, jeopardizing the achievement of target implementation times and costs for the project.
To improve implementation times, Dan Linstedt introduced the Data Vault method for modeling the core warehouse. The key design principle involves separating the business key, context, and relationships in distinct tables as hub, satellite, and link.
The data vault is currently the established modeling standard for modelling the core data warehouse because of the many benefits it offers. These include the following:
• Easy extensibility enables an agile project approach
• The models created are highly scalable
• The loading processes can be optimally parallelized because there are few synchronization points
• The models are easy to audit
But alongside the many benefits, Data Vault projects also present a number of challenges. These include, but are not limited to, the following:
• A vast increase in the number of data objects (tables, columns) as a result of separating the information types and enriching them with meta information for loading
• This gives rise to greater modeling effort comprising numerous unsophisticated mechanical tasks
How can these challenges be mastered using a standard data modeling tool?
The highly schematic structure of the models offers ideal prerequisites for generating models. This allows sizable parts of the modeling process to be automated, enabling Data Vault projects to be accelerated dramatically.
Which specific parts of the model can be automated?
The standard architecture of a data warehouse includes the following layers:
Both the staging area and the raw vault are very well suited for automation, as clearly defined derivation rules can be established from the preceding layer.
Should automation be implemented using a standard modeling tool or using a specialized data warehouse automation tool?
What are the arguments in favor of using a standard tool such as the erwin Data Modeler?
Using a standard modeling tool offers many benefits:
So far so good. But can the erwin Data Modeler generate models?
A special add-in for the erwin Data Modeler has been developed specifically to meet this requirement: MODGEN. This enables the potential for automation in erwin to be exploited to the full.
It integrates seamlessly into the erwin user interface and, in terms of operation, is heavily based on comparing models (complete compare).
The following specific functionalities are implemented in MODGEN:
To support a modeling process that can be repeated multiple times, during which iterative models are created or enhanced, it is essential that generation be round-trip capable.
To achieve this, the generation always performs a comparison between the source and target models and indicates any differences. These can be selected by the user and copied during generation.
The generation not only takes all the tables and columns into consideration as a matter of course (horizontal modeling), it also creates vertical model information.
This means the relationship of every generated target column to its source column as data source is documented. Source-to-target mappings can therefore be generated very easily using the model.
Integrating the source and target model into a web portal automatically makes the full impact and lineage analysis functionality available.
If you are interested in finding out more, or if you would like to experience MODGEN live, please contact our partner heureka.
Author details: Stefan Kausch, heureka e-Business GmbH
Stefan Kausch is the CEO and founder of heureka e-Business GmbH, a company focused on IT consultancy and software development.
Stefan has more than 15 years’ experience as a consultant, trainer, and educator and has developed and delivered data modeling processes and data governance initiatives for many different companies.
He has successfully executed many projects for customers, primarily developing application systems, data warehouse automation solutions and ETL processes. Stefan Kausch has in-depth knowledge of application development based on data models.
heureka e-Business GmbH
Untere Burghalde 69
Tel.: 0049 7152 939310