Posts From This Author
About Our Authors
What Do Data Modelers Like to Do?
By Tom Haughey on July 16, 2010View Full Bio →
We all know that the process of data modeling is one of taking information requirements and organizing them into a structure. This structure consists of entities, attributes and relationships, and supplementary business rules. We all know that this represents a set of structural business rules. But what do data modelers really like to do?
They like to model, which means that sometimes they may over-model. They like to normalize, which means that sometimes they may over-normalize. They like to abstract, which means that sometimes they may over-abstract. Let’s take these one by one.
Data modeling is a creative process. But what do you model? Peter Drucker once said that an organization should ask itself three questions: “What business are we in? What business will we be in? What business should we be in?” These questions are important to data modeling. If you need to create a model of the way the business is (maybe with some enhancements), but you create one of the way the business will be, then the new model is usually a slight mismatch for the need. But, if you build a model of the way the business should be, when you need one for the way it is, then there will be a big mismatch. You probably have over-modeled the problem. In fact, the “should be” model may be totally unusable because it is possible that there is no source for much of the data. The message here is to establish the proper perspective before you begin modeling, which will potentially avoid over-modeling.
One purpose of normalization is to eliminate update anomalies. Say, customers have several addresses. Well, two customers happen to have the same address, meaning that they have the same value for their address. Fetsie Dobbs lives at 150 Main Street, New York, NY 12345. So does Fred Smoot. We create a model with three entities: CUSTOMER has CUSTOMER ADDRESS (which is an associative entity) which is an ADDRESS. The idea is that if the address changes, we only update it in one place, ADDRESS. Just one problem. Addresses don’t usually change. It is people’s having an address that changes. 150 Main Street is still 150 Main Street, even if the Fedster moves. I contend that to have three entities in this case is over-modeled. Of course, there are cases where the three entities are necessary and useful, but in so many cases that I have seen, a separate entity with nothing but addresses (which are really just locations) is more than the business needs. However, before determining the solution in data modeling you should always consider the pros and cons of different alternatives, and the consequences of each.
Data models should be designed to facilitate easy change. In fact, they can even be designed so that they will never change. Consider NAME-VALUE pairs. An example is a model with three entities, CUSTOMER has CUSTOMER ATTRIBUTE which is an ATTRIBUTE. Each row in CUSTOMER has a set of predefined attributes, of course. But a customer needs a new attribute that you did not plan for. So you can add a row to ATTRIBUTE, and finally assign this new row to CUSTOMER ATTRIBUTE where you also give it a value. For example you want to give customer 12345 a LIQUID NET WORTH (LNW) attribute worth $2MM. There is no LNW attribute in CUSTOMER. No problema. You add LNW to ATTRIBUTE, assign the attribute to the customer in CUSTOMER ATTRIBUTE, and give it a value of $2MM in CUSTOMER ATTRIBUTE. Wow. No big deal. Unless you’re writing SQL. What does SQL use for the column header? The column name. What’s the column name? “Attribute”! Not too user friendly. OK if you are Houdini in SQL, you can display the proper attribute name. There is a place for abstractions like NAME-VALUE pairs. They are very extensible. However, they are devoid of business clarity. For this reason, a new data modeling guideline should be established: all data modelers are now required to write SQL in their first two years.
What is the general point in all of this? Sure, data models should be business oriented, easy to change, allow for reuse of data, and flexible. But data modelers need to be practical and have business clarity. As in all things in life, balance is required. So think twice before creating that clever new structure consisting of THING, THING-TYPE, THING-THING, and THING-THING TYPE. The wife of James Joyce, the famous Irish author of Ulysses and Finnegan’s Wake, once said to her husband, “Why don’t you write things that people can read?”
Follow all Expert Blog updates by subscribing to the
RSS feed.
About the Author
Tom Haughey is considered one of the four founding fathers of Information Engineering in America. He is currently President of InfoModel, LLC, training and consulting company. His courses on data management, data warehousing, and software development have been delivered to Fortune 100 companies around the world.
There have been no comments yet.




















