Entity versus Attribute Definitions
By Malcolm Chisholm on June 13, 2010
In my last couple of posts I reflected on some of what I had learned in writing my new book Definitions in Information Management (available via www.data-definition.com). In this post I want to address one aspect of the need to coordinate entity and attribute definitions.
At first sight, a data modeler might be a little puzzled as to why this topic needs any particular consideration. After all, don't we simply have to provide a definition for each entity and each attribute we put into a data model? This, of course, is true, but the fact that the requirement is obvious does not mean that every approach to it will be simple. Indeed, there is a strong case for strict standards governing how entity definitions are handled separately from attribute definitions.
Redundancy of Definition
An initial problem is redundancy in definition. Ideally we want to define each concept once, and in one place. Suppose we have a Customer entity with a primary key attribute of Customer Surrogate Key. The primary key is always of paramount interest to data modelers, and there will be a temptation to include aspects of its definition in the Customer definition. For instance, the "surrogate" key may contain intelligence: it is nine characters long, but the first three characters represent Office Code. This fact might be so interesting to a data modeler that it is put into both the entity definition of Customer and the attribute definition of Customer Surrogate Key.
We now have elements of the definition of Customer Surrogate Key at both the entity level (for Customer) and the attribute level (for Customer Surrogate Key itself). This immediately creates a maintenance problem of the same kind that Ted Codd pointed out happens when we do not normalize database design in general. It might perhaps be acceptable if we always found out all the facts of a definition at one instant of time and they never changed. If no additional facts or changed facts could appear then there would be no problem.
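The maintenance problem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary-based data dictionary, the helper places_mentioning, the wording of both definitions, and the later change to a "Region Code" are all hypothetical and are not taken from any real repository or tool.

```python
# Hypothetical, simplified data dictionary: definition text keyed by name.
# The wording of the definitions is illustrative only.
definitions = {
    "Customer": (
        "A party that buys goods or services. "
        "The first three characters of Customer Surrogate Key represent Office Code."
    ),
    "Customer Surrogate Key": (
        "Nine-character identifier of a Customer. "
        "The first three characters represent Office Code."
    ),
}


def places_mentioning(term, definitions):
    """Return the names of all definitions whose text mentions the term."""
    return sorted(name for name, text in definitions.items() if term in text)


# The Office Code fact is recorded at both the entity and attribute level.
duplicated_in = places_mentioning("Office Code", definitions)

# A later (hypothetical) change is applied to the attribute definition only.
definitions["Customer Surrogate Key"] = (
    "Nine-character identifier of a Customer. "
    "The first two characters represent Region Code."
)

# The entity definition still carries the outdated fact.
stale_in = places_mentioning("Office Code", definitions)
```

After the update, the attribute definition is current but the Customer definition still asserts the old fact, which is the kind of inconsistency that a single location for each fact would avoid.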
Problems of Distributed Definitions
But definition is not just a product. It is also a process, like bringing an image slowly into focus, and this process needs to be managed well. Definitions have to be continuously improved, and have to react to changes in the business. If elements of the definition of an entity are placed in one or more of its attributes, or aspects of the definition of an attribute are put at the level of its entity, then definition governance can quickly become unmanageable. These definitions are in a way "denormalized" and nobody can be sure where to go to update them. Over time inconsistencies will likely start to appear between the various places they are located.
A second issue is that of search. If anyone wants to understand a definition, then all of the definition should be associated with the entity or attribute concerned. A user looking at the definition of Customer Surrogate Key cannot be expected to know they must also look into the definition of Customer to find an additional part of the definition.
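One way to picture the alternative is a structure in which each definition is complete at its own level, so a single lookup returns everything a user needs. The following is a minimal sketch, assuming hypothetical class names (EntityDefinition, AttributeDefinition), a hypothetical lookup helper, and illustrative definition text; it does not describe any particular metadata tool.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeDefinition:
    name: str
    text: str  # the complete definition of the attribute


@dataclass
class EntityDefinition:
    name: str
    text: str  # the complete definition of the entity, and nothing else
    attributes: dict = field(default_factory=dict)

    def add_attribute(self, name, text):
        self.attributes[name] = AttributeDefinition(name, text)


def lookup(entity, attribute_name=None):
    """Return the definition of the entity, or of one of its attributes."""
    if attribute_name is None:
        return entity.text
    return entity.attributes[attribute_name].text


# Illustrative wording only: the key's structure is stated at the attribute level.
customer = EntityDefinition("Customer", "A party that buys goods or services.")
customer.add_attribute(
    "Customer Surrogate Key",
    "Nine-character identifier of a Customer. "
    "The first three characters represent Office Code.",
)

key_definition = lookup(customer, "Customer Surrogate Key")
entity_definition = lookup(customer)
```

Here a user who asks for Customer Surrogate Key gets the Office Code detail directly, and nothing about the key's internal structure has to be sought in the Customer definition.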
Definitions take effort, in both their production and governance, and we all need to improve the level of maturity with which we handle them. What belongs at the entity level should be kept at the entity level, and what belongs at the attribute level should be kept at the attribute level.
About the Author
Malcolm Chisholm, Ph.D., has over 25 years of experience in enterprise information management and data management and has worked in a wide range of sectors. He specializes in setting up and developing enterprise information management units, master data management, and business rules.