Tom Haughey

Up Front and Personal

By Tom Haughey on January 28, 2011

The term “up front” is bandied about a lot these days, often with ambiguous meanings. You can read statements such as, “traditional modelers require a comprehensive, detailed data model up front”. Maybe it’s about time somebody defined these terms. This blog will address two definitions of “up front”. It will also address when a data model needs to be detailed and what “comprehensive” means. Some definitions are necessary to provide a meaningful context, and a BI/DW environment will serve as the illustration.

First, contrary to what so many say, the most common method for building data warehouses today is not a waterfall method at all but is a RAD (Rapid Application Development) method. It is based on the following simple principles. As you can see, we are talking about small increments:

  1. Think big and with a view to the future but build incrementally
  2. Divide the DW functionality into increments
  3. Ensure these increments are small (<= 3 months)
  4. Timebox the work (<= 1 month per phase)
  5. Deliver an early version of the data 1/3 of the way through an increment
  6. Form small cross-functional teams

Second, let us assume for our discussion the following development life cycle for BI/DW. These steps apply to each increment, so their total duration is a maximum of three months. You can assume roughly one month for each of the three phases.

Initiation

  1. Planning and startup
  2. Scope definition (Defining capabilities in the increment to be delivered)
  3. Brainstorming and gathering the necessary usage requirements
  4. Ends with the installation of a live DB with base data (base data is non-aggregated data)

Exploration

  1. User/delivery team interacts with the data to test it and the human interface
  2. ETL team tests the end-to-end process and tests the data store for content and quality
  3. Data team iteratively tunes the physical database and may aggregate data
  4. Ends when time-boxed targets are attained

Implementation

  1. Traditional production rollout
  2. User training and preparation
  3. Ends with release of the new increment
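
The timeboxing arithmetic behind these phases can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Phase` class and the function names are hypothetical, and the limits are simply the ones listed in the principles above (at most one month per phase, at most three months per increment, early data one-third of the way through).

```python
from dataclasses import dataclass

MAX_PHASE_MONTHS = 1.0      # timebox per phase, from the principles above
MAX_INCREMENT_MONTHS = 3.0  # upper bound for a whole increment


@dataclass
class Phase:
    name: str
    months: float


def check_increment(phases):
    """Return a list of rule violations for one increment (empty if none)."""
    problems = []
    for phase in phases:
        if phase.months > MAX_PHASE_MONTHS:
            problems.append(f"{phase.name} exceeds the {MAX_PHASE_MONTHS}-month timebox")
    total = sum(p.months for p in phases)
    if total > MAX_INCREMENT_MONTHS:
        problems.append(f"increment totals {total} months, over {MAX_INCREMENT_MONTHS}")
    return problems


def early_delivery_point(phases):
    """Month at which an early version of the data is due: 1/3 of the way through."""
    return sum(p.months for p in phases) / 3


# Hypothetical plan matching the three phases described above.
plan = [
    Phase("Initiation", 1.0),
    Phase("Exploration", 1.0),
    Phase("Implementation", 1.0),
]
```

With one month per phase, the early-delivery point falls at the end of Initiation, which is where the life cycle above installs a live DB with base data.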
 
The first problem is that “up front” appears to have two possible meanings. The dictionary says it means “paid or due in advance; beforehand”. Let’s look at it two ways.

1. The Very Beginning of a Project (steps 1 and 2 of Initiation above). Since Agile projects are measured in weeks, not months or years, this definition of “up front” appears to involve the first few days (even hours!) of an Agile project. In a “traditional” project, steps 1 and 2 likewise consume only hours to days. In 28 years in the data management business, I have never heard a single human being say that a comprehensive, detailed model has to be defined within the first few days of a project. So this interpretation of “up front” is meaningless.

2. Everything that Occurs Before Design. The end of “up front” is the beginning of design (design starts in step 4 of Initiation above). The data model is the product of steps 1-3 and is input to step 4. In step 4 the database is designed, optimizations are performed, and the database is installed. Traditionalists do indeed say that a detailed data model needs to be defined before database design can be completed and a database installed. If this is what is meant by the term “up front”, then the term is a misnomer, because one-third of the work is done during these steps. The data model indeed needs to be created by this time.

The next point is that traditionalists are supposed to require a data model that is “comprehensive and detailed”. Let’s take the term “detailed” first. Steps 1 and 2 above occur at the very beginning of the project. The data model produced through step 2 is very general and not detailed at all. Detail would be premature, if not impossible. The product of step 2 is a model that is sometimes called a High Level or Conceptual Data Model. It is a precursor to the Detailed Level Data Model. It contains the main business entities, only their major attributes, and their relationships, and it may contain many-to-many relationships. The Agile Common Domain Model only roughly fits here because by Agile definition it is not a precursor to the Detailed Data Model. The model produced by step 3, after a lot of information gathering has been done, is detailed and normalized but does not have to contain 100% of the detail. About 85% completion is sufficient. The remainder can be picked up in design. But yes, this model is detailed.
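
One difference between the two levels can be shown with a small sketch. Resolving a many-to-many relationship through an associative entity is a standard normalization step, though it is only one of the things that happen on the way to a detailed model. The `Relationship` class, the entity names, and the naming convention for the linking entity are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    left: str
    right: str
    cardinality: str  # "1:N" or "M:N"


def resolve_many_to_many(relationships):
    """Replace each M:N relationship with an associative entity and two 1:N links."""
    resolved = []
    for rel in relationships:
        if rel.cardinality == "M:N":
            link = f"{rel.left}{rel.right}"  # hypothetical naming convention
            resolved.append(Relationship(rel.left, link, "1:N"))
            resolved.append(Relationship(rel.right, link, "1:N"))
        else:
            resolved.append(rel)
    return resolved


# Hypothetical High Level model: many-to-many is still allowed here.
high_level = [
    Relationship("Customer", "Order", "1:N"),
    Relationship("Order", "Product", "M:N"),
]

# Detailed model: every remaining relationship is one-to-many.
detailed = resolve_many_to_many(high_level)
```

In this sketch the Order-to-Product relationship becomes an `OrderProduct` entity with a one-to-many link from each side, which is the kind of detail that would be premature in steps 1 and 2.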

The term “comprehensive” is troublesome. Some seem to imply that the traditionalist model must cover a huge scope, larger than the increment(s) being developed. Take a company that sells stuff. Say the increment we are working on covers only direct sales, but not internet sales or equipment service. Hypothetically, others would contend that the traditionalist model would cover all three in detail from the get-go, maybe even more. I challenge that. The model has only to cover the current increment. In any case, their implication is that the scope is bigger than the increment being developed. Admittedly, in the past, data modelers have gotten carried away with large detailed models. But today is today and that tendency in data modeling is almost completely behind us.
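
The scoping argument can be made concrete with a short sketch. The subject areas come from the example above; the entity names inside them are invented for illustration and do not describe any real model.

```python
# Hypothetical subject areas and entities for "a company that sells stuff".
ENTERPRISE_ENTITIES = {
    "direct_sales": {"Customer", "SalesOrder", "Product", "SalesRep"},
    "internet_sales": {"Customer", "WebOrder", "Product", "ShoppingCart"},
    "equipment_service": {"Customer", "ServiceCall", "Equipment"},
}


def model_scope(increment_areas):
    """Entities the detailed model must cover for the given increment."""
    scope = set()
    for area in increment_areas:
        scope |= ENTERPRISE_ENTITIES[area]
    return scope


# The current increment covers direct sales only.
current = model_scope(["direct_sales"])

# Everything else can wait for a later increment.
deferred = set().union(*ENTERPRISE_ENTITIES.values()) - current
```

Shared entities such as `Customer` and `Product` are modeled now because direct sales needs them, while `WebOrder`, `ShoppingCart`, `ServiceCall`, and `Equipment` stay out of the detailed model until their increment comes up.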

There are at least three ways to evolve a data model, assuming you start with a High Level or Conceptual Model:

  1. Do a broad High Level Data Model (taking no more than a few hours or days), then divide it into smaller increments for detailed modeling.
  2. Separate into smaller increments from the beginning, so that the High Level and Detailed Level Models are small increments from the get-go.
  3. Do a broad High Level Model and an equally broad Detailed Level Model, but split into smaller increments in design.

We have seen each of these work successfully. They are just different. But never, never does the term “broad” imply a very big project done all at once, because the definition of a very big project is one that is “too big to do.”

In summary, development should be incremental. The term “up front” is an ambiguous and misleading term. At the very beginning of a project, one needs just a High Level or Conceptual Data Model that covers only the scope of the proposed increment. Going into design, we do require a comprehensive, detailed data model. At this point, the model should be detailed but does not need to exceed the scope of the development increment.


About the Author

Tom Haughey is considered one of the four founding fathers of Information Engineering in America. He is currently President of InfoModel, LLC, a training and consulting company. His courses on data management, data warehousing, and software development have been delivered to Fortune 100 companies around the world.
