Tom Haughey

Agile Development and Data Modeling

By Tom Haughey on October 26, 2010

What is Agile Development?

Agile software development refers to a group of software development methodologies based on iterative development, where requirements and solutions evolve through collaboration between self-organizing cross-functional teams.

Agile is now so overused a term that it is important to clarify it. You hear statements like the following all the time: “The business needs to be agile. To respond to business needs, we need to be agile. To compete in today’s fragile environment, we need to be agile. We need to develop an agile IT infrastructure.” This is “agile” as a general quality (an adjective in lower case), and it is a desirable quality of many things.

“Agile” as a thing (a “noun” in title case) represents a set of principles, methods and techniques. It is usually expressed as Agile Methods or just Agile, as in Agile Data Modeling or Agile Software Development. In this usage, “Agile” is almost anthropomorphic, as in expressions like “Agile recommends short intervals; Agile encourages change. Agile data modeling advises…”

Desiring to acquire “agile” the quality has nothing intrinsically to do with using “Agile” the thing. The business can become “agile” by using many different means, including but not limited to “Agile”, the software development method.

What is Data Modeling?

According to Scott Ambler, data modeling is the act of exploring data-oriented structures. “Traditionalists” find this definition myopic. “Evolutionary data modeling is data modeling performed in an iterative and incremental manner. Agile data modeling is evolutionary data modeling done in a collaborative manner.”

According to “traditionalists”, data modeling is the process of defining the data needs by classifying the objects of interest, characterizing them and interrelating them. Data modeling is based on business rules, business metrics and process needs. Data modeling has always been performed in an iterative and incremental manner. The data model has always been expanded and enriched in a collaborative manner. In my 28 years of involvement in data management, no qualities of data modeling have been more consistently reiterated, not even non-redundancy. It is absurd to imply that traditional data modeling is done in one continuous act or that it is done all upfront by an isolated team without involving Subject Matter Experts and without sensible examination of requirements.

Sure, there have been excesses in traditional data modeling - just as there have been excesses in the use of Agile.

In summary, traditional data modeling is incremental, evolutionary and collaborative (and thereby agile) in its own right.

Agile Assessment of Traditional

The implication of Agile proponents like Scott Ambler is that “the traditional approach of creating a (nearly) complete set of logical and physical data models up front or ‘early’ isn’t going to work.” One issue with a statement like this is what “up front” or “early” means. He says that the main advantage of the traditional approach is “that it makes the job of the database administrator (DBA) much easier – the data schema is put into place early and that’s what people use.” Actually, the main advantages are that it is a clear expression of business information requirements and that developers have a stable base from which to work. Agile proponents say that it requires the designers “to get it right early, forcing you to identify most requirements even earlier in the project, and therefore forcing your project team into taking a serial approach to development.” On the contrary, data and process modeling, and thereby data design and program design, should be done in a flip-flop manner. You collaborate on the requirements, model some data, model some processes, and iterate this process till the modeling is done – using a white-board and Post

He says, “Second, it doesn’t support change easily. As your project progresses your project stakeholders understanding of what they need will evolve, motivating them to evolve their requirements. The business environment will also change during your project, once again motivating your stakeholders to evolve their requirements. In short the traditional way of working simply doesn’t work well in an agile environment. One critical technique is database refactoring.” Agile is right. We have to admit that the traditional SDLC has been resistant to change.

Traditional Transaction Processing

But remember this. The traditional SDLC (System Development Life Cycle), whatever its faults, has successfully delivered the core systems that run business across the world. Imagine delivering a new large brokerage trading system in 2-week intervals, or going live with a space shuttle project 2 weeks at a time, or delivering a robotic system for heart surgery in 2-week intervals. Much, but not all, of Agile development has focused on apps like web-based systems and smaller, non-strategic systems.

Next Blogs

In the next few blogs, we will take a close look at Agile Modeling and provide a serious and fair assessment of its value and place. We will evaluate its core principles, take a close look at refactoring, examine what “up front” means, and offer some practical adaptations for making development more agile while still honoring the goals of enterprise data management. This also means we will take a serious and honest look at traditional data modeling. 


About the Author

Tom Haughey is considered one of the four founding fathers of Information Engineering in America. He is currently President of InfoModel, LLC, a training and consulting company. His courses on data management, data warehousing, and software development have been delivered to Fortune 100 companies around the world.

Tom Bilcze
November 1, 2010

I agree that the term “agile” is grossly over-used. To many in IT it is just an excuse used to build an app quickly with little upfront analysis and design. Developers see no harm in coding and recoding in a continuous loop until it is right. I agree with your assessment that traditional data modeling can be agile. It’s all about collaboration while delivering a larger project in smaller chunks that are iterative and build upon each other. I mostly see the downfall in this agile approach not from the methodology but from the practitioners. Let’s face it; most IT folks are not very collaborative and are not conversant in nature. This is the weakness that agile development needs to overcome.

Michael Silves
November 9, 2010

What does it mean to say “One critical technique is database refactoring”?  Is there an agile way to change the structures of tables in a data warehouse in two week increments? Am I missing something?

I have attended several Agile training sessions and every one assumed that you had a set of well-defined data structures as a given. As nearly as I could tell the only refactoring involving data they did had to do with rearranging the structures of user views to accommodate differing user requirements.

Tom Haughey
December 1, 2010

Your question about database refactoring is excellent, as is your comment on the existence of the logical model for proper DW implementation. This month’s Blog addresses database refactorings. Let me briefly address your points one by one.

“What does it mean to say ‘One critical technique is database refactoring’”?

This is virtually a quote from Agilists. Yes, they strongly advocate database refactoring and have written several articles and even books on it.

A database refactoring is a simple change to a database schema that improves its design while retaining both its behavioral and informational semantics. Database refactoring is more difficult than code refactoring: code refactorings only need to maintain behavior, whereas database refactorings must also maintain existing integrity and other business rules. The term “database” includes structural objects, such as tables and columns, and logic objects, such as stored procedures and triggers.
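To make the idea concrete, here is a minimal sketch of one structural refactoring, a column rename carried out with a transition period. It is only an illustration under stated assumptions, not a method taken from any Agile text: the customer table, the fname and first_name columns, the trigger name and both function names are hypothetical, and it uses Python's built-in sqlite3 module with an in-memory database.

```python
import sqlite3


def rename_column_with_transition(conn: sqlite3.Connection) -> None:
    """Introduce customer.first_name as the new name for customer.fname.

    All table, column and trigger names here are hypothetical.
    """
    # Step 1: add the new column alongside the old one.
    conn.execute("ALTER TABLE customer ADD COLUMN first_name TEXT")
    # Step 2: copy existing values so no information is lost.
    conn.execute("UPDATE customer SET first_name = fname")
    # Step 3: keep legacy writers working during the transition period.
    # Rows inserted with only the old column get the new column filled in.
    conn.execute(
        """
        CREATE TRIGGER customer_sync_first_name
        AFTER INSERT ON customer
        WHEN NEW.first_name IS NULL
        BEGIN
            UPDATE customer SET first_name = NEW.fname WHERE id = NEW.id;
        END
        """
    )
    conn.commit()


def demo() -> list:
    """Build a tiny table, apply the refactoring, and return the rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, fname TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO customer (id, fname) VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )
    rename_column_with_transition(conn)
    # Legacy code that still writes only the old column.
    conn.execute("INSERT INTO customer (id, fname) VALUES (3, 'Edgar')")
    rows = conn.execute(
        "SELECT id, fname, first_name FROM customer ORDER BY id"
    ).fetchall()
    conn.close()
    return rows
```

Calling demo() returns every row with identical values in the old and new columns, including the row written by the legacy insert. The data survives the change (informational semantics) and existing writers keep working (behavioral semantics). Dropping the old column would be a later, separate step once all code has moved to the new name.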

Before continuing, let me emphasize some principles I have advocated for years, especially for the DW:
•  Divide deliverables into increments,
•  Make the increments short (not greater than 3 months),
•  One-third of the way through an increment, deliver a populated database for testing (especially for DWs),
•  Timebox the work, and
•  Use small, cross-functional teams.

I contend that the total cost of ownership of delivering a system in very short increments (such as two weeks) and refactoring the data to get it right is higher than that of using a broader, but reasonable, scope with a solid data model. Doing this is not BDUF (big design up front) and it doesn’t take forever. It’s common sense.

“Is there an agile way to change the structures of tables in a data warehouse in two week increments? Am I missing something?”
•  Not all Agilists require two-week increments.
•  Agile proposes many types of database refactorings, such as structural, data quality, referential integrity, architectural, procedural and transformations.

The issues are somewhat different when refactoring an OLTP or a DW database due to the large volumes of data in a typical DW environment. If a refactoring entails reloading large tables in a DW, then the cost to refactor can be very high. In a large DW, this could take days! In such cases, Agilists are unrealistic about the impact of refactoring in a DW environment. Actually, I believe that Agile proponents are missing something in this regard.
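A back-of-the-envelope calculation shows how a reload can run to days. The sketch below is only an illustration: the function name is hypothetical, and the row count and load rate are assumed figures chosen for the arithmetic, not measurements from any real warehouse.

```python
def estimate_reload_hours(row_count: int, rows_per_second: float) -> float:
    """Estimate the hours needed to reload a table at a steady load rate."""
    if rows_per_second <= 0:
        raise ValueError("rows_per_second must be positive")
    seconds = row_count / rows_per_second
    return seconds / 3600


# Assumed figures: a 5-billion-row fact table loaded at 20,000 rows per second.
hours = estimate_reload_hours(5_000_000_000, 20_000)
print(f"{hours:.1f} hours, or {hours / 24:.1f} days")
```

With these assumed numbers the reload takes about 69 hours, close to three days, before counting index rebuilds or validation. The same refactoring on an OLTP table of a few million rows would finish in minutes at that rate.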

“I have attended several Agile training sessions and every one assumed that you had a set of well-defined data structures as a given.”

•  Not all Agilists agree on this point. My position is that the data model is essential to success but that a data model should cover a reasonable scope. See the principles above. If the scope is too large or complex, it will take much time to complete. Nevertheless, I contend that two weeks per deliverable is too short and will require refactoring and many transformations. Incidentally, Scott Ambler says the logical data model is useless. See my earlier blogs on this point.

“As nearly as I could tell the only refactoring involving data they did had to do with rearranging the structures of user views to accommodate differing user requirements.”

•  This view refactoring is really more like code refactoring.
•  Agilists, as I say, do advocate database refactoring, as described above.

In summary, Agile does recommend database refactoring. Six forms of refactoring are described. Refactoring large DW databases has different requirements than refactoring smaller OLTP databases. Data modeling is incremental and iterative, but minuscule increments (such as two weeks) will require extensive refactoring.

Tom Haughey
December 1, 2010

To Tom Bilcze, thank you for your insight. I couldn’t agree with you more. One big lesson from Agile is the importance of delivering practical results quickly. In the 1990’s there were so many CASE projects that went on forever without delivering anything. This gave “traditional” data modeling a bad name. On the other hand, sometimes the Agile approach starts to sound like an embodiment of the old adage: “We never have time to do it right but we always have time to do it over!” You are right; the three critical characteristics are incremental, iterative and collaborative.

cbemerine
August 14, 2011

I was asking myself, at what point in Agile and/or Agile/Scrum methodology do the data structures, data models and objects traditionally get defined and built? Certainly not in the stories where details are not gleaned until conversations occur.

This led me to your post, love it! 

Of course data modeling, data structures and objects evolve over time…and if they ever stopped evolving, I would be concerned for the health of the company and the future of my role with them.

I look forward to reading more…CB

Francois Cartier
September 15, 2011

From my experience dealing with I.T. people using Agile programming where the database is not pre-existing, they will design the tables that will suit their immediate needs and ask the data architect to make any adjustments that will not impact unduly the code they have already written (the code is the specs) because they have a very tight schedule but they promise to revisit on the next iteration which is usually a repeat of the above.

There was an old 10:1 average cost ratio between data structure change and process structure change based on the assumption the data structure is shared by multiple processes. Unfortunately, one “solution” I saw being adopted on the Agile side is to build silos first, share later (another MDM project on the horizon).

John Zachman’s “scrap and rewrite, scrap and rewrite…” comes to mind. I see quality being sacrificed for the sake of expediency, most of the time. But it is essentially a project, scope, focus and resource management problem that I.T. professionals have never been able to resolve. A given methodology is just like a hammer. It’s a tool that is not appropriate at times. Besides, any new methodology has to prove itself first, by the results rather than by the rationalization.

What we need is not one methodology or another, anyway, but a methodology set that allows for hybrids, experimentation, adaptation to circumstances and resistance to the few individuals who would try to flout it while expected to follow it.

Terry Bunio
February 13, 2012

Interesting Blog article. I just came across it today. Sorry for joining the conversation late. Here are my thoughts:

http://bornagainagilist.wordpress.com/2012/02/13/agile-data-modeling-still-a-ways-to-go/
