Posts From This Author
About Our Authors
Visual Normalization
By Alec Sharp on January 28, 2011View Full Bio →
This series of posts has discussed the many advantages of drawing an Entity-Relationship Diagram so that dependency flows consistently from top to bottom. One particular advantage we looked at was that this convention allows one to explain normalization in a fairly natural way that doesn’t rely on relational theory. I know, I know – only a data modeler would claim that normalization can be explained in a “natural way,” but it’s all relative – there are certainly explanations out there that border on the unnatural. Don’t believe me? – do a quick web search on “Boyce-Codd Normal Form” or “Fifth Normal Form” and see if the definitions don’t have your eyes glazing over within seconds, like mine did when I was researching this post. And we’re data management professionals!
As we saw in my last post, 1st, 2nd, and 3rd Normal Forms can be reduced to some simple questions about attributes (multiple values? redundant?) and a “down, up, or out” guideline that works as long as the model is drawn top-down. Now we’ll move up the food chain to 4th and 5th Normal Forms, and how some graphic guidelines can make these easy to understand. While we’re up in that rarefied atmosphere, we’ll also consider the idea that we’ve ended up with more normal forms than we really need.
As usual, I note that if you haven’t read the earlier posts on this topic, this one won’t make much sense.
Explaining normalization
In the last post, I noted that “(very few)… data management professionals can explain normalization in a way that is comprehensible to mere mortals. In fact, they usually can’t even explain it in a way that is incomprehensible.”
Predictably, this assertion was quickly deflated. Two weeks ago, I was teaching Advanced Data Modeling in Bangalore, and I asked if someone could actually explain 2nd Normal Form to me. I was surprised to get very clear answers from some of the interns in the class – much better than I could have done when I was just out of university. Recalling the proverb “Old age and treachery will always triumph over youth and skill,” I immediately moved on and asked for an explanation of 4th or 5th Normal Form. Once again I had the upper hand, because no one could provide one. That isn’t a comment on the skill of the people in the class, though – it’s a comment on how those higher-level normal forms are often explained. Let’s see if we can simplify it, but first, here we are, giving a big “Bangalore Thumbs Up” to data modeling in general:

The Upper Reaches – Getting into 4NF and 5NF
Let’s get right to it – you can only have a violation of 4NF or 5NF is if you have an associative entity with three or more parents, or, in other words, a relationship that brings together three or more other entities. Using the guidelines we’ve learned, that means that any entity with three or more crowsfeet on the top edge is immediately a candidate to see if it violates these higher-order normal forms. Similar to the lower normal forms, you must already have achieved 1st, 2nd, and 3rd Normal Forms before 4th and 5th are a concern.
No problem!
The preceding means that if the diagram below from last month’s post depicted a structure that was in 3NF, there’s no need to check for 4NF and 5NF, because there are no 3-way (or higher) associatives; therefore, those higher normal forms are a non-issue:
.jpg)
Maybe a problem…
On the other hand, there are at least two situations in the following model that require investigation for possible 4NF or 5NF violations. “Associative 1” and “Associative 2” are each three-way associative entities, so we have to check to see if a more accurate way to model the relationship among the three parents is with some combination of two-way associatives. If so, depending on what the final model looks like, the original structures violated either 4NF or 5NF.

Even though “Associative 3” is a two-way associative classified by a reference entity, it will also be worth investigating because it does establish a relationship among three entities.
Note that it doesn’t matter what types of entities are associated – “Associative 2,” for instance, relates a Kernel, a Characteristic, and another Associative. (It might, on the other hand, have a problem with transitivity, but that’s a topic for a future post.)
The really critical point – there’s no way to tell if 4NF or 5NF are violated simply by inspecting the model for higher-order associatives. That will tell you where there might be a problem, but only knowledge of the relevant business rules and constraints can tell you if there is a problem. In other words, business involvement is essential, which is why your data model diagram must be backed up with narrative definitions, rules, and assertions, as well as relevant examples or scenarios to demonstrate the situation. We’ll start on that topic in the next post.
4NF and 5NF – What’s the Difference?
The difference between 4NF and 5NF violations in most practical situations is such a fine line that I don’t worry about it – all really that matters is that a “large” (e.g., three-way) associative is replaced with (or “decomposed into,” if you prefer) two or more “smaller” (e.g., two-way) associatives. Sometimes I correct a situation and I’m not sure if I’ve corrected a 4NF or 5NF problem, but I don’t lose any sleep over it.
We’ll only consider three-way associatives here, but the rule is the same as the numbers get bigger. For instance, a four-way associative violates 4NF or 5NF if it can be replaced with two or more smaller (e.g., three-way or two-way) associatives. Of course, if one of the resulting smaller associatives (e.g., a three-way associative) is still “larger” than it should be you haven’t achieved 4NF or 5NF yet. Like Beethoven, you must continue decomposing.
Schematics to illustrate the cases
Let’s look at some examples that will illustrate the structural (and graphical) differences between the two before plugging in some real examples. We start with three entities, kernels in this case, “A,” “B,” and “C.”
The following might violate 4NF or 5NF because of the three-way associative entity “A-B-C.”

4th Normal Form
If the correct model is the one shown below, the original violated 4NF because two independent associations – “A-B” and “B-C” – had erroneously been combined into one association.

The original “overly large” “A-B-C” association introduced all sorts of problems, such as the fact that we can’t record one relationship, e.g. among instances of “A” and “B,” unless an instance of “C” is also involved. (We might fake the structure out with null values or dummy data, but that’s hardly a solution.) We’ll also incur redundancy because some combinations will be recorded multiple times, e.g. if the same instances of “A” and “B” are related to three different instances of “C,” then the relationship between those “A” and “B” instances will be recorded three times in the three instances of “A-B-C” that will be necessary.
5th Normal Form
If, on the other hand, the correct model is as follows, the original violated 5NF because three independent associations – “A-B”, “B-C”, and “A-C” – had erroneously been combined into one association.

The 5NF violation introduced exactly the same problems as a 4NF violation did, and is conceptually identical – a large associative should have been modeled as multiple, smaller associatives. So what’s the difference? Not much, really, except that in the 5NF case the independent relationships (associations) form a “round trip” via the addition of “A-C.” That is, you can start at “A” then traverse via “A-B” to “B” and then via “B-C” to “C” and ultimately via “A-C” back to your starting point, “A.” For the life of me, I can’t see why this is worth another normal form. Perhaps there’s some technical reason, but in every published example I’ve seen the difference between 4NF and 5NF boiled down to whether or not there was a “cyclical dependency” present. (I think that’s the term I’ve seen.)
Whatever Normal Form
Compounding the confusion, the correct solutions to overly large associatives in practice often look different than the textbook examples. Next month we’ll breathe some life into these schematics by using real examples, but, for now, just consider the “shape” of the two solutions below. Each of them illustrates an extremely common pattern that emerges when correcting a structure (like our “A-B-C” starting point) that exhibits potential 4NF or 5NF problems.

I think both of these indicate that the original three-way associative was a 4NF violation, but I’m really not sure. What do you think?
Your assignment
It wouldn’t seem like a proper post if I didn’t include an assignment. Next month, I’ll provide some examples to illustrate 4 and 5NF, but in the meantime, you should try to create some examples from your experience. After all, the whole point of this series of posts was for you to apply the guidelines at your organization. Come up with a plausible three-way associative, and then illustrate how it violates 4 or 5NF, and how it should be modeled. I look forward to hearing from you via email.
Your assignment, part 2
Worrying about 4NF and 5NF is important in your operational data to ensure high integrity. The needs in your BI environment are different. Part 2 of your assignment is to explain and demonstrate why a typical dimensional model (or star schema) is a case study in 4NF and 5NF violations.
Why be normal?
I remember (barely) a popular bumper sticker from decades past that asked “Why be normal?” In this post, the question is “Why be over-normal?” It seems to me that when I look at 1st through 5th Normal Forms, I only see three really unique situations. Maybe we could reduce the five forms to three, and make them all a little easier to explain. I’ll accept that there are theoretical differences among the five, but practically… not so much. Let’s see if you agree with my reasoning.
First Normal Form
This would remain unchanged from the “old” 1NF in which we say that an entity, table, or record is unnormalized (it’s in UNF or 0NF) if it contains multivalued attributes. Moving those multivalued attributes “down” into characteristic or associative entities, as appropriate, puts the data structure into First Normal Form.
The “new” Second Normal Form
When expressed in terms of attribute dependencies on partial keys, or on non-key attributes, 2NF and 3NF seem to be quite different. In fact, they’re essentially the same – an attribute has been placed in the “wrong” entity (or table or record…) such that a single value of that attribute might be recorded multiple times. In other words, the attribute is redundant. The structure is normalized by moving the attribute into the new or existing entity it is actually a fact about. The difference between 2NF and 3NF is simply whether the attribute was moved “up” the family tree, or moved “sideways.” I’d suggest that because both of these deal with eliminating attribute redundancy, they should be collapsed into the “new” Second Normal Form.
The “new” Third Normal Form
As with 2NF and 3NF, the formal definitions of 4NF and 5NF conceal their essential similarities. As described in this post, both deal only with associative entities that have three or more parents – that is, “higher order” associatives. And in both cases, whether it was 4NF or 5NF that was violated, that “large” associative should have been modeled as two or more “smaller” associatives. Both of these normal forms could safely be collapsed into the “new” Third Normal Form.
The “New Normal” in summary:
- First Normal Form has been achieved when multivalued attributes have been dealt with;
- Second Normal Form has been achieved when redundant attributes have been dealt with;
- Third Normal Form has been achieved when large (three-way or higher) associatives have been dealt with.
This still leaves lots of headroom for academics to head “to infinity and beyond” in their quest for even more normal forms. I’ll wait here on Earth, ready to see if what they come up with can be reduced for consumption by mere mortals.
Next time around
I think that’s almost enough on the topic of how to organize an ERD (top-down by dependency!) and how that makes normalization easier. The next post will wrap this up by providing some real-life examples of solutions to 4NF and 5NF problems, but we’ll also look at a topic I alluded to earlier – the need to “use your words” as well as examples and scenarios to help your subject matter experts help you build the right model. After that, who knows? I’ll probably dip into my grab-bag of topics and see what gets selected – maybe how to review a data model, perhaps some notes on why primary keys are just about the last thing a modeler should worry about, or, better yet, maybe you have a topic you’d like me to address. As always, I’d love to hear from you.
Follow all Expert Blog updates by subscribing to the
RSS feed.
About the Author
Alec Sharp has managed his consulting and education business, Clariteq Systems Consulting Ltd., for close to 30 years. Serving clients from Ireland to India, and Washington to Wellington, Alec has expertise in a rare combination of fields - data management, business analysis, business process improvement, and enterprise architecture.
Why Be Normal? Love it—check out the http://www.erwin.com/whybenormal contest as well.
Nice post Alec. A good, clear description of 4NF and 5NF in “real language”.
October 19, 2011
Once again a GREAT post! I never could articulate normalizaton and relied upon good organizational skills and a severe case of OCD. This makes it very clear. You always here database people say “don’t worry about 4/5NF - that is overly normalized” but that is in a way wrong and shows a lack of understanding. When the A-B-B-C pattern is implemented however you still end up with the original problem - just now you have the “business rules” in place with A-B and B-C. I’ve found sometimes its essential to implement those rules and sometimes not, depending on the situation. Thanks again for this great article - it really helped improve my understanding of normalization!





















January 28, 2011