Alec Sharp

Top-Down How-To

By Alec Sharp on October 20, 2010
View Full Bio →

In last month’s post I suggested that if you want your data model diagram to be comprehensible to mere mortals, one of the most important things you can do is lay it out so dependency “flows” in a single direction. (Actually, I was more emphatic than just “suggesting” it.) Further, given that most people, at least in Western cultures, expect a concept like “parent-child” dependency to be depicted in a top to bottom fashion (think org chart and family tree) it follows that entity-relationship data models (vs. dimensional models) should be drawn top-down, parent to child. Let’s look at some specific guidelines.

Drawing guidelines – the basics
In the “boxes and lines” diagrams we like to draw, I’ve been told that the single most important visual communicator of information is the X,Y (side-to-side, up-down) position of a box. If time is an element, for instance, we infer that a box placed to the right of another happens later in time. Relative size also matters, but apparently not as much. Our diagrams should take advantage of this, but more than 90% of models I review (closer to 100%) don’t.

In a process workflow model, where dependency is based on time or sequence, I have a simple guideline – “flow lines leave the right edge of a step, and enter the left edge of a step.” That means arrowheads only point to the right. There’s a rare exception in the case of iteration (looping) but, in general, the result is that dependency in a workflow model flows from left to right. Here’s a simplified example:

Without the “left-to-right” guideline, there’s a tendency for diagrams to employ “bottom-to-top,” “side-to-top,” etc. flows, usually in an attempt to compress the diagram. The result is a more compact diagram that is also less useful because it doesn’t convey the true dynamics of the process, such as being excessively sequential. In every single case when I’ve redrawn a model “left-to-right” the stakeholders have found it more informative, sometimes vastly so.

I can’t offer quite such a simple guideline for data models, because there are lines representing both dependent and non-dependent relationships, but we can be almost that simple. Here’s how.

Case 1 – dependent relationships
In any relationship involving dependency, say a Performance Review for a Worker, or an Assignment for a Worker, there is a child and a parent entity, and these relationships will always go up and down. Note that the parent may in turn be a child of (dependent on) some other entity. Anyway, the guideline is that in a dependent relationship, the line connects at the bottom edge of the parent entity and the top edge of the child entity.

Except in very rare cases, dependent relationships are 1:M (one-to-many) relationships, so we can also say that the relationships on the bottom edge of an entity will always have a cardinality of “one,” and the relationships on the top edge will always (almost) have a cardinality of “many.” There will never (absolutely never) be a “many” – a crowsfoot – on the bottom edge of an entity. (Otherwise, you have the “dead crow” I introduced last month.) There will almost never be a “one” – a hashmark – on the top edge of an entity, except if it’s a subtyping relationship.

Case 2 – non-dependent relationships
Relationships which don’t involve dependency – that is, they don’t involve a parent and a child – will always go side to side.

Non-dependent relationships can be either 1:M or M:M. Examples include an associating relationship, such as Worker enrolls in Benefit Program (M:M) or a classifying relationship, such as Benefit Program Type classifies Benefit Program (1:M). M:M (many-to-many) relationships will always connect the side of one entity to the side of another entity (or the same side of the same entity, if the relationship is recursive.) Note that I didn’t provide an example of a 1:1 relationship between entities, because that’s almost invariably an error – either incomplete analysis of the situation, or inadvertently splitting a single entity into two. Your DBA might do that in the physical database design, say, to separate a table into public and secure tables connected by a 1:1, but we’re discussing modeling, not physical design.

In summary:

  • The top edge of an entity is reserved for relationships with one or more parent entities;
  • The bottom edge of an entity is reserved for relationships with one or more child (dependent) entities;
  • The side edges of an entity are reserved for relationships with other entities where there is no dependency.

Note that these side-to-side relationships can be mandatory for one of the participating entities, but that’s a different thing than dependent. For instance a relationship from an entity to a classifying (reference) entity (Benefit Program to Benefit Program Type) will usually be mandatory for the classified entity, but we wouldn’t say that a Benefit Program is a child of a Benefit Program Type.

Drawing guidelines – applied to the four entity types
Now we can apply those guidelines to the four entity types. If you’re reading a data modeling blog, you surely know that even the largest and most complex data models contain only four different types of entities. Sorry to bore you with what you already know, but a quick review will ensure we’re using the same language and concepts. 

Those four are kernel, characteristic, associative, and reference. There are five if you include subtypes, but those are always subtypes of one of the primary four. You might use different names, such as fundamental for kernel, and attributive for characteristic, or any of a host of names I’ve heard for reference entities – classification, type, codeset, lookup, and so on. Whatever the name, we can define the structural role they play, and state some rules (sorry – guidelines) for how they should be drawn. I’ll provide examples, although they’ll be limited given the constraints of working within a blog.

Kernel entities
As Chris Date once said, these are “what it’s all about,” which is why “fundamental” is probably an even better term. (I’m a creature of habit – I learned “kernel” first.) They are independent, and most everything else in the model is more details about the kernels, or about the events, transactions, and relationships among them. Depending on the circumstances, Worker, Organization Unit, and Building could all be kernel entities.

  • Draw these close to the upper edge of the model, or at least in such a way that all their dependent (characteristic and associative) entities are arranged below them.
  • The upper edge of a kernel will never have a relationship attached, because, by definition, they aren’t dependent on anything else.
  • Anything that has independent physical existence in the real world should be drawn as a kernel, even if it seems that they are dependent on something else. For instance, Building might at first appear to be dependent on another entity such as Site or Campus, and Worker might seem to be dependent on Organization Unit, but when the boundaries of a Site or Campus are redrawn (as they eventually will!) and the Building ends up in a different site, or when the Worker gets transferred to a different Organization Unit, it’ll be clear that the relationships weren’t actually dependency-based.

Characteristic entities
A characteristic entity is the child of (is dependent on) a single parent entity, which could be any other type of entity. It depicts repeating (multi-valued) descriptive information about its parent. Performance Review could be a characteristic of Worker, Position could be a characteristic of Organization Unit, and Floor could be a characteristic of Building.

  • Draw these below their parent, at the “many” end of a 1:M relationship. Major entities could have a great many characteristics, so they can’t all be drawn directly below the parent, unless you “stretch” it so it’s very wide. However, they should still be drawn “lower” than their parent.
  • The upper edge of these entities will always have exactly one relationship attached, to whatever other entity they are dependent on. If the cardinality at the attached end of the relationship is anything other than “many,” take a closer look.

Associative entities
An associative entity is dependent on (is the child of) two or more parent entities, which could be any other types of entity. An associative entity depicts a relationship among two or more other entities, and provides a means of recording facts about the relationship. Enrollment could be an associative between Worker and Benefit Program, and Assignment could be an associative between Worker and Organization Unit (or Position.) Both would probably have more explicit names, as other enrollments and assignments likely exist.

  • Draw these between and below their parents, at the “many” end of two or more 1:M relationships. Again, these can’t all be drawn directly between and below their parents, but in any case, they can be drawn “lower” than their parents.
  • The upper edge of these entities will always have at least two relationships attached to whatever other entitles they associate. If the cardinality at the attached end of any of the relationships is anything other than “many,” take a closer look.
  • In a conceptual model, a simple association will start with a M:M, side-to-side relationship. When it is fully resolved, at the logical level, it will become an associative entity with 1:M relationships entering its top edge.

Reference entities
At first blush, reference entities seem to be the most straightforward of the four, but in practice exceptions will arise that might leave you wondering “what type of entity is this?” In most cases, a reference entity standardizes a set of attribute values – it’s the data modeling equivalent of a drop-down list. Most reference entities are independent, although in some cases they are part of a hierarchical classification, so one is an independent reference (e.g., Product Category) and the other is a dependent reference (e.g. Product Type.) (Or does that make it a characteristic?) The relationship from a reference entity to the entity it describes is often called a classifying relationship, so “classifies” / ”is classified as” is the name I most often use.

  • Draw these to the side of, and if possible, slightly higher than the entity they are classifying.
  • The relationship from the reference entity to the classified (or described) entity is most often 1:M, but can certainly be M:M, as when classification changes over time. For instance, a Worker might have one Employment Type at a time, but many over time, so a M:M relationship would be appropriate.
  • Most reference entities are independent, and so will have no relationships attached to their top edge. The exception is the case where a classification hierarchy is involved.
  • Only the most critical reference entities will be shown on a conceptual model, for instance where they serve to connect two otherwise separate parts of the model. Some modelers never show them – they prefer to record “allowable values” for an attribute. I always include them because once they’re visible, they often “grow” into something richer than a simple reference.
  • The original classification scheme didn’t include reference entities, but they were added because otherwise some entities were classified as kernels when they clearly weren’t “fundamental,” while the “true” kernels ended up classified as characteristics. This unnatural situation was resolved by adding a second type of independent entity, the reference, although confusion can still arise.

Advantages
The primary advantage is that models drawn this way are easier to understand, but they are also much easier to inspect for correctness. Many times I’ve redrawn someone else’s model to comply with these guidelines and easily spotted errors that others had missed. For instance, exceptions to the norm stand out like a sore thumb, or “a big red flag,” as one student said.  If you see a “many” (crowsfoot) on the bottom edge of an entity, that’s just plain wrong. And if you see a “one” (hashmark) on the top edge of an entity, it’s a “big red flag” that you should investigate.

Another advantage is that you can inspect side-to-side relationships to see that you have connected them at the lowest reasonable level. For instance, should the Assignment entity connect the Worker at a more granular level, to the Position rather than the Organization Unit? Similarly, should the Organization Unit be assigned to a Building, or at the more granular Floor level? 

Objections, anyone?
The usual objections I hear include”
“But lines might cross!”
“But our model will get really wide and shallow!”
“But our model will get really skinny and deep!”

The first (crossing lines) will almost always be the case, but my advice is… get over it! I often characterize the “no crossed lines” guideline as the “obsessive-compulsive” guideline in that it obsesses over something that doesn’t matter all that much. Visually making dependency clear is more important than avoiding all crossed lines.

Ultimately, when your model grows to 100, 500, or more entities, lines are going to cross, and it will be difficult to maintain strict top-down dependency. But who are these models for? Surely not the business client or other mere mortals, the people I’ve set out to engage. I generally restrict models I show my subject matter experts to somewhere between 15 and 25 entities, sometimes a few more. These are “views” of the full model, usually showing either a subject area or the subset of the model that supports a particular process. C-level executives, administrative staff, and shop floor technicians alike find these models perfectly understandable and relevant.

Constraining these subset models to that size eliminates the problem of very wide or very deep models. That said, even when I apply these guidelines to very large models, the “too wide and shallow” or “too skinny and deep” variations never seem to arise. Perhaps there’s some natural order to these things that keeps the two dimensions in balance.

Pop quiz
Here’s the simple diagram I showed last month – count how many of each of the four entity types are depicted, based on nothing but the visual cues I’ve described. Come back next month for the answer.

Next month
Now that we’ve covered these guidelines for entity types and how to lay them out, we can move on to the original point in this series of posts, which was to demonstrate how much easier it is to explain (and confirm) normalization. That’s what next month’s post will cover. In the meantime, why not take an existing model, redraw it according to these guidelines, and see what you think?

Follow all Expert Blog updates by subscribing to the RSS RSS feed.

About the Author

Alec Sharp has managed his consulting and education business, Clariteq Systems Consulting Ltd., for close to 30 years. Serving clients from Ireland to India, and Washington to Wellington, Alec has expertise in a rare combination of fields - data management, business analysis, business process improvement, and enterprise architecture.

There have been no comments yet.

Name:

Email:

Comment:

What is missing: North, South, East?

Notify me of follow-up comments?