Alec Sharp

What’s in a name?

By Alec Sharp on April 19, 2010

My last two posts touched on the “human side of data modeling,” so it was time to roll up the proverbial sleeves and get into some “real” data modeling issues. I started to write about a favourite topic, generalization, which I thought would interest almost everyone. Less-experienced data modelers often generalize poorly, or not at all, while many experienced modelers overdo it, which isn’t much better. “Yes,” I thought to myself, “I have a winning post on my hands.”

Too bad that’s not where it ended up. While working through an example that involved an associative entity resulting from a recursive M:M relationship, I explained why “Part” would be a better name than “Parts List.” Before I knew it, I was following the “entity naming” thread and, just like a thread pulled from a cheap suit, my original post unraveled. Hence this month’s topic: naming the entities in a data model. Like generalization, this is often handled poorly by new and experienced modelers alike.

We’ll look at three principles for entity naming, followed by a rogues’ gallery of bad (but common!) patterns in entity names. Thanks to my friend Karen Lopez (twitter.com/datachick) for pointing out some of those poor entity names.

What’s in a name?

Entity naming is the most fundamental activity in data modeling, because nothing is more basic than establishing what the entities are, and that can only begin with naming the possibilities. How can you decide whether or not “site” is a necessary entity without first giving the concept a name, Site, to discuss? In practice you’ll probably have several candidate names (Site, Location, Campus, Facility, etc.), but choosing the best one is a topic for that post on generalization that I didn’t quite write.

A quick point of clarification: some would argue that we should be talking about naming entity types as opposed to naming entities. For instance, Writer is an entity type (a kind of thing that we need to keep track of), while Malcolm Chisholm, Steve Hoberman, and Tom Haughey are entity instances (individual occurrences of an entity type). Strictly speaking, entity type and entity instance are the correct terms, but most data modelers simply refer to entities and instances.

Somewhere, I’m sure, there is a graduate-level philosophy course on naming things, but we’ll confine ourselves to three core principles that data modelers need to keep in mind:

 1 – An entity should be named with a singular noun.

The trick I use to come up with a good name, or to test a suggestion, is simply to ask “what is one of these things?” Not “what is the whole set of them called?” or “what do you intend to produce with them?” (such as a list or report), just “what is one of them?” There are two reasons to insist on the singular. The first is that the definition you eventually write for an entity will describe the criteria a single instance must meet in order to qualify. Consider the entity Employee, which I have seen named Staff, Workforce, or some other collective (not singular) noun. That’s a problem, because if your definition begins with “A Staff is a…” or “A Workforce is a…” then you’re defining the set of instances, not what it takes to qualify as a single instance.

The second reason arises when you’re creating assertions that need to refer to single instances. For instance, it might be true that “a Staff may be responsible for one or more Dependants” and “Dependants are the responsibility of one Staff,” but that isn’t nearly specific enough if the actual rule is “a Dependant is the responsibility of one and only one Employee.” Being precise with assertions inevitably uncovers new requirements, for instance that a Dependant can actually be the responsibility of multiple Employees. Here’s another trick: when stating relationship cardinality assertions (always in both directions!), begin the phrase with the word “each,” and emphasize it when you’re reviewing the model with subject matter experts. For instance, “Each Employee may be responsible for one or more Dependants” and “Each Dependant must be the responsibility of one and only one Employee.” Start with the singular!
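To make the “each” habit harder to skip, the two-direction rule can be captured in a few lines of code. The following is a minimal Python sketch, not a feature of any modeling tool: the `Entity` and `Relationship` classes and the `assertions()` function are hypothetical names introduced for illustration, and the sketch assumes each direction of a relationship is described by a verb phrase, an optionality (may/must), and a cardinality (one and only one / one or more). Because the function always returns a pair, a relationship can’t be stated in only one direction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str    # singular noun: the answer to "what is one of these things?"
    plural: str  # used only inside "one or more ..." phrasing


@dataclass(frozen=True)
class Relationship:
    # Hypothetical structure: a verb phrase, optionality, and cardinality per direction.
    from_entity: Entity
    to_entity: Entity
    forward_phrase: str
    forward_optional: bool  # True -> "may", False -> "must"
    forward_many: bool      # True -> "one or more", False -> "one and only one"
    reverse_phrase: str
    reverse_optional: bool
    reverse_many: bool


def _sentence(subject: Entity, phrase: str, optional: bool, many: bool, target: Entity) -> str:
    """Build one assertion that starts with "Each" and a singular entity name."""
    modality = "may" if optional else "must"
    quantity = f"one or more {target.plural}" if many else f"one and only one {target.name}"
    return f"Each {subject.name} {modality} {phrase} {quantity}."


def assertions(rel: Relationship) -> tuple[str, str]:
    """Return the assertion in both directions, so neither can be forgotten."""
    forward = _sentence(rel.from_entity, rel.forward_phrase,
                        rel.forward_optional, rel.forward_many, rel.to_entity)
    reverse = _sentence(rel.to_entity, rel.reverse_phrase,
                        rel.reverse_optional, rel.reverse_many, rel.from_entity)
    return forward, reverse


employee = Entity("Employee", "Employees")
dependant = Entity("Dependant", "Dependants")
responsibility = Relationship(
    from_entity=employee, to_entity=dependant,
    forward_phrase="be responsible for", forward_optional=True, forward_many=True,
    reverse_phrase="be the responsibility of", reverse_optional=False, reverse_many=False,
)

for sentence in assertions(responsibility):
    print(sentence)
```

Setting `reverse_many=True` turns the second sentence into “Each Dependant must be the responsibility of one or more Employees,” which is exactly the kind of rule change that precise, singular assertions tend to surface during a review with subject matter experts.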

By the way, I know that Employee may not be general enough because of contractors and interns, so Worker might be better, and it’s probably a role played by a Person, but again, those discussions belong to the generalization topic. Also, you might persuade me that a table in a relational database should be named in the plural, because the table represents a collection of all known instances. Or you might not, but it doesn’t matter, because we’re talking about data models, not databases.

2 – That singular noun should be specific, business-oriented, and recognizable.

This is probably self-evident, so I’ll keep it brief:

Specific because a data model is a communication tool, and the more ambiguity we can drive out, the better. Your model might include different sorts of assignments, so Position Assignment and Project Assignment are more specific than just Assignment.

Business-oriented because you’re naming business entities, not trying to follow SQL table-naming standards. Ent-Emp01-Dev doesn’t cut it from a business perspective.

Recognizable because the gist of the model should be evident to most audiences without their having to read all the definitions. Try to use terms that are familiar in the model’s environment. This is tricky, because the common, recognizable terms you have to choose from probably carry some loaded meaning, but you have to try. Just because you’re having trouble distinguishing Program, Project, and Phase doesn’t mean you can avoid the issue by settling on Coordinated Work Effort, a term that no one will recognize.

3 – The entity name must refer to the essence, not the implementation.

This takes some practice, but always keep in mind that you are naming the thing itself, not an artifact of how it is currently recorded. Remember the analyst’s dictum: what, not how. For example, the name Employment Application Form might imply a paper form, but that’s restrictive, because the application might arrive in an XML message, a voice communication, or a telepathic thought transmission. The essence (“what is one of these things?”) is an Employment Application. Note that we seem to have circled back to generalization again…
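One way to see the difference in code is to name the structure after the essence and push the implementation detail (how a particular application happened to arrive) down into an attribute. This is a minimal Python sketch under stated assumptions: the `SubmissionChannel` values and the attribute names are hypothetical, chosen only to mirror the examples above, and a real model would settle them through the usual definition and review work.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class SubmissionChannel(Enum):
    # Hypothetical channels: how an application arrived, not what it is.
    PAPER_FORM = "paper form"
    XML_MESSAGE = "XML message"
    VOICE = "voice communication"


@dataclass(frozen=True)
class EmploymentApplication:
    # Named for the essence ("what is one of these things?"),
    # not for the paper form it may once have been recorded on.
    applicant_name: str
    position_applied_for: str
    received_on: date
    channel: SubmissionChannel  # the implementation detail lives here


applications = [
    EmploymentApplication("R. Singh", "Analyst", date(2010, 4, 1), SubmissionChannel.PAPER_FORM),
    EmploymentApplication("L. Chen", "Analyst", date(2010, 4, 2), SubmissionChannel.XML_MESSAGE),
    EmploymentApplication("M. Okafor", "Analyst", date(2010, 4, 3), SubmissionChannel.VOICE),
]

# Every instance is an Employment Application, however it was submitted.
for app in applications:
    print(f"{app.applicant_name}: Employment Application received via {app.channel.value}")
```

With this shape, supporting a new way of submitting applications means adding an enum member rather than renaming or cloning the entity.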

 Specific names to avoid

There are four specific variations on entity names that we see often but that should be discouraged. Using “Customer” as the starting point, the rogues’ gallery is:

1.  Customer Record (or Customer Master or Customer Header or …)

Wrong, wrong, and wrong. The entity is Customer, which represents the customer (“the thing itself”) not a record of the customer. We’re modeling a business, not a database.

2.  Customer Detail

The evil sidekick of Customer Record/Master/Header. This isn’t specific enough, because it doesn’t make clear what sort of detail it represents: legal structure, billing arrangements, operating locations, …? We must be specific.

3.  Customer Contact List

A single instance is simply a Customer Contact – a list is what you get when you report on multiple instances. Remember the first rule – singular nouns.

4.  Customer History

Everyone hates it when I point this out, because it’s an act of faith for most modelers that an entity recording previous values should have “history” in its name, but that practice should be avoided. Why? Well, just as with “List,” whether the topic is names, addresses, or purchases, a history is a set of multiple instances, not a single one. Worse, the name is misleading, because in a properly normalized model that entity can probably represent past, present, or future values. More accurate (and generalized!) names would be Customer Name, Customer Address, and Customer Purchase.
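A sketch of why “history” undersells the entity: the same Customer Address structure holds past, current, and future addresses once each instance carries an effective period. This is a minimal Python example under stated assumptions; the `effective_from`/`effective_to` attributes, the half-open period convention (the end date is exclusive), and the `period_of()` helper are hypothetical, not a prescribed design.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CustomerAddress:
    # One instance = one address for one Customer over one effective period.
    customer_id: int
    address: str
    effective_from: date
    effective_to: Optional[date] = None  # None means open-ended; the end date is exclusive

    def in_effect_on(self, day: date) -> bool:
        return self.effective_from <= day and (self.effective_to is None or day < self.effective_to)


def period_of(addr: CustomerAddress, today: date) -> str:
    """Classify an instance relative to `today` as past, current, or future."""
    if addr.effective_to is not None and addr.effective_to <= today:
        return "past"
    if addr.effective_from > today:
        return "future"
    return "current"


addresses = [
    CustomerAddress(1, "12 Old Road", date(2005, 1, 1), date(2009, 6, 1)),
    CustomerAddress(1, "34 Main Street", date(2009, 6, 1), date(2010, 7, 1)),
    CustomerAddress(1, "56 New Avenue", date(2010, 7, 1)),  # already known, not yet in effect
]

today = date(2010, 4, 19)
for addr in addresses:
    print(f"{addr.address}: {period_of(addr, today)}")
```

Calling this entity Customer Address History would hide the third instance, the future address, which is exactly the case the “history” name gets wrong.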

 Wrapping up

In the past couple of weeks I’ve had very successful consulting engagements with two major corporations where my focus was on business processes. In both cases, a key part of how I helped was forcing precision in language by applying the guidelines I’ve covered in this post. Imagine how helpful they can be when you apply them to data modeling!


About the Author

Alec Sharp has managed his consulting and education business, Clariteq Systems Consulting Ltd., for close to 30 years. Serving clients from Ireland to India, and Washington to Wellington, Alec has expertise in a rare combination of fields: data management, business analysis, business process improvement, and enterprise architecture.

Karen Lopez
April 20, 2010

Great post! I’d also like to add that most models should avoid terminology that is overly local if the model applies to international data. For instance, entities such as

- Food Stamp Program (if it includes other such programs)
- State (if it includes other locations such as Provinces, too)

*may* be misleading. 

The localization naming issue really comes into play when we get to attribute or column names. I see columns like “dollar amount” when clearly there are going to be all kinds of currency amounts included.

Maybe a blog post is coming on attribute/column names?

Alec Sharp
April 28, 2010

Great to hear from you, Karen, especially since a Twitter conversation with you inspired the post!
Let me add one more, which shows up on so many US websites and systems and drives people from every other country on the planet crazy:
- Zip Code (which is a local name for a Postal Code, which is what it’s called everywhere else.)

A post on column naming sounds like a great idea; I’ll start collecting notes. Before that, though, I’ll do one on relationship naming, inspired by the 800-entity model I reviewed on which *every* relationship had the same name. Stay tuned for more!

Alec
