The Canonical Data Model

By Malcolm Chisholm on March 8, 2010

When I started data modeling it never really occurred to me that I was doing anything more than designing databases. Of course, designing databases is an extremely important and valuable activity, but there is more that can be done with data models. Today, the focus of data management, if not IT, is on the integration of data in order to get value from data. This is a massive paradigm shift from the attitude that IT’s mission is to automate business processes by building transaction applications. And everyone is finding it very difficult.

I believe that data models can play an extremely important role in helping to achieve data integration, and it is a role that is rather different to designing physical databases. Oddly, my attention was drawn to this by reading I had to do for a project involving messaging. Middleware may seem an unlikely area to have any involvement with data models, but it really does.

Middleware delivers messages from senders to recipients in near real time. One of the common strategic architecture options for middleware is to build an Enterprise Service Bus (ESB). This is like a common highway in the enterprise along which all messages flow. This is architecturally superior to having many single point-to-point messaging interfaces between applications. Point-to-points are often built by undisciplined programmers who are essentially hackers. They tightly couple the applications and prevent them from evolving as the business changes.

An ESB, by contrast, is supposed to loosely couple applications, but to do that it must have common message formats. In a point-to-point environment, an entity like Account will have many (perhaps hundreds of) unique, incompatible, and undecipherable message formats. In an ESB there should be one. But this requires discipline. If the hackers (sorry, “programmers”) run wild they will simply recreate virtual point-to-points within the ESB for each particular linkage between a producer and a consumer. The value of the ESB is then largely lost.
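
To make the contrast concrete, here is a minimal Python sketch of a single common format for Account. Everything in it is an assumption made for illustration: the `CanonicalAccount` fields, the "billing" and "CRM" application formats, and the helper names are hypothetical and do not come from any particular ESB product.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class CanonicalAccount:
    """Hypothetical canonical form of an Account message on the bus."""
    account_id: str
    holder_name: str
    balance_cents: int


def from_billing(msg: dict) -> CanonicalAccount:
    # Hypothetical billing format: {"acct_no": 42, "cust": "Ada", "bal": "12.50"}
    return CanonicalAccount(
        account_id=str(msg["acct_no"]),
        holder_name=msg["cust"],
        balance_cents=int(Decimal(msg["bal"]) * 100),
    )


def to_crm(account: CanonicalAccount) -> dict:
    # Hypothetical CRM format: {"id": "42", "name": "Ada", "balance": 12.5}
    return {
        "id": account.account_id,
        "name": account.holder_name,
        "balance": account.balance_cents / 100,
    }


def point_to_point_translators(n_apps: int) -> int:
    """Every ordered pair of applications needs its own translator."""
    return n_apps * (n_apps - 1)


def canonical_translators(n_apps: int) -> int:
    """Each application needs one adapter into the canonical form and one out."""
    return 2 * n_apps
```

With ten applications exchanging Account data in both directions, the point-to-point count is 90 translators against 20 adapters for the canonical approach. The billing system never needs to know the CRM format; it only needs `from_billing`, and the CRM only needs `to_crm`.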

What the messaging gurus recommend is to build a canonical data model (CDM). The term “canon” is used in ecclesiastical regulation, and many people think it means something like “blessed”. In fact, “canonical” is derived from the Greek kanon, i.e. “a rule or practical direction”. It shares its root concept with “normalized”, which means “according to rule”. The canonical data model (for messaging) is a set of XML schemas from which all message schemas can be derived.

So how do you get to a canonical data model? The answer given by the messaging gurus is to talk to the data modelers and derive it from the enterprise data model. Thus the major use case for an enterprise data model today is not for instantiating databases, but to facilitate messaging. It is a use case all enterprise data modelers should embrace.
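
As a rough illustration of what deriving a schema from the enterprise data model could look like, the Python sketch below turns one entity definition into a minimal XSD using only the standard library. The `ENTERPRISE_MODEL` dictionary, its attributes, and the `entity_to_xsd` helper are hypothetical stand-ins; a real enterprise model would live in a modeling tool or repository and carry far more detail (cardinality, keys, relationships).

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)

# Hypothetical slice of an enterprise data model:
# entity name -> list of (attribute name, XSD built-in type)
ENTERPRISE_MODEL = {
    "Account": [
        ("AccountId", "xs:string"),
        ("HolderName", "xs:string"),
        ("OpenedOn", "xs:date"),
    ],
}


def entity_to_xsd(entity: str, model: dict = ENTERPRISE_MODEL) -> str:
    """Build a minimal XSD: one complex type plus one element for the entity."""
    schema = ET.Element(f"{{{XS}}}schema")
    complex_type = ET.SubElement(
        schema, f"{{{XS}}}complexType", name=f"{entity}Type"
    )
    sequence = ET.SubElement(complex_type, f"{{{XS}}}sequence")
    for attr_name, xsd_type in model[entity]:
        ET.SubElement(sequence, f"{{{XS}}}element", name=attr_name, type=xsd_type)
    # Top-level element that message schemas could reference or extend.
    ET.SubElement(schema, f"{{{XS}}}element", name=entity, type=f"{entity}Type")
    return ET.tostring(schema, encoding="unicode")


print(entity_to_xsd("Account"))
```

The point of the sketch is the direction of derivation: the entity definition is the source, and the schema is generated output that can be regenerated whenever the model changes.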

About the Author

Malcolm Chisholm, Ph.D. has over 25 years of experience in enterprise information management and data management and has worked in a wide range of sectors. He specializes in setting up and developing enterprise information management units, master data management, and business rules.

George McGeachie
March 15, 2010

I agree wholeheartedly, Malcolm. A big stumbling block so far is the poor integration with XML Schemas (XSDs) provided by most data modelling tools.  For successful management of XML, these tools must provide a dedicated XML modelling facility, completely integrated with the generation and traceability facilities provided, allowing us to forward-engineer XSDs in a controlled, repeatable fashion.  Most data modelling tools do allow us to generate XSDs, and keep the settings we used to generate them, but keep no record of what we’ve generated and when.  The saved settings allow us to repeat the process, but there is no traceability, nothing we can use to provide impact analysis.
It’s analogous to generating Oracle database schemas directly from Logical Data Models, without using a dedicated Physical Data Model to describe the schema.  If we did that, we’d never be able to tell where our data is in databases, and it would be difficult to tell if the schema has been tinkered with since it was generated.  As Yoda might say, “Back in the Dark Ages, we would be”.
Some UML tools use a special profile to model XML schemas, and one mainstream data modelling tool I know of has a dedicated XML Model, which can be generated from a relational Physical Data Model, and will soon allow XML models to be generated from Conceptual and Logical Data Models.  Similar capabilities are provided by at least one mainstream metadata repository product.  Where are the rest of the data modelling tool vendors?  They’re letting Information Management down.
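
The missing generation record described in this comment can be pictured with a small Python sketch. It is only an illustration of the idea under stated assumptions: the `record_generation` and `is_unmodified` helpers and the log layout are hypothetical and are not features of any modelling tool mentioned here.

```python
import hashlib
from datetime import datetime, timezone


def fingerprint(xsd_text: str) -> str:
    """SHA-256 digest of the schema text as it was generated."""
    return hashlib.sha256(xsd_text.encode("utf-8")).hexdigest()


def record_generation(xsd_text: str, source_entity: str, log: list) -> dict:
    """Append a record of what was generated, from which entity, and when."""
    entry = {
        "source_entity": source_entity,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "sha256": fingerprint(xsd_text),
    }
    log.append(entry)
    return entry


def is_unmodified(xsd_text: str, entry: dict) -> bool:
    """True if the schema still matches what was generated."""
    return fingerprint(xsd_text) == entry["sha256"]
```

A log like this answers two of the questions raised above: which model element a schema came from, and whether the schema has been edited by hand since it was generated. It does not replace a dedicated XML model, which is what full impact analysis would need.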

Manoj
April 18, 2010

You wrote in your article above: “So how do you get to a canonical data model? The answer given by the messaging gurus is to talk to the data modelers and derive it from the enterprise data model.”
Can one go the other way? That is, if there is a well-defined, industry-specific messaging schema, can it be used to build your data model?

George McGeachie
April 19, 2010

Manoj

Since I made my earlier comment, I’m now in almost the position you describe.  We’re going to build our own enterprise LDM, and ‘magically’ map it to the OAGIS message schemas.  That’ll probably mean that we can’t magically create or update the XML schemas, but I would like to manage a single XML model if I can, so we don’t rely on text searching across a multitude of XML Schemas for impact analysis and change control.

Francisco Correia
August 17, 2010

Hi, Malcolm.

I’m the Brazilian who buys your books; the third one is on its way at the moment.
Back in 1990, Datamation published (in the October and November issues) a report written by (believe it!) Edgar F. Codd on the subject “Mastering the Art of Database Fusion”. Have you ever read them? I think they are great material on the subject of this post: data integration, definitions, canonical models.
