How to Model XML
By Steve Hoberman on December 20, 2010
Welcome back. XML is everywhere, and often our only insight into the structure of an application comes through its XML interfaces. I asked in a recent design challenge (if you don’t yet receive these challenges, you can sign up here: www.stevehoberman.com/challenges.htm) whether an XML document such as this one is a logical or physical data model:
<recipe name="bread">
  <ingredient amount="4" unit="cup">Flour</ingredient>
  <ingredient amount="10" unit="tablespoons">Yeast</ingredient>
  <ingredient amount="2" unit="cup">Water</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
</recipe>
The results of this challenge were very interesting (see the results here), and there were two ideas that came out of this discussion that changed the way that I look at an XML document. One idea is distinguishing the XML document itself from its schema, and the other idea is that an XML document gives you half the relationships you would expect from a data model. Let’s talk about each of these ideas.
An XML document like the recipe one above is really an instance of some schema, whether explicitly defined or implied. An XML schema, such as a Document Type Definition (DTD) or an XML Schema Definition (XSD), specifies the rules for the data in an XML document in much the same way that a data model specifies the rules for the data in a database structure. Therefore, without the schema, the XML document represents an instance of data, like an entity instance from a data model. Philip Kelley, one of the people who responded to our design challenge, made this observation: “Your example is a recipe instance, perhaps how to make bread. This is great for describing how to make bread, but it's a sample; it’s not a template or a design model, in that it doesn't describe all the options, restrictions, and other criteria on how to properly build an appropriate document.”
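To make the schema-versus-instance distinction concrete, here is a minimal Python sketch. It is not a DTD or an XSD: the `RECIPE_RULES` dictionary and the `check_recipe` function are hypothetical stand-ins for a schema, and the rules themselves (a required `name`, required `amount` and `unit`, at least one ingredient) are assumptions made for illustration, not anything stated in the design challenge.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a schema: rules every recipe document must obey.
# These rules are assumptions for illustration, not a real DTD or XSD.
RECIPE_RULES = {
    "root": "recipe",
    "root_required_attrs": {"name"},
    "child": "ingredient",
    "child_required_attrs": {"amount", "unit"},
    "min_children": 1,
}

def check_recipe(xml_text, rules=RECIPE_RULES):
    """Return a list of rule violations; an empty list means the instance conforms."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != rules["root"]:
        problems.append(f"root element is <{root.tag}>, expected <{rules['root']}>")
    missing = rules["root_required_attrs"] - set(root.attrib)
    if missing:
        problems.append(f"<{root.tag}> is missing attributes: {sorted(missing)}")
    children = root.findall(rules["child"])
    if len(children) < rules["min_children"]:
        problems.append(f"expected at least {rules['min_children']} <{rules['child']}>")
    for child in children:
        missing = rules["child_required_attrs"] - set(child.attrib)
        if missing:
            problems.append(f"<{child.tag}> '{child.text}' is missing: {sorted(missing)}")
    return problems

# One instance that follows the rules, and one that does not.
BREAD = """<recipe name="bread">
  <ingredient amount="4" unit="cup">Flour</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
</recipe>"""
INCOMPLETE = "<recipe><ingredient>Flour</ingredient></recipe>"

print(check_recipe(BREAD))
print(check_recipe(INCOMPLETE))
```

The rules play the role of the type and each document is an instance checked against it. Both sample documents are well-formed XML, but only the rules can say that the second one is not an acceptable recipe.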
So it really comes down to type versus instance. When I teach data modeling, I often make the analogy that entity instances are like rows in a spreadsheet. We can therefore view an XML file in this spreadsheet format as well:
Entity: Bread

| Ingredient Name | Ingredient Amount | Ingredient Unit |
|-----------------|-------------------|-----------------|
| Flour           | 4                 | cup             |
| Yeast           | 10                | tablespoons     |
| Water           | 2                 | cup             |
| Salt            | 1                 | teaspoon        |
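The spreadsheet view above can be produced mechanically. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the function name `recipe_rows` and the column labels are assumptions chosen to match the table, not part of any published tool.

```python
import xml.etree.ElementTree as ET

RECIPE_XML = """<recipe name="bread">
  <ingredient amount="4" unit="cup">Flour</ingredient>
  <ingredient amount="10" unit="tablespoons">Yeast</ingredient>
  <ingredient amount="2" unit="cup">Water</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
</recipe>"""

def recipe_rows(xml_text):
    """Turn each <ingredient> element into one spreadsheet-style row."""
    root = ET.fromstring(xml_text)
    return [
        {
            "Ingredient Name": (item.text or "").strip(),
            # Amounts stay as strings: the instance alone does not say
            # whether the amount is an integer, a decimal, or free text.
            "Ingredient Amount": item.get("amount"),
            "Ingredient Unit": (item.get("unit") or "").strip(),
        }
        for item in root.findall("ingredient")
    ]

for row in recipe_rows(RECIPE_XML):
    print(row)
```

Each row is one entity instance. Nothing in the rows says which columns are required or what values are allowed, which is exactly the information a schema would add.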
It gets more complicated than this, but this is the idea. So reverse engineering an XML document into a data model often involves quite a bit of guesswork, similar to what an analyst goes through when reverse engineering a legacy application (especially a non-relational one) into a data model. I am currently working on an application where we are studying many XML documents that have no defined schemas, and we often have to make guesses and assumptions about what the implied structure really is.
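Some of that guesswork can be supported by summarizing what a set of sample documents actually contains. The sketch below is one possible approach, not the method used on the project mentioned above: `infer_structure` is a hypothetical helper, and the second sample document (the toast recipe and its `serves` attribute) is invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def infer_structure(documents):
    """Summarize what a set of sample documents implies about each element.

    For every element name, record the attributes seen, and for each child
    element name the largest number of times it repeated under one parent.
    This describes the samples only; it is a guess at the schema.
    """
    summary = {}
    for xml_text in documents:
        for element in ET.fromstring(xml_text).iter():
            info = summary.setdefault(
                element.tag, {"attributes": set(), "max_children": {}}
            )
            info["attributes"].update(element.attrib)  # adds attribute names
            counts = Counter(child.tag for child in element)
            for tag, count in counts.items():
                previous = info["max_children"].get(tag, 0)
                info["max_children"][tag] = max(previous, count)
    return summary

# Hypothetical sample documents; only the bread recipe comes from the post.
SAMPLES = [
    """<recipe name="bread">
         <ingredient amount="4" unit="cup">Flour</ingredient>
         <ingredient amount="10" unit="tablespoons">Yeast</ingredient>
         <ingredient amount="2" unit="cup">Water</ingredient>
         <ingredient amount="1" unit="teaspoon">Salt</ingredient>
       </recipe>""",
    """<recipe name="toast" serves="1">
         <ingredient amount="2" unit="slice">Bread</ingredient>
       </recipe>""",
]

summary = infer_structure(SAMPLES)
for tag, info in summary.items():
    print(tag, sorted(info["attributes"]), info["max_children"])
```

A summary like this narrows the guessing but does not end it. An attribute that appears in only some samples may be optional, or the other samples may simply be incomplete, and a child that never repeats in the samples may still be allowed to.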
The second interesting observation around how XML relates to a data model was made by Norman Daoust in this same design challenge, where he made this statement: “An XML schema can generally be translated into a logical data model. However XML documents frequently only indicate the cardinality of relationships on one end of the relationship, not both ends. The example XML document indicates that a recipe is associated with many ingredients, but doesn’t include any indication of whether an ingredient can be associated with more than one recipe.”
This is a very important observation: the XML document gives only half the relationship. Can a child, such as an ingredient, belong to more than one parent? Maybe yes, maybe no. We need to examine more data, find the XML schema, ask a business expert, use our own experience, or combine several of these.
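"Examine more data" can be sketched as follows: collect several recipe documents and see whether any ingredient turns up under more than one recipe. The helper `parents_per_child` and the pancake document are hypothetical, and matching ingredients by their text is itself an assumption, since two documents could use the same word for different things.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def parents_per_child(documents):
    """Map each ingredient name to the set of recipe names that use it."""
    usage = defaultdict(set)
    for xml_text in documents:
        recipe = ET.fromstring(xml_text)
        for item in recipe.findall("ingredient"):
            usage[(item.text or "").strip()].add(recipe.get("name"))
    return dict(usage)

# Hypothetical sample documents for illustration.
RECIPES = [
    """<recipe name="bread">
         <ingredient amount="4" unit="cup">Flour</ingredient>
         <ingredient amount="1" unit="teaspoon">Salt</ingredient>
       </recipe>""",
    """<recipe name="pancakes">
         <ingredient amount="2" unit="cup">Flour</ingredient>
         <ingredient amount="1" unit="cup">Milk</ingredient>
       </recipe>""",
]

usage = parents_per_child(RECIPES)
# Ingredients seen under more than one recipe suggest a many-to-many relationship.
shared = sorted(name for name, recipes in usage.items() if len(recipes) > 1)
print(shared)
```

Finding a shared ingredient is evidence that an ingredient can belong to many recipes. Finding none proves nothing, because the samples may just be too few, so the schema or a business expert still has the final word.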
Note: I received the following Warnier-Orr diagram from Michael Silves, who saw this notation as a way to depict XML structures (see his related comment below). Great feedback, Michael. Thanks for the diagram.

Until the next blog!
About the Author
Steve Hoberman is one of the world’s most well-known data modeling gurus. He understands the human side of data modeling and has evangelized “next generation” techniques. Steve taught his first data modeling class in 1992 and has educated more than 10,000 people about data modeling and business intelligence techniques since then.
I think the entity here is Recipe. Bread is just an instance of a recipe.
It seems to me that the old Warnier-Orr diagrams provide a very nice way to graphically portray an XML structure. They work best for hierarchical data structures and provide for sequence, iteration, and selection within hierarchies and hierarchy levels. Just like XML.
I have an example I created with Word but this format doesn’t allow text boxes or graphics so I sent it as an email.
I think the entity here is Recipe. Bread is just one instance of the entity.
January 6, 2011