Welcome!

Welcome to Data Modeling Made Simple with CA ERwin Data Modeler r8. For many of you, this is your first introduction to data modeling. For others, it might be a refresher, or a guideline on how to use the CA ERwin Data Modeler tool. This web page has examples and template files that accompany the book, so you can use both in tandem to learn data modeling with CA ERwin Data Modeler. Our goals are to (1) make it simple (2) make it informative and (3) have some fun. If you’ve met either one of us in person, you’ll know that we like to keep things lighthearted. If we must work, let’s work hard, but keep it fun. Data modeling is more than a job or a career - it is a mindset, an invaluable process, a healthy addiction, a way of life. Remember to Keep It Simple, and enjoy the ride!

Overview

Data Modeling Made Simple with CA ERwin Data Modeler r8 will provide the business or IT professional with a practical working knowledge of data modeling concepts and best practices, and how to apply these principles with CA ERwin Data Modeler r8. You’ll build many CA ERwin data models along the way, mastering first the fundamentals and later in the book the more advanced features of CA ERwin Data Modeler. This book combines real-world experience and best practices with down to earth advice, humor, and even cartoons to help you master the following ten objectives:

  1. Understand the basics of data modeling and relational theory, and how to apply these skills using CA ERwin Data Modeler
  2. Read a data model of any size and complexity with the same confidence as reading a book
  3. Understand the difference between conceptual, logical, and physical models, and how to effectively build these models using CA ERwin’s Data Modelers Design Layer Architecture
  4. Apply techniques to turn a logical data model into an efficient physical design and vice-versa through forward and reverse engineering, for both ‘top down’ and bottom-up design
  5. Learn how to create reusable domains, naming standards, UDPs, and model templates in CA ERwin Data Modeler to reduce modeling time, improve data quality, and increase enterprise consistency
  6. Share data model information with various audiences using model formatting and layout techniques, reporting, and metadata exchange
  7. Use the new workspace customization features in CA ERwin Data Modeler r8 to create a workflow suited to your own individual needs
  8. Leverage the new Bulk Editing features in CA ERwin Data Modeler r8 for mass metadata updates, as well as import/export with Microsoft Excel
  9. Compare and merge model changes using CA ERwin Data Modelers Complete Compare features
  10. Optimize the organization and layout of your data models through the use of Subject Areas, Diagrams, Display Themes, and more

Section I provides an overview of data modeling:  what it is, and why it is needed. The basic features of CA ERwin Data Modeler are introduced with a simple, easy-to-follow example.

Section II introduces the basic building blocks of a data modeling, and how they are used, including entities, relationships, keys, and more. How-to examples using CA ERwin Data Modeler are provided for each of these building blocks, as well as ‘real world’ scenarios for context.

Section III covers the creation of reusable standards, and their importance in the organization. From standard data modeling constructs such as domains to CA ERwin-specific features such as UDPs, this section covers step-by-step example of how to create these standards in CA ERwin Data Modeling, from creation, to template building, to sharing standards with end users through reporting and queries.

Section IV discusses conceptual, logical, and physical data models, and provides a comprehensive case study using CA ERwin Data Modeler to show the interrelationship between this models using CA ERwin’s Design Layer Architecture. Real world examples are provided from requirements gathering, to working with business sponsors, to the hands-on nitty-gritty details of building conceptual, logical, and physical data models with CA ERwin Data Modeler r8.

From the Foreword by Tom Bilcze, President, CA Technologies Modeling Global User Community:

Data Modeling Made Simple with CA ERwin Data Modeler r8 is an excellent resource for the ERwin community. The data modeling community is a diverse collection of data professionals with many perspectives of data modeling and different levels of skill and experience. Steve Hoberman and Donna Burbank guide newbie modelers through the basics of data modeling and CA ERwin r8. Through the liberal use of illustrations, the inexperienced data modeler is graphically walked through the components of data models and how to create them in CA ERwin r8.

As an experienced data modeler, Steve and Donna give me a handbook for effectively using the new and enhanced features of this release to bring my art form to life. The book delves into advanced modeling topics and techniques by continuing the liberal use of illustrations. It speaks to the importance of a defined data modeling architecture with soundly modeled data to assist the enterprise in understanding of the value of data. It guides me in applying the finishing touches to my data designs.

Chapter 1: What is a data model?

This chapter introduces the concept of a data model using a map analogy, and explains how the data model is such a powerful wayfinding tool for multiple audiences at three different levels of detail (conceptual, logical, and physical). 

Excerpt from this chapter:

"I gave the steering wheel a heavy tap with my hands as I realized that once again, I was completely lost. It was about an hour before dawn, I was driving in France, and an important business meeting awaited me. I spotted a gas station up ahead that appeared to be open. I parked, went inside, and showed the attendant the address of my destination.

I don’t speak French and the attendant didn’t speak English. The attendant did, however, recognize the name of the company I needed to visit. Wanting to help and unable to communicate verbally, the attendant took out a pen and paper. He drew lines for streets, circles for roundabouts along with numbers for exit paths, and rectangles for his gas station and my destination, MFoods...."

Chapter 2: Why do we need a data modeling tool?

This chapter explains the benefits and high level features of a data modeling tool.

Excerpt from this chapter:

"In our example from Chapter 1, we were in a small town in France, where a pencil and paper were enough to draw a simple wayfinding map using symbols and pictures to get the point across. Now imagine if we visited a client in Beijing—a sprawling, metropolitan city that moves at a rapid pace. There is no time to stop at a gas station and have a conversation with the attendant in order to ask directions. With cars and bicycles zipping past us, we need to make a decision quickly regarding which road to take. Unfortunately, the road signs are of little help, since we never studied Chinese in school! ... "

Chapter 3: Getting Started with CA ERwin Data Modeler

This chapter provides an overview of the key features of CA ERwin Data Modeler, along with a step-by-step example of creating your first data model.

Excerpt from this chaper:

"CA ERwin Data Modeler has a long history within the data modeling community, and has evolved to include a wealth of features and functions to support the data management professional. It’s beyond the scope of an introductory book to cover all of the features of the tool. In this book, we’ll provide an overview of the main features of the product that are important to the typical data management professional. We’ll outline them in this chapter, and indicate where we’ll take a deeper look at each feature later in the book. In this chapter, we’ll walk you through a simple example of creating your first data model with CA ERwin Data Modeler, so you’ll be familiar with the basic features before we go into greater detail...."

Chapter 4: What are entities?

This chapter discusses what an entity is, how and why it is used, and how to create an entity in CA ERwin Data Modeler.

Excerpt from this chapter:

"As I walked around the room to see if any students had questions, I noticed someone in the last row had already finished the exercise. I walked over to where she was sitting and looking over her shoulder, noticed only a handful of boxes on the page. The large box in the center contained the word ‘Manufacturing’. I asked her for her definition of ‘Manufacturing’. “Manufacturing is the production process of how we turn raw materials into finished goods. All the manufacturing steps are in this box.”

The data model boxes (also known as ‘entities’), however, are not designed to represent or contain processes. Instead, they represent the concepts that are used by the processes. The Manufacturing entity on her model was eventually transformed into several other entities, including Raw Material, Finished Goods, Machinery, and Production Schedule.

This chapter defines the concept of an entity and discusses the different categories (Who, What, When, Where, Why and How) of entities. Entity instances are also defined. The three different levels of entities, conceptual, logical, and physical, are also explained, as well as the concepts of a weak versus a strong entity. We'll conclude this chapter with an example of how to create entities in CA ERwin Data Modeler."

Chapter 5: What are data elements?

This chapter provides an overview of data elements, as well as hands-on examples of how to work with data elements using CA ERwin Data Modeler.

Excerpt from this chapter:

"Data elements add descriptive ‘flavor’ to the entities in your data model. For example, if we have a Customer data entity, what additional information is needed to track or describe customers: name, age, gender, etc?  This chapter defines the concept of a data element and the three different levels at which a data element can exist: conceptual, logical, and physical. Domains and the different types of domains are also discussed. We’ll conclude the chapter with hands-on examples of how to work with elements and domains in CA ERwin Data Modeler."

Here are the domains for each of the following three data elements.

Email Address

Based upon information from Wikipedia:

An e-mail address is a string of a subset of characters separated into 2 parts by an “@”, a “local-part” and a domain, that is, local-part@domain. The local-part of an e-mail address may be up to 64 characters long and the domain name may have a maximum of 255 characters. However, the maximum length of the entire e-mail address is 254 characters.

The local-part of the e-mail address may use any of these characters:

  • Uppercase and lowercase English letters (a-z, A-Z)
  • Digits 0 through 9
  • Characters ! # $ % & ‘ * + - / = ? ^ _ ` { | } ~
  • Character . (dot, period, full stop) provided that it is not the first or last character, and provided also that it does not appear two or more times consecutively.
  • Additionally, quoted-strings (i.e.: “John Doe”@example.com) are permitted, thus allowing characters that would otherwise be prohibited, however they do not appear in common practice.

Gross Sales Amount

A format domain of Decimal(15,4). Both negative and positive numbers are acceptable.

Country Code

As part of the ISO 3166-1993 standard, Country Code is two characters in length, and is a list domain consisting of over 200 values. Here is a partial list:

Code

Definition and Explanation

AD

Andorra

AE

United Arab Emirates

AF

Afghanistan

AG

Antigua & Barbuda

AI

Anguilla

AL

Albania

AM

Armenia

AN

Netherlands Antilles

AO

Angola

AQ

Antarctica

AR

Argentina

AS

American Samoa

AT

Austria

AU

Australia

AW

Aruba

AZ

Azerbaijan

ZM

Zambia

ZR

Zaire

ZW

Zimbabwe

ZZ

Unknown or unspecified country

Just the codes beginning with ‘A’ or ‘Z’ are shown. ‘ZZ’ is an interesting country, and illustrates how easy it is to circumvent a business rule. That is, if we don’t know the country and Country Code is required, we can always assign a ‘ZZ’ for ‘Unknown’. 

Chapter 6: What are relationships?

This chapter provides an overview of relationships, as well as hands-on examples of how to work with relationships using CA ERwin Data Modeler.

Excerpt from this chapter:

"The relationships between data entities are as important as the definitions of entities and elements that we’ve discussed in the previous chapters, and are key to understanding business requirements. This chapter defines rules and relationships and the three different levels at which relationships can exist: conceptual, logical, and physical. Data rules are distinguished from action rules. Cardinality and labels are explained so that you can read any data model as easily as reading a book. Other types of relationships, such as recursive relationships and subtyping are discussed as well. Hands-on examples of how to create the various relationship types in CA ERwin Data Modeler are included..."

EXERCISE: Reading a Model

Recall the model:

Each Employee may process one or many Transactions.

Each Transaction must be processed by one Employee.

Each Transaction may be either an Order, Credit, or Debit.

Each Order is a Transaction.

Each Credit is a Transaction.

Each Debit is a Transaction.

Each Employee may contact one or many Customers.

Each Customer must be contacted by one Employee.

Each Customer may place one or many Orders.

Each Order must be placed by one Customer.

Each Order may contain one or many Order Lines.

Each Order Line must belong to one Order.

Each Product may appear on one or many Order Lines.

Each Order Line must reference one Product.

Each Product may contain one or many Products.

Each Product may belong to one Product.

Chapter 8: Optimizing Model Presentation

This chapter covers Subject Areas, Diagrams, Formatting Themes, and Drawing Objects in CA ERwin Data Modeler, and how you can use these features to enhance the organization and presentation of your models.

Excerpt from this Chapter:

"We’ve just spent several chapters discussing the various technical components that go into building a data model, and learned a bit about the relational theory that help make a data model structurally sound. Although these components are critical to the design of a data model, unless you present the model effectively to your audience, you may not get the buy-in and success you are looking for. Like many things in life, it’s the presentation that matters, and the look and layout of your model can be critical in enhancing your audience’s understanding of the underlying concepts which, ultimately, drives their acceptance of the model and your project in general. This chapter covers Subject Areas, Diagrams, Formatting Themes, and Drawing Objects in CA ERwin Data Modeler, and how you can use these features to enhance the organization and presentation of your models..."

Chapter 7: What are keys?

This chapter provides an overview of relationships, as well as hands-on examples of how to work with relationships using CA ERwin Data Modeler.

Excerpt from this chapter:

"There is a lot of data out there, but how do you sift through it all to find what you’re looking for? That’s where keys come in. Keys allow us to efficiently retrieve data, as well as navigate from one physical table to another. This chapter defines keys and distinguishes between the terms candidate, primary, and alternate keys. Surrogate keys and foreign keys and their importance are also explained. Examples of how to leverage keys in CA ERwin Data Modeler are included..."

There are three terms within this definition that require an explanation: ‘unique’, ‘identifier’, and ‘Customer’.

DOCUMENT UNIQUENESS PROPERTIES

The term ‘unique’ is ambiguous and could easily be interpreted differently by readers of this definition. To maintain clarity and correctness, these questions should be answered within the definition:

  • Are identifier values ever reused?
  • What is the scope of uniqueness?
  • How is the identifier validated?

DOCUMENT THE CHARACTERISTICS OF THE IDENTIFIER

We can describe the actual identifier in more detail including addressing these areas:

  • Purpose. For example, perhaps the identifier is needed because there are multiple source systems for Customer data, each with their own Id. To enable a common set of data to be held about them, this identifier needed to be created to facilitate integration and guarantee uniqueness across all customers.
  • Business or surrogate key. Document whether the identifier is meaningful to the business (i.e. the business or natural key) or whether it is a meaningless integer counter (i.e. the surrogate key).
  • Assignment. Document how a new customer identifier is assigned. The party that is responsible for creating new identifiers should also be mentioned.

DEFINE THE CUSTOMER

Because definitions should stand on their own, we also can define customer within this definition. We can reference the concept’s definition of customer.

Chapter 9: Personalizing Your Workspace

This chapter discusses how to personalize your workspace in CA ERwin Data Modeler to match your own unique workflow and preferences. Topics include customizing the workspace and creating on-demand UI components, modifying toolbars, customizing keyboard shortcuts, and using the Action Log and undo/redo features. 

Excerpt from this Chapter:

"In the last chapter, we discussed how to customize the organization and layout of your models for presenting to the various audiences in your organization. Equally important, however, is the environment that you see and use in your daily workflow of building and maintaining models. CA ERwin Data Modeler has a number of ways to personalize the workspace, toolbar, and workflow options to fit your needs. We’ll cover a number of these in this section... "

Chapter 10: Domains

This chaper covers domains in detail, including the theory behind domain design and usage, as well as practical applications of how to use domains in CA ERwin Data Modeler to help promote enterprise standards and consistency.

Excerpt from this chapter:

"Recall from Chapter 5 that a domain is a complete set of possible values that can be assigned to an element, and that domains can be used for validation criteria. Domains can be used to help create standardized data elements that have common formats, ranges, validation criteria, and even naming conventions.

For example, we may wish to create a reusable domain for the data element first name, shown in Figure 10.1. Once we’ve agreed on a common data type and length, we want to make sure that all attributes use the same criteria. We can also use domains to create naming conventions, and even standardized definitions. For example, you’ll see that all attributes using this domain use the same convention of entity name + domain name: customer first name, employee first name, vendor first name. We can create these customized names and definitions with the help of CA ERwin Data Modeler’s macro language..."

Figure 10.1 Using Domains to Promote Consistency

Chapter 11: Validation Rules

This chapter covers the use of Validation Rules to help enforce the business rules of your organization by establishing a default value, list of values, or a range of acceptable values for an object. Creation of Validation Rules in CA ERwin Data Modeler and their relationship to domains is covered through hands-on examples.

Excerpt from this chapter:

"Validation rules help enforce the business rules of your organization by establishing a default value, list of values, or a range of acceptable values for an object. You can enforce validation rules through a domain, or assign them directly to a table, column, entity, or attribute. Defining validation rules as reusable components helps create consistency across the organization as all groups are able to enforce the same business rules and valid value sets.

The following are some business rules that can be enforced through validation rules:

  • A customer can order coffee in cup sizes of small, medium, or large
  • The age of a contest participant must be between 18 and 65
  • All customers in the loyalty program join at the Silver level before earning enough points to become Gold or Platinum
  • The states in the New England region are: CT, NH, MA, ME, RI, VT."

Chapter 12: Naming Standards

This chapter discusses the usage of Naming Standards to ensure naming consistency across the organization. Specific features related to CA ERwin Data Modeler are covered such as Naming Standards, Validation Rules, Name Hardening, and the Glossary.

Excerpt from this chapter:

"Standardizing the way in which objects are named in your models is important to enforce consistency and reuse across the organization. It is hard enough to produce consistent definitions of core data assets, such as Customer. This task becomes even more difficult when differing naming conventions hide the meaning of objects or make them difficult to locate. For example, there may be objects named Cust, Customer, Cstmr, Client, etc.—do these all have the same meaning, and is there data stored in these objects that should be consolidated?

Enforcing Naming Standards

Naming Standards can be enforced in CA ERwin Data Modeler to help ensure that modelers adhere to a consistent naming convention so that objects are named consistently, helping to promote reuse, increase data quality, and reduce overlap and data redundancy. Tools that assist in the creation of enterprise naming standards in CA ERwin Data Modeler are the Naming Standards Editor and Glossary. Through the Glossary, you can define a list of words and abbreviations that modelers can use to name entities, tables, attributes, columns and domains. Naming Standards can be used in the following scenarios in CA ERwin Data Modeler:

  • Deriving physical names from logical names in a logical /physical model
  • Deriving a physical model from a logical model
  • Deriving a logical model from a physical model
  • Checking naming standard compliance in a logical or physical model

Naming Standards aredeveloped outside of a given data model and stored in a

separate file with the extension .nsm. A naming standards (.nsm) file can then be attached to one or many data models or data model templates so these standards can be enforced across models in the enterprise..."

Chapter 13: Datatype Standards

This chapter covers Data Type Standards to create custom data type mappings from logical to physical models or from one physical platform to another.

Excerpt from this Chapter:

CA ERwin Data Modeler provides default logical-to-physical data type mappings for the major database platforms in the market. If your organization has its own data type mappings that differ from the defaults, you can change these mappings to suit your individual needs by creating a Datatype Standards file, which has the extension of .dsm.

There are two main use cases for creating mappings in a Datatype Standards file:

1. To create mappings between a logical model and one or more physical models

2. To create mappings between physical models, for example, if you are migrating from one database platform to another.

Chapter 14: User-Defined Properties (UDPs)

This chapter discusses User Defined Properties, or UDPs, in CA ERwin Data Modeler, and how they can be used to create custom properties for model objects to store metadata information unique to your environment.

Excerpt from this Chapter:

"A data model contains a wealth of information about an organization’s data, and CA ERwin Data Modeler has a wide range of properties and object types used to describe the details of this data. Since every organization’s needs are unique, however, you may wish to add your own custom properties of the objects in your model. For example, you may wish to track the data steward for a particular entity, the security level of a given table, or attach a specifications document to a model. User-Defined Properties, or UDPs, allow you to create custom properties to add customized metadata documentation to your model."

Chapter 15: Model Templates

This chapter covers Model Templates and how you can use templates to create a set of reusable standards and share them with others across the organization. 

Excerpt from this Chapter:

"Once you’ve created a set of reusable standards in CA ERwin Data Modeler, you’ll want to share them with others across the organization. Model templates are a great way to do this. You can save any model as a template to save its settings and contents for quick reuse. In this way, templates provide the ideal way of maintaining a consistent look and feel across different models, as well as standards for naming, domains, data types, etc. After you save a template, you can use it as the basis for creating new models.

In the past several chapters, we’ve created a number of organizational standards, and we’d like a way to promote the usage of these standards that ensures consistency, but also make it easy for the team to implement. Templates are an ideal way to do this."

Chapter 16: Reporting, Querying, and Editing

This chapter covers the various ways of reporting and querying information in CA ERwin Data Modeler including: SAP Business Objects Crystal Reports, the Query Tool and the ODBC interface, the Bulk Editor, and the metadata bridges.

Excerpt from this Chapter:

"Communicating with the various stakeholders across the organization is a key part of achieving success and buy-in for your data modeling project. After spending time and effort creating the business rules and data structure definitions in CA ERwin Data Modeler, it’s important that this information is shared, in order to achieve the true benefits of reuse and standards. And not everyone needs or wants to read a data model (as surprising as that may be to us!). There are a number of ways to share information using CA ERwin Data Modeler: generating reports from SAP Business Objects Crystal Reports, creating queries into other reporting tools using the ODBC interface, using spreadsheet-style editing with the Bulk Editor, or even exporting metadata directly into other tools using the metadata bridges."

Chapter 17: What are conceptual data models?

This chapter provides an overview of conceptual data models: their definition and use-cases in an organization, and hands-on examples of how to create a conceptual data model. The Baker Cakes example is introduced, which walks through not only the technical steps of creating a conceptual data model, but tips on communicating with your business sponsor as well.

Excerpt from this Chapter:

"Think of the conceptual data model as the roadmap for your organization’s key information concepts. It is a set of symbols and text representing the key concepts and rules binding these key concepts for a specific business or application scope, for a particular audience, that fits neatly on one page[1]. Limiting the conceptual model to one page is important because it forces the modeler and participants to select only key concepts. On one page, for example, we can fit 10 or 20 or 50 concepts, but not 500. A good rule of thumb, therefore, is to ask yourself if the audience for this model would include this concept as one of the top 10 or 20 or 50 concepts in their business. This will rule out concepts that are at too low a level of detail. They will appear in the more detailed logical data model. If you’re having trouble limiting the number of concepts, think about whether or not there are other concepts into which the ones you’re discussing could be grouped. These higher concepts are the ones you should be including in the conceptual data model..."


[1]Don’t get tricky here. By “one page”, we mean 8 ½ x 11, 8 ½ x 14, or similar sized paper, but not a plotter-sized piece of paper!

Chapter 18: What are logical data models?

This chapter provides an overview of logical data models: and overview of their use and function, as well as the relational theory behind important practices such as normalization. The Baker Cakes example is expanded to cover logical data modeling for detailed business requirements and normalization.

Excerpt from this Chapter:

"A logical data model (LDM) is a business solution to a business problem. It is how the modeler captures the business requirements without complicating the model with implementation concerns such as software and hardware.

On the conceptual model, we might learn, for example, what the terms, business rules, and scope would be for a new order entry system. After understanding the requirements for the order entry system, we create a logical model containing all of the data elements and business rules needed to deliver the business solution system. For example, the conceptual model might show that a Customer places many Orders. The logical model would capture all of the details behind Customer and Order, such as the customer’s name, their address, the order number and what is being ordered.

While building the LDM, questions or issues may arise having to do with specific hardware or software such as:

  • How can we retrieve this information in less than 5 seconds?
  • How can we make this information secure?
  • There is a lot of information here. What is the best way to manage storage space?

These questions focus..." 

Chapter 19: What are physical data models?

This chapter covers the theory and practice behind the creation of physical data models including: denormalization, views, indexing, and partitioning including hands-on examples using CA ERwin Data Modeler. Comparison and merging of data models using CA ERwin's Complete Compare feature is also discussed. The Baker Cakes example is used to provide a practical example of physical data modeling.

Excerpt from this Chapter:

"The physical data model (PDM) is the logical data model modified for a specific set of software or hardware. On the CDM, we might learn, for example, what the terms, business rules, and scope would be for a new order entry system. After understanding the need for an order entry system, we create a LDM representing the business solution. It contains all of the data elements and business rules needed to deliver the system. For example, the conceptual model will show that a Customer places many Orders. The LDM will capture all of the details behind Customer and Order, such as the customer’s name, their address, and the order number. After understanding the business solution, we move on to the technical solution and build the PDM.

While building the PDM, we address the issues that have to do with specific hardware or software such as:

  • How can we retrieve this information in fewer than 5 seconds?
  • How can we make this information secure?
  • What is the best way to manage storage space?

Note that in the early days of data modeling..."