David Loshin

Trustworthy Reports vs. Improved Decision-Making: When Do You Blame the Data?

By David Loshin on November 7, 2011

The flip side of promoting the benefits of business intelligence highlights the scenarios in which the opposite occurs: in spite of the existence of a data warehouse, there are no measurable improvements in the performance of the business processes or their associated performance metrics. Yet we have been conditioned to believe that business intelligence creates the opportunity for presenting actionable knowledge that can lead to measurable business value. So then what do you do when the data warehouse is not delivering on its promise?

To explore the answer to this question, let me take a brief detour. One of our consulting practice’s service offerings is providing in-house training for best practices in data management, such as data quality, master data management, business intelligence, and data governance. I have incorporated some interactive components into that training, often including a survey with questions about the perceived business value of the topics under discussion, with a variety of buzz-term-ish selections such as “improved customer service,” “increased revenues,” “decreased operational costs,” and “improved IT-business collaboration.”

One interesting phenomenon is the frequency of certain types of answers regarding the use of a data warehouse for BI purposes, and two are of particular interest: “increased trust in generated reports” and “improved decision-making.” And if you read my previous blog post about differentiating semantics for commonly-used concepts, you might not be surprised that in many environments, those two answers are often equated. Namely, the value of {improved data quality, master data management, or data governance} is taken to be that business users will trust BI reports and will therefore make better decisions. Of course, this is the kind of semantic ambiguity I am secretly hoping for, since it motivates two critical aspects of the discussion.

The first is the question of metrics and measurement. When I am told that there is a desire to improve the level of trust in generated reports, I typically ask what the current level of trust is and what the expectation is for improvement. More to the point, I am asking whether there is an existing set of measures associated with the quality and usability of the data in the reports, what the current measure scores are, and what scores would be considered “acceptable” as far as trust is concerned. But often the discussion clarifies the fact that the reason the business users don’t trust the data warehouse is that the numbers showing up in reports coming out of a data warehouse are inconsistent with each other, or are inconsistent with reports from the originating operational systems.

Often the root cause of these inconsistencies is not the data values themselves, but the semantic concepts in which the values are used. Again, referring back to my previous post: we might use the term “customer” in five different ways, and the corresponding counts make sense in their original contexts. But merging and aggregating “customer” data from the five different sources is probably going to introduce some inconsistency unless everyone has agreed to the use of a specific definition for customer and they retain a level of consistency across all uses. Consistent semantics, collaboration and agreement, monitoring consistency: these are all good ideas that influence both data design/information architecture and oversight of the processes that use the data.
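To make the “customer” example concrete, here is a minimal Python sketch. Everything in it is hypothetical and for illustration only: the `Account` record, its fields, and the three departmental definitions are assumptions, not taken from any real system. It shows how the same records can produce three different “customer” counts, each sensible in its own context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    """Hypothetical account record; all field names are illustrative assumptions."""
    party_id: str        # the person or organization behind the account
    has_purchased: bool  # at least one completed purchase
    is_active: bool      # account is currently open


# Made-up sample data, not drawn from any real source system.
ACCOUNTS = [
    Account("p1", True, True),
    Account("p1", True, True),    # same party holding a second active account
    Account("p2", False, True),   # registered, but has never purchased
    Account("p3", True, False),   # purchased in the past, account now closed
    Account("p4", False, False),  # lapsed prospect
]


def count_customers_sales(accounts):
    """Assumed sales definition: distinct parties that have ever purchased."""
    return len({a.party_id for a in accounts if a.has_purchased})


def count_customers_billing(accounts):
    """Assumed billing definition: every active account is a customer."""
    return sum(1 for a in accounts if a.is_active)


def count_customers_marketing(accounts):
    """Assumed marketing definition: every distinct known party."""
    return len({a.party_id for a in accounts})


counts = {
    "sales": count_customers_sales(ACCOUNTS),
    "billing": count_customers_billing(ACCOUNTS),
    "marketing": count_customers_marketing(ACCOUNTS),
}
print(counts)
```

None of the three counts is wrong in its own context. The inconsistency appears only when they are reported side by side under the single label “customer.”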

This suggests that there remains a great need for understanding information architecture in the context of data warehousing. This goes beyond collecting source table structure metadata (that is, data element names, sizes, and types). If you want to foster a recognition of “trust” in the data, it begins with influencing the users of that data to consider what they want to be looking at based on the questions they want to be able to answer. Those expectations should be translated into clearly-defined specifications for the data warehouse, and directly integrated into the design and development of the data warehouse model from the bottom up: business term definitions, conceptual data domains, data element concepts, and how those notions are accumulated, represented, and then presented at the data element level as well as related at the entity model level to provide a coherent and consistent view. And despite claims from some analytics vendors suggesting that your analysis processes can bypass an underlying data warehouse, there is no doubt that the absence of an agreed-to data model for analysis will allow inconsistency, and consequently confusion, to set in.

The second aspect, which often raises greater questions about data use, is what you could call “causality.” Recall the presumption: we want to improve decision-making by improving the level of trust in the data. We want to improve the level of trust in the data through managing consistency of meaning and values. There is a presumption here that individuals making bad decisions can blame their bad decisions on the quality of the data, implying that the bad decisions are a result of the bad data.

But what if decisions are bad because the person making decisions is not good at decision-making? It is usually easy to blame the data, and in the absence of clearly-defined measures of “quality” (and therefore “trust”), a bad decision-maker can defer responsibility for decisions almost indefinitely. So let’s go back to my original question: when do you blame the data? A sound practice is to yet again differentiate between what is measurable and what is inferable. And that suggests some questions that can be used by the data management teams to engage the business users to define quantifiable measures of “trust,” such as:

- What are the specific characteristics of the reports generated from the data warehouse that are inconsistent with what you expected to see?

- Are there specific definition differences between what is modeled in the original source systems and the corresponding concepts in the data warehouse?

- What measures and levels of acceptability constitute a characterization of “trustworthiness”?
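One way to turn the last question into something measurable is sketched below in Python. It is only an illustration under stated assumptions: the metric names, the sample figures, the 1% tolerance, and the 95% acceptability threshold are all hypothetical, and `reconciliation_rate` is an invented helper rather than part of any library. The idea is to compare figures reported from the warehouse against the same figures from the originating system and report the share that agree.

```python
def reconciliation_rate(warehouse, source, tolerance=0.0):
    """Share of metrics present in both inputs whose warehouse value is
    within a relative tolerance of the source-system value."""
    shared = warehouse.keys() & source.keys()
    if not shared:
        raise ValueError("no metrics in common to compare")
    matches = 0
    for name in shared:
        expected = source[name]
        allowed = abs(expected) * tolerance  # permitted absolute difference
        if abs(warehouse[name] - expected) <= allowed:
            matches += 1
    return matches / len(shared)


# Hypothetical figures, made up for illustration.
source_report = {
    "customer_count": 1000,
    "monthly_revenue": 250000.0,
    "open_orders": 420,
}
warehouse_report = {
    "customer_count": 1180,       # differs, e.g. a different "customer" definition
    "monthly_revenue": 250000.0,
    "open_orders": 421,           # off by one, inside a 1% tolerance
}

ACCEPTABLE_RATE = 0.95  # assumed threshold, to be agreed with business users

rate = reconciliation_rate(warehouse_report, source_report, tolerance=0.01)
is_acceptable = rate >= ACCEPTABLE_RATE
print(f"reconciliation rate: {rate:.2f}, acceptable: {is_acceptable}")
```

A score like this does not say whether any decision was a good one. It only gives the data management team and the business users an agreed number to argue about in place of a gut feeling.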

Lastly, use the answers to these questions to understand where business terms, definitions, and structural specifications are inconsistent, and let that understanding drive improvements in the underlying information models. But also use the answers to truly clarify the subtle differences between trustworthy data, which can be somewhat controlled by the data management teams, and improved decision-making, which is probably beyond the control of the data folks.

Is it a coincidence that both aspects demand good practices for managing the underlying data models? Actually, not really – the distinction between these two aspects is in interpretation. The first aspect is a perception of the values of the data while the second is an expectation of the value of that data’s quality. Both are typically gut feelings, and instituting standards for the models helps to migrate from unmeasured feelings to measured trust.


About the Author

David Loshin, president of Knowledge Integrity, Inc, is a recognized thought leader and expert consultant in the areas of data quality, master data management, and business intelligence.
