There’s no escaping data’s role in the cloud, and so it’s crucial that we analyze the cloud’s impact on data modeling.
The fourth post in this series concluded with the observation that more and more managed data lakes are being implemented in the cloud. The second post focused on the Internet of Things (IoT), the constellation of web-connected devices, vehicles, buildings, and related sensors and software that the cloud greatly facilitates.
But what exactly is “the cloud,” and what challenges and opportunities does it present to data-driven organizations and their enterprise architects and data modelers?
This post will discuss the cloud and the rapidly proliferating SaaS applications hosted therein, as both a source of data to be modeled and as a platform for cloud-based analytic database offerings like Microsoft’s Azure SQL DW and AWS Redshift, among others.
We’ll also touch on the cloud as a component in hybrid solutions that combine public cloud, private cloud, and more traditional on-premise components. Such a variety of choices necessitates both a clear understanding and management of the data involved and a well-thought-out data and enterprise architecture.
According to Wikipedia, “Cloud computing is a form of Internet-based computing that provides shared computer processing resources and data to computers and other devices on demand. It is a model for enabling ubiquitous access to a shared pool of configurable computing resources (e.g., computer networks, servers, storage, applications and services), which can be rapidly provisioned with minimal management effort.”
Cloud computing solutions can involve a public cloud that is owned and operated by a third party, a privately owned cloud, or a hybrid implementation, which involves elements in one or more public clouds combined with elements in a private cloud or other on-premise components.
The use cases for cloud computing are myriad. Among many others, they include access to a multi-tenant SaaS application such as Salesforce; hosting of an application database (either an RDBMS or a NoSQL/NewSQL data store), a data lake, or an analytic database; and provision of distributed infrastructure for IoT services.
A cloud computing approach provides enterprises with potential economies of scale and opportunities to reduce internal infrastructure management workload.
It also provides flexibility in terms of enterprise architecture, and it facilitates connecting a wide variety of data sources, both inside and outside an organization, to enable business solutions that were previously impossible, overly complex, or too costly to deploy.
At the same time, a cloud approach means that the roles of skilled and experienced enterprise and data architects and data modelers are more valuable than ever in providing an organization with a holistic view of its data.
Enterprise architects and data modelers need to be able to model data and understand the relationships and interconnections between data sets in detail, regardless of where the data is located (on premise, in one of the enterprise’s public or private cloud instances, in a SaaS provider’s multi-tenant public cloud, or in a business partner’s private cloud to which they’ve been granted access) and the type of data store (Hadoop, NoSQL, NewSQL, or a more traditional RDBMS).
Data-driven organizations need to logically model their data and data relationships in a robust way that preserves flexibility in architecting business solutions, because their broader enterprise data context increasingly consists of some data that arrives or changes in batch mode, some that arrives asynchronously as messages, and some that streams constantly.
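As a rough illustration of this idea (the entity, field names, and sources below are hypothetical, invented for this sketch), a single canonical logical model can be populated from a batch file row, an asynchronous message, and a streaming event alike, so the arrival mode never dictates the model:

```python
import json
from dataclasses import dataclass

# One canonical logical model, independent of how the data arrives.
# Entity and field names are hypothetical, for illustration only.
@dataclass(frozen=True)
class SensorReading:
    device_id: str
    temperature_c: float

def from_batch_row(row: str) -> SensorReading:
    """Parse a comma-separated line from a nightly batch extract."""
    device_id, temp = row.split(",")
    return SensorReading(device_id=device_id.strip(), temperature_c=float(temp))

def from_message(payload: str) -> SensorReading:
    """Parse an asynchronous JSON message from a queue."""
    doc = json.loads(payload)
    return SensorReading(device_id=doc["deviceId"], temperature_c=doc["tempC"])

def from_stream_event(event: dict) -> SensorReading:
    """Map an already-deserialized streaming event onto the model."""
    return SensorReading(device_id=event["id"], temperature_c=event["reading"])

# All three arrival modes converge on the same logical record.
a = from_batch_row("sensor-42, 21.5")
b = from_message('{"deviceId": "sensor-42", "tempC": 21.5}')
c = from_stream_event({"id": "sensor-42", "reading": 21.5})
assert a == b == c
```

The point is not the code itself but the separation it embodies: each source-specific mapping can change as providers and formats change, while the logical model the rest of the enterprise depends on stays stable.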
The interconnected nature of cloud computing solutions makes it more important than ever that a comprehensive logical model of the enterprise’s data be captured and clearly understood in the proper context, as data arrives at different rates from a proliferating number and variety of sources.
So robust data modeling and data governance capabilities and competencies, along with a focus on detailed, business-quality metadata that provides clear naming and definitions for increasingly distributed but interconnected data, remain an imperative for data-driven organizations hoping to capitalize on the promise of cloud computing.
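One lightweight way to picture such business metadata (the attribute names, definitions, and locations below are invented purely for illustration) is a simple data dictionary that maps each attribute to a clear business name, a plain-language definition, and the place the data physically lives:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AttributeMetadata:
    business_name: str   # clear, agreed business name
    definition: str      # plain-language business definition
    location: str        # where the data physically lives (hypothetical)

# A toy data dictionary; entries are illustrative, not a real schema.
data_dictionary = {
    "cust_id": AttributeMetadata(
        business_name="Customer Identifier",
        definition="Unique identifier assigned to a customer at onboarding.",
        location="SaaS CRM (public cloud)",
    ),
    "ord_ts": AttributeMetadata(
        business_name="Order Timestamp",
        definition="UTC time at which the order was placed.",
        location="Analytic database (cloud MPP)",
    ),
}

# A minimal governance check: every attribute must carry both a
# business name and a definition before it is considered documented.
for key, meta in data_dictionary.items():
    assert meta.business_name and meta.definition, f"{key} is missing metadata"
```

In practice this role is filled by dedicated metadata and data governance tooling rather than hand-rolled code, but the underlying discipline is the same: every distributed data element gets an unambiguous name, definition, and known location.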
As the spectrum of technologies offered in the cloud continues to mature, a range of capabilities previously available only as part of high-end, on-premise solutions is becoming available to organizations architecting cloud-based or cloud-interfaced solutions.
In a previous post in this series, we touched on the recent availability of cloud-based MPP databases as an option for managed data lakes. Another capability that has become more prevalent at scale in cloud-architected solutions is “in-memory” data stores. Please join us for the sixth installment of our series, Data Modeling in a Jargon-filled World – In-memory, where we’ll discuss the increasing relevance of in-memory data stores and their impact on enterprise architecture and data modeling.